
Lecture 24: Deep Learning for Natural Language Processing

AIMA Chapter 24 — 1 hour

Learning Objectives

  • Use word embeddings (Word2Vec, GloVe)

  • Apply RNNs and LSTMs for NLP

  • Understand sequence-to-sequence models

  • Master the Transformer architecture

  • Apply pretraining (BERT, GPT)

Word Embeddings

  • Word2Vec: Skip-gram (predict context from word) and CBOW (predict word from context)

  • GloVe: Global vectors learned from word co-occurrence statistics

  • Contextual (ELMo, BERT): the same word gets a different vector in each context
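
As a toy illustration of how static embeddings are used (the 4-dimensional vectors below are made up; real Word2Vec or GloVe vectors are typically 100–300 dimensional and learned from large corpora), cosine similarity measures how close two word vectors are:

```python
import numpy as np

# Hypothetical toy embedding table -- not real Word2Vec/GloVe vectors.
embeddings = {
    "king":  np.array([0.8, 0.3, 0.1, 0.9]),
    "queen": np.array([0.7, 0.4, 0.2, 0.9]),
    "apple": np.array([0.1, 0.9, 0.8, 0.0]),
}

def cosine(u, v):
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine(embeddings["king"], embeddings["apple"]))  # low: unrelated words
```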

RNNs for NLP

  • Language model: P(wₜ|w₁,...,wₜ₋₁)

  • Sequence classification: sentiment analysis, etc.

  • Vanishing gradients: LSTM gating preserves long-range dependencies (see the sketch below)
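
A minimal sketch of an LSTM language model in PyTorch (all sizes are illustrative placeholders): at each position the model outputs logits over the vocabulary, i.e. P(wₜ|w₁,...,wₜ₋₁):

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))   # (batch, seq_len, hidden_dim)
        return self.out(h)                     # logits over the next word

model = LSTMLanguageModel()
tokens = torch.randint(0, 10_000, (2, 20))    # dummy batch of token IDs
logits = model(tokens)                        # (2, 20, 10_000)
# Cross-entropy against tokens shifted left by one trains P(w_t | w_1..w_{t-1}).
```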

Sequence-to-Sequence

  • Encoder-decoder: Encode input, decode output

  • Machine translation: Source → target

  • Attention: lets the decoder focus on the relevant parts of the source
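
A bare-bones encoder-decoder sketch in PyTorch (vocabulary sizes and dimensions are hypothetical, and attention is deliberately omitted): the encoder's final state is the only context the decoder receives, which is exactly the bottleneck that attention relieves:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder without attention: the whole source
    sentence must be squeezed into one fixed-size hidden state."""
    def __init__(self, src_vocab=8_000, tgt_vocab=8_000, dim=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_embed(src))     # encode the source
        h, _ = self.decoder(self.tgt_embed(tgt), state)  # decode from context
        return self.out(h)                               # target-word logits
```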

Attention

  • Query, Key, Value: Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V (implemented below)

  • Global: every position can attend to every other position

  • Interpretability: attention weights show where the model is looking
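
The formula above, implemented directly in NumPy (the shapes are arbitrary; a real model would add masking and batch dimensions):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 16)) for _ in range(3))
print(attention(Q, K, V).shape)                       # (5, 16)
```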

Transformer

  • Self-attention: No recurrence

  • Multi-head: several heads attend to different kinds of relations in parallel

  • Positional encoding: injects word-order information, since self-attention alone is order-invariant (sketch below)

  • Parallel: all positions are processed at once, unlike an RNN
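
A sketch of the sinusoidal positional encoding used by the original Transformer; seq_len and d_model are arbitrary choices here:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same).
    Each position gets a unique, smoothly varying pattern."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2), even dims
    angles = pos / np.power(10_000, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)               # odd dimensions: cosine
    return pe

print(positional_encoding(50, 64).shape)       # (50, 64), added to embeddings
```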

Pretraining

  • Masked LM: Predict masked tokens (BERT)

  • Causal LM: Predict next token (GPT)

  • Fine-tuning: On downstream tasks
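
A sketch of how the two objectives construct training pairs (the token IDs and MASK_ID are toy values; the 15% masking rate follows BERT's recipe, but everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array([12, 47, 3, 88, 21, 9])    # toy token IDs
MASK_ID = 103                                # stand-in for a [MASK] token

# Masked LM (BERT-style): hide ~15% of tokens and predict the originals
# from bidirectional context.
mask = rng.random(len(tokens)) < 0.15
mlm_input = np.where(mask, MASK_ID, tokens)
mlm_target = np.where(mask, tokens, -100)    # -100: ignored by the loss
                                             # (PyTorch's default ignore_index)

# Causal LM (GPT-style): each token predicts the next one, so the target
# is the input shifted left by one position.
clm_input, clm_target = tokens[:-1], tokens[1:]

print("MLM:", mlm_input, mlm_target)
print("CLM:", clm_input, clm_target)
```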

Summary

  • Embeddings: Word2Vec → contextual

  • RNN/LSTM: Sequential

  • Transformer: Self-attention

  • Pretrain + fine-tune: Modern paradigm

References

  • Russell, S. & Norvig, P., Artificial Intelligence: A Modern Approach (4th ed.), Ch. 24

  • Chapter PDF: chapters/chapter-24.pdf

Questions?

Next lecture: Computer Vision (Chapter 25)