Lecture 24: Deep Learning for Natural Language Processing
Learning Objectives
Use word embeddings (Word2Vec, GloVe)
Apply RNNs and LSTMs for NLP
Understand sequence-to-sequence models
Master the Transformer architecture
Apply pretrained models (BERT, GPT)
Word Embeddings
Word2Vec: skip-gram (predict context words from the center word) and CBOW (predict the center word from its context); see the sketch below
GloVe: global vectors fit to corpus-wide co-occurrence statistics
Contextual embeddings: ELMo and BERT give a word a different vector in each context
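A minimal skip-gram training sketch using gensim, assuming the library is installed; the toy corpus and hyperparameters are illustrative only.

```python
# Minimal sketch: training skip-gram embeddings with gensim
# (toy corpus and hyperparameters chosen for illustration).
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # embedding dimensionality
    window=2,         # context window size
    sg=1,             # 1 = skip-gram; 0 would train CBOW instead
    min_count=1,      # keep every word in this tiny corpus
)

vec = model.wv["cat"]                  # 50-dim embedding for "cat"
print(model.wv.most_similar("cat"))    # nearest neighbors by cosine similarity
```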
RNNs for NLP
Language modeling: estimate P(wₜ|w₁,...,wₜ₋₁), one token at a time (sketched below)
Sequence classification: sentiment analysis, topic labeling, and similar tasks
Vanishing gradients: plain RNNs lose long-range information; LSTM gating mitigates this
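A minimal sketch of an LSTM language model in PyTorch; the vocabulary size and layer dimensions are assumptions for illustration. The model emits a distribution over the next token at every position, matching P(wₜ|w₁,...,wₜ₋₁).

```python
# Sketch of an LSTM language model (hypothetical vocab size and dims).
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        h, _ = self.lstm(self.embed(tokens))   # (batch, seq_len, hidden_dim)
        return self.out(h)                     # logits over the next token

model = LSTMLanguageModel()
tokens = torch.randint(0, 10_000, (4, 20))     # fake batch of token ids
logits = model(tokens)                         # (4, 20, 10000)
# Cross-entropy against the tokens shifted left trains P(w_t | w_1..w_{t-1}).
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 10_000), tokens[:, 1:].reshape(-1)
)
```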
Sequence-to-Sequence
Encoder-decoder: encode the input sequence into a representation, then decode the output sequence from it (sketched below)
Machine translation: source sentence → target sentence
Attention: lets the decoder focus on the relevant parts of the input at each step
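A minimal encoder-decoder sketch in PyTorch, without attention; vocabulary sizes and dimensions are illustrative assumptions, and training would use teacher forcing on shifted target ids.

```python
# Sketch of a plain encoder-decoder (no attention; dims are assumptions).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8_000, tgt_vocab=8_000, dim=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source; its final state initializes the decoder.
        _, state = self.encoder(self.src_embed(src))
        dec, _ = self.decoder(self.tgt_embed(tgt), state)
        return self.out(dec)   # logits for each target position

model = Seq2Seq()
src = torch.randint(0, 8_000, (2, 12))   # source token ids
tgt = torch.randint(0, 8_000, (2, 10))   # shifted target ids (teacher forcing)
logits = model(src, tgt)                 # (2, 10, 8000)
```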
Attention
Query, Key, Value: Attention(Q, K, V) = softmax(QK^T/√d_k) V, where queries Q are matched against keys K to weight the values V
Attend: every position can attend to every other position
Interpretability: the attention weights show where the model is looking (see the sketch below)
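A direct sketch of scaled dot-product attention in PyTorch, computing the formula above; the tensor shapes are illustrative.

```python
# Scaled dot-product attention, matching softmax(QK^T/√d_k) V.
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (batch, seq_len, d_k)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # (batch, q_len, k_len)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ V, weights                        # output and attn map

Q = torch.randn(1, 5, 64)
K = torch.randn(1, 5, 64)
V = torch.randn(1, 5, 64)
out, w = scaled_dot_product_attention(Q, K, V)   # out: (1, 5, 64)
```

Returning the weights alongside the output is what makes attention inspectable: `w[0]` is a 5×5 map of where each position looked.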
Transformer
Self-attention: no recurrence; each layer relates all positions directly
Multi-head: several attention heads run in parallel, each in its own subspace
Position encoding: injects order information, since self-attention alone is permutation-invariant
Parallel: all positions are processed at once during training, unlike an RNN (see the sketch below)
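A sketch of a Transformer encoder over embedded tokens, using PyTorch's built-in encoder layer plus sinusoidal positional encodings; the model width, head count, and depth are assumptions for illustration.

```python
# Transformer encoder sketch: positional encoding + self-attention layers.
import torch
import torch.nn as nn

def positional_encoding(seq_len, dim):
    # Sinusoidal encodings: sin on even dims, cos on odd dims.
    pos = torch.arange(seq_len).unsqueeze(1).float()
    i = torch.arange(0, dim, 2).float()
    angles = pos / (10_000 ** (i / dim))
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

x = torch.randn(2, 16, 128)               # (batch, seq_len, d_model)
x = x + positional_encoding(16, 128)       # inject position info

layer = nn.TransformerEncoderLayer(
    d_model=128, nhead=8, batch_first=True  # 8 attention heads
)
encoder = nn.TransformerEncoder(layer, num_layers=2)
out = encoder(x)   # (2, 16, 128): all positions processed in parallel
```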
Pretraining
Masked LM: predict masked-out tokens using context from both directions (BERT)
Causal LM: predict the next token, left to right (GPT)
Fine-tuning: adapt the pretrained model to downstream tasks with comparatively little labeled data (usage sketch below)
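A sketch of using both pretraining styles via the Hugging Face transformers library, assuming it and the model weights are available; the prompts are illustrative.

```python
# Masked vs. causal LM via Hugging Face pipelines (assumes `transformers`
# is installed and can download bert-base-uncased / gpt2 weights).
from transformers import pipeline

# Masked LM (BERT-style): fill in a masked token using both directions.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("Paris is the [MASK] of France.")[0]["token_str"])

# Causal LM (GPT-style): continue the text left to right.
generate = pipeline("text-generation", model="gpt2")
print(generate("Deep learning for NLP", max_new_tokens=20)[0]["generated_text"])
```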
Summary
Embeddings: from static (Word2Vec, GloVe) to contextual (ELMo, BERT)
RNN/LSTM: sequential processing, one token at a time
Transformer: self-attention, fully parallel
Pretrain + fine-tune: the modern NLP paradigm
References
Russell & Norvig, AIMA 4e, Ch. 24
Chapter PDF: chapters/chapter-24.pdf