Weeks 05-06 - CNNs, RNNs, Transformers

Reference extracted from the detailed guide.

CNNs

Watch:

  1. Stanford CS231n: CNNs for Visual Recognition

  2. StatQuest: Neural Network series

Key Architectures to Know:

  • LeNet-5 (basic)
  • AlexNet (ReLU, dropout)
  • VGG (small filters)
  • ResNet (skip connections) ← MOST IMPORTANT
  • Inception (parallel branches)
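The ResNet skip connection listed above is worth internalizing. A minimal sketch, with two plain linear layers standing in for the conv layers of a real residual block (all names and shapes here are illustrative, not from a specific paper's configuration):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, W1, W2):
    # Two linear transforms stand in for the conv layers of a real
    # ResNet block; the key idea is the identity shortcut out + x
    out = relu(x @ W1)
    out = out @ W2
    return relu(out + x)  # skip connection: gradients flow directly through "+ x"

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))           # (batch, features)
W1 = rng.standard_normal((8, 8)) * 0.1
W2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, W1, W2)
print(y.shape)  # (4, 8) -- same shape as the input, as the shortcut requires
```

Note that the shortcut forces the block's output shape to match its input shape; real ResNets insert a 1x1 projection on the shortcut when shapes change.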

RNNs & LSTMs

Watch:

  1. StatQuest: RNN/LSTM

  2. Andrew Ng: Sequence Models (Coursera Course 5)
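The recurrence these resources cover can be sketched in a few lines, assuming a vanilla tanh cell (names are illustrative; LSTMs add gates on top of this same unrolled structure):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One vanilla RNN step: the new hidden state mixes the current input
    # with the previous hidden state through a shared tanh cell
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
d_in, d_h, T = 3, 5, 4
W_xh = rng.standard_normal((d_in, d_h)) * 0.1
W_hh = rng.standard_normal((d_h, d_h)) * 0.1
b_h = np.zeros(d_h)

h = np.zeros(d_h)
for x_t in rng.standard_normal((T, d_in)):  # unroll over the sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)  # (5,)
```

The same weights are reused at every timestep, which is both why RNNs handle variable-length sequences and why gradients vanish or explode over long unrolls.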

Transformers (CRITICAL for Meta/Google)

Watch (in order):

  1. Illustrated Transformer (blog + video)

  2. StatQuest: Transformer

  3. Andrej Karpathy: Let's build GPT

Must Implement:

# Self-Attention from Scratch
import numpy as np

def self_attention(Q, K, V):
    """
    Q, K, V: (seq_len, d_model)
    """
    d_k = K.shape[-1]
    
    # Attention scores
    scores = np.dot(Q, K.T) / np.sqrt(d_k)  # (seq_len, seq_len)
    
    # Softmax (subtract the row max first for numerical stability)
    scores = scores - np.max(scores, axis=-1, keepdims=True)
    attention_weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)
    
    # Weighted sum
    output = np.dot(attention_weights, V)  # (seq_len, d_model)
    
    return output, attention_weights

# Multi-Head Attention
def multi_head_attention(X, num_heads, d_model):
    """
    X: (seq_len, d_model); d_model must be divisible by num_heads
    """
    d_k = d_model // num_heads
    heads = []
    
    for _ in range(num_heads):
        # In a real model these projections are learned parameters,
        # not re-sampled on every call; random values are for illustration
        W_q = np.random.randn(d_model, d_k)
        W_k = np.random.randn(d_model, d_k)
        W_v = np.random.randn(d_model, d_k)
        
        Q = np.dot(X, W_q)
        K = np.dot(X, W_k)
        V = np.dot(X, W_v)
        
        head, _ = self_attention(Q, K, V)
        heads.append(head)
    
    # Concatenate heads
    concat = np.concatenate(heads, axis=-1)
    
    # Final linear projection
    W_o = np.random.randn(d_model, d_model)
    output = np.dot(concat, W_o)
    
    return output
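Two sanity checks worth running on your implementation: the output keeps the input's shape, and every row of the attention weights sums to 1. The snippet repeats a compact self-attention (with a stabilized softmax) so it runs standalone:

```python
import numpy as np

def self_attention(Q, K, V):
    # Compact version of the implementation above, repeated so this check runs standalone
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.standard_normal((seq_len, d_model))

out, w = self_attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)       # (4, 8) -- output keeps the input shape
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

If a row of weights doesn't sum to 1, the softmax axis is wrong; if the shape changes, the final matmul against V is transposed.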
