Emergency AI Study Notes
With one day's notice, I need to prepare for an online test.
From the test instructions: "You are now ready to take the online logic test that will assess your knowledge in Mathematics, Machine Learning, Deep Learning, and NLP. Please read the instructions carefully before starting. Prohibited: the use of ChatGPT, AI tools, or any third-party assistance. All answers must be your own."
The test is administered on Google Forms. I suspect it is just a screening step that lets people pay the $2000 fee and join the program.
These are my emergency study notes.
Mathematics
Key Resources
Matrix Dimensions
- Convention: (row, col)
- Matrix multiplication: $(m \times n) \cdot (n \times 1) \rightarrow (m \times 1)$
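A quick shape check in PyTorch (a minimal sketch; the sizes m = 3, n = 4 are arbitrary):

import torch

A = torch.randn(3, 4)    # (row, col) = (m, n) = (3, 4)
x = torch.randn(4, 1)    # (n, 1) column vector
y = A @ x                # inner dimensions must match: (3 x 4) @ (4 x 1)
print(y.shape)           # torch.Size([3, 1])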
Backpropagation
Postmortem note: I should have revised this.
Linear and Softmax
Consider the function: \(L = \frac{1}{1+\exp(w_0 x_0 + w_1 x_1 + w_2)}\)
To find partial derivatives w.r.t. each variable:
Let $u = w_0 x_0 + w_1 x_1 + w_2$
Then: \(L = \frac{1}{1 + \exp(u)}\)
Using the chain rule and the reciprocal rule (see the footnote at the end of these notes): \(\frac{dL}{du} = \frac{-\exp(u)}{(1 + \exp(u))^2}\)
The partial derivatives are:
- $\frac{dL}{dw_0} = \frac{dL}{du} \cdot \frac{du}{dw_0} = \frac{-\exp(u)}{(1 + \exp(u))^2} \cdot x_0$
- $\frac{dL}{dx_0} = \frac{dL}{du} \cdot \frac{du}{dx_0} = \frac{-\exp(u)}{(1 + \exp(u))^2} \cdot w_0$
- $\frac{dL}{dw_2} = \frac{dL}{du} \cdot \frac{du}{dw_2} = \frac{-\exp(u)}{(1 + \exp(u))^2}$
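These can be checked numerically with PyTorch autograd (a minimal sketch; the example values are arbitrary assumptions):

import torch

# Arbitrary example values (assumptions for illustration)
w0 = torch.tensor(2.0, requires_grad=True)
x0 = torch.tensor(-1.0, requires_grad=True)
w1 = torch.tensor(-3.0, requires_grad=True)
x1 = torch.tensor(-2.0, requires_grad=True)
w2 = torch.tensor(-3.0, requires_grad=True)

u = w0 * x0 + w1 * x1 + w2
L = 1.0 / (1.0 + torch.exp(u))
L.backward()

# Analytic: dL/du = -exp(u) / (1 + exp(u))^2
dL_du = -torch.exp(u) / (1.0 + torch.exp(u)) ** 2
print(w0.grad.item(), (dL_du * x0).item())   # dL/dw0 = dL/du * x0
print(x0.grad.item(), (dL_du * w0).item())   # dL/dx0 = dL/du * w0
print(w2.grad.item(), dL_du.item())          # dL/dw2 = dL/du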
Skip Connections
Network Architecture:
x0 → [a] → x_a → [b] → x_b → [c] → x_c → [d] → x_d → [e] → x_e <-loss-> t
(The skip connection feeds $x_b$ directly into node d, alongside the main path through node c.)
Layer Definitions:
- Input: $x_0$
- Node a: $x_a = w_a \times x_0 + b_a$
- Node b: $x_b = w_b \times x_a + b_b$
- Node c: $x_c = w_c \times x_b + b_c$
- Node d: $x_d = w_d \times x_c + b_d + w_{d2} \times x_b + b_{d2}$ (with skip connection)
- Node e: $x_e = w_e \times x_d + b_e$
- Loss: $\text{loss} = (x_e - t)^2$
Gradient Calculation for Skip Connection:
\[\begin{align} \frac{dL}{dx_b} &= \frac{dL}{dx_e} \cdot \frac{dx_e}{dx_b} \\ &= \frac{dL}{dx_e} \cdot \frac{dx_e}{dx_d} \cdot \frac{dx_d}{dx_b} \\ &= \frac{dL}{dx_e} \cdot \frac{dx_e}{dx_d} \cdot \frac{d}{dx_b}\left(w_d \times x_c + b_d + w_{d2} \times x_b + b_{d2}\right) \\ &= \frac{dL}{dx_e} \cdot \frac{dx_e}{dx_d} \cdot \left(w_d \frac{dx_c}{dx_b} + w_{d2}\right) \end{align}\]

Because the skip path enters $x_d$ additively, it contributes the extra $w_{d2}$ term to $\frac{dx_d}{dx_b}$; with an identity skip ($w_{d2} = 1$, $b_{d2} = 0$) this becomes the familiar $+1$ that lets the gradient bypass node c.

Since $\frac{dx_b}{db_b} = 1$ and $\frac{dx_b}{dw_b} = x_a$:

\[\frac{dL}{db_b} = \frac{dL}{dx_b} \cdot \frac{dx_b}{db_b} = \frac{dL}{dx_e} \cdot \frac{dx_e}{dx_d} \cdot \left(w_d \frac{dx_c}{dx_b} + w_{d2}\right)\]

\[\frac{dL}{dw_b} = \frac{dL}{dx_b} \cdot \frac{dx_b}{dw_b} = \frac{dL}{dx_e} \cdot \frac{dx_e}{dx_d} \cdot \left(w_d \frac{dx_c}{dx_b} + w_{d2}\right) \cdot x_a\]
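A quick autograd check of the $\frac{dL}{dw_b}$ expression above (a minimal sketch; the scalar input, target, and random parameter values are assumptions for illustration):

import torch

# Scalar parameters with random example values (illustrative assumption)
torch.manual_seed(0)
names = ["w_a", "b_a", "w_b", "b_b", "w_c", "b_c",
         "w_d", "b_d", "w_d2", "b_d2", "w_e", "b_e"]
p = {n: torch.randn((), requires_grad=True) for n in names}
x0, t = torch.tensor(0.5), torch.tensor(1.0)   # assumed input and target

# Forward pass following the layer definitions above
x_a = p["w_a"] * x0 + p["b_a"]
x_b = p["w_b"] * x_a + p["b_b"]
x_c = p["w_c"] * x_b + p["b_c"]
x_d = p["w_d"] * x_c + p["b_d"] + p["w_d2"] * x_b + p["b_d2"]   # skip connection
x_e = p["w_e"] * x_d + p["b_e"]
loss = (x_e - t) ** 2
loss.backward()

# Analytic: dL/dw_b = dL/dx_e * dx_e/dx_d * (w_d * dx_c/dx_b + w_d2) * x_a
manual = 2 * (x_e - t) * p["w_e"] * (p["w_d"] * p["w_c"] + p["w_d2"]) * x_a
print(p["w_b"].grad.item(), manual.item())   # the two numbers should agree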
Attention Mechanism
Resources
Machine Learning
Course Notes
PyTorch Training Loop
optimizer.zero_grad() # PyTorch accumulates gradients by default
prediction = model(data)
loss = criterion(prediction, target)
loss.backward() # Compute gradients
optimizer.step() # Update parameters
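For context, a fuller runnable version of the same loop (a minimal sketch; the model, data, and hyperparameters are made up for illustration):

import torch
import torch.nn as nn

# Made-up setup: tiny linear regression on random data
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = torch.randn(64, 10)
target = torch.randn(64, 1)

for epoch in range(5):
    optimizer.zero_grad()                  # PyTorch accumulates gradients by default
    prediction = model(data)
    loss = criterion(prediction, target)
    loss.backward()                        # compute gradients
    optimizer.step()                       # update parameters
    print(epoch, loss.item())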
Cross-Entropy Formula
Cross-entropy loss: $H(p, q) = -\sum_i p_i \log q_i$
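A small worked example of this formula (a sketch; the distributions are made up, and the comparison against F.cross_entropy assumes a one-hot $p$):

import torch
import torch.nn.functional as F

p = torch.tensor([0.0, 1.0, 0.0])      # true distribution (one-hot, class 1)
q = torch.tensor([0.1, 0.7, 0.2])      # predicted distribution (made up)
H = -(p * torch.log(q)).sum()          # H(p, q) = -sum_i p_i log q_i
print(H.item())                        # -log(0.7) ≈ 0.357

# PyTorch's cross_entropy takes raw logits and a class index, applying
# log-softmax internally; log(q) works as logits here since q sums to 1.
logits = torch.log(q).unsqueeze(0)
print(F.cross_entropy(logits, torch.tensor([1])).item())   # same value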
Reinforcement Learning
Resources
- Shusen Wang’s Playlists
- TODO: Create a cheatsheet
Scaling LLMs
Key References
-
Relevant information is available here:
-
Footnote (reciprocal rule, used in the chain-rule step above): if $h(x) = \frac{1}{f(x)}$, then $\frac{dh}{dx} = \frac{d(1/f)}{dx} = -\frac{1}{f^2} \frac{df}{dx}$.