The Learning Machine Analogy¶
A Machine That Learns From Mistakes¶
Before we touch any code, let's build an analogy that will carry us through the entire course.
The Blindfolded Archer¶
Imagine a blindfolded archer trying to hit a target:
```mermaid
flowchart LR
    A["🏹 Archer<br>(blindfolded)"] -- "shoots" --> B["❌ Miss!<br>(too high, too right)"]
    B -- "friend says:<br>aim lower & left" --> C["🎯 Adjusts aim"]
    C -- "shoots again" --> A
```

- The archer shoots an arrow (makes a prediction)
- A friend tells them: "You were 2 meters too high and 1 meter to the right" (the loss — how wrong they were)
- The friend also says: "Aim lower and more to the left" (the gradient — which direction to adjust)
- The archer adjusts their aim slightly (the parameter update)
- They shoot again
After hundreds of attempts, the archer is landing arrows near the bullseye — without ever seeing the target.
Key Insight
This is exactly how neural networks learn. They never "see" the answer directly — they just get told how wrong they were and which direction to adjust.
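The archer loop is ordinary gradient descent. Here is a minimal numeric sketch (illustrative only; none of these names come from microgpt.py): the "aim" is two numbers, the loss is the squared distance to the bullseye, and each step nudges the aim opposite the gradient.

```python
def loss(aim):
    # "How far off was the shot?" -- squared distance to the bullseye at (0, 0)
    return aim[0] ** 2 + aim[1] ** 2

def gradient(aim):
    # "Which direction to adjust?" -- derivative of the loss w.r.t. each number
    return [2 * aim[0], 2 * aim[1]]

aim = [2.0, 1.0]        # start 2 m too high, 1 m to the right
learning_rate = 0.1     # adjust the aim only slightly each time

for step in range(100):
    g = gradient(aim)
    aim[0] -= learning_rate * g[0]  # nudge opposite the gradient
    aim[1] -= learning_rate * g[1]

print(loss(aim))  # very close to 0: arrows now land near the bullseye
```

The archer never sees the target; the loop only ever uses the loss and its gradient, which is all a neural network gets too.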
Mapping the Analogy to Code¶
| Analogy | In microgpt.py | What it means |
|---|---|---|
| The archer's aim (angle, force) | Parameters (lines 74–90) | Thousands of numbers that control the model's behavior |
| Shooting an arrow | Forward pass (lines 163–168) | Running an input through the model to get a prediction |
| "You missed by X" | Loss (line 169) | A single number measuring how bad the prediction was |
| "Aim lower and left" | Gradients (line 172) | The direction to nudge each parameter |
| Adjusting aim | Optimizer (lines 174–182) | The rule for how much to nudge |
| Shooting again | Next training step (line 153) | Repeating with the next example |
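The right-hand rhythm of this table can be shown as one tiny, self-contained training step repeated in a loop. The "model" below is a toy three-parameter softmax classifier, not microgpt.py's transformer, and every name here is illustrative; but the forward pass → loss → gradients → update sequence is the same one the file performs.

```python
import math

params = [0.0, 0.0, 0.0]  # the archer's "aim": here just 3 numbers (a 3-symbol vocabulary)
target = 2                # index of the correct next character
learning_rate = 0.5

for step in range(50):
    # Forward pass: turn parameters into probabilities (softmax) -- "shoot the arrow"
    exps = [math.exp(p) for p in params]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Loss: one number measuring how wrong we were (cross-entropy) -- "you missed by X"
    loss = -math.log(probs[target])

    # Gradients: which direction to nudge each parameter -- "aim lower and left"
    grads = [probs[i] - (1.0 if i == target else 0.0) for i in range(3)]

    # Optimizer: nudge each parameter against its gradient -- "adjust aim"
    for i in range(3):
        params[i] -= learning_rate * grads[i]

print(round(probs[target], 2))  # probability of the right answer, now far above 1/3
```

The gradient of cross-entropy with respect to the logits is simply `probs - one_hot(target)`, which is why the gradient line is so short; microgpt.py computes the same quantity through its autograd machinery instead of by hand.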
The Three Phases¶
The file does three things, in order:
Lines 1–144 — Build the "archer": the model that takes an input and produces a prediction.
At this point the parameters are random, so the predictions are garbage.
Lines 146–184 — Show the model thousands of real names. For each one:
- Let it predict the next character
- Tell it how wrong it was
- Adjust parameters slightly
```
Step 1:   loss = 3.8912  (predictions are random garbage)
Step 100: loss = 2.4561  (starting to learn common patterns)
Step 300: loss = 1.8234  (getting the hang of it)
Step 500: loss = 1.5012  (reasonably good at predicting)
```
The loss going down means the model is getting better.
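The starting number is not arbitrary, and you can sanity-check it yourself: a model that guesses uniformly over a vocabulary of V symbols has a cross-entropy loss of exactly ln(V). (The vocabulary sizes below are chosen for illustration, not read out of microgpt.py.)

```python
import math

# A uniform guesser assigns probability 1/V to the correct symbol,
# so its cross-entropy loss is -log(1/V) = log(V).
for V in (27, 50):
    print(V, round(math.log(V), 4))
# log(27) ~ 3.2958 and log(50) ~ 3.912, so a step-1 loss near 3.89
# says the untrained model is doing no better than a uniform guess.
```

Any loss below that baseline is evidence the model has actually learned something.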
Lines 186–200 — Now that the parameters have been tuned, we can use the model to generate new names it has never seen.
These names didn't exist in the training data — the model invented them by learning the patterns of what makes a name "name-like."
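Generation is nothing more than repeated next-character prediction plus sampling. The hand-written bigram table below stands in for the trained model (it is *not* microgpt.py's model, and all names are illustrative), but the sampling loop has the same shape: predict a distribution over the next character, sample one, append it, repeat until an end marker.

```python
import random

random.seed(0)

# Toy stand-in for a trained model: P(next char | current char).
# "^" marks the start of a name, "$" marks the end.
probs_after = {
    "^": {"a": 0.5, "m": 0.5},
    "a": {"m": 0.4, "n": 0.4, "$": 0.2},
    "m": {"a": 1.0},
    "n": {"a": 0.5, "$": 0.5},
}

def generate():
    name, ch = "", "^"
    while True:
        chars, weights = zip(*probs_after[ch].items())
        nxt = random.choices(chars, weights)[0]  # sample the next character
        if nxt == "$":
            return name                          # end marker: name is done
        name += nxt
        ch = nxt

print([generate() for _ in range(5)])  # e.g. names like "mana", "am", ...
```

Swap the lookup table for a transformer's predicted distribution and this loop is, in essence, what lines 186–200 do.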
Why "Everything Else Is Just Efficiency"¶
Karpathy's claim
"This is the complete algorithm. Everything else is just efficiency."
What does he mean? This 200-line file contains:
| What | Present in microgpt.py? | What the "real world" adds |
|---|---|---|
| Tokenization | ✅ | Faster tokenizers (BPE) with larger vocabularies |
| Autograd | ✅ | GPU-accelerated autograd (PyTorch/JAX) |
| Transformer architecture | ✅ | More layers, bigger embeddings, but same structure |
| Attention mechanism | ✅ | FlashAttention (same math, faster execution) |
| Training loop | ✅ | Distributed training across thousands of GPUs |
| Adam optimizer | ✅ | Same optimizer, just parallelized |
| Text generation | ✅ | Same algorithm with beam search, nucleus sampling |
The algorithm is identical. What changes at scale is:
- Speed: GPUs instead of Python loops
- Size: Billions of parameters instead of thousands
- Data: Terabytes of text instead of a names file
But the logic — embed, attend, predict, measure error, compute gradients, update — is the same logic you'll learn in this course.
The Road Ahead¶
```mermaid
flowchart TD
    HERE["📍 You are here"] --> M1
    M1["Module 1: How do we get data<br>and turn it into numbers?"] --> M2
    M2["Module 2: How do we automatically<br>find 'which way to nudge'?"] --> M3
    M3["Module 3: What math does the model<br>actually do on the numbers?"] --> M4
    M4["Module 4: How does the<br>training loop work?"] --> M5
    M5["Module 5: How do we<br>generate new text?"] --> DONE
    DONE["✅ You understand the entire file"]
```