MicroGPT: A First-Principles Course
Reverse-engineer every single line of a 200-line GPT language model.
From 10th-grade math to building your first LLM. No magic, no hand-waving.
What You'll Build a Mental Model For

```mermaid
flowchart LR
    A["📝 Raw Text"] --> B["🔢 Tokenization"]
    B --> C["📐 Embeddings"]
    C --> D["🧠 Transformer"]
    D --> E["📊 Probabilities"]
    E --> F["✨ Generated Text"]
```

Course Modules
Module 0 — The Big Picture
What is a language model? A bird's-eye view of the 200 lines and the mental model for how learning machines work.
Module 1 — Data & Tokenization
How raw text becomes numbers. Character encoding, vocabularies, and the special BOS token.
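The core idea of Module 1 can be sketched in a few lines. This is an illustrative character-level tokenizer, not microgpt.py's actual code; the toy corpus `"hello"` and the choice to give BOS the highest id are assumptions for the example.

```python
# Character-level tokenization sketch (illustrative, not microgpt.py's code).
text = "hello"
vocab = sorted(set(text))            # unique characters: ['e', 'h', 'l', 'o']
stoi = {ch: i for i, ch in enumerate(vocab)}
BOS = len(vocab)                     # reserve one extra id for the BOS token
tokens = [BOS] + [stoi[ch] for ch in text]
print(tokens)                        # [4, 1, 0, 2, 2, 3]
```

Every character maps to an integer, and a special BOS (beginning-of-sequence) token marks where generation starts.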
Module 2 — Calculus & Autograd
Derivatives, the chain rule, and how microgpt.py automatically computes gradients with the Value class.
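To preview the idea, here is a stripped-down sketch of a `Value`-style node that records how it was computed and propagates gradients backward. microgpt.py's real `Value` class supports many more operations and a full topological backward pass; this version handles just multiplication over a tiny expression tree.

```python
# Minimal autograd sketch (illustrative; microgpt.py's Value is more complete).
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._grad_fn = None      # propagates this node's grad to its children

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad    # d(a*b)/da = b
            other.grad += self.data * out.grad    # d(a*b)/db = a
        out._grad_fn = grad_fn
        return out

    def backward(self):
        self.grad = 1.0
        # Simple depth-first walk (sufficient for this tiny expression tree).
        stack = [self]
        while stack:
            node = stack.pop()
            if node._grad_fn:
                node._grad_fn()
            stack.extend(node._children)

a, b = Value(3.0), Value(4.0)
c = a * b
c.backward()
print(a.grad, b.grad)  # 4.0 3.0 — each input's gradient is the other's value
```

This is the chain rule turned into code: each operation knows its local derivative, and `backward()` multiplies them along the computation graph.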
Module 3 — The Architecture
Embeddings, linear layers, softmax, attention, multi-head attention, residual connections, and the full GPT function.
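One building block from this list, softmax, fits in a few lines of pure Python. This is a standalone sketch with the standard max-subtraction trick for numerical stability; it is not copied from microgpt.py.

```python
import math

# Softmax sketch: turns raw scores (logits) into a probability distribution.
# Subtracting the max logit before exponentiating avoids overflow.
def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # three probabilities summing to 1, largest for the 2.0 logit
```

The same function shows up twice in a GPT: inside attention (to weight which past tokens to look at) and at the output (to turn logits into next-token probabilities).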
Module 4 — Training
Loss functions, backpropagation, gradient descent, the Adam optimizer, and the complete training loop.
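The heart of training is a few lines. This sketch uses plain gradient descent on a made-up one-dimensional loss L(w) = (w − 3)² to show the core loop; microgpt.py itself uses the Adam optimizer, which adds momentum and per-parameter scaling on top of this same idea.

```python
# Gradient-descent sketch on a toy loss L(w) = (w - 3)^2.
# (microgpt.py uses Adam; plain SGD is shown here for the core idea.)
w = 0.0      # initial parameter
lr = 0.1     # learning rate
for _ in range(100):
    grad = 2 * (w - 3)   # dL/dw, computed by hand here (autograd's job later)
    w -= lr * grad       # step against the gradient
print(w)                 # converges toward the minimum at w = 3
```

Training a GPT is this exact loop, repeated over thousands of parameters at once, with the gradient supplied by backpropagation instead of by hand.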
Module 5 — Inference & Generation
Using the trained model to generate new text. Temperature, sampling, and the complete picture.
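Temperature-controlled sampling can be previewed with a small sketch. The logits here are made up for illustration: dividing by a temperature below 1 sharpens the distribution toward the top logit, while a temperature above 1 flattens it.

```python
import math
import random

# Temperature-sampling sketch: scale logits, softmax, then sample an index.
def sample(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

random.seed(0)
print(sample([5.0, 1.0, 0.5], temperature=0.1))  # almost surely index 0
```

At generation time the model repeats this step: predict logits for the next token, sample one, append it, and feed the sequence back in.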
Prerequisites
What you need to know
- Math: 10th-grade level — basic algebra and exponents. There's a Math Refresher if you need it.
- Programming: Basic Python — variables, loops, functions, lists.
- Machine Learning: Zero prior knowledge required.
Based On
This course is built around microgpt.py by Andrej Karpathy — a complete GPT language model in just 200 lines of pure Python using only the standard library.
It implements:
- [x] A custom autograd engine (automatic differentiation)
- [x] A Transformer architecture (attention, MLP, residual connections)
- [x] A training loop with the Adam optimizer
- [x] Text generation with temperature-controlled sampling