MicroGPT: A First-Principles Course

Reverse-engineer every single line of a 200-line GPT language model.
From 10th-grade math to building your first LLM. No magic, no hand-waving.

Start Learning · View the Code


What You'll Build a Mental Model For

```mermaid
flowchart LR
    A["📝 Raw Text"] --> B["🔢 Tokenization"]
    B --> C["📐 Embeddings"]
    C --> D["🧠 Transformer"]
    D --> E["📊 Probabilities"]
    E --> F["✨ Generated Text"]
```

Course Modules

Module 0 — The Big Picture

What is a language model? A bird's-eye view of the 200 lines and the mental model for how learning machines work.

Module 1 — Data & Tokenization

How raw text becomes numbers: character encoding, vocabularies, and the special BOS (beginning-of-sequence) token.
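As a taste of what this module covers, here is a minimal sketch of character-level tokenization with a BOS token. It is written in the spirit of microgpt.py, but the names (`stoi`, `itos`, `encode`, `decode`) and details are illustrative, not the actual course code:

```python
# Minimal character-level tokenizer sketch (illustrative, not microgpt.py itself).
text = "hello world"
chars = sorted(set(text))   # the unique characters form the vocabulary
BOS = len(chars)            # reserve one extra id for the BOS token

stoi = {ch: i for i, ch in enumerate(chars)}  # string -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> string

def encode(s):
    # prepend BOS so the model knows where a sequence starts
    return [BOS] + [stoi[ch] for ch in s]

def decode(ids):
    # drop the BOS marker and map ids back to characters
    return "".join(itos[i] for i in ids if i != BOS)

ids = encode("hello")
print(ids)            # a list of small integers, starting with the BOS id
print(decode(ids))    # -> "hello"
```

The key idea is that the model never sees text, only these integer ids.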

Module 2 — Calculus & Autograd

Derivatives, the chain rule, and how microgpt.py automatically computes gradients with the Value class.
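To preview the core idea, here is a toy scalar autograd engine in the spirit of the `Value` class. The structure (storing children and local derivatives, then applying the chain rule in reverse topological order) is what the module teaches; the exact fields and method names here are illustrative, not copied from microgpt.py:

```python
# Toy scalar autograd sketch (illustrative; not the actual Value class).
class Value:
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children          # inputs this value was computed from
        self._local_grads = local_grads    # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # build a topological order, then apply the chain rule in reverse
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, lg in zip(v._children, v._local_grads):
                child.grad += lg * v.grad

a, b = Value(2.0), Value(3.0)
c = a * b + a        # dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Every operation records its local derivatives at the moment it runs, so one `backward()` call can recover the gradient of the output with respect to every input.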

Module 3 — The Architecture

Embeddings, linear layers, softmax, attention, multi-head attention, residual connections, and the full GPT function.
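Two of these building blocks, softmax and a single attention step, can be sketched on plain Python lists. In microgpt.py the same math runs on `Value` objects so gradients flow through it; the function names and simplified shapes below are illustrative assumptions:

```python
import math

# Softmax and one attention step on plain lists (illustrative sketch).
def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    # scores: similarity of the query to each key, scaled by sqrt(dimension)
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # output: weighted average of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key best, so the first value dominates the output.
out = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(out)
```

Multi-head attention simply runs several of these `attend` operations in parallel with different learned projections and concatenates the results.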

Module 4 — Training

Loss functions, backpropagation, gradient descent, the Adam optimizer, and the complete training loop.
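The optimizer at the heart of this module can be previewed with one Adam update on a single scalar parameter. The hyperparameter names follow the original Adam paper and the values below are common defaults, not necessarily the ones microgpt.py uses:

```python
# One Adam update for a single scalar parameter (illustrative sketch).
def adam_step(p, grad, m, v, t, lr=1e-1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (v_hat ** 0.5 + eps) # normalized parameter update
    return p, m, v

# Minimize the loss (p - 5)^2: the gradient is 2*(p - 5).
p, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (p - 5.0)
    p, m, v = adam_step(p, grad, m, v, t)
print(p)  # approaches 5.0, the minimum of the loss
```

In the real training loop the "gradient" comes from backpropagation through the whole model, and one such update is applied to every parameter at every step.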

Module 5 — Inference & Generation

Using the trained model to generate new text. Temperature, sampling, and the complete picture.
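Temperature-controlled sampling is simple enough to sketch directly. Dividing the logits by a temperature before softmax sharpens (T < 1) or flattens (T > 1) the distribution; the function and variable names here are illustrative:

```python
import math
import random

# Temperature-controlled sampling from next-token logits (illustrative sketch).
def sample(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]   # T < 1 sharpens, T > 1 flattens
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # draw one token id in proportion to its probability
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

random.seed(0)
logits = [2.0, 1.0, 0.1]
print(sample(logits, temperature=0.5))  # low T: usually picks token 0
```

Generation is then a loop: feed the tokens so far through the model, sample the next id from the resulting probabilities, append it, and repeat.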


Prerequisites

What you need to know

  • Math: 10th-grade level — basic algebra and exponents. There's a Math Refresher if you need it.
  • Programming: Basic Python — variables, loops, functions, lists.
  • Machine Learning: Zero prior knowledge required.

Based On

This course is built around microgpt.py by Andrej Karpathy — a complete GPT language model in just 200 lines of pure Python using only the standard library.

It implements:

  • [x] A custom autograd engine (automatic differentiation)
  • [x] A Transformer architecture (attention, MLP, residual connections)
  • [x] A training loop with the Adam optimizer
  • [x] Text generation with temperature-controlled sampling