MicroGPT: A First-Principles Course

Reverse-engineer every single line of a 200-line GPT language model.
From 10th-grade math to building your first LLM. No magic, no hand-waving.

Start Learning · View the Code


What You'll Build a Mental Model For

```mermaid
flowchart LR
    A["📝 Raw Text"] --> B["🔢 Tokenization"]
    B --> C["📐 Embeddings"]
    C --> D["🧠 Transformer"]
    D --> E["📊 Probabilities"]
    E --> F["✨ Generated Text"]
```

Course Modules

Module 0 — The Big Picture

What is a language model? A bird's-eye view of the 200 lines and the mental model for how learning machines work.

Module 1 — Data & Tokenization

How raw text becomes numbers: character encoding, vocabularies, and the special BOS (beginning-of-sequence) token.
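As a taste of what this module covers, here is a minimal sketch of character-level tokenization with a BOS token. It is written in the spirit of microgpt.py, but the names (`stoi`, `itos`, `encode`, `decode`) and details are illustrative, not the actual course code:

```python
# Minimal character-level tokenizer sketch (illustrative, not microgpt.py itself).
text = "hello world"
chars = sorted(set(text))   # the unique characters form the vocabulary
BOS = len(chars)            # reserve one extra id for the BOS token

stoi = {ch: i for i, ch in enumerate(chars)}  # string -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> string

def encode(s):
    # prepend BOS so the model knows where a sequence starts
    return [BOS] + [stoi[ch] for ch in s]

def decode(ids):
    # drop the BOS marker and map ids back to characters
    return "".join(itos[i] for i in ids if i != BOS)

ids = encode("hello")
print(ids)            # a list of small integers, starting with the BOS id
print(decode(ids))    # -> "hello"
```

The key idea is that the model never sees text, only these integer ids.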

Module 2 — Calculus & Autograd

Derivatives, the chain rule, and how microgpt.py automatically computes gradients with the Value class.
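To preview the core idea, here is a toy scalar autograd engine in the spirit of the `Value` class. The structure (storing children and local derivatives, then applying the chain rule in reverse topological order) is what the module teaches; the exact fields and method names here are illustrative, not copied from microgpt.py:

```python
# Toy scalar autograd sketch (illustrative; not the actual Value class).
class Value:
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children          # inputs this value was computed from
        self._local_grads = local_grads    # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # build a topological order, then apply the chain rule in reverse
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, lg in zip(v._children, v._local_grads):
                child.grad += lg * v.grad

a, b = Value(2.0), Value(3.0)
c = a * b + a        # dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Every operation records its local derivatives at the moment it runs, so one `backward()` call can recover the gradient of the output with respect to every input.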

Module 3 — The Architecture

Embeddings, linear layers, softmax, attention, multi-head attention, residual connections, and the full GPT function.
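Two of these building blocks, softmax and a single attention step, can be sketched on plain Python lists. In microgpt.py the same math runs on `Value` objects so gradients flow through it; the function names and simplified shapes below are illustrative assumptions:

```python
import math

# Softmax and one attention step on plain lists (illustrative sketch).
def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    # scores: similarity of the query to each key, scaled by sqrt(dimension)
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # output: weighted average of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key best, so the first value dominates the output.
out = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(out)
```

Multi-head attention simply runs several of these `attend` operations in parallel with different learned projections and concatenates the results.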

Module 4 — Training

Loss functions, backpropagation, gradient descent, the Adam optimizer, and the complete training loop.
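The optimizer at the heart of this module can be previewed with one Adam update on a single scalar parameter. The hyperparameter names follow the original Adam paper and the values below are common defaults, not necessarily the ones microgpt.py uses:

```python
# One Adam update for a single scalar parameter (illustrative sketch).
def adam_step(p, grad, m, v, t, lr=1e-1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (v_hat ** 0.5 + eps) # normalized parameter update
    return p, m, v

# Minimize the loss (p - 5)^2: the gradient is 2*(p - 5).
p, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (p - 5.0)
    p, m, v = adam_step(p, grad, m, v, t)
print(p)  # approaches 5.0, the minimum of the loss
```

In the real training loop the "gradient" comes from backpropagation through the whole model, and one such update is applied to every parameter at every step.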

Module 5 — Inference & Generation

Using the trained model to generate new text. Temperature, sampling, and the complete picture.
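Temperature-controlled sampling is simple enough to sketch directly. Dividing the logits by a temperature before softmax sharpens (T < 1) or flattens (T > 1) the distribution; the function and variable names here are illustrative:

```python
import math
import random

# Temperature-controlled sampling from next-token logits (illustrative sketch).
def sample(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]   # T < 1 sharpens, T > 1 flattens
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # draw one token id in proportion to its probability
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

random.seed(0)
logits = [2.0, 1.0, 0.1]
print(sample(logits, temperature=0.5))  # low T: usually picks token 0
```

Generation is then a loop: feed the tokens so far through the model, sample the next id from the resulting probabilities, append it, and repeat.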


Prerequisites

What you need to know

  • Math: 10th-grade level — basic algebra and exponents. There's a Math Refresher if you need it.
  • Programming: Basic Python — variables, loops, functions, lists.
  • Machine Learning: Zero prior knowledge required.

Based On

This course is built around microgpt.py by Andrej Karpathy — a complete GPT language model in just 200 lines of pure Python using only the standard library.

It implements:

  • [x] A custom autograd engine (automatic differentiation)
  • [x] A Transformer architecture (attention, MLP, residual connections)
  • [x] A training loop with the Adam optimizer
  • [x] Text generation with temperature-controlled sampling