Generating Text¶
From Training to Generation¶
The model is trained: its 4,064 parameters have been tuned over 500 steps. Now we use it to generate new names, which are sampled from the learned distribution and need not appear in the training data.
The Generation Loop (Lines 186–200)¶
microgpt.py — Lines 186-200
temperature = 0.5
for _ in range(20):  # generate 20 names
    tokens = []
    token_id = BOS  # start with BOS
    keys = [[] for _ in range(n_layer)]
    values = [[] for _ in range(n_layer)]
    for pos_id in range(block_size):  # max 8 characters
        logits = gpt(token_id, pos_id, keys, values)
        probs = softmax([l / temperature for l in logits])
        token_id = random.choices(range(vocab_size), weights=[p.data for p in probs])[0]
        if token_id == BOS:
            break  # BOS signals end-of-name
        tokens.append(token_id)
    print(''.join(uchars[t] for t in tokens))
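The effect of `temperature` can be seen in isolation with a small sketch. It assumes plain Python floats rather than the model's own value objects, and the names `apply_temperature` and `toy_logits` are made up for illustration; the `softmax` here is a stand-in, not the one from microgpt.py.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    # Dividing by a temperature below 1 widens the gaps between logits,
    # so the softmax output concentrates on the largest one.
    return softmax([l / temperature for l in logits])

toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens

for t in (2.0, 1.0, 0.5):
    probs = apply_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures push probability toward the top-scoring token, so samples become more conservative; higher temperatures flatten the distribution and make unusual characters more likely.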
Step by Step¶
flowchart TD
START["Start with BOS"] --> GPT["gpt(token_id, pos_id)"]
GPT --> TEMP["Divide logits by temperature"]
TEMP --> SM["Softmax → probabilities"]
SM --> SAMPLE["Random sample from probabilities"]
SAMPLE --> CHECK{"token_id == BOS?"}
CHECK -->|"Yes"| DONE["Print the name"]
CHECK -->|"No"| APPEND["Append character"]
APPEND --> GPT
Key Details¶
Autoregressive Generation¶
Each output token becomes the input for the next step. The model generates one character at a time, building the name left-to-right.
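The feedback from output to input can be traced with a toy stand-in for the model. This is only a sketch: `TOY_MODEL` is a hypothetical lookup table, not the trained network, and `"."` plays the role of the start/end token.

```python
import random

# Hypothetical next-token table: maps the current token to candidate
# next tokens and their sampling weights. "." stands in for BOS.
TOY_MODEL = {
    ".": (["a", "b"], [3, 1]),
    "a": (["n", "."], [2, 1]),
    "b": (["a", "."], [1, 1]),
    "n": (["a", "."], [1, 2]),
}

def generate_trace(rng, max_len=8):
    token = "."  # start symbol
    trace = []
    for _ in range(max_len):
        candidates, weights = TOY_MODEL[token]
        next_token = rng.choices(candidates, weights=weights)[0]
        trace.append((token, next_token))
        if next_token == ".":
            break
        token = next_token  # the output becomes the next input
    return trace

rng = random.Random(0)
for current, predicted in generate_trace(rng):
    print(f"input {current!r} -> output {predicted!r}")
```

In every printed row after the first, the input is the previous row's output, which is the defining property of autoregressive generation.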
The KV cache in action
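At each step, the current token's key and value are appended to the cache. At step 4, for example, the keys and values for steps 0–3 are already stored, so the model attends over positions 0–4 and computes only the new entry for step 4. Nothing earlier is recomputed.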
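A minimal sketch of how such a cache grows, assuming a single attention head over plain float vectors. The `step` function is hypothetical and reuses its input as query, key and value, where a real model would apply learned projections.

```python
import math

def attend(query, keys, values):
    # Scaled dot-product attention of one query over every cached position.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

keys, values = [], []  # the cache for one layer, empty before generation

def step(x):
    # Assumption for this sketch: no learned projections, so q = k = v = x.
    q, k, v = x, x, x
    keys.append(k)    # the cache grows by exactly one entry per step
    values.append(v)
    return attend(q, keys, values)

for pos, x in enumerate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
    out = step(x)
    print(pos, len(keys), [round(o, 3) for o in out])
```

Each call computes one new key and value and reads the rest from the lists, so the work per step grows with the number of cached positions rather than requiring a full pass over the sequence.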
Stopping Condition¶
The BOS token serves double duty: it marks both the start and the end of a sequence. When the model predicts BOS, it is saying "this name is done." If BOS is never predicted, the loop still ends after `block_size` positions.
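One way to see why the model can learn to end a name with BOS is to look at how a training sequence might be built. The sketch below assumes names are wrapped with BOS on both sides and that BOS takes the id right after the letters; the three-letter vocabulary and the helpers `encode` and `training_pairs` are illustrative, not taken from microgpt.py.

```python
# Assumed vocabulary layout: the letters first, BOS as the last id.
uchars = sorted(set("emma"))  # ['a', 'e', 'm']
BOS = len(uchars)             # hypothetical id for the BOS token

def encode(name):
    # BOS is placed on both sides of the name.
    return [BOS] + [uchars.index(ch) for ch in name] + [BOS]

def training_pairs(tokens):
    # Each token is paired with the token that follows it.
    return list(zip(tokens[:-1], tokens[1:]))

tokens = encode("emma")
pairs = training_pairs(tokens)
print(tokens)  # [3, 1, 2, 2, 0, 3]
print(pairs)   # [(3, 1), (1, 2), (2, 2), (2, 0), (0, 3)]
```

The first pair has BOS as its input and the last pair has BOS as its target, so under this assumption the same token teaches the model both how names begin and when they end.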
No Gradient Tracking¶
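During inference we never call backward() and never update the parameters; we only run forward passes with the trained weights as-is. Each generated character therefore costs one forward pass, with no backward pass or optimizer step, which makes generation much cheaper per step than training.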
Terminology
| Term | Meaning |
|---|---|
| Inference | Using the trained model to generate predictions (no learning) |
| Autoregressive | Each output becomes the next input |
| Sampling | Randomly choosing a token based on probabilities |
| Stopping condition | When to end generation (BOS token) |
| KV cache | Stored keys/values for efficient multi-step generation |