What it is
nanoGPT is Andrej Karpathy’s compact repository for training and fine-tuning medium-sized GPT models. It is valuable because it shows the full model-training path in relatively small, readable code.
The repository's README now describes the project as old and points readers to nanochat for newer work, but nanoGPT remains an important educational and historical project.
How the project works
The two central files are train.py (the training loop) and model.py (the GPT model definition). Around them are scripts and configs for data preparation, training, fine-tuning, sampling, and reproducing GPT-2-style experiments.
The strength is visibility: batches, loss, optimization, and token generation are not hidden behind a large framework.
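To make "batches" and "loss" concrete, here is a minimal pure-Python sketch of the idea, not the repository's code. It assumes the usual next-token setup: each input window is paired with the same window shifted one position to the right, and the loss is cross-entropy over the vocabulary. The helper names, the toy token list, and the vocabulary size of 65 are illustrative assumptions.

```python
import math
import random

def get_batch(tokens, block_size, batch_size, rng):
    """Sample random windows; each target window is the input shifted by one."""
    xs, ys = [], []
    for _ in range(batch_size):
        # Leave room for block_size inputs plus one extra target token.
        i = rng.randrange(len(tokens) - block_size)
        xs.append(tokens[i : i + block_size])
        ys.append(tokens[i + 1 : i + 1 + block_size])
    return xs, ys

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

tokens = list(range(100))  # stand-in for an encoded corpus (assumption)
rng = random.Random(0)
x, y = get_batch(tokens, block_size=8, batch_size=4, rng=rng)
for xi, yi in zip(x, y):
    print(xi, "->", yi)

# An untrained model that predicts uniformly over 65 symbols scores ln(65).
uniform_loss = cross_entropy([0.0] * 65, target=3)
print(round(uniform_loss, 3))
```

The uniform-prediction value is a handy sanity check: a freshly initialized model should start near the log of its vocabulary size, and training should push the loss below that.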
Small experiment
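This example is a learning-sized run, a small character-level model trained on Shakespeare text, not a production training recipe. The three commands prepare the data, train the model, and sample from the resulting checkpoint.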
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py
python sample.py --out_dir=out-shakespeare-char
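The data-preparation step for a character-level model can be sketched roughly as follows. This is an illustrative approximation rather than the script itself: the sample text and the 90/10 split are assumptions, and the sketch leaves out downloading the dataset and writing the encoded ids to disk.

```python
# A short stand-in corpus (assumption); a real run uses a full text file.
text = "To be, or not to be: that is the question."

# Vocabulary: every distinct character, sorted so ids are stable across runs.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    """Map a string to a list of integer ids, one per character."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map a list of integer ids back to a string."""
    return "".join(itos[i] for i in ids)

ids = encode(text)

# Hold out the tail of the corpus for validation.
n = int(0.9 * len(ids))
train_ids, val_ids = ids[:n], ids[n:]

print("vocab size:", len(chars))
print("train tokens:", len(train_ids), "val tokens:", len(val_ids))
print(decode(ids[:12]))
```

Character-level encoding keeps the vocabulary tiny, which is why this example trains quickly compared with a run that uses a subword tokenizer.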
What is inside
The repository contains the GPT model, training loop, configs, data preparation, sampling scripts, and efficiency notes. It reads like an engineering notebook.
nanoGPT is especially useful for people who know machine learning basics and want to see transformer training without too many abstraction layers.
Practical context
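The best way to learn from nanoGPT is in small steps: run an example, read the training loop, then change the model size and the data. Changing one thing at a time keeps each setting tied to an effect you can observe, so the configuration stays understandable instead of turning into a set of unexplained parameters.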
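Before changing the model size, it helps to estimate what a given setting costs. The sketch below is a back-of-the-envelope parameter count for a GPT-2-style decoder. It assumes a 4x MLP expansion and tied input/output embeddings, and it ignores biases and LayerNorm weights, so it slightly undercounts. The function name and the "tiny" settings are hypothetical; the GPT-2 small dimensions are the commonly published ones.

```python
def approx_gpt_params(n_layer, n_embd, vocab_size, block_size):
    """Rough parameter count, ignoring biases and LayerNorm weights."""
    attn = 4 * n_embd * n_embd  # query, key, value, and output projections
    mlp = 8 * n_embd * n_embd   # two linear layers with a 4x hidden width
    blocks = n_layer * (attn + mlp)
    token_emb = vocab_size * n_embd  # counted once, assuming a tied output head
    pos_emb = block_size * n_embd
    return blocks + token_emb + pos_emb

# GPT-2 small dimensions: the estimate lands close to the quoted 124M.
gpt2_small = approx_gpt_params(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024)

# A hypothetical laptop-sized character model.
tiny = approx_gpt_params(n_layer=4, n_embd=128, vocab_size=65, block_size=64)

print(f"GPT-2 small (approx): {gpt2_small:,}")
print(f"tiny model (approx):  {tiny:,}")
```

Because the transformer blocks scale with the square of the embedding width, doubling n_embd roughly quadruples their cost, while adding layers scales it linearly.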
Strengths and limits
The main strength is clarity. The code is small enough to trace data from text to gradients and back to generation.
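The last leg of that trace, from model outputs back to text, can be sketched as an autoregressive sampling loop with temperature and top-k filtering. This is a pure-Python illustration under stated assumptions, not the repository's implementation: toy_logits is a hypothetical stand-in for a trained model, and all function names are invented for this example.

```python
import math
import random

def sample_next(logits, temperature, top_k, rng):
    """Sample one token id from temperature-scaled, optionally top-k-filtered logits."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Mask everything below the k-th largest logit.
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [l if l >= cutoff else float("-inf") for l in scaled]
    m = max(scaled)
    weights = [math.exp(l - m) for l in scaled]  # unnormalized softmax
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def toy_logits(context, vocab_size):
    """Hypothetical model: strongly prefers (last token + 1) mod vocab_size."""
    logits = [0.0] * vocab_size
    logits[(context[-1] + 1) % vocab_size] = 5.0
    return logits

def generate(context, max_new_tokens, vocab_size, temperature=1.0, top_k=None, seed=0):
    """Append one sampled token at a time, feeding the output back in as context."""
    rng = random.Random(seed)
    out = list(context)
    for _ in range(max_new_tokens):
        logits = toy_logits(out, vocab_size)
        out.append(sample_next(logits, temperature, top_k, rng))
    return out

greedy = generate([0], max_new_tokens=10, vocab_size=5, top_k=1)
sampled = generate([0], max_new_tokens=10, vocab_size=5, temperature=2.0)
print(greedy)
print(sampled)
```

With top_k=1 the loop is effectively greedy and follows the toy model's preferred cycle; raising the temperature flattens the distribution and lets other tokens through.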
The limit is age and scale. Modern large models, distributed training, and production systems need newer tooling, infrastructure, and safety practices.