
nanoGPT

karpathy/nanoGPT

nanoGPT is a compact Python implementation for training and fine-tuning medium-sized GPT models.

Forks 10,368
Author karpathy
Language Python
License MIT
Synced 2026-06-27

What it is

nanoGPT is Andrej Karpathy’s repository for training and fine-tuning medium-sized GPT models. Its value is that the full training path, from raw text to sampled output, fits in a small amount of readable code.

The repository now describes itself as old and suggests that newer work may belong in nanochat, but nanoGPT remains an important educational and historical project.

How the project works

The two central files are train.py (the training loop) and model.py (the GPT model definition). Around them are scripts for data preparation, training, fine-tuning, sampling, and reproducing GPT-2-style experiments.

The strength is visibility: batches, loss, optimization, and token generation are not hidden behind a large framework.
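
To make the idea of a visible batch concrete, here is a minimal sketch of how next-token training pairs can be sampled from a list of token ids. It uses plain Python lists instead of tensors, and the function name, the `rng` argument, and the stand-in `tokens` data are illustrative assumptions, not the repository's exact code.

```python
import random

def get_batch(data, block_size, batch_size, rng):
    """Sample batch_size (input, target) pairs from a list of token ids.

    Each target is the input shifted one position to the right, so the
    model is trained to predict token t+1 from the tokens up to t.
    """
    xs, ys = [], []
    for _ in range(batch_size):
        # Pick a start index that leaves room for block_size + 1 tokens.
        i = rng.randrange(len(data) - block_size)
        xs.append(data[i : i + block_size])
        ys.append(data[i + 1 : i + 1 + block_size])
    return xs, ys

tokens = list(range(100))  # hypothetical stand-in for an encoded corpus
x, y = get_batch(tokens, block_size=8, batch_size=4, rng=random.Random(0))
```

In a real run the lists would be tensors on the training device, but the shift-by-one relationship between inputs and targets is the same.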

Small experiment

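The commands below prepare a character-level Shakespeare dataset, train a small model on it, and sample from the result. This is a learning-sized run, not a production training recipe.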

Language: Bash
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py
python sample.py --out_dir=out-shakespeare-char
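
As a rough illustration of what character-level data preparation involves, the sketch below builds a vocabulary from the text, maps characters to integer ids, and holds out part of the data for validation. The helper names, the sample string, and the 90/10 split are assumptions for illustration; check the prepare script for what it actually does.

```python
def build_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    """Turn a string into a list of integer token ids."""
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)

text = "to be or not to be"  # hypothetical tiny corpus
stoi, itos = build_vocab(text)
ids = encode(text, stoi)

# Assumed split: first 90% for training, the rest for validation.
split = int(0.9 * len(ids))
train_ids, val_ids = ids[:split], ids[split:]
```

Because encoding and decoding are exact inverses at the character level, any text the model generates can be mapped straight back to readable output.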

What is inside

The repository contains the GPT model, training loop, configs, data preparation, sampling scripts, and efficiency notes. It reads like an engineering notebook.

nanoGPT is especially useful for people who know machine learning basics and want to see transformer training without too many abstraction layers.

Practical context

The best way to learn from nanoGPT is in small steps: run an example, read the training loop, then change the model size and the data. Changing one thing at a time keeps the effect of each setting understandable, instead of leaving you with a set of arbitrary parameters.
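
When changing model size, it can help to estimate how many parameters a configuration implies before training it. The sketch below uses the common approximation of about 12 × n_embd² weights per transformer block plus the token and position embeddings. It ignores biases and layer-norm weights and assumes the output layer shares weights with the token embedding, so treat the result as a ballpark figure. The function name is hypothetical.

```python
def estimate_params(n_layer, n_embd, vocab_size, block_size):
    """Approximate parameter count of a GPT-style model.

    Per block: 4 * n_embd^2 for attention (query, key, value, output
    projection) plus 8 * n_embd^2 for an MLP with a 4x hidden width.
    Biases and layer-norm weights are ignored.
    """
    per_block = 12 * n_embd * n_embd
    token_embedding = vocab_size * n_embd
    position_embedding = block_size * n_embd
    return n_layer * per_block + token_embedding + position_embedding

# GPT-2-small-like settings versus a much smaller learning-sized model.
gpt2_like = estimate_params(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024)
tiny = estimate_params(n_layer=6, n_embd=384, vocab_size=65, block_size=256)
```

Comparing the two estimates shows why a small character-level model is practical on modest hardware: block weights grow with the square of the embedding width, so shrinking it cuts the count sharply.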

Strengths and limits

The main strength is clarity. The code is small enough to trace data from text to gradients and back to generation.
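
The generation end of that path can be sketched as one sampling step: scale the model's output scores by a temperature, optionally keep only the top k candidates, and draw a token id from the resulting distribution. This is a minimal pure-Python sketch under those assumptions, not the repository's implementation; `sample_next` and its parameters are illustrative names.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Draw one token id from a list of unnormalized scores (logits)."""
    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Mask out every score below the k-th largest one.
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax weights, shifted by the max for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

rng = random.Random(0)
greedy = sample_next([0.1, 2.0, -1.0], top_k=1, rng=rng)
draws = [sample_next([0.1, 2.0, -1.0], top_k=2, rng=rng) for _ in range(50)]
```

With `top_k=1` the step reduces to picking the highest-scoring token, while larger values of `top_k` and `temperature` trade determinism for variety. Repeating the step and appending each sampled id to the context yields a generated sequence.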

The limit is age and scale. Modern large models, distributed training, and production systems need newer tooling, infrastructure, and safety practices.