
nanochat

karpathy/nanochat

nanochat is Andrej Karpathy’s educational project that walks through the main stages of training a small language model and chatting with the result.

Forks 7,632
Author karpathy
Language Python
License Unknown
Synced 2026-06-27

What it is

nanochat is Andrej Karpathy’s experimental learning project for training small language models. It shows the full path from tokenization and pretraining to finetuning, evaluation, inference, and a simple chat interface.

The project appeared amid interest in AI system implementations that can be read and understood end to end. Instead of a huge platform, nanochat focuses on minimal, hackable code that can run on a single GPU node and be read and modified directly.

Its goal is not to replace industrial model-training platforms, but to give researchers and engineers a clear learning loop. In one place, it shows the stages needed to move from data to a conversational model.

What is inside the repository

The repository contains scripts and settings for the main LLM stages: tokenization, pretraining, finetuning, evaluation, inference, and a web interface for talking to the result. Model depth, meaning the number of transformer layers, acts as the main complexity dial: a deeper model is larger and costs more to train.
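To make the idea of a single complexity dial concrete, here is a minimal sketch of how other model sizes could be derived from depth alone. The names `ModelShape` and `shape_from_depth`, and the default ratios (`width_per_layer=64`, `head_dim=64`), are assumptions chosen for illustration, not the repository's actual code or values. The parameter estimate uses the common rule of thumb of roughly 12 × width² weights per transformer block and ignores embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical container for sizes derived from a single depth value."""
    depth: int      # number of transformer layers
    width: int      # hidden dimension
    num_heads: int  # attention heads

    @property
    def approx_block_params(self) -> int:
        # Rule of thumb: about 4*width^2 for attention plus 8*width^2 for the
        # MLP per block, so 12*width^2 per layer. Embeddings are not counted.
        return 12 * self.depth * self.width ** 2


def shape_from_depth(depth: int, width_per_layer: int = 64, head_dim: int = 64) -> ModelShape:
    """Derive width and head count from depth (illustrative ratios only)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    width = depth * width_per_layer
    if width % head_dim != 0:
        raise ValueError("width must be divisible by head_dim")
    return ModelShape(depth=depth, width=width, num_heads=width // head_dim)


if __name__ == "__main__":
    for depth in (4, 12, 20):
        shape = shape_from_depth(depth)
        print(depth, shape.width, shape.num_heads, f"{shape.approx_block_params:,}")
```

Under these assumed ratios, width grows linearly with depth, so the parameter estimate grows roughly with the cube of depth. That is why one dial is enough to move between a quick experiment and a much more expensive run.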

The documentation gives cost and time references for a GPT-2-level model on modern GPUs. Those numbers are useful as an educational frame, not as a promise of the same result on every cloud and dataset.
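The arithmetic behind such cost references is simple enough to sketch. The function below multiplies GPU count, wall-clock hours, and a per-GPU hourly price; the function name and the example figures are placeholders for illustration, not numbers taken from the nanochat documentation or from any cloud provider.

```python
def estimate_training_cost(num_gpus: int, hours: float, price_per_gpu_hour: float) -> float:
    """Return the estimated cost of a run as GPUs x hours x hourly price per GPU."""
    if num_gpus < 1 or hours < 0 or price_per_gpu_hour < 0:
        raise ValueError("inputs must be non-negative, with at least one GPU")
    return num_gpus * hours * price_per_gpu_hour


if __name__ == "__main__":
    # Placeholder values: 8 GPUs for 4 hours at 3.00 per GPU-hour.
    print(estimate_training_cost(num_gpus=8, hours=4.0, price_per_gpu_hour=3.0))
```

Because the hourly price differs between providers and the run time depends on the dataset and settings, the same model can land at noticeably different totals.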

How people usually use it

nanochat is used for learning, architecture experiments, hypothesis testing, and explaining the full cycle of creating a small language model. Its value is that no stage is hidden behind an external service.

A practical scenario is to prepare the environment, choose model depth, run training, inspect metrics, and open the chat. That path shows not only what the final answers look like, but also what each stage costs.
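One way to see what each stage costs is to time the stages as they run. The sketch below is a generic pattern, not nanochat code: `timed` and `run_stages` are hypothetical helpers, and the stage functions are empty stand-ins for real work such as training or evaluation.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def timed(stage: str, log: dict[str, float]) -> Iterator[None]:
    """Record the wall-clock seconds spent inside the block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[stage] = time.perf_counter() - start


def run_stages(stages: list[tuple[str, Callable[[], None]]]) -> dict[str, float]:
    """Run the stages in order and return the seconds spent in each one."""
    log: dict[str, float] = {}
    for name, stage_fn in stages:
        with timed(name, log):
            stage_fn()
    return log


if __name__ == "__main__":
    # Stand-in stages; a real run would call training and evaluation code here.
    demo = [
        ("prepare environment", lambda: None),
        ("train", lambda: time.sleep(0.01)),
        ("inspect metrics", lambda: None),
    ]
    for name, seconds in run_stages(demo).items():
        print(f"{name}: {seconds:.3f}s")
```

Multiplying the measured time by an hourly hardware price gives a rough cost per stage, which makes it easier to see where a run spends its budget.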

The idea of a full learning loop

This diagram shows why nanochat is interesting: it is not just one inference file, but a path through several stages of model training and evaluation.

Language: Plain text
data
  -> tokenization
  -> pretraining
  -> finetuning
  -> evaluation
  -> inference
  -> chat UI

What it feels like in practice

The project’s strength is compactness. When the code can be read and changed without heavy scaffolding, it is easier to understand how data, tokens, training, evaluation, and model serving connect.

Another advantage is educational honesty. nanochat does not pretend to be a universal replacement for large AI platforms; it shows the mechanics and limits of a small experiment.

Limits and careful spots

The main limitation is the demand for hardware and data. Even an educational language model needs compute, careful configuration, and an understanding that answer quality depends on the whole process, not on a single run.

The project also should not be treated as a ready-made base for a user-facing service. Security, content filtering, data storage, error monitoring, and a separate evaluation of model behavior would still be needed.

Who it fits

nanochat best fits developers and researchers who want to understand LLMs beyond theory. It is a laboratory bench, not a boxed business chat product.

In the catalog, nanochat matters as a rare complete learning path: it shows how data turns into a small conversational model and makes that process transparent enough for experiments.