What it is
autoresearch is Andrej Karpathy’s experimental repository about automating research with AI agents. It centers on single-GPU nanochat training and an agent that runs experiments, reads metrics, and proposes the next step.
The interesting part is not “press a button and get science” but the workflow it demonstrates: some of the routine work of iterating on hypotheses, launching runs, and inspecting results can be delegated to an agent when the environment is constrained and reproducible.
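The run, read, propose cycle described above can be sketched as a small loop. This is a minimal illustration, not the repository's code: `run_experiment` and `propose` are hypothetical stand-ins, a toy quadratic replaces real training, and a random perturbation replaces the agent.

```python
import random


def run_experiment(lr: float) -> float:
    """Hypothetical stand-in for a training run; returns a validation loss.

    A toy quadratic with its minimum at lr = 0.3 replaces real training.
    """
    return (lr - 0.3) ** 2


def propose(best_lr: float, rng: random.Random) -> float:
    """Hypothetical stand-in for the agent: perturb the best setting so far."""
    return max(0.0, best_lr + rng.uniform(-0.1, 0.1))


def research_loop(start_lr: float, steps: int, seed: int = 0):
    """Run experiments, read the metric, and keep only improving changes."""
    rng = random.Random(seed)  # fixed seed keeps the loop reproducible
    best_lr = start_lr
    best_loss = run_experiment(best_lr)
    history = [(best_lr, best_loss)]
    for _ in range(steps):
        candidate = propose(best_lr, rng)
        loss = run_experiment(candidate)
        history.append((candidate, loss))
        if loss < best_loss:  # lower loss is better
            best_lr, best_loss = candidate, loss
    return best_lr, best_loss, history
```

The loop only works because the metric is a single comparable number and the run is deterministic for a given seed, which is the "constrained and reproducible" condition in practice.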
What is inside
The README covers quick start with `uv`, data download, tokenizer training, running a single experiment manually, running the agent, project structure, and design choices. The repository therefore documents not only the code but also the author’s reasoning behind the experimental setup.
A practical way to use it is to reproduce the small local research environment, confirm that a single experiment runs when launched by hand, and only then let an agent make a series of changes. This suits tasks where results can be measured and compared quickly.
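"Measured and compared quickly" usually comes down to pulling one number out of a run's output and checking it against a baseline. The sketch below assumes a log line of the form `val_loss: 1.234`; that format, the metric name, and both helper functions are assumptions for illustration, not something taken from the repository.

```python
import re

# Assumed log format: "... val_loss: 1.234 ..." (plain decimals only,
# no scientific notation).
METRIC_PATTERN = re.compile(r"val_loss:\s*([0-9]*\.?[0-9]+)")


def last_metric(log_text: str):
    """Return the last val_loss value found in the log, or None if absent."""
    matches = METRIC_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


def is_improvement(baseline: float, candidate: float, min_delta: float = 0.0) -> bool:
    """Lower is better; the candidate must beat the baseline by min_delta."""
    return candidate < baseline - min_delta
```

Running the manual experiment once and feeding its output through `last_metric` gives the baseline that every later agent-made change is compared against.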
Running the research loop
These commands show the idea: prepare the environment and data, then run a single experiment or the agent. Treat the script names as illustrative and check the README for the exact commands.
```shell
uv sync
uv run python data.py
uv run python train.py
uv run python agent.py
```
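When the steps are scripted rather than typed, it helps to stop at the first failure so the agent never starts on a broken setup. A minimal sketch with the standard library, reusing the commands exactly as listed above; `PIPELINE` and `run_steps` are hypothetical names, not part of the repository.

```python
import subprocess

# The four commands from the text, written as argument lists.
PIPELINE = [
    ["uv", "sync"],
    ["uv", "run", "python", "data.py"],
    ["uv", "run", "python", "train.py"],
    ["uv", "run", "python", "agent.py"],
]


def run_steps(steps):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the (command, returncode) pairs for the commands actually run.
    """
    results = []
    for cmd in steps:
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
        if completed.returncode != 0:
            break  # later steps depend on earlier ones succeeding
    return results


# Usage, from the repository root:
# run_steps(PIPELINE)
```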
Strengths
The main strength is its honestly limited scope. autoresearch does not pretend to be a universal platform; it shows one concrete lab built around nanochat, which makes it useful for thinking about reproducible automation of AI research.
Limits
The limitations are the narrow scope and the risk of overinterpreting results. Agents can run experiments, but scientific quality still depends on task framing, choice of metrics, control of randomness, and human review.
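Randomness control can be made concrete: repeat each configuration over several seeds and accept a change only when the gain is larger than the seed-to-seed spread. The sketch below is a rough heuristic under stated assumptions, not a formal statistical test; `clearly_better` and the threshold `k` are hypothetical.

```python
from statistics import mean, stdev


def clearly_better(baseline_losses, candidate_losses, k: float = 2.0) -> bool:
    """Lower loss is better.

    Accept the candidate only if its mean loss beats the baseline mean by
    more than k times the larger of the two per-group standard deviations.
    Each list needs at least two values (one per seed).
    """
    noise = max(stdev(baseline_losses), stdev(candidate_losses))
    return mean(baseline_losses) - mean(candidate_losses) > k * noise
```

With made-up losses from three seeds each, `clearly_better([1.52, 1.50, 1.51], [1.40, 1.41, 1.39])` accepts the change, while `clearly_better([1.52, 1.50, 1.51], [1.50, 1.49, 1.51])` rejects it because the gain is within the noise. A human reviewer still has to decide whether the metric itself measures what matters.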