One repo started a movement
On March 6, 2026, Andrej Karpathy pushed a repo called autoresearch. The idea was simple: an AI agent that runs experiments overnight. Modify code, run it, score the result, keep what works, discard what doesn't. Repeat until morning.
Six weeks later, the repo has 72,000+ stars. And it's spawned an entire family of variants — each adapting the core idea for a different platform, a different chip, or a different workflow.
We tracked six of them. Here's the family tree.
The Family Tree
karpathy/autoresearch (72K stars, Mar 6)
├── davebcn87/pi-autoresearch (3.6K, Mar 11) — generic version for any metric
├── trevin-creator/autoresearch-mlx (1.4K, Mar 8) — Apple Silicon port
├── uditgoenka/autoresearch (3.7K, Mar 13) — multi-mode Claude Code plugin
├── drivelineresearch/autoresearch-claude-code (258, Mar 12) — Claude Code skill port
└── leo-lilinxiao/codex-autoresearch (1.4K, Mar 18) — Codex-native with smart recovery

Total family: 82,000+ stars across 6 repos. All created within 12 days of each other.
The Original: karpathy/autoresearch
72,499 stars · Python · Last push: Mar 26
The one that started it all. Karpathy's original implementation is specific: it runs AI research on single-GPU nanochat training automatically. The agent modifies training code, executes it, measures loss, and keeps improvements.
The genius is in the simplicity. The loop is just:
```python
while True:
    modify()       # Agent changes something
    run()          # Execute the experiment
    score()        # Measure the result
    if better:
        keep()     # Commit the change
    else:
        revert()   # Throw it away
```

That's it. No multi-agent orchestration, no knowledge graphs, no 23-stage pipelines. Just a tight loop that runs all night.
Use this if: You want the original experience on a Linux GPU box doing ML training experiments.
pi-autoresearch: The Generalist
3,600 stars · TypeScript · Last push: Apr 13 · View on Claw4Science
The first major fork broke the loop free from ML training. pi-autoresearch works on any measurable optimization target — not just loss curves. Code performance, test coverage, latency, memory usage — if you can measure it, pi-autoresearch can optimize it.
Built as an extension for the Pi agent (hence the name), it's the version that proved the autoresearch pattern isn't specific to ML. It's a general-purpose autonomous improvement loop.
Key difference from original: Works on any codebase with any metric, not just ML training.
Use this if: You want the autoresearch loop for non-ML projects — web performance, compiler optimization, algorithm tuning.
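The generalization is mostly an interface change: instead of hard-coding "training run + loss," the loop takes the metric and the mutation as inputs. A hedged sketch, in Python rather than pi-autoresearch's TypeScript, with made-up names (`optimize`, `measure`, `mutate`) that are not from the actual project:

```python
import random
from typing import Callable

def optimize(measure: Callable[[dict], float],
             mutate: Callable[[dict], dict],
             config: dict,
             iterations: int = 100,
             higher_is_better: bool = True) -> tuple[dict, float]:
    """Generic modify-run-score-keep loop over an arbitrary metric."""
    best_score = measure(config)
    for _ in range(iterations):
        candidate = mutate(config)
        score = measure(candidate)
        improved = (score > best_score) if higher_is_better else (score < best_score)
        if improved:
            config, best_score = candidate, score  # keep(); otherwise revert
    return config, best_score

# Hypothetical usage: tune a batch size against a simulated benchmark.
def measure(cfg: dict) -> float:
    return -abs(cfg["batch"] - 64)  # pretend 64 is the sweet spot

def mutate(cfg: dict) -> dict:
    return {"batch": max(1, cfg["batch"] + random.choice([-8, 8]))}

best, score = optimize(measure, mutate, {"batch": 8})
```

Swap `measure` for test coverage, p95 latency, or binary size and the same loop applies, which is the point the fork proved.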
autoresearch (uditgoenka): The Swiss Army Knife
3,716 stars · Shell · Last push: Apr 6 · View on Claw4Science
This one took the autoresearch concept and expanded it into a multi-mode Claude Code plugin. Beyond the core optimization loop, it adds:
- Debug mode — autonomous bug hunting
- Fix mode — find and fix issues systematically
- Security audit mode — scan for vulnerabilities
- Ship mode — prepare code for production
- Predict mode — forecast the impact of changes
It's the version that says "if an AI can optimize experiments, why not optimize everything?"
Key difference from original: Not just an experiment loop — it's a full autonomous development toolkit.
Use this if: You use Claude Code and want autoresearch plus debugging, security, and shipping capabilities in one skill.
autoresearch-mlx: The Mac Native
1,422 stars · Python · Last push: Mar 10 · View on Claw4Science
Karpathy's original needs CUDA. If you're on a Mac, you're out of luck — unless you use this fork. autoresearch-mlx replaces PyTorch with Apple's MLX framework, letting the overnight loop run natively on M-series chips.
No PyTorch, no CUDA, no Docker. Just pip install and go.
Created just 2 days after Karpathy's original — someone really wanted this on their MacBook Pro.
Key difference from original: MLX instead of PyTorch. Runs on Apple Silicon without any translation layers.
Use this if: You have a Mac and want local ML experiment loops without setting up a cloud GPU.
autoresearch-claude-code: The Skill Port
258 stars · Last push: Mar 24
The most direct port — takes pi-autoresearch and packages it as a drop-in Claude Code skill. No configuration, no setup. Install the skill, and Claude Code gains the ability to run autonomous experiment loops.
Key difference from original: It's a skill, not a standalone tool. Lives inside your agent.
Use this if: You want the simplest possible way to add autoresearch to Claude Code.
codex-autoresearch: The Codex Specialist
1,365 stars · Last push: today · View on Claw4Science
Built specifically for OpenAI's Codex agent. The key addition: smart recovery. When the loop gets stuck — same score for too many iterations, or an error that keeps recurring — it automatically tries a different approach instead of grinding on the same dead end.
The most actively maintained variant (updated today), possibly because Codex's market share is growing fast.
Key difference from original: Codex-native, with stuck-detection and automatic strategy switching.
Use this if: You use Codex and want autoresearch that doesn't get trapped in local optima.
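The article describes stuck-detection as "same score for too many iterations." One plausible way to implement that check, sketched here with illustrative names and thresholds (`StuckDetector`, `patience`) that are not taken from codex-autoresearch itself:

```python
class StuckDetector:
    """Flags when the loop should switch strategy: the score has not
    improved for `patience` consecutive iterations."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best: float | None = None
        self.stale = 0

    def update(self, score: float) -> bool:
        """Record a score; return True if the loop looks stuck."""
        if self.best is None or score < self.best:  # loss-style: lower is better
            self.best = score
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```

When the detector fires, a variant like this one would presumably switch mutation strategy or restart from the last good checkpoint instead of retrying the same dead end.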
The Comparison Table
| Variant | Stars | Platform | Scope | Last Active |
|---|---|---|---|---|
| karpathy/autoresearch | 72,499 | Python/CUDA | ML training only | Mar 26 |
| pi-autoresearch | 3,600 | TypeScript/Pi | Any metric | Apr 13 |
| autoresearch (uditgoenka) | 3,716 | Claude Code | Multi-mode toolkit | Apr 6 |
| autoresearch-mlx | 1,422 | Python/MLX | ML on Apple Silicon | Mar 10 |
| autoresearch-claude-code | 258 | Claude Code | Skill port | Mar 24 |
| codex-autoresearch | 1,365 | Codex | ML with recovery | Today |
What the Family Tree Tells Us
1. Good ideas get ported, not forked.
None of these are GitHub forks. They're independent reimplementations. Each author saw the original, understood the pattern, and rebuilt it for their own context. That's a stronger signal than a fork — it means the idea is valuable, not just the code.
2. The loop is the innovation, not the implementation.
Karpathy's contribution wasn't the code (it's not that much code). It was the insight that a simple modify-run-score-keep loop, running overnight, can produce meaningful research results. Every variant preserves this core loop while changing everything around it.
3. Platform fragmentation is real.
Six variants in six weeks, each for a different platform (CUDA, MLX, Claude Code, Codex, Pi, generic). The AI coding tool ecosystem is fragmented enough that "works everywhere" is a feature, not an assumption.
4. The overnight loop is becoming infrastructure.
When Karpathy posted it, "AI runs experiments while you sleep" was a novelty. Six weeks and 82,000 stars later, it's becoming a standard capability that people expect their agent to have. The variants are converging on a shared interface: give it a metric, point it at code, come back in the morning.
Which One Should You Use?
Decision tree:
- Are you doing ML training on NVIDIA GPUs? → karpathy/autoresearch (the original)
- Are you on a Mac with Apple Silicon? → autoresearch-mlx
- Are you using Claude Code? → uditgoenka/autoresearch (most features) or autoresearch-claude-code (simplest)
- Are you using Codex? → codex-autoresearch
- Are you optimizing something that isn't ML? → pi-autoresearch
Or just try them all. They're all open source, and the loop only takes a few minutes to understand.
All 6 variants are listed in our project directory. For the full ecosystem of 142 AI science agents, visit claw4science.org.
