Legacy Code, Refactoring, and Large Codebases

Legacy code isn't just "old code." It's code that runs in production, that nobody dares to touch, and whose documentation was last updated in 2017. Large codebases add another layer of complexity on top: hundreds of thousands of lines, non-obvious dependencies, and architectural decisions made without any explanation.

Claude Code changes this equation — but not by magic. The agent can read code quickly and tirelessly, build hypotheses about structure, and suggest concrete refactoring steps. What it cannot do is guess the business context behind architectural decisions that "just happened to end up this way."

Quick Review

What is Claude's main limitation when analyzing legacy code?

Reconnaissance: Understanding a Codebase in an Hour

The first step when entering an unfamiliar project isn't to read the code line by line; it's to understand the structure top-down. An agent can usually do this much faster than a person reading by hand, given a prompt like this:

You are entering an unfamiliar Python project. Start by understanding its structure:
1. Read the README, if there is one
2. Look at the top-level directory structure
3. Find entry points (main.py, __main__.py, setup.py, pyproject.toml)
4. Identify which external dependencies are used and why
5. Find the largest files — that's likely where the core logic lives

After the reconnaissance, write a short summary: what the project does, how it's organized,
and which 3-5 files are the heart of the codebase.

This is the "exploration mode" from Typical Workflows, applied to someone else's code. The agent will read the files, trace the imports, and produce a map — not a perfect one, but good enough for the next step.
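
Steps 3 and 5 of the prompt above (entry points and largest files) can also be cross-checked by hand. The following is a minimal sketch, not a tool that Claude Code ships: the `survey` function, the `SKIP_DIRS` set, and the choice to measure size in lines are all assumptions made for illustration.

```python
from pathlib import Path

# Entry-point file names taken from the reconnaissance prompt.
ENTRY_POINT_NAMES = {"main.py", "__main__.py", "setup.py", "pyproject.toml"}
# Assumption: directories that usually hold no project code of their own.
SKIP_DIRS = {".git", ".venv", "venv", "node_modules", "__pycache__"}


def count_lines(path: Path) -> int:
    """Count the lines in a file, ignoring undecodable bytes."""
    with path.open("r", encoding="utf-8", errors="ignore") as handle:
        return sum(1 for _ in handle)


def survey(root: Path, top_n: int = 5) -> dict:
    """Return the entry points and the largest Python files under root."""
    entry_points = []
    sizes = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if not path.is_file() or any(part in SKIP_DIRS for part in relative.parts):
            continue
        if path.name in ENTRY_POINT_NAMES:
            entry_points.append(relative.as_posix())
        if path.suffix == ".py":
            sizes.append((count_lines(path), relative.as_posix()))
    # Largest first; ties are broken by path so the output is stable.
    sizes.sort(key=lambda item: (-item[0], item[1]))
    return {"entry_points": sorted(entry_points), "largest_files": sizes[:top_n]}
```

Calling `survey(Path("."))` from the project root gives a quick list to compare against the agent's summary. Line count is only a rough proxy for importance, so treat the result as a starting point.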

For a deeper dive into a specific module:

Explain src/core/transaction_processor.py to me.
Don't walk through the code line by line — what I need is:
- What business problem does this module solve?
- What is the main data flow (inputs → transformations → output)?
- What external dependencies does it use?
- What looks non-standard or risky?

flowchart TD
    A["Enter legacy project"] --> B["Top-down reconnaissance\nstructure, dependencies, entry points"]
    B --> C["Create a draft CLAUDE.md\narchitecture, pitfalls, patterns"]
    C --> D{Task type}
    D -->|"Find and understand code"| E["Minimal context\nonly the needed files"]
    D -->|"Change / refactor"| F["Write tests\nfor current behavior"]
    D -->|"Explain to a newcomer"| G["/onboard slash-command"]
    D -->|"Audit a large codebase"| H["Subagents\nper module"]
    F --> I["Refactor in small steps"]
    I --> J["Tests green → commit"]
    E --> K["/clear between tasks"]
    J --> K
    G --> K
    H --> K

Workflow for onboarding into a legacy project: from reconnaissance to safely changing code

CLAUDE.md as an Onboarding Document

In a large project, CLAUDE.md serves a dual purpose: it's an instruction set for the agent and an onboarding document for new developers. The difference from a regular README is that CLAUDE.md is written in an imperative style ("do this," "don't do this") rather than a descriptive one.

For a legacy project, CLAUDE.md is especially valuable: without it, the agent will keep "rediscovering" the same pitfalls over and over. An example structure:

# Project: LegacyBilling


## Architecture (important to understand before making changes)
- src/core/ — billing engine; modify with care, minimal test coverage
- src/legacy/ — code from 2012, DO NOT touch unless absolutely necessary
- src/api/ — REST API; all changes must go through a Feature Flag

## Known Pitfalls
- calculate_invoice() in billing.py has a side effect: it writes directly to audit_log.
  When testing, you need to mock audit_log.write.
- The payments/ module uses global state — do not run tests in parallel.
- .env.example contains an outdated DB_LEGACY_URL — it's only needed for migrations.

## How to Run Tests
pytest tests/ -x --ignore=tests/legacy/  (legacy tests are broken, known issue)

## Code Patterns
- All new classes use dataclass + TypedDict for return values
- Logging via structlog, not print or logging directly
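
The first pitfall in the example above (a function that writes to the audit log as a side effect) is the kind of note that translates directly into test setup. Here is a minimal sketch using `unittest.mock` from the standard library. The `AuditLog` class and the body of `calculate_invoice` are hypothetical stand-ins, since the real ones are not shown in this article.

```python
from unittest import mock


class AuditLog:
    """Hypothetical stand-in for the project's audit log."""

    def write(self, message: str) -> None:
        raise RuntimeError("would write to the real audit log")


audit_log = AuditLog()


def calculate_invoice(amount_cents: int, tax_percent: int) -> int:
    """Hypothetical version of the function: computes a total and logs it."""
    total = amount_cents + amount_cents * tax_percent // 100
    audit_log.write(f"invoice total_cents={total}")  # the hidden side effect
    return total


def test_calculate_invoice_mocks_audit_log() -> None:
    # Replace audit_log.write only for the duration of the test.
    with mock.patch.object(audit_log, "write") as fake_write:
        total = calculate_invoice(10000, 20)
    assert total == 12000
    fake_write.assert_called_once_with("invoice total_cents=12000")
```

Without the patch, the call would reach the real log. With it, the test also documents that the write happens, which is useful when someone later refactors the function.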

How to generate a first draft with the agent:

Read the project structure and key files.
Generate a draft CLAUDE.md with the following sections:
- Project architecture (2-3 paragraphs)
- Known pitfalls and non-standard decisions
- Commands for running and testing
- Code patterns the project follows

I'll edit the draft — right now I need a foundation, not a final document.

The agent doesn't know the business context or the historical reasons behind architectural decisions — you'll fill that part in manually. But getting a scaffold in 2 minutes instead of 2 hours is already a win. For more detail on the structure and capabilities of CLAUDE.md, see the article CLAUDE.md and the Memory System.

Check Yourself

You're opening a legacy project with 150,000 lines of code for the first time. CLAUDE.md doesn't exist yet. What's your first step — and why that specifically, rather than something else?

Safe Refactoring: Small Steps

The biggest risk of refactoring legacy code is breaking something that "just worked." The agent doesn't protect you from this on its own, but it helps you build a process where every step is verifiable.

The "test → refactor → test" pattern:

Here is the process_payment() function from src/payments/processor.py.
Before refactoring:
1. Write unit tests that cover the function's current behavior
   (including edge cases and error handling)
2. Confirm the tests pass against the current code
3. Only then propose the refactoring

Goal: remove the duplication between process_payment() and process_refund() —
there are ~40 lines of identical code.

This is a direct application of the approach from Plan Mode and Test-Driven Development to a legacy context. Tests on the current behavior are the "safety net" that locks down the function's contract before anyone starts changing it.
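
Tests of this kind are often called characterization tests: they record what the code does today, quirks included, rather than what it ideally should do. A minimal sketch follows. The body of `process_payment` here is a hypothetical stand-in, invented only so that the tests have something to pin down.

```python
import unittest


def process_payment(amount_cents: int, currency: str) -> dict:
    """Hypothetical legacy function whose behavior we want to lock down."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    fee = amount_cents * 3 // 100  # quirk: the fee is rounded down
    return {"charged": amount_cents + fee, "currency": currency.upper()}


class ProcessPaymentCharacterization(unittest.TestCase):
    """Record current behavior before any refactoring starts."""

    def test_typical_payment(self):
        self.assertEqual(
            process_payment(1000, "usd"),
            {"charged": 1030, "currency": "USD"},
        )

    def test_fee_rounds_down(self):
        # 3% of 50 is 1.5; the current code charges 1, so the test says 1.
        self.assertEqual(process_payment(50, "eur")["charged"], 51)

    def test_zero_amount_is_rejected(self):
        with self.assertRaises(ValueError):
            process_payment(0, "usd")
```

The rounding test is the important one. If the refactoring changes that behavior, the test fails, and the change becomes a deliberate decision instead of an accident.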

The Strangler Fig pattern — gradually replacing old code with new — is something the agent can implement through concrete steps:

I want to apply the Strangler Fig pattern to src/legacy/user_auth.py.
The new implementation will live in src/auth/.

1. Create src/auth/authenticator.py with the same public interface
2. Add a feature flag AUTH_USE_NEW_IMPL to settings.py
3. Update the entry point: when flag=True — use new code, when flag=False — use old code
4. Write tests that run against both implementations

Start by analyzing the legacy module and propose a plan before writing any code.
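
What steps 2 to 4 of this prompt might produce can be sketched in a few lines. Everything below is illustrative: the `SETTINGS` dict stands in for settings.py, and the two `*_authenticate` functions are hypothetical placeholders for the old and new modules.

```python
from typing import Dict, Iterable, Tuple

# Assumption: in a real project this flag would be read from settings.py.
SETTINGS: Dict[str, bool] = {"AUTH_USE_NEW_IMPL": False}


def legacy_authenticate(username: str, password: str) -> bool:
    """Stand-in for the old code in src/legacy/user_auth.py."""
    return username == "admin" and password == "secret"


def new_authenticate(username: str, password: str) -> bool:
    """Stand-in for the new code in src/auth/authenticator.py."""
    credentials = {"admin": "secret"}
    return credentials.get(username) == password


def authenticate(username: str, password: str) -> bool:
    """Entry point: route to the old or new implementation by flag."""
    if SETTINGS["AUTH_USE_NEW_IMPL"]:
        return new_authenticate(username, password)
    return legacy_authenticate(username, password)


def implementations_agree(cases: Iterable[Tuple[str, str]]) -> bool:
    """Run the same inputs against both implementations and compare results."""
    return all(
        legacy_authenticate(user, pwd) == new_authenticate(user, pwd)
        for user, pwd in cases
    )
```

Callers only ever see `authenticate`, so the flag can be flipped and flipped back without touching them. Once the new path has run in production long enough, the legacy branch and the flag can both be removed.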

Check Yourself

In front of you is a 200-line process_payment() function with no tests. You want to break it into several smaller ones. What should be the first step — and what happens if you skip it?

What not to delegate to the agent during refactoring:

- The decision of whether to refactor at all: that's an architectural and business decision.
- Deleting code that "appears unused" without explicitly verifying all usage points across the project.
- Refactoring complex modules with side effects without test coverage in place first.

Subagents for Analyzing a Large Codebase

If the project is huge — 200+ files, several large modules — a single agent context won't be enough for a complete analysis. This is where the pattern from Subagents and Context Isolation comes in: break the analysis into independent subtasks.

Conduct an architectural audit of the project. Divide the work:

Subagent 1: Analyze src/api/ — endpoint structure, validation patterns,
error handling. Return: a list of issues by category
(security, code quality, test coverage).

Subagent 2: Analyze src/core/ — business logic, module coupling,
test coverage. Return: a dependency graph and highly coupled modules.

Subagent 3: Analyze src/data/ — database interactions, SQL queries, ORM patterns.
Return: potential N+1 queries, unindexed columns.

Combine the results into a unified report with prioritized issues.

Each subagent gets its own slice of the codebase and doesn't "know" about the other agents' findings. Because each one works in its own context window, the audit as a whole can cover more code than a single context could hold. The main agent synthesizes the results.
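
The synthesis step amounts to merging per-module findings and ordering them by priority. Here is a minimal sketch, assuming each subagent returns its findings as a list of dicts with `category` and `description` keys. That shape, the `Finding` class, and the priority order are illustrative assumptions; only the category names come from the audit prompt.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: an illustrative priority order; a lower number is more urgent.
PRIORITY = {"security": 0, "code quality": 1, "test coverage": 2}


@dataclass(frozen=True)
class Finding:
    module: str
    category: str
    description: str


def merge_reports(reports: Dict[str, List[dict]]) -> List[Finding]:
    """Flatten per-module reports into one list, most urgent first."""
    findings = [
        Finding(module, item["category"], item["description"])
        for module, items in reports.items()
        for item in items
    ]
    # Unknown categories sort last; module name breaks ties.
    return sorted(
        findings,
        key=lambda f: (PRIORITY.get(f.category, len(PRIORITY)), f.module),
    )
```

Asking each subagent for the same output shape up front is what makes this merge trivial. Free-form reports are much harder to combine.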

Managing Context When Working with a Large Codebase

In very large projects, the most common problem is the context window filling up quickly with irrelevant files. A few practices from Context Window Management, specific to legacy work:

Start each session with minimal context. Instead of "show me the whole project" — "show me only module X and its direct dependencies." CLAUDE.md provides the map; the agent reads only the files it needs.

Use /clear between tasks. After finishing a refactor of one module, clear the context before moving to the next — artifacts from the previous task interfere with focus.

Git worktrees for parallel refactoring. If you need to refactor two independent modules simultaneously, launch two instances of Claude Code in separate worktrees:

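# -b creates a new branch for each worktree
# (if the branch already exists, omit -b and pass the branch name after the path)
git worktree add -b auth-refactor ../project-auth-refactor
git worktree add -b api-refactor ../project-api-refactor
# Run a separate claude session in each directory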

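This keeps the two agents out of each other's working directory and lets them work in parallel without overlapping contexts. Merge conflicts can still surface later if both branches touch the same files. For more on worktrees, see Git, Commits, and Pull Requests.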

Check Yourself

You're refactoring two independent modules in a large project — auth and payments. What's better: working sequentially in a single session, or using git worktrees with separate Claude Code sessions?

Onboarding via the Agent: The /onboard Command

A CLAUDE.md written for the agent also works for humans — but you can go further. Create a custom slash command /onboard that kicks off a structured tour of the project:

# .claude/commands/onboard.md
You are an experienced developer introducing a new colleague to the project.

1. Read CLAUDE.md and explain the project architecture in plain terms
2. Walk through the main flow: how a typical request is processed from HTTP to the database
3. List the 5 files that need to be understood first
4. Warn about the 3 main pitfalls from CLAUDE.md
5. Suggest a small first task for getting familiar with the codebase

To use it, type /onboard, and the agent will guide a newcomer (or you, after a break) through the project in a structured way. For details on how custom slash commands work and what can be stored in the frontmatter, see the article Slash Commands: Built-in and Custom.