Plan Mode and Test-Driven Development

The previous article covered the explore → plan → code → commit pattern as an architectural approach to working with an agent. This article takes a detailed look at the two tools that make that pattern reliable: plan mode and the TDD loop. Both solve the same problem: the agent has no way of knowing when the work is done correctly unless you give it a way to check.

Plan Mode: Protection Against Premature Code

Plan mode (plan mode) is a permission mode in which Claude can read files, run read-only commands, and propose changes — but cannot write anything to disk. No Edit, no Write, no destructive Bash commands.

You can enable it in three ways:

# Launch directly in plan mode
claude --permission-mode plan

# Toggle mid-session
# Shift+Tab — cycles through permission modes

# Or explicitly from the command line inside a session:
/plan

Toggling with Shift+Tab works as a cycle: default → acceptEdits → plan → default. Useful when you're already in a session and want to safely explore before committing to any changes.

What specifically happens in plan mode:

✅ Allowed:             ❌ Blocked:
──────────────────────────────────────────────
Read (file reading)    Edit (file editing)
Grep, Glob             Write (file creation)
Bash (read-only)       Bash (with side effects)
Analysis & planning    All destructive commands

Once Claude has drafted a plan, press Ctrl+G — the plan will open in your editor. You can remove unnecessary steps, add constraints, and adjust priorities. Claude will follow exactly this version of the plan when you exit plan mode.

What Makes a Good Plan

An agent in plan mode is an architect without a hammer. Use that to your advantage: ask questions that require understanding the structure, not implementing it.

# Weak plan-mode request:
add Google OAuth

# Strong request:
I need to add Google OAuth. Read @src/auth/ and figure out
how the current login flow works. Then:
1. Show which files need to change and why
2. What new files will be created
3. How the session flow will change after the OAuth callback
4. Are there any backward-compatibility risks with existing sessions
Draft a plan with specific file names and steps.

The more specific the questions, the more precise the plan. A good plan includes:

Files with specific changes (src/auth/handlers.ts — add handleOAuthCallback function)
A sequence of steps with rationale for the ordering
Verification checkpoints ("after step 3, run npm test auth")
Explicit constraints ("do not modify the existing sessionMiddleware")

There's also a practical trick for complex features: have Claude run an "interview" before the plan.

I want to implement a notification system. Conduct a detailed interview
using AskUserQuestion. Ask about technical approaches,
UX, edge cases, and trade-offs — especially ones I
might not have thought of. After the interview, write a full spec in SPEC.md.

Once you have SPEC.md, start a fresh session for implementation — keep the planning context separate from the implementation context.

The TDD Loop: Tests as an Oracle for the Agent

Now for the most important part. The Anthropic documentation puts it this way:

Give Claude a check it can run: tests, a build, a screenshot to compare. It's the difference between a session you watch and one you walk away from.

Without a check, Claude stops when things "look ready." With a test, it stops when the test is green. That's a fundamental difference.

TDD in the context of Claude doesn't have to be the strict Red-Green-Refactor cycle from the textbooks. It's a pattern: first define what "correct" looks like, then let the agent iterate toward that state.

flowchart TD A(["Task: write function X"]) --> B["📝 Write failing test\n(in plan-mode or manually)"] B --> C["▶ Run test\n→ FAIL"] C --> D["🤖 Claude: implement X\n(in default-mode)"] D --> E["▶ Run test"] E --> F{"Test\npassed?"} F -->|"❌ FAIL"| G["Claude: read error\nand fix"] G --> E F -->|"✅ PASS"| H["▶ Run full suite"] H --> I{"All green?"} I -->|"❌ Regression"| G I -->|"✅ OK"| J(["Done: commit and PR"]) style A fill:#e8f4f8,stroke:#2980b9 style J fill:#e8f8e8,stroke:#27ae60 style F fill:#fff3cd,stroke:#f39c12 style I fill:#fff3cd,stroke:#f39c12

flowchart TD
    A(["Task: write function X"]) --> B["📝 Write failing test\n(in plan-mode or manually)"]
    B --> C["▶ Run test\n→ FAIL"]
    C --> D["🤖 Claude: implement X\n(in default-mode)"]
    D --> E["▶ Run test"]
    E --> F{"Test\npassed?"}
    F -->|"❌ FAIL"| G["Claude: read error\nand fix"]
    G --> E
    F -->|"✅ PASS"| H["▶ Run full suite"]
    H --> I{"All green?"}
    I -->|"❌ Regression"| G
    I -->|"✅ OK"| J(["Done: commit and PR"])

    style A fill:#e8f4f8,stroke:#2980b9
    style J fill:#e8f8e8,stroke:#27ae60
    style F fill:#fff3cd,stroke:#f39c12
    style I fill:#fff3cd,stroke:#f39c12

TDD loop with an agent: the test as an oracle, the agent iterates until green

In practice, it looks like this:

# Step 1: write a failing test (or ask Claude to write one)
# Step 2: give Claude a task with an explicit stopping condition

claude --permission-mode plan
> Read src/utils/tokenRefresh.ts and write a failing test
  that verifies: if the refresh token has expired, the function
  should throw a TokenExpiredError rather than return null.
  Do not implement anything — only the test.

Once the test is written and run, you see a red result:

FAIL src/utils/tokenRefresh.test.ts
  ✕ throws TokenExpiredError when refresh token is expired
    Expected: TokenExpiredError
    Received: null

Now exit plan mode and give Claude the full task:

# Exit plan mode: Shift+Tab

implement the refreshToken function in src/utils/tokenRefresh.ts so that
it passes the test you wrote. After each change,
run `npm test -- tokenRefresh` and iterate until green.
If the test fails three times in a row with the same error — stop
and explain the problem, don't keep guessing.

Key elements of a good TDD prompt for an agent:

1. A specific verification command (npm test -- tokenRefresh, not just "run the tests")

2. Iterate to completion ("iterate until green")

3. A stop condition (so the agent doesn't spin in an infinite loop when stuck)

Check yourself

You write a prompt: "implement a parseDate function". The agent implements it, says "done" — and waits for your response. Name one specific change to the prompt that will allow the agent to determine on its own that the work has been done correctly.

Verifier Reference Table

A test isn't the only way to give the agent feedback. Here's what works for different types of tasks:

Task type	Verifier	Example command
Logic / algorithms	Unit test	`pytest tests/unit/test_parser.py`
Refactoring	Test + type check	`tsc --noEmit && npm test`
Bug fix	Failing test → fix → pass	Write a test that reproduces the bug
UI changes	Screenshot + comparison	`claude --chrome` + visual diff
Migration	Linter + types + tests	`eslint . && tsc && pytest`
API / integration	End-to-end	`curl` + status code check
Infrastructure	Dry-run or plan	`terraform plan`

The verifier command should be stated explicitly in the prompt. "Run the tests" is too ambiguous: Claude might run the entire suite instead of a single file, which is slow and noisy.

Combining Plan Mode and TDD

Both tools work together: plan mode handles understanding and design, TDD handles safe iterative implementation.

Phase 1 (plan mode): understand → design → write the test
Phase 2 (default mode): implement → run the test → iterate → green

A complete prompt for a complex task:

[In plan mode]
Explore @src/payments/ and understand how we process
refunds. Then:
1. Draft a plan for adding partial refunds
2. Write a failing test in tests/unit/test_refunds.py
   that verifies: a partial refund for an amount greater than
   the original payment should throw an OverRefundError
3. Do not implement anything — only the plan and the test

[Exit plan mode: Shift+Tab]
Implement the plan from step 1. After each change,
run `pytest tests/unit/test_refunds.py -v`.
Once all tests are green, run the full suite:
`pytest tests/ -x` and make sure nothing is broken.
Then open a PR with a descriptive message.

Stop Hook: A Deterministic Barrier

For autonomous sessions without constant supervision, there's a next level: Stop Hook. This is a shell command that runs every time the agent attempts to end a session. If the command exits with a non-zero code, the agent cannot finish.

// .claude/settings.json
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "npm test -- --passWithNoTests"
          }
        ]
      }
    ]
  }
}

With this hook in place, Claude cannot finish until npm test returns 0. This is a deterministic barrier: unlike an instruction in CLAUDE.md (which the agent might "forget"), the hook always runs regardless of context state.

Important: the documentation notes that Claude Code forcibly terminates the session after 8 consecutive hook blocks — a safeguard against infinite loops. So Stop Hook is best suited for tasks where the test genuinely should pass, not as a permanent guardian.

Check yourself

You have configured a Stop Hook that runs the full test suite. The agent is working on a task and has gotten stuck: every attempt to finish is blocked because one test has been failing for 10 iterations. What will happen automatically?

When Plan Mode and TDD Aren't Needed

Both tools add overhead. Not every task justifies that cost.

Skip plan mode if:

The task fits in one sentence: "add a null-check on line 42"
You know the code being changed well
The diff will touch one file in one place

Skip TDD if:

The change is purely cosmetic (renaming, formatting)
The change is documentation or configuration
You can verify the result visually faster than you can write a test

The rule: if you're uncertain about the approach or unfamiliar with the code being changed — enable plan mode. If the result is hard to verify visually and there's something worth testing — provide a test.