Plan Mode and Test-Driven Development
The previous article covered the explore → plan → code → commit pattern as an architectural approach to working with an agent. This article takes a detailed look at the two tools that make that pattern reliable: plan mode and the TDD loop. Both solve the same problem: the agent has no way of knowing when the work is done correctly unless you give it a way to check.
Plan Mode: Protection Against Premature Code
Plan mode (plan mode) is a permission mode in which Claude can read files, run read-only commands, and propose changes — but cannot write anything to disk. No Edit, no Write, no destructive Bash commands.
You can enable it in three ways:
# Launch directly in plan mode
claude --permission-mode plan
# Toggle mid-session
# Shift+Tab — cycles through permission modes
# Or explicitly from the command line inside a session:
/planToggling with Shift+Tab works as a cycle: default → acceptEdits → plan → default. Useful when you're already in a session and want to safely explore before committing to any changes.
What specifically happens in plan mode:
✅ Allowed: ❌ Blocked:
──────────────────────────────────────────────
Read (file reading) Edit (file editing)
Grep, Glob Write (file creation)
Bash (read-only) Bash (with side effects)
Analysis & planning All destructive commandsOnce Claude has drafted a plan, press Ctrl+G — the plan will open in your editor. You can remove unnecessary steps, add constraints, and adjust priorities. Claude will follow exactly this version of the plan when you exit plan mode.
What Makes a Good Plan
An agent in plan mode is an architect without a hammer. Use that to your advantage: ask questions that require understanding the structure, not implementing it.
# Weak plan-mode request:
add Google OAuth
# Strong request:
I need to add Google OAuth. Read @src/auth/ and figure out
how the current login flow works. Then:
1. Show which files need to change and why
2. What new files will be created
3. How the session flow will change after the OAuth callback
4. Are there any backward-compatibility risks with existing sessions
Draft a plan with specific file names and steps.The more specific the questions, the more precise the plan. A good plan includes:
- Files with specific changes (
src/auth/handlers.ts— addhandleOAuthCallbackfunction) - A sequence of steps with rationale for the ordering
- Verification checkpoints ("after step 3, run
npm test auth") - Explicit constraints ("do not modify the existing
sessionMiddleware")
There's also a practical trick for complex features: have Claude run an "interview" before the plan.
I want to implement a notification system. Conduct a detailed interview
using AskUserQuestion. Ask about technical approaches,
UX, edge cases, and trade-offs — especially ones I
might not have thought of. After the interview, write a full spec in SPEC.md.Once you have SPEC.md, start a fresh session for implementation — keep the planning context separate from the implementation context.
The TDD Loop: Tests as an Oracle for the Agent
Now for the most important part. The Anthropic documentation puts it this way:
Give Claude a check it can run: tests, a build, a screenshot to compare. It's the difference between a session you watch and one you walk away from.
Without a check, Claude stops when things "look ready." With a test, it stops when the test is green. That's a fundamental difference.
TDD in the context of Claude doesn't have to be the strict Red-Green-Refactor cycle from the textbooks. It's a pattern: first define what "correct" looks like, then let the agent iterate toward that state.
flowchart TD
A(["Task: write function X"]) --> B["📝 Write failing test\n(in plan-mode or manually)"]
B --> C["▶ Run test\n→ FAIL"]
C --> D["🤖 Claude: implement X\n(in default-mode)"]
D --> E["▶ Run test"]
E --> F{"Test\npassed?"}
F -->|"❌ FAIL"| G["Claude: read error\nand fix"]
G --> E
F -->|"✅ PASS"| H["▶ Run full suite"]
H --> I{"All green?"}
I -->|"❌ Regression"| G
I -->|"✅ OK"| J(["Done: commit and PR"])
style A fill:#e8f4f8,stroke:#2980b9
style J fill:#e8f8e8,stroke:#27ae60
style F fill:#fff3cd,stroke:#f39c12
style I fill:#fff3cd,stroke:#f39c12In practice, it looks like this:
# Step 1: write a failing test (or ask Claude to write one)
# Step 2: give Claude a task with an explicit stopping condition
claude --permission-mode plan
> Read src/utils/tokenRefresh.ts and write a failing test
that verifies: if the refresh token has expired, the function
should throw a TokenExpiredError rather than return null.
Do not implement anything — only the test.Once the test is written and run, you see a red result:
FAIL src/utils/tokenRefresh.test.ts
✕ throws TokenExpiredError when refresh token is expired
Expected: TokenExpiredError
Received: nullNow exit plan mode and give Claude the full task:
# Exit plan mode: Shift+Tab
implement the refreshToken function in src/utils/tokenRefresh.ts so that
it passes the test you wrote. After each change,
run `npm test -- tokenRefresh` and iterate until green.
If the test fails three times in a row with the same error — stop
and explain the problem, don't keep guessing.Key elements of a good TDD prompt for an agent:
1. A specific verification command (npm test -- tokenRefresh, not just "run the tests")
2. Iterate to completion ("iterate until green")
3. A stop condition (so the agent doesn't spin in an infinite loop when stuck)
Verifier Reference Table
A test isn't the only way to give the agent feedback. Here's what works for different types of tasks:
| Task type | Verifier | Example command |
|---|---|---|
| Logic / algorithms | Unit test | pytest tests/unit/test_parser.py |
| Refactoring | Test + type check | tsc --noEmit && npm test |
| Bug fix | Failing test → fix → pass | Write a test that reproduces the bug |
| UI changes | Screenshot + comparison | claude --chrome + visual diff |
| Migration | Linter + types + tests | eslint . && tsc && pytest |
| API / integration | End-to-end | curl + status code check |
| Infrastructure | Dry-run or plan | terraform plan |
The verifier command should be stated explicitly in the prompt. "Run the tests" is too ambiguous: Claude might run the entire suite instead of a single file, which is slow and noisy.
Combining Plan Mode and TDD
Both tools work together: plan mode handles understanding and design, TDD handles safe iterative implementation.
Phase 1 (plan mode): understand → design → write the test
Phase 2 (default mode): implement → run the test → iterate → greenA complete prompt for a complex task:
[In plan mode]
Explore @src/payments/ and understand how we process
refunds. Then:
1. Draft a plan for adding partial refunds
2. Write a failing test in tests/unit/test_refunds.py
that verifies: a partial refund for an amount greater than
the original payment should throw an OverRefundError
3. Do not implement anything — only the plan and the test[Exit plan mode: Shift+Tab]
Implement the plan from step 1. After each change,
run `pytest tests/unit/test_refunds.py -v`.
Once all tests are green, run the full suite:
`pytest tests/ -x` and make sure nothing is broken.
Then open a PR with a descriptive message.Stop Hook: A Deterministic Barrier
For autonomous sessions without constant supervision, there's a next level: Stop Hook. This is a shell command that runs every time the agent attempts to end a session. If the command exits with a non-zero code, the agent cannot finish.
// .claude/settings.json
{
"hooks": {
"Stop": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "npm test -- --passWithNoTests"
}
]
}
]
}
}With this hook in place, Claude cannot finish until npm test returns 0. This is a deterministic barrier: unlike an instruction in CLAUDE.md (which the agent might "forget"), the hook always runs regardless of context state.
Important: the documentation notes that Claude Code forcibly terminates the session after 8 consecutive hook blocks — a safeguard against infinite loops. So Stop Hook is best suited for tasks where the test genuinely should pass, not as a permanent guardian.
When Plan Mode and TDD Aren't Needed
Both tools add overhead. Not every task justifies that cost.
Skip plan mode if:
- The task fits in one sentence: "add a null-check on line 42"
- You know the code being changed well
- The diff will touch one file in one place
Skip TDD if:
- The change is purely cosmetic (renaming, formatting)
- The change is documentation or configuration
- You can verify the result visually faster than you can write a test
The rule: if you're uncertain about the approach or unfamiliar with the code being changed — enable plan mode. If the result is hard to verify visually and there's something worth testing — provide a test.
See also
- Typical workflows: explore, plan, implement — the explore → plan → code → commit pattern that both tools in this article grow from
- Best practices and code organization patterns — the full list of Anthropic recommendations, including context discipline and the verification principle
- Hooks — lifecycle events — a detailed look at Stop Hook and other lifecycle events for deterministic automation
- Subagents and context isolation — the Writer/Reviewer pattern: one agent writes, another verifies in a clean context
- Managing the context window — why plan mode helps keep context clean, and how /clear and /compact complement it
- Git, commits, and pull requests — the final phase after a TDD cycle: committing and opening a PR with a meaningful description