Model Selection and Thinking Modes
The configuration settings from the previous article define rules and permissions — but it is the model choice that determines how "smart" and fast the agent will be. This is the most direct lever for controlling the quality/cost trade-off in your work, and the right configuration here saves real money.
The Model Family: Current Landscape
As of mid-2026, Claude Code works with four primary model tiers:
| Model | API ID | Price (input/output, per 1M tokens) | Context | Thinking |
|---|---|---|---|---|
| Haiku 4.5 | claude-haiku-4-5 | $1 / $5 | 200k | Extended ✓ |
| Sonnet 4.6 | claude-sonnet-4-6 | $3 / $15 | 1M | Extended + Adaptive ✓ |
| Opus 4.8 | claude-opus-4-8 | $5 / $25 | 1M | Adaptive ✓ |
| Fable 5 | claude-fable-5 | $10 / $50 | 1M | Adaptive ✓ |
Haiku is the fastest and cheapest option. It handles routine tasks well: boilerplate generation, simple edits, renames, formatting, and short summaries.
Sonnet is the workhorse for most developers. It is fast enough for interactive work, produces high-quality results, and supports both extended and adaptive thinking. One important detail is its 1M-token context window versus Haiku's 200k, which matters significantly when working with large codebases.
Opus 4.8 is for genuinely complex tasks: architectural decisions, analysis of tangled dependencies, multi-step reasoning. Anthropic documents that it runs at high effort by default (see effortLevel below).
Fable 5 is the most powerful publicly available model at this time; adaptive thinking is always enabled. It is only justified where the cost of failure is high and other models fall short.
In practice, Sonnet covers 90% of work in Claude Code — it is used by default. Haiku and Opus are targeted switches for specific task types.
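To make the price gap concrete, here is a small Python sketch that prices a single request on each tier using the per-1M-token rates from the table above. The token counts in the usage example (50k in, 5k out) are illustrative assumptions, and the calculation ignores prompt caching and batch discounts.

```python
# Per-1M-token list prices (input, output) in USD, copied from the table above.
PRICES = {
    "claude-haiku-4-5": (1.0, 5.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-fable-5": (10.0, 50.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of one request (no caching or batch discounts)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 50k tokens of context in, 5k tokens out.
    for model in PRICES:
        print(f"{model:20s} ${request_cost(model, 50_000, 5_000):.3f}")
```

For that request the tiers come out at about $0.075, $0.225, $0.375, and $0.75: Fable 5 costs ten times as much as Haiku for identical work.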
```mermaid
flowchart LR
    H["Haiku 4.5<br/>$1/$5 per MTok<br/>200k context<br/>Extended thinking"]
    S["Sonnet 4.6<br/>$3/$15 per MTok<br/>1M context<br/>Extended + Adaptive"]
    O["Opus 4.8<br/>$5/$25 per MTok<br/>1M context<br/>Adaptive thinking"]
    F["Fable 5<br/>$10/$50 per MTok<br/>1M context<br/>Adaptive thinking"]
    H -- "pricier · smarter" --> S -- "pricier · smarter" --> O -- "pricier · smarter" --> F
```

How to Switch Models
There are four ways to switch models.
1. Within a session — /model
```
/model claude-opus-4-8
```

The switch happens instantly, with no restart: the next request already goes through Opus. This is useful when you realize mid-session that a task is more complex than expected.
2. For the entire team — settings.json
```json
{
  "model": "claude-sonnet-4-6"
}
```

As covered in Settings and Configuration Hierarchy, the project-level file (.claude/settings.json) is committed to the repo and applies to everyone. Personal overrides go in .claude/settings.local.json.
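If you prefer to script this change (for example, in a repo bootstrap step), a minimal Python sketch could set the model key in the project-level .claude/settings.json while keeping any other keys already there. The helper name set_project_model is hypothetical, and the sketch assumes the file is plain JSON without comments.

```python
import json
from pathlib import Path


def set_project_model(repo_root: str, model: str) -> dict:
    """Set "model" in <repo_root>/.claude/settings.json, keeping all other keys."""
    path = Path(repo_root) / ".claude" / "settings.json"
    # Start from the existing settings (if any) so permissions and other keys survive.
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings["model"] = model
    path.parent.mkdir(parents=True, exist_ok=True)
    # Two-space indentation keeps the committed file readable in diffs.
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Running set_project_model(".", "claude-sonnet-4-6") from the repo root yields the JSON shown above, plus whatever keys the file already contained.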
3. Environment variable — ANTHROPIC_MODEL
```bash
ANTHROPIC_MODEL=claude-haiku-4-5 claude -p "brief summary of this file"
```

This is useful in CI/CD: a cheap model for routine checks, and an expensive one only for code review.
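A CI script can apply the same idea by choosing ANTHROPIC_MODEL per job. The Python sketch below only builds the argument list and environment for a headless claude -p call; the job names and the routing table are hypothetical, and sending review jobs to Sonnet is an assumption (swap in Opus if your reviews need it).

```python
import os

# Hypothetical routing: only code review gets a stronger model.
JOB_MODELS = {"review": "claude-sonnet-4-6"}
DEFAULT_CI_MODEL = "claude-haiku-4-5"  # cheap default for routine checks


def headless_invocation(job: str, prompt: str) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for a headless `claude -p` run with ANTHROPIC_MODEL set per job."""
    # Copy the parent environment so PATH and API credentials still reach the child.
    env = dict(os.environ)
    env["ANTHROPIC_MODEL"] = JOB_MODELS.get(job, DEFAULT_CI_MODEL)
    return ["claude", "-p", prompt], env


if __name__ == "__main__":
    argv, env = headless_invocation("changelog", "brief summary of this file")
    print(env["ANTHROPIC_MODEL"], argv)
    # To actually run it: subprocess.run(argv, env=env, check=True)
```

Copying os.environ matters: subprocess.run(..., env=env) replaces the child's entire environment, so passing only ANTHROPIC_MODEL would drop PATH and any API key.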
4. CLI flag — --model
```bash
claude --model claude-opus-4-8 "figure out this architecture"
```

The flag overrides the model set in settings files and ANTHROPIC_MODEL, for that invocation only.
Full priority order, from highest to lowest:

```text
/model (in session) > CLI --model > ANTHROPIC_MODEL > settings.local.json > settings.json > ~/.claude/settings.json
```

Extended Thinking and Adaptive Thinking
These are two distinct mechanisms — they are often confused, but they solve different problems.
Extended thinking is an explicit mode: you set a token budget for "internal reasoning" yourself, and the model thinks before responding. It is supported by Haiku 4.5 and, in backward-compatibility mode, by Sonnet 4.6. In Claude Code, you enable it with:
```json
{
  "alwaysThinkingEnabled": true
}
```

The thinking budget is controlled by an environment variable:
```bash
MAX_THINKING_TOKENS=8000 claude
```

By default, the "thoughts" are hidden and only the final answer is visible. To display thinking summaries:
```json
{
  "showThinkingSummaries": true
}
```

To disable extended thinking entirely (for example, for fast automated tasks):
```bash
MAX_THINKING_TOKENS=0 claude -p "add a docstring"
```

Adaptive thinking is a different mechanism: the model decides for itself when and how deeply to think, without an explicit budget. It is built into Sonnet 4.6, Opus 4.8, and Fable 5. The user does not control it directly, but the depth of reasoning depends on effortLevel (see below).
Practical implication: there is no point enabling alwaysThinkingEnabled on Opus 4.8 — it does not support extended thinking, only adaptive. On Sonnet 4.6, both mechanisms work, but Anthropic recommends moving to effortLevel instead of an explicit token budget.
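The support matrix from the model table can be turned into a small lint check that flags thinking settings which will not do what you expect. This is a hypothetical helper, not a Claude Code feature; the support sets simply mirror the table's Thinking column, and treating Sonnet as the default when no model is set follows the text above.

```python
# Thinking mechanisms per model, mirroring the "Thinking" column of the model table.
THINKING_SUPPORT = {
    "claude-haiku-4-5": {"extended"},
    "claude-sonnet-4-6": {"extended", "adaptive"},
    "claude-opus-4-8": {"adaptive"},
    "claude-fable-5": {"adaptive"},
}
DEFAULT_MODEL = "claude-sonnet-4-6"  # Sonnet is the default when no model is set


def lint_thinking(settings: dict) -> list[str]:
    """Return warnings for thinking settings that do not fit the configured model."""
    model = settings.get("model", DEFAULT_MODEL)
    supported = THINKING_SUPPORT.get(model, set())
    warnings = []
    if settings.get("alwaysThinkingEnabled"):
        if "extended" not in supported:
            # Adaptive-only models (e.g. Opus 4.8) ignore an explicit thinking budget.
            warnings.append(f"{model}: no extended thinking, so alwaysThinkingEnabled has no effect")
        elif "adaptive" in supported:
            # Sonnet supports both; the text recommends effortLevel over a fixed budget.
            warnings.append(f"{model}: consider effortLevel instead of an explicit thinking budget")
    return warnings
```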
effortLevel: A Dial Between Speed and Depth
The effortLevel parameter is a finer-grained control than model selection. It governs how much effort the agent puts into each request, and operates on top of adaptive thinking:
| Level | When to use |
|---|---|
| low | Quick edits, renames, formatting |
| medium | Standard development work (default for most models) |
| high | Refactoring, debugging non-trivial bugs |
| xhigh | Architectural decisions, analysis of large codebases |
Set it in settings.json:
```json
{
  "effortLevel": "high"
}
```

Switch it within a session:

```
/effort xhigh
```

Apply it to a single headless run via an environment variable:

```bash
CLAUDE_CODE_EFFORT_LEVEL=low claude -p "add comments to the function"
```

effortLevel and model selection are orthogonal settings: on a specific task, Sonnet with effortLevel: xhigh can outperform Opus with effortLevel: low. Because Opus 4.8 runs at high effort by default, explicitly set low or medium if you want to reduce costs.
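Because defaults differ by model, it helps to reason about the effective effort level rather than only the configured one. A minimal Python sketch, assuming an explicit setting always wins and that every model except Opus 4.8 defaults to medium (the table says medium is the default "for most models", so that is an assumption for Haiku and Fable 5):

```python
from typing import Optional

VALID_LEVELS = ("low", "medium", "high", "xhigh")
# Opus 4.8 defaults to high (per the text); "medium" for all other models is an assumption.
MODEL_DEFAULT_EFFORT = {"claude-opus-4-8": "high"}


def effective_effort(model: str, configured: Optional[str] = None) -> str:
    """Return the effort level a request runs at: the explicit setting, else the model default."""
    if configured is not None:
        if configured not in VALID_LEVELS:
            raise ValueError(f"unknown effortLevel {configured!r}; expected one of {VALID_LEVELS}")
        return configured
    return MODEL_DEFAULT_EFFORT.get(model, "medium")
```

The Opus row is the one to watch for cost: unless effortLevel is set explicitly, every Opus request runs at high.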
Practical Guide: When to Pay for a More Powerful Model
Haiku is sufficient for:
- Generating unit tests from existing code
- Writing CHANGELOGs or commit messages
- Renaming variables, formatting
- Brief summaries of a file or document
Sonnet — the default for most tasks:
- Implementing new features of moderate size
- Debugging a non-trivial bug with stack and log context
- Code review with explanation of issues
- Working with large context (when more than 100k tokens are needed)
Opus 4.8 / Fable 5 — when complexity is genuinely high:
- Reverse-engineering someone else's architecture without documentation
- Refactoring legacy code with non-obvious invariants
- Architectural decisions with many trade-offs
- Tasks where Sonnet has repeatedly failed or lost the thread
The key principle: start with Sonnet, and switch to Opus or Fable 5 only when you see the model "not keeping up" by giving shallow answers or making clearly poor decisions. For CI/CD automation, consider Haiku: at one third of Sonnet's per-token price, the savings add up quickly across hundreds of requests per day.
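The selection rules above can be written down as a small routing function, which is handy if you wrap Claude Code in your own scripts. The task labels, the threshold of two Sonnet failures standing in for "repeatedly failed", and reserving Fable 5 for high-stakes tasks where Opus has also failed are illustrative assumptions.

```python
HAIKU = "claude-haiku-4-5"
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-8"
FABLE = "claude-fable-5"

# Hypothetical labels for the "Haiku is sufficient" list above.
ROUTINE_TASKS = {"unit-tests", "changelog", "commit-message", "rename", "format", "summary"}


def pick_model(task: str, sonnet_failures: int = 0, opus_failures: int = 0,
               high_stakes: bool = False) -> str:
    """Route a task: Haiku for routine work, Sonnet by default, escalate only on evidence."""
    if task in ROUTINE_TASKS:
        return HAIKU
    if sonnet_failures < 2:  # give Sonnet a fair chance before paying more
        return SONNET
    if opus_failures > 0 and high_stakes:
        return FABLE  # only when failure is costly and Opus also fell short
    return OPUS
```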
fallbackModel and Team-Level Management
If the primary model is overloaded or unavailable, Claude Code will automatically switch to a fallback:
```json
{
  "model": "claude-opus-4-8",
  "fallbackModel": ["claude-sonnet-4-6", "claude-haiku-4-5"]
}
```

The chain supports up to three models, tried in order. The list is not merged across settings files: only the one from the highest-priority file takes effect.
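The described semantics (one ordered chain, taken whole from the highest-priority file that defines it, tried model by model) can be modeled in a few lines of Python. This illustrates the behavior, not Claude Code's implementation: send and ModelUnavailable are hypothetical stand-ins for a real API call and an overload error, and silently ignoring entries beyond the third is an assumption.

```python
from typing import Callable, Optional


class ModelUnavailable(Exception):
    """Hypothetical stand-in for an overload or availability error."""


def resolve_fallbacks(settings_by_priority: list[dict]) -> list[str]:
    """Take fallbackModel from the highest-priority file that sets it; lists are never merged."""
    for settings in settings_by_priority:  # ordered highest priority first
        if "fallbackModel" in settings:
            return list(settings["fallbackModel"])[:3]  # assumption: extra entries are ignored
    return []


def call_with_fallback(primary: str, fallbacks: list[str], send: Callable[[str], str]) -> str:
    """Try the primary model, then each fallback in order; re-raise if every model is unavailable."""
    last_error: Optional[Exception] = None
    for model in [primary, *fallbacks]:
        try:
            return send(model)
        except ModelUnavailable as err:
            last_error = err
    raise last_error
```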
For enterprise deployments, you can restrict available models via managed settings:
```json
{
  "availableModels": ["sonnet", "haiku"],
  "enforceAvailableModels": true
}
```

This prevents developers from accidentally switching to an expensive model where the budget does not allow for it.
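To see what such a policy implies, here is a hypothetical Python check of a requested model against it. How Claude Code actually matches the aliases sonnet and haiku against full model IDs is not described here, so the substring match is an assumption, as is treating a missing or false enforceAvailableModels as "no restriction".

```python
def is_model_allowed(requested: str, policy: dict) -> bool:
    """Return True if the requested model ID passes an availableModels allowlist."""
    if not policy.get("enforceAvailableModels"):
        return True  # assumption: without enforcement, any model may be selected
    # Assumption: an alias such as "sonnet" matches any model ID that contains it.
    return any(alias in requested for alias in policy.get("availableModels", []))
```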
See also
- Settings and Configuration Hierarchy: where to set model and effortLevel, and how they are inherited across levels
- Context Window Management: Haiku is limited to 200k tokens, while Sonnet and Opus provide 1M; this directly affects model choice for large codebases
- CLAUDE.md and the Memory System: you can instruct the agent in CLAUDE.md to request a more powerful model for specific task types
- Prompt Caching, Batches, and Cost Optimization: how to reduce costs when using expensive models through caching and the batch API
- Subagents and Context Isolation: subagents can be assigned a different model than the main session; a typical pattern is using Haiku for routine subtasks
- Headless Mode and CLI Scripting: --model and ANTHROPIC_MODEL in CI/CD pipelines