Model Selection and Thinking Modes

The configuration settings from the previous article define rules and permissions — but it is the model choice that determines how "smart" and fast the agent will be. This is the most direct lever for controlling the quality/cost trade-off in your work, and the right configuration here saves real money.

The Model Family: Current Landscape

As of mid-2026, Claude Code works with four primary model tiers:

Model	API ID	Price (input/output, per 1M tokens)	Context	Thinking
Haiku 4.5	`claude-haiku-4-5`	$1 / $5	200k	Extended ✓
Sonnet 4.6	`claude-sonnet-4-6`	$3 / $15	1M	Extended + Adaptive ✓
Opus 4.8	`claude-opus-4-8`	$5 / $25	1M	Adaptive ✓
Fable 5	`claude-fable-5`	$10 / $50	1M	Adaptive ✓

Haiku is the fastest and cheapest option. It handles routine tasks well: boilerplate generation, simple edits, renames, formatting, and short summaries.

Sonnet is the workhorse for most developers. It offers good speed at high quality and supports both extended and adaptive thinking. One important detail: its 1M-token context window, versus Haiku's 200k, matters significantly when working with large codebases.

Opus 4.8 is for genuinely complex tasks: architectural decisions, analysis of tangled dependencies, multi-step reasoning. According to Anthropic's documentation, it runs with effort: high by default.

Fable 5 is the most powerful publicly available model at this time; adaptive thinking is always enabled. It is only justified where the cost of failure is high and other models fall short.

In practice, Sonnet covers 90% of work in Claude Code — it is used by default. Haiku and Opus are targeted switches for specific task types.
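To make the price column concrete, here is a small worked calculation. It is only a sketch: the prices are copied from the table above, while the request size (50,000 input tokens, 2,000 output tokens) and the helper name `request_cost` are illustrative assumptions. Caching and other discounts are ignored.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "claude-haiku-4-5": (1.0, 5.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-fable-5": (10.0, 50.0),
}


def request_cost(model, input_tokens, output_tokens):
    """Estimated cost in USD of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed request size for illustration: 50k tokens in, 2k tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 50_000, 2_000):.2f}")
# claude-haiku-4-5: $0.06
# claude-sonnet-4-6: $0.18
# claude-opus-4-8: $0.30
# claude-fable-5: $0.60
```

Under these assumptions, the same request costs ten times more on Fable 5 than on Haiku, which is why the default choice deserves a deliberate decision.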

flowchart LR
    H["Haiku 4.5\n$1/$5 per MTok\n200k context\nExtended thinking"]
    S["Sonnet 4.6\n$3/$15 per MTok\n1M context\nExtended + Adaptive"]
    O["Opus 4.8\n$5/$25 per MTok\n1M context\nAdaptive thinking"]
    F["Fable 5\n$10/$50 per MTok\n1M context\nAdaptive thinking"]
    H -- "pricier · smarter" --> S -- "pricier · smarter" --> O -- "pricier · smarter" --> F

The four Claude model tiers: from fastest and most affordable to most powerful

How to Switch Models

There are four ways to set the model, from an in-session switch to a one-off CLI flag.

1. Within a session — /model

/model claude-opus-4-8

The switch happens instantly, with no restart: the very next request goes through Opus. This is useful when you realize mid-session that a task is more complex than expected.

2. For the entire team — settings.json

{
  "model": "claude-sonnet-4-6"
}

As covered in Settings and Configuration Hierarchy: the project-level file is committed to the repo and applies to everyone. Personal overrides go in settings.local.json.

3. Environment variable — ANTHROPIC_MODEL

ANTHROPIC_MODEL=claude-haiku-4-5 claude -p "brief summary of this file"

Useful in CI/CD: a cheap model for routine checks, an expensive one only for code review.
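A CI script might wrap this pattern in a small helper. The sketch below only uses the `claude -p` invocation and the `ANTHROPIC_MODEL` variable shown above; the job names, the `JOB_MODELS` mapping, and both function names are hypothetical, and it assumes the `claude` binary is on the runner's PATH.

```python
import os
import subprocess

# Hypothetical CI job names mapped to the model each one should use.
JOB_MODELS = {
    "lint-summary": "claude-haiku-4-5",  # routine check: cheap model
    "code-review": "claude-sonnet-4-6",  # review: pricier model
}


def build_invocation(model, prompt):
    """Return (argv, env) for a one-shot run with the model set via ANTHROPIC_MODEL."""
    env = dict(os.environ)  # copy, so the parent environment is not modified
    env["ANTHROPIC_MODEL"] = model
    return ["claude", "-p", prompt], env


def run_job(job, prompt):
    """Run one job and return its stdout; raises if the command exits non-zero."""
    argv, env = build_invocation(JOB_MODELS[job], prompt)
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the job-to-model mapping in one place makes it easy to audit which pipeline steps are allowed to use a more expensive model.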

4. CLI flag — --model

claude --model claude-opus-4-8 "figure out this architecture"

Overrides all other settings, but only for that single invocation.

Full priority order:

CLI --model  >  settings.local.json  >  settings.json  >  ~/.claude/settings.json  >  ANTHROPIC_MODEL

Extended Thinking and Adaptive Thinking

These are two distinct mechanisms — they are often confused, but they solve different problems.

Extended thinking is an explicit mode: you set a token budget for "internal reasoning" yourself, and the model thinks before responding. Supported by Haiku 4.5 and (in backward-compatibility mode) Sonnet 4.6. In Claude Code, it is enabled via:

{
  "alwaysThinkingEnabled": true
}

The thinking budget is controlled by an environment variable:

MAX_THINKING_TOKENS=8000 claude

By default, the "thoughts" are hidden — only the final answer is visible. To enable summary display:

{
  "showThinkingSummaries": true
}

To disable extended thinking entirely (for example, for fast automated tasks):

MAX_THINKING_TOKENS=0 claude -p "add a docstring"

Adaptive thinking is a different mechanism: the model decides for itself when and how deeply to think, without an explicit budget. It is built into Sonnet 4.6, Opus 4.8, and Fable 5. The user does not control this directly, but the depth of reasoning depends on effortLevel (see below).

Practical implication: there is no point enabling alwaysThinkingEnabled on Opus 4.8 — it does not support extended thinking, only adaptive. On Sonnet 4.6, both mechanisms work, but Anthropic recommends moving to effortLevel instead of an explicit token budget.
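The rule of thumb above can be written down as a lookup. This is a sketch rather than anything Claude Code provides: the table of thinking modes is copied from this article, and the function name `thinking_advice` and its wording are illustrative.

```python
# Thinking modes per model, copied from the model table in this article.
THINKING_MODES = {
    "claude-haiku-4-5": {"extended"},
    "claude-sonnet-4-6": {"extended", "adaptive"},
    "claude-opus-4-8": {"adaptive"},
    "claude-fable-5": {"adaptive"},
}


def thinking_advice(model):
    """Return which knob to reach for on a given model (hypothetical helper)."""
    modes = THINKING_MODES[model]
    if modes == {"extended"}:
        return "set alwaysThinkingEnabled and MAX_THINKING_TOKENS"
    if modes == {"adaptive"}:
        return "set effortLevel; there is no point enabling alwaysThinkingEnabled"
    # Both mechanisms are available.
    return "both work; prefer effortLevel over an explicit token budget"


for name in THINKING_MODES:
    print(f"{name}: {thinking_advice(name)}")
```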

effortLevel: A Dial Between Speed and Depth

The effortLevel parameter is a finer-grained control than model selection. It governs how much effort the agent puts into each request, and operates on top of adaptive thinking:

Level	When to use
`low`	Quick edits, renames, formatting
`medium`	Standard development work (default for most models)
`high`	Refactoring, debugging non-trivial bugs
`xhigh`	Architectural decisions, analysis of large codebases

Set it in settings.json:

{
  "effortLevel": "high"
}

Switch it within a session:

/effort xhigh

Apply it to a single invocation via an environment variable:

CLAUDE_CODE_EFFORT_LEVEL=low claude -p "add comments to the function"

effortLevel and model selection are orthogonal settings. Sonnet with effortLevel: xhigh on a specific task can outperform Opus with effortLevel: low. Opus 4.8 runs with effort: high by default — if you want to save costs, explicitly set low or medium.

Practical Guide: When to Pay for a More Powerful Model

Haiku is sufficient for:

Generating unit tests from existing code
Writing CHANGELOGs or commit messages
Renaming variables, formatting
Brief summaries of a file or document

Sonnet — the default for most tasks:

Implementing new features of moderate size
Debugging a non-trivial bug with stack and log context
Code review with explanation of issues
Working with large context (when the task will not fit comfortably in Haiku's 200k window)

Opus 4.8 / Fable 5 — when complexity is genuinely high:

Reverse-engineering someone else's architecture without documentation
Refactoring legacy code with non-obvious invariants
Architectural decisions with many trade-offs
Tasks where Sonnet has repeatedly failed or lost the thread

The key principle: start with Sonnet, and switch to Opus or Fable 5 only when you see the model "not keeping up" — giving shallow answers or making clearly poor decisions. For CI/CD automation, consider Haiku — the cost difference across hundreds of requests per day is significant.
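One way to encode this guide is a small routing table. The sketch below is an assumption-heavy illustration: the task category names, the pairing of each category with an effort level, and the function `choose` are all hypothetical. Only the model IDs, the effort level names, and Haiku's 200k context limit come from this article.

```python
HAIKU_CONTEXT_LIMIT = 200_000  # tokens, from the model table

# Hypothetical task categories mapped to (model, effortLevel).
ROUTES = {
    "rename": ("claude-haiku-4-5", "low"),
    "formatting": ("claude-haiku-4-5", "low"),
    "summary": ("claude-haiku-4-5", "low"),
    "feature": ("claude-sonnet-4-6", "medium"),
    "debugging": ("claude-sonnet-4-6", "high"),
    "code-review": ("claude-sonnet-4-6", "medium"),
    "architecture": ("claude-opus-4-8", "xhigh"),
    "legacy-refactor": ("claude-opus-4-8", "high"),
}


def choose(task, context_tokens=0, sonnet_failed=False):
    """Pick (model, effort): start with Sonnet by default, escalate only when needed."""
    model, effort = ROUTES.get(task, ("claude-sonnet-4-6", "medium"))
    # A Haiku-class task that does not fit in Haiku's window moves up to Sonnet.
    if model == "claude-haiku-4-5" and context_tokens > HAIKU_CONTEXT_LIMIT:
        model = "claude-sonnet-4-6"
    # Escalate to Opus only after Sonnet has visibly fallen short.
    if model == "claude-sonnet-4-6" and sonnet_failed:
        model = "claude-opus-4-8"
    return model, effort


print(choose("rename"))                        # ('claude-haiku-4-5', 'low')
print(choose("summary", context_tokens=300_000))  # ('claude-sonnet-4-6', 'low')
print(choose("feature", sonnet_failed=True))   # ('claude-opus-4-8', 'medium')
```

The escalation is deliberately one-directional: nothing here moves a task to Fable 5 automatically, since the guide reserves it for cases where the cost of failure is high.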

fallbackModel and Team-Level Management

If the primary model is overloaded or unavailable, Claude Code will automatically switch to a fallback:

{
  "model": "claude-opus-4-8",
  "fallbackModel": ["claude-sonnet-4-6", "claude-haiku-4-5"]
}

The chain supports up to three models, tried in order. This list is not merged across settings files — only the one from the highest-priority file takes effect.
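The two rules in this paragraph (no merging across files, models tried in order) can be modeled in a few lines. This is a simulation of the described behavior, not Claude Code's actual implementation: the function names and the `is_available` callback are hypothetical.

```python
def effective_fallbacks(settings_by_priority):
    """Return fallbackModel from the first (highest-priority) file that defines it.

    Lists from lower-priority files are ignored entirely, not merged.
    """
    for settings in settings_by_priority:
        if "fallbackModel" in settings:
            return list(settings["fallbackModel"])
    return []


def pick_model(primary, fallbacks, is_available):
    """Try the primary model, then each fallback in order; None if all are down."""
    for candidate in [primary, *fallbacks]:
        if is_available(candidate):
            return candidate
    return None


# Highest priority first: a personal file, then the project file.
local = {"fallbackModel": ["claude-haiku-4-5"]}
project = {
    "model": "claude-opus-4-8",
    "fallbackModel": ["claude-sonnet-4-6", "claude-haiku-4-5"],
}

chain = effective_fallbacks([local, project])
print(chain)  # ['claude-haiku-4-5']

# Pretend Opus is overloaded and everything else is reachable.
print(pick_model("claude-opus-4-8", chain, lambda m: m != "claude-opus-4-8"))
# claude-haiku-4-5
```

Note the consequence of the no-merge rule in this example: because the personal file defines its own list, Sonnet is never tried, even though the project file lists it.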

For enterprise deployments, you can restrict available models via managed settings:

{
  "availableModels": ["sonnet", "haiku"],
  "enforceAvailableModels": true
}

This prevents developers from accidentally switching to an expensive model where the budget does not allow for it.

Quick recall

Why is Sonnet's context window (1 million tokens) more critical for large codebases than Haiku's 200k?