Claude API and the Anthropic SDK: Fundamentals — Claude Code: Developer Reference

Claude API and Anthropic SDK: The Basics

In the previous article, Claude Agent SDK: Building Agents Programmatically, we covered how the Agent SDK takes over the agent loop for you: you call query(), and Claude handles the planning, the tool calls, and the decision about when the task is complete. Sometimes you need a different level of control: you want to manage the loop yourself, every token's cost matters, or the logic is too custom for a ready-made agent engine. That's exactly what the Anthropic Client SDK and the underlying Claude API are for.

The distinction is straightforward: the Agent SDK is an agent engine with built-in orchestration; the Client SDK is a thin wrapper over the HTTP API where you have full control over every request and tool-calling loop.

Installation and Authentication

Anthropic provides official SDKs for seven languages: Python, TypeScript/Node.js, C#, Go, Java, PHP, and Ruby. Python or TypeScript will cover most use cases.

# Python
pip install anthropic

# TypeScript / Node.js
npm install @anthropic-ai/sdk

Authentication works through the ANTHROPIC_API_KEY environment variable. The SDK picks it up automatically; you only need to pass the key explicitly in non-standard scenarios:

import anthropic

# Option 1: SDK reads ANTHROPIC_API_KEY from env automatically
client = anthropic.Anthropic()

# Option 2: explicit passing (e.g. in a multi-tenant application)
client = anthropic.Anthropic(api_key="sk-ant-...")

Direct HTTP requests require two mandatory headers: x-api-key with your key value and anthropic-version: 2023-06-01. The SDK adds these automatically — which is one of the main reasons to use it instead of requests/fetch.

The base API URL is https://api.anthropic.com. If your infrastructure is tied to a specific cloud, Claude is also available through Amazon Bedrock, Google Vertex AI, Claude Platform on AWS, and Microsoft Foundry. Model IDs differ across these platforms (more on this below).
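
To make the headers and base URL concrete, here is a minimal sketch of a direct call prepared with only the standard library. The helper names build_headers, build_request, and send are illustrative and not part of any SDK; the /v1/messages path is the Messages endpoint described in the next section, and the content-type header is an addition needed because the body is JSON.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_headers(api_key: str) -> dict:
    """Headers for a direct call; the SDK would add these for you."""
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        # Not one of the two headers named above, but needed for a JSON body.
        "content-type": "application/json",
    }


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Messages endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Perform the network call and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling send(build_request(key, payload)) with a real key performs the request. Retries, timeouts, and error handling are left out here, which is much of what the SDK adds on top.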

Messages API: Anatomy of a Request

The central endpoint is POST /v1/messages. A minimal request looks like this:

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what tail call optimization is"}
    ]
)
print(message.content[0].text)

There are three required parameters: model (the model identifier), max_tokens (a hard cap on the number of tokens the model may generate in its response), and messages (an array containing the conversation history).

One key point: the Messages API is stateless. No state is stored on Anthropic's side between calls. With every request, you send the full conversation history — which is exactly why the messages array contains all previous turns in their entirety.

sequenceDiagram
    participant App as Your code
    participant API as Claude API

    App->>API: POST /v1/messages\n{model, messages: [turn1]}
    API-->>App: {content, stop_reason, usage}
    Note over App: We save the response locally

    App->>API: POST /v1/messages\n{model, messages: [turn1, reply1, turn2]}
    API-->>App: {content, stop_reason, usage}
    Note over API: API does not store history —\nyou pass it in full each time

The Messages API is stateless: every call must include the complete conversation history.
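
One way to live with statelessness is a small helper that owns the history list and resends it on every call. The sketch below is an assumption about how you might structure this, not an SDK feature: send_turn and OfflineClient are hypothetical names, and OfflineClient only imitates the shape of client.messages.create so the example runs without an API key.

```python
from types import SimpleNamespace


def send_turn(client, history, user_text, *, model, max_tokens=1024):
    """Append the user turn, call the API, append the reply, return its text.

    `history` is a plain list of {"role", "content"} dicts that the caller
    keeps between calls, because the API keeps nothing.
    """
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model=model, max_tokens=max_tokens, messages=history
    )
    # Join the text blocks; other block types are ignored in this sketch.
    reply = "".join(b.text for b in response.content if b.type == "text")
    history.append({"role": "assistant", "content": reply})
    return reply


class OfflineClient:
    """Stand-in for anthropic.Anthropic() so the sketch runs without a key."""

    def __init__(self):
        self.messages = self  # lets client.messages.create(...) resolve

    def create(self, *, model, max_tokens, messages):
        text = f"(reply to request carrying {len(messages)} messages)"
        return SimpleNamespace(content=[SimpleNamespace(type="text", text=text)])


history = []
client = OfflineClient()  # swap in anthropic.Anthropic() for real calls
send_turn(client, history, "First question", model="claude-sonnet-4-6")
second = send_turn(client, history, "Follow-up", model="claude-sonnet-4-6")
print(second)        # (reply to request carrying 3 messages)
print(len(history))  # 4
```

The second request carries three messages (turn 1, reply 1, turn 2), which matches the diagram above.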

The system prompt is passed as a separate top-level parameter — not as an element inside messages:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="You are an experienced code reviewer. Be concise and to the point.",
    messages=[
        {"role": "user", "content": "Take a look at this SQL query: SELECT * FROM users"},
        {"role": "assistant", "content": "SELECT * is bad practice: it pulls unnecessary columns and breaks caching."},
        {"role": "user", "content": "What's the better approach?"}
    ]
)

Response Anatomy and stop_reason

The response is structured JSON. In the Python SDK it's a Message object with typed fields:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "SELECT id, name, email FROM users"}],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 87, "output_tokens": 12}
}

The stop_reason field tells you why the model stopped:

Value	Reason
`end_turn`	The model finished its response naturally
`max_tokens`	The `max_tokens` limit was reached — response is truncated
`tool_use`	The model invoked a tool and is waiting for the result
`stop_sequence`	One of the specified stop sequences was encountered
`refusal`	The model declined to respond (introduced in recent models)
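
In calling code, the table usually turns into a branch on stop_reason. The following is a minimal sketch of that branch; the function name next_step and the returned labels are illustrative choices, not values defined by the API.

```python
def next_step(stop_reason: str) -> str:
    """Map a stop_reason to what the calling code should do next.

    The returned labels are illustrative names for this sketch only.
    """
    actions = {
        "end_turn": "done",            # complete answer, use it as is
        "stop_sequence": "done",       # stopped on a sequence you supplied
        "max_tokens": "truncated",     # raise max_tokens or ask to continue
        "tool_use": "run_tool",        # execute the tool, send the result back
        "refusal": "handle_refusal",   # surface the refusal to the caller
    }
    # Unknown values are flagged rather than silently treated as success.
    return actions.get(stop_reason, "unknown")


print(next_step("max_tokens"))  # truncated
```

Treating unrecognized values as "unknown" instead of "done" is a deliberate choice here: if new stop reasons appear, the code fails loudly rather than accepting a partial answer.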

usage is your token counter for cost calculation. The input_tokens field counts everything you send in the request: the system prompt, the full message history, and any other instructions. This is worth watching, because in long conversations the cost of resending context adds up quickly.
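
A small simulation shows how quickly resent context grows. The token counts below are made-up round numbers, and the sketch ignores prompt caching; cumulative_input_tokens is a hypothetical helper, not an SDK function.

```python
def cumulative_input_tokens(system_tokens, turns):
    """Total input tokens billed across a conversation.

    `turns` is a list of (user_tokens, assistant_tokens) pairs, one per
    request. Every request resends the system prompt and all earlier turns.
    """
    total = 0
    history = system_tokens
    for user_tokens, assistant_tokens in turns:
        history += user_tokens       # the new user message joins the history
        total += history             # the whole history counts as input
        history += assistant_tokens  # the reply is resent on the next request
    return total


# System prompt of 50 tokens, three turns of 100 tokens in and 200 out.
print(cumulative_input_tokens(50, [(100, 200)] * 3))  # 1350
```

The user typed only 300 tokens in total, yet the three requests were billed for 1,350 input tokens (150 + 450 + 750), because each request repeats everything before it.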

The Model Family and Their Identifiers

As of June 2026, the lineup looks like this. Claude Fable 5 (claude-fable-5) is the most powerful publicly available model, with a 1M-token context window and a maximum output of 128k tokens.

The core working set consists of three tiers:

Model	API ID	Context window	Price per MTok (input / output)
Claude Opus 4.8	`claude-opus-4-8`	1M tokens	$5 / $25
Claude Sonnet 4.6	`claude-sonnet-4-6`	1M tokens	$3 / $15
Claude Haiku 4.5	`claude-haiku-4-5`	200k tokens	$1 / $5
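
Combined with the usage field, the prices in the table give a per-request cost estimate. This is a rough sketch: the prices are copied from the table above and may change, batch and caching discounts are ignored, and estimate_cost is a hypothetical helper name.

```python
# USD per million tokens as (input, output), copied from the table above.
PRICES_PER_MTOK = {
    "claude-opus-4-8": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost of one request in USD."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Token counts taken from the sample response shown earlier (87 in, 12 out).
cost = estimate_cost("claude-sonnet-4-6", input_tokens=87, output_tokens=12)
print(f"${cost:.6f}")  # $0.000441
```

With the real SDK, the two counts would come from message.usage.input_tokens and message.usage.output_tokens.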

A note on naming. Starting with generation 4.6, model identifiers no longer include a date (for example, claude-sonnet-4-6), and they are fixed snapshots rather than "evergreen" pointers. Older models with a date in the ID, such as claude-sonnet-4-5-20250929, are also fixed snapshots. An alias like claude-sonnet-4-0, by contrast, points to a specific dated snapshot, so it's better to specify the full ID explicitly.

Model identifiers differ across clouds: Bedrock uses the format anthropic.claude-opus-4-8, Vertex AI uses claude-opus-4-8 (no prefix). Claude Platform on AWS uses the same IDs as the direct Claude API.

Note on Opus 4.7 and later: the temperature, top_p, and top_k parameters are not supported on these models — passing non-zero values will return a 400 error. Control response style through the prompt instead.
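
If the same code path serves several models, one option is to drop the sampling parameters before sending. The sketch below assumes you maintain the set of affected model IDs yourself; only claude-opus-4-8 is taken from this article, and prepare_params is a hypothetical helper, not part of the SDK.

```python
SAMPLING_KEYS = ("temperature", "top_p", "top_k")

# Assumption: you keep this set up to date yourself. Only the ID below
# comes from this article.
MODELS_WITHOUT_SAMPLING = {"claude-opus-4-8"}


def prepare_params(params: dict) -> dict:
    """Return a copy of the request kwargs with unsupported keys removed."""
    if params.get("model") not in MODELS_WITHOUT_SAMPLING:
        return dict(params)
    return {k: v for k, v in params.items() if k not in SAMPLING_KEYS}


safe = prepare_params(
    {"model": "claude-opus-4-8", "max_tokens": 1024, "temperature": 0.2}
)
print(safe)  # {'model': 'claude-opus-4-8', 'max_tokens': 1024}
```

The result can be passed on as client.messages.create(**safe, messages=...). Silently dropping a parameter hides intent, so logging the removal may be a better fit for some codebases.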

What Else the API Offers

The Messages API is the primary endpoint, but not the only one. Generally available:

Message Batches API (POST /v1/messages/batches) — asynchronous processing of large request volumes at a 50% cost discount. If you don't need an immediate response, this is the cheapest way to use the API.
Token Counting API (POST /v1/messages/count_tokens) — count tokens before sending a request. Useful for managing costs and staying within context window limits.
Models API (GET /v1/models) — a programmatic list of available models with their capabilities and token limits.
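
For the Batches API, most of the work is shaping the input: each entry pairs a custom_id you choose with ordinary Messages parameters. The sketch below only builds that list; build_batch_requests and the ticket IDs are illustrative, and the submission call in the final comment reflects the Python SDK as I understand it, so check it against the current SDK reference.

```python
def build_batch_requests(prompts, *, model, max_tokens=1024):
    """Turn {custom_id: prompt} into a list of batch request entries.

    Each entry pairs a caller-chosen custom_id with ordinary Messages
    parameters, so results can be matched back to their inputs later.
    """
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


requests = build_batch_requests(
    {"ticket-1": "Summarize: printer is offline", "ticket-2": "Summarize: login fails"},
    model="claude-haiku-4-5",
    max_tokens=256,
)
print(len(requests), requests[0]["custom_id"])  # 2 ticket-1

# Submitting would then look roughly like:
#   batch = client.messages.batches.create(requests=requests)
```

Because results arrive asynchronously and not necessarily in input order, a stable custom_id is what ties each result back to its source record.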

In Beta:

Files API — upload files for reuse across multiple API calls (without resending the data each time).
Agents API / Sessions API — the infrastructure for Claude Managed Agents: versioned agent configurations and stateful sessions in managed sandbox environments.

For details on prompt caching and cost savings at scale, see Prompt Caching, Batches, and Cost Optimization. For tool use and streaming, see Tool Use, MCP, and Streaming in the API.

Client SDK vs. Agent SDK: The Final Breakdown

In short:

Client SDK:  you → [request] → API → [response] → you → [decide what's next]
Agent SDK:   you → [task] → engine → [runs the loop itself] → result → you

Choose the Client SDK when full control over each step matters: custom tool loop logic, precise cost tracking, integration into an existing orchestrator, or minimal overhead. Typical examples include an LLM judge in a test pipeline, a request classifier, or content generation that sits next to an embeddings step and needs strict post-processing.

Choose the Agent SDK when the task is "do X with the codebase" and you want to delegate the entire agent loop. Most scenarios inside Claude Code use the Agent SDK under the hood.