context_length_exceeded

HTTP 400 | invalid_request_error | Not retryable

Your request (system prompt + tool definitions + messages + max_tokens) exceeds the model's 200,000-token context window. Retrying the same request will fail again, so reduce the input size first.

What the error looks like

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "prompt is too long: 247583 tokens > 200000 maximum"
  }
}

The error is surfaced as invalid_request_error with a message containing "prompt is too long" or "context length".
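
A minimal sketch of how a client might recognize this error from the HTTP status and the parsed JSON body, assuming the response shape shown above. The function name `is_context_length_error` is hypothetical, and matching on message substrings is a heuristic, not a documented contract:

```python
def is_context_length_error(status_code: int, body: dict) -> bool:
    """Heuristically detect a context-length error from a status code and parsed JSON body."""
    if status_code != 400:
        return False
    error = body.get("error") or {}
    if error.get("type") != "invalid_request_error":
        return False
    # Other invalid_request_error causes share this type, so inspect the message too.
    message = str(error.get("message", "")).lower()
    return "prompt is too long" in message or "context length" in message


example_body = {
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "message": "prompt is too long: 247583 tokens > 200000 maximum",
    },
}
print(is_context_length_error(400, example_body))
```

Because the error type is shared with other validation failures, checking the message text is what separates this case from, say, a malformed request body.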

Context window breakdown

Component                                  Counts toward limit?
System prompt                              Yes
All user messages (entire history)         Yes
All assistant messages (entire history)    Yes
Tool definitions (JSON schema)             Yes
max_tokens you request                     Yes (reserved)

Total limit = 200,000 tokens. max_tokens is reserved from this pool even if Claude outputs less.
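
The arithmetic behind the breakdown can be sketched as follows. The helper names `input_budget` and `overflow` are illustrative, and the 200,000 figure is the limit stated in this document:

```python
CONTEXT_WINDOW = 200_000  # limit stated in this document


def input_budget(max_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for system prompt, tools, and messages after reserving max_tokens."""
    if max_tokens < 0 or max_tokens > context_window:
        raise ValueError("max_tokens must be between 0 and the context window")
    return context_window - max_tokens


def overflow(input_tokens: int, max_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """How many tokens the request is over the limit (0 if it fits)."""
    return max(0, input_tokens + max_tokens - context_window)


print(input_budget(4096))        # 195904
print(overflow(247_583, 4096))   # 51679
```

With max_tokens=4096, the input may use at most 195,904 tokens. The 247,583-token prompt from the example error would need to shrink by at least 51,679 tokens.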

Fix 1: Truncate old messages

The simplest approach is to drop the oldest turns when the history approaches the limit:

import anthropic

client = anthropic.Anthropic()
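MAX_INPUT_TOKENS = 180_000   # leave headroom for the response (and for any system prompt or tools not counted below)

def count_tokens(messages, model="claude-sonnet-4-6"):
    """Count input tokens for the messages only (one API call per invocation)."""
    result = client.messages.count_tokens(
        model=model,
        messages=messages,
    )
    return result.input_tokens

def trim_history(messages, model="claude-sonnet-4-6"):
    """Remove oldest pairs (user+assistant) until under limit."""
    # Assumes strict user/assistant alternation starting with a user message,
    # so dropping two messages keeps the history starting on a user turn.
    while count_tokens(messages, model) > MAX_INPUT_TOKENS and len(messages) > 2:
        # Remove the oldest user+assistant pair (keep at least last turn).
        # Slicing builds a new list, so the caller's list is not mutated.
        messages = messages[2:]
    return messages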

# Usage
messages = [...]   # your conversation history
messages = trim_history(messages)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=messages,
)
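
As a variation on the trimming idea, the sketch below takes the token counter as a parameter so the logic can be exercised without network calls. The names `trim_history_with` and `rough_count` are hypothetical, message content blocks are assumed to be plain dicts or strings, and the 4-characters-per-token stand-in is a rough assumption for local testing only:

```python
from typing import Callable

Message = dict  # {"role": "user" | "assistant", "content": ...}


def trim_history_with(
    messages: list[Message],
    count_fn: Callable[[list[Message]], int],
    max_input_tokens: int,
) -> list[Message]:
    """Drop the oldest messages until count_fn(messages) fits the budget.

    Always keeps at least the final message, and skips past leading
    assistant turns where possible so the history starts on a user turn.
    """
    trimmed = list(messages)  # work on a copy; the caller's list is untouched
    while len(trimmed) > 1 and count_fn(trimmed) > max_input_tokens:
        trimmed = trimmed[1:]
        while len(trimmed) > 1 and trimmed[0]["role"] != "user":
            trimmed = trimmed[1:]
    return trimmed


def rough_count(msgs: list[Message]) -> int:
    """Stand-in counter for local testing: ~4 characters per token (an assumption)."""
    return sum(len(str(m["content"])) for m in msgs) // 4


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 400},
    {"role": "assistant", "content": "d" * 400},
    {"role": "user", "content": "e" * 40},
]
short = trim_history_with(history, rough_count, max_input_tokens=250)
print([m["content"][0] for m in short])
```

In real use, the counter could wrap the API-backed helper, for example `lambda msgs: count_tokens(msgs, "claude-sonnet-4-6")`.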

Fix 2: Summarize long history

This preserves more context than truncation: summarize the early turns with a cheap call to a smaller model.

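def summarize_history(messages, keep_last=10):
    """Summarize all but the last N messages.

    Only messages whose content is a plain string are included in the
    summary; content given as a list of blocks (images, tool use) is
    skipped by the isinstance filter below.
    """
    if len(messages) <= keep_last:
        return messages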

    to_summarize = messages[:-keep_last]
    recent = messages[-keep_last:]

    # Use haiku for cheap summarization
    summary_resp = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Summarize this conversation history concisely, "
                "preserving key facts and decisions:\n\n"
                + "\n".join(
                    f"{m['role']}: {m['content']}"
                    for m in to_summarize
                    if isinstance(m.get('content'), str)
                )
            )
        }]
    )

    summary_text = summary_resp.content[0].text
    summary_message = {
        "role": "user",
        "content": f"[Earlier conversation summary: {summary_text}]"
    }

    return [summary_message] + recent
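
The summarizer above only handles string content. One possible way to include block-style content in the transcript is sketched here. The helpers `render_content` and `render_transcript` are hypothetical, and the sketch assumes content blocks are plain dicts with a "type" key (not SDK response objects):

```python
def render_content(content) -> str:
    """Flatten a message's content (string or list of block dicts) into plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        kind = block.get("type")
        if kind == "text":
            parts.append(block.get("text", ""))
        elif kind == "tool_use":
            parts.append(f"[tool call: {block.get('name')} input={block.get('input')}]")
        elif kind == "tool_result":
            # A tool result's content may itself be a string or a list of blocks.
            parts.append(f"[tool result: {render_content(block.get('content', ''))}]")
        else:
            parts.append(f"[{kind} block omitted]")
    return "\n".join(parts)


def render_transcript(messages) -> str:
    """Render messages as 'role: text' lines for a summarization prompt."""
    return "\n".join(f"{m['role']}: {render_content(m['content'])}" for m in messages)


msgs = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Let me check."},
        {"type": "tool_use", "id": "t1", "name": "calc", "input": {"expr": "2+2"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "4"},
    ]},
]
transcript = render_transcript(msgs)
print(transcript)
```

The rendered transcript could replace the joined string in the summarization prompt, so tool calls and their results are summarized rather than silently dropped.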

Fix 3: Check token count before sending

# Always count tokens before expensive calls
token_count = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    system="Your system prompt here",
    messages=messages,
)

print(f"Input tokens: {token_count.input_tokens} / 200,000")

if token_count.input_tokens > 195_000:
    messages = trim_history(messages)   # or summarize

Fix 4: Use prompt caching for large static context

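If you have a large but unchanging system prompt or document, use prompt caching. It does not reduce the token count, so it will not resolve this error on its own, but cache hits are billed at roughly 10% of the base input price (about 90% cheaper) and are processed faster: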

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": very_long_document,
        "cache_control": {"type": "ephemeral"},   # cache this!
    }],
    messages=messages,
)
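
To see why caching pays off only on repeated calls, here is a rough cost sketch. The function `relative_input_cost` is hypothetical, and both multipliers are assumptions (the read multiplier mirrors the "90% cheaper" figure above), so check current pricing before relying on them. The three token counts correspond to the `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` fields reported in `response.usage`:

```python
def relative_input_cost(
    uncached: int,
    cache_write: int,
    cache_read: int,
    write_multiplier: float = 1.25,  # assumption: cache writes cost more than base input
    read_multiplier: float = 0.10,   # assumption: mirrors the "90% cheaper" figure
) -> float:
    """Input cost expressed in units of base-price input tokens."""
    return uncached + cache_write * write_multiplier + cache_read * read_multiplier


# First call writes a 100,000-token document to the cache; later calls read it back.
first_call = relative_input_cost(uncached=500, cache_write=100_000, cache_read=0)
later_call = relative_input_cost(uncached=500, cache_write=0, cache_read=100_000)
print(first_call, later_call)
```

Under these assumed multipliers, the first call costs slightly more than an uncached request and every later cache hit costs far less. The full document still counts toward the context window on every call.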

FAQ

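Do all Claude models have the same context limit?
Mostly. Claude 3 and Claude 4 models (Haiku, Sonnet, Opus) have a 200,000-token context window by default as of 2026. Some models have offered a larger window (up to 1M tokens) as an opt-in beta, so check the documentation for the model you use.

How many words is 200k tokens?
Roughly 150,000 words, or about 500 pages of text. One token ≈ 0.75 words in English. Code and non-English text usually need more tokens for the same amount of text.

Can I use max_tokens to avoid the error?
Only partly. Reducing max_tokens frees up space in the 200k window (since it's reserved), which helps when the input alone fits but input + max_tokens does not. If the input itself is over the limit, as in the example error above, the real fix is reducing your input. Set max_tokens to only what you need.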