context_length_exceeded
Your conversation (system prompt + messages + max_tokens) exceeds the model's 200,000 token context window. You must reduce the input size before retrying.
What the error looks like
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "prompt is too long: 247583 tokens > 200000 maximum"
  }
}
The error is surfaced as invalid_request_error with a message containing "prompt is too long" or "context length".
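To tell this error apart from other invalid requests, you can inspect the parsed error body. The following is a minimal sketch, not an official helper: `is_context_length_error` is a hypothetical name, and the substring checks simply mirror the two phrases mentioned above.

```python
import json


def is_context_length_error(payload: dict) -> bool:
    """Return True if a parsed error body looks like a context-length error."""
    error = payload.get("error") or {}
    if payload.get("type") != "error":
        return False
    if error.get("type") != "invalid_request_error":
        return False
    # Match the two phrases the error message is described as containing
    message = str(error.get("message", "")).lower()
    return "prompt is too long" in message or "context length" in message


# The example payload from this page
body = json.loads(
    '{"type": "error", "error": {"type": "invalid_request_error", '
    '"message": "prompt is too long: 247583 tokens > 200000 maximum"}}'
)
print(is_context_length_error(body))
```

With the Python SDK, a 400 response is raised as `anthropic.BadRequestError`. How you obtain the parsed body from that exception is left as an assumption here.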
Context window breakdown
| Component | Counts toward limit? |
|---|---|
| System prompt | Yes |
| All user messages (entire history) | Yes |
| All assistant messages (entire history) | Yes |
| Tool definitions (JSON schema) | Yes |
| max_tokens you request | Yes (reserved) |
Total limit = 200,000 tokens.
max_tokens is reserved from this pool even if Claude outputs less.
Fix 1: Truncate old messages
Simplest approach — drop oldest turns when approaching the limit:
import anthropic

client = anthropic.Anthropic()
MAX_INPUT_TOKENS = 180_000  # leave headroom for response

def count_tokens(messages, model="claude-sonnet-4-6"):
    # Ask the API how many input tokens these messages use
    result = client.messages.count_tokens(
        model=model,
        messages=messages,
    )
    return result.input_tokens

def trim_history(messages, model="claude-sonnet-4-6"):
    """Remove oldest pairs (user+assistant) until under limit."""
    # Note: each loop iteration makes one count_tokens request
    while count_tokens(messages, model) > MAX_INPUT_TOKENS and len(messages) > 2:
        # Remove the oldest user+assistant pair (keep at least last turn)
        messages = messages[2:]
    return messages

# Usage
messages = [...]  # your conversation history
messages = trim_history(messages)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=messages,
)
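The loop above calls the token-counting endpoint once per iteration. As a hedged variation, the sketch below takes the counter as a parameter so you can plug in a cached or local estimate and test the trimming logic offline. `trim_to_budget` and `count_fn` are hypothetical names, and the character-count counter in the example is only a stand-in for a real token count.

```python
from typing import Callable


def trim_to_budget(
    messages: list[dict],
    count_fn: Callable[[list[dict]], int],
    budget: int,
) -> list[dict]:
    """Drop the oldest messages until count_fn(messages) fits the budget."""
    trimmed = list(messages)  # work on a copy; the caller's list is untouched
    while len(trimmed) > 1 and count_fn(trimmed) > budget:
        trimmed.pop(0)
        # Keep dropping so the history still starts with a user turn
        while len(trimmed) > 1 and trimmed[0]["role"] != "user":
            trimmed.pop(0)
    return trimmed


# Offline example: character count stands in for a real token count
history = [
    {"role": "user", "content": "a" * 10},
    {"role": "assistant", "content": "b" * 10},
    {"role": "user", "content": "c" * 10},
    {"role": "assistant", "content": "d" * 10},
    {"role": "user", "content": "e" * 10},
]
char_count = lambda msgs: sum(len(m["content"]) for m in msgs)
kept = trim_to_budget(history, char_count, budget=30)
```

Passing the `count_tokens` function from Fix 1 as `count_fn` would reproduce the API-backed behaviour.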
Fix 2: Summarize long history
Better for preserving context — summarize early turns with a cheap call:
def summarize_history(messages, keep_last=10):
    """Summarize all but the last N messages."""
    # Assumes `client` from Fix 1 is already defined
    if len(messages) <= keep_last:
        return messages
    to_summarize = messages[:-keep_last]
    recent = messages[-keep_last:]
    # Use haiku for cheap summarization
    summary_resp = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Summarize this conversation history concisely, "
                "preserving key facts and decisions:\n\n"
                + "\n".join(
                    f"{m['role']}: {m['content']}"
                    for m in to_summarize
                    # Messages with non-string content (block lists) are skipped
                    if isinstance(m.get('content'), str)
                )
            )
        }]
    )
    summary_text = summary_resp.content[0].text
    summary_message = {
        "role": "user",
        "content": f"[Earlier conversation summary: {summary_text}]"
    }
    return [summary_message] + recent
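The summarizer above skips any message whose content is a list of blocks rather than a plain string, so text inside those messages never reaches the summary. A small sketch of one way to flatten such messages first follows. `message_text` and `render_transcript` are hypothetical helpers, and the code assumes blocks are plain dicts with a `"type"` key, as in request payloads.

```python
def message_text(message: dict) -> str:
    """Return the text of a message whose content is a string or a block list."""
    content = message.get("content", "")
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        # Keep only text blocks; tool calls, images, etc. are ignored
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "\n".join(parts)


def render_transcript(messages: list[dict]) -> str:
    """Build the 'role: text' transcript that is sent to the summarizer."""
    lines = []
    for m in messages:
        text = message_text(m)
        if text:
            lines.append(f"{m['role']}: {text}")
    return "\n".join(lines)


transcript = render_transcript([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "hello"},
        {"type": "tool_use", "id": "x", "name": "t", "input": {}},
    ]},
])
```

The result could replace the inline `"\n".join(...)` expression in `summarize_history`.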
Fix 3: Check token count before sending
# Always count tokens before expensive calls
token_count = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    system="Your system prompt here",
    messages=messages,
)
print(f"Input tokens: {token_count.input_tokens} / 200,000")
# The threshold must leave room for max_tokens (195,000 + 4,096 still fits)
if token_count.input_tokens > 195_000:
    messages = trim_history(messages)  # or summarize
Fix 4: Use prompt caching for large static context
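If you have a large but unchanging system prompt or document, use prompt caching. It doesn't reduce the token count, so it won't clear this error on its own, but cache hits are billed at a fraction of the normal input price (up to about 90% cheaper) and are typically processed faster: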
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": very_long_document,
        "cache_control": {"type": "ephemeral"},  # cache this!
    }],
    messages=messages,
)
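To check whether the cache is actually being hit, you can look at the usage fields on the response. The sketch below works on a plain dict and assumes the usage object exposes `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`, with `input_tokens` covering only the uncached part. `cached_fraction` is a hypothetical helper name.

```python
def cached_fraction(usage: dict) -> float:
    """Fraction of input tokens that were served from the prompt cache."""
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = read + written + uncached
    return read / total if total else 0.0


# First call writes the cache, a later call reads from it (made-up numbers)
first_call = {"input_tokens": 50, "cache_creation_input_tokens": 150,
              "cache_read_input_tokens": 0}
later_call = {"input_tokens": 50, "cache_creation_input_tokens": 0,
              "cache_read_input_tokens": 150}
print(cached_fraction(first_call), cached_fraction(later_call))
```

A fraction that stays at zero across repeated calls suggests the cached prefix is changing between requests.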
FAQ
Do all Claude models have the same context limit?
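Mostly. The standard context window for Claude 3 and Claude 4 models (Haiku, Sonnet, Opus) is 200,000 tokens as of 2026. Some models may offer a larger window as an opt-in or beta feature, so check the documentation for the model you use.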
How many words is 200k tokens?
Roughly 150,000 words, or about 500 pages of text. One token ≈ 0.75 words in English. Code and non-English text often tokenize less efficiently than English prose, so count rather than estimate when precision matters.
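For a quick offline estimate, the 0.75 words-per-token rule of thumb can be applied directly. This is only a rough sketch for English prose. `estimate_tokens` is a hypothetical helper, and the token-counting endpoint remains the accurate source.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English prose, not an exact ratio


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


# 150,000 words lands at about the 200,000-token window
print(estimate_tokens("word " * 150_000))
```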
Can I use max_tokens to avoid the error?
Reducing max_tokens frees up space in the 200k window (since it's reserved). But the real fix is reducing your input. Set max_tokens to only what you need.
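The arithmetic behind this answer can be sketched as follows, assuming the 200,000-token window described on this page. `fits` and `max_output_room` are hypothetical helper names.

```python
CONTEXT_WINDOW = 200_000  # window size stated on this page


def fits(input_tokens: int, max_tokens: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if the input plus the reserved output fits in the window."""
    return input_tokens + max_tokens <= window


def max_output_room(input_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Largest max_tokens value that still fits after the given input."""
    return max(window - input_tokens, 0)


print(fits(196_000, 4_096))       # input leaves less than 4,096 tokens of room
print(max_output_room(196_000))   # room left for output
print(max_output_room(247_583))   # input alone already exceeds the window
```

When `max_output_room` returns 0, lowering max_tokens cannot help and the input itself has to shrink.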