What is the default timeout for the Anthropic Claude API?

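The Anthropic Python SDK defaults to a 600-second (10-minute) timeout, and the Node.js SDK also defaults to 10 minutes. In the Python SDK this value is handed to httpx, whose timeouts apply per operation rather than as a wall-clock cap on the whole request: the read timeout limits how long the client waits for the next chunk of data. A non-streaming request therefore fails if the complete response takes longer than that to arrive, while a stream that keeps delivering tokens does not trip the read timeout.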
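Whether a timeout is measured per read or across the whole response is what makes streaming behave differently. httpx, which the Python SDK uses for HTTP, applies its read timeout to each read operation. The small stdlib-only model below makes the difference concrete. It assumes the response arrives as chunks at known times (seconds since the request started), and `read_timeout_fires` and `total_timeout_fires` are hypothetical helpers, not SDK functions.

```python
from typing import Sequence


def read_timeout_fires(arrivals: Sequence[float], read_timeout: float) -> bool:
    """True if any wait for data (the first chunk, or the gap between two
    chunks) exceeds read_timeout: the behavior of a per-read timeout."""
    previous = 0.0
    for t in arrivals:
        if t - previous > read_timeout:
            return True
        previous = t
    return False


def total_timeout_fires(arrivals: Sequence[float], total_timeout: float) -> bool:
    """True if the last chunk arrives after total_timeout: the behavior of a
    wall-clock deadline on the whole response."""
    return bool(arrivals) and arrivals[-1] > total_timeout


# Streaming: a 900-second response that emits a chunk every 2 seconds
stream = [2.0 * i for i in range(1, 451)]
print(read_timeout_fires(stream, 600.0))   # False: no gap exceeds 2 s
print(total_timeout_fires(stream, 600.0))  # True: 900 s in total

# Non-streaming, modelled as the whole body arriving at once after 700 s
print(read_timeout_fires([700.0], 600.0))  # True: 700 s with no data
```

Under a per-read timeout, the long stream survives while the slow non-streaming response fails, even though the stream runs longer overall.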

Why does my Claude API request keep timing out?

Common causes: (1) max_tokens is set very high on a slower model, so generation simply takes a long time; (2) some layer in your stack has a timeout shorter than the SDK default (for example, a 30-second upstream proxy or gateway limit); (3) network issues between your server and Anthropic; (4) the request genuinely takes a long time because the generation is complex. Use streaming to receive tokens as they arrive instead of waiting for the full response.
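Cause (2) is easy to miss because every layer between your code and the API (SDK, proxy, hosting platform) enforces its own limit, and the first one to expire ends the request. A minimal sketch of that rule, with hypothetical layer names and values:

```python
def effective_timeout(layers: dict[str, float]) -> tuple[str, float]:
    """Return the (layer, seconds) pair with the smallest timeout.

    The request is cut off by whichever layer gives up first, so raising
    any other layer's limit has no effect."""
    if not layers:
        raise ValueError("at least one layer is required")
    name = min(layers, key=layers.__getitem__)
    return name, layers[name]


# Hypothetical stack: SDK default, an upstream proxy, a serverless function
stack = {"sdk": 600.0, "proxy": 30.0, "function": 60.0}
print(effective_timeout(stack))  # ('proxy', 30.0)
```

In this example, raising the SDK timeout would change nothing; the proxy's 30 seconds is the real limit.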

How do I increase the timeout in the Anthropic Python SDK?

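Pass an httpx.Timeout object (or a plain float, in seconds) to the Anthropic client: client = anthropic.Anthropic(timeout=httpx.Timeout(60.0, read=300.0)). Or override it per request: client.messages.create(..., timeout=300). The default is 600 seconds, so values above that raise the limit and values below it (like 300 here) lower it; a lower value is useful when your hosting platform, such as a cloud function, would stop the request sooner anyway.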

Should I use streaming to avoid timeouts?

Yes. Streaming is the recommended approach for long responses because it delivers tokens as they are generated: data keeps flowing, so idle-connection and read timeouts don't fire. The method is client.messages.stream() in both the Python and Node.js SDKs.

timeout_error

The request to Claude exceeded the configured timeout before a response was returned. Either generation genuinely took longer than the limit allows (use streaming), or a timeout in your client or infrastructure is shorter than the request needs.

What it looks like (SDK exception)

# Python SDK
anthropic.APITimeoutError: Request timed out.

# Node.js SDK
APIConnectionTimeoutError: Request timed out.

# Raw HTTP (if no SDK)
# The request hangs until your HTTP client (or a proxy in between) gives up; no JSON error body is returned

Why it happens

High max_tokens with a large model: Claude Opus 4.7 with max_tokens=8192 on a complex prompt can take 60–120 seconds
Upstream proxy timeout: Vercel Edge, Cloudflare Workers, and API gateways enforce their own timeouts (typically 30–60s), shorter than the SDK default
Your HTTP client's own timeout: if you call the API with requests, fetch, or axios instead of the SDK, whatever timeout that client uses applies (requests and axios have none unless you set one)
Network instability: transient; retry with backoff
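The max_tokens cause can be sanity-checked with back-of-the-envelope arithmetic: worst-case generation time is roughly max_tokens divided by output throughput. The throughput below is an assumption for illustration (the 60–120 second range quoted above for 8,192 tokens implies roughly 68–137 tokens per second); measure your own. `estimated_seconds`, `fits_budget`, and the 1.5× margin are hypothetical.

```python
def estimated_seconds(max_tokens: int, tokens_per_second: float) -> float:
    """Generation time if the model uses its full max_tokens budget."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return max_tokens / tokens_per_second


def fits_budget(max_tokens: int, tokens_per_second: float, budget_s: float,
                margin: float = 1.5) -> bool:
    """True if the estimate, padded by a safety margin, fits within budget_s."""
    return estimated_seconds(max_tokens, tokens_per_second) * margin <= budget_s


# Assumed throughput of 70 tokens/s (hypothetical; measure your own)
print(round(estimated_seconds(8192, 70.0), 1))  # 117.0
print(fits_budget(8192, 70.0, budget_s=60.0))   # False: too long for 60 s
print(fits_budget(2048, 70.0, budget_s=60.0))   # True
```

The estimate covers only output generation at the assumed rate; time to first token and network overhead come on top, which is what the margin is for.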

Fix 1: Use streaming (best practice for long responses)

Streaming keeps the connection alive and delivers tokens as they arrive. No idle-connection timeout fires because data keeps flowing.

import anthropic

client = anthropic.Anthropic()

# ✅ Stream long responses so tokens arrive as they are generated
with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write a detailed analysis..."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Or collect the full message once streaming has finished
    # (call this inside the `with` block, before the stream is closed)
    final_message = stream.get_final_message()

Fix 2: Set the timeout explicitly in the SDK

import anthropic
import httpx

# Per-client (applies to all requests from this client)
client = anthropic.Anthropic(
    timeout=httpx.Timeout(
        connect=10.0,    # connection timeout
        read=300.0,      # max wait for the next chunk of response data
        write=10.0,      # request write timeout
        pool=10.0,       # connection pool timeout
    )
)

# Or a single float (sets all timeouts to this value)
client = anthropic.Anthropic(timeout=300.0)

# Per-request override
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[{"role": "user", "content": "..."}],
    timeout=300.0,  # override just for this call
)

Fix 2 (TypeScript / Node.js)

import Anthropic from "@anthropic-ai/sdk";

// Per-client
const client = new Anthropic({
  timeout: 300 * 1000, // 300 seconds in ms
});

// Per-request
const message = await client.messages.create(
  {
    model: "claude-opus-4-7",
    max_tokens: 4096,
    messages: [{ role: "user", content: "..." }],
  },
  { timeout: 300 * 1000 }
);

Fix 3: Reduce response length

Lower max_tokens to reduce generation time, or split a large generation task into smaller sequential requests:

import anthropic

client = anthropic.Anthropic()
original_prompt = "Write a detailed analysis..."

# Instead of one request with max_tokens=8192, ask for smaller chunks
part1 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    messages=[{"role": "user", "content": original_prompt}],
)

# If part1 stopped because it hit max_tokens, send it back as the
# assistant turn and ask Claude to continue
if part1.stop_reason == "max_tokens":
    part2 = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        messages=[
            {"role": "user", "content": original_prompt},
            {"role": "assistant", "content": part1.content[0].text},
            {"role": "user", "content": "Continue from where you left off."},
        ],
    )
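The two-request pattern above generalizes to a loop that keeps asking for continuations until the model stops on its own. This sketch hides the API call behind a caller-supplied function with the hypothetical signature `call(messages, max_tokens) -> (text, stop_reason)`; `generate_in_chunks`, `make_stub`, and the `max_parts` cap are illustrative, and the stub only exists so the loop can run offline.

```python
from typing import Callable

Message = dict[str, str]
# Hypothetical interface: (messages, max_tokens) -> (text, stop_reason)
CallFn = Callable[[list[Message], int], tuple[str, str]]


def generate_in_chunks(call: CallFn, prompt: str, chunk_tokens: int = 2048,
                       max_parts: int = 4) -> str:
    """Build a long answer from several shorter requests.

    Each request is capped at chunk_tokens. While a request stops because it
    hit that cap (stop_reason == "max_tokens"), the text so far is sent back
    as the assistant turn, followed by a request to continue."""
    parts: list[str] = []
    messages: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_parts):
        text, stop_reason = call(messages, chunk_tokens)
        parts.append(text)
        if stop_reason != "max_tokens":
            break  # the model finished on its own
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": "".join(parts)},
            {"role": "user", "content": "Continue from where you left off."},
        ]
    return "".join(parts)


def make_stub(replies: list[tuple[str, str]]) -> CallFn:
    """Offline stand-in for the API call that returns canned replies in order."""
    remaining = iter(replies)

    def stub(messages: list[Message], max_tokens: int) -> tuple[str, str]:
        return next(remaining)

    return stub


stub = make_stub([("Part one. ", "max_tokens"), ("Part two. ", "max_tokens"),
                  ("Done.", "end_turn")])
print(generate_in_chunks(stub, "Write a detailed analysis..."))
# prints: Part one. Part two. Done.
```

With the real SDK, `call` could wrap `client.messages.create(model=..., max_tokens=max_tokens, messages=messages)` and return `(msg.content[0].text, msg.stop_reason)`. Capping `max_parts` keeps a runaway continuation from looping indefinitely.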

Retry on timeout

import anthropic
import time

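# The SDK already retries timeouts (twice by default); disable that so
# this loop is the only retry layer
client = anthropic.Anthropic(max_retries=0)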

def call_with_retry(prompt: str, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        try:
            message = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=2048,
                messages=[{"role": "user", "content": prompt}],
                timeout=120.0,
            )
            return message.content[0].text
        except anthropic.APITimeoutError:
            if attempt == max_attempts - 1:
                raise
            wait = 2 ** attempt * 5  # 5s, then 10s; the last failure is re-raised without waiting
            print(f"Timeout, retrying in {wait}s...")
            time.sleep(wait)
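The loop above waits on a fixed exponential schedule. When many clients time out at once, adding random jitter spreads their retries out. Below is a generic sketch of exponential backoff with full jitter; `retry_with_jitter`, its parameters, and the injectable `sleep` are assumptions for illustration, not part of the SDK.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_jitter(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 3,
    base: float = 5.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff
    and full jitter: before retry n (counting from 0) it sleeps a random
    time between 0 and min(cap, base * 2**n) seconds."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")


# Offline demo: fail twice with TimeoutError, then succeed
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays: list[float] = []
print(retry_with_jitter(flaky, (TimeoutError,), sleep=delays.append))  # ok
print(len(delays))  # 2
```

With the SDK, `fn` might be `lambda: client.messages.create(...)` and `retry_on` might be `(anthropic.APITimeoutError,)`; consider constructing the client with `max_retries=0` so the SDK's built-in retries don't stack on top of this helper.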

Vercel / serverless timeout workaround

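Vercel caps serverless function execution time by plan, historically as low as 10s by default on Hobby, and these limits have changed over time, so check Vercel's current documentation for your plan. For Claude generation that takes longer than your limit, use one of these strategies: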

Edge functions with streaming: Vercel Edge supports streaming responses, so stream Claude output directly to the client to avoid the function timeout
Background job queue: accept the user request, enqueue a background job (Inngest, Upstash QStash), and poll for the result
Upgrade your Vercel plan: higher tiers allow a longer maximum function duration
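The background-job strategy boils down to three operations: submit the work, return a job ID immediately, and let the client poll. Here is an in-process sketch that uses `concurrent.futures` in place of a real queue service such as Inngest or QStash (whose APIs are not shown). `JobStore`, `submit`, and `poll` are hypothetical names. In an actual serverless deployment the job must run in a separate worker or queue service, because the function instance may be frozen once it has responded.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class JobStore:
    """Run slow calls in the background and let callers poll by job ID."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, fn: Callable[[], Any]) -> str:
        """Start fn in the background and return a job ID immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(fn)
        return job_id

    def poll(self, job_id: str) -> dict[str, Any]:
        """Report a job's status without blocking."""
        future = self._jobs.get(job_id)
        if future is None:
            return {"status": "unknown"}
        if not future.done():
            return {"status": "pending"}
        error = future.exception()
        if error is not None:
            return {"status": "failed", "error": str(error)}
        return {"status": "done", "result": future.result()}


store = JobStore()
# Stand-in for the slow Claude call a real worker would make
job_id = store.submit(lambda: "long Claude response")
while (state := store.poll(job_id))["status"] == "pending":
    time.sleep(0.1)  # a real client would poll over HTTP instead
print(state)  # {'status': 'done', 'result': 'long Claude response'}
```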

Related errors

overloaded_error (529) — server busy, also retryable
api_error (500) — server error mid-response
rate_limit_error (429) — too many requests
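Timeouts and the errors listed above are all worth retrying after a delay. The sketch below encodes that decision, mirroring the retry policy the Python SDK documents for its built-in retries: connection errors and timeouts, plus HTTP 408, 409, 429, and any 5xx status (which covers 500 and 529). `should_retry` is a hypothetical helper, useful mainly if you disable the SDK's retries or call the HTTP API directly.

```python
from typing import Optional


def should_retry(status: Optional[int]) -> bool:
    """Return True for failures worth retrying with backoff.

    status is the HTTP status code, or None when no response arrived at all
    (for example, a timeout or dropped connection)."""
    if status is None:
        return True
    return status in (408, 409, 429) or status >= 500


for status in (None, 400, 429, 500, 529):
    print(status, should_retry(status))
# None True / 400 False / 429 True / 500 True / 529 True
```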

SDK exception	`APITimeoutError` (Python), `APIConnectionTimeoutError` (Node.js)
Default timeout	600 seconds
Retryable?	Yes (with backoff)
Best fix	Use streaming