overloaded_error

HTTP 529 Transient Retryable

Anthropic's API is temporarily at capacity and cannot accept your request. This is a transient server-side condition, not a problem with your code. Wait and retry.

What the error looks like

{
  "type": "error",
  "error": {
    "type": "overloaded_error",
    "message": "Overloaded"
  }
}

HTTP status: 529. The response body always contains "type": "overloaded_error".
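When you work with raw HTTP responses or logged payloads rather than the SDK, a small check can classify the error. The sketch below assumes the body has the shape shown above; the helper name is_overloaded is hypothetical and is not part of any SDK.

```python
import json


def is_overloaded(status_code: int, body: str) -> bool:
    """Return True when a raw API response looks like overloaded_error."""
    if status_code == 529:
        return True
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON, so there is no error type to inspect
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return isinstance(error, dict) and error.get("type") == "overloaded_error"
```

Checking the body as well as the status code covers cases where only the payload is available, such as stored responses.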

Why it happens

  • Anthropic's inference servers are at peak capacity
  • Spikes in global API usage (model releases, viral demos, batch jobs)
  • Your request hit a server that was already at its queue limit

Unlike rate_limit_error, this is not caused by your account's usage; it can affect any API user. The official SDKs retry it by default.

Fix: Exponential Backoff (Python)

The Anthropic Python SDK retries 529 automatically (default: 2 retries). Raise max_retries for high-availability workloads:

import anthropic
import time

client = anthropic.Anthropic(
    api_key="your-key",
    max_retries=5,           # SDK handles backoff automatically
)

# Manual backoff if you need full control:
def call_with_backoff(prompt, max_attempts=6):
    delay = 30  # seconds to wait before the first retry
    # Turn off SDK retries for these calls so the two retry loops do not stack.
    no_retry_client = client.with_options(max_retries=0)
    for attempt in range(max_attempts):
        try:
            return no_retry_client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
        except anthropic.APIStatusError as e:
            # Retry only on 529, and re-raise once the attempts are used up.
            if e.status_code == 529 and attempt < max_attempts - 1:
                print(f"Overloaded, waiting {delay}s (attempt {attempt+1})")
                time.sleep(delay)
                delay = min(delay * 2, 300)   # cap at 5 min
            else:
                raise
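The manual loop above doubles a fixed delay, so many clients that fail at the same moment will also retry at the same moment. A common refinement is to add random jitter. The sketch below is one way to do that, not something the SDK exposes: backoff_delays is a hypothetical helper, and the 30-second base and 300-second cap are simply carried over from the example above.

```python
import random


def backoff_delays(max_attempts=6, base=30.0, cap=300.0, rng=random.random):
    """Yield one wait time in seconds per retry, using full jitter."""
    delay = base
    for _ in range(max_attempts - 1):   # no wait is needed after the final attempt
        yield rng() * delay             # random point between 0 and the current ceiling
        delay = min(delay * 2, cap)     # double the ceiling, up to the cap
```

With the defaults, the ceilings are 30, 60, 120, 240, and 300 seconds. Each yielded value can be passed to time.sleep in place of the fixed delay.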

Fix: Exponential Backoff (TypeScript)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  maxRetries: 5,    // SDK retries 529 automatically
});

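// Manual control if needed:
async function callWithBackoff(prompt: string, maxAttempts = 6) {
  let delay = 30_000; // 30s
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.messages.create(
        {
          model: "claude-sonnet-4-6",
          max_tokens: 1024,
          messages: [{ role: "user", content: prompt }],
        },
        { maxRetries: 0 }, // turn off SDK retries here so the two loops do not stack
      );
    } catch (err) {
      // Retry only on 529, and rethrow once the attempts are used up.
      const overloaded = err instanceof Anthropic.APIError && err.status === 529;
      if (overloaded && attempt < maxAttempts - 1) {
        console.log(`Overloaded, waiting ${delay / 1000}s`);
        await new Promise((r) => setTimeout(r, delay));
        delay = Math.min(delay * 2, 300_000); // cap at 5 min
      } else {
        throw err;
      }
    }
  }
  throw new Error("callWithBackoff: maxAttempts must be at least 1");
}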

Check the Retry-After header

import time

import requests

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={"x-api-key": "...", "anthropic-version": "2023-06-01"},
    json={...}
)

if response.status_code == 529:
    # The header may be missing, so fall back to 60 seconds.
    retry_after = int(response.headers.get("Retry-After", 60))
    print(f"Retry after {retry_after} seconds")
    time.sleep(retry_after)
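The HTTP specification allows Retry-After to carry either a number of seconds or an HTTP date, and int() raises on the date form. The sketch below handles both. parse_retry_after is a hypothetical helper, and the 60-second default mirrors the snippet above; whether a given 529 response includes the header at all is not guaranteed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=60.0, now=None):
    """Convert a Retry-After header value to a non-negative number of seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():                      # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default                       # unparseable value: use the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A call such as parse_retry_after(response.headers.get("Retry-After")) could then replace the int() conversion.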

FAQ

Does the SDK retry overloaded_error automatically?
Yes. The official Anthropic Python and TypeScript SDKs retry 529 errors up to 2 times with backoff by default. For production workloads, raise the limit, for example max_retries=5 in Python or maxRetries: 5 in TypeScript.

overloaded_error vs rate_limit_error: what's the difference?
overloaded_error (529) means Anthropic's capacity is full, which affects all users. rate_limit_error (429) means your account exceeded its own RPM/TPM limit. Both call for backoff.

How long until overloaded_error resolves?
Usually 30–120 seconds. Sustained overload during major incidents can last longer; check status.anthropic.com for active incidents.

Can I avoid overloaded_error by using a different model?
Overload is per-model. If claude-sonnet is overloaded, claude-haiku may be available. Add a fallback model if uptime is critical.
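
The fallback idea in the last answer can be sketched as a small wrapper. Everything here is illustrative: create_with_fallback is a hypothetical helper, send stands for any callable you write that issues the request for a given model name, and the only assumption about the error is that it carries a status_code attribute, as anthropic.APIStatusError does in the Python example above.

```python
def create_with_fallback(send, models):
    """Try each model in order, moving on only when a call fails with HTTP 529."""
    last_error = None
    for model in models:
        try:
            return send(model)
        except Exception as err:
            # Anything other than an overload is a real failure: re-raise it.
            if getattr(err, "status_code", None) != 529:
                raise
            last_error = err
    if last_error is None:
        raise ValueError("models must not be empty")
    raise last_error  # every model in the list was overloaded
```

In practice send would wrap client.messages.create with the model name filled in, and models would list your preferred model first. Responses from a fallback model may differ in quality and cost, so this suits workloads where availability matters more than using one specific model.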