What are the Claude API rate limits?

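Rate limits depend on your usage tier and are applied per model. For Sonnet models, for example, the Free tier allows 5 RPM and the Build tier starts at 1,000 RPM / 80,000 TPM; Enterprise tiers have custom, higher limits. Check your Anthropic Console for the exact limits and current usage on your account.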

How do I check my remaining Claude API rate limit?

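Every API response includes rate limit headers: anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-requests-reset, anthropic-ratelimit-tokens-limit, anthropic-ratelimit-tokens-remaining, anthropic-ratelimit-tokens-reset. A 429 response also carries retry-after, the number of seconds to wait before retrying.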

rate_limit_error

Q: What is rate_limit_error in Claude API?

rate_limit_error (HTTP 429) means you exceeded your account's requests-per-minute (RPM) or tokens-per-minute (TPM) limit. Implement exponential backoff and reduce request concurrency.

Status: HTTP 429 | Retryable: yes | Scope: account limit

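You've exceeded your account's requests-per-minute (RPM) or tokens-per-minute (TPM) limit. Unlike overloaded_error, which reflects load on the API as a whole, this error is specific to your API key/account, so retrying harder will not help. Reduce concurrency and back off between retries, waiting longer after each consecutive failure.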
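The backoff advice above can be sketched roughly as follows. This is a minimal illustration, not the SDK's own retry logic (the SDK already retries on its own when max_retries is set). `RateLimited` is a hypothetical stand-in for the exception your client raises on a 429 (in the Python SDK that would be `anthropic.RateLimitError`), and the `base` and `cap` values are arbitrary assumptions.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, retry_after=None):
    """Return seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # Prefer the server's hint when a retry-after value is available
        return float(retry_after)
    # Exponential growth, capped, with full jitter to avoid synchronized retries
    return random.uniform(0, min(cap, base * 2 ** attempt))


class RateLimited(Exception):
    """Hypothetical stand-in for the client's 429 exception."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_backoff(fn, max_attempts=5, sleep=time.sleep):
    """Call `fn` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, retry_after=exc.retry_after))
```

The `sleep` parameter exists only so the wait can be swapped out in tests; in normal use the default `time.sleep` applies.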

What the error looks like

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."
  }
}

Rate limit headers (read these!)

Every API response includes headers showing your current limits and usage:

anthropic-ratelimit-requests-limit: 1000
anthropic-ratelimit-requests-remaining: 247
anthropic-ratelimit-requests-reset: 2026-05-18T22:01:00Z
anthropic-ratelimit-tokens-limit: 80000
anthropic-ratelimit-tokens-remaining: 12450
anthropic-ratelimit-tokens-reset: 2026-05-18T22:00:45Z
retry-after: 23

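Parse these headers proactively so you can throttle before hitting the limit. The reset values are timestamps for when the limit is replenished; retry-after is sent only with a 429 response and gives the number of seconds to wait.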
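One way to act on these headers is to compute how long to pause once the remaining token count drops below a threshold. The sketch below is an illustration under assumptions: the default header names follow the `anthropic-ratelimit-*` scheme and are passed as parameters so you can substitute whatever names your responses actually carry, and `min_tokens` is a hypothetical threshold you choose yourself.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def parse_reset(value: Optional[str]) -> Optional[datetime]:
    """Parse an RFC 3339 timestamp such as 2026-05-18T22:00:45Z."""
    if not value:
        return None
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_to_wait(
    headers: Mapping[str, str],
    min_tokens: int,
    now: Optional[datetime] = None,
    remaining_header: str = "anthropic-ratelimit-tokens-remaining",  # assumed name
    reset_header: str = "anthropic-ratelimit-tokens-reset",  # assumed name
) -> float:
    """Return how long to pause before the next request (0.0 if no pause is needed)."""
    remaining = headers.get(remaining_header)
    reset = parse_reset(headers.get(reset_header))
    if remaining is None or reset is None or int(remaining) >= min_tokens:
        return 0.0
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the reset time has already passed
    return max(0.0, (reset - now).total_seconds())
```

With the example headers above and a clock reading 22:00:22 UTC, a `min_tokens` of 20,000 yields a 23-second wait, while a `min_tokens` of 5,000 yields no wait because 12,450 tokens remain.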

Rate limit tiers (Claude API)

Limits are per-model. Verify exact limits in your Anthropic Console.

Tier	RPM (Sonnet)	TPM (Sonnet)
Free	5	25,000
Build ($5 spent)	1,000	80,000
Build ($100 spent)	2,000	160,000
Scale / Enterprise	Custom	Custom

Fix: Proactive throttling (Python)

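import time

import anthropic

# Reads ANTHROPIC_API_KEY from the environment; the SDK retries 429s up to max_retries times
client = anthropic.Anthropic(max_retries=4)

def smart_call(messages, model="claude-sonnet-4-6", min_requests_left=5):
    """Calls the API, then sleeps briefly if few requests remain in the window."""
    # with_raw_response exposes the HTTP headers alongside the parsed message
    raw = client.messages.with_raw_response.create(
        model=model,
        max_tokens=1024,
        messages=messages,
    )
    remaining = raw.headers.get("anthropic-ratelimit-requests-remaining")
    if remaining is not None and int(remaining) < min_requests_left:
        # Pause before the caller sends more; 1s is an arbitrary value, tune as needed
        time.sleep(1)
    return raw.parse()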

# Use a semaphore to cap concurrent requests
import asyncio
import anthropic

sem = asyncio.Semaphore(5)   # max 5 concurrent requests

async def rate_limited_call(async_client, prompt):
    async with sem:
        return await async_client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )

async def batch_process(prompts):
    async_client = anthropic.AsyncAnthropic()
    tasks = [rate_limited_call(async_client, p) for p in prompts]
    return await asyncio.gather(*tasks, return_exceptions=True)
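A semaphore caps how many requests are in flight at once, but not how many start per minute, so short requests can still exceed an RPM limit. A rough complement is to space out request starts. The sketch below is one possible approach, not part of the SDK: `RequestPacer` is a hypothetical helper, and the `clock` and `sleep` parameters exist only to make it testable.

```python
import asyncio
import time


class RequestPacer:
    """Spaces request starts at least 60/rpm seconds apart."""

    def __init__(self, rpm, clock=time.monotonic, sleep=asyncio.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_slot = 0.0

    async def wait(self):
        # No await occurs between reading and updating _next_slot, so this is
        # safe across tasks on a single event loop without a lock
        now = self._clock()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self.interval
        if slot > now:
            await self._sleep(slot - now)
```

In the batch example above, you would create one pacer (for instance `RequestPacer(rpm=1000)`, using whatever limit your Console shows) and call `await pacer.wait()` inside `rate_limited_call` just before the request.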

Fix: Token-aware throttling (TypeScript)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ maxRetries: 4 });

// Track remaining tokens from response headers
let remainingTokens = Infinity;

async function throttledCall(prompt: string) {
  // If we're low on tokens, wait for the reset
  if (remainingTokens < 5000) {
    console.log("Low on tokens, waiting 10s...");
    await new Promise(r => setTimeout(r, 10_000));
  }

  // withResponse() returns the parsed message plus the raw HTTP response
  const { data, response } = await client.messages
    .create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{ role: "user", content: prompt }],
    })
    .withResponse();

  // Update the running count from the rate-limit header, if present
  const header = response.headers.get("anthropic-ratelimit-tokens-remaining");
  if (header !== null) remainingTokens = Number(header);

  return data;
}

FAQ

How do I check my current rate limits?

Go to console.anthropic.com → Settings → Limits, or read the anthropic-ratelimit-* headers on any API response.

How do I request higher rate limits?

Tier advancement is automatic once your account crosses the spend threshold for the next tier. For limits beyond the standard tiers, contact Anthropic sales about custom enterprise limits.

Why am I hitting TPM limits with few requests?

Long prompts or large max_tokens values consume tokens fast. Reduce prompt length, use max_tokens conservatively, or switch to a model with higher TPM limits.
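A quick back-of-the-envelope check shows why this happens. The sketch below assumes, as a simplification, that each request is charged its prompt tokens plus its full max_tokens against the TPM limit; the exact accounting may differ, so treat the result as a rough upper bound. The 80,000 TPM figure is the Build tier value from the table above, and the prompt sizes are made-up examples.

```python
def max_requests_per_minute(tpm_limit, prompt_tokens, max_tokens):
    """Upper bound on requests/minute if each one is charged prompt + max_tokens."""
    per_request = prompt_tokens + max_tokens
    if per_request <= 0:
        raise ValueError("per-request token count must be positive")
    return tpm_limit // per_request


# A 15,000-token prompt exhausts an 80,000 TPM budget after only a few calls
print(max_requests_per_minute(80_000, prompt_tokens=15_000, max_tokens=1_024))  # 4
# A 1,000-token prompt leaves room for far more
print(max_requests_per_minute(80_000, prompt_tokens=1_000, max_tokens=1_024))   # 39
```

Under these assumptions, a 1,000 RPM allowance is irrelevant for the long-prompt workload: the token budget runs out after four requests.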

Can I use the Batch API to avoid rate limits?

Yes. The Message Batches API has separate (higher) limits and is 50% cheaper. Batches are processed asynchronously, so use it only for workloads that can tolerate up to 24 hours of latency.
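As a rough sketch of what moving a workload to batches might look like, the helper below turns a list of prompts into batch entries. `build_batch_requests` and `submit_batch` are hypothetical names, and the `prompt-{i}` IDs are an arbitrary convention; the only SDK call used is `client.messages.batches.create`, and the model name is carried over from the examples above.

```python
def build_batch_requests(prompts, model="claude-sonnet-4-6", max_tokens=512):
    """Build one batch entry per prompt, each with a unique custom_id."""
    return [
        {
            "custom_id": f"prompt-{i}",  # used to match results back to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


def submit_batch(prompts):
    """Submit the prompts as one batch; needs the anthropic package and an API key."""
    import anthropic  # imported lazily so the builder above has no dependencies

    client = anthropic.Anthropic()
    return client.messages.batches.create(requests=build_batch_requests(prompts))
```

The returned batch is not the result itself: you would poll it until processing has ended and then fetch the per-request results, matching them to inputs by `custom_id`.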