
Rate Limit Calculator

Check API rate limits for OpenAI, Anthropic, and Google models by tier. Calculate time to complete batch jobs and find bottlenecks.

Free · No Signup · No Server Uploads · Zero Tracking

RPM Limit: 500 requests / minute
TPM Limit: 30,000 tokens / minute
Effective RPM: 60
Requests / sec: 1.0
Time to Complete: 2.8 hours
Optimal Batch: 60

TPM Bottleneck Detected

At 500 tokens per request, you will hit the 30,000 TPM limit before the 500 RPM limit. Your effective rate is 60 RPM.

Model        | RPM | TPM     | Effective RPM | Time to Complete
------------ | --- | ------- | ------------- | ----------------
GPT-4o       | 500 | 30,000  | 60            | 2.8 hours
GPT-4o Mini  | 500 | 200,000 | 400           | 25.0 min
GPT-4.1      | 500 | 30,000  | 60            | 2.8 hours
GPT-4.1 Mini | 500 | 200,000 | 400           | 25.0 min
GPT-4.1 Nano | 500 | 200,000 | 400           | 25.0 min
o3-mini      | 500 | 200,000 | 400           | 25.0 min

Rate limits vary by tier/plan and model. Effective RPM is the lower of RPM and TPM/tokens_per_request. Real-world throughput may be lower due to network latency and response time. Check provider documentation for the latest limits.
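The formulas above can be sketched as a small calculation. The 10,000-request workload below is an assumption chosen to match the example figures on this page (2.8 hours at 60 effective RPM); swap in your own numbers.

```python
def effective_rpm(rpm_limit: float, tpm_limit: float, tokens_per_request: float) -> float:
    """Effective RPM = min(RPM_limit, TPM_limit / tokens_per_request)."""
    return min(rpm_limit, tpm_limit / tokens_per_request)

def time_to_complete_minutes(total_requests: float, eff_rpm: float) -> float:
    """Idealized duration at sustained maximum throughput, in minutes."""
    return total_requests / eff_rpm

# Example values from this page: 500 RPM, 30,000 TPM, 500 tokens/request.
rpm, tpm, tokens = 500, 30_000, 500
eff = effective_rpm(rpm, tpm, tokens)            # min(500, 30000/500) = 60
bottleneck = "TPM" if tpm / tokens < rpm else "RPM"
minutes = time_to_complete_minutes(10_000, eff)  # 10000/60 ≈ 166.7 min

print(eff, bottleneck, round(minutes / 60, 1))   # 60.0 TPM 2.8
```

The same function reproduces the table rows: at 200,000 TPM the effective rate rises to min(500, 400) = 400 RPM, and the same workload finishes in 25 minutes.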


How to Use Rate Limit Calculator

  1. Select provider and tier

     Choose your API provider and your current tier or plan level.

  2. Pick your model

     Select the specific model to see its RPM (requests per minute) and TPM (tokens per minute) limits.

  3. Enter your workload

     Input the total number of requests to process and the average tokens per request.

  4. Review throughput

     See effective RPM, time to complete, optimal batch size, and whether RPM or TPM is your bottleneck.

Frequently Asked Questions

What is the difference between RPM and TPM limits?

RPM (requests per minute) caps how many API calls you can make. TPM (tokens per minute) caps the total tokens processed. Your effective rate is whichever limit you hit first: small requests tend to be RPM-limited, while large requests tend to be TPM-limited.

How do provider tiers work?

OpenAI uses a tier system based on total spend (Tier 1 at $5, up to Tier 5 at $200+). Anthropic offers Build and Scale tiers. Google has free and pay-as-you-go tiers. Higher tiers unlock significantly higher limits.

How is effective RPM calculated?

Effective RPM = min(RPM_limit, TPM_limit / tokens_per_request). It represents your actual throughput once both limits are considered. If your requests are large, TPM becomes the bottleneck and effective RPM drops below the RPM limit.

Are these limits up to date?

Rate limits change frequently as providers update their tiers. We maintain the current published limits but recommend checking provider documentation for the latest values, especially for new models.

How is time to complete calculated?

Time (in minutes) = total_requests / effective_RPM. This assumes sustained maximum throughput; real-world time will be longer due to API latency, retries, and rate-limit cooldowns.
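One way to stay under the effective RPM in practice is to pace request starts at a fixed interval. A minimal sketch, where `send_request` is a hypothetical stand-in for your actual API call (the pacing logic is the point, not the client):

```python
import time

def run_batch(requests, rpm_limit, tpm_limit, tokens_per_request, send_request):
    """Send each request, pacing starts to the effective RPM."""
    eff_rpm = min(rpm_limit, tpm_limit / tokens_per_request)
    interval = 60.0 / eff_rpm  # seconds between request starts
    for req in requests:
        start = time.monotonic()
        send_request(req)
        # Sleep off the remainder of the interval if the call returned early.
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
```

This is open-loop throttling under the stated assumptions (uniform request size, no retries); a production client would also honor the provider's rate-limit response headers and back off on 429s.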