ClaudeCodePricing.com is an independent pricing guide. We are not affiliated with, endorsed by, or connected to Anthropic, Claude, Claude Code, or any AI vendor. All pricing data is sourced from publicly available information and may change without notice.

Updated April 2026

Claude Code Usage Limits: What You Get on Each Plan

Claude Code uses a rolling 5-hour window for usage limits, not a monthly quota. Understanding how this works helps you choose the right plan and avoid unexpected throttling during critical work sessions.

How Usage Limits Work

Claude Code subscription plans use a rolling 5-hour usage window rather than a daily or monthly message quota. The system continuously tracks your usage over the most recent 5 hours and compares it against your plan's limit. As older messages age beyond the 5-hour mark, that capacity becomes available again.
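The bookkeeping described above can be sketched in a few lines of Python. This is an illustrative model only, not Anthropic's implementation: the `RollingWindow` class, the cost of one unit per message, and the limit of 45 are all assumptions chosen for the example.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # window length taken from the text


class RollingWindow:
    """Toy model of a rolling usage window (hypothetical, for illustration)."""

    def __init__(self, limit: float):
        self.limit = limit
        self.events = deque()  # (timestamp, units), oldest first

    def _expire(self, now: datetime) -> None:
        # Drop usage that is 5 hours old or older.
        while self.events and now - self.events[0][0] >= WINDOW:
            self.events.popleft()

    def used(self, now: datetime) -> float:
        self._expire(now)
        return sum(units for _, units in self.events)

    def record(self, now: datetime, units: float = 1.0) -> bool:
        """Record usage if it fits; return False when the limit would be exceeded."""
        if self.used(now) + units > self.limit:
            return False
        self.events.append((now, units))
        return True


start = datetime(2026, 4, 1, 9, 0)
window = RollingWindow(limit=45)  # assumed: 1 unit per message
for i in range(45):
    window.record(start + timedelta(seconds=40 * i))  # burst from 9:00 to about 9:30

blocked = not window.record(start + timedelta(minutes=30))   # limit already reached
used_at_1pm = window.used(start + timedelta(hours=4))        # nothing has aged out yet
used_at_2_15pm = window.used(start + timedelta(hours=5, minutes=15))  # early messages gone
```

In this model, capacity only returns once the oldest messages pass the 5-hour mark, which is why a concentrated burst takes longer to recover from than the same usage spread out.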

There are two types of limits running simultaneously: an all-models limit and a Sonnet-only limit. The all-models limit covers total usage across any model. The Sonnet-only limit is typically higher, allowing more Sonnet usage even when your all-models quota is reached. This design encourages using Sonnet for most tasks and reserving Opus for when you specifically need its capabilities.

Usage is measured in "usage units" rather than simple message counts. A short question consumes fewer units than a message that asks Claude Code to read 10 files and generate a comprehensive refactor plan. The approximate message counts published (45 for Pro, 225 for Max 5x, 900 for Max 20x) assume typical message complexity. Your actual message count before hitting the limit will vary based on how complex your interactions are.
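A rough calculation shows how much message complexity can move the effective count. The sketch below assumes a simple linear model in which a "typical" message costs one unit and the published counts equal the unit budget. Both are simplifications, the real accounting is not public, and the 0.5x and 3x cost factors are made up for illustration.

```python
# Published approximate message counts per 5-hour window (from the text).
TYPICAL_MESSAGES = {"Pro": 45, "Max 5x": 225, "Max 20x": 900}


def estimated_messages(plan: str, avg_cost_vs_typical: float) -> int:
    """Estimate messages per window when each message costs
    `avg_cost_vs_typical` times a typical message (hypothetical linear model)."""
    if avg_cost_vs_typical <= 0:
        raise ValueError("average cost must be positive")
    return int(TYPICAL_MESSAGES[plan] / avg_cost_vs_typical)


light = estimated_messages("Pro", 0.5)  # short questions, half the typical cost
heavy = estimated_messages("Pro", 3.0)  # multi-file refactors, three times the typical cost
print(light, heavy)  # 90 15
```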

Limits by Plan

Plan       Approx Msgs / 5hr   Usage Multiplier   Context Window   Monthly Price
Pro        ~45                 1x                 200K             $20
Max 5x     ~225                5x                 1M               $100
Max 20x    ~900                20x                1M               $200

Message counts are approximate and assume typical coding interactions. Complex operations consume more usage units per message.

What Happens When You Hit the Limit

The most important thing to know: you are never charged overage fees on a subscription plan. When you hit your limit, the system responds differently depending on your plan tier and how far over the limit you are.

On Pro: When you approach your limit, Claude Code may redirect you to use Haiku instead of Sonnet, resulting in faster but potentially less capable responses. If you exceed the limit further, responses may be delayed by several minutes, or you may be temporarily paused until capacity frees up in your rolling window. The experience can feel frustrating during active coding sessions, which is why many developers who hit Pro limits regularly upgrade to Max.

On Max plans: Throttling is more gradual. You maintain access to your chosen model for longer, and the degradation is gentler - slightly slower responses rather than model downgrades or hard pauses. The 5x and 20x multipliers mean most developers on Max plans rarely hit limits during normal working hours.

On API: Rate limits are separate and work differently. You have requests per minute (RPM) and tokens per minute (TPM) limits based on your account tier. Exceeding these returns HTTP 429 errors. Your application needs to implement retry logic with exponential backoff. API rate limits are not about monthly cost but about burst throughput.
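Retry logic with exponential backoff might look like the following minimal sketch. `RateLimitError` and `call_with_backoff` are hypothetical names, not part of any SDK; in real code you would catch whatever exception your HTTP client raises for a 429 response. The delay values are arbitrary defaults.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your HTTP client raises on a 429."""


def call_with_backoff(request, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call `request()`; on RateLimitError wait up to base_delay * 2**attempt
    seconds (capped, with jitter) and retry. Re-raises after max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))


# Usage with a fake request that is rate limited twice, then succeeds.
attempts = {"count": 0}


def fake_request():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


waits = []  # capture the delays instead of actually sleeping
result = call_with_backoff(fake_request, sleep=waits.append)
```

The `sleep` parameter is injectable only so the example can run without waiting. If the 429 response includes a retry-after header, honoring that value is usually preferable to a computed delay.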

API Rate Limits

API rate limits are separate from subscription usage limits and are based on your account tier. Higher tiers unlock more throughput. Tiers increase automatically with spending history.

Tier            Requests/Min   Tokens/Min (Input)   Tokens/Min (Output)
Tier 1 (New)    50             40,000               8,000
Tier 2          1,000          80,000               16,000
Tier 3          2,000          160,000              32,000
Tier 4          4,000          400,000              80,000

Approximate values for Sonnet 4.6. Limits vary by model. Check console.anthropic.com for your current tier.
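To check which tier a planned workload needs, you can compare it against all three limits at once. The sketch below hard-codes the approximate figures from the table above; the `Tier` type, the `lowest_sufficient_tier` helper, and the sample workload are hypothetical, and your real limits may differ.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    rpm: int         # requests per minute
    input_tpm: int   # input tokens per minute
    output_tpm: int  # output tokens per minute


# Approximate values copied from the table; check your console for real limits.
TIERS = [
    Tier("Tier 1", 50, 40_000, 8_000),
    Tier("Tier 2", 1_000, 80_000, 16_000),
    Tier("Tier 3", 2_000, 160_000, 32_000),
    Tier("Tier 4", 4_000, 400_000, 80_000),
]


def lowest_sufficient_tier(rpm: int, input_tpm: int, output_tpm: int) -> Optional[str]:
    """Return the first tier whose three limits all cover the workload, else None."""
    for tier in TIERS:
        if (rpm <= tier.rpm and input_tpm <= tier.input_tpm
                and output_tpm <= tier.output_tpm):
            return tier.name
    return None


# Made-up workload: 30 requests/min, each about 2,000 input and 500 output tokens.
needed = lowest_sufficient_tier(rpm=30, input_tpm=30 * 2_000, output_tpm=30 * 500)
print(needed)  # Tier 2
```

In this example the request rate fits comfortably in Tier 1, but the 60,000 input tokens per minute do not, so the token limit is what forces the higher tier.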

Tips to Stay Within Limits

The most effective way to stay within usage limits is to keep sessions efficient. Use /compact every 8-10 messages to compress context and reduce the usage units per message. Start fresh conversations when switching tasks rather than extending a single session. Be specific in your prompts to avoid unnecessary back-and-forth.

Plan your heavy usage strategically. If you know you have a complex refactoring task, start it when your 5-hour window is fresh. Avoid starting intensive tasks near the end of a heavy usage period when you might hit limits mid-task. The /cost command shows your current usage level, helping you decide whether to start a new intensive task or wait for capacity to free up.

For teams, each developer has independent limits, so five developers running heavy sessions simultaneously do not reduce one another's capacity. Coordination still helps: stagger intensive Claude Code usage where possible, so that whoever owns the most critical task starts it with a fresh window and gets unthrottled access when needed.

Understanding Limit Resets

The rolling window means your limits recover gradually, not all at once. Imagine you are on Pro and you send 45 messages between 9:00am and 9:30am, hitting your limit. Here is how recovery works:

  • 9:30am - Limit reached. Throttling begins.
  • 2:00pm - The first messages from 9:00am reach the 5-hour mark and start aging out. Small capacity restored.
  • 2:05pm - Messages from 9:00-9:05am aged out. A little more capacity.
  • 2:15pm - Messages from 9:00-9:15am aged out. Roughly half your capacity is back if the messages were evenly spaced.
  • 2:30pm - All messages from the burst session aged out. Full capacity.

The practical takeaway: if you burn through your limits in an intensive burst, expect to wait 4-5 hours for full recovery. If you spread usage across the 5-hour window, you can maintain steady throughput without ever hitting hard limits. This is why many developers prefer medium-length sessions spaced throughout the day over one marathon session.
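The steady pace that avoids hard limits can be estimated by dividing the window's message budget by its length. The sketch below uses the published approximate counts and assumes typical message complexity and perfectly even spacing, so treat the results as rough guides rather than guarantees. The `sustainable_pace` helper is a hypothetical name.

```python
WINDOW_HOURS = 5
# Published approximate message counts per 5-hour window (from the text).
TYPICAL_MESSAGES = {"Pro": 45, "Max 5x": 225, "Max 20x": 900}


def sustainable_pace(plan: str) -> tuple[float, float]:
    """Return (messages per hour, minutes between messages) for an even spread."""
    per_hour = TYPICAL_MESSAGES[plan] / WINDOW_HOURS
    return per_hour, 60 / per_hour


for plan in TYPICAL_MESSAGES:
    per_hour, gap = sustainable_pace(plan)
    print(f"{plan}: {per_hour:.0f} msgs/hour, one every {gap:.1f} min")
```

On Pro this works out to about 9 typical messages per hour, or one roughly every 7 minutes.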

Frequently Asked Questions

How does the Claude Code 5-hour rolling window work?

The 5-hour rolling window tracks your usage over the most recent 5 hours continuously, not in fixed blocks. If you send 45 messages between 9am and 10am on the Pro plan, you hit your limit. But as time passes, older messages age out of the window. Starting at 2pm, the earliest messages fall outside the 5-hour window, and by 3pm all of them have aged out and your capacity is fully restored. The window rolls continuously, so if you space your usage across 5 hours instead of concentrating it in one hour, you can use Claude Code throughout the day without hitting limits.

What happens when I reach my Claude Code usage limit?

When you reach your usage limit on a subscription plan, you are never charged extra. Instead, Claude Code may throttle your responses (making them slower), redirect you to a lighter model like Haiku, or temporarily pause responses until capacity frees up in your rolling window. On the Pro plan, throttling is more aggressive - you might wait several minutes between responses or be limited to shorter responses. On Max plans, throttling is more gradual and you maintain access to your chosen model for longer.

Are Claude Code usage limits based on messages or tokens?

Usage limits are measured in usage units, not simple message counts. The approximate message numbers (45 for Pro, 225 for Max 5x, 900 for Max 20x) assume typical message complexity. Longer messages that involve reading large files or generating extensive code consume more usage units per message than short questions. This means a session of complex multi-file operations will hit the limit faster than a session of quick questions. The /cost command in Claude Code can help you track your current usage level.

How do Claude Code API rate limits work?

API rate limits are separate from subscription usage limits and are measured in requests per minute (RPM) and tokens per minute (TPM). New API accounts start at Tier 1 with lower limits, and you can increase your tier by maintaining a consistent spending history. For Sonnet, Tier 1 allows roughly 50 RPM and 40,000 input TPM, while Tier 4 allows roughly 4,000 RPM and 400,000 input TPM. Rate limit increases happen automatically based on your account age and spending, or you can request a tier increase through the Anthropic console.

Can I check my current usage level in Claude Code?

Yes, use the /cost command in Claude Code to see your current session costs and usage level. For subscription users, this shows how much of your current 5-hour window capacity you have consumed. For API users, it shows the total tokens and cost for your current session. Monitoring your usage regularly helps you develop an intuition for how quickly different types of tasks consume your allocation, making it easier to plan your work within the limits.