Claude Code API Pricing: Pay-Per-Token Costs Explained
When you use Claude Code with your own API key, you pay only for the tokens you consume. No monthly subscription, no usage caps. This guide covers exact per-model pricing, caching discounts, and when the API is cheaper than a subscription.
Token Pricing by Model
All prices are per 1 million tokens. Input tokens cover your prompts, code files, and conversation context. Output tokens cover Claude's responses and generated code. Cache read tokens apply when prompt caching is active.
| Model | Input / 1M | Output / 1M | Cache Read / 1M | Batch Input / 1M | Batch Output / 1M |
|---|---|---|---|---|---|
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $1.50 | $7.50 |
| Opus 4.6 | $5.00 | $25.00 | $0.50 | $2.50 | $12.50 |
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | $0.50 | $2.50 |
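For quick estimates, the table above can be expressed as a small lookup plus a cost function. A minimal Python sketch (the dictionary keys are informal labels for readability, not official API model ids):

```python
# Per-1M-token prices (USD) from the table above.
PRICES = {
    "sonnet-4.6": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "opus-4.6":   {"input": 5.00, "output": 25.00, "cache_read": 0.50},
    "haiku-4.5":  {"input": 1.00, "output": 5.00,  "cache_read": 0.10},
}

def session_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of one session.

    `cached_tokens` is the portion of input served from the prompt cache.
    """
    p = PRICES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * p["input"]
            + cached_tokens * p["cache_read"]
            + output_tokens * p["output"]) / 1_000_000

# 25K input + 8K output on Sonnet 4.6 (the example used later in this guide):
print(round(session_cost("sonnet-4.6", 25_000, 8_000), 3))  # 0.195
```

Swap in your own token counts to compare models: the same session on Haiku 4.5 costs a third as much, and on Opus 4.6 about two-thirds more.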
What Is a Token?
A token is the fundamental unit of text that language models process. Tokens are not words - they are pieces of words, punctuation, and whitespace that the model's tokenizer splits text into. For English text, one token is roughly 3/4 of a word, meaning 1,000 tokens equals approximately 750 words. Code tends to tokenize less efficiently than prose because of special characters, indentation, and variable names.
To put this in developer-relevant terms: a typical 100-line TypeScript file is approximately 500-800 tokens. A full React component with JSX might be 300-600 tokens. A package.json file is usually 200-400 tokens. When Claude Code reads your codebase, every file it examines consumes input tokens, and every line of code it writes back consumes output tokens.
The distinction between input and output tokens matters because output tokens cost roughly 5x more across all models. This is because generating new tokens requires significantly more computation than processing existing ones. In a typical Claude Code session, you might send 25,000 input tokens (prompts plus file reads plus conversation history) and receive 8,000 output tokens (generated code and explanations). With Sonnet 4.6, that session costs about $0.075 for input and $0.12 for output, totaling roughly $0.20.
# Approximate token counts
100-line TS file = 500-800 tokens
React component = 300-600 tokens
package.json = 200-400 tokens
1,000 words = ~1,333 tokens
Full codebase scan = 50K-200K tokens
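The figures above line up with the common rule of thumb of roughly 4 characters per token for English text. A rough estimator (the 4-chars ratio is a heuristic, not the real tokenizer, and code usually runs denser in tokens):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate using the ~4 characters/token heuristic.

    Code tokenizes less efficiently than prose (special characters,
    indentation), so treat this as a ballpark, not a billing-accurate count.
    """
    return max(1, round(len(text) / chars_per_token))

# A ~3,000-character file lands around 750 tokens:
print(estimate_tokens("x" * 3000))  # 750
```

For billing-accurate numbers, the Anthropic API exposes a token-counting endpoint, which is worth using before submitting very large prompts.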
Input vs Output Tokens: Why Output Costs 5x More
The 5x cost difference between input and output tokens reflects the computational difference between reading and writing. Processing input tokens (your code and prompts) is relatively efficient because the model processes them in parallel. Generating output tokens (Claude's code and responses) requires sequential computation where each token depends on all previous tokens, making it much more expensive.
Context accumulation is the biggest hidden cost driver in Claude Code. Every message you send includes the entire conversation history as input tokens. A 3-message conversation might send 5,000 input tokens per message. By message 30, you could be sending 100,000 input tokens per message because the full conversation context is resent each time. Since each turn resends everything before it, cumulative input cost grows roughly quadratically with conversation length, which is why long conversations cost far more than starting fresh sessions.
The practical impact: a 5-message session might cost $0.05 total, but extending that same session to 50 messages could cost $3.00 or more, not because token rates change but because each later message resends a larger cumulative context. This is why the /compact command and starting fresh sessions are among the most effective cost-saving techniques for API users.
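A toy model makes the growth visible. Assuming each message adds a flat 1,000 new tokens of context (an illustrative figure; real sessions vary widely), the resend-everything pattern means total input tokens grow with the square of conversation length:

```python
def conversation_input_tokens(n_messages: int, tokens_per_message: int = 1_000) -> int:
    """Total input tokens billed when the full history is resent each turn.

    Message k carries all k accumulated chunks, so the total is the sum
    1 + 2 + ... + n, i.e. quadratic in conversation length.
    """
    return sum(k * tokens_per_message for k in range(1, n_messages + 1))

# Input cost at Sonnet 4.6's $3/1M rate:
for n in (5, 50):
    tokens = conversation_input_tokens(n)
    print(f"{n} messages: {tokens:,} input tokens, ${tokens * 3.00 / 1e6:.2f}")
```

Going from 5 to 50 messages multiplies total input tokens by 85, not by 10, which is exactly the gap between the $0.05 and $3.00 sessions described above.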
Prompt Caching: 90% Savings on Repeated Content
Prompt caching is one of the most powerful cost-saving features available on the Claude API. When enabled, content that remains the same across requests (system prompts, CLAUDE.md files, previously read file contents) is cached, and subsequent reads cost only 10% of the normal input token rate. For Sonnet 4.6, cached tokens cost $0.30 per 1M instead of $3.00. Note that first writing content to the cache carries a modest premium over the base input rate, so caching pays off once the same content is reused at least a couple of times.
Claude Code takes advantage of prompt caching automatically in many scenarios. Your CLAUDE.md project file, which gets sent with every message, is cached after the first request. File contents that Claude reads remain cached for the duration of your session. The system prompt that configures Claude Code's behavior is also cached.
The savings compound over longer sessions. In a 20-message conversation where your CLAUDE.md is 2,000 tokens and you reference the same 5 files (totaling 10,000 tokens), those 12,000 tokens are cached after the first message. Over 19 subsequent messages, you save approximately $0.60 with Sonnet 4.6 on the cached content alone. For developers with large CLAUDE.md files or projects that repeatedly reference the same modules, prompt caching can reduce input costs by 40-60% over a full working day.
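The arithmetic above can be checked in a few lines. A sketch using Sonnet 4.6's rates (this ignores the one-time premium charged when content is first written to the cache):

```python
def cache_savings(cached_tokens: int, n_messages: int,
                  input_rate: float = 3.00, cache_rate: float = 0.30) -> float:
    """Dollars saved by reading `cached_tokens` from cache instead of
    paying full input price on each of the n-1 messages after the first."""
    repeats = n_messages - 1
    per_token_saving = (input_rate - cache_rate) / 1_000_000
    return cached_tokens * repeats * per_token_saving

# 12K tokens (CLAUDE.md + 5 files) reused across a 20-message session,
# matching the ~$0.60 figure above:
print(round(cache_savings(12_000, 20), 2))  # 0.62
```

The function scales linearly in both dimensions: double the cached context or double the session length and the savings double too.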
API vs Subscription: Break-Even Analysis
The key question for many developers: when does a subscription become cheaper than the API? Anthropic reports that the average Claude Code API user spends roughly $6 per day. Over 22 working days, that is approximately $132/month, making the Max 5x plan at $100/month a clear saving of $32/month for average-usage developers.
The break-even point for Pro ($20/month) is about $1/day in API costs, which corresponds to very light usage - roughly 2-3 short sessions daily using Sonnet 4.6. If you consistently spend more than $1/day on the API, Pro is cheaper, provided Pro's usage limits cover your workload. If you consistently spend more than $4.50/day (roughly $100/month over 22 working days), Max 5x becomes the better deal; heavier usage that exceeds a tier's limits pushes you up to the next tier, which is why the heaviest spend levels map to Max 20x.
| Daily API Spend | Monthly (22 days) | Cheapest Option | Monthly Savings |
|---|---|---|---|
| $0.50 | $11 | API ($11) | - |
| $1.00 | $22 | Pro ($20) | $2 |
| $3.00 | $66 | Pro ($20) | $46 |
| $6.00 | $132 | Max 5x ($100) | $32 |
| $10.00 | $220 | Max 20x ($200) | $20 |
| $15.00 | $330 | Max 20x ($200) | $130 |
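The table above can be reproduced with a small chooser. Note that the per-tier daily-spend ceilings below (how much usage each subscription's limits can absorb) are illustrative guesses chosen to mirror the table, not published figures:

```python
def cheapest_option(daily_api_spend: float, working_days: int = 22) -> tuple:
    """Cheapest monthly option, assuming heavier usage needs higher-limit tiers.

    Returns (plan name, monthly cost). The daily ceilings are rough
    assumptions, not official usage limits.
    """
    api_monthly = daily_api_spend * working_days
    # (name, monthly price, assumed max daily API-equivalent spend covered)
    tiers = [("Pro", 20.0, 4.5),
             ("Max 5x", 100.0, 9.0),
             ("Max 20x", 200.0, float("inf"))]
    best_name, best_cost = "API", api_monthly
    for name, price, ceiling in tiers:
        if daily_api_spend <= ceiling and price < best_cost:
            best_name, best_cost = name, price
    return best_name, best_cost

print(cheapest_option(6.00))   # ('Max 5x', 100.0)
print(cheapest_option(0.50))   # ('API', 11.0)
```

Adjust the ceilings to your own observed limits; the structure of the decision (flat plan wins once monthly API spend exceeds its price, as long as its limits cover you) stays the same.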
Batch API: 50% Off for Non-Urgent Work
The Claude batch API processes requests within a 24-hour window instead of providing real-time responses. In exchange for this flexibility, every token costs 50% less. Sonnet 4.6 drops from $3/$15 to $1.50/$7.50 per million tokens. Opus 4.6 drops from $5/$25 to $2.50/$12.50.
For Claude Code users, the batch API is ideal for tasks where you do not need immediate results. Common batch-friendly workflows include: running code reviews on pull requests overnight, generating documentation for your codebase, writing test suites for existing code, performing security audits, and analyzing large codebases for refactoring opportunities. If you can submit the work at the end of your day and review results the next morning, you save 50% on every token.
To use the batch API, you submit requests programmatically through the Anthropic API rather than through the interactive Claude Code CLI. This requires some setup but the cost savings are substantial for teams with predictable, non-interactive workloads.
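As a sketch, a batch submission builds one request per task, each tagged with a custom_id so results can be matched back to their source. The file names, prompt, and model id below are placeholders, and the commented-out submission call assumes the `anthropic` Python SDK and a configured API key:

```python
def build_batch_requests(files: dict, model: str) -> list:
    """One batch request per file; `custom_id` ties each result to its file."""
    return [
        {
            "custom_id": f"review-{name}",
            "params": {
                "model": model,  # placeholder: substitute your model id
                "max_tokens": 2048,
                "messages": [
                    {"role": "user",
                     "content": f"Review this file for bugs and style issues:\n\n{source}"},
                ],
            },
        }
        for name, source in files.items()
    ]

requests = build_batch_requests({"auth.py": "def login(): ..."},
                                model="claude-sonnet-example")
print(requests[0]["custom_id"])  # review-auth.py

# Submitting the batch (requires the anthropic SDK and ANTHROPIC_API_KEY):
#   import anthropic
#   client = anthropic.Anthropic()
#   batch = client.messages.batches.create(requests=requests)
```

A nightly cron job that builds one batch over the day's open pull requests is a typical way to bank the 50% discount without changing your interactive workflow.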
Setting Up API Access for Claude Code
Using Claude Code with your own API key is straightforward. First, create an account at console.anthropic.com and generate an API key. Then set the ANTHROPIC_API_KEY environment variable in your shell configuration. When you run the claude command in your terminal, it will detect the API key and bill usage directly to your API account instead of requiring a subscription.
# 1. Get your API key from console.anthropic.com
# 2. Set the environment variable
export ANTHROPIC_API_KEY=sk-ant-...
# 3. Run Claude Code
claude
# 4. Check your costs
/cost
Frequently Asked Questions
How much does a typical Claude Code session cost on the API?
A typical Claude Code session on the API costs between $0.10 and $2.00 depending on length and model choice. At the extremes, a quick bug fix with Sonnet 4.6 might cost as little as $0.02-$0.05, while a deep refactoring session with Opus 4.6 could run $4-$10. Anthropic reports the average daily spend across API users is roughly $6, suggesting most developers have moderate usage that totals around $130/month over 22 working days.
What is the difference between input and output tokens in Claude Code?
Input tokens are everything you send to Claude, including your prompt, file contents Claude reads, and the entire conversation history. Output tokens are what Claude generates in response, such as code, explanations, and tool calls. Output tokens cost approximately 5x more than input tokens because they require more computation. In a typical Claude Code session, about 60-70% of tokens are input and 30-40% are output, but output tokens drive most of the cost.
How does prompt caching work with Claude Code?
Prompt caching in Claude Code automatically caches frequently used context like system prompts, CLAUDE.md files, and repeated file reads. Cached input tokens cost 90% less than regular input tokens. When Claude Code reads the same file multiple times in a session, subsequent reads use cached tokens at the reduced rate. This is particularly valuable for long sessions where the same codebase context is sent with every message.
When should I use the API instead of a Claude Code subscription?
Use the API when your usage is sporadic or unpredictable. If you only use Claude Code a few times a week, the API might cost $10-$30/month compared to $20/month for Pro. The API is also better if you want to use Haiku 4.5 for simple tasks at $1/$5 per 1M tokens, significantly cheaper than any subscription. However, if you use Claude Code daily for moderate work, the subscription is almost always cheaper because the average API user spends roughly $6/day.
What is the Claude batch API and how much does it save?
The Claude batch API lets you submit requests that are processed within 24 hours instead of in real time. In exchange for the delayed response, you get a 50% discount on all token costs. This makes Sonnet 4.6 cost just $1.50/$7.50 per 1M tokens instead of $3/$15. Batch API is ideal for non-urgent tasks like code reviews, documentation generation, test writing, and codebase analysis where you do not need immediate results.
Not Sure Which Pricing Model Is Right?
Our calculator estimates your monthly cost on both API and subscription plans.