Anthropic just made the full 1M context window generally available for Opus 4.6 and Sonnet 4.6. No beta header, no long-context premium — a 900k token request is billed at the same per-token rate as a 9k one. A-noice!
That’s genuinely great news. You can now throw an entire codebase, thousands of pages of docs, or a full agent conversation history into a single request and it just works. No chunking, no lossy summarisation, no context clearing hacks.
But let’s do some napkin maths on those chat turns…
The scenario
You’ve got a beefy 900k token context — maybe your codebase, your docs, a database schema, the lot. You want to have a 5-turn conversation with Claude Opus 4.6 about it. Each turn you send a short message (~500 tokens) and get back a decent response (~2k tokens).
Opus 4.6 pricing:
- Input: $5 / million tokens
- Output: $25 / million tokens
- Cache write (5 min): $6.25 / million tokens (1.25x input)
- Cache read: $0.50 / million tokens (0.1x input)
Without caching
Every turn you re-send that full 900k context as fresh input. The API is stateless — it doesn’t remember your last turn. You’re paying full input price every time.
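To make that concrete, here's roughly what the naive loop looks like with the Python SDK. A minimal sketch, not gospel: the model id is a placeholder, and big_context stands in for your 900k tokens.

```python
import anthropic

client = anthropic.Anthropic()

with open("context.txt") as f:
    big_context = f.read()  # stand-in for your 900k tokens of codebase/docs/schema

history = []

def ask(question: str) -> str:
    # Stateless API: the full context plus every prior turn is
    # re-sent (and re-billed at $5/MTok) on every single call.
    history.append({"role": "user", "content": question})
    response = client.messages.create(
        model="claude-opus-4-6",  # placeholder id, check the current models list
        max_tokens=2048,
        system=big_context,
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
```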
| Turn | Input tokens | Input cost | Output cost | Turn total |
|---|---|---|---|---|
| 1 | 900,500 | $4.50 | $0.05 | $4.55 |
| 2 | 903,000 | $4.52 | $0.05 | $4.57 |
| 3 | 905,500 | $4.53 | $0.05 | $4.58 |
| 4 | 908,000 | $4.54 | $0.05 | $4.59 |
| 5 | 910,500 | $4.55 | $0.05 | $4.60 |
| **Total** | 4,527,500 | $22.64 | $0.25 | **~$22.89** |
That’s nearly $23 for a 5-turn chat. The context is doing all the damage — your actual messages are rounding errors.
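Don't trust my arithmetic? The whole napkin fits in a few lines of Python (pennies may drift from the table, which rounds each cell separately):

```python
INPUT = 5.00    # $ per million input tokens
OUTPUT = 25.00  # $ per million output tokens

context, message, reply = 900_000, 500, 2_000

total, history = 0.0, context
for turn in range(1, 6):
    history += message  # this turn's message joins the input
    cost = (history * INPUT + reply * OUTPUT) / 1_000_000
    print(f"Turn {turn}: {history:,} input tokens -> ${cost:.2f}")
    total += cost
    history += reply    # the reply gets re-sent next turn too
print(f"Total: ${total:.2f}")  # ~$22.89
```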
With caching
Turn 1 writes the 900k context to cache (1.25x). Turns 2–5 read from cache (0.1x). You only pay full input price on the new tokens each turn.
| Turn | Cache cost | New input cost | Output cost | Turn total |
|---|---|---|---|---|
| 1 (write) | $5.63 | $0.00 | $0.05 | $5.68 |
| 2 (read) | $0.45 | $0.01 | $0.05 | $0.51 |
| 3 (read) | $0.45 | $0.02 | $0.05 | $0.52 |
| 4 (read) | $0.45 | $0.03 | $0.05 | $0.53 |
| 5 (read) | $0.45 | $0.04 | $0.05 | $0.54 |
| **Total** | $7.43 | $0.10 | $0.25 | **~$7.78** |
That’s a 66% saving. And it gets better the more turns you take — every additional turn costs roughly 50 cents instead of $4.55.
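Same napkin, cached version (it lands a couple of cents off the table above, again down to per-cell rounding):

```python
INPUT = 5.00          # $ / MTok, fresh input
OUTPUT = 25.00        # $ / MTok, output
CACHE_WRITE = 6.25    # 1.25x input
CACHE_READ = 0.50     # 0.1x input

context, message, reply = 900_000, 500, 2_000

# Turn 1: write the context plus the first message to the cache.
cached = context + message
total = (cached * CACHE_WRITE + reply * OUTPUT) / 1_000_000

# Turns 2-5: read the cache, pay full input price only for new tokens.
fresh = 0
for turn in range(2, 6):
    fresh += reply + message  # last reply + this turn's message
    total += (cached * CACHE_READ + fresh * INPUT + reply * OUTPUT) / 1_000_000

print(f"Total: ${total:.2f}")  # ~$7.80 vs ~$22.89 uncached
```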
The takeaway
The flat rate is genuinely brilliant. No penalty for using the full window. But flat rate × big number is still a big number.
The rate is flat. The bill is not.
Prompt caching isn’t an optimisation here — it’s basically mandatory. One cache_control parameter costs you a one-off ~$1.13 premium on the turn-1 cache write, and the very first cache read claws back around $4, so you’re ahead by the end of turn 2. Skip it and you’re paying for the same 900k tokens of context five times over.
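In SDK terms, that one parameter is an extra field on the final block of your big context. A sketch, again with a placeholder model id:

```python
import anthropic

client = anthropic.Anthropic()

with open("context.txt") as f:
    big_context = f.read()

response = client.messages.create(
    model="claude-opus-4-6",  # placeholder id
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": big_context,
            # Everything up to and including this block gets cached:
            # turn 1 pays the 1.25x write, turns 2+ pay the 0.1x read.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Walk me through the billing module."}],
)
```

One gotcha: the cache key is the exact prefix, so keep that big context byte-identical between turns or you’ll pay the write price all over again.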
Anthropic also released “automatic caching” a while back, so you can now drop a single cache_control property of {"type": "ephemeral"} at the root of your request instead of tagging individual blocks (see the links below).
So go wild
Load your entire codebase. Throw in your database schema. Add the docs, the README, the deployment config, every skill file you can find. Have a proper conversation about it all — the 1M window means Claude can actually hold all of that and reason across it without forgetting what it read on page one.
Just turn on prompt caching first!
Links: