1 Million Token Context — But Don't Skip the Cache


Anthropic just made the full 1M context window generally available for Opus 4.6 and Sonnet 4.6. No beta header, no long-context premium — a 900k token request is billed at the same per-token rate as a 9k one. A-noice!

That’s genuinely great news. You can now throw an entire codebase, thousands of pages of docs, or a full agent conversation history into a single request and it just works. No chunking, no lossy summarisation, no context clearing hacks.

But let’s do some napkin maths on those chat turns…

The scenario

You’ve got a beefy 900k token context — maybe your codebase, your docs, a database schema, the lot. You want to have a 5-turn conversation with Claude Opus 4.6 about it. Each turn you send a short message (~500 tokens) and get back a decent response (~2k tokens).

Opus 4.6 pricing:

  • Input: $5 / million tokens
  • Output: $25 / million tokens
  • Cache write (5 min): $6.25 / million tokens (1.25x input)
  • Cache read: $0.50 / million tokens (0.1x input)

Without caching

Every turn you re-send that full 900k context as fresh input. The API is stateless — it doesn’t remember your last turn. You’re paying full input price every time.

| Turn | Input tokens | Input cost | Output cost | Turn total |
|------|--------------|------------|-------------|------------|
| 1 | 900,500 | $4.50 | $0.05 | $4.55 |
| 2 | 903,000 | $4.52 | $0.05 | $4.57 |
| 3 | 905,500 | $4.53 | $0.05 | $4.58 |
| 4 | 908,000 | $4.54 | $0.05 | $4.59 |
| 5 | 910,500 | $4.55 | $0.05 | $4.60 |
| Total | | | | ~$22.89 |

That’s nearly $23 for a 5-turn chat. The context is doing all the damage — your actual messages are rounding errors.
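
If you want to poke at the numbers yourself, here's the same napkin maths as a few lines of Python. The token counts are the scenario's assumptions, and the per-turn cents can drift a penny from the table, which rounds each column separately:

```python
# No-cache cost model: every turn re-sends the 900k context plus the
# growing chat history at the full input rate.
INPUT_RATE = 5.00 / 1_000_000    # $ per input token (Opus 4.6)
OUTPUT_RATE = 25.00 / 1_000_000  # $ per output token

CONTEXT, MESSAGE, RESPONSE = 900_000, 500, 2_000

total = 0.0
history = 0  # earlier messages and responses pile into the next input
for turn in range(1, 6):
    input_tokens = CONTEXT + history + MESSAGE
    cost = input_tokens * INPUT_RATE + RESPONSE * OUTPUT_RATE
    total += cost
    history += MESSAGE + RESPONSE
    print(f"Turn {turn}: {input_tokens:,} input tokens, ${cost:.2f}")
print(f"Total: ${total:.2f}")  # ~$22.89
```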

With caching

Turn 1 writes the 900k context to cache (1.25x). Turns 2–5 read from cache (0.1x). You only pay full input price on the new tokens each turn.

| Turn | Cache cost | New input cost | Output cost | Turn total |
|------|------------|----------------|-------------|------------|
| 1 (write) | $5.63 | $0.00 | $0.05 | $5.68 |
| 2 (read) | $0.45 | $0.01 | $0.05 | $0.51 |
| 3 (read) | $0.45 | $0.02 | $0.05 | $0.52 |
| 4 (read) | $0.45 | $0.03 | $0.05 | $0.53 |
| 5 (read) | $0.45 | $0.04 | $0.05 | $0.54 |
| Total | | | | ~$7.78 |

That’s a 66% saving. And it gets better the more turns you take — every additional turn costs roughly 50 cents instead of $4.55.
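
Same loop with caching, assuming a single cache breakpoint on the 900k context block: turn 1 pays the 1.25x write rate on it, later turns pay the 0.1x read rate, and the chat history is billed as ordinary input. It lands within a few cents of the table above, which rounds each cell on its own:

```python
# Cached cost model: the 900k context is written to cache once, then read
# back at a tenth of the input price on every later turn.
INPUT_RATE = 5.00 / 1_000_000
OUTPUT_RATE = 25.00 / 1_000_000
CACHE_WRITE_RATE = 6.25 / 1_000_000  # 1.25x input
CACHE_READ_RATE = 0.50 / 1_000_000   # 0.1x input

CONTEXT, MESSAGE, RESPONSE = 900_000, 500, 2_000

total = 0.0
history = 0
for turn in range(1, 6):
    rate = CACHE_WRITE_RATE if turn == 1 else CACHE_READ_RATE
    cost = (
        CONTEXT * rate                      # the big block: one write, then reads
        + (history + MESSAGE) * INPUT_RATE  # uncached conversation tokens
        + RESPONSE * OUTPUT_RATE
    )
    total += cost
    history += MESSAGE + RESPONSE
    print(f"Turn {turn}: ${cost:.2f}")
print(f"Total: ${total:.2f}")  # ~$7.8, roughly a 66% saving on ~$22.89
```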

The takeaway

The flat rate is genuinely brilliant. No penalty for using the full window. But flat rate × big number is still a big number.

The rate is flat. The bill is not.

Prompt caching isn’t an optimisation here; it’s basically mandatory. One cache_control parameter, and the 1.25x write premium pays for itself on the very first cache read. Skip it and you’re paying for 900k tokens of the same context five times over.
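
Here's roughly what that one parameter looks like with the Python SDK. A minimal sketch: the model id and codebase_dump.txt are stand-ins, so substitute your own.

```python
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the env

# The big, stable context goes in a system block marked cacheable.
# codebase_dump.txt is a placeholder for however you assemble your context.
big_context = open("codebase_dump.txt").read()

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model id; check the docs for the real one
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": big_context,
            "cache_control": {"type": "ephemeral"},  # the one parameter that matters
        }
    ],
    messages=[{"role": "user", "content": "Where is rate limiting enforced?"}],
)
print(response.usage)
```

The usage object on the response tells you whether you actually hit the cache: cache_creation_input_tokens counts the write on turn 1, cache_read_input_tokens the cheap reads afterwards.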

Anthropic also released “automatic caching” a while back, so you can now just drop a cache_control property of {"type": "ephemeral"} at the root of your request.

So go wild

Load your entire codebase. Throw in your database schema. Add the docs, the README, the deployment config, every skill file you can find. Have a proper conversation about it all — the 1M window means Claude can actually hold all of that and reason across it without forgetting what it read on page one.

Just turn on prompt caching first!

