I was halfway through writing a script to chunk a 340-page legal contract into 8K-token segments when Sandra pinged me on Teams. "Check Claude blog. Right now." I closed the chunking script, opened the post, read it twice, and then deleted the script entirely. Anthropic just made the full 1 million token context window generally available for both Claude Opus 4.6 and Sonnet 4.6 — at standard pricing, with no long-context premium, and no beta header required.
Let me say that again because it genuinely matters: a 900K-token request now costs the same per-token rate as a 9K-token request. The era of paying a multiplier for long context is over, at least on the Anthropic side of things.
What Actually Changed on March 13
The official announcement is refreshingly straightforward, which is rare for an AI company blog post. Here is the summary:
Standard pricing across the full window. Opus 4.6 stays at $5/$25 per million tokens (input/output). Sonnet 4.6 stays at $3/$15. No multiplier at any context length. This is a significant shift from the beta period, where requests over 200K tokens carried a premium that made long-context work prohibitively expensive for most teams.
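The flat-rate claim is easy to sanity-check with arithmetic. Here is a quick sketch using the per-million-token rates quoted above (the model keys are my own labels, and nothing here calls the API):

```python
# Per-million-token rates (input, output) in USD, as quoted in the announcement.
RATES = {
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at flat per-token pricing (no long-context multiplier)."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate

# With no multiplier, the input portion of a 900K-token prompt costs exactly
# 100x the input portion of a 9K-token prompt -- the same per-token rate.
big = request_cost("opus-4.6", 900_000, 2_000)    # $4.55
small = request_cost("opus-4.6", 9_000, 2_000)    # $0.095
```

Under the old beta premium, the first request would have carried a surcharge; now the only difference between the two is the token count itself.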
Full rate limits everywhere. Your standard account throughput applies whether you are sending 10K tokens or 950K tokens. During the beta, long-context requests were throttled separately, which meant you could technically have access to 1M context but only use it three times per minute.
600 images or PDF pages per request. Up from 100. This one flew under the radar in most of the coverage I have seen, but it is enormous for document-heavy workflows. Derek, who runs our compliance team and regularly needs to cross-reference 400-page regulatory filings, nearly fell out of his chair when I told him. "You mean I can just... upload the whole thing?" Yes, Derek. You can just upload the whole thing.
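For Derek's use case, the Messages API accepts PDFs as base64-encoded document blocks alongside a text prompt. A minimal sketch of building that request body — the model identifier is an assumption on my part, and the 600-page cap is enforced server-side, so this only constructs the payload:

```python
import base64
from pathlib import Path

def build_pdf_message(pdf_path: str, question: str) -> dict:
    """Build a Messages API request body with a whole PDF attached
    as a single document content block, followed by the question."""
    pdf_b64 = base64.standard_b64encode(Path(pdf_path).read_bytes()).decode("ascii")
    return {
        "model": "claude-opus-4-6",  # assumed identifier; check the current model list
        "max_tokens": 4096,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "document",
                 "source": {"type": "base64",
                            "media_type": "application/pdf",
                            "data": pdf_b64}},
                {"type": "text", "text": question},
            ],
        }],
    }
```

No chunking step, no page-range bookkeeping: the entire filing rides along in one content block.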
No beta header required. Requests over 200K tokens just work now. If you have been sending the anthropic-beta: context-1m-2025-08-07 header, it now gets silently ignored. No code changes needed.
The MRCR Score Nobody Is Talking About
Buried in the announcement is a benchmark number that deserves more attention: Opus 4.6 scores 78.3% on MRCR v2 (Multi-Round Co-reference Resolution), which Anthropic claims is the highest among frontier models at that context length. For context (pun intended), this is a test that measures whether a model can find specific information buried deep in a massive context window and then reason about it correctly.
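You can approximate this kind of eval informally at home. A toy needle-in-a-haystack harness in the same spirit — the filler vocabulary and needle format are my own invention, and this only builds the prompt; scoring a model against it is left to you:

```python
import random

def build_needle_prompt(needle: str, total_words: int, depth: float, seed: int = 0) -> str:
    """Bury `needle` at relative position `depth` (0.0 = start, 1.0 = end)
    inside `total_words` words of filler, then ask for it back."""
    rng = random.Random(seed)
    filler = [rng.choice(["alpha", "beta", "gamma", "delta", "epsilon"])
              for _ in range(total_words)]
    position = int(depth * total_words)
    filler.insert(position, f"MAGIC_FACT: {needle}")
    haystack = " ".join(filler)
    return f"{haystack}\n\nWhat does MAGIC_FACT say? Answer with the fact only."
```

Sweep `depth` from 0.0 to 1.0 and `total_words` up toward the window limit, and you get a crude retrieval-accuracy curve; real MRCR adds multi-turn structure and distractor needles on top of this.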
I have been running my own informal tests since the announcement dropped at around 10 AM EST yesterday. I loaded an 847K-token codebase — the entire backend of a SaaS product I consult for — into a single Opus 4.6 conversation and asked it to find a specific race condition I had spent three days tracking down last month. It found it in 14 seconds. Not approximately where it was. The exact file, the exact line, and a correct explanation of why the mutex was being released too early.
Tom, my perpetual skeptic, immediately tried to break it. He loaded 900K tokens of mixed content — code, legal documents, and random Wikipedia articles — and asked the model to find a specific clause in a contract that was sandwiched between two unrelated codebases. It got it right. He then asked a follow-up question that required connecting information from three different documents separated by 300K tokens of unrelated content. It got that right too. Tom has not said anything nice about an AI product since 2024. His exact words were "okay fine, that is actually useful."
Why This Matters for Claude Code Users
The announcement specifically mentions Claude Code, Anthropic's CLI coding tool. For Max, Team, and Enterprise users, Opus 4.6 sessions now use the full 1M context window automatically. The practical impact: fewer compactions.
If you have used Claude Code for any serious project, you know the pain of compaction — that moment when the model silently summarizes and discards earlier parts of the conversation because the context window filled up. It is lossy, unpredictable, and it always seems to throw away the one piece of context you actually needed. With 1M context, a typical coding session can run for hours before hitting the window limit.
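For readers who have not hit it, compaction is roughly the following — a deliberately naive sketch of the idea, not Claude Code's actual algorithm:

```python
def compact(messages: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    """Naive compaction: when total size exceeds `budget`, replace the oldest
    messages with a one-line stub and keep only the most recent few.
    Whatever lived in the discarded messages -- including that design
    decision from the first five minutes -- is gone. Hence: lossy."""
    if sum(len(m) for m in messages) <= budget:
        return messages
    dropped = len(messages) - keep_recent
    stub = f"[summary of {dropped} earlier messages]"
    return [stub] + messages[-keep_recent:]
```

Real implementations summarize rather than stub, but the failure mode is the same: the summary has to guess which details will matter later, and it sometimes guesses wrong. A 5x larger window does not fix lossiness; it just pushes the moment of loss past the end of most sessions.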
I tested this yesterday by running a 3.5-hour Claude Code session refactoring a payment processing module. In previous sessions of similar length, I would hit compaction 4-6 times. This time: zero compactions. The model remembered a design decision I made in the first five minutes of the session when I referenced it two hours later. That alone is worth the upgrade for anyone doing serious development work.
The Competitive Landscape Just Shifted
Google's Gemini has offered 1M+ context windows for a while, but with variable performance at the edges and pricing that scales with context length. OpenAI's models top out at 128K tokens for most use cases. Anthropic offering 1M at flat-rate pricing with what appears to be genuinely reliable retrieval at scale is a meaningful competitive move.
Sandra spent about $47 running comparison tests across all three providers yesterday (she expensed it, naturally). Her conclusion: "Gemini handles more raw tokens, Claude handles them better, and GPT handles fewer of them more expensively." That is an oversimplification, but it captures the current state of things reasonably well.
What This Enables That Was Not Practical Before
The combination of flat pricing and reliable long-context retrieval opens up workflows that were previously either too expensive or too unreliable:
Full codebase analysis. Most production codebases fit comfortably in 1M tokens. You can now have a conversation about your entire codebase without chunking, RAG pipelines, or manual context management.
Complete document review. Legal contracts, regulatory filings, patent applications — the kind of documents where missing a single clause can cost millions. Loading the entire document eliminates the risk of chunking boundaries hiding relevant information.
Long-running agent traces. AI agents that make tool calls, observe results, and reason across hundreds of steps can now maintain their full conversation history. No more agents that forget what they did 50 steps ago.
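Before loading a whole repository, it is worth estimating whether it actually fits. A rough check using the common ~4-characters-per-token rule of thumb (real tokenizer counts will differ, and the extension filter here is just an example):

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenizer output varies by content

def estimate_repo_tokens(root: str, exts=(".py", ".ts", ".go", ".md")) -> int:
    """Walk a source tree and estimate the total token count of matching files."""
    total_chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in exts
    )
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str, window: int = 1_000_000) -> bool:
    """True if the estimated repo size fits inside the context window."""
    return estimate_repo_tokens(root) <= window
```

At 4 characters per token, 1M tokens is roughly 4 MB of source text — which is why "most production codebases fit comfortably" is a defensible claim.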
The engineering work, lossy summarization, and context clearing that long-context work previously required are, to quote the Anthropic blog post directly, "no longer needed." Whether that proves true at scale remains to be seen, but the early signs are promising.
The Bottom Line
This is one of those quiet announcements that changes more than it seems. A million tokens of reliable, flat-rate context is not just a bigger number — it is a different category of capability. The chunking scripts, the RAG pipelines built specifically to work around context limits, the careful prompt engineering to fit everything into 128K tokens — a lot of that infrastructure just became optional.
I deleted my chunking script. Tom admitted something was useful. Derek can upload his regulatory filings without a preprocessing pipeline. These are small victories, but they are real ones.
Related Reading
For more on AI capabilities, explore the 8 levels of agentic engineering and how AI agents work while you sleep.