
# Anthropic Ships Claude Opus 4.7: Extended Thinking Hits New Benchmark Ceiling

Anthropic on Thursday released Claude Opus 4.7, the latest iteration of its flagship model, with a headline claim that is difficult to dismiss: the model achieves a 92.3% pass rate on SWE-Bench Verified, the industry’s most closely watched agentic software engineering benchmark — a jump of nearly 7 percentage points over Claude Opus 4.5 (Anthropic model card, April 24, 2026). For a research lab that has consistently positioned safety and reliability over raw capability numbers, the timing — one day after Mistral Large 3 undercut frontier pricing — is pointed.

## What Changed Under the Hood

Three architectural updates drive the headline numbers:

- Extended Thinking v2. Anthropic overhauled the model’s internal chain-of-thought mechanism, allowing dynamic compute budget allocation across sub-problems within a single prompt rather than applying a flat thinking token cap. In internal benchmarks, this reduces reasoning errors on multi-step coding and mathematics by roughly 31% compared to Claude Opus 4.5 with the same output token budget.

- 200K token native context with improved retrieval fidelity. Prior Opus versions showed measurable degradation on needle-in-a-haystack retrieval tasks past the 100K mark. Anthropic’s release notes show Opus 4.7 holding above 94% recall accuracy at 190K tokens, attributed to changes in attention head routing rather than post-hoc context compression.

- Improved tool-call reliability. On Terminal-Bench Agentic — a benchmark testing whether models complete 90-step autonomous CLI workflows without human intervention — Opus 4.7 completes 78% of tasks end-to-end, versus 61% for Claude Opus 4.5 and 54% for GPT-5 Turbo on the same eval (Anthropic technical report, April 2026).

## Pricing and Availability

Anthropic is keeping the premium tier intact: Claude Opus 4.7 launches at $15 per million input tokens and $75 per million output tokens through the Anthropic API, unchanged from Opus 4.5. The message is deliberate — Anthropic is not competing on price with Mistral or Meta’s Llama 4 family; it is betting that long-horizon agent reliability is worth the delta.

For enterprise buyers on AWS Bedrock or Google Cloud Vertex AI, Opus 4.7 is available immediately under existing contracts. Anthropic is also making Claude Haiku 4.5 the default model for Claude.ai consumer plans, reserving Opus 4.7 for API and enterprise workloads.

## The Reliability Premium Holds — For Now

The debate Mistral’s Large 3 launch crystallised yesterday is not going away: how much of frontier-tier pricing is justified by capability, and how much is pricing power that competition will erode? Opus 4.7’s SWE-Bench numbers give Anthropic a concrete answer for the next 90 days.

But the window is closing. Google DeepMind’s next Gemini release is widely expected before Q3 2026, and Meta has publicly committed to closing the agentic task gap with a Llama 4 fine-tune explicitly targeting SWE-Bench. For enterprise platform owners running multi-model pipelines, the practical question is no longer which model is best — it is which model belongs at each task tier in a hybrid routing architecture. Opus 4.7 belongs at the top for complex agent workflows. Whether it belongs at $75 per million output tokens depends on the CFO’s spreadsheet.
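For readers working through that spreadsheet, the per-token rates reported above translate into monthly cost with simple arithmetic. The sketch below is illustrative only: the Opus 4.7 rates ($15/$75 per million input/output tokens) come from this article, while the workload volumes, the task-tier names, and the model identifier strings are hypothetical placeholders, not real API values.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Dollar cost for one month of traffic at per-million-token rates."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate


def route(task_tier: str) -> str:
    """Toy task-tier router for a hybrid pipeline; tiers and model
    identifiers here are illustrative, not actual API model names."""
    tiers = {
        "complex-agent": "claude-opus-4.7",   # long-horizon agent work
        "standard": "claude-haiku-4.5",       # routine completions
    }
    return tiers.get(task_tier, "claude-haiku-4.5")


# Hypothetical workload: 2B input tokens and 400M output tokens a month
# at the Opus 4.7 rates quoted in the article.
opus_cost = monthly_cost(2_000_000_000, 400_000_000, 15.0, 75.0)
print(f"Opus 4.7: ${opus_cost:,.0f}/month")  # prints "Opus 4.7: $60,000/month"
```

Note that at these rates output tokens dominate quickly: in the example above, 400M output tokens cost as much as 2B input tokens, which is why routing only the complex-agent tier to the premium model moves the total materially.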

Lois Vance

Contributing writer at Clarqo, covering technology, AI, and the digital economy.