Apple’s biggest AI moment of 2026 may still be six weeks away, but the pre-WWDC leak season has already started. Developer documentation fragments and supply-chain sources reported by Bloomberg and The Information this week suggest Apple Intelligence 2.0—expected to headline WWDC on June 8—will ship a substantially upgraded on-device language model capable of handling tasks currently routed to cloud APIs.

The Benchmark Claims

According to internal documents reviewed by Bloomberg’s Mark Gurman, Apple’s new foundation model—codenamed “Ferret-2” internally—scores within 8% of OpenAI’s GPT-4o-mini on the MMLU reasoning benchmark while running entirely on the Neural Engine of M4-series chips. On coding tasks measured by HumanEval, the model reportedly reaches 78% of GPT-4o-mini’s score with no network round-trip required.

Apple processed more than 2.1 billion on-device AI requests per day in March 2026, a figure the company disclosed on its Q1 2026 earnings call. That number underscores both the scale of the existing deployment and Apple’s motivation to reduce dependence on OpenAI’s servers, for which it continues to pay under a revenue-sharing arrangement reportedly worth $700 million annually.

Why On-Device Matters for Privacy and Latency

The strategic logic is straightforward: cloud inference introduces latency, cost, and data-exposure risk. Apple’s existing Private Cloud Compute architecture routes sensitive requests to hardened servers, but the company has consistently positioned on-device processing as the privacy gold standard.

For enterprise customers—who account for a disproportionate share of MacBook and iPhone fleet purchases—the ability to process sensitive documents, emails, and code locally, without any data leaving the device, removes a significant procurement objection. Jamf, one of Apple’s largest enterprise deployment partners, told analysts this month that “on-device AI assurance” is now a top-five evaluation criterion for large fleet decisions.

Competitive Pressure From Google and Microsoft

The leak lands at a competitive moment. Google expanded Gemini Nano’s on-device capabilities on Pixel 9 devices in February 2026, and Microsoft’s Phi-4 small model now ships embedded in Copilot+ PCs, handling offline tasks across Windows AI features. Apple’s reported benchmark improvements—if accurate—would represent a meaningful narrowing of the gap between what’s possible on a $999 iPhone versus a $20-per-month cloud subscription.

Analysts at Morgan Stanley estimate that a successful Apple Intelligence 2.0 rollout could reduce Apple’s OpenAI API spend by 35–40% within 12 months, improving Services segment margins by roughly 90 basis points.

What’s Still Unknown

The documentation fragments don’t address multimodal performance—image and video understanding—which remains a relative weakness in Apple’s current model. Nor do they clarify whether the upgraded model will run on older A-series chips, which still represent the majority of active iPhones globally.

Apple declined to comment on the reported benchmarks. WWDC 2026 opens June 8.

Lois Vance

Contributing writer at Clarqo, covering technology, AI, and the digital economy.