Two years ago, running GPT-4 cost developers roughly $30 per million tokens. Today, equivalent intelligence can be purchased for less than a dollar per million tokens, and in some cases for fractions of a cent. The collapse in AI pricing is one of the fastest cost deflations in the history of commercial software, and it is reshaping how businesses are built, staffed, and valued.
How the Floor Fell Out
The price war began in earnest in late 2024, when DeepSeek, a Chinese AI lab, released open-weight models that matched frontier performance despite being trained at a fraction of what its rivals had spent. The move forced every major provider to reconsider its pricing.
Anthropic cut Claude prices by 67% in a single announcement. OpenAI followed. Google made Gemini Flash available at near-zero marginal cost. The implicit gentlemen’s agreement to keep frontier pricing high had broken down, and it was not coming back.
The story did not end at the API layer. Engineering improvements compounded the effect. Techniques like quantization, speculative decoding, KV-cache optimization, and batching efficiency have allowed the same underlying model to be served three to five times more efficiently than was possible eighteen months ago — through software alone, with no new hardware required.
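To make one of those serving optimizations concrete, here is a minimal sketch of post-training weight quantization: mapping fp32 weights to int8 with a single per-tensor scale, which cuts the memory footprint of the weights by 4x. Production stacks use far more sophisticated schemes; the sizes, shapes, and scale rule here are purely illustrative.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights to int8 using one per-tensor scale factor."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate fp32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)  # toy weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("fp32 bytes:", w.nbytes)   # 4 bytes per weight
print("int8 bytes:", q.nbytes)   # 1 byte per weight, 4x smaller
print("max abs error:", float(np.abs(w - w_hat).max()))
```

The rounding error per weight is bounded by half the scale, which is why a well-chosen scale keeps quality loss small while the stored model shrinks fourfold. Speculative decoding, KV-cache reuse, and batching attack latency and throughput rather than memory, but the pattern is the same: more tokens served per dollar of hardware.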
Winners, Losers, and the Startup Equation
For startups, the pricing collapse is almost unambiguously good news. Building an AI-native product in 2024 meant projecting scary inference bills into every financial model. Building one in 2026 means treating AI compute as a line item closer to cloud storage than to specialized hardware.
The losers are primarily incumbents who built pricing moats around API access. Middleware companies that charged a premium for “managed” access to frontier models — with no proprietary data or workflow on top — are facing existential pressure as the underlying commodity approaches zero.
For enterprise buyers, the dynamic is more nuanced. A 97% price drop means the argument that AI is too expensive to deploy at scale has evaporated. The new objection is risk: hallucinations, data sovereignty, regulatory compliance, and the organizational change required to embed AI into production workflows. The bottleneck has shifted from cost to competence.
The Open-Source Wildcard
What makes this price war structurally different from previous commodity technology cycles is the presence of high-quality open-source alternatives. Llama 3, Qwen, Mistral, and DeepSeek variants now match or approach GPT-4 on most standard benchmarks. For organizations willing to run their own inference infrastructure, the marginal API cost is not merely low; it is zero.
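The self-hosting economics can be sketched with a back-of-the-envelope calculation: the effective cost per million tokens on a rented GPU is just the hourly rate divided by hourly token throughput. The dollar and throughput figures below are illustrative assumptions, not quoted prices, and the comparison ignores engineering overhead.

```python
# Effective cost of self-hosted inference at full utilization.
# All numeric inputs below are illustrative assumptions.

def cost_per_million_tokens(gpu_dollars_per_hour: float,
                            tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens on a fully utilized GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000

# Same GPU, same hourly rate: single-stream decoding vs. heavily
# batched serving. Throughput, not hardware, drives the unit cost.
print(f"unbatched: ${cost_per_million_tokens(2.0, 100):.2f} per 1M tokens")
print(f"batched:   ${cost_per_million_tokens(2.0, 2000):.2f} per 1M tokens")
```

Under these assumptions, batching alone moves the unit cost by a factor of twenty, which is why self-hosting only beats commodity API pricing at high, sustained volume: an idle or poorly batched GPU is far more expensive per token than it looks.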
The gap between open-source and proprietary is narrowing faster than most analysts predicted. Open-weight models typically lag the frontier by six to eighteen months. That window is compressing. If the trend continues, the question for commercial AI providers becomes less “how do we price our models?” and more “what do we provide that a fine-tuned open-weight model cannot?”
The answer, for now, seems to converge on three things: multimodal capabilities at the edge of the frontier, deep tool integration and ecosystem lock-in, and trust — the implicit guarantee that a commercially backed model will not go rogue, get discontinued, or expose training data in a lawsuit. Whether those moats hold as open-source quality improves is the defining strategic question of the next two years.
The prices are down. The stakes are not.