
Nvidia’s GB300 Blackwell Ultra Enters Mass Production, Ships to Hyperscalers

Nvidia has confirmed that its GB300 “Blackwell Ultra” GPU — the follow-on to the B200 that began shipping to cloud providers in late 2025 — has entered volume production at TSMC’s N3P process node and begun shipping to priority hyperscaler customers. Microsoft Azure, Google Cloud, and Meta’s AI infrastructure division are receiving initial allocations in April, with broader availability planned for Q3 2026 (Nvidia partner briefing, April 2026). The chip extends the Blackwell architecture with a new memory configuration and improved NVLink fabric that Nvidia says will meaningfully reduce the cost of training frontier AI models at scale.

What Changed in GB300

The GB300 retains the B200's compute die but introduces three significant changes:

Memory capacity and bandwidth. The B200 ships with 192GB of HBM3e at 8 TB/s of memory bandwidth. The GB300 upgrades to 288GB of HBM3e and pushes bandwidth to 10 TB/s: a 50% increase in capacity and a 25% increase in bandwidth. Memory capacity has been the binding constraint for running the largest frontier models without model parallelism across multiple chips, and the jump to 288GB materially raises the threshold at which a single GPU can hold a complete model in memory.
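The capacity threshold can be sketched with back-of-envelope arithmetic. This assumes FP8 weights (1 byte per parameter) and a 20% memory reservation for KV cache and activations; the headroom figure is an illustrative assumption, not an Nvidia specification.

```python
def max_params_billions(hbm_gb: float, bytes_per_param: float = 1.0,
                        headroom: float = 0.20) -> float:
    """Parameters (in billions) that fit in HBM after reserving headroom.

    Assumes FP8 weights (1 byte/param) and a 20% reservation for KV cache
    and activations -- illustrative assumptions, not published specs.
    """
    usable_bytes = hbm_gb * 1e9 * (1 - headroom)
    return usable_bytes / bytes_per_param / 1e9

for name, hbm_gb in [("B200", 192), ("GB300", 288)]:
    print(f"{name} ({hbm_gb}GB): ~{max_params_billions(hbm_gb):.0f}B params at FP8")
```

Under these assumptions, the ceiling for a single-GPU model rises from roughly 154B to roughly 230B parameters, which is what "materially raises the threshold" amounts to in practice.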

NVLink 6. The GB300 introduces the sixth generation of Nvidia’s high-speed chip-to-chip interconnect, with a per-port bandwidth of 900 GB/s (bidirectional), up from 800 GB/s in NVLink 5. In a DGX GB300 system with 8 GPUs, the aggregate NVLink bandwidth within the node reaches 57.6 TB/s, roughly the combined memory bandwidth of seven B200 GPUs. This matters most for model-parallel workloads, where tensors must move between GPUs repeatedly during a forward pass.
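The aggregate figure can be reconstructed from the per-port number. Note that the port count per GPU below is inferred from the quoted totals (57.6 TB/s across 8 GPUs implies 7.2 TB/s, or eight 900 GB/s ports, per GPU); it is not a published spec.

```python
PORT_BW_TBPS = 0.9    # NVLink 6 per-port, bidirectional (900 GB/s)
GPUS_PER_NODE = 8     # DGX GB300 node
PORTS_PER_GPU = 8     # assumption: inferred from the 57.6 TB/s aggregate

per_gpu_tbps = PORT_BW_TBPS * PORTS_PER_GPU     # 7.2 TB/s per GPU
aggregate_tbps = per_gpu_tbps * GPUS_PER_NODE   # node-level aggregate
b200_hbm_tbps = 8.0                             # B200 memory bandwidth per GPU

print(f"aggregate NVLink: {aggregate_tbps:.1f} TB/s")
print(f"= memory bandwidth of ~{aggregate_tbps / b200_hbm_tbps:.1f} B200 GPUs")
```

Dividing the 57.6 TB/s aggregate by a single B200's 8 TB/s of memory bandwidth gives the "roughly seven B200 GPUs" comparison.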

Inference throughput. Nvidia is citing 1.5x the inference tokens per second of the B200 on standard benchmarks running dense transformer models at FP8 precision. For inference-heavy deployments — serving conversational AI at scale, powering coding agents, or running multimodal pipelines — this translates directly to a lower cost per token, which drives the economics of large-scale AI deployment.

The Supply Picture

The Blackwell launch in late 2025 was constrained by two factors: initial yield issues at TSMC on the new packaging process, and NVLink switch module supply. Both have since eased. The GB300 benefits from a mature N3P process that TSMC has now run at volume across multiple tape-outs, and the NVLink switch redesign that delayed the B200 NVL72 rack configurations has been resolved in production, according to supply chain sources cited by The Information (April 22, 2026).

Initial allocation is skewed toward customers who committed to forward purchases — Microsoft, Google, and Meta collectively account for approximately 65% of Q2 allocation, consistent with the purchase commitments disclosed in their respective earnings reports over the past two quarters. AWS is expected to receive GB300 inventory in Q3 as part of a separate supply agreement, following its decision to accelerate its own Trainium chip program while hedging with continued Nvidia purchases.

Pricing for the GB300 is not publicly disclosed, but industry sources estimate it carries a 15–20% premium over the B200 list price of approximately $35,000 per GPU, placing it in the $40,000–$42,000 range for on-contract hyperscaler pricing.
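Combining the estimated price premium with the cited 1.5x inference throughput gives a rough sense of the per-token economics. This amortizes only the GPU purchase price and ignores power, networking, and depreciation schedules, so it is a simplification, not a TCO model.

```python
B200_LIST_PRICE = 35_000   # USD, approximate B200 list price from the article
THROUGHPUT_GAIN = 1.5      # GB300 vs B200 inference tokens per second

for premium in (0.15, 0.20):
    gb300_price = B200_LIST_PRICE * (1 + premium)
    # Cost per token scales with price paid divided by tokens delivered.
    relative_cost = (1 + premium) / THROUGHPUT_GAIN
    print(f"{premium:.0%} premium: ~${gb300_price:,.0f} per GPU, "
          f"cost per token ~{relative_cost:.0%} of the B200's")
```

Even at the top of the estimated premium range, the hardware cost per token works out to roughly 80% of the B200's, which is why a higher sticker price can still lower serving costs.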

Why the Timing Matters

The GB300 arrives as the industry’s compute demand forecasts are being revised upward for the third consecutive quarter. OpenAI’s API traffic reportedly grew 3x between Q3 2025 and Q1 2026. Google disclosed 16 billion tokens per minute in TPU API throughput at Cloud Next last week. Anthropic, in its Series F documentation, cited inference compute costs as its primary operating expense.

The confluence of agentic AI workloads — which run extended context windows and chain many model calls — and multimodal pipelines means that the effective compute demand per user interaction is rising faster than raw user growth. In that environment, a 1.5x throughput improvement at comparable power is not incremental: it allows operators to serve 50% more traffic from the same physical footprint.

For Nvidia, the GB300 also serves a strategic function: sustaining pricing power and demand ahead of the GB400 NVL144 rack — Nvidia’s next major architecture step — which is expected to sample in Q4 2026 and enter production in 2027.

Lois Vance

Contributing writer at Clarqo, covering technology, AI, and the digital economy.