Nvidia has reached a valuation in the multitrillion-dollar range, surpassing tech giants like Alphabet and Amazon, thanks to its indispensable H100 AI chip. Now, it looks set to widen its lead with the introduction of the new Blackwell B200 GPU and GB200 "superchip."

During the GTC livestream, Nvidia CEO Jensen Huang showcased the new GPU, positioning it alongside the H100 for comparison. Nvidia claims the B200 delivers up to 20 petaflops of FP4 compute from its 208 billion transistors, and that the GB200, which pairs two B200 GPUs with a single Grace CPU, offers 30 times the performance of an H100 on LLM inference workloads while cutting cost and energy consumption by as much as 25x.

The company illustrates this efficiency leap by noting that training a model with 1.8 trillion parameters, which would have previously required 8,000 Hopper GPUs and 15 megawatts of power, can now be achieved with 2,000 Blackwell GPUs using just four megawatts.
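
The arithmetic behind that claim is worth spelling out; the quick sketch below simply restates Nvidia's own figures:

```python
# Back-of-the-envelope check of Nvidia's training-efficiency claim,
# using only the numbers Nvidia quoted.
hopper_gpus, hopper_mw = 8_000, 15
blackwell_gpus, blackwell_mw = 2_000, 4

print(f"GPU count ratio: {hopper_gpus / blackwell_gpus:.1f}x fewer GPUs")  # 4.0x
print(f"Power ratio:     {hopper_mw / blackwell_mw:.2f}x less power")      # 3.75x

# Per-GPU draw actually ticks up slightly; the efficiency win comes
# from doing the same job with a quarter as many chips.
print(f"Hopper:    {hopper_mw * 1000 / hopper_gpus:.2f} kW per GPU")     # 1.88 kW
print(f"Blackwell: {blackwell_mw * 1000 / blackwell_gpus:.2f} kW per GPU")  # 2.00 kW
```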

In a benchmark using the 175-billion-parameter GPT-3 LLM, Nvidia says the GB200 delivers seven times the performance of an H100 and four times the training speed.

One GB200 unit combines two GPUs and one CPU on a single board. A key advance is its second-generation transformer engine, which can represent each parameter with four bits instead of eight (FP4 rather than FP8), doubling the compute throughput, effective bandwidth, and model size the hardware can handle. When large numbers of these GPUs are linked together, a next-generation NVLink switch lets up to 576 GPUs communicate, with 1.8 terabytes per second of bidirectional bandwidth per GPU.
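
To see why halving the bits per parameter doubles what the same hardware can hold and move, here is a toy sketch. It uses simple integer quantization as a stand-in, since Nvidia's FP4 is a 4-bit floating-point format whose details aren't part of this announcement:

```python
import numpy as np

# Toy illustration of the memory arithmetic behind 4-bit weights.
# Symmetric integer quantization stands in for Nvidia's actual FP4 format.
weights = np.random.randn(1_000_000).astype(np.float32)

def quantize(x, bits):
    """Map values onto a symmetric grid of 2**bits levels."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit, 127 for 8-bit
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax), scale

q8, s8 = quantize(weights, 8)   # one byte per weight
q4, s4 = quantize(weights, 4)   # half a byte per weight, two packed per byte

print(f"8-bit storage: {weights.size / 1e6:.1f} MB")        # 1.0 MB
print(f"4-bit storage: {weights.size // 2 / 1e6:.1f} MB")   # 0.5 MB
# The same memory and bandwidth now hold twice the parameters --
# the "double the model size" claim in a nutshell.
```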

That required Nvidia to build an entirely new network switch chip, one with 50 billion transistors and 3.6 teraflops of FP8 compute on board.
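
Nvidia hasn't detailed the switch's programming model, but compute in the fabric typically means collective operations like gradient reductions can happen inside the switch rather than on the GPUs. A purely illustrative sketch of that pattern, not Nvidia's actual API:

```python
# Illustrative in-network reduction: the "switch" sums each GPU's
# gradient buffer once and broadcasts the result, instead of the GPUs
# exchanging full buffers pairwise.
def switch_allreduce(gpu_buffers: list[list[float]]) -> list[float]:
    """Sum element-wise across all GPU buffers inside the switch."""
    return [sum(vals) for vals in zip(*gpu_buffers)]

# Four GPUs, each holding a partial gradient for the same parameters.
buffers = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
print(switch_allreduce(buffers))  # [1.6, 2.0] -- every GPU gets one reduced copy
```

With the switch doing the summation, each GPU sends its buffer once and receives one result, rather than shuffling data among all of its peers.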

Nvidia's new Blackwell series adds support for both FP4 and FP6 precisions. The design aims to overcome a longstanding limitation: previously, a cluster of just 16 GPUs could spend the majority of its time communicating rather than computing.
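
A rough step-time model shows how easily communication can dominate; every constant in this sketch is an illustrative assumption, not an Nvidia figure:

```python
# Crude step-time model for data-parallel training: each step runs a
# forward/backward pass, then all-reduces the gradients.
params = 70e9                 # 70B-parameter model (assumed)
bytes_per_grad = 2            # FP16 gradients
compute_pflops = 1.0          # sustained PFLOPS per GPU (assumed)
link_gb_s = 25                # effective interconnect bandwidth per GPU (assumed)
flops_per_param_token = 6     # common transformer training estimate
tokens_per_gpu_step = 16_384  # micro-batch tokens per GPU (assumed)

compute_s = params * flops_per_param_token * tokens_per_gpu_step / (compute_pflops * 1e15)
# A ring all-reduce moves roughly 2x the gradient bytes per GPU.
comm_s = 2 * params * bytes_per_grad / (link_gb_s * 1e9)

frac = comm_s / (comm_s + compute_s)
print(f"compute {compute_s:.1f}s, comm {comm_s:.1f}s "
      f"-> {frac:.0%} of the step spent communicating")  # ~62% with these numbers
```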

Anticipating high demand, Nvidia is offering the GB200 NVL72, a liquid-cooled rack that combines 36 Grace CPUs and 72 B200 GPUs for up to 720 petaflops of AI training performance or 1.4 exaflops of inference performance. The rack contains nearly two miles of cabling spread across 5,000 individual cables.
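
Those rack-level numbers line up with the per-GPU figures quoted earlier, assuming (as a rough guess) that FP8 training throughput runs at half the FP4 rate:

```python
# Sanity check on the NVL72 figures, reusing the ~20 PFLOPS FP4
# per-B200 number quoted above; the FP8 rate being half of FP4
# is an assumption, not an Nvidia figure.
gpus = 72
inference_pflops = gpus * 20   # 1,440 PFLOPS, i.e. ~1.4 exaflops
training_pflops = gpus * 10    # 720 PFLOPS

print(f"inference: {inference_pflops / 1000:.2f} exaflops")  # 1.44
print(f"training:  {training_pflops} petaflops")             # 720
```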

Companies like Amazon, Google, Microsoft, and Oracle have already committed to including the NVL72 racks in their cloud offerings, though the specifics of these deals have not been disclosed.

Additionally, Nvidia offers the DGX SuperPOD for DGX GB200, which combines eight of these systems into one, for a total of 288 CPUs, 576 GPUs, and 240TB of memory delivering 11.5 exaflops of FP4 compute.
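
The SuperPOD figures, in turn, follow from simply stacking eight of those systems:

```python
# DGX SuperPOD totals from stacking eight NVL72-class systems,
# reusing the per-GPU FP4 figure from above.
systems = 8
cpus = systems * 36               # 288 Grace CPUs
gpus = systems * 72               # 576 B200 GPUs
fp4_exaflops = gpus * 20 / 1000   # 11.52, matching the quoted ~11.5

print(f"{cpus} CPUs, {gpus} GPUs, {fp4_exaflops:.2f} exaflops FP4")
```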

Nvidia says its infrastructure can scale to support tens of thousands of GB200 superchips, networked at 800Gbps over either InfiniBand or Ethernet.

While today's announcements from Nvidia's GPU Technology Conference focus primarily on computing and AI advancements, the underlying Blackwell GPU architecture may eventually power Nvidia's next-generation RTX 50-series for gamers.