News
Nvidia's Jensen Huang Unveils Hopper Successor 'Blackwell' at GTC
- By John K. Waters
- 03/19/2024
Nvidia's Chief Executive Officer Jensen Huang kicked off his company's annual GPU Technology Conference (GTC) yesterday with a keynote that introduced the Blackwell platform, announced a host of expanded industry partnerships, and showcased the platform's capabilities.
Taking the stage at the SAP Center in San Jose, CA, in his signature leather jacket, and aided by a massive backdrop of dramatic deconstructing animations, Huang explained Blackwell in detail to a standing-room-only crowd.
"I hope you realize this is not a concert," Huang said, joking with his audience.
Named in honor of mathematician David Harold Blackwell, the Blackwell B200 GPU architecture comprises six technologies combined to create a platform—"not a chip," Huang emphasized—that will drive accelerated computing. Blackwell is faster and more efficient than its predecessor, Hopper, the H100 chip named for computer scientist Grace Hopper, Huang said. It can enable AI training and real-time LLM inference for models scaling up to 10 trillion parameters, he said.
"Hopper is fantastic, but we need bigger GPUs," Huang said.
The Blackwell platform comprises the following technologies:
- Blackwell-architecture GPUs pack 208 billion transistors and are manufactured using a custom-built 4NP TSMC process, with two reticle-limit GPU dies connected by a 10 TB/second chip-to-chip link into a single, unified GPU. The B200 will be available in several options, including as part of the GB200 "superchip," which pairs two Blackwell GPUs with one Grace CPU. The GB200 is a key component of the NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale system that comprises 600,000 parts, including 72 Blackwell GPUs. It's designed to deliver 720 petaflops of training performance and 1.4 exaflops of inference performance. (And it reportedly weighs 3,000 pounds.) A back-of-the-envelope check of those rack-level figures appears after this list.
- The second-generation transformer engine is fueled by new micro-tensor scaling support and Nvidia's advanced dynamic-range management algorithms, integrated into its TensorRT-LLM and NeMo Megatron frameworks. NeMo is an end-to-end platform for developing custom generative AI anywhere. TensorRT-LLM is an open-source library that accelerates and optimizes inference performance of the latest large language models (LLMs) on the Nvidia AI platform; a minimal usage sketch appears after this list.
- NVLink is Nvidia's high-speed GPU interconnect, designed to provide a faster alternative to traditional PCIe-based solutions for multi-GPU systems. Connecting two Nvidia graphics cards with NVLink enables scaling of memory and performance to meet the demands of large visual computing workloads. The fifth-generation NVLink was designed to accelerate performance for multitrillion-parameter and mixture-of-experts AI models. The latest iteration delivers 1.8 TB/s of bidirectional throughput per GPU, ensuring seamless high-speed communication among up to 576 GPUs for the most complex LLMs.
- Blackwell-powered GPUs include a dedicated engine for reliability, availability, and serviceability (RAS). The Blackwell architecture also adds chip-level capabilities that use AI-based preventative maintenance to run diagnostics and forecast reliability issues. The aim is to maximize system uptime, allow massive-scale AI deployments to run uninterrupted for weeks or even months at a time, and reduce operating costs.
- Built-in advanced confidential computing capabilities are included to protect AI models and customer data without compromising performance. That's accomplished with support for new native interface encryption protocols, which are considered critical for privacy-sensitive industries, such as healthcare and financial services.
- A dedicated decompression engine supports the latest formats, accelerating database queries to deliver high performance in data analytics and data science. Nvidia believes that, in the future, data processing, on which companies spend tens of billions of dollars annually, will be increasingly GPU-accelerated.
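For a rough sense of what the GB200 NVL72's rack-level numbers imply per GPU, the short Python sketch below divides the quoted 720 petaflops of training and 1.4 exaflops of inference across the rack's 72 Blackwell GPUs. This is only an illustrative back-of-the-envelope calculation based on the figures above, not an official Nvidia per-GPU specification.

```python
# Back-of-the-envelope check of the GB200 NVL72 figures quoted above.
# Illustrative arithmetic only; not an official Nvidia per-GPU spec.

GPUS_PER_RACK = 72                 # Blackwell GPUs in one NVL72 rack
TRAINING_PFLOPS_PER_RACK = 720     # quoted training throughput, in petaflops
INFERENCE_PFLOPS_PER_RACK = 1400   # quoted inference throughput (1.4 exaflops)

training_per_gpu = TRAINING_PFLOPS_PER_RACK / GPUS_PER_RACK
inference_per_gpu = INFERENCE_PFLOPS_PER_RACK / GPUS_PER_RACK

print(f"Training:  ~{training_per_gpu:.0f} petaflops per GPU")   # ~10
print(f"Inference: ~{inference_per_gpu:.0f} petaflops per GPU")  # ~19
```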
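And to make the TensorRT-LLM item concrete, here is a minimal inference sketch using the library's high-level Python LLM API. The model name is just a placeholder, and class names and arguments can differ between TensorRT-LLM versions, so treat this as an assumption-laden illustration rather than canonical usage.

```python
# Minimal TensorRT-LLM inference sketch. Assumes the high-level Python
# LLM API; exact class names and arguments may vary by library version.
from tensorrt_llm import LLM, SamplingParams

# Placeholder Hugging Face model ID; any supported checkpoint works.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

prompts = ["Summarize the Blackwell platform announcement in two sentences."]
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# TensorRT-LLM builds an optimized engine for the target GPU and runs
# the generation requests against it.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

As the bullet above notes, this library is also where Nvidia integrates the dynamic-range management algorithms behind the second-generation transformer engine.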
"Established enterprise platforms are sitting on a gold mine of data that can be transformed into generative AI co-pilots," Huang told the crowd. "Created with our partner ecosystem, these containerized AI microservices are the building blocks for enterprises in every industry to become AI companies."
The new Nvidia Blackwell-based products are expected to be available later this year from Nvidia's partners, including Amazon Web Services, Dell Technologies, Google, Meta, Microsoft, OpenAI, and Tesla.
"Generative AI is critical to creating smarter, more reliable and efficient systems," said Michael Dell, founder and CEO of Dell Technologies, in a statement. "Dell Technologies and NVIDIA are working together to shape the future of technology. With the launch of Blackwell, we will continue to deliver the next-generation of accelerated products and services to our customers, providing them with the tools they need to drive innovation across industries."
Dubbed the "Woodstock of AI" by employees and analysts, the GTC conference runs through Thursday. Event organizers are expecting to draw more than 16,000 on-site attendees, with approximately 300,000 participants joining online. This was Jensen's first time on stage in five years.
"As I was simulating how this keynote was going to turn out, somebody did say that another performer did her performance completely on a treadmill so that she could be in shape to deliver it with full energy" Huang said, referring to reports of pop star Taylor Swift's pre-concert prep. "I didn't do that. If I get a low wind at about 10 minutes into this, you know what happened."
About the Author
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI, and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at jwaters@converge360.com.