Nvidia's New Datacenter GPU Boosts AI Workloads
- By John K. Waters
Nvidia has begun shipping the first GPUs based on its next-generation Ampere architecture, the company announced recently.
The Nvidia A100 delivers as much as a 20x performance boost for AI training and inference over the preceding Volta architecture, the company claimed. Nvidia CEO Jensen Huang made the announcement during the online edition of Nvidia's GPU Technology Conference 2020. It was Huang's first "kitchen keynote," delivered from his home.
The A100 offers the company's largest performance leap to date within its eight generations of GPUs, which was enabled by five innovations the company underscored in its announcement. The Ampere-based A100 contains more than 54 billion transistors, making it the world's largest 7-nanometer processor. It comes with third-generation Tensor Cores, which account for that 20x AI performance boost. (Tensor Cores also now support FP64, delivering up to 2.5x more compute than the previous generation for HPC applications.) A new technical feature, Multi-Instance GPU (MIG), enables a single A100 GPU to be partitioned into as many as seven separate GPUs, allowing it to deliver varying degrees of compute for jobs of different sizes. The third-gen Nvidia NVLink doubles the high-speed connectivity between GPUs, the company said, to provide efficient performance scaling in a server. And a new efficiency technique, called structural sparsity, harnesses the inherently "sparse" nature of AI math to double performance, the company said.
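To make the sparsity idea concrete: the announcement does not spell out the pattern, but Nvidia's A100 documentation describes it as 2:4 fine-grained structured sparsity, in which two of every four consecutive weights are zero. The following is a minimal sketch under that assumption. It is plain Python meant only to illustrate the pattern, not Nvidia's implementation, and the function names `prune_2_of_4` and `sparse_dot` are hypothetical.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def sparse_dot(pruned, activations):
    """Dot product that skips zeroed weights and counts the multiplications."""
    total, multiplies = 0.0, 0
    for w, a in zip(pruned, activations):
        if w != 0.0:
            total += w * a
            multiplies += 1
    return total, multiplies


# Illustrative values only (not taken from any real model).
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

pruned = prune_2_of_4(weights)
result, multiplies = sparse_dot(pruned, activations)
print(pruned)
print(multiplies, "of", len(weights), "multiplications performed")
```

Because exactly half of the weights in every group are zero, half of the multiplications can be skipped, which is the intuition behind the claimed doubling. On real hardware the savings would come from dedicated Tensor Core support rather than a Python branch.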
Even though Huang's announcements focused on hardware, he took pains to acknowledge the critical importance of the developers who build software to run on it, and their vital role in the company's "accelerated computing" strategy.
"Ultimately the most important part of [Nvidia's] accelerated computing is the developer," he said. "Developers optimize their applications, which increases the performance and the value of the platform, which attracts customers and increases the installed base, which attracts other developers. The positive feedback system grows, and it is now very clear that Nvidia's accelerated computing platform is at its tipping point."
Nvidia announced several updates to its software stack that make it possible for developers to take advantage of the A100 GPU's innovations. Those updates include new versions of more than 50 CUDA-X libraries used to accelerate graphics, simulation, and AI; CUDA 11; Nvidia Jarvis, a multimodal, conversational AI services framework; Nvidia Merlin, a deep recommender application framework; and the Nvidia HPC SDK, which includes compilers, libraries and tools that help HPC developers debug and optimize their code for A100.
Santa Clara, CA-based Nvidia has grown in a few short years into a multi-billion-dollar company by marketing its graphics processing units (GPUs) to datacenter operators as the right silicon for processing the flood of data demanded by a new generation of AI-oriented applications. Coming on the heels of the company's recent acquisition of high-performance networking company Mellanox, this new collection of software-acceleration libraries holds great promise for that market.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at firstname.lastname@example.org.