Nvidia Touts Breakthroughs Toward Real-Time Conversational AI

Limited conversational AI services have driven applications like chatbots and voice-driven UI and search for several years. Now Nvidia has announced that it's achieved important breakthroughs in enabling real-time, conversational AI, thanks to optimizations to its AI platform.

The improvements, Nvidia said, enabled the company to sharply reduce both training and inference times when running the large version of the Bidirectional Encoder Representations from Transformers (BERT) model, which is a widely-used advanced model for natural language processing. An NVidia DGX SuperPOD equipped with 92 Nvidia DGX-2H systems running 1,472 Nvidia V100 GPUs completed training BERT-Large in just 53 minutes -- down from a typical training time of several days. In addition, the company said a single Nvidia DGX-2 system was able to train BERT-Large in 2.8 days, illustrating the scalability of the solution.

Nvidia also touted sharp gains in inferencing on the BERT-Base Stanford Question Answering Dataset (SQuAD) using Nvidia T4 GPUs running TensorRT. The system performed inference in 2.2 milliseconds (ms), well below the 10ms threshold for real-time language applications, and more than an order of magnitude faster than the 40ms Nvidia said it measured with optimized CPU code.

The work with large language models is driving advancement toward conversational, natural-language UIs that can closely resemble human interaction, says Bryan Catanzaro, vice president of Applied Deep Learning Research at Nvidia.

"They are helping us solve exceptionally difficult language problems, bringing us closer to the goal of truly conversational AI," Catanzaro said. "Nvidia's groundbreaking work accelerating these models allows organizations to create new, state-of-the-art services that can assist and delight their customers in ways never before imagined."

NVIDIA cited key optimizations to its AI platform for the performance gains. The company said it is making the software optimizations available to developers. These include:

Nvidia singled out a number of adopters using its AI platform to drive language research and services, including Microsoft, which uses Nvidia solutions to run the BERT model to produce more accurate search results for its Bing search engine.

Rangan Majumder, group program manager for Microsoft Bing, said Azure Nvidia GPUs produced a 5X improvement in throughput and 2X reduction in latency during inference for the Bing service when compared to a CPU-based platform. Majumder said the enhancements "led to the largest improvement in ranking search quality Bing deployed in the last year."

About the Author

Michael Desmond is an editor and writer for 1105 Media's Enterprise Computing Group.