Nvidia Touts Breakthroughs Toward Real-Time Conversational AI -- Pure AI

By Michael Desmond
08/14/2019

Limited conversational AI services have driven applications like chatbots and voice-driven UI and search for several years. Now Nvidia has announced that it's achieved important breakthroughs in enabling real-time, conversational AI, thanks to optimizations to its AI platform.

The improvements, Nvidia said, enabled the company to sharply reduce both training and inference times when running the large version of the Bidirectional Encoder Representations from Transformers (BERT) model, a widely used advanced model for natural language processing. An Nvidia DGX SuperPOD equipped with 92 Nvidia DGX-2H systems running 1,472 Nvidia V100 GPUs completed training BERT-Large in just 53 minutes -- down from a typical training time of several days. In addition, the company said a single Nvidia DGX-2 system was able to train BERT-Large in 2.8 days, illustrating the scalability of the solution.
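
As a rough sanity check on those training figures, the short Python sketch below works out the implied speedup and scaling efficiency. It assumes 16 V100 GPUs per DGX-2 class system, which the article does not state but which matches the reported total of 1,472 GPUs. It also treats the single DGX-2 run as the baseline even though the SuperPOD used DGX-2H systems, so the result is an approximation rather than a benchmark. The function and constant names are illustrative.

```python
# Figures quoted in the article.
DGX2H_SYSTEMS = 92
SUPERPOD_MINUTES = 53
SINGLE_DGX2_DAYS = 2.8

# Assumption (not stated in the article): 16 V100 GPUs per DGX-2 class system.
GPUS_PER_SYSTEM = 16


def scaling_summary():
    """Return (total GPUs, speedup over one system, scaling efficiency)."""
    total_gpus = DGX2H_SYSTEMS * GPUS_PER_SYSTEM
    single_system_minutes = SINGLE_DGX2_DAYS * 24 * 60
    speedup = single_system_minutes / SUPERPOD_MINUTES
    # Efficiency: achieved speedup divided by the increase in system count.
    efficiency = speedup / DGX2H_SYSTEMS
    return total_gpus, speedup, efficiency


if __name__ == "__main__":
    gpus, speedup, efficiency = scaling_summary()
    print(f"Total GPUs: {gpus}")
    print(f"Speedup over a single system: {speedup:.1f}x")
    print(f"Approximate scaling efficiency: {efficiency:.0%}")
```

Under these assumptions, 92 times the hardware yields a speedup of roughly 76 times, or a little over 80 percent of ideal linear scaling.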

Nvidia also touted sharp gains in inference with the BERT-Base model on the Stanford Question Answering Dataset (SQuAD), using Nvidia T4 GPUs running TensorRT. The system performed inference in 2.2 milliseconds (ms), well below the 10 ms threshold for real-time language applications, and more than an order of magnitude faster than the 40 ms Nvidia said it measured with optimized CPU code.
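
The inference figures can be compared in the same way. The sketch below uses only the three latencies quoted above (2.2 ms on GPU, 40 ms on CPU, and the 10 ms real-time threshold) and reports the speedup and the headroom left inside the real-time budget. The helper name and return layout are hypothetical, chosen for illustration.

```python
# Latencies quoted in the article, in milliseconds.
GPU_LATENCY_MS = 2.2
CPU_LATENCY_MS = 40.0
REALTIME_BUDGET_MS = 10.0


def latency_summary(gpu_ms=GPU_LATENCY_MS, cpu_ms=CPU_LATENCY_MS,
                    budget_ms=REALTIME_BUDGET_MS):
    """Compare GPU and CPU latency against a real-time budget."""
    return {
        "speedup": cpu_ms / gpu_ms,            # how many times faster the GPU is
        "gpu_headroom_ms": budget_ms - gpu_ms,  # time left in the budget
        "gpu_within_budget": gpu_ms <= budget_ms,
        "cpu_within_budget": cpu_ms <= budget_ms,
    }


if __name__ == "__main__":
    summary = latency_summary()
    print(f"GPU speedup over CPU: {summary['speedup']:.1f}x")
    print(f"Headroom under the budget: {summary['gpu_headroom_ms']:.1f} ms")
    print(f"GPU meets budget: {summary['gpu_within_budget']}")
    print(f"CPU meets budget: {summary['cpu_within_budget']}")
```

On these numbers the GPU result is about 18 times faster than the CPU measurement and leaves close to 8 ms of the 10 ms budget for the rest of a conversational pipeline.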

The work with large language models is driving advancement toward conversational, natural-language UIs that can closely resemble human interaction, says Bryan Catanzaro, vice president of Applied Deep Learning Research at Nvidia.

"They are helping us solve exceptionally difficult language problems, bringing us closer to the goal of truly conversational AI," Catanzaro said. "Nvidia's groundbreaking work accelerating these models allows organizations to create new, state-of-the-art services that can assist and delight their customers in ways never before imagined."

Nvidia credited key optimizations to its AI platform for the performance gains. The company said it is making the software optimizations available to developers. These include:

NVIDIA GitHub BERT training code with PyTorch
NGC model scripts and checkpoints for TensorFlow
TensorRT optimized BERT Sample on GitHub
Faster Transformer: C++ API, TensorRT plugin, and TensorFlow OP
MXNet Gluon-NLP with AMP support for BERT (training and inference)
TensorRT optimized BERT Jupyter notebook on AI Hub
Megatron-LM: PyTorch code for training massive Transformer models

Nvidia singled out a number of adopters using its AI platform to drive language research and services, including Microsoft, which uses Nvidia solutions to run the BERT model to produce more accurate search results for its Bing search engine.

Rangan Majumder, group program manager for Microsoft Bing, said Azure Nvidia GPUs produced a 5X improvement in throughput and 2X reduction in latency during inference for the Bing service when compared to a CPU-based platform. Majumder said the enhancements "led to the largest improvement in ranking search quality Bing deployed in the last year."

About the Author

Michael Desmond is an editor and writer for 1105 Media's Enterprise Computing Group.
