NVIDIA Unveils GPUs to 'Form the Core of AI-Powered PCs' -- Pure AI

By John K. Waters
01/09/2024

NVIDIA, the world's leading provider of high-end graphics processing units (GPUs) used for AI and accelerated computing, launched the GeForce RTX 40 SUPER Series graphics cards this week at the annual Consumer Electronics Show (CES) in Las Vegas. The company says the new hardware is designed to supercharge the latest games and form the core of AI-powered PCs.

The new family of GPUs includes the GeForce RTX 4080 SUPER, the GeForce RTX 4070 Ti SUPER, and the GeForce RTX 4070 SUPER.

Running generative AI locally on a PC is critical for privacy-, latency-, and cost-sensitive applications, the company says. Doing so requires a large installed base of AI-ready systems, as well as the right developer tools to tune and optimize AI models for the PC platform.

"Generative AI is the single most significant platform transition in computing history and will transform every industry, including gaming," said Jensen Huang, founder and CEO of NVIDIA, in a statement. "With over 100 million RTX AI PCs and workstations, NVIDIA is a massive installed base for developers and gamers to enjoy the magic of generative AI."

The GeForce RTX 4080 SUPER generates AI video 1.5x faster, and images 1.7x faster, than the GeForce RTX 3080 Ti GPU. The Tensor Cores in SUPER GPUs deliver up to 836 trillion operations per second, bringing transformative AI capabilities to gaming, creating and everyday productivity, the company says.

Leading manufacturers, including Acer, ASUS, Dell, HP, Lenovo, MSI, Razer and Samsung, are set to release new RTX AI laptops, which will offer a full set of generative AI capabilities out of the box. The new systems, which NVIDIA says deliver a performance increase ranging from 20x to 60x compared with using neural processing units, are expected to begin shipping this month.

With this new GeForce RTX 40 SUPER Series, NVIDIA is offering tools designed to enhance PC experiences with generative AI: NVIDIA TensorRT acceleration of the popular Stable Diffusion XL model for text-to-image workflows, NVIDIA RTX Remix with generative AI texture tools, NVIDIA ACE microservices, and more games that use DLSS 3 technology with Frame Generation.

To help developers quickly create, test, and customize pretrained generative AI models and LLMs using PC-class performance and memory footprint, NVIDIA recently announced AI Workbench, a unified, easy-to-use toolkit for AI developers. AI Workbench, which will be available in beta later this month, offers streamlined access to popular repositories like Hugging Face, GitHub and NVIDIA NGC, along with a simplified user interface that enables developers to easily reproduce, collaborate on and migrate projects.

In addition, NVIDIA TensorRT-LLM (TRT-LLM), an open-source library that accelerates and optimizes inference performance of the latest large language models (LLMs), now supports more pre-optimized models for PCs. Accelerated by TRT-LLM, Chat with RTX, an NVIDIA tech demo also releasing this month, allows AI enthusiasts to interact with their notes, documents and other content.

Mobile workstations with RTX GPUs can run NVIDIA AI Enterprise software, including TensorRT and NVIDIA RAPIDS for simplified, secure generative AI and data science development. A three-year license for NVIDIA AI Enterprise is included with every NVIDIA A800 40GB Active GPU, which the company bills as an ideal workstation development platform for AI and data science.

In collaboration with HP, NVIDIA is also simplifying AI model development by integrating NVIDIA AI Foundation Models and Endpoints, which include RTX-accelerated AI models and software development kits, into the HP AI Studio, a centralized platform for data science. This will allow users to easily search, import and deploy optimized models across PCs and the cloud.

After building AI models for PC use cases, developers can optimize them using NVIDIA TensorRT to take full advantage of RTX GPUs’ Tensor Cores, the company says. NVIDIA recently extended TensorRT to text-based applications with TensorRT-LLM for Windows, an open-source library for accelerating LLMs. The latest update to TensorRT-LLM, available now, adds Phi-2 to the growing list of pre-optimized models for PC, which run up to 5x faster compared to other inference backends.

NVIDIA and its developer partners are also announcing the release of new generative AI-powered applications and services for PCs at CES, including:

NVIDIA RTX Remix, a platform for creating stunning RTX remasters of classic games. Releasing in beta later this month, it delivers generative AI tools that can transform basic textures from classic games into modern, 4K-resolution, physically based rendering materials.

NVIDIA ACE microservices, including generative AI-powered speech and animation models, which enable developers to add intelligent, dynamic digital avatars to games.

TensorRT acceleration for Stable Diffusion XL (SDXL) Turbo and latent consistency models (LCMs), two of the most popular Stable Diffusion acceleration methods. TensorRT improves performance for both by up to 60% compared with the previous fastest implementation. An updated version of the Stable Diffusion WebUI TensorRT extension is also now available, including acceleration for SDXL, SDXL Turbo and LCM-LoRA (Low-Rank Adaptation), as well as improved LoRA support.

NVIDIA DLSS 3 with Frame Generation, which uses AI to increase frame rates up to 4x compared with native rendering, will be featured in a dozen of the 14 new RTX games announced, including Horizon Forbidden West, Pax Dei, and Dragon’s Dogma 2.

Chat with RTX, an NVIDIA tech demo available later this month, allows AI enthusiasts to easily connect PC LLMs to their own data using a popular technique known as retrieval-augmented generation (RAG). The demo, accelerated by TensorRT-LLM, enables users to quickly interact with their notes, documents, and other content. It will also be available as an open-source reference project, so developers can easily implement the same capabilities in their own applications.
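For readers unfamiliar with retrieval-augmented generation, the idea can be sketched in a few lines of Python. This is an illustrative toy, not NVIDIA's Chat with RTX implementation: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the sample notes are hypothetical, retrieval uses simple word-count cosine similarity rather than a learned embedding model, and the final step of sending the prompt to a local LLM is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query (the retrieval step)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Place the retrieved text in front of the question (the augmentation step)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical personal notes standing in for a user's own data.
notes = [
    "The quarterly budget review is scheduled for Friday at 10 am.",
    "Remember to renew the domain name before March.",
    "Budget spreadsheet lives in the finance shared folder.",
]

prompt = build_prompt("When is the budget review?", notes)
print(prompt)
```

In a real system, the resulting prompt would be passed to a language model, which generates its answer from the retrieved notes rather than from its training data alone.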

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.