VAST Data Redesigns Inference Infrastructure for AI Agents with NVIDIA DPUs

VAST Data has unveiled a redesigned inference architecture to support the emerging class of long-lived, multi-agent artificial intelligence systems. The infrastructure centers on the company’s AI Operating System (AI OS), which now runs natively on NVIDIA BlueField-4 DPUs, eliminating traditional storage bottlenecks and enabling deterministic access to shared memory for inference at scale.

The announcement marks a step toward production-grade infrastructure for agentic AI, where reasoning tasks increasingly require persistent context to be stored, shared, and reused across distributed systems. VAST’s new platform is integrated with NVIDIA’s Spectrum-X Ethernet networking and is designed to serve as the foundation for the NVIDIA Inference Context Memory Storage Platform.

As AI systems shift from single prompts to continuous, multi-turn dialogue, the cost of retrieving and maintaining inference context becomes critical. VAST’s approach moves key-value (KV) cache storage and access directly into the data path, reducing latency and improving concurrency by eliminating classic client-server delays.
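The economics behind this can be illustrated with a toy sketch of KV-cache reuse. In multi-turn inference, a model that can retrieve a session's cached key-value states only pays the expensive "prefill" cost for the new tokens of each turn, rather than recomputing over the entire conversation. All names, the cache-store class, and the per-token cost below are invented for illustration; this is not VAST's or NVIDIA's API.

```python
# Toy model of KV-cache reuse across turns of one inference session.
# PREFILL_COST_PER_TOKEN is a made-up stand-in for GPU prefill time.
PREFILL_COST_PER_TOKEN = 0.001  # pretend seconds per token

class KVCacheStore:
    """Hypothetical stand-in for a shared context-memory tier."""
    def __init__(self):
        self._store = {}  # session_id -> cached per-token "states"

    def get(self, session_id):
        return self._store.get(session_id, [])

    def put(self, session_id, states):
        self._store[session_id] = states

def prefill(tokens, cached_states):
    """Only tokens beyond the cached prefix pay the prefill cost."""
    new_tokens = tokens[len(cached_states):]
    cost = len(new_tokens) * PREFILL_COST_PER_TOKEN
    return cached_states + new_tokens, cost

store = KVCacheStore()
history = list(range(4000))  # 4,000-token conversation so far

# Turn 1: nothing cached, so the full history is prefetched and computed.
states, cost_cold = prefill(history, store.get("sess-1"))
store.put("sess-1", states)

# Turn 2: 100 new tokens arrive; the cached prefix is reused.
history += list(range(100))
states, cost_warm = prefill(history, store.get("sess-1"))
store.put("sess-1", states)

print(f"cold prefill: {cost_cold:.2f}s, warm prefill: {cost_warm:.2f}s")
```

In this toy model the warm turn costs a fortieth of the cold one; the architectural point in the article is that making such a cache a shared, low-latency service (rather than memory pinned to one GPU server) lets any node in a distributed agent system claim that saving.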

By embedding its AI OS on NVIDIA’s BlueField-4 hardware, VAST positions key data services within GPU servers and dedicated data nodes. Combined with its Disaggregated Shared-Everything (DASE) architecture, the system creates a globally coherent context layer that can be accessed at line rate without complex coordination. The result is a reduction in time-to-first-token delays and more efficient utilization of GPU resources.

"Inference is becoming a memory system, not a compute job," said John Mao, Vice President, Global Technology Alliances at VAST Data, in a statement. "The winners won’t be the clusters with the most raw compute, they’ll be the ones that can move, share, and govern context at line rate. Continuity is the new performance frontier. If context isn’t available on demand, GPUs idle and economics collapse. With the VAST AI Operating System on NVIDIA BlueField-4, we’re turning context into shared infrastructure – fast by default, policy-driven when needed, and built to stay predictable as agentic AI scales."

The design enables consistent performance even as inference sessions grow in size and complexity. VAST said the platform is built to handle inference memory as a shared resource, offering policy controls, auditability, and lifecycle management to support regulated and revenue-generating AI workloads.

"Context is the fuel of thinking. Just like humans that write things down to remember them, AI agents need to save their work so they can reuse what they’ve learned," said Kevin Deierling, Senior Vice President of Networking, NVIDIA. "Multi-turn and multi-user inferencing fundamentally transform how context memory is managed at scale. VAST Data AI OS with NVIDIA BlueField-4 enables the NVIDIA Inference Context Memory Storage Platform and a coherent data plane designed for sustained throughput and predictable performance as agentic workloads scale."

VAST and NVIDIA said the system gives AI-native organizations a foundation for scaling inference without rebuild delays or underutilized GPU clusters, and supports emerging demands for high-throughput, policy-driven context-memory sharing in AI factories.

The companies will showcase the new architecture at VAST Forward, the company’s inaugural user conference, scheduled for February 24–26, 2026, in Salt Lake City, Utah.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].