At GTC 2026, NVIDIA, AWS, and Google Cloud Shift Focus from Chips to AI Infrastructure
NVIDIA, Amazon Web Services, and Google Cloud used GTC 2026 to make a broader point about the AI market: the race is no longer only about who has the most advanced chips, but about who can turn those chips into usable cloud infrastructure for training, inference, and large-scale deployment.
The announcements from AWS and Google Cloud suggested that NVIDIA’s conference has become as much a showcase for cloud architecture as for silicon. Both companies focused on how NVIDIA hardware is packaged with networking, virtualization, orchestration, and managed services for enterprises looking to move AI projects into production.
AWS said it plans to deploy more than 1 million NVIDIA GPUs across its cloud regions starting in 2026, including systems based on the Blackwell and Vera Rubin architectures. It also announced Amazon EC2 instance support for NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs. Separately, AWS said it is adding NVIDIA's Inference Xfer Library (NIXL) to AWS Elastic Fabric Adapter to improve disaggregated large language model inference across NVIDIA GPUs and AWS Trainium systems.
The company also used the event to tie AI infrastructure more closely to its broader cloud stack. AWS said it can deliver 3x faster Apache Spark performance using Amazon EMR on EKS with Amazon EC2 G7e instances powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. It also said support for NVIDIA Nemotron models is expanding in Amazon Bedrock, including upcoming reinforcement fine-tuning and planned availability of Nemotron 3 Super.
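AWS did not detail how the Spark speedup is achieved, but GPU acceleration of Spark SQL typically goes through NVIDIA's RAPIDS Accelerator plugin. As a rough, hypothetical sketch, a Spark configuration enabling that plugin on a GPU-backed cluster might look like the following (the property values are illustrative assumptions, not AWS-published settings for EMR on EKS):

```properties
# Hypothetical spark-defaults.conf fragment: enables NVIDIA's RAPIDS
# Accelerator for Apache Spark. Values are illustrative assumptions,
# not settings published by AWS.
spark.plugins                        com.nvidia.spark.SQLPlugin
spark.rapids.sql.enabled             true
# One GPU per executor, shared across 8 concurrent tasks
spark.executor.resource.gpu.amount   1
spark.task.resource.gpu.amount       0.125
```

In this model, the plugin transparently rewrites eligible Spark SQL operations to run on the GPU, which is one plausible path to the kind of speedup AWS described.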
Google Cloud took a somewhat different approach, leaning into flexibility and software integration. The company said it is previewing fractional G4 virtual machines using NVIDIA virtual GPU technology, giving customers access to 1/2, 1/4, and 1/8 GPU configurations for workloads ranging from inference and rendering to remote desktops and streaming.
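The fractional tiers turn slice selection into a simple bin-fit decision: pick the smallest offered fraction that covers a workload's estimated share of a full GPU. A minimal sketch, assuming only the announced 1/8, 1/4, and 1/2 tiers plus a full GPU (the workload estimates in the usage comments are hypothetical, not from the announcement):

```python
# Pick the smallest offered G4 slice that covers a workload's estimated
# share of a full GPU. The tier list mirrors Google's announced 1/8, 1/4,
# and 1/2 fractional configurations; everything else is illustrative.
FRACTIONS = (0.125, 0.25, 0.5, 1.0)

def smallest_slice(required_share: float) -> float:
    """Return the smallest GPU fraction that covers `required_share`."""
    if not 0 < required_share <= 1:
        raise ValueError("share must be in (0, 1]")
    for fraction in FRACTIONS:
        if fraction >= required_share:
            return fraction

# A remote-desktop session estimated at ~10% of a GPU fits a 1/8 slice;
# a rendering job estimated at ~40% needs a 1/2 slice.
print(smallest_slice(0.10))  # 0.125
print(smallest_slice(0.40))  # 0.5
```

The appeal for customers is cost granularity: lighter workloads such as streaming or remote desktops no longer have to pay for a whole GPU.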
Google also said it is integrating NVIDIA Dynamo with GKE Inference Gateway, a move aimed at improving how AI workloads are managed across Kubernetes infrastructure. Looking further ahead, the company said it plans to be among the first cloud providers to offer NVIDIA Vera Rubin NVL72 rack-scale systems in the second half of 2026 as part of its AI Hypercomputer platform.
Taken together, the announcements pointed to a maturing market in which cloud providers are trying to differentiate not just on raw compute supply, but on how efficiently that compute can be consumed. That includes better interconnects for inference, more granular access to GPUs, software designed to reduce bottlenecks, and managed services that let customers build AI systems without assembling the stack themselves.
For NVIDIA, that is a useful reframing. The company still dominates the conversation around AI accelerators, but the message at GTC 2026 was that future growth may depend as much on cloud packaging and deployment models as on the next processor cycle. AWS and Google Cloud, in effect, used NVIDIA’s event to make the case that the infrastructure around the GPU is becoming the real battleground.
About the Author
David Ramel is an editor and writer at Converge 360.