Google Launches Trillium TPU to Support Large-Scale AI Workloads
- By John K. Waters
- 12/06/2024
Google Cloud has unveiled its sixth-generation Tensor Processing Unit (TPU), Trillium, making it generally available to customers as the cornerstone of the company's AI Hypercomputer infrastructure. Designed for large-scale artificial intelligence workloads, Trillium promises dramatic improvements in performance, scalability, and energy efficiency, enabling enterprises and startups to tackle next-generation AI challenges.
Trillium TPUs were used to train Gemini 2.0, Google’s latest and most capable AI model. The new hardware delivers more than four times the training performance and three times the inference throughput of its predecessor, along with a 67% increase in energy efficiency. These gains, paired with up to a 2.5x improvement in training performance per dollar, make Trillium an attractive option for organizations adopting AI at scale.
Google’s AI Hypercomputer, powered by over 100,000 Trillium chips interconnected via its Jupiter network fabric, can scale distributed training jobs across hundreds of thousands of accelerators. With 13 petabits per second of network bandwidth, the system achieves 99% scaling efficiency for large language models (LLMs) like GPT-3 and Llama-2, a significant leap over previous TPU generations.
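Scaling efficiency here means the fraction of ideal linear speedup a cluster actually achieves: at 99%, doubling the chip count very nearly doubles throughput. A minimal sketch of the arithmetic, using hypothetical throughput numbers (not Google's measurements):

```python
def scaling_efficiency(cluster_throughput: float, n_chips: int,
                       single_chip_throughput: float) -> float:
    """Fraction of ideal linear speedup achieved at n_chips.

    1.0 means perfectly linear scaling; real distributed training
    falls short due to communication and synchronization overhead.
    """
    ideal_throughput = n_chips * single_chip_throughput
    return cluster_throughput / ideal_throughput


# Hypothetical illustration: one chip sustains 100 samples/s;
# a 256-chip job sustains 25,344 samples/s rather than the ideal 25,600.
eff = scaling_efficiency(25_344, 256, 100)
print(f"{eff:.0%}")  # 99%
```

At this efficiency, almost none of the added hardware is lost to coordination overhead, which is what makes training jobs spanning hundreds of thousands of accelerators economically viable.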
"Trillium TPU’s scalability and performance are redefining what is possible in AI infrastructure," Google said in a statement. "From dense LLMs to embedding-intensive models, it offers unmatched versatility and efficiency for complex AI workloads."
Key enhancements include double the High Bandwidth Memory (HBM) capacity, double the interchip interconnect bandwidth, and improved integration with open software frameworks like TensorFlow, PyTorch, and JAX. Host-offloading capabilities, supported by massive host DRAM, further boost performance while minimizing energy use.
The new hardware is already in use by AI21 Labs, whose CTO Barak Lenz said in a statement, "Trillium’s advancements in scale, speed, and cost-efficiency are critical for accelerating the development of next-generation AI models."
Trillium excels in training dense and Mixture of Experts (MoE) LLMs, delivering up to 4x faster training and nearly 4x higher throughput for MoE models compared to its predecessor. SparseCore enhancements also double performance for embedding-intensive models, further widening the scope of AI applications supported by Trillium.
The AI Hypercomputer’s flexible consumption model allows organizations to dynamically alter AI model parameters, optimize inference workloads, and efficiently manage computational resources. Google's improvements to the XLA compiler and scheduling systems ensure seamless performance across distributed workloads.
With features like collection scheduling, the platform can manage multi-replica workloads on Kubernetes, enabling cost-effective scaling of AI models. Trillium's price-performance ratio is a standout feature, with significant cost savings for inference tasks like image generation.
Trillium’s capabilities have positioned Google Cloud as a leader in AI infrastructure, offering enterprises and researchers the tools to push boundaries in AI innovation. As AI applications expand across industries, Trillium TPUs and the AI Hypercomputer represent a significant step toward more efficient, scalable, and accessible AI solutions.
"Trillium embodies Google’s commitment to delivering cutting-edge infrastructure for the most demanding AI workloads," the company said. "We’re excited to see how organizations leverage this technology to unlock new possibilities in AI."
About the Author
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at jwaters@converge360.com.