Fractional GPU System for Deep Learning On Kubernetes Announced -- Pure AI

By John K. Waters
05/13/2020

Run:AI, an Israel-based company that specializes in virtualizing artificial intelligence (AI) infrastructure, has claimed an industry first, announcing a fractional GPU sharing system for deep learning workloads on Kubernetes.

Fractional GPUs (FGPUs) are a software-only mechanism for partitioning both the compute and memory resources of a GPU so that multiple applications can run in parallel with strong performance isolation from each other. This capability is especially useful for data science and AI engineering teams, because it allows them to run workloads such as computer vision, voice recognition, and natural language processing simultaneously on a single GPU. Running those workloads on a single piece of hardware can lower costs considerably.

Run:AI's namesake platform was built on top of Kubernetes with the aim of virtualizing AI infrastructure to improve on the typical bare-metal approach that statically provisions AI workloads to data scientists. To work around limitations in how Kubernetes handles GPUs, the company resorted to some tricky math, effectively representing GPUs as floating-point values that can be split into fractions for use by containers, rather than as integers that are either allocated whole or not at all.
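To make the integer-versus-float distinction concrete, here is a minimal sketch of fractional GPU bookkeeping. It assumes each physical GPU's capacity is tracked as a float (1.0 = one whole GPU) and uses a simple first-fit policy; the class and function names are hypothetical, and this is not Run:AI's actual scheduler.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: capacity is a float (1.0 = a whole GPU) rather than an
# integer count, so requests such as 0.5 or 0.25 can share one physical device.

@dataclass
class PhysicalGPU:
    name: str
    free: float = 1.0                       # fraction of this GPU not yet allocated
    jobs: list = field(default_factory=list)

def allocate(gpus, job, fraction):
    """First-fit: place `job` on the first GPU with at least `fraction` free.

    Returns the GPU's name, or None if no GPU has enough spare capacity.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    for gpu in gpus:
        # Small tolerance absorbs floating-point rounding (e.g. 0.1 summed ten times).
        if gpu.free + 1e-9 >= fraction:
            gpu.free -= fraction
            gpu.jobs.append(job)
            return gpu.name
    return None

gpus = [PhysicalGPU("gpu0"), PhysicalGPU("gpu1")]
allocate(gpus, "inference-a", 0.5)   # "gpu0"
allocate(gpus, "inference-b", 0.25)  # "gpu0"
allocate(gpus, "training", 0.5)      # "gpu1" (gpu0 has only 0.25 left)
allocate(gpus, "inference-c", 0.25)  # "gpu0", which is now fully used
```

With whole-GPU (integer) accounting, the same four jobs would need four devices; with fractional accounting they fit on two.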

"Today's de facto standard for deep learning workloads is to run them in containers orchestrated by Kubernetes," the company said. "However, Kubernetes is only able to allocate whole physical GPUs to containers, lacking the isolation and virtualization capabilities needed to allow GPU resources to be shared without memory overflows or processing clashes."

The result of the company's work to overcome that limitation is a set of virtualized logical GPUs, each with its own memory and compute space, that appear as self-contained processors to containers. The approach is especially useful for lightweight workloads such as inference: eight or more containerized jobs can share the same physical chip, whereas typical use cases allow for only two to four jobs running on a single GPU.
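The memory side of a logical GPU can be pictured as a fixed slice of device memory proportional to the job's fraction, with allocations past that slice rejected rather than spilling into a neighbor's space. The sketch below models only memory (not compute isolation) and assumes a hypothetical 16 GiB device split evenly across eight inference jobs; all names are illustrative, not part of Run:AI's product.

```python
GIB = 1024 ** 3

class LogicalGPU:
    """Hypothetical per-job view of a shared GPU with a hard memory cap."""

    def __init__(self, job: str, fraction: float, device_memory: int):
        self.job = job
        self.limit = int(fraction * device_memory)  # bytes this job may use
        self.used = 0

    def malloc(self, nbytes: int) -> None:
        # Reject requests that would exceed this job's slice, so one job
        # cannot overflow into memory reserved for another.
        if self.used + nbytes > self.limit:
            raise MemoryError(f"{self.job}: request exceeds {self.limit // GIB} GiB slice")
        self.used += nbytes

def split_device(jobs, device_memory):
    """Give each job an equal fractional slice of one physical GPU."""
    fraction = 1 / len(jobs)
    return [LogicalGPU(j, fraction, device_memory) for j in jobs]

slices = split_device([f"infer-{i}" for i in range(8)], 16 * GIB)
print(slices[0].limit // GIB)  # 2 (each of the eight jobs gets 2 GiB)
slices[0].malloc(1 * GIB)      # fits within the 2 GiB slice
# slices[0].malloc(2 * GIB) would now raise MemoryError (1 + 2 > 2 GiB)
```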

The addition of fractional GPU sharing is a key component of Run:AI's mission to create a truly virtualized AI infrastructure, the company said. It complements Run:AI's existing technology, which elastically stretches workloads over multiple GPUs and enables resource pooling and sharing.

"Some tasks, such as inference tasks, often don't need a whole GPU," said Run:AI co-founder and CEO Omri Geller, in a statement, "but all those unused processor cycles and RAM go to waste because containers don't know how to take only part of a resource. Run:AI's fractional GPU system lets companies unleash the full capacity of their hardware so they can scale up their deep learning more quickly and efficiently."


About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.