Lyft Open Sources Cloud-Native Machine Learning Tool -- Pure AI

By John K. Waters
01/08/2020

Ride-sharing company Lyft has open-sourced Flyte, its cloud-native machine learning and data processing platform, which is designed to reduce the overhead associated with large-scale compute jobs. The company has been using the tool in-house for about three years for production model training and data processing. It's currently the company's primary platform for areas such as pricing, locations, ETAs, mapping and self-driving.

Flyte is a hosted, multi-tenant service that allows teams to work on separate repositories and deploy them without affecting the rest of the platform. Code is versioned and containerized with its dependencies, which ensures that all executions are reproducible. To provide this level of isolation, Flyte was built directly on Kubernetes, the company said. The platform comes with Flytekit, a Python SDK for developing applications on the platform.

Flyte is designed to handle all the overhead involved in executing complex workflows, including hardware provisioning, scheduling, data storage and monitoring. Taking on the overhead allows developers to focus on writing their workflow logic.

"Flyte's mission is to increase development velocity for machine learning and data processing by abstracting this overhead," Lyft product manager Allyson Gale and engineer Ketan Umare explained in a blog post.

All Flyte tasks and workflows have strongly typed inputs and outputs, Gale and Umare explained. "This makes it possible to parameterize your workflows, have rich data lineage, and use cached versions of pre-computed artifacts. If, for example, you're doing hyperparameter optimization, you can easily invoke different parameters with each run. Additionally, if the run invokes a task that was already computed from a previous execution, Flyte will smartly use the cached output, saving both time and money."
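The caching behavior Gale and Umare describe can be illustrated with a minimal sketch in plain Python. This is not Flytekit's API or Flyte's implementation: the `cached_task` decorator, the in-memory `_CACHE` dictionary, the `CALLS` counter and the `train` task are all hypothetical names invented for illustration. The sketch assumes that inputs are simple JSON-serializable values and that a cache key can be derived from the task name, a version string and the typed inputs.

```python
import hashlib
import json
from typing import Callable, Dict, get_type_hints

# Hypothetical in-memory store standing in for a platform's artifact cache.
_CACHE: Dict[str, object] = {}

# Counts how many times the task body really runs (for demonstration only).
CALLS = {"count": 0}


def cached_task(version: str) -> Callable:
    """Check declared input types, then reuse outputs for repeated inputs."""

    def decorate(fn: Callable) -> Callable:
        hints = get_type_hints(fn)

        def wrapper(**kwargs):
            # Reject inputs that do not match the declared annotations.
            # Only plain classes such as int, float and str are supported here.
            for name, value in kwargs.items():
                if name not in hints or not isinstance(value, hints[name]):
                    raise TypeError(f"bad input {name!r}: {value!r}")

            # Derive a cache key from the task name, its version and its inputs.
            key_source = json.dumps(
                {"task": fn.__name__, "version": version, "inputs": kwargs},
                sort_keys=True,
            )
            key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()

            # Run the task only if this exact combination has not been seen.
            if key not in _CACHE:
                _CACHE[key] = fn(**kwargs)
            return _CACHE[key]

        return wrapper

    return decorate


@cached_task(version="1")
def train(learning_rate: float, epochs: int) -> float:
    """Toy stand-in for an expensive training step."""
    CALLS["count"] += 1
    return 1.0 / (1.0 + learning_rate * epochs)
```

Calling `train(learning_rate=0.1, epochs=10)` twice runs the task body once and serves the second call from the cache, while changing either hyperparameter or the version string produces a new key and a fresh run. Passing a string where a float is declared raises a `TypeError` before any work is done.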

Lyft currently uses Flyte to manage more than 7,000 unique workflows, totaling more than 100,000 executions each month, 1 million tasks and 10 million containers.

"With data now being a primary asset for companies, executing large-scale compute jobs is critical to the business, but problematic from an operational standpoint," Gale and Umare wrote. "Scaling, monitoring and managing compute clusters becomes a burden on each product team, slowing down iteration and subsequently product innovation. Moreover, these workflows often have complex data dependencies."

Flyte is framework-agnostic and comes with a growing collection of plugins, including Spark on K8s, AWS Batch, Array Jobs, Hive Qubole, Containers and Pods. The company has published extensive documentation, and the source code is available on GitHub.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.