Databricks Partners with Google Cloud to Enable 'Lakehouses' for Data Science and Machine Learning

Databricks has joined forces with Google Cloud to enable deployment of its namesake data engineering platform on Google's global network. With this partnership, Databricks becomes the only unified data platform available across all three major clouds (Google Cloud, AWS, and Azure), effectively enabling a multi-cloud infrastructure for the enterprise with a single tool for data engineering, data science, machine learning (ML), and analytics.

This partnership makes it possible for Databricks users to create a "lakehouse" (which combines the capabilities of a data lake and a data warehouse) on Google Cloud's elastic network, and allows them to deploy Databricks in a fully containerized cloud environment for the first time.

The partnership enables:

  • Tight integration of Databricks with Google Cloud's analytics solutions, which makes it easier to extend "AI-driven insights" across data lakes, data warehouses, and multiple business intelligence tools.
  • Pre-built connectors for integrating Databricks with BigQuery, Google Cloud Storage, Looker and Pub/Sub.
  • Fast and scalable model training with Google Cloud's AI Platform using the data workflows created in Databricks, and simplified deployment of models built in Databricks using AI Platform Prediction.

Databricks was founded by the original creators of the Apache Spark analytics engine, which emerged from the Spark research project at UC Berkeley. The company's namesake analytics platform is powered by the Spark big-data distributed processing engine. Data science teams use that platform to collaborate with data engineering and lines of business to build data products.

In 2018, the company released MLflow, an open-source, cross-cloud framework designed to simplify the machine learning workflow. The framework was developed to allow organizations to package their code for reproducible runs, execute and compare hundreds of parallel experiments, leverage various hardware and software platforms, and deploy models to production on a variety of serving platforms. The framework integrates with Apache Spark, scikit-learn, TensorFlow, and a range of other open-source ML frameworks.
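To make the "execute and compare experiments" idea concrete, here is a minimal sketch of the underlying pattern: record the parameters and metrics of each run, then compare runs on a chosen metric. This is not MLflow's API. The names Run, train_and_score, run_experiments, and best_run are hypothetical, the "training" step is a toy function, and the sketch uses only the Python standard library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Run:
    """One experiment run: the parameters used and the metrics observed."""
    params: dict
    metrics: dict = field(default_factory=dict)


def train_and_score(learning_rate: float) -> float:
    """Stand-in for real training; returns a made-up validation error."""
    # Toy objective whose minimum is at learning_rate = 0.1 (an assumption
    # chosen purely for illustration).
    return round((learning_rate - 0.1) ** 2, 6)


def run_experiments(learning_rates: list[float]) -> list[Run]:
    """Execute one run per parameter setting and record its result."""
    runs = []
    for lr in learning_rates:
        run = Run(params={"learning_rate": lr})
        run.metrics["val_error"] = train_and_score(lr)
        runs.append(run)
    return runs


def best_run(runs: list[Run], metric: str) -> Run:
    """Compare runs and return the one with the lowest value of `metric`."""
    return min(runs, key=lambda r: r.metrics[metric])


runs = run_experiments([0.01, 0.1, 0.5])
best = best_run(runs, "val_error")
print(best.params)
```

A tracking framework adds persistence, a UI, and packaging on top of this basic record-and-compare loop, so that runs made by different people on different machines can be reproduced and ranked.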

Enabling deployment of Databricks on Google Cloud will help customers "unlock AI-driven insights, enable intelligent decision-making, and ultimately accelerate their digital transformations through data-driven applications," the companies said.

"This is a pivotal milestone that underscores our commitment to enable customer flexibility and choice with a seamless experience across cloud platforms," said Ali Ghodsi, CEO and co-founder of Databricks, in a statement. "We are thrilled to partner with Google Cloud and deliver on our shared vision of a simplified, open, and unified data platform that supports all analytics and AI use-cases that will empower our customers to innovate even faster." 

Both Databricks and Google have long employed strategies with strong support for open source, and this announcement throws a spotlight on that commitment. "Under this new partnership, the two companies will continue to support the open source community, encourage open innovation and collaboration, making it easier for joint customers to build on open-source technologies," the companies said.

Last year, Databricks contributed MLflow to the Linux Foundation.

Other vendors whose partnerships with the two companies form a joint Databricks/Google Cloud ecosystem have committed to ensuring "seamless integrations" with Databricks on Google Cloud. They include Accenture, Cognizant, Collibra, Confluent, Deloitte, Fishtown Analytics, Fivetran, Immuta, Informatica, Infoworks, Insight, MongoDB, Privacera, Qlik, SADA, SoftServe, Slalom, Tableau, TCS, and Trifacta, among others.

"Businesses with a strong foundation of data and analytics are well-positioned to grow and thrive in the next decade," said Thomas Kurian, CEO at Google Cloud, in a statement. "We're delighted to deliver Databricks' lakehouse for AI and ML-driven analytics on Google Cloud. By combining Databricks' capabilities in data engineering and analytics with Google Cloud's global, secure network—and our expertise in analytics and delivering containerized applications—we can help companies transform their businesses through the power of data."

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at