BlazingSQL GPU-Savvy SQL Engine Goes Open Source
The BlazingSQL GPU-accelerated SQL engine for the Nvidia RAPIDS machine learning ecosystem has gone open source under the Apache 2.0 license. RAPIDS is a suite of open source software libraries and APIs that enable organizations to execute end-to-end data science and analytics pipelines entirely on GPUs.
BlazingSQL is a SQL interface for cuDF, the GPU DataFrame library that handles loading, joining, aggregating, filtering, and other manipulation of data. It lets users run large-scale data science workflows against enterprise datasets through simple SQL queries, which return their results as GPU DataFrames (GDF) accessible to any RAPIDS library. GDF is an open source project that provides a common data layer in GPU memory for working with columnar, in-memory data.
"NVIDIA and the RAPIDS ecosystem are delighted that BlazingSQL is open-sourcing their SQL engine built on RAPIDS," said Josh Patterson, general manager of data science at Nvidia. "By leveraging Apache Arrow on GPUs and integrating with Dask, BlazingSQL will extend open-source functionality, and drive the next wave of interoperability in the accelerated data science ecosystem."
In a blog post, BlazingSQL Chief Executive Officer Rodrigo Aramburu wrote that "processing data at scale is expensive, slow and incredibly complex." He noted that BlazingSQL and RAPIDS require a fraction of the infrastructure typically needed for data science at scale, and that GPU acceleration shortens iteration cycles. The combination also simplifies workloads by eliminating the need to prototype at small scale and then rebuild for a distributed environment. Instead, he wrote, users can "write code once and dynamically change the scale of distribution with a single line of code."
Aramburu called RAPIDS "the next-generation analytics ecosystem," and described BlazingSQL as a fundamental pillar of it.
"For this reason, we are fully integrated with the greater RAPIDS team and contribute heavily to cuDF," Aramburu wrote. "BlazingSQL is built entirely on top of cuDF and cuIO. New features pushed to these projects directly impact BlazingSQL features and performance, and because BlazingSQL runs on GDFs it is 100 percent interoperable with all of RAPIDS."
About the Author
Michael Desmond is an editor and writer for 1105 Media's Enterprise Computing Group.