AI Communities Found Consortium for Python Data API Standards
By John K. Waters
The communities supporting two popular artificial intelligence (AI) projects, the Apache MXNet framework and the Open Neural Network Exchange (ONNX), have joined forces to establish a new standards organization. Its goal is to improve interoperability for machine learning (ML) practitioners and data scientists using any framework, library, or tool from the Python ecosystem.
Announced in a lengthy blog post, the Consortium for Python Data API Standards was formed to address the fragmentation of multidimensional array (tensor) and dataframe libraries, which underpin the Python data ecosystem. Its founders say the consortium will tackle the problem by developing API standards for those arrays and dataframes.
"Currently, array and dataframe libraries all have similar APIs, but with enough differences that using them interchangeably isn't really possible," they wrote.
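A small illustration of what "similar, but not interchangeable" looks like in practice: joining arrays along an axis is `numpy.concatenate(..., axis=...)` in NumPy, `torch.cat(..., dim=...)` in PyTorch, and `tf.concat(..., axis=...)` in TensorFlow. The sketch below assumes those three libraries as examples (the article does not name them), and the `concat` helper and `CONCAT_API` table are hypothetical, not part of any library or of the consortium's work. It shows the kind of per-library glue code developers write today when they want one function to accept arrays from several libraries.

```python
# Hypothetical compatibility shim (illustrative only): "join arrays along
# an axis" is spelled differently across popular array libraries. The
# table maps each library's top-level package name to its own function
# name and axis keyword. It is not exhaustive.
CONCAT_API = {
    "numpy": ("concatenate", "axis"),   # numpy.concatenate(arrays, axis=0)
    "torch": ("cat", "dim"),            # torch.cat(tensors, dim=0)
    "tensorflow": ("concat", "axis"),   # tf.concat(values, axis=0)
}


def concat(xp, arrays, axis=0):
    """Join `arrays` along `axis` using the array library module `xp`.

    Looks up the library by the top-level package name of `xp`, then calls
    that library's own function with that library's own axis keyword.
    """
    library = xp.__name__.split(".")[0]
    try:
        func_name, axis_kw = CONCAT_API[library]
    except KeyError:
        raise ValueError(f"no concat mapping for library {library!r}") from None
    func = getattr(xp, func_name)
    return func(arrays, **{axis_kw: axis})
```

With NumPy installed, `concat(np, [np.array([1, 2]), np.array([3])])` returns `array([1, 2, 3])`. Every new library, and every function that differs, needs another table entry like this; a shared API standard would let code call one name with one set of keywords instead.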
ONNX is an open ecosystem for interoperable AI models. It was created in 2017 and open sourced by Microsoft and Facebook as an open format to provide "a shared model representation for interoperability and innovation." In 2019, it became a graduate-level Linux Foundation project within the foundation's AI group. Today, it provides an open source format for AI models, both deep learning (DL) and traditional ML. It defines a common set of operators, the "building blocks" of ML and DL models, and a common file format that lets AI developers use models with a range of frameworks, tools, runtimes, and compilers.
ONNX is supported by a number of organizations, including AWS, AMD, Arm, Baidu, HPE, IBM, Nvidia, and Qualcomm. More than 30 companies currently contribute to the ONNX code base.
Apache MXNet is an open source DL framework used to train and deploy deep neural networks. It is designed as a fast, scalable training and inference framework with a concise, easy-to-use API for ML and AI.
In the blog post, the organizers credit Quansight Labs with the initial effort that led to the formation of the consortium. Quansight Labs is a public-benefit division of Quansight created to provide a home for a "PyData Core Team," which, according to its website, "consists of developers, community managers, designers, and writers who create and maintain open-source technology around all aspects of scientific and data science workflows." In addition to Quansight, the founding sponsors include Intel, Microsoft, the D. E. Shaw group, and Google Research.
These are very early days for the consortium, but the organizers are proposing an ambitious schedule: publish the array API RFC and open it for community review by Sept. 15, then publish the dataframe API RFC and open that for community review by Nov. 15.
"We aim to grow this consortium into an organization where cross-project and cross-ecosystem alignment on APIs, data exchange mechanisms, and other such topics happens," they wrote. "These topics require coordination and communication to a much larger extent than they require technical innovation. We aim to facilitate the former, while leaving the innovating to current and future individual libraries."
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at firstname.lastname@example.org.