Open Source Database Maker Launches Managed ML Metadata Service

Database maker ArangoDB today announced the release of ArangoML Pipeline Cloud, a hosted and managed metadata layer for production-grade data science and machine learning (ML) platforms.

The ArangoML Pipeline provides a common and extensible metadata layer that allows data scientists and DataOps teams to manage all information related to their ML pipelines in one place. The types of metadata captured when productizing ML pipelines range from data storage (size, location, creation date, checksum) to model training (training/validation performance, training duration) and model serving (model lineage, serving performance).

"Common metadata is an often overlooked aspect when building production-grade ML pipelines, but is equally as important as good training data," said Jörg Schad, Head of Engineering and Machine Learning at ArangoDB, in a statement. "It is not only crucial for DataOps teams when looking for reproducible builds, audit trails, or compliance with privacy regulations, but extremely valuable for data scientists as well, allowing them to easily grasp the lineage of models, what artifacts are involved, and also enabling performance comparisons across different models and approaches."

"DataOps" is the automated, process-oriented methodology for data analytics that is quickly becoming an enterprise essential.

The Pipeline was developed, the company says, to meet the often divergent needs of data scientists (who focus on the quality of the data, feature training, and model results) and DevOps teams (who care about which datasets and deployments are in use, how they perform, and how they are being deployed).

It accomplishes this by centralizing the metadata produced across the ML pipeline and exposing it through a common interface that shows the relationships among the data, features, and model training results, as well as the deployment, management, and serving logistics. The product is pipeline-agnostic, so any combination of pipeline components can be connected to it. And because it's cloud-based, it can be up and running in a few clicks.

The ArangoML Pipeline is the latest offering in the company's evolving ArangoML product line, and it runs on the new ArangoDB Oasis managed cloud service. ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. It's designed to accommodate and unite unstructured, highly interlinked data, such as inference and model descriptions, and to allow the relationships among them to be stored as a graph that the DevOps engineer can manage and the data scientist can use at the same time.

The new ArangoML Pipeline Cloud is available to test drive in a free sandbox. The company is also hosting a webinar with a live demo, for which registration is open.

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.