Google Releases Toolkit for Model Transparency
- By John K. Waters
Google last week announced the availability of a new set of tools for creating Model Cards, which provide a structured framework for machine learning (ML) model transparency. The Model Card Toolkit (MCT) is a collection of tools for developers who are compiling the information that goes into a Model Card: things like model provenance, usage, and "ethics-informed evaluation," as well as a detailed overview of a model's suggested uses and limitations.
Google first proposed "that released models be accompanied by documentation detailing their performance characteristics" in a 2018 research paper ("Model Cards for Model Reporting"). The company began launching Model Cards publicly over the past year as part of an overall goal to increase ML model transparency and worked to create Model Cards for open-source models released by teams across Google.
Google's MediaPipe team, for example, has included Model Cards for each of the open-source models in its GitHub repository, the company said in a blog post. (MediaPipe creates state-of-the-art computer vision models for a number of common tasks.) "Creating Model Cards like these takes substantial time and effort, often requiring a detailed evaluation and analysis of both data and model performance," explained the post's authors, Huanming Fang and Hui Miao, both software engineers in Google's Research group. "In many cases, one needs to additionally evaluate how a model performs on different subsets of data, noting any areas where the model underperforms. Further, Model Card creators may want to report on the model's intended uses and limitations, as well as any ethical considerations potential users might find useful, compiling and presenting the information in a format that's accessible and understandable."
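To make the idea of evaluating a model on different subsets of data concrete, here is a minimal Python sketch of per-slice accuracy. It is not taken from the MCT or from Google's post: the function names, the `age_group` slice key, the 0.05 margin, and the toy records are all assumptions made up for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Return {slice_value: accuracy} for records holding 'label' and 'prediction'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        slice_value = record[slice_key]
        total[slice_value] += 1
        correct[slice_value] += int(record["label"] == record["prediction"])
    return {value: correct[value] / total[value] for value in total}


def underperforming_slices(records, slice_key, margin=0.05):
    """Return overall accuracy and the slices that trail it by more than `margin`."""
    overall = sum(r["label"] == r["prediction"] for r in records) / len(records)
    per_slice = accuracy_by_slice(records, slice_key)
    flagged = {v: acc for v, acc in per_slice.items() if acc < overall - margin}
    return overall, flagged


# Hypothetical toy data: 'age_group' is an illustrative slice key, not a real schema.
records = [
    {"age_group": "A", "label": 1, "prediction": 1},
    {"age_group": "A", "label": 0, "prediction": 0},
    {"age_group": "A", "label": 1, "prediction": 1},
    {"age_group": "A", "label": 0, "prediction": 0},
    {"age_group": "B", "label": 1, "prediction": 0},
    {"age_group": "B", "label": 0, "prediction": 0},
    {"age_group": "B", "label": 1, "prediction": 0},
    {"age_group": "B", "label": 0, "prediction": 1},
]

overall, flagged = underperforming_slices(records, "age_group")
```

With this toy data, the model is right on every record in slice A but on only one of four in slice B, so B is the slice a Model Card author would want to call out.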
The MCT was developed to streamline this process. To demonstrate how the MCT can be used in practice, the company released a Colab tutorial that builds a Model Card for a simple classification model trained on the UCI Census Income dataset.
Google is also providing a JSON schema, which specifies the fields to include in the Model Card. Using the model provenance information stored with ML Metadata (MLMD), the MCT automatically populates the JSON with relevant information, such as class distributions in the data and model performance statistics. The company is also providing a ModelCard data API to represent an instance of the JSON schema and visualize it as a Model Card. "The Model Card creator can choose which metrics and graphs to display in the final Model Card," the post's authors explained, "including metrics that highlight areas where the model's performance might deviate from its overall performance."
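The general flow of filling in a JSON document and then rendering it for readers can be sketched with the Python standard library alone. This is an illustration only, not the MCT's ModelCard API or its actual schema: the field names (`model_details`, `class_distribution`, `metrics`), the helper functions, and every value below are hypothetical.

```python
import json
from html import escape


def build_model_card(name, overview, class_counts, metrics):
    """Assemble a Model Card-like dict. Field names are illustrative only."""
    total = sum(class_counts.values())
    return {
        "model_details": {"name": name, "overview": overview},
        "data": {
            # Convert raw class counts into fractions of the dataset.
            "class_distribution": {
                label: count / total for label, count in class_counts.items()
            }
        },
        "metrics": [{"name": k, "value": v} for k, v in metrics.items()],
    }


def render_html(card):
    """Render the dict as a very small HTML fragment, escaping all text fields."""
    rows = []
    for metric in card["metrics"]:
        rows.append(
            "<tr><td>{}</td><td>{:.3f}</td></tr>".format(
                escape(metric["name"]), metric["value"]
            )
        )
    details = card["model_details"]
    return "<h1>{}</h1>\n<p>{}</p>\n<table>\n{}\n</table>".format(
        escape(details["name"]), escape(details["overview"]), "\n".join(rows)
    )


# Made-up values for illustration; not results from any real model.
card = build_model_card(
    name="census-income-classifier",
    overview="Toy example; all values below are made up.",
    class_counts={">50K": 25, "<=50K": 75},
    metrics={"accuracy": 0.85},
)
card_json = json.dumps(card, indent=2)  # the JSON representation
page = render_html(card)                # the human-readable view
```

Keeping the data (`card_json`) separate from its presentation (`page`) mirrors the split the article describes between a schema-governed JSON file and a visual Model Card built from it.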
The Google MCT is currently available to anyone using TensorFlow Extended (TFX) in open source or on the Google Cloud Platform. "Users who are not serving their ML models via TFX can still leverage the JSON schema and the methods to visualize via the HTML template," the company said.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at firstname.lastname@example.org.