Facebook Data Visualization Tool Helps AI Researchers
By John K. Waters
Facebook's newly open sourced HiPlot is a lightweight, interactive visualization tool designed to help AI researchers discover correlations and patterns in so-called high-dimensional data using parallel plots and other graphical forms to represent information.
The term "high-dimensional data" refers to data with a very large number of features, attributes, or characteristics, potentially hundreds or thousands of dimensions, which gives rise to the so-called "curse of dimensionality." Parallel plots, also known as parallel coordinates, are a visualization technique for plotting individual data elements across many dimensions: each dimension gets its own axis, and each data element appears as a line crossing those axes.
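To make the idea concrete, here is a minimal sketch of the geometry behind a parallel plot, written in plain Python. It is not HiPlot's code; the function name `to_polylines` and the sample values are hypothetical, chosen only for illustration. Each dimension is rescaled to the range 0 to 1 so that it fits on its own axis, and each data element becomes one line through those axes.

```python
def to_polylines(rows, dims):
    """Map each row to a polyline of (axis_index, height), height in [0, 1]."""
    lows = {d: min(r[d] for r in rows) for d in dims}
    highs = {d: max(r[d] for r in rows) for d in dims}
    lines = []
    for r in rows:
        line = []
        for i, d in enumerate(dims):
            span = highs[d] - lows[d]
            # A constant dimension has no spread, so place it mid-axis.
            height = 0.5 if span == 0 else (r[d] - lows[d]) / span
            line.append((i, height))
        lines.append(line)
    return lines


# Hypothetical data: three training runs described by three dimensions.
points = [
    {"lr": 0.1, "layers": 2, "loss": 0.9},
    {"lr": 0.01, "layers": 4, "loss": 0.3},
    {"lr": 0.001, "layers": 8, "loss": 0.5},
]
polylines = to_polylines(points, ["lr", "layers", "loss"])
for line in polylines:
    print(line)
```

Drawing each polyline across evenly spaced axes yields the familiar picture. Lines that travel together between two axes hint at a correlation between those two dimensions.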
HiPlot enables machine learning (ML) researchers to more easily evaluate the influence of their hyperparameters, such as learning rate, regularizations, and architecture. It can also be used by researchers in other fields, so they can observe and analyze correlations in data relevant to their work.
"ML models are getting ever more complex and often have many hyperparameters," said research engineers Daniel Haziza, Jérémy Rapin, and Gabriel Synnaeve, in a blog post. "At Facebook AI, we have been using HiPlot to explore and efficiently analyze hyperparameter tuning of deep neural networks with dozens of hyperparameters and more than 100,000 experiments. We hope this tool will enable other scientists and engineers to explore and make the most of their own experimental data, while also paving the way for more dynamic training methods, such as those inspired by genetic algorithms."
HiPlot can be used in two ways: as a Web server, or in a Jupyter notebook to visualize Python data. It requires Python 3.6 or newer. By default, the tool's Web server can parse CSV or JSON files. Users can also provide it with a custom Python parser that converts their experiments into a HiPlot experiment. To help researchers performing hyperparameter searches, HiPlot is already compatible with the logs of open source Facebook AI libraries, such as wav2letter@anywhere, its inference framework for online speech recognition; Nevergrad, its open source tool for derivative-free optimization; and fairseq, its sequence modeling toolkit.
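As a rough sketch of what preparing data for the tool might look like, the Python snippet below serializes a few made-up experiment results as CSV, one of the formats the Web server parses by default. The hyperparameter names, values, and helper functions are hypothetical. The `show_in_notebook` helper assumes the `hiplot` package is installed and relies on the `Experiment.from_iterable(...).display()` usage shown in the project's README; it is defined here but not called, so the rest of the snippet runs without HiPlot.

```python
import csv
import io

# Hypothetical experiment results: one dict per training run.
runs = [
    {"lr": 0.1, "dropout": 0.0, "optimizer": "sgd", "loss": 0.92},
    {"lr": 0.01, "dropout": 0.1, "optimizer": "adam", "loss": 0.35},
    {"lr": 0.001, "dropout": 0.3, "optimizer": "adam", "loss": 0.48},
]


def runs_to_csv(rows):
    """Serialize runs as CSV text: a header row, then one line per run."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def show_in_notebook(rows):
    """Notebook mode (assumption: hiplot is installed, running inside Jupyter)."""
    import hiplot as hip  # imported lazily so the CSV part works without it
    return hip.Experiment.from_iterable(rows).display()


csv_text = runs_to_csv(runs)
print(csv_text)
```

Saving `csv_text` to a file gives the Web server something it can load, while `show_in_notebook(runs)` is the notebook route for data already held in Python.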
The tool is available now on GitHub.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at email@example.com.