MIT Announces 'Natural' Object ID Model Using Audio Descriptions
The Massachusetts Institute of Technology (MIT) announced this week that the university's computer scientists have invented a new machine learning model for object recognition that incorporates audio descriptions (versus transcripts of audio) along with images.
"The model doesn't require manual transcriptions and annotations of the example [speech] it's trained on," the official announcement explained of the new method. "Instead, it learns words directly from recorded speech clips and objects in raw images, and associates them with one another."
Typically, most machine learning models that incorporate audio require transcriptions of that audio versus using the audio itself. While this current system only recognizes "several hundred words and object types," the researchers who developed it have high hopes for its future.
"We wanted to do speech recognition in a way that's more natural, leveraging additional signals and information that humans have the benefit of using, but that machine learning algorithms don't typically have access to," commented David Harwath, a researcher in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Spoken Language Systems Group.
"There's potential there for a Babel Fish-type of mechanism," he continued.
This specific experiment is built upon a 2016 project, but with more images and data added, and with a new approach to the training. Specific details on how the model was trained can be found in the official announcement of the new project here.
About the Author
Becky Nagel is the vice president of Web & Digital Strategy for 1105's Converge360 Group, where she oversees the front-end Web team and deals with all aspects of digital projects at the company, including launching and running the group's popular virtual summit and Coffee talk series . She an experienced tech journalist (20 years), and before her current position, was the editorial director of the group's sites. A few years ago she gave a talk at a leading technical publishers conference about how changes in Web browser technology would impact online advertising for publishers. Follow her on twitter @beckynagel.