Facebook Open Sources LASER Natural Language Processing Toolkit
Facebook announced Tuesday that it is open sourcing LASER (Language-Agnostic SEntence Representations), a toolkit created by Facebook Research that is "the first successful exploration of massively multilingual sentence representations to be shared publicly with the [natural language processing] community," the company says.
LASER originally worked with only a few Romance and Germanic languages but has since been expanded to cover 90 languages written in 28 alphabets, all handled by a single model.
The toolkit's multilingual encoder and PyTorch code are available for download on GitHub. Facebook has also included test sets for almost 100 languages.
"LASER opens the door to performing zero-shot transfer of NLP models from one language, such as English, to scores of others -- including languages where training data is extremely limited," the company said in its blog post announcing the release. "LASER is the first such library to use one single model to handle this variety of languages, including low-resource languages, like Kabyle and Uighur, as well as dialects such as Wu Chinese. The work could one day help Facebook and others launch a particular NLP feature, such as classifying movie reviews as positive or negative, in one language and then instantly deploy it in more than 100 other languages."
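The zero-shot transfer idea in that quote can be illustrated with a toy sketch. This is not LASER's code or API: the `CONCEPTS` lookup table and the `toy_encode`, `train_centroids` and `predict` helpers below are hypothetical stand-ins, invented here to mimic an encoder that maps sentences from different languages into one shared vector space. Under that assumption, a review classifier trained only on English examples can label French reviews it has never seen.

```python
# Toy illustration only: a hand-built lookup stands in for a multilingual
# sentence encoder. None of these names come from LASER itself.
CONCEPTS = {
    # English words
    "great": (1.0, 0.0), "loved": (1.0, 0.0),
    "boring": (0.0, 1.0), "awful": (0.0, 1.0),
    # French words mapped into the same two-dimensional space
    "superbe": (1.0, 0.0), "magnifique": (1.0, 0.0),
    "ennuyeux": (0.0, 1.0), "affreux": (0.0, 1.0),
}


def toy_encode(sentence):
    """Average the vectors of known words into one sentence vector."""
    vecs = [CONCEPTS[w] for w in sentence.lower().split() if w in CONCEPTS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def train_centroids(examples):
    """Fit a nearest-centroid classifier: one mean vector per label."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(toy_encode(text))
    return {
        label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for label, vecs in by_label.items()
    }


def predict(centroids, sentence):
    """Return the label whose centroid is closest to the sentence vector."""
    vec = toy_encode(sentence)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Train on English reviews only.
english_train = [
    ("a great film , loved it", "positive"),
    ("boring and awful", "negative"),
]
centroids = train_centroids(english_train)

# Apply the same classifier, unchanged, to French reviews.
french_positive = predict(centroids, "un film superbe")
french_negative = predict(centroids, "ennuyeux et affreux")
print(french_positive, french_negative)
```

The classifier never sees a French training example; it works on the French input only because both languages land in the same vector space, which is the property a real multilingual encoder would have to learn rather than look up.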
More information on exactly how LASER works can be found in the company's blog post announcing the release.
Becky Nagel is the vice president of Web & Digital Strategy for 1105's Converge360 Group, where she oversees the front-end Web team and deals with all aspects of digital strategy. She also serves as executive editor of the group's media Web sites, and you'll even find her byline on PureAI.com, the group's newest site for enterprise developers working with AI. She recently gave a talk at a leading technical publishers conference about how changes in Web technology may impact publishers' bottom lines. Follow her on Twitter @beckynagel.