Facebook Open Sources LASER Natural Language Processing Toolkit

Facebook announced Tuesday that it is open sourcing LASER (Language-Agnostic SEntence Representations), a toolkit created by Facebook Research that is "the first successful exploration of massively multilingual sentence representations to be shared publicly with the [natural language processing] community," the company says.

LASER originally worked with only a few Romance and Germanic languages but has since been expanded to cover 90 languages written in 28 alphabets, all within a single model.

The toolkit's multilingual encoder and PyTorch code are available on GitHub. Facebook has also included test sets for almost 100 languages.

"LASER opens the door to performing zero-shot transfer of NLP models from one language, such as English, to scores of others -- including languages where training data is extremely limited," the company said in its blog post announcing the release. "LASER is the first such library to use one single model to handle this variety of languages, including low-resource languages, like Kabyle and Uighur, as well as dialects such as Wu Chinese. The work could one day help Facebook and others launch a particular NLP feature, such as classifying movie reviews as positive or negative, in one language and then instantly deploy it in more than 100 other languages."
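The zero-shot transfer the company describes can be illustrated with a minimal sketch. It assumes a toy, hand-written lookup table in place of a real multilingual encoder, so the names TOY_SHARED_SPACE, embed, train_centroids and classify are hypothetical and are not part of LASER's API. The idea it shows is only this: if sentences from different languages land in one shared vector space, a classifier trained on English examples can be applied unchanged to other languages.

```python
from math import dist

# Hypothetical stand-in for a multilingual sentence encoder. Every known word,
# whatever its language, maps to a point in one shared 2-D space
# (axis 0 = positive sentiment, axis 1 = negative sentiment).
# This toy table is NOT LASER's API; a real encoder learns the mapping.
TOY_SHARED_SPACE = {
    "great": (1.0, 0.0), "wonderful": (1.0, 0.0),        # English
    "boring": (0.0, 1.0), "terrible": (0.0, 1.0),
    "génial": (1.0, 0.0), "ennuyeux": (0.0, 1.0),        # French
    "maravillosa": (1.0, 0.0), "aburrida": (0.0, 1.0),   # Spanish
}


def embed(sentence):
    """Average the shared-space vectors of the known words in a sentence."""
    vectors = [TOY_SHARED_SPACE[word] for word in sentence.lower().split()
               if word in TOY_SHARED_SPACE]
    if not vectors:
        return (0.0, 0.0)  # no known words: fall back to the origin
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def train_centroids(examples):
    """Compute one mean embedding (centroid) per label."""
    grouped = {}
    for sentence, label in examples:
        grouped.setdefault(label, []).append(embed(sentence))
    return {label: tuple(sum(axis) / len(vecs) for axis in zip(*vecs))
            for label, vecs in grouped.items()}


def classify(sentence, centroids):
    """Return the label whose centroid is nearest to the sentence embedding."""
    point = embed(sentence)
    return min(centroids, key=lambda label: dist(point, centroids[label]))


# Train on English movie reviews only.
english_reviews = [
    ("a great film", "positive"),
    ("wonderful acting", "positive"),
    ("a boring plot", "negative"),
    ("terrible pacing", "negative"),
]
centroids = train_centroids(english_reviews)

# Apply the same classifier, unchanged, to languages absent from training.
print(classify("un film génial", centroids))         # French
print(classify("una película aburrida", centroids))  # Spanish
```

In this toy setup the French review is labeled positive and the Spanish review negative, even though the classifier saw only English examples. With a real encoder, the lookup table would be replaced by learned sentence embeddings and the nearest-centroid rule by whatever classifier suits the task.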

More information on exactly how LASER works can be found in Facebook's blog post announcing the release.

About the Author

Becky Nagel is the former editorial director and director of Web for 1105 Media's Converge 360 group, and she now serves as vice president of AI for the company, specializing in developing media, events and training for companies around AI and generative AI technology. She's the author of "ChatGPT Prompt 101 Guide for Business Users" and other popular AI resources with a real-world business perspective. She regularly speaks, writes and develops content around AI, generative AI and other business tech. Find her on X/Twitter @beckynagel.