Meta Provides Public Access to Large-scale Language Models with OPT-175B
- By John K. Waters
The AI research group at Meta today announced the public availability of its Open Pre-trained Transformer (OPT-175B), a large language model with 175 billion parameters trained on publicly available data sets. The release includes both the pre-trained models and the code needed to train and use them.
Meta is limiting access to OPT-175B to "academic researchers; those affiliated with organizations in government, civil society, and academia; along with industry research laboratories around the world," the company said in a blog post. And in an effort "to maintain integrity and prevent misuse" and "focus on research use cases," the company is releasing its model under a noncommercial license.
Large language models are natural language processing (NLP) systems with more than 100 billion parameters, trained on massive volumes of text. A transformer is a deep learning model based on a self-attention mechanism that directly models relationships among all words in a sentence, regardless of their respective positions, rather than processing them one by one in order. (The "GPT" in OpenAI's ground-breaking neural-network-powered language model, GPT-3, stands for Generative Pre-trained Transformer.)
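For readers who want to see the mechanism concretely, here is a minimal pure-Python sketch of scaled dot-product self-attention, the standard formulation used in transformers. It is not Meta's or OpenAI's code: the function names, the toy matrices, and the `causal` flag are illustrative assumptions. Each output row is a weighted average of every position's value vector, with weights computed from query-key similarity, so each token is related to every other token in a single step rather than sequentially.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats (-inf entries get weight 0)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of lists."""
    return [
        [sum(row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for row in a
    ]


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Scaled dot-product self-attention for a single head (illustrative sketch).

    x: seq_len x d_model token vectors; w_q, w_k, w_v: d_model x d_k projections.
    If causal is True, position i may only attend to positions 0..i.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    out = []
    for i, q_i in enumerate(q):
        # Compare this token's query with every token's key, near or far.
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / scale
            if not (causal and j > i) else float("-inf")
            for j, k_j in enumerate(k)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append(
            [sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy token embeddings
    print(self_attention(tokens, identity, identity, identity))
    print(self_attention(tokens, identity, identity, identity, causal=True))
```

With `causal=True`, each token attends only to itself and earlier tokens. Decoder-only models such as GPT-3 and OPT use this kind of causal masking so that text can be generated left to right; production implementations also use multiple attention heads and optimized matrix libraries rather than Python loops.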
Researcher access to these models has been limited, Meta points out, "hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity."
"We believe the entire AI community — academic researchers, civil society, policymakers, and industry — must work together to develop clear guidelines around responsible AI in general and responsible large language models in particular, given their centrality in many downstream language applications," the company said. "A much broader segment of the AI community needs access to these models in order to conduct reproducible research and collectively drive the field forward. With the release of OPT-175B and smaller-scale baselines, we hope to increase the diversity of voices defining the ethical considerations of such technologies."
Meta is providing access to OPT-175B, as well as the codebase used to train the model and to deploy it using only 16 NVIDIA V100 data center GPUs. The goal is to increase the accessibility of these models specifically for research purposes and "to provide a foundation for analyzing potential harms rooted in quantifiable metrics on a common, shared model." The company is also releasing a suite of smaller-scale baseline models, trained on the same data set and with settings similar to those of OPT-175B, "to enable researchers to study the effect of scale alone." These smaller models have 125 million, 350 million, 1.3 billion, 2.7 billion, 6.7 billion, 13 billion, and 30 billion parameters. The company also plans to add a 66-billion-parameter model in the future.
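A rough, back-of-the-envelope sketch helps show why 16 GPUs can be enough to serve a 175-billion-parameter model. It assumes the weights are stored in 16-bit floating point (2 bytes per parameter) and that the 32 GB V100 variant is used, and it ignores activations, caching, and framework overhead, so real memory requirements are higher. These assumptions, and the helper names below, are illustrative and not figures or code from Meta.

```python
# Back-of-the-envelope weight-memory estimate (illustrative assumptions only).
BYTES_PER_PARAM_FP16 = 2  # assumption: 16-bit weights
V100_MEMORY_GB = 32       # assumption: 32 GB V100 variant
NUM_GPUS = 16

# Model sizes named in the article, in parameters.
MODEL_SIZES = {
    "125M": 125e6,
    "350M": 350e6,
    "1.3B": 1.3e9,
    "2.7B": 2.7e9,
    "6.7B": 6.7e9,
    "13B": 13e9,
    "30B": 30e9,
    "175B": 175e9,
}


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def per_gpu_gb(num_params, num_gpus=NUM_GPUS):
    """Weight memory per GPU if the weights are split evenly across num_gpus."""
    return weight_memory_gb(num_params) / num_gpus


if __name__ == "__main__":
    for name, n in MODEL_SIZES.items():
        print(f"{name:>5}: {weight_memory_gb(n):8.2f} GB of fp16 weights")
    share = per_gpu_gb(MODEL_SIZES["175B"])
    print(f"175B split over {NUM_GPUS} GPUs: {share:.3f} GB each "
          f"(fits in {V100_MEMORY_GB} GB: {share < V100_MEMORY_GB})")
```

Under these assumptions, the 175B model needs about 350 GB for its weights alone, or roughly 22 GB per GPU when split 16 ways, which leaves some headroom on a 32 GB card. The even split is an idealization of how model-parallel inference partitions a network.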
The company also claims that OPT-175B addresses the large amount of compute power currently required for AI research.
"We developed OPT-175B with energy efficiency in mind by successfully training a model of this size using only 1/7th the carbon footprint as that of GPT-3," the company said. "This was achieved by combining Meta’s open-source Fully Sharded Data Parallel (FSDP) API and NVIDIA’s tensor parallel abstraction within Megatron-LM. We achieved ~147 TFLOP/s/GPU utilization on NVIDIA’s 80 GB A100 GPUs, roughly 17 percent higher than published by NVIDIA researchers on similar hardware."
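The throughput figures in the quote can be put in context with simple arithmetic. The sketch below derives the NVIDIA baseline implied by "roughly 17 percent higher" (interpreting the percentage as relative to NVIDIA's figure) and the fraction of the A100's theoretical peak that 147 TFLOP/s represents, assuming a dense FP16/BF16 tensor-core peak of 312 TFLOP/s. Both results are derived estimates, not numbers Meta or NVIDIA published, and the function names are hypothetical.

```python
# Put Meta's reported throughput in context (derived estimates, not published figures).
ACHIEVED_TFLOPS_PER_GPU = 147.0  # from the quote ("~147 TFLOP/s/GPU")
RELATIVE_IMPROVEMENT = 0.17      # "roughly 17 percent higher"
A100_PEAK_TFLOPS = 312.0         # assumption: A100 dense FP16/BF16 tensor-core peak


def implied_baseline(achieved, improvement):
    """Baseline throughput that `achieved` exceeds by `improvement` (0.17 = 17%)."""
    return achieved / (1.0 + improvement)


def fraction_of_peak(achieved, peak):
    """Achieved throughput as a fraction of theoretical peak."""
    return achieved / peak


if __name__ == "__main__":
    base = implied_baseline(ACHIEVED_TFLOPS_PER_GPU, RELATIVE_IMPROVEMENT)
    frac = fraction_of_peak(ACHIEVED_TFLOPS_PER_GPU, A100_PEAK_TFLOPS)
    print(f"Implied NVIDIA baseline: ~{base:.1f} TFLOP/s/GPU")
    print(f"Share of assumed A100 peak: ~{frac:.0%}")
```

Because "roughly 17 percent" is itself rounded, the implied baseline of about 126 TFLOP/s/GPU is only approximate. Utilization well below the theoretical peak is expected in practice, since communication between GPUs and memory traffic keep them from running at full arithmetic throughput.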
The open-source code and smaller-scale pre-trained models are available now on GitHub. Requests for access to OPT-175B must be made via an OPT-175B request form.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.