OpenAI's New GPT-3 Sets a New Standard for Language Modeling--And Hype
- By John K. Waters
The Internet began hyperventilating last week when samples of text generated by the latest version of OpenAI's neural-network-powered language model, GPT-3, began circulating via social media. Given access through an API, a select group of beta testers demonstrated GPT-3's genuinely impressive ability to write everything from articles and poems to working computer code and guitar tablature.
Now that the Twitterverse has had a chance to catch its breath, it seems like a good time for a closer look at what GPT-3 is, what it can do, and what it means to developers building software and systems in the ever-expanding AI ecosystem.
Generative Pre-trained Transformer
"GPT" stands for Generative Pre-trained Transformer. A transformer is a deep learning model introduced by Google in 2017. It's based on a self-attention mechanism that directly models relationships among all words in a sentence, regardless of their respective positions, rather than one-by-one in order. This capability made transformers much faster than recurrent neural networks (RNNs), the leading approach at the time to natural language processing (NLP). Google introduced its open-source machine-learning framework, BERT (Bidirectional Encoder Representations from Transformers) in 2019 to better understand the context of words in search queries.
OpenAI researchers published a paper on generative pre-training in June 2018, in which they showed how a generative model of language is able to acquire world knowledge and process long-range dependencies by pre-training on a diverse corpus with long stretches of contiguous text. GPTs are unsupervised transformer language models; they use machine learning to analyze a sequence of words and other data to write text predictively and essentially elaborate on examples to produce original output, such as newspaper articles, essays, business reports, and short stories. The OpenAI researchers noted the potential of this model:
Supervised learning is at the core of most of the recent success of machine learning. However, it can require large, carefully cleaned, and expensive to create datasets to work well. Unsupervised learning is attractive because of its potential to address these drawbacks. Since unsupervised learning removes the bottleneck of explicit human labeling it also scales well with current trends of increasing compute and availability of raw data. Unsupervised learning is a very active area of research but practical uses of it are often still limited.
Most Powerful Language Model Ever Built?
Limited then, but fast forward to May 2020 and a new paper in which OpenAI researchers discuss how that initial exploration led to their recent demonstration of "substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task."
Talk about an understatement.
GPT-3, which followed the initial GPT and GPT-2 releases, has a massive 175 billion parameters, more than any other language model. (GPT-2 had 1.5 billion, and Microsoft's Turing Natural Language Generation model, or T-NLG, has 17 billion.) GPT-3 is also pre-trained primarily on a filtered version of the Common Crawl data set, a corpus of almost a trillion words scraped from the Web.
Unlike other models, such as BERT, which require extensive fine-tuning with thousands of examples, GPT-3 can perform specific tasks without special tuning. GPT-3 can churn out the work of a poet or a programmer when shown fewer than 10 examples of the task. This is the capability that had social media buzzing.
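In practice, those few examples are simply written into the text the model is asked to continue. The sketch below shows one way such a "few-shot" prompt might be assembled in Python. It is a hypothetical illustration: the `build_few_shot_prompt` helper and the `Input:`/`Output:` labels are assumptions made for this example, not part of OpenAI's API, and the code builds the prompt text only without calling any service.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a plain-text prompt: task description, worked examples, new input.

    The "Input:"/"Output:" layout is an illustrative convention, not a required format.
    """
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # Leave the final output blank so the model's continuation fills it in.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Three worked examples, well under the "fewer than 10" mentioned above.
examples = [("cheese", "fromage"), ("cat", "chat"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", examples, "book")
print(prompt)
```

The model's weights never change here. The examples act as a pattern to continue, which is why no task-specific fine-tuning step is needed.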
GPT-3's capabilities are breathtaking to behold (and a little frightening). There's actually a Web site devoted to GPT-3-generated creative fiction. But neither GPT-3 nor its predecessors--nor BERT, for that matter--actually understands the meaning of the language it is using. The OpenAI researchers themselves acknowledge in their paper that GPT-3 "…still has notable weaknesses in text synthesis and several NLP task limitations…," and they describe the problems this can cause:
[A]lthough the overall quality is high, GPT-3 samples still sometimes repeat themselves semantically at the document level, start to lose coherence over sufficiently long passages, contradict themselves, and occasionally contain non-sequitur sentences or paragraphs… Within the domain of discrete language tasks, we have noticed informally that GPT-3 seems to have special difficulty with "common sense physics"… that test this domain. Specifically GPT-3 has difficulty with questions of the type "If I put cheese into the fridge, will it melt."
Gartner analyst Erick Brethenoux, VP and AI research lead, specializes in machine learning, artificial intelligence, and applied cognitive computing. He allows that, as significant a step as the GPT-3 release may be, the initial hyperbolic reactions should be taken with a grain of salt.
"I think some part of that community is, perhaps, over enthusiastic about some of the results they are getting," he told Pure AI. "There's a pure connectionist approach in the AI community that says, if you have neural-networks-based cognitive modeling that encompasses deep learning, you'll eventually be able to represent the brain. But GPT-3 needs 175 billion parameters to do the things it does, and they're saying the next iterations could involve trillions. My kid doesn't need a billion parameters to recognize a cat, or to understand that I'm joking."
Brethenoux says the GPT-3 release does represent an essential step in the evolution of AI. But he sees an even more important evolutionary path in what is called neuro-symbolic AI, which combines deep learning neural network architectures with symbolic reasoning techniques.
"I think it has more merit," he said, "and I'm not the only one. It's part of a wave Gartner calls composite AI, which involves assembling different AI techniques to solve problems, like recognizing language. It allows you to do abstractions faster; once you show me five cats, I have a pretty good idea what a cat looks like and I can abstract that at a certain level."
For now, OpenAI is content to invite outside developers to help put GPT-3 through its paces. The company plans to turn the tool into a commercial product later this year, probably as a paid-for enterprise subscription via the cloud.
OpenAI was founded in 2015 in San Francisco as a non-profit open-source organization by a group of investors that included Tesla CEO Elon Musk, as well as Sam Altman, Ilya Sutskever, and Greg Brockman. Today it comprises two entities: the non-profit OpenAI Inc. and the for-profit OpenAI LP. In February 2018, OpenAI announced via a blog post that Musk would be leaving the board, but would remain as an advisor and donor.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at email@example.com.