OpenAI Releases GPT-4 with Multimodal Capabilities

OpenAI announced and demo'd the latest incarnation of its neural network machine learning model, GPT-4, this week. The new version is multimodal, which means it responds to both text and image inputs. (It outputs text only.) The company claims that, with its ability to better "see" images and "reason," GPT-4 delivers a more accurate, more knowledgeable, and more creative capability even than its headline-grabbing ChatGPT generative AI tool.

Microsoft, which has invested more than $10 billion in OpenAI, said the new version is already powering its Bing search engine. OpenAI said it used Microsoft's Azure cloud computing platform to train the model.

OpenAI's GPT-3 was released in 2022, followed closely by GPT-3.5, on which both ChatGPT and the image-generation tool, Dall-E, are built. The latest version of this generative pre-trained transformer, rumored to be in development much of last year, is being described as "long-awaited," by both the company and the press.

"Honestly, it's kind of hard for me to believe that this day is here," said Greg Brockman, president and co-founder of OpenAI, during an online demonstration. "Open AI has been building this technology, really since we started the company."

During his demo, Brockman showed how to code with GPT-4 as "a partner" by building a Discord bot. He began by telling the model that it should now assume the role of an AI programming assistant, and that its job was to write things out in pseudocode first, and then actually write the code. "This approach of letting the model break down the problem into smaller pieces is very helpful," Brockman explained. "That way you're not asking it to come up with a solution to a super hard problem all in one go. It also makes it very interpretable because you can see exactly what the model was thinking. And you can even provide corrections if you'd like."

"This is the kind of thing GPT-3.5 would totally choke on if you've tried anything like it," he added.

He also showed how GPT-4 could extrapolated from a hand-drawn sketch to create the code for a website. And he showed how it could summarize the blog post about this release using words beginning only with the letter "G." The result: " “GPT-4, groundbreaking generational growth, gains greater grades. Guardrails, guidance, and gains garnered. Gigantic, groundbreaking, and globally gifted." He also used it to file a tax return.

OpenAI has shown how GPT-4 outperforms its own ChatGPT, which was built on GPT-3.5, because it is a larger language model. In an experiment conducted by two law professors, it scored 297 on the bar exam, Reuters reported. That score puts GPT-4 in the 90th percentile of actual test takers. It's high enough to get a license to practice law in most states. It also ranked in the 93rd percentile on an SAT reading exam, and the 89th percentile on the SAT Math exam, OpenAI said.

Among the real-world use cased Brockman pointed to was a startup called Be My Eyes, provider of a mobile app designed to allow anyone to assist visually impaired people through live video calls. The Denmark-based company is using GPT-4 in its new Virtual Volunteer digital visual assistant.

"We are entering the next wave of innovation for accessibility technology powered by AI," said Mike Buckley, CEO of Be My Eyes, in a statement. "This new Be My Eyes feature will be transformative in providing people who are blind or have low vision with powerful tools to better navigate physical environments, address everyday needs, and gain more independence. We are thrilled to work with OpenAI to further our mission of improving accessibility for the 253 million people who are blind or have low-vision, with safe and accessible applications of generative AI."

"A year ago, we trained GPT-3.5 as a first 'test run' of the system," OpenAI said on its website. "We found and fixed some bugs and improved our theoretical foundations. As a result, our GPT-4 training run was (for us at least!) unprecedentedly stable, becoming our first large model whose training performance we were able to accurately predict ahead of time."

OpenAI claims that GPT-4 is less prone to "hallucinations"—providing wrong answers with a great deal of confidence—than its predecessor large language model.

OpenAI is releasing GPT-4’s text input capability via ChatGPT and the API (with a waitlist). To prepare the image input capability for wider availability, the company is collaborating closely with Be My Eyes and open-sourcing OpenAI Evals, the framework for automated evaluation of AI model performance, "to allow anyone to report shortcomings in our models to help guide further improvements."

GPT-4 is also available to subscribers of the premium paid-for ChatGPT Plus in a limited, text-only capacity.

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.  He can be reached at