OpenAI Takes Wraps Off 'Sora' Text-to-Video Model

OpenAI on Thursday unveiled "Sora," a text-to-video and image-to-video generative AI model.

The company, steward of the ChatGPT chatbot, said it has begun rolling out Sora to a select group of safety testers and creative professionals. No public availability date has been announced.

Sora is a prompt-based video-creation tool. It can generate videos up to one minute long from a user's text prompt. It can also animate still images into videos, extend existing videos, and fill in missing frames.

"Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world," OpenAI explained in its announcement.

"The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style."

Sora shares some common technology with ChatGPT and OpenAI's first text-to-image tool, DALL-E. Specifically, it uses the same transformer architecture as ChatGPT and the same recaptioning technique as DALL-E 3, which "involves generating highly descriptive captions for the visual training data."

"As a result," per OpenAI, "the model is able to follow the user's text instructions in the generated video more faithfully."

Examples of videos created by Sora can be found in OpenAI's announcement, as well as in a thread on X/Twitter. Some were generated by prompts grounded in the real world ("A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage") and some were purely fantastical ("Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle").

As impressive as it looks right now, Sora is not ready for prime time. It currently has an imperfect grasp of physics, continuity, and cause and effect, among other things. OpenAI's testers are also still assessing whether, and how easily, the model can be misused to spread bias, misinformation and harmful content.

AI's ability to proliferate "fake news" is a persistent and legitimate concern. To address this, OpenAI recently announced that it will embed metadata in all images created by ChatGPT and DALL-E 3 (it's not a bulletproof solution, however, as the metadata can simply be removed by the end user). OpenAI suggested in its blog that videos created by Sora will be similarly tagged, and that it is also building a "detection classifier" that can identify videos the model generated.

As another safety measure, OpenAI is working to ensure Sora prompts and outputs don't violate its usage policies. "[O]nce in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others," the company said. "We've also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it's shown to the user."

Even as it describes its efforts to forestall misuse, OpenAI acknowledged that it's a Sisyphean task. "Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it," it said. "That's why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time."

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.