AI Castaway
        
        OpenAI's Sora: Did That Really Happen?
        Seeing is no longer believing, thanks to OpenAI's latest generative AI model. 
        
        
- By Ginger Grant
- 02/22/2024
Welcome to the inaugural installment of our new column, AI Castaway! Written by Microsoft Data Platform MVP, international speaker and technical consultant Ginger Grant, AI Castaway will take a technologist's lens to AI's use and development, including how and when to use it, different implementations, and the impact of AI on applications and processes. If you would like to engage directly with Ginger around the topic of this and future installments, you can find her on X (formerly Twitter) @desertislesql.
Last week brought another announcement from OpenAI that will once again make it harder for people to determine whether what they are seeing is real or computer-generated.
OpenAI is giving people a new AI capability: creating videos. This announcement will take CGI to a whole new level, as the computer-generated imagery can be created using text prompts. The new model is called "Sora."
The 'Sky' Is the Limit
Sora is an amazing new model that generates video from a text description. For those of you wondering whether Sora is an acronym, it is not; Sora is the Japanese word for "sky."
The videos Sora creates look incredible. OpenAI states that it can generate videos up to a minute long and at resolutions up to 1920x1080. The video examples OpenAI included in its announcement show an outstanding level of detail.
What is truly amazing about the sample videos in OpenAI's Sora unveiling is the variation in styles that is possible. The videos include cartoons with a Pixar look and feel, as well as detailed and realistic views of people, landscapes and environments, including a video that looks like something you might see taken by a scuba diver.
You are not constrained by the real world; if you like, you can send fish  swimming through a city or create a video of a digital Minecraft-style scene. 
Sora supports a number of different background settings, so you can set your video in the desert or in a forest. The model has also been built to handle different focal lengths and occlusion. This means people can walk in front of a dog in a window while the focus remains on the dog and the passersby appear fuzzy. It also means the dog persists before and after the people walk by.
Sora starts by creating a grainy-looking video, then sharpens it over time, a process known as diffusion. Those of you who have created images using OpenAI's image generator, DALL-E, will find the workflow familiar, as Sora draws heavily on concepts developed for that model.
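The grainy-to-sharp refinement described above can be sketched in a few lines of Python. This is a toy illustration of the diffusion idea only, not OpenAI's actual model or sampling code: `toy_denoise` and its parameters are hypothetical, and the clean target here stands in for the prediction a trained neural network would produce at each step.

```python
import random

def toy_denoise(target, steps=10, noise_scale=1.0, seed=42):
    """Toy sketch of diffusion-style refinement (NOT Sora's actual
    algorithm): start from pure noise and repeatedly nudge the sample
    toward a predicted clean signal while shrinking the remaining noise."""
    rng = random.Random(seed)
    # Begin with a completely noisy "frame" (a flat list of pixel values).
    sample = [rng.gauss(0, noise_scale) for _ in target]
    for step in range(steps):
        # Each step blends the current sample with the clean prediction
        # (here, the target itself) and injects ever-smaller residual noise.
        blend = (step + 1) / steps
        sample = [
            (1 - blend) * s + blend * t
            + rng.gauss(0, noise_scale * (1 - blend) * 0.1)
            for s, t in zip(sample, target)
        ]
    return sample

clean = [0.2, 0.5, 0.9, 0.1]   # stand-in for clean pixel values
result = toy_denoise(clean)     # converges to the clean signal
```

Each pass through the loop corresponds to one denoising step: early iterations are dominated by noise, later ones by the model's prediction, which is why the intermediate frames look progressively less grainy.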
Sora generates videos using a text-to-video generative model that relies on ChatGPT to describe scenes. In addition to creating movies based on descriptions, Sora can generate a video from a picture, animating its contents and extending it with additional frames. Sora is designed to extend a video forward and backward in time, which is all you need to create an endless loop like you might see in a GIF. Sora can also cut videos on different topics together into a longer video by slowly blending elements from one video into another until the contents are merged.
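To see why extending a clip at both ends makes looping easy, here is the simplest mechanical version of a seamless loop: playing frames forward and then backward. This "ping-pong" trick is only an illustration of the looping concept; Sora actually generates new frames in each direction rather than reusing existing ones, and `ping_pong_loop` is a hypothetical helper.

```python
def ping_pong_loop(frames):
    """Toy illustration of a seamless loop (not Sora's generative
    extension): play the clip forward, then backward, so the final
    frame leads smoothly back to the first when the loop repeats."""
    if len(frames) < 3:
        return list(frames)
    # Append the reversed middle frames (skip first and last to avoid
    # showing the turnaround frames twice in a row).
    return list(frames) + list(frames[-2:0:-1])

ping_pong_loop([1, 2, 3, 4])  # -> [1, 2, 3, 4, 3, 2]
```

A generative model removes the visible "bounce" of this trick by synthesizing genuinely new frames before the start and after the end of the clip.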
Creatives at a Crossroads
The data used to train Sora consisted of actual videos recorded in their original aspect ratios. To be sure, Hollywood and the gaming industry will be investigating this tool heavily to determine whether their copyrights were violated when Sora was trained. In fact, as soon as OpenAI unveiled Sora last week, studios raised questions about the source of the training videos.
On the flip side, studios will also likely investigate how they can use Sora to decrease the cost of CGI, which is the mainstay of special effects. It isn't farfetched to expect Sora might lower the cost of making movies and games -- as well as the salaries of those who currently create them.
What CAN'T Sora Do?
There are some limitations to this initial version of Sora. OpenAI admits that the model has problems with physics; for example, it cannot realistically render glass breaking.

Eating is another action the AI does not understand. If you make a video of a person eating a sandwich, the sandwich might still look whole after they take a bite, and the amount of food left on their plate never decreases.
Unfortunately, the general public cannot try out Sora just yet, or even see a working demo. OpenAI is still putting safety measures in place before it is ready for release. For instance, it wants to make sure that Sora cannot be used to generate sexual content, create a video with a celebrity as the subject, or reuse someone else's movie or other copyrighted material.
OpenAI is conducting testing and final modifications with a small group of testers, including MIT. As soon as it is available, I am guessing we will see the output on TikTok.
What Now?
When this technology is released to the public, it will be even harder to determine truth from fiction. The video generated by Sora can be so realistic that you could easily believe what you are seeing really happened.
Malformed hands are one of the few dead giveaways that a picture is AI-generated. It will be interesting to see if OpenAI ever figures out how to render hands appropriately in video. Right now, unless the video shows glass breaking or someone eating -- the two things OpenAI has admitted Sora has problems depicting -- it may be impossible to tell reality from AI.
                    About the Author
Ginger Grant is a Data Platform MVP who provides consulting services in advanced analytic solutions, including machine learning, data warehousing, and Power BI. She writes articles, books, and the blog at DesertIsleSQL.com, and as a Microsoft Certified Trainer (MCT) she provides data platform training in topics such as Azure Synapse Analytics, Python, and Azure Machine Learning. You can find her on X/Twitter at @desertislesql.