Days after teasing the potential release of GPT-5, OpenAI has unveiled its newest AI tool: Sora, a text-to-video diffusion model named after the Japanese word for sky. Sora can produce videos up to a minute long that are strikingly realistic. How so? Sora allows users to create photorealistic videos up to a minute in length, all based on written prompts.
According to OpenAI's blog post, Sora can generate "complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background." The company says it is "teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction."
For example, the company noted that Sora can "understand how objects exist in the physical world" by "accurately interpreting props and generating compelling characters that express vibrant emotions." So, what are the initial impressions of Sora, and how does the model compare to other text-to-video models such as Google's Lumiere?
What Can Sora Do?
Firstly, although OpenAI has announced Sora, the model is currently off-limits to the public as it is still in the red-teaming phase. This means Sora is still undergoing rigorous adversarial testing to ensure it doesn't produce harmful or inappropriate content.
OpenAI explained that it is also granting access to a select group of "visual artists, designers, and filmmakers" to gather feedback on how to make the model "most helpful for creative professionals." The company also cautioned that the current version of Sora "may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect."
Regardless, OpenAI did not keep us completely in the dark: it included several demos in its blog post and on X, generated from prompts such as "Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous Sakura petals are flying through the wind, along with snowflakes." Although the output is not perfect, it is easy to see how a tool like this could threaten creative jobs in the future.
In addition, OpenAI's CEO Sam Altman posted on X asking users to "reply with captions for videos you'd like to see," then quote-posted the replies alongside Sora's generated videos. For example, Altman responded to prompts such as "two golden retrievers podcasting on top of a mountain" and "a bicycle race on the ocean with different animals as athletes riding the bicycles with drone camera view." Every video generated is impressive, especially for a demo version, but there is still a lingering concern over how OpenAI trained Sora.
Also Read: Apple Researchers Unveil Keyframer: An AI Tool That Animates Still Images Using LLMs
Was Sora Trained on Licensed Content?
OpenAI has not disclosed how much footage was used to train Sora or where the training videos originated, beyond Bill Peebles telling the New York Times that "the training data is from content we've licensed and also publicly available content."
Given the numerous copyright infringement lawsuits against the company, which allege that its generative AI tools were trained on massive amounts of material scraped from the internet and imitate the images or text contained in those datasets, there is reason for scepticism, but we can only take OpenAI at its word for now.
Moreover, one feature of Sora went largely unmentioned in OpenAI's demos: the model's ability to generate videos from a single image or a sequence of frames. According to Tim Brooks, a research scientist on the project, this function is "another really cool way to improve storytelling capabilities," as "you can draw what you have on your mind and then animate it to life."
Naturally, a function this powerful has drawbacks, such as the increased risk of deepfakes and misinformation. Accordingly, Peebles stated that the team "will be very cautious about all the safety implications for this" before releasing the model.
Is OpenAI Ahead of the Chasing Pack Again?
Since the current AI boom began, OpenAI has been at the forefront of the conversation with innovative releases such as ChatGPT, and it is now positioned as a potential pioneer of text-to-video diffusion models. Although companies like Runway and Pika have shown impressive text-to-video models of their own, with Google's Lumiere being the latest to join the trend, there is a lingering sense that OpenAI will surpass them all.
For example, Sora offers striking photorealism and can produce clips of up to one minute, far longer than the brief snippets other models typically generate. However, as the model is still a research product with no specified date for release to wannabe auteurs, it is premature to declare a winner without proper side-by-side comparisons.
Besides OpenAI's massive Sora announcement, Google also unveiled an update to its Gemini model with Gemini 1.5. To learn more about this topic and other AI news, follow our Facebook and Twitter pages for the latest updates and coverage of the stories occupying your feed.