OpenAI has announced a breakthrough in artificial intelligence (AI) and video creation with the introduction of Sora. The generative AI model can produce videos up to a minute long from simple text prompts, much as ChatGPT produces text from a prompt.
Key Features of Sora
According to OpenAI, Sora can generate highly detailed, complex scenes with multiple characters. The characters can express vibrant emotions and move through dynamic backgrounds, and scenes can include complex camera motion. OpenAI attributes these capabilities to the model's deep understanding of natural language.
To demonstrate the potential of the text-to-video model, OpenAI posted a video on X that was made entirely by Sora. The photorealistic clip shows a couple walking down a Tokyo sidewalk past a row of stores in a wintry setting.
“We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction,” stated OpenAI on the official webpage of Sora. “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.”
“Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” it added. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”
The company, led by CEO Sam Altman, has not yet announced a release date for Sora.
Challenges OpenAI Faces with Sora
OpenAI admitted that the current version of Sora has weaknesses that need to be addressed before its public launch. The model may struggle to simulate the physics of a complicated scene because it does not always grasp specific instances of cause and effect. As an example, the company said a person might take a bite out of a cookie, yet the cookie may show no bite mark afterward.
The model may also confuse the spatial details of a prompt, mixing up directions as simple as left and right. Likewise, it could still produce misinformation, biased output, and harmful content.
Furthermore, the tech firm is building tools to help detect misleading content, including a detection classifier that can tell whether a video was generated by Sora. It also plans to include C2PA metadata if the model is deployed in an OpenAI product, which would make the origin of Sora videos easier to verify.
The maker of the popular ChatGPT said it is engaging policymakers, educators, and artists worldwide to understand their concerns and to identify possible use cases for the new technology.
“Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it,” cautioned OpenAI. “That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”
OpenAI regards Sora as an important foundation for generative AI models that can understand and simulate the real world.