OpenAI's Sora: The ChatGPT Of Generative Video

OpenAI, the artificial intelligence company known for ChatGPT, has announced Sora, a new text-to-video model that can generate brief 1080p videos from typed text prompts.

According to the company, these movie-like scenes can involve multiple characters and background details.

“Sora has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions,” OpenAI wrote in a blog post. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”

Sora's creations are being praised for their high video quality, consistency, range of styles and length. OpenAI's posts on X include Sora-generated clips of woolly mammoths running through snowy tundra, a photo-realistic woman walking through the streets of Tokyo at night, an animated monster playing with a candle and more.



“I did not expect this level of sustained, coherent video generation for another two to three years,” Ted Underwood, a professor of information science at the University of Illinois at Urbana-Champaign, told The Washington Post.

To create a video with Sora -- clips can be up to one minute long, far longer than any other AI video generator to date can produce -- users type a descriptive paragraph like this:

“Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. the art style is 3d and realistic, with a focus on lighting and texture. the mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. the use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.”

According to a newly published paper by OpenAI researchers, “Video generation models as world simulators,” Sora can not only generate looping videos and edit existing videos, but also “simulate digital worlds,” meaning that Sora can replicate video-game environments like Minecraft while controlling the player character.

To a basic degree, Sora not only generates photos or videos, but can also determine the physics of each object in a digital environment, making it a “data-driven physics engine,” said senior Nvidia researcher Jim Fan.

“These capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them,” the OpenAI co-authors wrote in their paper.

When Sora launches, the tool will initially be available to a small group of artists, filmmakers and a group of researchers called “red teamers” focused on exploring ways the AI tool might be used maliciously, such as in the spread of election misinformation and the creation of deepfakes.

The company is also working with experts to develop tools for detecting whether a video was generated by Sora.

“Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it,” OpenAI said. “That's why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”
