- OpenAI is expected to release the Sora 2 AI video model soon
- Sora 2 will face stiff competition from Google’s Veo 3 model
- Veo 3 already offers features that Sora lacks, and OpenAI will need to improve both Sora’s capabilities and its ease of use to win over potential customers
OpenAI appears to be finalizing plans to release Sora 2, the next iteration of its text-to-video model, based on references spotted in OpenAI’s servers.
Nothing has been officially confirmed, but there are signs that Sora 2 will be a major upgrade aimed squarely at Google’s Veo 3 AI video model. It’s not just a race to generate prettier pixels; it’s about sound and the experience of producing what the user is imagining when writing a prompt.
OpenAI’s Sora impressed many when it debuted with its high-quality visuals, but its clips were silent films. When Veo 3 arrived this year, it showcased short clips with speech and environmental audio baked in and synced up. Not only could you watch a man pour coffee in slow motion, but you could also hear the gentle splash of liquid, the clink of ceramic, and even the hum of a diner around the digital character.
To make Sora 2 stand out as more than a lesser alternative to Veo 3, OpenAI will need to figure out how to stitch believable voices, sound effects, and ambient noise into even better versions of its visuals. Getting audio right, particularly lip-sync, is tricky. Most AI video models can show you a face saying words. The magic trick is making it look like those words actually came from that face.
It’s not that Veo 3 is perfect at matching sound to picture, but there are examples of videos with surprisingly tight audio-to-mouth coordination, background music that matches the mood, and effects that fit the intent of the video.
Granted, Veo 3’s maximum of eight seconds per video limits the scope for success or failure, but a model has to nail fidelity to the scene before duration matters. And it’s hard to deny that Veo 3 can make videos that both look and sound like real cats jumping off high dives into a pool. If Sora 2 can extend to 30 seconds or more at a steady quality, it’s easy to see it attracting users who want more room to create AI videos.
Sora 2’s movie mission
OpenAI’s Sora can already stretch to 20 seconds of high-quality video. And because it’s embedded in ChatGPT, you can make it part of a larger project. That flexibility helps Sora stand out, but the absence of audio is notable. To compete directly with Veo 3, Sora 2 will have to find its voice, and not only find it but weave it smoothly into the videos it produces. Sora 2 might have great audio, but if it can’t match the seamless way Veo 3’s audio connects with its visuals, it might not matter.
At the same time, making Sora 2 too good might cause its own issues. With every new generation of AI video model, there’s more concern about blurring the line with reality. Neither Sora nor Veo 3 allows prompts involving real people, violence, or copyrighted content, but adding audio opens a whole new dimension of scrutiny over the origin and use of realistic voices.
The other big question is pricing. Google keeps Veo 3 behind the Gemini Advanced paywall, and heavy use effectively requires the $250-a-month AI Ultra tier. OpenAI might bundle access to Sora 2 into the ChatGPT Plus and Pro tiers in a similar manner, but if it can offer more at the cheaper tier, it’s likely to expand its user base quickly.
For the average person, the choice of AI video tool will hinge on price and ease of use as much as on features and video quality. There’s a lot OpenAI needs to do if Sora 2 is going to be more than a silent blip in the AI race, but it looks like we will soon find out how well it can compete.