Veo 2, Google’s AI video generator announced in December, now
has a price that is attracting attention.
At 50 cents per second of generated video, the cost adds up to $30 per minute. That price tag will likely make video ads and content across the web -- as well as advertising on connected TV (CTV) -- more affordable for smaller companies. Google has posted the pricing page online.
“[A] very important number to keep in mind when considering the future of generative and non-generative media,” Jon Barron, an AI researcher at Google DeepMind, wrote in an X post.
By comparison, Barron noted that the blockbuster "Avengers: Endgame" cost roughly $32,000 per second to produce using traditional methods.
Barron agreed with a comment from @mahaoo_ASI, a follower on X, that the observation wasn’t entirely an apples-to-apples comparison “because you'd probably need to generate hundreds of generations before you get what you want, but it's still a 1000x difference, and in a few years, it might be comparable in terms of capabilities (consistent characters/scenes).”
OpenAI recently made Sora, a video generation model, available to subscribers for $200 a month through a ChatGPT Pro subscription.
There are other uses for video. Meta and others have begun training AI models on publicly available video to help them understand the world.
Meta publicly released the Video Joint Embedding Predictive Architecture (V-JEPA) model, a step toward advancing machine intelligence. An early example of a physical world model, it can detect and understand highly detailed interactions between objects.
It was released under a Creative Commons non-commercial license for researchers to further explore.
V-JEPA was trained on two million public videos. Meta said it achieves strong performance on motion- and appearance-based tasks without fine-tuning and can outperform other methods.
Building it required training a foundation model for object-centric learning on video data. A neural network extracts object-centric representations from video frames, capturing motion and appearance cues. These representations are then refined through contrastive learning to sharpen the object features, and the architecture processes them to model object interactions over time. The framework is trained on a large-scale dataset, optimizing for reconstruction accuracy and for consistency of the objects across frames.
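To make that pipeline concrete, here is a minimal, hypothetical sketch in PyTorch of the steps described above -- a frame encoder producing object-centric "slot" vectors, a contrastive loss refining them, a temporal transformer modeling object interactions, and a training objective combining reconstruction and cross-frame consistency. This is not Meta's released V-JEPA code; every module name, shape, and hyperparameter below is an illustrative assumption.

```python
# Hypothetical sketch of an object-centric video training loop (not V-JEPA itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameSlotEncoder(nn.Module):
    """Maps each video frame to a fixed set of object-centric slot vectors."""
    def __init__(self, num_slots=6, slot_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(          # tiny CNN over 64x64 RGB frames
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_slots = nn.Linear(64, num_slots * slot_dim)
        self.num_slots, self.slot_dim = num_slots, slot_dim

    def forward(self, frames):                  # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))
        return self.to_slots(feats).view(B, T, self.num_slots, self.slot_dim)

class TemporalInteractionModel(nn.Module):
    """Transformer over the slot sequence to model object interactions in time."""
    def __init__(self, slot_dim=64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=slot_dim, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, slots):                   # slots: (B, T, S, D)
        B, T, S, D = slots.shape
        out = self.transformer(slots.reshape(B, T * S, D))
        return out.view(B, T, S, D)

def contrastive_slot_loss(slots, temperature=0.1):
    """InfoNCE-style loss: each slot should match itself in the next frame."""
    a = F.normalize(slots[:, :-1].reshape(-1, slots.shape[-1]), dim=-1)
    b = F.normalize(slots[:, 1:].reshape(-1, slots.shape[-1]), dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.shape[0])
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    frames = torch.randn(2, 8, 3, 64, 64)        # fake clip: batch of 2, 8 frames
    encoder, dynamics = FrameSlotEncoder(), TemporalInteractionModel()
    decoder = nn.Linear(64, 3 * 64 * 64)         # toy per-frame decoder
    slots = encoder(frames)
    predicted = dynamics(slots)
    recon = decoder(predicted.sum(dim=2)).view_as(frames)
    loss = (F.mse_loss(recon, frames)            # reconstruction term
            + contrastive_slot_loss(slots)       # contrastive slot refinement
            + F.mse_loss(predicted[:, :-1], slots[:, 1:]))  # temporal consistency
    loss.backward()
    print(float(loss))
```

Run as-is, the script builds the toy model on random frames and prints a single combined loss value; a real system of this kind would swap in a much larger encoder, real video data, and many training iterations.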