Google 'Whisks' AI-Generated Video

A new AI experiment from Google Labs allows users to generate images using other images as prompts.

The tool -- Whisk, released Monday -- enables users to prompt the technology with multiple images for the subject, the scene, and the style.

Google will also supply images to use as prompts if the user does not have any. Additional details can be added later via text.

The Gemini model automatically writes a detailed caption of the images. It then feeds those descriptions into Google’s latest image-generation model, Imagen 3. This process captures the subject's essence, not an exact replica, making it easier to mix subjects, scenes and styles.

Google said it extracts only a few key characteristics from the image used, so it might generate images that differ from the creator's expectations. For example, the generated subject might have a different height, weight, hairstyle or skin tone. While specific features may be critical to the project, Google admits that "Whisk may miss the mark," so the platform lets users view and edit the underlying prompts at any time.
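The caption-then-generate flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's implementation: the names `Captions`, `describe_inputs`, `build_prompt` and `remix` are hypothetical, and the captioner and generator are passed in as plain functions standing in for Gemini and Imagen 3 rather than calling any real API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stand-ins: none of these names come from a Google API.
Captioner = Callable[[bytes], str]       # image bytes -> text caption
ImageGenerator = Callable[[str], bytes]  # text prompt -> image bytes


@dataclass
class Captions:
    subject: str
    scene: str
    style: str


def describe_inputs(subject: bytes, scene: bytes, style: bytes,
                    captioner: Captioner) -> Captions:
    """Step 1: turn each input image into a text description."""
    return Captions(captioner(subject), captioner(scene), captioner(style))


def build_prompt(captions: Captions, extra_text: str = "") -> str:
    """Step 2: merge the captions and any optional user text into one prompt."""
    parts = [
        f"Subject: {captions.subject}",
        f"Scene: {captions.scene}",
        f"Style: {captions.style}",
    ]
    if extra_text:
        parts.append(f"Details: {extra_text}")
    return "\n".join(parts)


def remix(subject: bytes, scene: bytes, style: bytes,
          captioner: Captioner, generator: ImageGenerator,
          extra_text: str = "") -> Tuple[str, bytes]:
    """Step 3: send the text prompt, not the original pixels, to the generator."""
    captions = describe_inputs(subject, scene, style, captioner)
    prompt = build_prompt(captions, extra_text)
    # Returning the prompt alongside the image mirrors the idea that
    # users can inspect and edit the underlying text.
    return prompt, generator(prompt)
```

Because only the text descriptions reach the generator in this sketch, any detail the captioner leaves out (height, hairstyle, skin tone) cannot be reproduced, which is consistent with the drift Google describes.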

Whisk uses the latest version of Imagen 3, an image-generation model that Google also announced today alongside Veo 2, the next version of its video-generation model.

Google says Veo 2 understands the language of cinematography and generates unwanted details such as extra hands far less frequently than other models.

Veo 2 will initially be integrated into Google’s VideoFX, with a waitlist available in Google Labs. The company also expects to integrate it into YouTube Shorts and other products sometime next year.

Video models often “hallucinate” unwanted details -- such as an extra foot or fingers, as well as other unexpected objects.

Google says Veo 2 produces these less frequently, making the videos and images more realistic.