
A new AI experiment from Google Labs lets users
generate images using other images as prompts.
The tool -- Whisk, released Monday -- enables users to prompt the technology with multiple images for the subject, the scene, and the style.
Google will also fill in images as prompts if the user does not have any, and additional details can be added via text later.
The Gemini model automatically writes a detailed caption of
the images, then feeds those descriptions into Google's latest image-generation model, Imagen 3. This process captures the subject's essence rather than an exact replica, making it easier
to mix subjects, scenes and styles.
Google said it extracts only a few key characteristics from the images used, so it may generate images that differ
from the creator's expectations -- for example, a subject with a different height, weight, hairstyle or skin tone. Because specific features may be critical to a project, Google admits
that "Whisk may miss the mark," so the platform lets users view and edit the underlying prompts at any time.
Whisk is built on the latest version of Imagen 3, the image-generation model Google
also announced today, alongside Veo 2, the next version of its video-generation model.
Google says Veo 2 understands the language used in cinematography and generates
artifacts such as extra hands far less frequently than other models.
Veo 2 will initially
be integrated into Google's VideoFX; a waitlist is available in Google Labs. The company expects to bring it to YouTube Shorts and other products sometime next year.
Video models often "hallucinate" unwanted details -- such as an extra foot or fingers, or other unexpected objects.
Google says Veo 2 produces these
less frequently, making its output more realistic.