Google is responding to Meta with Imagen Video, its text-to-video solution

After Meta presented Make-A-Video, Google has responded by introducing Imagen Video, its system for generating videos from written descriptions. The announcement comes just a few months after the presentation of Google Imagen, a text-to-image system, and shows how quickly these text-to-video AI models are advancing.

Video in 1280 x 768 resolution

Google claims that Imagen Video can generate 1280 x 768 pixel video at 24 frames per second from a text prompt. According to the company, the work confirms findings from previous research on diffusion-based image generation and transfers them to video generation. The project site shows sample videos for prompts such as “a teddy bear runs through New York”, “a drone flies over a snow-covered rainforest” and “a teddy bear is washing dishes”.

To achieve this result, Google builds on Imagen. For that earlier text-to-image system, the company explained that it relies on large transformer language models to understand the text and on diffusion models to produce high-fidelity images. Google found that large general-purpose language models (such as T5), pre-trained only on text corpora, are effective at encoding text for image generation.

Increasing the size of Imagen’s language model improves both sample fidelity and image-text alignment more than increasing the size of the image diffusion model. As a result, the company promises “an unprecedented degree of photorealism.”

Models trained on multiple datasets

For Imagen Video, Google trains its model on the publicly available LAION-400M image-text dataset, as well as on 14 million video-text pairs and 60 million image-text pairs. An initial video is generated from the text at 3 frames per second and 24 x 48 resolution. This low-resolution video is then upscaled, and the model generates additional frames for the final render.

Google also claims that Imagen Video can generate videos in the style of famous artists, produce rotating 3D objects while preserving their structure, and render content in various animation styles.

However, Google acknowledges that “these generative models may be used for other purposes, such as creating false, hateful, explicit or malicious content.” Filters have been put in place to limit such uses, but “there are still social prejudices and stereotypes that are difficult to detect and filter out.” Google therefore does not plan to release the Imagen Video model or its source code until these concerns are addressed. It is a notable decision at a time when fake news and deepfakes circulate widely on the Internet.
