Google’s DeepMind Can Sync Soundtrack and Dialogue to Videos

Ken Ngo

5 months ago

Google’s artificial intelligence (AI) lab, DeepMind, is taking video content, including movie and TV production, to a new level. In a blog post, DeepMind said it has made progress on its video-to-audio (V2A) technology. V2A combines video pixels with text prompts to generate soundtracks that enhance video content. In the same post, DeepMind said its V2A technology can be paired with video generation models such as Veo to “create shots with a dramatic score, realistic sound effects or dialogue that matches the characters and tone of a video”.

Google DeepMind’s V2A technology brings innovation to video content

According to a post from Music Business Worldwide, DeepMind’s V2A technology can generate a large number of soundtracks for a given video input, and it is able to “understand raw pixels”. This lets the system work out which sounds suit a specific piece of video content. For example, one demo pairs the audio prompt “jellyfish pulsating under water, marine life, ocean” with a video of jellyfish moving through the ocean. DeepMind added that the technology can also generate soundtracks for a range of more traditional footage, such as silent films and archival footage. This is a major step for AI-generated audio and production, and it positions DeepMind as an innovator in the field.