How AI can soon generate synthetic voices speaking in any language - Axios

source link: https://www.axios.com/artificial-intelligence-voice-dubbing-synthetic-14bfb3c6-99db-4406-920d-91b37d00a99a.html
Dec 4, 2021 - Technology

AI could end foreign-language subtitles

Illustration of binary numbers inside of speech bubbles.

Illustration: Shoshana Gordon/Axios

AI companies are developing methods to interpret and synthesize voices in ads, movies and TV.

Why it matters: The advances in voice synthesis could help fix bad movie dubbing — and they come as international content is becoming increasingly important to studios and streaming platforms as part of the globalization of entertainment.

  • But they raise concerns about the possibility of deepfaking audio, as well as how a celebrity's voice might be used after their death.

What's happening: Foreign-language hits like "Squid Game" and "La Casa de Papel" are drawing record audiences, but subtitles are still a stumbling block for studios trying to tap a growing international market.

  • More Netflix subscribers watched dubbed versions of "Squid Game" than subtitled versions.
  • With blockbusters sucking up a lot of bandwidth, smaller producers of foreign-language content are having a hard time finding enough interpreters and voice-over actors to meet demand.
  • "We're still stuck in the mindset of the one-to-many broadcasting model," says Ryan Steelberg, co-founder and president of AI company Veritone.

Between the lines: Veritone has developed a product called MARVEL.ai that allows content producers to generate and license what it calls "hyper-realistic" synthetic voices.

  • This means, for example, that podcast creators could have audio ad copy translated into another language, and MARVEL.ai would then generate a synthetic version of their voice reading the ad in the new language.
  • "It gives you the ability to hyper-personalize audio on a much bigger scale and at less cost," says Steelberg.

How it works: Text-to-speech technology has existed for decades, but Veritone's product makes use of "speech-to-speech," what Steelberg calls "voice as a service."

  • Veritone has access to petabytes of data from media libraries and uses it to train its AI product, creating a synthetic version of the original voice that can be tuned for different kinds of sentiment or emotion or, with interpretation, speak a foreign language.
  • "It's no longer going to be another person's new voice speaking on behalf of, say, Tom Cruise," says Steelberg. "It's really going to be Tom Cruise's voice speaking another language."
  • Nvidia has been developing technology that would allow AI to alter video or animation in a way that takes an actor's lips and facial expression and matches it with the new language — so no more out-of-sync dubbing like in 1970s-era kung-fu movies.

What's next: This technology will likely first be used in advertisements, but as it migrates to higher-quality content, it will open up potential opportunities and pitfalls for celebrity talent.

  • "In terms of dubbing and post-production, synthetic voices will become mainstream, and you'll see that built into contracts for talent," says Steelberg.
  • That won't just be to ensure Hollywood stars (and their agents) get a cut for any use of their synthesized voice, but also to prevent those voices from being hijacked for malign purposes as the technology becomes more accessible.

What to watch: How the voices and other creative attributes of deceased celebrities might be harnessed by AI.

  • Holograms of dead musicians like Frank Zappa are already being used to front "live" shows that have brought in tens of millions in revenue, while Kenny G recently released a "duet" with the jazz great Stan Getz, who died 30 years ago.
  • Sample notes from Getz's existing library were used to generate a new, synthetic melody — albeit one that jazz writer Ted Gioia called a "Frankenstein record."

The bottom line: We should get used to hearing celebrities speak in almost any language soon — and those celebrities should get used to going through their wills with a fine-toothed comb.

