
Baidu’s new A.I. can mimic your voice after listening to it for just one minute


We’re not in the business of writing regularly about “fake” news, but it’s hard not to be concerned about the kind of mimicry technology is making possible. First, researchers developed deep learning-based artificial intelligence (A.I.) that can superimpose one person’s face onto another person’s body. Now, researchers at Chinese search giant Baidu have created an A.I. they claim can learn to accurately mimic your voice — based on less than a minute’s worth of listening to it.

“From a technical perspective, this is an important breakthrough showing that a complicated generative modeling problem, namely speech synthesis, can be adapted to new cases by efficiently learning only from a few examples,” Leo Zou, a member of Baidu’s communications team, told Digital Trends. “Previously, it would take numerous examples for a model to learn. Now, it takes a fraction of what it used to.”


Baidu Research isn’t the first to try to create voice-replicating A.I. Last year, we covered a project called Lyrebird, which used neural networks to replicate voices — including those of President Donald Trump and former President Barack Obama — from a relatively small number of samples. Like Lyrebird’s work, Baidu’s speech synthesis technology doesn’t sound completely convincing, but it’s an impressive step forward — and far ahead of the robotic A.I. voice assistants that existed just a few years ago.

The work is based on Baidu’s text-to-speech synthesis system Deep Voice, which was trained on upwards of 800 hours of audio from a total of 2,400 speakers. It sounds best when given around 100 five-second clips of a target voice, but a version adapted with only 10 five-second samples was still able to fool a voice-recognition system more than 95 percent of the time.
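To make the idea of few-shot voice adaptation concrete, here is a deliberately simplified sketch of one common approach: freeze a pretrained multi-speaker synthesizer and fit only a small speaker-embedding vector to the handful of new clips. Everything below — the linear “synthesizer,” the feature vectors standing in for audio clips, and the dimensions — is an illustrative toy, not Baidu’s actual Deep Voice model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen, pretrained multi-speaker synthesizer:
# it maps an 8-dim speaker embedding to 16-dim acoustic features
# through fixed weights W (learned earlier on many speakers).
W = rng.normal(size=(16, 8))

def synthesize(embedding):
    return W @ embedding

# Pretend we have 10 short clips from a brand-new speaker, each
# summarized as a noisy acoustic feature vector. In reality these
# would be extracted from a few seconds of recorded audio.
true_embedding = rng.normal(size=8)
clips = [synthesize(true_embedding) + rng.normal(scale=0.01, size=16)
         for _ in range(10)]

# Speaker adaptation: keep W frozen and fit only the new speaker's
# embedding by gradient descent on the reconstruction error.
e = np.zeros(8)
lr = 0.01
for _ in range(500):
    grad = np.zeros(8)
    for target in clips:
        residual = synthesize(e) - target
        grad += W.T @ residual      # gradient of 0.5 * ||We - target||^2
    e -= lr * grad / len(clips)

# The fitted embedding lands close to the true one, even though the
# model itself never changed -- only a tiny per-speaker vector did.
print(np.linalg.norm(e - true_embedding))
```

The point of the sketch is the asymmetry the article describes: the expensive part (training on hundreds of hours of audio) happens once, while adapting to a new voice only has to estimate a small per-speaker representation, which is why a few seconds of audio can be enough.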

“We see many great use cases or applications for this technology,” Zou said. “For example, voice cloning could help patients who lost their voices. This is also an important breakthrough in the direction of personalized human-machine interfaces. For example, a mom can easily configure an audiobook reader with her own voice. The method [additionally] allows creation of original digital content. Hundreds of characters in a video game would be able to have unique voices because of this technology. Another interesting application is speech-to-speech language translation, as the synthesizer can learn to mimic the speaker identity in another language.”

For a deeper dive into this subject, you can listen to a sample of the voices or read a paper describing the work.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…