Baidu’s new A.I. can mimic your voice after listening to it for just one minute

We’re not in the business of writing regularly about “fake” news, but it’s hard not to be concerned about the kind of mimicry that technology is making possible. First, researchers developed deep learning-based artificial intelligence (A.I.) that can superimpose one person’s face onto another person’s body. Now, researchers at Chinese search giant Baidu have created an A.I. they claim can learn to accurately mimic your voice after listening to it for less than a minute.

“From a technical perspective, this is an important breakthrough showing that a complicated generative modeling problem, namely speech synthesis, can be adapted to new cases by efficiently learning only from a few examples,” Leo Zou, a member of Baidu’s communications team, told Digital Trends. “Previously, it would take numerous examples for a model to learn. Now, it takes a fraction of what it used to.”

Baidu Research isn’t the first to try to create voice-replicating A.I. Last year, we covered a project called Lyrebird, which used neural networks to replicate voices, including those of President Donald Trump and former President Barack Obama, from a relatively small number of samples. Like Lyrebird’s work, Baidu’s speech synthesis technology doesn’t sound completely convincing, but it’s an impressive step forward, and way ahead of a lot of the robotic A.I. voice assistants that existed just a few years ago.

The work is based around Baidu’s text-to-speech synthesis system Deep Voice, which was trained on upwards of 800 hours of audio from a total of 2,400 speakers. To clone a new voice, it sounds its best with 100 five-second samples of the speaker (a little over eight minutes of audio), but a version trained on only 10 five-second samples, or 50 seconds in total, was able to trick a voice-recognition system more than 95 percent of the time.

“We see many great use cases or applications for this technology,” Zou said. “For example, voice cloning could help patients who lost their voices. This is also an important breakthrough in the direction of personalized human-machine interfaces. For example, a mom can easily configure an audiobook reader with her own voice. The method [additionally] allows creation of original digital content. Hundreds of characters in a video game would be able to have unique voices because of this technology. Another interesting application is speech-to-speech language translation, as the synthesizer can learn to mimic the speaker identity in another language.”

For a deeper dive into this subject, you can listen to a sample of the voices or read a paper describing the work.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…