
Baidu’s new A.I. can mimic your voice after listening to it for just one minute


We’re not in the business of writing regularly about “fake” news, but it’s hard not to be concerned about the kind of mimicry that technology is making possible. First, researchers developed deep learning-based artificial intelligence (A.I.) that can superimpose one person’s face onto another person’s body. Now, researchers at Chinese search giant Baidu have created an A.I. they claim can learn to accurately mimic your voice after listening to it for less than a minute.

“From a technical perspective, this is an important breakthrough showing that a complicated generative modeling problem, namely speech synthesis, can be adapted to new cases by efficiently learning only from a few examples,” Leo Zou, a member of Baidu’s communications team, told Digital Trends. “Previously, it would take numerous examples for a model to learn. Now, it takes a fraction of what it used to.”


Baidu Research isn’t the first to try to create voice-replicating A.I. Last year, we covered a project called Lyrebird, which used neural networks to replicate voices, including those of President Donald Trump and former President Barack Obama, from a relatively small number of samples. Like Lyrebird’s work, Baidu’s speech synthesis technology doesn’t sound completely convincing, but it’s an impressive step forward, and well ahead of many of the robotic A.I. voice assistants that existed just a few years ago.

The work builds on Deep Voice, Baidu’s text-to-speech synthesis system, which was trained on upwards of 800 hours of audio from a total of 2,400 speakers. It needs just 100 five-second clips of vocal training data to sound its best, but a version trained on only 10 five-second samples was able to trick a voice-recognition system more than 95 percent of the time.
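To put those figures in perspective, here is a quick back-of-the-envelope calculation of how much audio each setting amounts to. It is an illustrative sketch, not anything from Baidu’s own code: the function name is our own, and it assumes every sample is exactly five seconds long.

```python
# Illustrative arithmetic only: assumes each cloning sample is exactly 5 seconds.
CLIP_SECONDS = 5


def total_audio_seconds(num_clips: int, clip_seconds: int = CLIP_SECONDS) -> int:
    """Return the total amount of cloning audio, in seconds (hypothetical helper)."""
    return num_clips * clip_seconds


best_quality = total_audio_seconds(100)  # the "sounds its best" setting
minimal = total_audio_seconds(10)        # the setting that fooled voice recognition

print(f"100 clips: {best_quality} s ({best_quality / 60:.1f} min)")
print(f"10 clips: {minimal} s (under a minute: {minimal < 60})")
```

By this arithmetic, the best-sounding setting uses a little over eight minutes of audio, while the 10-sample version uses 50 seconds, which lines up with the headline’s one-minute figure.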

“We see many great use cases or applications for this technology,” Zou said. “For example, voice cloning could help patients who lost their voices. This is also an important breakthrough in the direction of personalized human-machine interfaces. For example, a mom can easily configure an audiobook reader with her own voice. The method [additionally] allows creation of original digital content. Hundreds of characters in a video game would be able to have unique voices because of this technology. Another interesting application is speech-to-speech language translation, as the synthesizer can learn to mimic the speaker identity in another language.”

For a deeper dive into this subject, you can listen to a sample of the voices or read a paper describing the work.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…