Future of fake human voices nears

WHAT if you could make President Trump say whatever you wanted? How about listening to the vaguely robot-like voice of yourself, programmed into an app based on a sample of your speech?

The technology will be ready “soon”, according to a team of researchers from the University of Montreal’s institute for computer-based learning algorithms. Now they’re seeking investors for their product, Lyrebird, and hope to join Google in the fast-expanding business of mimicking human voices.

Virtual assistants such as Alexa and Siri have driven voice technology into the mainstream, where we can control our phones, cars and even refrigerators through verbal commands. And now we face a future where the perfect vocal replication of the president of the United States, or you, or anyone, could be just a few years away, some experts say. How does that future sound?

Whoever wins the development race, experts in technology and ethical fields are gearing up for products that will do to voice what Photoshop did to photos — make reality very difficult to tell from a simulation.

Lyrebird is aware of the downsides. The technology is exciting — with potentially “dangerous consequences such as misleading diplomats, fraud and . . . stealing the identity of someone else”, according to an ethical disclaimer on Lyrebird’s website. The developers did not immediately respond to an interview request.

Nevertheless, the inventors plan to begin selling what they call the first technology “to allow copying voices in a matter of minutes”, with fine-tuning for emotional control.

Scientific American notes that Lyrebird and a competing Alphabet-owned project called WaveNet use neural network technology — code patterned after neurons in the human brain — to simulate human speech on the fly. In contrast, existing voice assistants such as Siri and Alexa “work by cobbling together words and phrases from prerecorded files of one particular voice”.

Lyrebird says its technology, once released, will be able to mimic any voice based on as little as a minute of audio recording — though one of the developers told TechCrunch that longer samples would reduce the “distinctly metallic rasp” that the outlet noted in clips released so far.

While Lyrebird’s developers have not announced a release date for their product, they claim it will simulate audio much faster than Google’s WaveNet. When the tech giant’s artificial intelligence unit demonstrated WaveNet last year, listeners rated it as the closest simulation yet of human speech, according to the Verge.

Timo Baumann, a speech processing researcher at Carnegie Mellon University, told Scientific American that Lyrebird’s audio sounded a tad robotic but that convincing human simulations — voice assistants that people might treat like friends — were a few years away.

Five tech giants (Apple, Google, Microsoft, Facebook and Amazon.com) are pursuing what The Washington Post’s Elizabeth Dwoskin called “an arms race” to create the next generation of virtual assistants to make our personal devices converse like humans, if not also sound like them.

“It’s about taking the way that humans have naturally interacted with each other for thousands of years and applying that to the way they interact with services,” Dag Kittlaus, a co-founder of the Siri app now in every iPhone, told Dwoskin.

The prospect of computer-simulated voice concerned a security technologist from Harvard University, who told Scientific American that a “new reality” of fake audio clips was on the horizon.

“A refined version of this system could replicate a person’s voice with incredible accuracy, making it virtually impossible for a human listener to discern the original from the emulation,” Gizmodo warned. “The day is coming when vocal speech, like an image processed in Photoshop, can be manipulated without our knowing.”

When Adobe demonstrated yet another form of voice-faking software last year — one that rearranges words in pre-recorded audio clips — a technology researcher at the University of Stirling expressed horror to the BBC. “It seems that Adobe’s programmers were swept along with the excitement of creating something as innovative as a voice manipulator,” Eddy Borges Rey told the outlet, “and ignored the ethical dilemmas brought up by its potential misuse”.

The creators of Lyrebird said they want their technology to be used for good: “Giving back the voice to people who lost it to sickness, being able to record yourself at different stages in your life and hearing your voice later on,” one of Lyrebird’s developers told Gizmodo.

—By arrangement with The Washington Post

Published in Dawn, May 12th, 2017
