WHAT if you could make President Trump say whatever you wanted? How about listening to the vaguely robot-like voice of yourself, programmed into an app based on a sample of your speech?

The technology will be ready “soon”, according to a team of researchers from the University of Montreal’s institute for computer-based learning algorithms. Now they’re seeking investors for their product, Lyrebird, and hope to join Google in the fast-expanding business of mimicking human voices.

Virtual assistants such as Alexa and Siri have driven voice technology into the mainstream, where we can control our phones, cars and even refrigerators through verbal commands. And now we face a future where the perfect vocal replication of the president of the United States, or of you, or of anyone, could be just a few years away, some experts say. How does that future sound?

Whoever wins the development race, experts in technology and ethical fields are gearing up for products that will do to voice what Photoshop did to photos — make reality very difficult to tell from a simulation.

Lyrebird is aware of the downsides. The technology is exciting — with potentially “dangerous consequences such as misleading diplomats, fraud and . . . stealing the identity of someone else”, according to an ethical disclaimer on Lyrebird’s website. The developers did not immediately respond to an interview request.

Nevertheless, the inventors plan to begin selling what they call the first technology “to allow copying voices in a matter of minutes” — with fine tuning for emotional control.

Scientific American notes that Lyrebird and a competing Alphabet-owned project called WaveNet use neural network technology — code patterned after neurons in the human brain — to simulate human speech on the fly. In contrast, existing voice assistants such as Siri and Alexa “work by cobbling together words and phrases from prerecorded files of one particular voice”.

Lyrebird says its technology, once released, will be able to mimic any voice based on as little as a minute of audio recording — though one of the developers told TechCrunch that longer samples would reduce the “distinctly metallic rasp” that the outlet noted in clips released so far.

While Lyrebird’s developers have not announced a release date for their product, they claim it will simulate audio much faster than Google’s WaveNet. When the tech giant’s artificial intelligence unit demonstrated WaveNet last year, listeners rated it as the closest simulation yet of human speech, according to the Verge.

Timo Baumann, a speech processing researcher at Carnegie Mellon University, told Scientific American that Lyrebird’s audio sounded a tad robotic but that convincing human simulations — voice assistants that people might treat like friends — were a few years away.

Five tech giants (Apple, Google, Microsoft, Facebook and Amazon.com) are pursuing what The Washington Post’s Elizabeth Dwoskin called “an arms race” to create the next generation of virtual assistants, ones that make our personal devices converse like humans, if not also sound like them.

“It’s about taking the way that humans have naturally interacted with each other for thousands of years and applying that to the way they interact with services,” Dag Kittlaus, a co-founder of the Siri app now in every iPhone, told Dwoskin.

The prospect of computer-simulated voice concerned a security technologist from Harvard University, who told Scientific American that a “new reality” of fake audio clips was on the horizon.

“A refined version of this system could replicate a person’s voice with incredible accuracy, making it virtually impossible for a human listener to discern the original from the emulation,” Gizmodo warned. “The day is coming when vocal speech, like an image processed in Photoshop, can be manipulated without our knowing.”

When Adobe demonstrated yet another form of voice-faking software last year — one that rearranges words in pre-recorded audio clips — a technology researcher at the University of Stirling expressed horror to the BBC. “It seems that Adobe’s programmers were swept along with the excitement of creating something as innovative as a voice manipulator,” Eddy Borges Rey told the outlet, “and ignored the ethical dilemmas brought up by its potential misuse”.

The creators of Lyrebird said they want their technology to be used for good: “Giving back the voice to people who lost it to sickness, being able to record yourself at different stages in your life and hearing your voice later on,” one of Lyrebird’s developers told Gizmodo.

—By arrangement with The Washington Post

Published in Dawn, May 12th, 2017
