A Canadian startup called Lyrebird has announced a platform that can mimic a human voice using a fraction of the audio that other platforms, such as Google DeepMind and Adobe Project VoCo, require.
The Lyrebird synthesis software requires only 60 seconds of sample audio to produce its synthetic sample. VoCo needs about 20 minutes to do the same.
The quality of the voice reproductions the software produces is mixed: some are more convincing than others.
The three founders state that they are addressing possible misuse concerns by making the software publicly available. That may be a little optimistic.
“By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.”
“There are more troubling uses as well. We already know that synthetic voice generators can trick biometric software used to verify identity. And, given enough source material, AI programs can generate pretty convincing fake pictures and video of anyone you like. For example, this research from 2016 uses 3D mapping to turn videos of famous politicians, including George W. Bush and Vladimir Putin, into real-time ‘puppets’ controlled by engineers. Combine this with a realistic voice synthesizer and you could have a Facebook video of Donald Trump announcing that the US is bombing North Korea going viral before you know it.”
Fake news has a new friend.