Paul J. Richards/AFP via Getty Images
The federal government is boosting its efforts to combat so-called audio deepfakes, awarding prizes to four organizations for developing technology that distinguishes between real human voices and voices generated by artificial intelligence.
The awards, presented Monday by the Federal Trade Commission, come as the agency warns consumers about scams that use AI-generated voices and about the impact deepfakes could have on this year's elections.
One of the winners, OriginStory, was created by a team that includes researchers at Arizona State University. It uses sensors to detect signs of a live human speaker, such as breathing, movement, and heartbeat.
“We have this very unique sound production mechanism, so by sensing it in parallel with the acoustic signal, we can confirm that the sound is coming from a human,” Visar Berisha, a professor in the ASU College of Health Solutions and a member of the team, said in an interview with NPR.
OriginStory's technology takes advantage of hardware sensors that already exist in newer Android smartphones and iPhones, though not in all recording devices. Berisha says the team is focusing for now on hardware that already has the sensors it needs.
Another winner, DeFake, developed by Ning Zhang, a researcher at Washington University in St. Louis, injects distortions into real voice recordings so that AI-generated voice clones trained on them no longer sound like the real person.
Zhang said his technique was inspired by an earlier tool developed by researchers at the University of Chicago, which makes hidden changes to images so that artificial intelligence algorithms trained on them cannot imitate the originals.
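In heavily simplified form, the idea is to add a small, bounded perturbation to a waveform so that a human hears essentially the same recording while a model training on it is nudged off target. The sketch below is not DeFake's method: DeFake's perturbations are crafted adversarially against voice-cloning models, whereas this toy uses bounded random noise, and every name and parameter here is hypothetical.

```python
import numpy as np

def perturb_waveform(waveform, epsilon=0.002, seed=0):
    """Add a small, bounded perturbation to an audio waveform.

    Illustrative only: a system like DeFake crafts its perturbation
    adversarially against a voice-cloning model; this sketch uses
    bounded random noise just to show the overall shape of the idea.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, size=waveform.shape)
    # Keep the change tiny relative to the signal, and keep samples in range.
    return np.clip(waveform + epsilon * noise, -1.0, 1.0)

# A one-second 440 Hz tone at a 16 kHz sample rate stands in for speech.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
voice = 0.5 * np.sin(2 * np.pi * 440 * t)
protected = perturb_waveform(voice)

# The waveform barely changes in amplitude terms (at most epsilon per sample).
print(float(np.max(np.abs(protected - voice))))
```

The design choice worth noting is the hard bound `epsilon`: the perturbation must stay below the threshold of human hearing while still shifting the statistics a cloning model learns from.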
The third winner, AI Detect, uses AI to catch AI. It was developed by the startup Omni Speech, whose machine learning algorithms extract features such as intonation from audio clips and use them to model authentic audio, said CEO David Przygoda. The algorithms are being taught to tell the difference between a real voice and a fake one.
“A human can be amused by one word and distraught by the next, but AI has great difficulty switching between these emotions,” Przygoda says.
In a video submitted to the FTC, the company says the tool is more than 99.9% accurate, but Przygoda says it is not yet ready for independent testing, and the contest judges did not test submissions under real-world conditions.
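In broad strokes, a detector of this kind turns each clip into numeric features and learns which feature values characterize real versus synthetic speech. The sketch below is not Omni Speech's method; it is a minimal stand-in using two toy features (energy and zero-crossing rate), a nearest-centroid rule, and synthetic tones in place of voice clips. All names and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(waveform):
    """Toy feature vector: signal energy and zero-crossing rate.

    A real detector extracts far richer cues (intonation, spectral
    texture); these two stand in for the general idea.
    """
    energy = float(np.mean(waveform ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(waveform))) > 0))
    return np.array([energy, zcr])

def train_centroids(real_clips, fake_clips):
    """Model each class as the average feature vector of its examples."""
    real = np.mean([features(w) for w in real_clips], axis=0)
    fake = np.mean([features(w) for w in fake_clips], axis=0)
    return real, fake

def classify(waveform, real_centroid, fake_centroid):
    """Label a clip by whichever class centroid its features sit closer to."""
    f = features(waveform)
    if np.linalg.norm(f - real_centroid) < np.linalg.norm(f - fake_centroid):
        return "real"
    return "fake"

def tone(freq, amp, sr=8000):
    """Noisy one-second sine wave standing in for a voice clip."""
    t = np.linspace(0, 1, sr, endpoint=False)
    return amp * np.sin(2 * np.pi * freq * t) + 0.01 * rng.normal(size=sr)

# Pretend "real" voices are louder and lower-pitched than "fake" ones.
real_clips = [tone(200, 0.6) for _ in range(5)]
fake_clips = [tone(2000, 0.2) for _ in range(5)]
real_c, fake_c = train_centroids(real_clips, fake_clips)

print(classify(tone(210, 0.55), real_c, fake_c))   # → real
print(classify(tone(1900, 0.22), real_c, fake_c))  # → fake
```

The fragility the article describes falls out of this picture: the centroids only describe the generators seen during training, so a new generator whose output lands near the "real" centroid slips through until the model is retrained.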
The three winners will split the $35,000 prize. A fourth entrant, Pindrop Security, was also recognized for its submission but, because of the company's size, was not eligible for the cash award.
The company said in a statement that the FTC's award “recognizes and validates Pindrop's 10 years of work researching voice cloning and other deepfake issues.”
A number of commercial detection tools that rely on machine learning already exist, but they can be unreliable depending on factors such as audio quality and media format, and detectors need constant retraining to catch audio from new deepfake generators.
Berisha pointed to the struggles of AI-generated text detectors and said detection is a losing game in the long run. OpenAI discontinued its own text detector a few months after launching it, citing poor accuracy.
“I think we're heading in exactly the same direction with audio,” he says. “Even if AI-based detectors work now, they're going to be less effective in six months, less effective still in a year or two, and at some point they're going to stop working completely.”
This led him to develop a process to authenticate the human voice as words are spoken.
Detection tools are not the only solution, and regulators are taking action. Earlier this year, the Federal Communications Commission banned AI-powered robocalls. The synthetic voice company Eleven Labs offers a tool to detect its own products, and other synthetic voice makers are being urged to do the same. The European Union recently passed the AI Act, and U.S. state legislatures are also looking to pass laws regulating deepfakes.