
Inside the rapidly escalating war between deepfakes and deepfake detectors


Imagine a twisty-turny movie about a master criminal locked in a war of wits with the world’s greatest detective.


The criminal seeks to pull off a massive confidence trick, using expert sleight of hand and an uncanny ability to disguise himself as virtually anyone on the planet. He’s so good at what he does that he can make people believe they saw things that never actually happened.

But then we meet the detective. She’s a brilliant, stop-at-nothing sort who can spot the “tell” of any thief. She knows just what to look for, and even the tiniest behavior — a raised eyebrow here, a dropped vowel there — is enough to alert her when something’s awry. She’s the only person to ever catch our antagonist, and now she’s hot on his trail once again.

However, there’s a problem: Our thief knows that she knows what to look for. As a result, he has changed up his game, without the protagonist realizing it.

The deepfake problem

This is, in essence, the story of deepfakes and deepfake detection thus far. Deepfakes, a form of synthetic media in which people’s likenesses can be digitally altered like a Face/Off remake directed by A.I. researchers, have been a cause for concern since they sprang onto the scene in 2017. While many deepfakes are lighthearted (swapping out Arnie for Sly Stallone in The Terminator), they also pose a potential threat. Deepfakes have been used to create fake pornographic videos that appear real, and they’ve been used in political hoaxes, as well as in financial fraud.

Lest such hoaxes become an even bigger problem, someone needs to be able to step in and say, definitively, when a deepfake is being used and when it isn’t.


It didn’t take long for the first deepfake detectors to appear. In April 2018, I covered one of the earliest efforts, built by researchers at Germany’s Technical University of Munich. Just like deepfake technology itself, it used A.I. — only this time its creators were utilizing it not to create fakes, but to spot them.


Deepfake detectors work by looking for those details of a deepfake that aren’t quite right, scouring images not just for uncanny valleys but for the tiniest uncanny pothole. They crop face data from video frames and then pass it through a neural network to figure out its legitimacy. Giveaway details might include things like badly reproduced eye blinking.
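That crop-then-classify pipeline can be sketched in a few lines. This is a deliberately simplified toy, not any production detector: real systems use a learned face detector and a deep convolutional network, whereas here a hypothetical fixed crop box and a tiny untrained logistic scorer stand in for both.

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_face(frame, box):
    """Cut the face region (top, left, height, width) out of a frame."""
    t, l, h, w = box
    return frame[t:t + h, l:l + w]

def detector_score(face, weights, bias):
    """Toy stand-in for the neural network: flatten the crop,
    normalize pixel values, and return a logistic P(fake) in [0, 1]."""
    x = face.astype(np.float64).ravel() / 255.0
    z = x @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

frame = rng.integers(0, 256, size=(64, 64))     # stand-in video frame
face = crop_face(frame, (16, 16, 32, 32))       # 32x32 "face" crop
weights = rng.normal(0, 0.05, size=face.size)   # untrained toy weights
score = detector_score(face, weights, bias=0.0)
label = "fake" if score > 0.5 else "real"
```

A real detector would run this per frame and aggregate the scores across the video, flagging clips whose average P(fake) crosses a threshold.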

But now researchers from the University of California San Diego have come up with a way of defeating deepfake detectors by inserting what are called adversarial examples into video frames. Adversarial examples are a fascinating — yet terrifying — glitch in the A.I. Matrix. They’re capable of fooling even the smartest of recognition systems into, for example, thinking a turtle is a gun, or an espresso is a baseball. They do this by subtly adding noise into an image so that it causes the neural network to make the wrong classification.

Like mistaking a rifle for a shelled reptile. Or a faked video for a real one.
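The mechanics of such an attack are easiest to see on a toy model. The sketch below is an illustrative fast-gradient-style perturbation against a made-up linear scorer, not the UC San Diego team’s actual attack: for a linear score w·x, the gradient with respect to the input is just w, so an attacker with white-box access nudges every pixel by a tiny amount against that gradient until the “fake” label flips.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(0.0, 1.0, size=100)   # detector weights (white-box access)

def score(x):
    """Toy linear detector: a positive score means 'fake'."""
    return float(w @ x)

x = 0.1 * np.sign(w)                 # a frame the detector flags as fake
eps = 0.2                            # per-pixel budget: small enough to be
                                     # imperceptible, in this toy setup

# Fast-gradient step: move each pixel by eps AGAINST the score gradient,
# which for a linear model is simply the weight vector w.
x_adv = x - eps * np.sign(w)

flipped = score(x) > 0 and score(x_adv) < 0   # label flips, noise stays tiny
```

Black-box variants of this idea estimate the gradient by repeatedly querying the detector, which is why the researchers’ attack still succeeds without knowing the model’s architecture.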

Fooling the detectors

“There has been a recent surge in methods for generating realistic deepfake videos,” Paarth Neekhara, a UC San Diego computer engineering grad student, told Digital Trends. “Since these manipulated videos can be used for malicious purposes, there has been a significant effort in developing detectors that can reliably detect deepfake videos. For example, Facebook recently launched the Deepfake Detection Challenge to accelerate the research on developing deepfake detectors. [But] while these detection methods can achieve more than 90% accuracy on a dataset of fake and real videos, our work shows that they can be easily bypassed by an attacker. An attacker can inject a carefully crafted noise, that is fairly imperceptible to the human eye, into each frame of a video so that it gets misclassified by a victim detector.”


Attackers can craft these videos even if they don’t possess specific knowledge of the detector’s architecture and parameters. These attacks also still work after videos are compressed, as they would be if they were shared online on a platform like YouTube.

When tested with full access to the detector model, the method fooled detection systems more than 99% of the time. Even at its lowest success rate — against compressed videos, with no knowledge of the detector models — it still defeated them 78.33% of the time. That’s not great news.

The researchers are declining to publish their code on the basis that it could be misused, Neekhara noted. “The adversarial videos generated using our code can potentially bypass other unseen deepfake detectors that are being used in production by some social media [platforms,]” he explained. “We are collaborating with teams that are working on building these deepfake detection systems, and are using our research to build more robust detection systems.”

A game of deepfake cat and mouse

This isn’t the end of the story, of course. To return to our movie analogy, this would still be only around 20 minutes into the film. We haven’t gotten to the scene yet where the detective realizes that the thief thinks he’s got her fooled. Or to the bit where the thief realizes that the detective knows that he knows that she knows. Or … you get the picture.

Such a cat-and-mouse game for deepfake detection, which is likely to continue indefinitely, is well-known to anyone who has worked in cybersecurity. Malicious hackers find vulnerabilities, which developers then patch; the hackers find new holes in the patched version, which the devs fix yet again. Continue ad infinitum.

“Yes, the deepfake generation and detection systems closely follow the virus and antivirus dynamics,” Shehzeen Hussain, a UC San Diego computer engineering Ph.D. student, told Digital Trends. “Currently, deepfake detectors are trained on a dataset of real and fake videos generated using existing deepfake synthesis techniques. There is no guarantee that such detectors will be foolproof against future deepfake generation systems … To stay ahead in the arms race, detection methods need to be regularly updated and trained on upcoming deepfake synthesis techniques. [They] also need to be made robust to adversarial examples by incorporating adversarial videos during training.”

A paper describing this work, titled “Adversarial Deepfakes: Evaluating Vulnerability of Deepfake Detectors to Adversarial Examples,” was recently presented at the WACV 2021 virtual conference.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…