Computer scientists with the University of Waterloo in Ontario, Canada, say they’ve developed a way to remove watermarks embedded in AI-generated images.
To support that claim, they’ve released a software tool called UnMarker. It runs offline and can remove a watermark from an image in only a few minutes using a 40 GB Nvidia A100 GPU.
Digital image watermarking, the process of altering data in an image file to declare image provenance, has been proposed as a way to help people spot “deepfakes” or AI-generated images and videos.
Back in mid-2023, before AI safety commitments had fallen out of fashion in the US, Amazon, Google, and OpenAI all talked up watermarking as a way to safeguard against harmful AI-generated imagery. Google has devised a system called SynthID for this purpose. Meta has its own system called Stable Signature.
But according to Andre Kassis, a PhD candidate in computer science at the University of Waterloo, and Urs Hengartner, associate professor of computer science, digital watermarks of this sort can be erased, regardless of how they’re encoded.
They describe their work in a paper titled “UnMarker: A Universal Attack on Defensive Image Watermarking,” which appeared in the proceedings of the 46th IEEE Symposium on Security and Privacy in May.
In a phone interview with The Register, Kassis said his interest in this research stems from the flood of AI content and its harmful impact in the form of scams, fraud, and non-consensual exploitative imagery.
“It’s no secret that we’re surrounded by AI wherever we go,” he said. “And although it has many benefits, it also has a dark side unfortunately.”
Watermarking, he said, is a defense that has been proposed and supported with millions of dollars of investment. “So I think it is essential that we stop for a minute and ask ourselves, ‘Is it worth the hype? Can it really protect us or are we still vulnerable?'” he said.
UnMarker, according to Kassis and Hengartner, is the first watermark removal attack that works against all watermarking schemes, whether semantic (content-altering) or non-semantic (content-preserving). It doesn’t require access to the watermark mechanism’s parameters or internal details, extra data, or feedback from a watermark detector.
Their key insight, the researchers explain in their paper, is that any watermarking scheme must embed its mark through a universal carrier, and that carrier necessarily operates on the spectral amplitudes of the image’s pixels.
Carrier, Kassis explained, is an abstract term that refers to the set of attributes a watermark can influence. He likened it to the space allotted to the address on a postal envelope.
“If you mess the address up, then the mailman won’t be able to go and deliver the mail,” he explained. “So this is the same idea. That’s exactly how UnMarker does it. We don’t need to know what the actual content of the watermark is. All we need to know is where it resides and then we basically distort that channel.”
The UnMarker code analyzes the spectral content of an image and subtly distorts the frequencies where a watermark could reside, without creating visible artifacts. The altered images look the same, but most of the time they are no longer recognized by watermark detection mechanisms. Consequently, systems set up to block or flag AI-generated content via watermarks just won’t work reliably.
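To make that concrete, here is a minimal sketch of a frequency-domain perturbation in Python. This is not the authors’ actual attack, which is engineered to keep distortion imperceptible; it only illustrates the general idea of jittering the spectral amplitudes a watermark might occupy. The function name, file names, and strength value are all hypothetical.

```python
# Minimal illustration of spectral-domain perturbation. NOT the UnMarker
# attack itself; just the general idea: move an image into the frequency
# domain, jitter the amplitudes a watermark might occupy, transform back.
import numpy as np
from PIL import Image

def perturb_spectrum(path_in: str, path_out: str, strength: float = 0.02) -> None:
    """Apply a small random perturbation to an image's spectral amplitudes."""
    img = np.asarray(Image.open(path_in).convert("RGB"), dtype=np.float64)
    out = np.empty_like(img)
    rng = np.random.default_rng()
    for c in range(3):  # perturb each color channel independently
        spec = np.fft.fft2(img[..., c])
        amp, phase = np.abs(spec), np.angle(spec)
        # Small amplitude jitter is hard to see in pixel space but can
        # corrupt a watermark signal embedded in the spectrum.
        amp *= 1.0 + strength * rng.standard_normal(amp.shape)
        # Take the real part: the random jitter breaks the exact conjugate
        # symmetry that would otherwise make the inverse FFT purely real.
        out[..., c] = np.fft.ifft2(amp * np.exp(1j * phase)).real
    Image.fromarray(np.clip(out, 0, 255).astype(np.uint8)).save(path_out)

perturb_spectrum("watermarked.png", "stripped.png")
```

Even a perturbation this crude distorts the spectral channel a detector reads; the hard part, per the paper, is doing it in a way that leaves the image visually unchanged.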
Kassis and Hengartner tested various digital watermarking schemes, specifically Yu1, Yu2, HiDDeN, PTW, Stable Signature, StegaStamp, and TRW. When images watermarked with these techniques were processed by UnMarker, the best watermark detection rate reached only 43 percent. And anything below 50 percent, they argue, is essentially worthless, since at that point a detector performs no better than a coin flip.
Kassis said that when these tests were conducted, Google’s SynthID was not available through a public API and could not be evaluated. But he said he had the opportunity to test SynthID later and UnMarker managed to drop its watermark detection rate from 100 percent to around 21 percent.
“So the attack is also extremely effective against this commercial system as well,” he told us.
Other researchers have come to similar conclusions about the fragility of digital watermarks. Back in 2023, academics affiliated with the University of Maryland argued that image watermarking techniques would not work. More recently, in February this year, boffins affiliated with Google DeepMind and the University of Wisconsin-Madison concluded that “no existing [image-provenance] scheme combines robustness, unforgeability, and public-detectability.”
The DeepMind research also covers C2PA (Coalition for Content Provenance and Authenticity), a provenance scheme that adds digital signatures to image metadata rather than manipulating pixel data. The Waterloo research does not specifically address C2PA, though the DeepMind paper deems it less robust than other watermarking methods.
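That distinction matters because a metadata signature travels alongside, not inside, the pixels: re-encoding an image while copying only the pixel data leaves the signature behind. A minimal sketch, with hypothetical file names:

```python
from PIL import Image

# Copy only the pixel data into a fresh image object; any metadata-based
# provenance record (e.g. a C2PA-style signature) attached to the original
# file is simply not carried over when the new file is saved.
src = Image.open("signed.jpg")
clean = Image.new(src.mode, src.size)
clean.putdata(list(src.getdata()))
clean.save("unsigned.jpg")
```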
Despite the doubts voiced by the Waterloo researchers about the viability of digital watermarking as an answer to AI image concerns, there’s a thriving industry promoting the technology.
“It has become a huge industry,” Kassis said. “And like once you let the genie out of the bottle, it’s hard to put it back. The White House last year secured commitments from seven major tech players to invest and develop these watermarking technologies. Then there’s attention from legislators and stuff like that. So it’s kind of hard to right now just stop everything and take a step back and start from scratch.”
Kassis said the key message is that security should come first.
“We always rush to develop these tools and our excitement overshadows the security aspects,” he said. “We only think about it in hindsight and that’s why we’re always surprised when we find out how malicious attackers can actually misuse these systems.” ®