OpenAI, the leading artificial intelligence research laboratory, is taking steps to prevent misuse of the content generated by its AI models. With the growing popularity of GPT-3 and ChatGPT, it has become evident that some users are putting them to unethical use, such as cheating on homework, creating fake news, and running social media bots. To tackle this, OpenAI is planning to implement a watermarking system to detect and track the misuse of AI-generated content.
How Effective is OpenAI’s Watermarking System?
According to Tom Goldstein, Perotto Associate Professor in the Department of Computer Science at the University of Maryland, this kind of watermarking can be remarkably reliable: his group detected a passage of just 23 words from a 1.3-billion-parameter watermarked language model with 99.999999999994% confidence. Such confidence is possible because the watermark embeds a subtle statistical signal in the token choices of the generated text.
#OpenAI is planning to stop #ChatGPT users from making social media bots and cheating on homework by “watermarking” outputs. How well could this really work? Here’s just 23 words from a 1.3B parameter watermarked LLM. We detected it with 99.999999999994% confidence. Here’s how 🧵
— Tom Goldstein (@tomgoldsteincs) January 25, 2023
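OpenAI has not published the details of its system, but Goldstein’s group has described a “green list” style of watermark: a hash of the preceding token pseudorandomly splits the vocabulary into a “green” and a “red” half, and generation is gently biased toward green tokens. The sketch below is purely illustrative; the toy vocabulary, function names, split fraction, and bias value are all assumptions, not OpenAI’s implementation.

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary for illustration

def green_list(prev_token: str, vocab, fraction=0.5):
    """Pseudorandomly pick a 'green' subset of the vocabulary,
    seeded by a hash of the previous token so the split is
    reproducible at detection time."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    k = int(len(vocab) * fraction)
    return set(rng.sample(vocab, k))

def sample_watermarked(prev_token: str, logits: dict, bias=2.0):
    """Add a bias to green-list tokens before choosing the next token
    (greedy selection here, for simplicity)."""
    greens = green_list(prev_token, VOCAB)
    biased = {t: s + (bias if t in greens else 0.0) for t, s in logits.items()}
    return max(biased, key=biased.get)
```

Because the split is derived from a hash rather than stored anywhere, a detector that knows the hashing scheme can recompute each token’s green list after the fact, with no access to the model that produced the text.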
However, the watermark becomes harder to detect if the user heavily edits the output. Goldstein explains that removing the watermark by hand would require swapping roughly 40% to 75% of the tokens in a long passage of text, which would badly degrade the output and still might not defeat the detector.
More advanced attacks, such as a synonym attack or automatically rewriting the text, are possible, but they drag the quality down toward that of the weaker model doing the rewriting. Furthermore, anyone attempting a synonym attack or hand-editing must take care to leave no long spans of unedited text, as even a span of 36 tokens can trigger the detector with high confidence.
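The detection side of such a scheme amounts to a simple statistical test: count how many tokens fall in their green lists, and compare that count to what chance would produce in human-written text. The function below is a sketch under the assumption of a one-proportion z-test with green fraction `gamma`; it is not OpenAI’s actual detector, but it shows why even a short unedited span is enough.

```python
import math

def detection_z_score(green_hits: int, total_tokens: int, gamma=0.5):
    """One-proportion z-test: how many standard deviations the observed
    green-token count sits above the chance expectation gamma * T
    for unwatermarked text."""
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1 - gamma))
    return (green_hits - expected) / std

# A fully green span of only 36 tokens, with gamma = 0.5:
z = detection_z_score(36, 36)  # z = 6.0, roughly a one-in-a-billion
                               # false-positive rate for human text
```

Confidence grows rapidly with span length, which is why an attacker who leaves any long stretch of text untouched hands the detector exactly the evidence it needs.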
The Futility of Tricking AI Detectors
The implementation of OpenAI’s watermarking system highlights the futility of attempting to trick AI detectors. Because the watermark is woven into the statistics of the text itself, stripping it out without destroying the text is extremely difficult: the heavy editing required produces low-quality content that is easily recognizable.
As AI continues to play a larger role in our daily lives, it’s crucial to ensure that it is used for the benefit of society, rather than for malicious purposes. OpenAI’s watermarking system is a testament to this effort, providing a layer of accountability and security for AI-generated content. The system’s ability to accurately detect and deter attempts at manipulation serves as a warning to those who seek to exploit AI for their own gain. As AI technology evolves, it is heartening to see companies like OpenAI taking proactive steps to ensure that it is used for the greater good. As the saying goes, “with great power comes great responsibility,” and OpenAI is demonstrating its commitment to this ideal.