How to identify text generated by AI large language models: a newly developed watermarking tool
On October 24th, the internationally renowned academic journal *Nature* published a groundbreaking research paper on artificial intelligence (AI) that has caught the attention of experts in the field. According to the study, researchers have developed a tool capable of adding watermarks to the text generated by AI large language models (LLMs), potentially enhancing our ability to identify and trace synthetic content.
The paper explains that LLMs are widely used AI tools that can generate text for various purposes, including chatbots and writing assistance. However, identifying the source of AI-generated text remains a challenge, raising concerns about the reliability of information. Watermarks are seen as a potential solution to this problem, but strict requirements for quality and computational efficiency have hindered their widespread application.
In this latest study, the DeepMind team at Google introduced a system called SynthID-Text, which uses a novel sampling algorithm to embed watermarks into AI-generated text. The tool subtly biases the LLM's choice of words, inserting a statistical signature that can later be detected by the appropriate software. The watermark can be applied via a “distortionary” pathway, which improves detectability at a slight cost to output quality, or via a “non-distortionary” pathway that preserves text quality.
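To make the mechanism concrete, the sketch below shows one generic way a sampling step can be biased to carry a watermark: a “green list” of tokens, derived from a secret key and the previous token, receives a small boost before sampling. This is an illustrative scheme in Python, not SynthID-Text's actual sampling algorithm; the function names, the key, and the bias value are all assumptions made for the example.

```python
import hashlib

import torch


def green_mask(prev_token_id: int, vocab_size: int, key: str = "demo-key") -> torch.Tensor:
    # Hypothetical helper: deterministically mark half the vocabulary as
    # "green", seeded by a secret key and the previous token. A detector
    # holding the same key can reproduce this split exactly.
    seed = int(hashlib.sha256(f"{key}:{prev_token_id}".encode()).hexdigest(), 16) % (2**31)
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(vocab_size, generator=gen)
    mask = torch.zeros(vocab_size, dtype=torch.bool)
    mask[perm[: vocab_size // 2]] = True
    return mask


def watermarked_sample(logits: torch.Tensor, prev_token_id: int, bias: float = 2.0) -> int:
    # Nudge the model's logits toward green tokens, then sample as usual.
    # A larger bias makes the watermark easier to detect but distorts the
    # output more, which is the quality/detectability trade-off at stake.
    mask = green_mask(prev_token_id, logits.shape[-1])
    probs = torch.softmax(logits + bias * mask.float(), dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

In a real deployment, this biased sampling would run inside the model's decoding loop at every step, so the signature accumulates over the whole generation.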
The authors of the paper evaluated the detectability of these watermarks across several publicly available models and found that SynthID-Text outperforms existing methods. They also assessed the quality of text generated during nearly 20 million online conversations with the Gemini LLM and determined that the non-distortionary watermark does not noticeably compromise text quality. Furthermore, SynthID-Text adds minimal computational overhead to LLMs, lowering the barrier to its deployment.
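Detection under such a scheme is a statistical test rather than an exact lookup: a detector holding the secret key re-derives each green list and checks whether green tokens occur more often than the 50% expected by chance. The sketch below, which reuses the hypothetical green_mask helper above, scores a token sequence with a simple z-test; again, this illustrates the general idea, not SynthID-Text's own scoring.

```python
import math


def detect(token_ids: list[int], vocab_size: int, key: str = "demo-key") -> float:
    # Count how many tokens landed in the green list seeded by their
    # predecessor, then compare against the 50% hit rate expected in
    # unwatermarked text. A z-score well above 0 (say, above 4) is strong
    # evidence of a watermark; human-written text should score near 0.
    hits = sum(
        green_mask(prev, vocab_size, key)[cur].item()
        for prev, cur in zip(token_ids, token_ids[1:])
    )
    n = len(token_ids) - 1
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)
```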
In conclusion, the authors noted that while a watermark can be evaded by editing or rewriting the output, their research demonstrates that a practical watermarking tool for AI-generated content is feasible. This innovation holds promise for greater accountability and transparency in the use of LLMs.