DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D Manning, Chelsea Finn (2023)

Paper Information

arXiv ID: 2301.11305
Venue: International Conference on Machine Learning (ICML)
Domain: natural language processing
SOTA Claim: Yes
Code: ericmitchell.ai/detectgpt
Reproducibility: 8/10

Abstract

The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See ericmitchell.ai/detectgpt for code, data, and other project information.

Summary

This paper presents DetectGPT, a zero-shot method for detecting text generated by large language models (LLMs) based on the curvature of the model's log probability function. It leverages the observation that model-generated text tends to occupy regions of negative curvature in that function: small perturbations of a model-written passage (produced here with a generic mask-filling model such as T5) tend to lower its log probability under the source model, whereas perturbations of human-written text may raise or lower it. This makes generated text detectable without training a separate classifier or collecting labeled data. Empirical tests across multiple datasets and models show that DetectGPT outperforms existing zero-shot methods at identifying generated text across a range of conditions, including the detection of machine-generated misinformation. The paper also discusses the implications of LLM text generation for fields like education and journalism.
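The detection criterion described above can be sketched as follows. This is a toy illustration, not the authors' implementation: `toy_log_prob` and `toy_perturb` are hypothetical stand-ins for, respectively, scoring a passage under the source model (e.g., GPT-2) and mask-and-refill perturbation with T5.

```python
import random
import statistics

def toy_log_prob(text):
    # Hypothetical stand-in for the source model's log-probability.
    # Penalizes length differences between adjacent words, so that
    # reordering words changes the score.
    words = text.split()
    return -float(sum(abs(len(a) - len(b)) for a, b in zip(words, words[1:])))

def toy_perturb(text, rng):
    # Hypothetical stand-in for T5 span perturbation: swap two random words.
    words = text.split()
    i, j = rng.randrange(len(words)), rng.randrange(len(words))
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def detectgpt_score(text, log_prob=toy_log_prob, perturb=toy_perturb,
                    k=100, seed=0):
    """Normalized perturbation discrepancy: log p(x) minus the mean
    log-probability of k perturbed versions, divided by their standard
    deviation. Large positive values indicate the passage sits near a
    local maximum (negative-curvature region) of the model's log
    probability, suggesting it is model-generated."""
    rng = random.Random(seed)
    perturbed_lps = [log_prob(perturb(text, rng)) for _ in range(k)]
    mu = statistics.mean(perturbed_lps)
    sigma = statistics.stdev(perturbed_lps) or 1.0
    return (log_prob(text) - mu) / sigma
```

In practice a passage is flagged as model-generated when `detectgpt_score` exceeds a threshold chosen for the desired false-positive rate; the paper evaluates the score's ranking quality via AUROC rather than fixing a single threshold.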

Methods

This paper employs the following methods:

  • DetectGPT

Models Used

  • GPT-3
  • GPT-2
  • GPT-Neo
  • GPT-NeoX
  • T5

Datasets

The following datasets were used in this research:

  • XSum
  • SQuAD
  • Reddit WritingPrompts
  • WMT16
  • PubMedQA

Evaluation Metrics

  • AUROC
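AUROC (area under the receiver operating characteristic curve) is threshold-free: it equals the probability that a randomly chosen model-generated passage receives a higher detection score than a randomly chosen human-written one. A minimal pairwise-comparison sketch (assumed scoring convention: higher score means "more likely machine-generated"):

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a random positive (machine-generated)
    score exceeds a random negative (human-written) score, counting ties
    as half a win. 1.0 is perfect separation, 0.5 is chance."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```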

Results

  • DetectGPT achieves 0.95 AUROC for detecting fake news articles generated by 20B-parameter GPT-NeoX
  • This improves over the strongest zero-shot baseline, which achieves 0.81 AUROC on the same task

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

machine-generated text detection, zero-shot detection, probability curvature, log probability function

Papers Using Similar Methods

External Resources