DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D Manning, Chelsea Finn (2023)

Paper Information

arXiv ID: 2301.11305
Venue: International Conference on Machine Learning (ICML)
Domain: natural language processing
SOTA Claim: Yes
Code: ericmitchell.ai/detectgpt
Reproducibility: 8/10

Abstract

The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See ericmitchell.ai/detectgpt for code, data, and other project information.

Summary

This paper presents DetectGPT, a zero-shot method for detecting text generated by large language models (LLMs) based on the curvature of the model's log probability function. It leverages the observation that model-generated text tends to occupy regions of negative curvature in that function: small perturbations of a model-written passage (produced here with a generic mask-filling model such as T5) tend to lower its log probability under the source model, whereas perturbations of human-written text may raise or lower it. This makes generated text detectable without training a separate classifier or collecting labeled data. Empirical tests across multiple datasets and models show that DetectGPT outperforms existing zero-shot methods at identifying generated text across a range of conditions, including the detection of machine-generated misinformation. The paper also discusses the implications of LLM text generation for fields like education and journalism.
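The detection criterion described above can be sketched as follows. This is a toy illustration, not the authors' implementation: `toy_log_prob` and `toy_perturb` are hypothetical stand-ins for, respectively, scoring a passage under the source model (e.g., GPT-2) and mask-and-refill perturbation with T5.

```python
import random
import statistics

def toy_log_prob(text):
    # Hypothetical stand-in for the source model's log-probability.
    # Penalizes length differences between adjacent words, so that
    # reordering words changes the score.
    words = text.split()
    return -float(sum(abs(len(a) - len(b)) for a, b in zip(words, words[1:])))

def toy_perturb(text, rng):
    # Hypothetical stand-in for T5 span perturbation: swap two random words.
    words = text.split()
    i, j = rng.randrange(len(words)), rng.randrange(len(words))
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def detectgpt_score(text, log_prob=toy_log_prob, perturb=toy_perturb,
                    k=100, seed=0):
    """Normalized perturbation discrepancy: log p(x) minus the mean
    log-probability of k perturbed versions, divided by their standard
    deviation. Large positive values indicate the passage sits near a
    local maximum (negative-curvature region) of the model's log
    probability, suggesting it is model-generated."""
    rng = random.Random(seed)
    perturbed_lps = [log_prob(perturb(text, rng)) for _ in range(k)]
    mu = statistics.mean(perturbed_lps)
    sigma = statistics.stdev(perturbed_lps) or 1.0
    return (log_prob(text) - mu) / sigma
```

In practice a passage is flagged as model-generated when `detectgpt_score` exceeds a threshold chosen for the desired false-positive rate; the paper evaluates the score's ranking quality via AUROC rather than fixing a single threshold.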

Methods

This paper employs the following methods:

  • DetectGPT

Models Used

  • GPT-3
  • GPT-2
  • GPT-Neo
  • GPT-NeoX
  • T5

Datasets

The following datasets were used in this research:

  • XSum
  • SQuAD
  • Reddit WritingPrompts
  • WMT16
  • PubMedQA

Evaluation Metrics

  • AUROC
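AUROC (area under the receiver operating characteristic curve) is threshold-free: it equals the probability that a randomly chosen model-generated passage receives a higher detection score than a randomly chosen human-written one. A minimal pairwise-comparison sketch (assumed scoring convention: higher score means "more likely machine-generated"):

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a random positive (machine-generated)
    score exceeds a random negative (human-written) score, counting ties
    as half a win. 1.0 is perfect separation, 0.5 is chance."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```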

Results

  • DetectGPT achieves 0.95 AUROC for detecting fake news articles generated by 20B-parameter GPT-NeoX
  • This improves over the strongest zero-shot baseline, which achieves 0.81 AUROC on the same task

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

machine-generated text detection, zero-shot detection, probability curvature, log probability function

Papers Using Similar Methods

External Resources