RoFT

Real or Fake Text

Dataset Information

Modalities

Texts

Languages

English

Introduced

2022

License

MIT

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

RoFT is a dataset of 21,000 human annotations of generated text. The task is boundary detection: given a passage that begins as human-written text, determine the point at which it transitions to machine-generated text. The dataset also includes error annotations that follow the taxonomy introduced in the paper. The data can be used to train automatic detection systems, train automatic error-correction systems, analyze the visibility of model errors, and compare performance across models. The data was collected using http://roft.io.

Models: GPT2, GPT2-XL, CTRL, GPT3 "Davinci"

Genres: News, Stories, Recipes, Speeches

Variants: RoFT
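
To make the boundary detection task concrete, here is a minimal Python sketch of how a single example could be represented. It is an illustration and not the dataset's actual schema: the class name `BoundaryExample`, its fields, and the choice of sentence-level indexing are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundaryExample:
    """Hypothetical container for one boundary detection example.

    Assumption: the passage is split into sentences and the boundary is the
    index of the first machine-generated sentence. A boundary equal to
    len(sentences) would mean the whole passage is human written.
    """
    sentences: List[str]
    boundary: int

    def human_part(self) -> List[str]:
        # Sentences before the boundary are treated as human written.
        return self.sentences[:self.boundary]

    def generated_part(self) -> List[str]:
        # Sentences from the boundary onward are treated as machine generated.
        return self.sentences[self.boundary:]


example = BoundaryExample(
    sentences=["Preheat the oven.", "Mix the flour.", "Add the moonlight."],
    boundary=2,
)
print(example.human_part())
print(example.generated_part())
```

Under this assumed representation, a detector only has to output one integer per passage, which is what makes the task easy to score.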

Associated Benchmarks

This dataset is used in 1 benchmark:

Boundary Detection - Metrics: Accuracy (%), MSE
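
The two metrics listed above can be sketched as follows. This is a hedged illustration that assumes each true and predicted boundary is an integer position within a passage. The function names `boundary_accuracy` and `boundary_mse` are hypothetical, and the benchmark's own evaluation code may differ in details.

```python
from typing import Sequence


def _check_inputs(true: Sequence[int], pred: Sequence[int]) -> None:
    # Both sequences must be non-empty and aligned one-to-one.
    if len(true) != len(pred) or len(true) == 0:
        raise ValueError("true and pred must be non-empty and of equal length")


def boundary_accuracy(true: Sequence[int], pred: Sequence[int]) -> float:
    """Percentage of passages whose predicted boundary matches exactly."""
    _check_inputs(true, pred)
    hits = sum(1 for t, p in zip(true, pred) if t == p)
    return 100.0 * hits / len(true)


def boundary_mse(true: Sequence[int], pred: Sequence[int]) -> float:
    """Mean squared difference between predicted and true boundaries."""
    _check_inputs(true, pred)
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)


true_boundaries = [3, 5, 0, 9]
pred_boundaries = [3, 4, 2, 9]
print(boundary_accuracy(true_boundaries, pred_boundaries))  # 50.0
print(boundary_mse(true_boundaries, pred_boundaries))       # 1.25
```

Accuracy rewards only exact matches, while MSE also reflects how far off a wrong prediction is, so the two metrics are complementary.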

Recent Benchmark Submissions

Task	Model	Paper	Date
Boundary Detection	GigaCheck (DN-DAB-DETR)	GigaCheck: Detecting LLM-generated Content	2024-10-31
Boundary Detection	RoBERTa + SEP	AI-generated text boundary detection with …	2023-11-14
Boundary Detection	PHD + TS ML	AI-generated text boundary detection with …	2023-11-14
Boundary Detection	TLE + TS Binary	AI-generated text boundary detection with …	2023-11-14

Research Papers

Recent papers with results on this dataset:

GigaCheck: Detecting LLM-generated Content (2024)
AI-generated text boundary detection with RoFT (2023)