RoFT

Real or Fake Text

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2022
License: MIT

Overview

RoFT is a dataset of 21,000 human annotations of generated text. The task is boundary detection: given a passage that begins as human-written text, determine the point at which it transitions to machine-generated text. The dataset also includes error annotations that follow the taxonomy introduced in the paper. The data can be used to train automatic detection systems, train automatic error-correction systems, analyze the visibility of model errors, and compare performance across models. The data was collected using http://roft.io.
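
To make the boundary detection task concrete, the following is a minimal sketch of how one passage and one guess could be represented and scored. The `Passage` class, the convention that `boundary` is the index of the first machine-generated sentence, and the two summary metrics are assumptions made for illustration. They are not the dataset's actual schema or the scoring used on roft.io.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Passage:
    """Hypothetical container for one boundary detection example."""
    sentences: List[str]
    # Assumed convention: index of the first machine-generated sentence.
    boundary: int


def boundary_error(passage: Passage, predicted: int) -> int:
    """Signed distance, in sentences, between a guess and the true boundary.

    Negative means the guess came too early, positive means too late.
    """
    if not 0 <= predicted <= len(passage.sentences):
        raise ValueError("predicted boundary is outside the passage")
    return predicted - passage.boundary


def summarize(passages: List[Passage], predictions: List[int]) -> Dict[str, float]:
    """Aggregate exact-match rate and mean absolute error over many guesses."""
    if not passages or len(passages) != len(predictions):
        raise ValueError("need one prediction per passage, and at least one passage")
    errors = [boundary_error(p, g) for p, g in zip(passages, predictions)]
    n = len(errors)
    return {
        "exact_match": sum(e == 0 for e in errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
    }
```

The same two functions could score either human annotators or an automatic detector, since both produce a single predicted sentence index per passage.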

Models: GPT2, GPT2-XL, CTRL, GPT3 "Davinci"

Genres: News, Stories, Recipes, Speeches

Variants: RoFT

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Boundary Detection | GigaCheck (DN-DAB-DETR) | GigaCheck: Detecting LLM-generated Content | 2024-10-31
Boundary Detection | RoBERTa + SEP | AI-generated text boundary detection with … | 2023-11-14
Boundary Detection | PHD + TS ML | AI-generated text boundary detection with … | 2023-11-14
Boundary Detection | TLE + TS Binary | AI-generated text boundary detection with … | 2023-11-14

Research Papers

Recent papers with results on this dataset: