Reuters-21578

Name: Reuters-21578
License: Custom (research-only, attribution)

Dataset Information

Modalities

Texts

Overview

The Reuters-21578 dataset is a collection of Reuters newswire articles used for text classification. According to the source cited below, the original corpus has 10,369 documents and a vocabulary of 29,930 words.

Source: Topic Model Based Multi-Label Classification from the Crowd

Variants: Reuters-21578, reuters21578

Associated Benchmarks

This dataset is used in 3 benchmarks:

Multi-Label Text Classification - Metrics: Micro-F1
Document Classification - Metrics: Accuracy, F1
Unsupervised Anomaly Detection - Metrics: AUC (outlier ratio = 0.5)
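
The Micro-F1 metric listed for the multi-label benchmark pools true positives, false positives, and false negatives over every (document, label) decision before computing F1. The sketch below illustrates that calculation on a toy example. The function name `micro_f1` and the three example documents are hypothetical and are not taken from any benchmark submission; individual papers may compute the metric with their own tooling.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (document, label) decisions.

    Each element of the input lists is the set of labels for one document.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")

    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed

    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no labels at all.
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): three documents with topic-style labels.
truth = [{"earn"}, {"grain", "wheat"}, {"crude"}]
pred = [{"earn"}, {"grain"}, {"crude", "trade"}]

score = micro_f1(truth, pred)
print(f"Micro-F1: {score:.2f}")  # tp=3, fp=1, fn=1 -> 6 / 8 = 0.75
```

Because the counts are pooled before averaging, frequent labels influence Micro-F1 more than rare ones, which matters for a corpus with an uneven label distribution.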

Recent Benchmark Submissions

Task	Model	Paper	Date
Multi-Label Text Classification	TIACBM	Task-Informed Anti-Curriculum by Masking Improves …	2025-02-18
Multi-Label Text Classification	CB-NTR	Balancing Methods for Multi-label Text …	2021-09-10
Multi-Label Text Classification	NTR-FL	Balancing Methods for Multi-label Text …	2021-09-10
Multi-Label Text Classification	DB	Balancing Methods for Multi-label Text …	2021-09-10
Document Classification	Orthogonalized Soft VSM	Text classification with word embedding …	2020-03-10
Document Classification	REL-RWMD k-NN	Speeding up Word Mover's Distance …	2019-12-01
Document Classification	SCDV-MS	Improving Document Classification with Multi-Sense …	2019-11-18
Document Classification	KD-LSTMreg	DocBERT: BERT for Document Classification	2019-04-17
Document Classification	ApproxRepSet	Rep the Set: Neural Networks …	2019-04-03
Unsupervised Anomaly Detection	RSRAE	Robust Subspace Recovery Layer for …	2019-03-30
Multi-Label Text Classification	VLAWE	Vector of Locally-Aggregated Word Embeddings …	2019-02-23
Document Classification	VLAWE	Vector of Locally-Aggregated Word Embeddings …	2019-02-23

Research Papers

Recent papers with results on this dataset:
