Reuters-21578

Dataset Information
Modalities: Graphs

Overview

The Reuters-21578 dataset is a collection of Reuters newswire articles. The corpus described in the source has 10,369 documents and a vocabulary of 29,930 words.

Source: Topic Model Based Multi-Label Classification from the Crowd

Variants: Reuters-21578, reuters21578
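
Document counts and vocabulary sizes like those quoted above depend on how the text is tokenized, so different preprocessing pipelines report different numbers for the same corpus. The sketch below shows one way such statistics can be computed. It assumes lowercased, whitespace-separated tokens with surrounding punctuation stripped; this is an illustrative choice, not the source's preprocessing. The function name `corpus_stats` and the sample sentences are hypothetical.

```python
def corpus_stats(documents):
    """Return (number of documents, vocabulary size) for a list of raw texts.

    Assumption (hypothetical preprocessing): split on whitespace, strip
    surrounding punctuation, and lowercase each token.
    """
    vocabulary = set()
    for doc in documents:
        for raw in doc.split():
            token = raw.strip(".,;:!?\"'()").lower()
            if token:  # skip tokens that were pure punctuation
                vocabulary.add(token)
    return len(documents), len(vocabulary)


# Hypothetical snippets standing in for newswire articles.
docs = [
    "Wheat prices rose on Monday.",
    "Crude oil prices fell, traders said.",
    "The company said wheat exports rose.",
]
print(corpus_stats(docs))  # (3, 13)
```

Changing any preprocessing step (keeping case, removing stopwords, stemming) changes the vocabulary size, which is one reason published statistics for Reuters-21578 vary.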

Associated Benchmarks

This dataset is used in 3 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Multi-Label Text Classification | TIACBM | Task-Informed Anti-Curriculum by Masking Improves … | 2025-02-18
Multi-Label Text Classification | CB-NTR | Balancing Methods for Multi-label Text … | 2021-09-10
Multi-Label Text Classification | NTR-FL | Balancing Methods for Multi-label Text … | 2021-09-10
Multi-Label Text Classification | DB | Balancing Methods for Multi-label Text … | 2021-09-10
Document Classification | Orthogonalized Soft VSM | Text classification with word embedding … | 2020-03-10
Document Classification | REL-RWMD k-NN | Speeding up Word Mover's Distance … | 2019-12-01
Document Classification | SCDV-MS | Improving Document Classification with Multi-Sense … | 2019-11-18
Document Classification | KD-LSTMreg | DocBERT: BERT for Document Classification | 2019-04-17
Document Classification | ApproxRepSet | Rep the Set: Neural Networks … | 2019-04-03
Unsupervised Anomaly Detection | RSRAE | Robust Subspace Recovery Layer for … | 2019-03-30
Multi-Label Text Classification | VLAWE | Vector of Locally-Aggregated Word Embeddings … | 2019-02-23
Document Classification | VLAWE | Vector of Locally-Aggregated Word Embeddings … | 2019-02-23
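
Several of the submissions listed treat Reuters-21578 as a multi-label text classification task, since a single article can carry more than one topic label. A common way to feed such labels to a classifier is a multi-hot matrix: one row per document, one 0/1 column per label. The sketch below is a minimal, dependency-free illustration of that encoding; the function name `multi_hot` and the label assignments are hypothetical, and the benchmarked papers may encode labels differently.

```python
def multi_hot(label_sets):
    """Encode per-document label sets as (sorted label names, 0/1 rows)."""
    # Collect every label seen in any document and fix a column order.
    labels = sorted(set().union(*label_sets))
    index = {label: i for i, label in enumerate(labels)}
    rows = []
    for doc_labels in label_sets:
        row = [0] * len(labels)
        for label in doc_labels:
            row[index[label]] = 1  # mark each label this document carries
        rows.append(row)
    return labels, rows


# Hypothetical topic assignments for three articles.
doc_labels = [{"grain", "wheat"}, {"crude"}, {"earn"}]
labels, matrix = multi_hot(doc_labels)
print(labels)  # ['crude', 'earn', 'grain', 'wheat']
print(matrix)  # [[0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
```

Sorting the label names gives a deterministic column order, so the same label sets always produce the same matrix.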

Research Papers

Recent papers with results on this dataset: