RCV1

Reuters Corpus Volume 1

Dataset Information
Modalities: Texts
Introduced: 2004

Overview

The RCV1 dataset is a benchmark dataset for text categorization. It is a collection of newswire articles produced by Reuters in 1996-1997: 804,414 manually labeled documents, each categorized with respect to three controlled vocabularies (industries, topics, and regions).

Source: Random Projections for Linear Support Vector Machines
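Because every document is categorized against three controlled vocabularies, a single article can carry several labels at once. Multi-label classifiers commonly take such data as a binary indicator matrix, with one row per document and one column per label. The sketch below builds one in plain Python from a few made-up documents. The document IDs and label assignments are hypothetical. Apart from the top-level topic codes CCAT, ECAT and GCAT, the codes used (`IND_X`, `UK`, `USA`) are illustrative placeholders, not verified RCV1 codes. For the real data, scikit-learn's `sklearn.datasets.fetch_rcv1` loads the RCV1-v2 topic labels already in indicator form, as a sparse matrix.

```python
# Hypothetical toy documents, tagged in the style of RCV1's three vocabularies.
# Only CCAT/ECAT/GCAT are real top-level topic codes; IND_X, UK and USA are
# illustrative placeholders.
DOCS = {
    "doc1": {"topics": {"CCAT"}, "industries": set(), "regions": {"USA"}},
    "doc2": {"topics": {"ECAT", "GCAT"}, "industries": set(), "regions": {"UK", "USA"}},
    "doc3": {"topics": {"CCAT", "ECAT"}, "industries": {"IND_X"}, "regions": {"UK"}},
}


def build_indicator_matrix(docs):
    """Flatten per-vocabulary label sets into one binary indicator matrix.

    Labels are namespaced as "<vocabulary>:<code>" so that the same code
    appearing in two vocabularies cannot collide in a single column.
    Returns (doc_ids, label_names, matrix), where matrix[i][j] == 1 iff
    document doc_ids[i] carries label label_names[j].
    """
    doc_ids = sorted(docs)
    # Collect every distinct namespaced label; sort for a stable column order.
    label_names = sorted({
        f"{vocab}:{code}"
        for doc in docs.values()
        for vocab, codes in doc.items()
        for code in codes
    })
    column = {name: j for j, name in enumerate(label_names)}

    matrix = []
    for doc_id in doc_ids:
        row = [0] * len(label_names)
        for vocab, codes in docs[doc_id].items():
            for code in codes:
                row[column[f"{vocab}:{code}"]] = 1
        matrix.append(row)
    return doc_ids, label_names, matrix


if __name__ == "__main__":
    ids, labels, Y = build_indicator_matrix(DOCS)
    print(labels)
    for doc_id, row in zip(ids, Y):
        print(doc_id, row)
```

Prefixing each code with its vocabulary keeps the three label spaces distinguishable after flattening; a dense list of lists is fine for a toy example, but a collection the size of RCV1 calls for a sparse representation.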

Variants: RCV1-v2, Reuters RCV1/RCV2 English-to-German, Reuters RCV1/RCV2 German-to-English, RCV1

Associated Benchmarks

This dataset is used in 2 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Multi-Label Text Classification | HiddeN | Joint Learning of Hyperbolic Label … | 2021-01-13
Text Classification | HiLAP (bow-CNN) | Hierarchical Text Classification with Reinforced … | 2019-08-27
Text Classification | NLP-Cap | Towards Scalable and Reliable Capsule … | 2019-06-06
Text Classification | oh-CNN + two LSTM tv-embed. | Supervised and Semi-Supervised Text Categorization … | 2016-02-07

Research Papers

Recent papers with results on this dataset: