Reuters Corpus Volume 1
The RCV1 dataset is a benchmark dataset for text categorization. It is a collection of newswire articles produced by Reuters in 1996-1997. It contains 804,414 manually labeled newswire documents, categorized with respect to three controlled vocabularies: industries, topics, and regions.
Source: Random Projections for Linear Support Vector Machines
Variants: RCV1-v2, Reuters RCV1/RCV2 English-to-German, Reuters RCV1/RCV2 German-to-English, RCV1
This dataset is used in 2 benchmarks:
| Task | Model | Paper | Date |
|---|---|---|---|
| Multi-Label Text Classification | HiddeN | Joint Learning of Hyperbolic Label … | 2021-01-13 |
| Text Classification | HiLAP (bow-CNN) | Hierarchical Text Classification with Reinforced … | 2019-08-27 |
| Text Classification | NLP-Cap | Towards Scalable and Reliable Capsule … | 2019-06-06 |
| Text Classification | oh-CNN + two LSTM tv-embed. | Supervised and Semi-Supervised Text Categorization … | 2016-02-07 |
Recent papers with results on this dataset: