RCV1

Name: RCV1
Published: 2004-01-01
License: Custom

Reuters Corpus Volume 1

Dataset Information

Modalities: Texts
Introduced: 2004
License: Custom
Homepage: Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The RCV1 dataset is a benchmark for text categorization. It is a collection of newswire articles produced by Reuters in 1996-1997. It contains 804,414 manually labeled newswire documents, categorized with respect to three controlled vocabularies: industries, topics and regions.
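A single document can carry several codes from each vocabulary, so RCV1 is usually treated as a multi-label problem. The following is a minimal sketch, assuming a hypothetical `NewswireDoc` record and a few illustrative category codes (not the official RCV1 code lists). It shows how one document's labels can be turned into a 0/1 ("multi-hot") vector per vocabulary.

```python
from dataclasses import dataclass, field

# Illustrative codes only (assumption): the real RCV1 vocabularies are much
# larger, and these lists are not taken from the official code files.
VOCABULARIES = {
    "topics": ["CCAT", "ECAT", "GCAT", "MCAT"],
    "industries": ["IND_A", "IND_B"],  # hypothetical placeholder codes
    "regions": ["UK", "USA", "JAP"],
}


@dataclass
class NewswireDoc:
    """Hypothetical record for one newswire article and its label sets."""
    doc_id: int
    text: str
    # Maps a vocabulary name to the set of codes assigned from it.
    labels: dict[str, set[str]] = field(default_factory=dict)


def multi_hot(doc: NewswireDoc, vocab: dict[str, list[str]]) -> dict[str, list[int]]:
    """Encode each vocabulary's label set as a 0/1 vector over that vocabulary."""
    return {
        name: [1 if code in doc.labels.get(name, set()) else 0 for code in codes]
        for name, codes in vocab.items()
    }


doc = NewswireDoc(
    doc_id=1,
    text="Hypothetical example text.",
    labels={"topics": {"CCAT", "MCAT"}, "regions": {"USA"}},
)
print(multi_hot(doc, VOCABULARIES))
```

A vocabulary with no assigned codes (here, industries) simply becomes an all-zero vector, so every document maps to fixed-length targets.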

Source: Random Projections for Linear Support Vector Machines
Image Source: https://www.nasdaq.com/publishers/reuters

Variants: RCV1-v2, Reuters RCV1/RCV2 English-to-German, Reuters RCV1/RCV2 German-to-English, RCV1

Associated Benchmarks

This dataset is used in 2 benchmarks:

Multi-Label Text Classification - Metrics: Macro-F1, Micro-F1
Text Classification - Metrics: Accuracy, Macro-F1, Micro-F1, P@1, P@3, P@5, nDCG@1, nDCG@3, nDCG@5
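The metrics above are listed without definitions. The following is a minimal pure-Python sketch of how they are commonly computed for multi-label predictions. The function names are hypothetical. Several conventions are assumptions: thresholding scores at 0.5 for F1, scoring a label with no positives and no predictions as F1 = 0, and binary-relevance nDCG normalized by the ideal ranking. Individual papers may use different conventions.

```python
import math


def f1_scores(y_true, y_pred):
    """Micro- and macro-averaged F1 for multi-label 0/1 matrices (lists of rows)."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1

    def f1(tp_, fp_, fn_):
        # Assumed convention: F1 = 0 when a label has no positives and no predictions.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp), sum(fp), sum(fn))  # pool counts over all labels
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro


def top_k(scores, k):
    """Indices of the k highest-scoring labels (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def precision_at_k(y_true, scores, k):
    """P@k: fraction of the top-k ranked labels that are relevant, averaged over documents."""
    total = 0.0
    for true_row, score_row in zip(y_true, scores):
        total += sum(true_row[j] for j in top_k(score_row, k)) / k
    return total / len(y_true)


def ndcg_at_k(y_true, scores, k):
    """nDCG@k with binary relevance, averaged over documents."""
    total = 0.0
    for true_row, score_row in zip(y_true, scores):
        # Rank r (0-based) is discounted by log2(r + 2).
        dcg = sum(true_row[j] / math.log2(r + 2) for r, j in enumerate(top_k(score_row, k)))
        n_rel = min(k, sum(true_row))
        idcg = sum(1 / math.log2(r + 2) for r in range(n_rel))
        total += dcg / idcg if idcg else 0.0
    return total / len(y_true)


# Toy data (hypothetical): two documents, four labels.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
scores = [[0.9, 0.8, 0.3, 0.1], [0.2, 0.7, 0.6, 0.1]]
y_pred = [[1 if s >= 0.5 else 0 for s in row] for row in scores]

print(f1_scores(y_true, y_pred))
print(precision_at_k(y_true, scores, 1), precision_at_k(y_true, scores, 3))
print(ndcg_at_k(y_true, scores, 3))
```

Micro-F1 pools counts across labels, so frequent categories dominate it. Macro-F1 averages per-label scores, so rare categories weigh as much as common ones. Reporting both is why the two usually appear together.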

Recent Benchmark Submissions

Task	Model	Paper	Date
Multi-Label Text Classification	HiddeN	Joint Learning of Hyperbolic Label …	2021-01-13
Text Classification	HiLAP (bow-CNN)	Hierarchical Text Classification with Reinforced …	2019-08-27
Text Classification	NLP-Cap	Towards Scalable and Reliable Capsule …	2019-06-06
Text Classification	oh-CNN + two LSTM tv-embed.	Supervised and Semi-Supervised Text Categorization …	2016-02-07

Research Papers

Recent papers with results on this dataset:
