KP20k

Name: KP20k
Published: 2017-01-01
License: Unknown

Dataset Information

Modalities: Texts
Languages: English
Introduced: 2017

Overview

KP20k is a large-scale dataset of scholarly articles, split into 528K articles for training, 20K for validation, and 20K for testing.
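To make the split sizes concrete, the following minimal sketch computes what share of the corpus each split represents. The counts are the rounded figures quoted above, not exact article counts, and the names `SPLIT_SIZES` and `split_fractions` are hypothetical, chosen only for illustration.

```python
# Rounded split sizes as quoted in the overview (assumption: the exact
# counts differ slightly, since the page only gives figures in thousands).
SPLIT_SIZES = {"train": 528_000, "validation": 20_000, "test": 20_000}


def split_fractions(sizes):
    """Return each split's share of the total number of articles."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


fractions = split_fractions(SPLIT_SIZES)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

With these rounded figures, training accounts for roughly 93% of the articles, and the validation and test splits for about 3.5% each.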

Variants: KP20k

This dataset is used in 3 benchmarks:

Task	Model	Paper	Date
Phrase Ranking	Wiki+RoBERTa	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Phrase Ranking	UCPhrase	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Phrase Ranking	TopMine	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Keyphrase Extraction	Wiki+RoBERTa	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Keyphrase Extraction	UCPhrase	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Keyphrase Extraction	AutoPhrase	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Keyphrase Extraction	Spacy	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Keyphrase Extraction	PKE	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Keyphrase Extraction	TopMine	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Keyphrase Extraction	StanfordNLP	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Phrase Tagging	UCPhrase	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Phrase Tagging	Wiki+RoBERTa	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Phrase Tagging	AutoPhrase	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
Phrase Tagging	TopMine	UCPhrase: Unsupervised Context-aware Quality Phrase …	2021-05-28
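
The table is easier to scan when grouped by task. The sketch below does that for the task and model columns, copied by hand from the rows above. The names `ROWS` and `models_by_task` are hypothetical, and no scores are included because the page lists none.

```python
from collections import defaultdict

# (task, model) pairs transcribed from the benchmark table above.
ROWS = [
    ("Phrase Ranking", "Wiki+RoBERTa"),
    ("Phrase Ranking", "UCPhrase"),
    ("Phrase Ranking", "TopMine"),
    ("Keyphrase Extraction", "Wiki+RoBERTa"),
    ("Keyphrase Extraction", "UCPhrase"),
    ("Keyphrase Extraction", "AutoPhrase"),
    ("Keyphrase Extraction", "Spacy"),
    ("Keyphrase Extraction", "PKE"),
    ("Keyphrase Extraction", "TopMine"),
    ("Keyphrase Extraction", "StanfordNLP"),
    ("Phrase Tagging", "UCPhrase"),
    ("Phrase Tagging", "Wiki+RoBERTa"),
    ("Phrase Tagging", "AutoPhrase"),
    ("Phrase Tagging", "TopMine"),
]


def models_by_task(rows):
    """Group model names under their task, preserving table order."""
    grouped = defaultdict(list)
    for task, model in rows:
        grouped[task].append(model)
    return dict(grouped)


grouped = models_by_task(ROWS)
for task, models in grouped.items():
    print(f"{task} ({len(models)}): {', '.join(models)}")
```

Grouping this way shows that all three benchmarks come from the same paper and that UCPhrase, Wiki+RoBERTa, and TopMine appear under every task.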
