KP20k

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2017
License: Unknown

Overview

KP20k is a large-scale dataset of scholarly articles, split into 528K articles for training, 20K for validation, and 20K for testing.

Source: Keyphrase Prediction With Pre-trained Language Model
Image Source: https://arxiv.org/pdf/1704.06879.pdf
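
To make the split proportions concrete, the following minimal sketch computes each split's share of the total from the sizes quoted in the overview. The counts are the rounded figures from the description above, not exact record counts, and the names `SPLIT_SIZES` and `split_shares` are illustrative helpers, not part of any KP20k tooling.

```python
# Split sizes as quoted in the overview (rounded to thousands, so approximate).
SPLIT_SIZES = {"train": 528_000, "validation": 20_000, "test": 20_000}


def split_shares(sizes):
    """Return each split's share of the total as a fraction in [0, 1]."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


shares = split_shares(SPLIT_SIZES)
for name, share in shares.items():
    # Print the split name, its article count, and its percentage of the total.
    print(f"{name:<10} {SPLIT_SIZES[name]:>7,} articles  {share:6.2%}")
```

Under these rounded figures, roughly 93% of the articles are in the training split, with the validation and test splits each holding about 3.5%.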

Variants: KP20k

Associated Benchmarks

This dataset is used in 3 benchmarks: Phrase Ranking, Keyphrase Extraction, and Phrase Tagging.

Recent Benchmark Submissions

| Task | Model | Paper | Date |
|---|---|---|---|
| Phrase Ranking | Wiki+RoBERTa | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Phrase Ranking | UCPhrase | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Phrase Ranking | TopMine | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Keyphrase Extraction | Wiki+RoBERTa | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Keyphrase Extraction | UCPhrase | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Keyphrase Extraction | AutoPhrase | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Keyphrase Extraction | Spacy | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Keyphrase Extraction | PKE | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Keyphrase Extraction | TopMine | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Keyphrase Extraction | StanfordNLP | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Phrase Tagging | UCPhrase | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Phrase Tagging | Wiki+RoBERTa | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Phrase Tagging | AutoPhrase | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |
| Phrase Tagging | TopMine | UCPhrase: Unsupervised Context-aware Quality Phrase … | 2021-05-28 |

Research Papers

Recent papers with results on this dataset: