Krapivin

Dataset Information
Introduced: 2008
License: Unknown
Homepage:
Overview

A dataset for benchmarking keyphrase extraction and generation techniques on long English scientific papers. It consists of 2,000 high-quality scientific papers from the Computer Science domain, published by ACM. Each paper comes with keyphrases assigned by its authors and verified by reviewers. Different parts of each paper, such as the title and abstract, are stored separately, so extraction can be restricted to a particular part of the article's text. The content of each paper was converted from PDF to plain text, and formulae, tables, figures, and LaTeX markup were removed automatically. Link: https://huggingface.co/datasets/midas/krapivin
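Because the author-assigned keyphrases do not always appear verbatim in the paper, a common first step is to split them into "present" keyphrases (recoverable by extraction) and "absent" ones (which only a generation model can produce), optionally looking only at selected parts of the paper. The sketch below illustrates this under stated assumptions: the dict layout and the part names "title", "abstract" and "body" are hypothetical and do not reflect the dataset's actual file format, and matching is an exact lowercased token match with no stemming (many evaluations additionally stem words).

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(doc_tokens: list[str], phrase_tokens: list[str]) -> bool:
    # True if phrase_tokens occurs as a contiguous run inside doc_tokens.
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))

def split_present_absent(keyphrases, sections, use=("title", "abstract")):
    # Hypothetical layout: sections maps a part name to that part's plain text.
    # Only the parts listed in `use` are searched.
    text = " ".join(sections.get(part, "") for part in use)
    doc_tokens = tokenize(text)
    present, absent = [], []
    for kp in keyphrases:
        target = present if contains_phrase(doc_tokens, tokenize(kp)) else absent
        target.append(kp)
    return present, absent

# Toy example (invented data, not taken from the dataset).
paper = {
    "title": "Efficient keyphrase extraction from scientific papers",
    "abstract": "We rank candidate phrases with a graph model.",
    "body": "We also compare against topic modeling baselines.",
}
gold = ["keyphrase extraction", "graph model", "topic modeling"]

print(split_present_absent(gold, paper))
print(split_present_absent(gold, paper, use=("title", "abstract", "body")))
```

Restricting `use` to the title and abstract makes "topic modeling" an absent keyphrase; adding the body makes it present, which is why results on long documents depend on which parts of the paper a method reads.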

Variants: Krapivin

Associated Benchmarks

This dataset is used in one benchmark: Keyphrase Extraction.

Recent Benchmark Submissions

Task | Model | Paper | Date
Keyphrase Extraction | Attention-Seeker | Attention-Seeker: Dynamic Self-Attention Scoring for … | 2024-09-17
Keyphrase Extraction | PromptRank | PromptRank: Unsupervised Keyphrase Extraction Using … | 2023-05-08
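Keyphrase extraction results are commonly reported as F1@k, which compares a model's top-k ranked keyphrases against the author-assigned ones. The metric used for the entries listed here is not stated on this page, so the following is only a minimal sketch of one common variant. Assumptions: phrases are normalized by lowercasing and collapsing non-alphanumeric characters, duplicate predictions are dropped before truncating to k, and precision is divided by the number of predictions actually kept (some papers divide by k instead, and many also apply stemming).

```python
import re

def normalize(phrase: str) -> str:
    # Lowercase and collapse punctuation/whitespace to single spaces.
    return " ".join(re.findall(r"[a-z0-9]+", phrase.lower()))

def f1_at_k(predicted: list[str], gold: list[str], k: int) -> float:
    # Keep the first k distinct normalized predictions, in rank order.
    seen, top = set(), []
    for p in predicted:
        n = normalize(p)
        if n and n not in seen:
            seen.add(n)
            top.append(n)
        if len(top) == k:
            break
    gold_set = {normalize(g) for g in gold if normalize(g)}
    if not top or not gold_set:
        return 0.0
    matches = sum(1 for p in top if p in gold_set)
    if matches == 0:
        return 0.0
    precision = matches / len(top)  # assumption: divide by predictions kept
    recall = matches / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# Toy example (invented predictions, not results from this benchmark).
predicted = ["Keyphrase extraction", "graph model", "neural network",
             "keyphrase  extraction", "citation"]
gold = ["keyphrase extraction", "graph model", "topic modeling"]
print(round(f1_at_k(predicted, gold, 3), 4))
print(round(f1_at_k(predicted, gold, 5), 4))
```

Deduplicating before truncation matters here: the repeated "keyphrase  extraction" is skipped, so with k = 5 only four distinct predictions remain.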

Research Papers

Recent papers with results on this dataset: