NUS

Dataset Information
Introduced
2007
License
Unknown
Homepage

Overview

The dataset was constructed in two stages: finding suitable publications, then collecting keyphrases from human annotators. The Google SOAP API was used to find documents with variants of the query “keywords general terms filetype:pdf”, and over 250 of the resulting PDF documents were downloaded for further processing. The documents were then manually restricted to scientific conference papers of 4 to 12 pages. The PDFs were converted to plain text with the PDF995 software suite, which handled two-column text better than the other programs tried. At the end of this process, 211 documents that had converted to plain text without problems were selected. The authors then recruited student volunteers from their department to assign keyphrases manually. Each volunteer was given three PDF files, with the author-assigned keyphrases hidden, and asked to assign keyphrases to them.
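The selection criteria above can be sketched as a simple filter. This is only an illustration: the `Candidate` record, its field names, and the sample entries are hypothetical and do not come from the dataset authors, whose restriction to conference papers was done manually.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one downloaded PDF (not part of the dataset)."""
    name: str
    is_conference_paper: bool  # outcome of the manual check
    pages: int
    converted_ok: bool         # plain-text conversion succeeded


def select_documents(candidates, min_pages=4, max_pages=12):
    """Keep conference papers within the page range that converted cleanly."""
    return [
        c for c in candidates
        if c.is_conference_paper
        and min_pages <= c.pages <= max_pages
        and c.converted_ok
    ]


# Made-up sample entries, for illustration only.
candidates = [
    Candidate("paper_a.pdf", True, 8, True),
    Candidate("tech_report.pdf", False, 10, True),   # not a conference paper
    Candidate("long_paper.pdf", True, 20, True),     # outside 4-12 pages
    Candidate("garbled.pdf", True, 6, False),        # conversion failed
]

selected = select_documents(candidates)
print([c.name for c in selected])
```

With the sample entries above, only the first candidate passes all three checks.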

Variants: NUS

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Keyphrase Extraction | PromptRank | PromptRank: Unsupervised Keyphrase Extraction Using … | 2023-05-08

Research Papers

Recent papers with results on this dataset: