HOC

Hallmarks of Cancer

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2015
License: Unknown

Overview

The Hallmarks of Cancer (HoC) corpus consists of 1,852 PubMed publication abstracts manually annotated by experts according to the Hallmarks of Cancer taxonomy. The taxonomy comprises 37 classes organized in a hierarchy. Each sentence in the corpus is assigned zero or more class labels.
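Because each sentence can carry zero or more labels drawn from a hierarchy, the annotations are naturally multi-label. The sketch below shows one way to handle them. It is a minimal illustration under stated assumptions, not the corpus's official tooling. It assumes three things: the hierarchy can be stored as a child-to-parent map, a label implies all of its ancestors, and abstract-level labels (as used for document classification) are the union of the sentence labels. The four class names in `PARENT` are hypothetical stand-ins, not the actual 37 HoC classes.

```python
from typing import Dict, List, Optional, Set

# Hypothetical excerpt of a label hierarchy: child -> parent (None = top level).
# These names are illustrative only; the real HoC taxonomy has 37 classes.
PARENT: Dict[str, Optional[str]] = {
    "sustaining_proliferative_signaling": None,
    "growth_factor_signaling": "sustaining_proliferative_signaling",
    "evading_growth_suppressors": None,
    "cell_cycle_checkpoints": "evading_growth_suppressors",
}

# Fixed class order for vector encodings.
CLASSES: List[str] = sorted(PARENT)


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Expand a label set so every label also includes its ancestors (assumption)."""
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        while node is not None:
            expanded.add(node)
            node = PARENT[node]
    return expanded


def abstract_labels(sentence_labels: List[Set[str]]) -> Set[str]:
    """Union the ancestor-expanded labels of all sentences (assumed aggregation).

    Sentences with no labels contribute nothing, so an abstract may end up
    with an empty label set.
    """
    result: Set[str] = set()
    for labels in sentence_labels:
        result |= with_ancestors(labels)
    return result


def multi_hot(labels: Set[str], classes: List[str]) -> List[int]:
    """Encode a label set as a 0/1 vector in the given class order."""
    return [1 if c in labels else 0 for c in classes]


if __name__ == "__main__":
    # One unlabeled sentence and one sentence labeled with a child class.
    sentences = [set(), {"growth_factor_signaling"}]
    doc = abstract_labels(sentences)
    print(sorted(doc))
    print(multi_hot(doc, CLASSES))
```

Expanding to ancestors before taking the union means a model trained on these vectors sees the coarse hallmark whenever any of its subclasses appears. Whether a given benchmark uses the full hierarchy or only the top-level classes should be checked against that benchmark's own setup.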

Source: Hallmarks of Cancer Corpus


Variants: HOC

Associated Benchmarks

This dataset is used in one benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Document Classification | BioGPT | BioGPT: Generative Pre-trained Transformer for … | 2022-10-19
Document Classification | BioLinkBERT (large) | LinkBERT: Pretraining Language Models with … | 2022-03-29
Document Classification | SciFive-large | SciFive: a text-to-text transformer model … | 2021-05-28
Document Classification | PubMedBERT uncased | Domain-Specific Language Model Pretraining for … | 2020-07-31
Document Classification | NCBI_BERT(large) (P) | Transfer Learning in Biomedical Natural … | 2019-06-13

Research Papers

Recent papers with results on this dataset: