Species-800

Dataset Information
Modalities: Texts
License: Unknown

Overview

Species-800 is a corpus of manually annotated abstracts for species entity recognition. It comprises 800 PubMed abstracts that contain identified organism mentions. To increase the taxonomic diversity of the mentions in the corpus, the 800 abstracts were collected by selecting 100 abstracts from each of the following 8 categories: bacteriology, botany, entomology, medicine, mycology, protistology, virology and zoology. The corpus has been annotated with a focus on the species level; however, mentions of higher taxa (such as genera, families and orders) have also been considered.
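The corpus composition (8 categories × 100 abstracts = 800) is a stratified selection. The sketch below illustrates that design only. It assumes uniform random sampling from hypothetical per-category candidate pools. The helper `build_corpus` and the `pools` input are made up for illustration and do not reproduce the curators' actual selection procedure.

```python
import random

# The eight categories named in the Species-800 description.
CATEGORIES = [
    "bacteriology", "botany", "entomology", "medicine",
    "mycology", "protistology", "virology", "zoology",
]
PER_CATEGORY = 100  # abstracts taken from each category


def build_corpus(pools, per_category=PER_CATEGORY, seed=0):
    """Pick `per_category` abstract IDs from each category's pool.

    `pools` maps a category name to a list of candidate PubMed IDs
    (hypothetical input). Uniform random sampling without replacement
    is an assumption made for illustration.
    """
    rng = random.Random(seed)
    corpus = {}
    for category in CATEGORIES:
        candidates = pools[category]
        if len(candidates) < per_category:
            raise ValueError(f"not enough abstracts for {category}")
        # Sample without replacement so no abstract is picked twice.
        corpus[category] = rng.sample(candidates, per_category)
    return corpus


# Usage with made-up IDs: 500 candidates per category.
pools = {c: [f"{c}-{i}" for i in range(500)] for c in CATEGORIES}
corpus = build_corpus(pools)
total = sum(len(ids) for ids in corpus.values())
print(total)  # 800
```

Stratifying by category, rather than sampling 800 abstracts from one pooled set, guarantees that every category contributes equally to the taxonomic spread of the mentions.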

Variants: Species-800

Associated Benchmarks

This dataset is used in one benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Named Entity Recognition (NER) | SciFive-Base | SciFive: a text-to-text transformer model … | 2021-05-28
Named Entity Recognition (NER) | Spark NLP | Biomedical Named Entity Recognition at … | 2020-11-12
Named Entity Recognition (NER) | BioBERT | BioBERT: a pre-trained biomedical language … | 2019-01-25

Research Papers

Recent papers with results on this dataset: