Species-800 is a corpus for species entities based on manually annotated abstracts. It comprises 800 PubMed abstracts that contain identified organism mentions. To increase the taxonomic diversity of the mentions, the abstracts were collected by selecting 100 abstracts from each of the following 8 categories: bacteriology, botany, entomology, medicine, mycology, protistology, virology and zoology. Species-800 has been annotated with a focus on the species level; however, mentions of higher taxa (such as genera, families and orders) have also been considered.
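Since the corpus is used for named entity recognition, a minimal sketch of how annotated organism mentions might be recovered from token-level labels can help make the annotation concrete. The BIO tag scheme (`B`, `I`, `O`), the function name `extract_mentions`, and the example sentence are all assumptions for illustration; the description above does not specify the distribution format of the corpus.

```python
from typing import List, Tuple


def extract_mentions(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, text) spans from BIO tags; `end` is exclusive.

    Assumes a simple scheme: "B" opens a mention, "I" continues it,
    "O" marks tokens outside any mention (hypothetical format).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[int, int, str]] = []
    start = None  # index where the currently open mention began, if any

    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new mention starts here; an "I" without a preceding "B"
            # is treated leniently as the start of a mention.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "O" and start is not None:
            # The open mention ends just before this token.
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None

    # Close a mention that runs to the end of the sentence.
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Made-up sentence with one species-level and one genus-level mention.
example_tokens = ["Infection", "with", "Escherichia", "coli", "and",
                  "Salmonella", "was", "observed", "."]
example_tags = ["O", "O", "B", "I", "O", "B", "O", "O", "O"]
mentions = extract_mentions(example_tokens, example_tags)
```

Here `mentions` holds one span for the species mention "Escherichia coli" and one for the genus mention "Salmonella", mirroring the corpus's inclusion of higher taxa alongside species.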
Variants: Species-800
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Named Entity Recognition (NER) | SciFive-Base | SciFive: a text-to-text transformer model … | 2021-05-28 |
Named Entity Recognition (NER) | Spark NLP | Biomedical Named Entity Recognition at … | 2020-11-12 |
Named Entity Recognition (NER) | BioBERT | BioBERT: a pre-trained biomedical language … | 2019-01-25 |
Recent papers with results on this dataset: