GENIA

Dataset Information
Modalities: Texts, Medical
Languages: English
Introduced: 2003
License: Unknown

Overview

The GENIA corpus is the primary collection of biomedical literature compiled and annotated within the scope of the GENIA project. The corpus was created to support the development and evaluation of information extraction and text mining systems for the domain of molecular biology.

The corpus contains 1,999 Medline abstracts, selected using a PubMed query for the three MeSH terms “human”, “blood cells”, and “transcription factors”. The corpus has been annotated with various levels of linguistic and semantic information.
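The selection step described above can be sketched as a boolean query over MeSH terms. The snippet below is only an illustration: it assumes the three terms were combined with AND and tagged with PubMed's `[MeSH Terms]` field, and it targets the NCBI E-utilities ESearch endpoint. The exact query string used by the GENIA project is not given here, and the helper names `build_mesh_query` and `build_esearch_url` are hypothetical. The code only builds the URL and performs no network request.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (used here only to build a URL).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The three terms named in the corpus description. Combining them with AND
# is an assumption; the original query string is not reproduced in the text.
GENIA_TERMS = ["human", "blood cells", "transcription factors"]


def build_mesh_query(terms):
    """Join MeSH terms into one AND query using the [MeSH Terms] field tag."""
    return " AND ".join(f'"{term}"[MeSH Terms]' for term in terms)


def build_esearch_url(terms, retmax=20):
    """Return an ESearch URL for the query; nothing is fetched here."""
    params = {"db": "pubmed", "term": build_mesh_query(terms), "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_mesh_query(GENIA_TERMS))
    print(build_esearch_url(GENIA_TERMS))
```

Because Medline has grown since the corpus was compiled, a query of this form run today would likely return many more abstracts than the 1,999 in the corpus, so it should not be expected to reproduce the corpus itself.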

The primary categories of annotation in the GENIA corpus and the corresponding subcorpora are:

  • Part-of-Speech annotation
  • Constituency (phrase structure) syntactic annotation
  • Term annotation
  • Event annotation
  • Relation annotation
  • Coreference annotation

Source: http://www.geniaproject.org/genia-corpus

Variants: GENIA, GENIA - LAS, GENIA - UAS, GENIA 2013
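
The variant names "LAS" and "UAS" are not defined on this page. Assuming they refer to the standard dependency-parsing metrics, labeled and unlabeled attachment score, the following minimal sketch shows how the two are usually computed. The function name `attachment_scores` and the `(head_index, label)` token representation are illustrative choices, not part of any GENIA tooling.

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) for one or more sentences flattened into token lists.

    gold, pred: equal-length lists of (head_index, dependency_label) tuples,
    one tuple per token. UAS counts tokens whose predicted head is correct;
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    if not gold:
        raise ValueError("at least one token is required")

    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return head_hits / total, labeled_hits / total


if __name__ == "__main__":
    # Toy three-token sentence: all heads are right, one label is wrong.
    gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
    uas, las = attachment_scores(gold, pred)
    print(f"UAS={uas:.3f} LAS={las:.3f}")
```

In the toy example every head is predicted correctly but one label is wrong, so UAS is 1.0 while LAS is 2/3.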

Associated Benchmarks

This dataset is used in 2 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Named Entity Recognition (NER) | UniNER-7B | UniversalNER: Targeted Distillation from Large … | 2023-08-07
Named Entity Recognition (NER) | DeepStruct multi-task w/ finetune | DeepStruct: Pretraining of Language Models … | 2022-05-21
Named Entity Recognition (NER) | DeepStruct multi-task | DeepStruct: Pretraining of Language Models … | 2022-05-21
Named Entity Recognition (NER) | Deepstruct zero-shot | DeepStruct: Pretraining of Language Models … | 2022-05-21
Event Extraction | GEANet-SciBERT | Biomedical Event Extraction with Hierarchical … | 2020-09-20
Named Entity Recognition (NER) | Biaffine-NER | Named Entity Recognition as Dependency … | 2020-05-14
Named Entity Recognition (NER) | BiFlaG | Bipartite Flat-Graph Network for Nested … | 2020-05-01
Named Entity Recognition (NER) | Second-best learning and decoding + BERT + Flair | Nested Named Entity Recognition via … | 2019-09-05
Named Entity Recognition (NER) | Second-best learning and decoding | Nested Named Entity Recognition via … | 2019-09-05
Named Entity Recognition (NER) | seq2seq+BERT+Flair | Neural Architectures for Nested NER … | 2019-08-19
Named Entity Recognition (NER) | Anchor-Region Networks | Sequence-to-Nuggets: Nested Entity Mention Detection … | 2019-06-10
Named Entity Recognition (NER) | Neural transition-based model | A Neural Transition-based Model for … | 2018-10-03
Named Entity Recognition (NER) | Neural segmental hypergraphs | Neural Segmental Hypergraphs for Overlapping … | 2018-10-03

Research Papers

Recent papers with results on this dataset: