Genome Understanding Evaluation
A collection of 28 datasets across 7 tasks constructed for genome language model evaluation. The seven tasks are promoter prediction, core promoter prediction, splice site prediction, COVID variant classification, epigenetic marks prediction, and transcription factor binding site prediction on human and on mouse (counted as two separate tasks).
Variants: GUE
This dataset is used in 5 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Promoter Detection | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26 |
Core Promoter Detection | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26 |
Splice Site Prediction | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26 |
Covid Variant Prediction | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26 |
Epigenetic Marks Prediction | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26 |
Recent papers with results on this dataset: