GUE

Genome Understanding Evaluation

Dataset Information
Modalities
Texts, Medical
Introduced
2023
License
Unknown

Overview

A collection of 28 datasets across 7 tasks, constructed for evaluating genome language models. The seven tasks are promoter prediction, core promoter prediction, splice site prediction, COVID variant classification, epigenetic marks prediction, and transcription factor binding site prediction on human and on mouse (counted as two separate tasks).
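As a rough illustration of how a single dataset from this collection might be read for evaluation, the sketch below assumes each split is stored as a CSV file with a header row and two columns, `sequence` and `label`. That layout, the file name `train.csv`, and the helpers `load_split` and `label_counts` are assumptions for illustration only, not a documented GUE interface.

```python
import csv
from collections import Counter
from pathlib import Path


def load_split(path):
    """Read a hypothetical GUE-style CSV split into (sequence, label) pairs.

    Assumes a header row with `sequence` and `label` columns and an integer
    class label on every row; adjust to the actual file layout.
    """
    pairs = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Normalize the DNA string to upper case and parse the class label.
            pairs.append((row["sequence"].strip().upper(), int(row["label"])))
    return pairs


def label_counts(pairs):
    """Count examples per class label, e.g. to check class balance."""
    return Counter(label for _, label in pairs)


if __name__ == "__main__":
    # Hypothetical path; point this at a real split from one GUE task.
    demo = Path("train.csv")
    if demo.exists():
        data = load_split(demo)
        print(len(data), label_counts(data))
```

Checking the label distribution first is a cheap way to decide whether plain accuracy is a reasonable metric for a given task or whether a class-balance-aware metric is needed.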

Variants: GUE

Associated Benchmarks

This dataset is used in 5 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Promoter Detection | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26
Core Promoter Detection | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26
Splice Site Prediction | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26
Covid Variant Prediction | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26
Epigenetic Marks Prediction | DNABERT-2-117M | DNABERT-2: Efficient Foundation Model and … | 2023-06-26
