UniProtQA

Dataset Information
Languages: English
Introduced: 2023
License: MIT

Overview

UniProtQA consists of proteins paired with textual queries about their functions and properties. The dataset is constructed from UniProt and contains four types of questions, covering functions, official names, protein families, and subcellular locations. In total, it comprises 569,516 proteins and 1,891,506 question-answering samples.
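For readers planning to work with the data, the sketch below models a single question-answering sample and tallies samples by question category. The field names (`protein_id`, `sequence`, `question_type`, `question`, `answer`) and the `QuestionType` labels are hypothetical, chosen only for illustration; they are not the official UniProtQA schema. The helper `questions_per_protein` also works through the counts stated above: 1,891,506 samples over 569,516 proteins is roughly 3.32 questions per protein.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


# Hypothetical labels for the four question categories described above.
class QuestionType(Enum):
    FUNCTION = "function"
    OFFICIAL_NAME = "official_name"
    PROTEIN_FAMILY = "protein_family"
    SUBCELLULAR_LOCATION = "subcellular_location"


# Hypothetical record layout; NOT the dataset's published schema.
@dataclass(frozen=True)
class QASample:
    protein_id: str
    sequence: str
    question_type: QuestionType
    question: str
    answer: str


def count_by_type(samples):
    """Count how many samples fall into each question category."""
    return Counter(s.question_type for s in samples)


def questions_per_protein(num_samples: int, num_proteins: int) -> float:
    """Average number of QA samples per protein."""
    return num_samples / num_proteins


if __name__ == "__main__":
    # Counts reported in the overview.
    avg = questions_per_protein(1_891_506, 569_516)
    print(f"{avg:.2f}")  # prints 3.32
```

Using an enum keeps the set of question categories explicit, so a sample with an unexpected category fails when the record is built rather than later during evaluation.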

Variants: UniProtQA

Associated Benchmarks

This dataset is used in one benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Question Answering | BioMedGPT-10B | BioMedGPT: Open Multimodal Generative Pre-trained … | 2023-08-18
Question Answering | Llama2-7B-chat | Llama 2: Open Foundation and … | 2023-07-18

Research Papers

Recent papers with results on this dataset: