UniProtQA consists of proteins paired with textual queries about their functions and properties. The dataset is constructed from UniProt and contains four types of questions, covering functions, official names, protein families, and subcellular locations. We collect a total of 569,516 proteins and 1,891,506 question-answering samples.
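As a rough illustration, the four question categories and the reported totals can be modeled as below. The record fields (`protein_id`, `sequence`, and so on) and the `QuestionType` member names are hypothetical assumptions for this sketch, not the dataset's released schema. Only the two counts come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical category names for the four question types described above.
class QuestionType(Enum):
    FUNCTION = "function"
    OFFICIAL_NAME = "official_name"
    PROTEIN_FAMILY = "protein_family"
    SUBCELLULAR_LOCATION = "subcellular_location"


# Hypothetical record layout for one question-answering sample (assumed fields).
@dataclass(frozen=True)
class UniProtQASample:
    protein_id: str          # assumed: a UniProt accession
    sequence: str            # assumed: the protein's amino-acid sequence
    question_type: QuestionType
    question: str
    answer: str


# Totals reported for UniProtQA.
NUM_PROTEINS = 569_516
NUM_QA_SAMPLES = 1_891_506


def mean_samples_per_protein(num_samples: int, num_proteins: int) -> float:
    """Average number of QA samples per protein."""
    return num_samples / num_proteins


def samples_if_one_per_type(num_proteins: int) -> int:
    """Sample count if every protein had exactly one question of each type."""
    return num_proteins * len(QuestionType)


if __name__ == "__main__":
    print(round(mean_samples_per_protein(NUM_QA_SAMPLES, NUM_PROTEINS), 2))  # 3.32
    print(samples_if_one_per_type(NUM_PROTEINS))  # 2278064
```

The average of roughly 3.32 samples per protein is below four. Assuming each protein contributes at most one question per type (an assumption, not something the description states), this would mean not every protein is covered by all four question types.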
Variants: UniProtQA
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Question Answering | BioMedGPT-10B | BioMedGPT: Open Multimodal Generative Pre-trained … | 2023-08-18 |
Question Answering | Llama2-7B-chat | Llama 2: Open Foundation and … | 2023-07-18 |
Recent papers with results on this dataset: