BoolQ

Question Answering Benchmark

Performance Over Time

Showing 65 results | Metric: Accuracy
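For reference, accuracy on a yes/no benchmark such as BoolQ is the share of questions whose predicted boolean answer matches the gold label. The scores on this page appear to be reported on a 0-100 scale. A minimal sketch, assuming predictions and labels are plain Python booleans; the function name `boolq_accuracy` and the toy data are hypothetical and are not taken from any listed paper's evaluation code:

```python
from typing import Sequence


def boolq_accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Return accuracy as a percentage in the range 0-100."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the positions where the predicted answer equals the gold answer.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data: 3 of the 4 predictions match the gold labels.
preds = [True, False, True, True]
gold = [True, False, False, True]
score = boolq_accuracy(preds, gold)
print(f"{score:.2f}")  # prints 75.00
```

Individual papers may differ in evaluation split (dev vs. test) and in how free-form model output is mapped to yes/no, so scores from different setups are not always directly comparable.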

Top Performing Models

| Rank | Model | Paper | Accuracy | Date | Code |
|---|---|---|---|---|---|
| 1 | Mistral-Nemo 12B (HPT) | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | 99.87 | 2024-06-18 | devichand579/HPT |
| 2 | ST-MoE-32B 269B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse Expert Models | 92.40 | 2022-02-17 | tensorflow/mesh, xuefuzhao/openmoe, yikangshen/megablocks |
| 3 | PaLM 540B (fine-tuned) | PaLM: Scaling Language Modeling with Pathways | 92.20 | 2022-04-05 | lucidrains/CoCa-pytorch, lucidrains/PaLM-pytorch, google/paxml |
| 4 | Turing NLR v5 XXL 5.4B (fine-tuned) | Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | 92.00 | 2022-12-04 | - |
| 5 | T5-XXL 11B (fine-tuned) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 91.20 | 2019-10-23 | huggingface/transformers, PaddlePaddle/PaddleNLP, google-research/text-to-text-transfer-transformer |
| 6 | PaLM 2-L (1-shot) | PaLM 2 Technical Report | 90.90 | 2023-05-17 | eternityyw/tram-benchmark |
| 7 | UL2 20B (fine-tuned) | UL2: Unifying Language Learning Paradigms | 90.80 | 2022-05-10 | google-research/google-research, opennlg/openba-v2 |
| 8 | Vega v2 6B (fine-tuned) | Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | 90.50 | 2022-12-04 | - |
| 9 | DeBERTa-1.5B | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 90.40 | 2020-06-05 | huggingface/transformers, microsoft/DeBERTa, osu-nlp-group/mind2web |
| 10 | PaLM 2-M (1-shot) | PaLM 2 Technical Report | 88.60 | 2023-05-17 | eternityyw/tram-benchmark |

All Papers (65)

Language Models are Few-Shot Learners

2020
GPT-3 175B (few-shot, k=32)