MultiRC

Question Answering Benchmark

Showing 30 results | Metric: F1

Top Performing Models

Rank	Model	Paper	F1	Date	Code
1	PaLM 540B (finetuned)	PaLM: Scaling Language Modeling with Pathways	90.10	2022-04-05	lucidrains/CoCa-pytorch, lucidrains/PaLM-pytorch, google/paxml
2	ST-MoE-32B 269B (fine-tuned)	ST-MoE: Designing Stable and Transferable Sparse Expert Models	89.60	2022-02-17	tensorflow/mesh, xuefuzhao/openmoe, yikangshen/megablocks
3	Turing NLR v5 XXL 5.4B (fine-tuned)	Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE	88.40	2022-12-04	-
4	DeBERTa-1.5B	DeBERTa: Decoding-enhanced BERT with Disentangled Attention	88.20	2020-06-05	huggingface/transformers, microsoft/DeBERTa, osu-nlp-group/mind2web
5	Vega v2 6B (fine-tuned)	Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE	88.20	2022-12-04	-
6	PaLM 2-L (one-shot)	PaLM 2 Technical Report	88.20	2023-05-17	eternityyw/tram-benchmark
7	T5-XXL 11B (fine-tuned)	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	88.10	2019-10-23	huggingface/transformers, PaddlePaddle/PaddleNLP, google-research/text-to-text-transfer-transformer
8	ST-MoE-L 4.1B (fine-tuned)	ST-MoE: Designing Stable and Transferable Sparse Expert Models	86.00	2022-02-17	tensorflow/mesh, xuefuzhao/openmoe, yikangshen/megablocks
9	PaLM 2-M (one-shot)	PaLM 2 Technical Report	84.10	2023-05-17	eternityyw/tram-benchmark
10	PaLM 2-S (one-shot)	PaLM 2 Technical Report	84.00	2023-05-17	eternityyw/tram-benchmark

All Papers (30)

PaLM: Scaling Language Modeling with Pathways

2022

PaLM 540B (finetuned)

lucidrains/CoCa-pytorch lucidrains/PaLM-pytorch

ST-MoE: Designing Stable and Transferable Sparse Expert Models

2022

ST-MoE-32B 269B (fine-tuned)

tensorflow/mesh xuefuzhao/openmoe yikangshen/megablocks

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

2022

Turing NLR v5 XXL 5.4B (fine-tuned)

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

2020

DeBERTa-1.5B

huggingface/transformers microsoft/DeBERTa

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

2022

Vega v2 6B (fine-tuned)

PaLM 2 Technical Report

2023

PaLM 2-L (one-shot)

eternityyw/tram-benchmark

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

2019

T5-XXL 11B (fine-tuned)

huggingface/transformers PaddlePaddle/PaddleNLP

ST-MoE: Designing Stable and Transferable Sparse Expert Models

2022

ST-MoE-L 4.1B (fine-tuned)

tensorflow/mesh xuefuzhao/openmoe yikangshen/megablocks

PaLM 2 Technical Report

2023

PaLM 2-M (one-shot)

eternityyw/tram-benchmark

PaLM 2 Technical Report

2023

PaLM 2-S (one-shot)

eternityyw/tram-benchmark

Finetuned Language Models Are Zero-Shot Learners

2021

FLAN 137B (prompt-tuned)

hiyouga/llama-efficient-tuning bigcode-project/starcoder

Finetuned Language Models Are Zero-Shot Learners

2021

FLAN 137B (zero-shot)

hiyouga/llama-efficient-tuning bigcode-project/starcoder

Language Models are Few-Shot Learners

2020

GPT-3 175B (Few-Shot)

ggml-org/llama.cpp ggerganov/llama.cpp

Finetuned Language Models Are Zero-Shot Learners

2021

FLAN 137B (1-shot)

hiyouga/llama-efficient-tuning bigcode-project/starcoder

KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs

2021

KELM (finetuning BERT-large based single model)

nlp-anonymous-happy/anonymous-kg-guided-nlp

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

2018

BERT-large(single model)

huggingface/transformers tensorflow/models

Ask Me Anything: A simple strategy for prompting language models

2022

Neo-6B (QA + WS)

hazyresearch/ama_prompting simran-arora/privacy_fm simran-arora/focus

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

2019

T5-11B

huggingface/transformers PaddlePaddle/PaddleNLP

BloombergGPT: A Large Language Model for Finance

2023

Bloomberg GPT 50B (1-shot)

yangletliu/finlora open-finance-lab/finlora

N-Grammer: Augmenting Transformers with latent n-grams

2022

N-Grammer 343M

tensorflow/lingvo yiyixuxu/n-grammer-flax

Ask Me Anything: A simple strategy for prompting language models

2022

Neo-6B (few-shot)

hazyresearch/ama_prompting simran-arora/privacy_fm simran-arora/focus

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

2022

Hybrid H3 355M (3-shot, logit scoring)

hazyresearch/safari hazyresearch/h3 lindermanlab/S5

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

2022

AlexaTM 20B

amazon-science/alexa-teacher-models

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

2022

Hybrid H3 355M (0-shot, logit scoring)

hazyresearch/safari hazyresearch/h3 lindermanlab/S5

Ask Me Anything: A simple strategy for prompting language models

2022

Neo-6B (QA)

hazyresearch/ama_prompting simran-arora/privacy_fm simran-arora/focus

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

2022

Hybrid H3 125M (0-shot, logit scoring)

hazyresearch/safari hazyresearch/h3 lindermanlab/S5

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

2022

Hybrid H3 125M (3-shot, logit scoring)

hazyresearch/safari hazyresearch/h3 lindermanlab/S5

BloombergGPT: A Large Language Model for Finance

2023

BLOOM 176B (1-shot)

yangletliu/finlora open-finance-lab/finlora

BloombergGPT: A Large Language Model for Finance

2023

GPT-NeoX 20B (1-shot)

yangletliu/finlora open-finance-lab/finlora

BloombergGPT: A Large Language Model for Finance

2023

OPT 66B (1-shot)

yangletliu/finlora open-finance-lab/finlora

Model	Paper	F1	Date
PaLM 540B (finetuned)	PaLM: Scaling Language Modeling with Pathways	90.10	2022-04-05
ST-MoE-32B 269B (fine-tuned)	ST-MoE: Designing Stable and Transferable Sparse …	89.60	2022-02-17
Turing NLR v5 XXL 5.4B (fine-tuned)	Toward Efficient Language Model Pretraining and D…	88.40	2022-12-04
DeBERTa-1.5B	DeBERTa: Decoding-enhanced BERT with Disentangled…	88.20	2020-06-05
Vega v2 6B (fine-tuned)	Toward Efficient Language Model Pretraining and D…	88.20	2022-12-04
PaLM 2-L (one-shot)	PaLM 2 Technical Report	88.20	2023-05-17
T5-XXL 11B (fine-tuned)	Exploring the Limits of Transfer Learning with a …	88.10	2019-10-23
ST-MoE-L 4.1B (fine-tuned)	ST-MoE: Designing Stable and Transferable Sparse …	86.00	2022-02-17
PaLM 2-M (one-shot)	PaLM 2 Technical Report	84.10	2023-05-17
PaLM 2-S (one-shot)	PaLM 2 Technical Report	84.00	2023-05-17
FLAN 137B (prompt-tuned)	Finetuned Language Models Are Zero-Shot Learners	83.40	2021-09-03
FLAN 137B (zero-shot)	Finetuned Language Models Are Zero-Shot Learners	77.50	2021-09-03
GPT-3 175B (Few-Shot)	Language Models are Few-Shot Learners	75.40	2020-05-28
FLAN 137B (1-shot)	Finetuned Language Models Are Zero-Shot Learners	72.10	2021-09-03
KELM (finetuning BERT-large based single model)	KELM: Knowledge Enhanced Pre-Trained Language Rep…	70.80	2021-09-09
BERT-large(single model)	BERT: Pre-training of Deep Bidirectional Transfor…	70.00	2018-10-11
Neo-6B (QA + WS)	Ask Me Anything: A simple strategy for prompting …	63.80	2022-10-05
T5-11B	Exploring the Limits of Transfer Learning with a …	63.30	2019-10-23
Bloomberg GPT 50B (1-shot)	BloombergGPT: A Large Language Model for Finance	62.30	2023-03-30
N-Grammer 343M	N-Grammer: Augmenting Transformers with latent n-…	62.00	2022-07-13
Neo-6B (few-shot)	Ask Me Anything: A simple strategy for prompting …	60.80	2022-10-05
Hybrid H3 355M (3-shot, logit scoring)	Hungry Hungry Hippos: Towards Language Modeling w…	59.70	2022-12-28
AlexaTM 20B	AlexaTM 20B: Few-Shot Learning Using a Large-Scal…	59.60	2022-08-02
Hybrid H3 355M (0-shot, logit scoring)	Hungry Hungry Hippos: Towards Language Modeling w…	59.50	2022-12-28
Neo-6B (QA)	Ask Me Anything: A simple strategy for prompting …	58.80	2022-10-05
Hybrid H3 125M (0-shot, logit scoring)	Hungry Hungry Hippos: Towards Language Modeling w…	51.40	2022-12-28
Hybrid H3 125M (3-shot, logit scoring)	Hungry Hungry Hippos: Towards Language Modeling w…	48.90	2022-12-28
BLOOM 176B (1-shot)	BloombergGPT: A Large Language Model for Finance	26.70	2023-03-30
GPT-NeoX 20B (1-shot)	BloombergGPT: A Large Language Model for Finance	22.90	2023-03-30
OPT 66B (1-shot)	BloombergGPT: A Large Language Model for Finance	18.80	2023-03-30
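
The table above is tab-separated, so it can be loaded and re-ranked with a few lines of Python. The following is a minimal sketch, not part of the wiki's own tooling: the helper names `load_rows`, `rank_rows`, and `best_so_far` are hypothetical, the sample contains only four rows copied from the table, and breaking F1 ties by earlier date is an assumption that happens to match the order of the three 88.20 entries.

```python
import csv
import io

# Four rows copied from the table above (tab-separated: Model, Paper, F1, Date).
# Truncated paper titles are kept exactly as they appear in the table.
SAMPLE = (
    "Model\tPaper\tF1\tDate\n"
    "PaLM 540B (finetuned)\tPaLM: Scaling Language Modeling with Pathways\t90.10\t2022-04-05\n"
    "DeBERTa-1.5B\tDeBERTa: Decoding-enhanced BERT with Disentangled…\t88.20\t2020-06-05\n"
    "PaLM 2-L (one-shot)\tPaLM 2 Technical Report\t88.20\t2023-05-17\n"
    "OPT 66B (1-shot)\tBloombergGPT: A Large Language Model for Finance\t18.80\t2023-03-30\n"
)


def load_rows(text):
    """Parse tab-separated leaderboard text into dicts with a float F1."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        row["F1"] = float(row["F1"])
        rows.append(row)
    return rows


def rank_rows(rows):
    """Sort by F1 descending; ties go to the earlier date (assumed rule).

    ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    """
    return sorted(rows, key=lambda r: (-r["F1"], r["Date"]))


def best_so_far(rows):
    """Scan rows in date order and keep those that set a new best F1."""
    best = float("-inf")
    frontier = []
    for row in sorted(rows, key=lambda r: r["Date"]):
        if row["F1"] > best:
            best = row["F1"]
            frontier.append(row)
    return frontier


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    for position, row in enumerate(rank_rows(rows), start=1):
        print(position, row["Model"], row["F1"], row["Date"])
    print([row["Model"] for row in best_so_far(rows)])
```

Pasting all 30 rows into `SAMPLE` gives the full ranking, and `best_so_far` gives a text-only view of how the best reported F1 changed over time.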