ReCoRD

Common Sense Reasoning Benchmark

Performance Over Time

Showing 33 results | Metric: EM (exact match)
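The EM scores reported on this page are percentages of examples whose predicted answer matches a reference answer exactly. The sketch below shows one common way such a score is computed, assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the official ReCoRD evaluation script may differ in detail, and the function names here are hypothetical.

```python
import re
import string

# Assumption: SQuAD-style normalization; not confirmed by this page.
_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, references):
    """Return 1.0 if the prediction equals any reference after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def corpus_em(predictions, references):
    """Average per-example exact match, reported as a percentage."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Each example may carry several acceptable reference answers, so a prediction is counted as correct if it matches any one of them.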

Top Performing Models

Rank	Model	Paper	EM	Date	Code
1	Turing NLR v5 XXL 5.4B (fine-tuned)	Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE	96.40	2022-12-04	-
2	ST-MoE-32B 269B (fine-tuned)	ST-MoE: Designing Stable and Transferable Sparse Expert Models	95.10	2022-02-17	tensorflow/mesh, xuefuzhao/openmoe, yikangshen/megablocks
3	PaLM 540B (finetuned)	PaLM: Scaling Language Modeling with Pathways	94.60	2022-04-05	lucidrains/CoCa-pytorch, lucidrains/PaLM-pytorch, google/paxml
4	DeBERTa-1.5B	DeBERTa: Decoding-enhanced BERT with Disentangled Attention	94.50	2020-06-05	huggingface/transformers, microsoft/DeBERTa, osu-nlp-group/mind2web
5	Vega v2 6B (fine-tuned)	Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE	94.40	2022-12-04	-
6	T5-11B	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	94.10	2019-10-23	huggingface/transformers, PaddlePaddle/PaddleNLP, google-research/text-to-text-transfer-transformer
7	PaLM 2-L (one-shot)	PaLM 2 Technical Report	93.80	2023-05-17	eternityyw/tram-benchmark
8	T5-XXL 11B (fine-tuned)	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	93.40	2019-10-23	huggingface/transformers, PaddlePaddle/PaddleNLP, google-research/text-to-text-transfer-transformer
9	PaLM 2-M (one-shot)	PaLM 2 Technical Report	92.40	2023-05-17	eternityyw/tram-benchmark
10	GESA 500M	Integrating a Heterogeneous Graph with Entity-aware Self-attention using Relative Position Labels for Reading Comprehension Model	92.20	2023-07-19	-

All Papers (33)

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

2022

Turing NLR v5 XXL 5.4B (fine-tuned)

ST-MoE: Designing Stable and Transferable Sparse Expert Models

2022

ST-MoE-32B 269B (fine-tuned)

tensorflow/mesh xuefuzhao/openmoe yikangshen/megablocks

PaLM: Scaling Language Modeling with Pathways

2022

PaLM 540B (finetuned)

lucidrains/CoCa-pytorch lucidrains/PaLM-pytorch

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

2020

DeBERTa-1.5B

huggingface/transformers microsoft/DeBERTa

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

2022

Vega v2 6B (fine-tuned)

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

2019

T5-11B

huggingface/transformers PaddlePaddle/PaddleNLP

PaLM 2 Technical Report

2023

PaLM 2-L (one-shot)

eternityyw/tram-benchmark

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

2019

T5-XXL 11B (fine-tuned)

huggingface/transformers PaddlePaddle/PaddleNLP

PaLM 2 Technical Report

2023

PaLM 2-M (one-shot)

eternityyw/tram-benchmark

Integrating a Heterogeneous Graph with Entity-aware Self-attention using Relative Position Labels for Reading Comprehension Model

2023

GESA 500M

PaLM 2 Technical Report

2023

PaLM 2-S (one-shot)

eternityyw/tram-benchmark

LUKE-Graph: A Transformer-based Approach with Gated Relational Graph Attention for Cloze-style Reading Comprehension

2023

LUKE-Graph

LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention

2020

LUKE 483M

huggingface/transformers PaddlePaddle/PaddleNLP

Large Language Models are Zero-Shot Reasoners

2022

GPT-3 175B (one-shot)

kojima-takeshi188/zero_shot_cot skytliang/multi-agents-debate

KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs

2021

KELM (finetuning RoBERTa-large based single model)

nlp-anonymous-happy/anonymous-kg-guided-nlp

ST-MoE: Designing Stable and Transferable Sparse Expert Models

2022

ST-MoE-L 4.1B (fine-tuned)

tensorflow/mesh xuefuzhao/openmoe yikangshen/megablocks

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

2022

AlexaTM 20B

amazon-science/alexa-teacher-models

Finetuned Language Models Are Zero-Shot Learners

2021

FLAN 137B (prompt-tuned)

hiyouga/llama-efficient-tuning bigcode-project/starcoder

BloombergGPT: A Large Language Model for Finance

2023

Bloomberg GPT 50B (1-shot)

yangletliu/finlora open-finance-lab/finlora

BloombergGPT: A Large Language Model for Finance

2023

OPT 66B (1-shot)

yangletliu/finlora open-finance-lab/finlora

Language Models are Few-Shot Learners

2020

GPT-3 Large 760M (0-shot)

ggml-org/llama.cpp ggerganov/llama.cpp

Efficient Language Modeling with Sparse all-MLP

2022

Switch Transformer 9B

BloombergGPT: A Large Language Model for Finance

2023

BLOOM 176B (1-shot)

yangletliu/finlora open-finance-lab/finlora

KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs

2021

KELM (finetuning BERT-large based single model)

nlp-anonymous-happy/anonymous-kg-guided-nlp

Efficient Language Modeling with Sparse all-MLP

2022

sMLP – deterministic 9.4B (0-shot)

Finetuned Language Models Are Zero-Shot Learners

2021

FLAN 137B (zero-shot)

hiyouga/llama-efficient-tuning bigcode-project/starcoder

Efficient Language Modeling with Sparse all-MLP

2022

Gshard 9B

BloombergGPT: A Large Language Model for Finance

2023

GPT-NeoX 20B (1-shot)

yangletliu/finlora open-finance-lab/finlora

Efficient Language Modeling with Sparse all-MLP

2022

HASH Layers 10B (0-shot)

Efficient Language Modeling with Sparse all-MLP

2022

Base Layers 10B (0-shot)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

2018

BERT-Base (single model)

huggingface/transformers tensorflow/models

ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension

2018

DocQA + ELMo

N-Grammer: Augmenting Transformers with latent n-grams

2022

N-Grammer 343M

tensorflow/lingvo yiyixuxu/n-grammer-flax

Model	Paper	EM	Date
Turing NLR v5 XXL 5.4B (fine-tuned)	Toward Efficient Language Model Pretraining and D…	96.40	2022-12-04
ST-MoE-32B 269B (fine-tuned)	ST-MoE: Designing Stable and Transferable Sparse …	95.10	2022-02-17
PaLM 540B (finetuned)	PaLM: Scaling Language Modeling with Pathways	94.60	2022-04-05
DeBERTa-1.5B	DeBERTa: Decoding-enhanced BERT with Disentangled…	94.50	2020-06-05
Vega v2 6B (fine-tuned)	Toward Efficient Language Model Pretraining and D…	94.40	2022-12-04
T5-11B	Exploring the Limits of Transfer Learning with a …	94.10	2019-10-23
PaLM 2-L (one-shot)	PaLM 2 Technical Report	93.80	2023-05-17
T5-XXL 11B (fine-tuned)	Exploring the Limits of Transfer Learning with a …	93.40	2019-10-23
PaLM 2-M (one-shot)	PaLM 2 Technical Report	92.40	2023-05-17
GESA 500M	Integrating a Heterogeneous Graph with Entity-awa…	92.20	2023-07-19
PaLM 2-S (one-shot)	PaLM 2 Technical Report	92.10	2023-05-17
LUKE-Graph	LUKE-Graph: A Transformer-based Approach with Gat…	91.50	2023-03-12
LUKE 483M	LUKE: Deep Contextualized Entity Representations …	91.20	2020-10-02
GPT-3 175B (one-shot)	Large Language Models are Zero-Shot Reasoners	90.20	2022-05-24
KELM (finetuning RoBERTa-large based single model)	KELM: Knowledge Enhanced Pre-Trained Language Rep…	89.60	2021-09-09
ST-MoE-L 4.1B (fine-tuned)	ST-MoE: Designing Stable and Transferable Sparse …	88.90	2022-02-17
AlexaTM 20B	AlexaTM 20B: Few-Shot Learning Using a Large-Scal…	88.40	2022-08-02
FLAN 137B (prompt-tuned)	Finetuned Language Models Are Zero-Shot Learners	85.10	2021-09-03
Bloomberg GPT 50B (1-shot)	BloombergGPT: A Large Language Model for Finance	82.80	2023-03-30
OPT 66B (1-shot)	BloombergGPT: A Large Language Model for Finance	82.50	2023-03-30
GPT-3 Large 760M (0-shot)	Language Models are Few-Shot Learners	82.10	2020-05-28
Switch Transformer 9B	Efficient Language Modeling with Sparse all-MLP	79.90	2022-03-14
BLOOM 176B (1-shot)	BloombergGPT: A Large Language Model for Finance	78.00	2023-03-30
KELM (finetuning BERT-large based single model)	KELM: Knowledge Enhanced Pre-Trained Language Rep…	76.70	2021-09-09
sMLP – deterministic 9.4B (0-shot)	Efficient Language Modeling with Sparse all-MLP	73.40	2022-03-14
FLAN 137B (zero-shot)	Finetuned Language Models Are Zero-Shot Learners	72.50	2021-09-03
Gshard 9B	Efficient Language Modeling with Sparse all-MLP	72.40	2022-03-14
GPT-NeoX 20B (1-shot)	BloombergGPT: A Large Language Model for Finance	67.90	2023-03-30
HASH Layers 10B (0-shot)	Efficient Language Modeling with Sparse all-MLP	67.20	2022-03-14
Base Layers 10B (0-shot)	Efficient Language Modeling with Sparse all-MLP	60.70	2022-03-14
BERT-Base (single model)	BERT: Pre-training of Deep Bidirectional Transfor…	56.07	2018-10-11
DocQA + ELMo	ReCoRD: Bridging the Gap between Human and Machin…	46.70	2018-10-30
N-Grammer 343M	N-Grammer: Augmenting Transformers with latent n-…	29.90	2022-07-13