| Model | Paper | Accuracy (%) | Date |
|---|---|---:|---|
| UnitedSynT5 (3B) | First Train to Generate, then Generate to Train: … | 92.60 | 2024-12-12 |
| T5 | SMART: Robust and Efficient Fine-Tuning for Pre-t… | 92.00 | 2019-11-08 |
| T5-XXL 11B (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 92.00 | 2019-10-23 |
| T5-11B | Exploring the Limits of Transfer Learning with a … | 91.70 | 2019-10-23 |
| T5-3B | Exploring the Limits of Transfer Learning with a … | 91.40 | 2019-10-23 |
| ALBERT | ALBERT: A Lite BERT for Self-supervised Learning … | 91.30 | 2019-09-26 |
| DeBERTa (large) | DeBERTa: Decoding-enhanced BERT with Disentangled… | 91.10 | 2020-06-05 |
| Adv-RoBERTa ensemble | StructBERT: Incorporating Language Structures int… | 91.10 | 2019-08-13 |
| SMARTRoBERTa | SMART: Robust and Efficient Fine-Tuning for Pre-t… | 91.10 | 2019-11-08 |
| RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | 90.80 | 2019-07-26 |
| XLNet (single model) | XLNet: Generalized Autoregressive Pretraining for… | 90.80 | 2019-06-19 |
| RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | LLM.int8(): 8-bit Matrix Multiplication for Trans… | 90.20 | 2022-08-15 |
| RoBERTa (ensemble) | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | 90.20 | 2019-07-26 |
| T5-Large | Exploring the Limits of Transfer Learning with a … | 89.90 | 2019-10-23 |
| PSQ (Chen et al., 2020) | A Statistical Framework for Low-bitwidth Training… | 89.90 | 2020-10-27 |
| UnitedSynT5 (335M) | First Train to Generate, then Generate to Train: … | 89.80 | 2024-12-12 |
| T5-Large 770M | Exploring the Limits of Transfer Learning with a … | 89.60 | 2019-10-23 |
| ERNIE 2.0 Large | ERNIE 2.0: A Continual Pre-training Framework for… | 88.70 | 2019-07-29 |
| SpanBERT | SpanBERT: Improving Pre-training by Representing … | 88.10 | 2019-07-24 |
| BERT-Large | FNet: Mixing Tokens with Fourier Transforms | 88.00 | 2021-05-09 |
| ASA + RoBERTa | Adversarial Self-Attention for Language Understan… | 88.00 | 2022-06-25 |
| MT-DNN-ensemble | Improving Multi-Task Deep Neural Networks via Kno… | 87.90 | 2019-04-20 |
| Q-BERT (Shen et al., 2020) | Q-BERT: Hessian Based Ultra Low Precision Quantiz… | 87.80 | 2019-09-12 |
| Snorkel MeTaL (ensemble) | Training Complex Models with Multi-Task Weak Supe… | 87.60 | 2018-10-05 |
| BigBird | Big Bird: Transformers for Longer Sequences | 87.50 | 2020-07-28 |
| T5-Base | Exploring the Limits of Transfer Learning with a … | 87.10 | 2019-10-23 |
| MT-DNN | Multi-Task Deep Neural Networks for Natural Langu… | 86.70 | 2019-01-31 |
| BERT-Large | BERT: Pre-training of Deep Bidirectional Transfor… | 86.70 | 2018-10-11 |
| RealFormer | RealFormer: Transformer Likes Residual Attention | 86.28 | 2020-12-21 |
| gMLP-large | Pay Attention to MLPs | 86.20 | 2021-05-17 |
| ERNIE 2.0 Base | ERNIE 2.0: A Continual Pre-training Framework for… | 86.10 | 2019-07-29 |
| MT-DNN-SMARTv0 | SMART: Robust and Efficient Fine-Tuning for Pre-t… | 85.70 | 2019-11-08 |
| MT-DNN-SMART | SMART: Robust and Efficient Fine-Tuning for Pre-t… | 85.70 | 2019-11-08 |
| Q8BERT (Zafrir et al., 2019) | Q8BERT: Quantized 8Bit BERT | 85.60 | 2019-10-14 |
| SMART+BERT-BASE | SMART: Robust and Efficient Fine-Tuning for Pre-t… | 85.60 | 2019-11-08 |
| SMART-BERT | SMART: Robust and Efficient Fine-Tuning for Pre-t… | 85.60 | 2019-11-08 |
| ASA + BERT-base | Adversarial Self-Attention for Language Understan… | 85.00 | 2022-06-25 |
| TinyBERT-6 67M | TinyBERT: Distilling BERT for Natural Language Un… | 84.60 | 2019-09-23 |
| ELC-BERT-base 98M (zero init) | Not all layers are equally as important: Every La… | 84.40 | 2023-11-03 |
| 24hBERT | How to Train BERT with an Academic Budget | 84.40 | 2021-04-15 |
| ERNIE | ERNIE: Enhanced Language Representation with Info… | 84.00 | 2019-05-17 |
| Charformer-Tall | Charformer: Fast Character Transformers via Gradi… | 83.70 | 2021-06-23 |
| LTG-BERT-base 98M | Not all layers are equally as important: Every La… | 83.00 | 2023-11-03 |
| TinyBERT-4 14.5M | TinyBERT: Distilling BERT for Natural Language Un… | 82.50 | 2019-09-23 |
| T5-Small | Exploring the Limits of Transfer Learning with a … | 82.40 | 2019-10-23 |
| SqueezeBERT | SqueezeBERT: What can computer vision teach NLP a… | 82.00 | 2020-06-19 |
| GPST (unsupervised generative syntactic LM) | Generative Pretrained Structured Transformers: Un… | 81.80 | 2024-03-13 |
| ELC-BERT-small 24M | Not all layers are equally as important: Every La… | 79.20 | 2023-11-03 |
| LTG-BERT-small 24M | Not all layers are equally as important: Every La… | 78.00 | 2023-11-03 |
| FNet-Large | FNet: Mixing Tokens with Fourier Transforms | 78.00 | 2021-05-09 |
| aESIM | Attention Boosted Sequential Inference Model | 73.90 | 2018-12-05 |
| T5-Large 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 72.40 | 2023-04-27 |
| Multi-task BiLSTM + Attn | GLUE: A Multi-Task Benchmark and Analysis Platfor… | 72.20 | 2018-04-20 |
| Stacked Bi-LSTMs (shortcut connections, max-pooling) | Combining Similarity Features and Deep Representa… | 71.40 | 2018-11-02 |
| GenSen | Learning General Purpose Distributed Sentence Rep… | 71.40 | 2018-03-30 |
| Bi-LSTM sentence encoder (max-pooling) | Combining Similarity Features and Deep Representa… | 70.70 | 2018-11-02 |
| Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | Combining Similarity Features and Deep Representa… | 70.70 | 2018-11-02 |
| LM-CPPF RoBERTa-base | LM-CPPF: Paraphrasing-Guided Data Augmentation fo… | 68.40 | 2023-05-29 |
| SWEM-max | Baseline Needs More Love: On Simple Word-Embeddin… | 68.20 | 2018-05-24 |
| LaMini-GPT 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 67.50 | 2023-04-27 |
| LaMini-F-T5 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 61.40 | 2023-04-27 |
| LaMini-T5 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 54.70 | 2023-04-27 |
| GPT-2-XL 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 36.50 | 2023-04-27 |