GLUE

General Language Understanding Evaluation benchmark

Dataset Information

Modalities: Texts
Languages: English
Introduced: 2019
Homepage: https://gluebenchmark.com/

Overview

The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine natural language understanding tasks: the single-sentence tasks CoLA and SST-2, the similarity and paraphrasing tasks MRPC, STS-B, and QQP, and the natural language inference tasks MNLI, QNLI, RTE, and WNLI.
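
Each task is distributed as a separate configuration of the benchmark. Below is a minimal loading sketch, assuming the Hugging Face datasets library (configuration names such as "cola" and "sst2" are the ones that library uses; the official data can also be downloaded from the GLUE website):

    # Hedged sketch: loading GLUE tasks with the Hugging Face `datasets` library.
    from datasets import load_dataset

    # Single-sentence tasks
    cola = load_dataset("glue", "cola")   # CoLA: grammatical acceptability
    sst2 = load_dataset("glue", "sst2")   # SST-2: sentiment classification

    # Similarity and paraphrasing tasks: "mrpc", "stsb", "qqp"
    mrpc = load_dataset("glue", "mrpc")

    # Natural language inference tasks: "mnli", "qnli", "rte", "wnli"
    rte = load_dataset("glue", "rte")

    print(sst2)               # DatasetDict with train/validation/test splits
    print(sst2["train"][0])   # {'sentence': ..., 'label': ..., 'idx': ...}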

Source: Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

Variants: WNLI, RTE, QNLI, MNLI-mm, MNLI-m, qqp, STS-B, MRPC, SST-2, CoLA, FinanceInc/auditor_sentiment, CHANGE-IT, datasetX, GLUE QNLI Dev, GLUE SST2 Dev, GLUE STSB, GLUE SST2, GLUE RTE, GLUE QQP, GLUE QNLI, GLUE MNLI, GLUE COLA, GLUE WNLI, GLUE MRPC, GLUE

Associated Benchmarks

This dataset is used in 3 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Natural Language Understanding | MT-DNN-SMART | SMART: Robust and Efficient Fine-Tuning … | 2019-11-08
Natural Language Understanding | BERT-LARGE | BERT: Pre-training of Deep Bidirectional … | 2018-10-11
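
Submissions like these are scored per task and the task scores are averaged into an overall GLUE score. A minimal sketch of computing one task's metric locally, assuming the Hugging Face evaluate library (official scoring is done by submitting test-set predictions at https://gluebenchmark.com/):

    # Hedged sketch: computing a GLUE task metric with the Hugging Face `evaluate` library.
    import evaluate

    metric = evaluate.load("glue", "mrpc")   # MRPC is scored with accuracy and F1
    result = metric.compute(predictions=[1, 0, 1, 1], references=[1, 0, 0, 1])
    print(result)   # {'accuracy': 0.75, 'f1': 0.8}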

Research Papers

Recent papers with results on this dataset: