GLUE

General Language Understanding Evaluation benchmark

Dataset Information

Modalities: Texts
Languages: English
Introduced: 2019
Homepage: https://gluebenchmark.com/

Overview

The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine natural language understanding tasks: the single-sentence tasks CoLA and SST-2, the similarity and paraphrasing tasks MRPC, STS-B, and QQP, and the natural language inference tasks MNLI, QNLI, RTE, and WNLI.
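
Each task is distributed as a separate configuration of the benchmark. Below is a minimal loading sketch, assuming the Hugging Face datasets library (configuration names such as "cola" and "sst2" are the ones that library uses; the official data can also be downloaded from the GLUE website):

    # Hedged sketch: loading GLUE tasks with the Hugging Face `datasets` library.
    from datasets import load_dataset

    # Single-sentence tasks
    cola = load_dataset("glue", "cola")   # CoLA: grammatical acceptability
    sst2 = load_dataset("glue", "sst2")   # SST-2: sentiment classification

    # Similarity and paraphrasing tasks: "mrpc", "stsb", "qqp"
    mrpc = load_dataset("glue", "mrpc")

    # Natural language inference tasks: "mnli", "qnli", "rte", "wnli"
    rte = load_dataset("glue", "rte")

    print(sst2)               # DatasetDict with train/validation/test splits
    print(sst2["train"][0])   # {'sentence': ..., 'label': ..., 'idx': ...}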

Source: Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

Variants: WNLI, RTE, QNLI, MNLI-mm, MNLI-m, qqp, STS-B, MRPC, SST-2, CoLA, FinanceInc/auditor_sentiment, CHANGE-IT, datasetX, GLUE QNLI Dev, GLUE SST2 Dev, GLUE STSB, GLUE SST2, GLUE RTE, GLUE QQP, GLUE QNLI, GLUE MNLI, GLUE COLA, GLUE WNLI, GLUE MRPC, GLUE

Associated Benchmarks

This dataset is used in 3 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Natural Language Understanding | MT-DNN-SMART | SMART: Robust and Efficient Fine-Tuning … | 2019-11-08
Natural Language Understanding | BERT-LARGE | BERT: Pre-training of Deep Bidirectional … | 2018-10-11
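
Submissions like these are scored per task and the task scores are averaged into an overall GLUE score. A minimal sketch of computing one task's metric locally, assuming the Hugging Face evaluate library (official scoring is done by submitting test-set predictions at https://gluebenchmark.com/):

    # Hedged sketch: computing a GLUE task metric with the Hugging Face `evaluate` library.
    import evaluate

    metric = evaluate.load("glue", "mrpc")   # MRPC is scored with accuracy and F1
    result = metric.compute(predictions=[1, 0, 1, 1], references=[1, 0, 0, 1])
    print(result)   # {'accuracy': 0.75, 'f1': 0.8}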

Research Papers

Recent papers with results on this dataset: