General Language Understanding Evaluation benchmark
The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine natural language understanding tasks: the single-sentence tasks CoLA and SST-2, the similarity and paraphrasing tasks MRPC, STS-B, and QQP, and the natural language inference tasks MNLI, QNLI, RTE, and WNLI.
Source: Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
Image Source: https://gluebenchmark.com/
Variants: WNLI, RTE, QNLI, MNLI-mm, MNLI-m, qqp, STS-B, MRPC, SST-2, CoLA, FinanceInc/auditor_sentiment, CHANGE-IT, datasetX, GLUE QNLI Dev, GLUE SST2 Dev, GLUE STSB, GLUE SST2, GLUE RTE, GLUE QQP, GLUE QNLI, GLUE MNLI, GLUE COLA, GLUE WNLI, GLUE MRPC, GLUE
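As a minimal sketch of how individual GLUE variants are typically accessed (assuming the Hugging Face `datasets` and `evaluate` packages, which provide a `glue` loader; these are not part of the benchmark itself):

```python
# Minimal sketch: loading individual GLUE tasks with the Hugging Face
# `datasets` library (assumption: installed via `pip install datasets evaluate`).
from datasets import load_dataset
import evaluate

# Single-sentence task: binary sentiment classification (SST-2).
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])  # e.g. {'sentence': ..., 'label': 0 or 1, 'idx': ...}

# Natural language inference task: MNLI, which has matched and mismatched
# validation splits (corresponding to the MNLI-m / MNLI-mm variants above).
mnli = load_dataset("glue", "mnli")
print(mnli["validation_matched"][0])

# Each task's official metric can be loaded the same way, e.g. the
# Pearson/Spearman correlations used for STS-B.
stsb_metric = evaluate.load("glue", "stsb")
print(stsb_metric.compute(predictions=[0.4, 1.2], references=[0.5, 1.0]))
```

Other variants listed above (e.g. CoLA, QQP, RTE) can be loaded by substituting their lowercase configuration names in the same calls.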
This dataset is used in 3 benchmarks:
| Task | Model | Paper | Date |
|---|---|---|---|
| Natural Language Understanding | MT-DNN-SMART | SMART: Robust and Efficient Fine-Tuning … | 2019-11-08 |
| Natural Language Understanding | BERT-LARGE | BERT: Pre-training of Deep Bidirectional … | 2018-10-11 |
Recent papers with results on this dataset: