We have created three new Reading Comprehension datasets, each constructed using an adversarial model-in-the-loop annotation process.
We use three different models in the annotation loop: BiDAF (Seo et al., 2016), BERT-Large (Devlin et al., 2018), and RoBERTa-Large (Liu et al., 2019), yielding three datasets: D(BiDAF), D(BERT), and D(RoBERTa), each with 10,000 training, 1,000 validation, and 1,000 test examples.
The adversarial human annotation paradigm ensures that these datasets consist of questions that current state-of-the-art models (at least the ones used as adversaries in the annotation loop) find challenging. The three AdversarialQA round 1 datasets thus provide a training and evaluation resource for developing reading comprehension models that are robust to such challenging questions.
Variants: AdversarialQA, adversarial_qa
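For convenience, the data can be loaded through the Hugging Face `datasets` library under the `adversarial_qa` identifier. The sketch below assumes per-model config names (`dbidaf`, `dbert`, `droberta`) and SQuAD-style example fields; check the dataset card if these differ.

```python
# Minimal sketch: loading one of the AdversarialQA round 1 datasets.
# Assumes the Hugging Face hub identifier "adversarial_qa" and the
# config name "dbidaf" for D(BiDAF); "dbert" and "droberta" are the
# assumed names for D(BERT) and D(RoBERTa).
from datasets import load_dataset

dbidaf = load_dataset("adversarial_qa", "dbidaf")

# Expect train/validation/test splits of 10,000 / 1,000 / 1,000 examples.
print(dbidaf)

# Each example is SQuAD-style: a context passage, a question, and answers
# given as text spans with character-level start offsets.
example = dbidaf["train"][0]
print(example["question"])
print(example["answers"])
```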
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Reading Comprehension | RoBERTa-Large | Beat the AI: Investigating Adversarial … | 2020-02-02 |
| Reading Comprehension | BERT-Large | Beat the AI: Investigating Adversarial … | 2020-02-02 |
| Reading Comprehension | BiDAF | Beat the AI: Investigating Adversarial … | 2020-02-02 |