Spoken-SQuAD

Name: Spoken-SQuAD
License: Unknown

Dataset Information

Modalities

Speech

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

In Spoken SQuAD, the document is in spoken form, the input question is text, and the answer to each question is always a span in the document. The spoken documents were generated from the original SQuAD dataset as follows. First, the Google text-to-speech system was used to generate spoken versions of the SQuAD articles. Then CMU Sphinx was used to generate the corresponding ASR transcriptions. The SQuAD training set was used to generate the training set of Spoken SQuAD, and the SQuAD development set was used to generate the testing set of Spoken SQuAD. If the answer to a question did not exist in the ASR transcription of the associated article, the question-answer pair was removed from the dataset, because such examples are too difficult for listening-comprehension models at this stage.

Source: https://github.com/chiahsuan156/Spoken-SQuAD

Variants: Spoken-SQuAD
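
The filtering step described in the overview (dropping question-answer pairs whose answer no longer appears in the ASR transcription) can be sketched roughly as follows. This is only an illustration: the field names `question`, `answer`, and `asr_transcript`, the helper names, and the normalization rule are all assumptions, and the matching rule the dataset authors actually used is not stated on this page.

```python
import re


def normalize(text):
    """Lowercase and replace punctuation with spaces (assumed normalization)."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def keep_answerable(examples):
    """Keep only examples whose answer still occurs in the ASR transcript.

    Matching is done on whole words after normalization; this is an
    assumption, not the authors' documented procedure.
    """
    kept = []
    for ex in examples:
        transcript = normalize(ex["asr_transcript"])
        answer = normalize(ex["answer"])
        if answer and f" {answer} " in f" {transcript} ":
            kept.append(ex)
    return kept


# Hypothetical toy data: the second transcript contains a recognition error
# that destroys the answer span, so that pair is dropped.
examples = [
    {"question": "Where was the game played?", "answer": "Santa Clara",
     "asr_transcript": "the game was played in santa clara california"},
    {"question": "Who won?", "answer": "Denver Broncos",
     "asr_transcript": "the denver bronx defeated the carolina panthers"},
]
filtered = keep_answerable(examples)
print(len(filtered))
```

A stricter or looser matching rule (for example, partial overlap) would change how many pairs survive, so the counts from a sketch like this should not be expected to match the released dataset.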

Associated Benchmarks

This dataset is used in 1 benchmark:

Spoken Language Understanding - Metrics: F1 score
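
This page does not say how the F1 score is computed. Assuming it is the token-overlap F1 commonly used for SQuAD-style span answers, a minimal sketch looks like this. The function name `token_f1` is hypothetical, and the official SQuAD evaluation script additionally strips punctuation and articles, which this sketch omits.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted span and a reference span."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Prediction has one extra token: precision 2/3, recall 1.
score = token_f1("in santa clara", "santa clara")
print(round(score, 2))
```

A benchmark-level score would then typically be the average of this per-question value over the test set, usually taking the maximum over multiple reference answers when they exist.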

Recent Benchmark Submissions

Task	Model	Paper	Date
Spoken Language Understanding	ALBERT	End-to-end Spoken Conversational Question Answering: …	2022-04-29
Spoken Language Understanding	SpeechBERT	SpeechBERT: An Audio-and-text Jointly Learned …	2019-10-25
Spoken Language Understanding	QANet + GAN	Mitigating the Impact of Speech …	2019-04-16
Spoken Language Understanding	Baseline	Spoken SQuAD: A Study of …	2018-04-01

Research Papers

Recent papers with results on this dataset:
