PodcastFillers

Name: PodcastFillers
Published: 2022-03-28
License: Creative Commons Non-Commercial (Any)

Dataset Information

Modalities

Speech

Languages

English

Introduced

2022

License

Creative Commons Non-Commercial (Any)

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The PodcastFillers dataset consists of 199 full-length English podcast episodes with manually annotated filler words and automatically generated transcripts. The podcast audio, sourced from SoundCloud, is CC-licensed and gender-balanced, totaling 145 hours from over 350 speakers. The annotations are provided under a non-commercial license and comprise 85,803 manually annotated audio events: approximately 35,000 filler words (“uh” and “um”) and approximately 50,000 non-filler events such as breaths, music, laughter, repeated words, and noise. The annotated events are also provided as pre-processed 1-second audio clips. The transcripts were generated by a speech-to-text system. A detailed description is provided in Dataset.
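To make the idea of fixed-length event clips concrete, here is a minimal sketch of how a 1-second window could be placed around an annotated event. The function name `clip_window`, the choice to center the window on the event midpoint, and the edge handling are all assumptions for illustration; the overview does not state how the released clips were actually cut.

```python
def clip_window(event_start, event_end, clip_len=1.0, audio_len=None):
    """Return (start, end) in seconds of a fixed-length window around an event.

    Assumption: the window is centered on the event midpoint. This is a
    hypothetical convention, not the dataset's documented procedure.
    """
    center = (event_start + event_end) / 2.0
    start = center - clip_len / 2.0
    # Shift the window back inside the recording if it runs past either edge.
    if start < 0.0:
        start = 0.0
    if audio_len is not None and start + clip_len > audio_len:
        start = max(0.0, audio_len - clip_len)
    return start, start + clip_len


if __name__ == "__main__":
    # An "um" annotated from 10.25 s to 10.75 s in a hypothetical episode.
    print(clip_window(10.25, 10.75))
```

Keeping every clip the same length means a classifier can consume events in fixed-size batches, which is one plausible reason to distribute pre-cut clips alongside the full episodes.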

Variants: PodcastFillers

Associated Benchmarks

This dataset is used in 1 benchmark:

Sound Event Localization and Detection - Metrics: event-based F1 score
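As a rough illustration of what an event-based F1 score measures, the sketch below counts a predicted event as correct when it has the same label as an unmatched reference event and falls within a time tolerance of it. The function name `event_based_f1`, the 0.2-second tolerance, and the nearest-first matching rule are assumptions chosen for the example; the benchmark's exact matching criterion is not given on this page.

```python
def event_based_f1(reference, predicted, tolerance=0.2):
    """Compute (precision, recall, f1) over lists of (time_sec, label) events.

    Assumption: a prediction matches a reference event if the labels agree
    and the times differ by at most `tolerance` seconds. Each reference
    event can be matched at most once.
    """
    unmatched = list(reference)
    true_pos = 0
    for p_time, p_label in sorted(predicted):
        best = None
        for ref in unmatched:
            r_time, r_label = ref
            if r_label == p_label and abs(r_time - p_time) <= tolerance:
                # Prefer the closest eligible reference event.
                if best is None or abs(r_time - p_time) < abs(best[0] - p_time):
                    best = ref
        if best is not None:
            unmatched.remove(best)
            true_pos += 1
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up events for illustration only, not data from the dataset.
    ref = [(1.0, "uh"), (5.0, "um"), (9.0, "uh")]
    pred = [(1.1, "uh"), (5.05, "uh"), (9.0, "uh"), (12.0, "um")]
    print(event_based_f1(ref, pred))
```

Unlike frame-level accuracy, a score of this kind rewards getting each filler event right once, so a mislabeled "um" counts as both a false positive and a missed event.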

Recent Benchmark Submissions

Task	Model	Paper	Date
Sound Event Localization and Detection	AVC-FillerNet	Filler Word Detection and Classification: …	2022-03-28
Sound Event Localization and Detection	VC-FillerNet	Filler Word Detection and Classification: …	2022-03-28

Research Papers

Recent papers with results on this dataset:

Filler Word Detection and Classification: A Dataset and Benchmark (2022)

External Links:

PodcastFillers
