AudioSet

Dataset Information
Modalities: Videos, Audio
Languages: Chinese
Introduced: 2017
License:
Homepage:

Overview

AudioSet is an audio event dataset consisting of over 2M human-annotated 10-second video clips. The clips are collected from YouTube, so many are of poor quality and contain multiple sound sources. A hierarchical ontology of 632 event classes is used to annotate the data, which means that the same sound can carry labels at several levels of the hierarchy. For example, the sound of barking is annotated as Animal, Pets, and Dog. The videos are split into three sets: Evaluation, Balanced-Train, and Unbalanced-Train.

Source: Curriculum Audiovisual Learning
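
The hierarchical labeling described in the overview can be illustrated with a minimal sketch. It assumes a tiny hypothetical ontology in which every class has at most one parent; the class names, the PARENT mapping, and the helper functions with_ancestors and multi_hot are illustrative inventions, not the real AudioSet ontology or any official API. The sketch shows how a single leaf label such as Dog ends up activating its ancestor classes as well.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology: child -> parent (None marks a top-level class).
# These entries are illustrative only; they are not the real AudioSet ontology.
PARENT: Dict[str, Optional[str]] = {
    "Animal": None,
    "Pets": "Animal",
    "Dog": "Pets",
    "Cat": "Pets",
    "Music": None,
    "Guitar": "Music",
}

# Fixed, sorted class order used for the multi-hot vector.
CLASSES: List[str] = sorted(PARENT)


def with_ancestors(label: str) -> List[str]:
    """Return the label followed by all of its ancestors, leaf first."""
    chain: List[str] = []
    current: Optional[str] = label
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def multi_hot(labels: List[str]) -> List[int]:
    """Encode a clip's labels, plus their ancestors, as a 0/1 vector over CLASSES."""
    active = {name for label in labels for name in with_ancestors(label)}
    return [1 if name in active else 0 for name in CLASSES]


print(with_ancestors("Dog"))  # ['Dog', 'Pets', 'Animal']
print(multi_hot(["Dog"]))     # [1, 0, 1, 0, 0, 1]
```

Because each clip can activate several classes at once, models trained on data like this typically treat the task as multi-label classification rather than picking a single class per clip.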

Variants: AudioSet

Associated Benchmarks

This dataset is used in 5 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Audio Classification | SSLAM (Audio-Only, Single) | SSLAM: Enhancing Self-Supervised Models with … | 2025-06-13
Audio Classification | M2D2 | M2D2: Exploring General-purpose Audio-Language Representations … | 2025-03-28
Audio Classification | DASS-Small (Audio-only, single) | DASS: Distilled Audio State Space … | 2024-07-04
Audio Classification | DASS-Medium (Audio-only, single) | DASS: Distilled Audio State Space … | 2024-07-04
Audio Classification | M2D-CLAP/0.7 | M2D-CLAP: Masked Modeling Duo Meets … | 2024-06-04
Audio Classification | M2D-AS/0.7 | Masked Modeling Duo: Towards a … | 2024-04-09
Audio Classification | M2D/0.7 | Masked Modeling Duo: Towards a … | 2024-04-09
Audio Classification | EquiAV | EquiAV: Leveraging Equivariance for Audio-Visual … | 2024-03-14
Target Sound Extraction | CLAPSep | CLAPSep: Leveraging Contrastive Pre-trained Model … | 2024-02-27
Audio Classification | EAT | EAT: Self-Supervised Pre-Training with Efficient … | 2024-01-07
Audio Classification | OmniVec | OmniVec: Learning robust representations with … | 2023-11-07
Audio Classification | DyMN-L (Audio-Only, Single) | Dynamic Convolutional Neural Networks as … | 2023-10-24
Audio Tagging | DyMN-L (Audio-Only, Single) | Dynamic Convolutional Neural Networks as … | 2023-10-24
Audio Classification | ATST-C2F(Single) | Self-supervised Audio Teacher-Student Transformer for … | 2023-06-07
Audio Classification | ATST-Frame | Self-supervised Audio Teacher-Student Transformer for … | 2023-06-07
Audio Classification | BEATs (Audio-only, Single) | BEATs: Audio Pre-Training with Acoustic … | 2022-12-18
Audio Classification | BEATs (Audio-only, Ensemble) | BEATs: Audio Pre-Training with Acoustic … | 2022-12-18
Audio Classification | Audiovisual Masked Autoencoder (Audio-only, Single) | Audiovisual Masked Autoencoders | 2022-12-09
Audio Classification | Audiovisual Masked Autoencoder (Audiovisual, Single) | Audiovisual Masked Autoencoders | 2022-12-09
Audio Classification | mn40_as (Ensemble) | Efficient Large-scale Audio Tagging via … | 2022-11-09

Research Papers

Recent papers with results on this dataset: