EPIC-SOUNDS

Dataset Information
Modalities: Audio
Introduced: 2023

Overview

EPIC-SOUNDS is a large-scale dataset of audio annotations that capture the temporal extents and class labels of sounds within the audio stream of the egocentric videos from EPIC-KITCHENS-100. It includes 78.4k categorised segments of audible events and actions, distributed across 44 classes, as well as 39.2k non-categorised segments.

Source: Epic-Sounds: A Large-scale Dataset of Actions That Sound
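
To make the annotation structure described above concrete, here is a minimal sketch of a segment as a temporal extent plus an optional class label, with a per-class count over the categorised segments. The `AudioSegment` type, its field names, and the sample values are illustrative assumptions; they are not the dataset's actual schema, file layout, or class names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AudioSegment:
    """Hypothetical record: one annotated audio segment (not the official schema)."""
    video_id: str          # placeholder identifier of the source video
    start_s: float         # segment start, in seconds
    stop_s: float          # segment end, in seconds
    label: Optional[str]   # class label, or None for a non-categorised segment

    @property
    def duration_s(self) -> float:
        return self.stop_s - self.start_s


def count_by_class(segments: Iterable[AudioSegment]) -> Counter:
    """Count categorised segments per class; non-categorised ones are skipped."""
    return Counter(s.label for s in segments if s.label is not None)


# Made-up sample data; labels are placeholders, not real EPIC-SOUNDS classes.
segments = [
    AudioSegment("video_a", 1.0, 2.5, "class_x"),
    AudioSegment("video_a", 2.0, 2.4, "class_y"),   # segments may overlap in time
    AudioSegment("video_b", 0.0, 4.0, None),        # non-categorised
    AudioSegment("video_b", 5.0, 6.0, "class_x"),
]
counts = count_by_class(segments)
```

Keeping the label optional lets categorised and non-categorised segments share one record type while still being easy to separate.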

Variants: EPIC-SOUNDS

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Audio Classification | CA2ST(B/16) | CA^2ST: Cross-Attention in Audio, Space, … | 2025-03-30
Audio Classification | CAVA(B/16) | CA^2ST: Cross-Attention in Audio, Space, … | 2025-03-30
Audio Classification | Mirasol3B | Mirasol3B: A Multimodal Autoregressive model … | 2023-11-09
Human Interaction Recognition | Slow-Fast(Finetune by Fivewin team) | Slow-Fast Auditory Streams For Audio … | 2021-03-05

Research Papers

Recent papers with results on this dataset: