EPIC-SOUNDS is a large-scale dataset of audio annotations that capture the temporal extents and class labels of events within the audio stream of the egocentric videos from EPIC-KITCHENS-100. It includes 78.4k categorised and 39.2k non-categorised segments of audible events and actions, with the categorised segments distributed across 44 classes.
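As a rough illustration of what such an annotation carries, a segment can be modelled as a time interval plus an optional class label. The sketch below is not the official annotation schema: the class `AudioSegment`, its field names, and the sample values are all hypothetical and chosen only to mirror the description above (temporal extent, class label, categorised versus non-categorised).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AudioSegment:
    """One annotated audible event (hypothetical schema, not the official format)."""
    video_id: str          # hypothetical identifier of the source video
    start_s: float         # start of the temporal extent, in seconds
    stop_s: float          # end of the temporal extent, in seconds
    label: Optional[str]   # class label, or None for a non-categorised segment

    @property
    def duration_s(self) -> float:
        return self.stop_s - self.start_s

    @property
    def is_categorised(self) -> bool:
        return self.label is not None


def split_by_category(
    segments: List[AudioSegment],
) -> Tuple[List[AudioSegment], List[AudioSegment]]:
    """Separate categorised segments from non-categorised ones."""
    categorised = [s for s in segments if s.is_categorised]
    non_categorised = [s for s in segments if not s.is_categorised]
    return categorised, non_categorised


# Made-up sample data, for illustration only.
segments = [
    AudioSegment("video_a", 1.0, 2.5, "example_class"),
    AudioSegment("video_a", 2.0, 4.0, None),  # overlapping, non-categorised
]
categorised, non_categorised = split_by_category(segments)
```

Keeping the label optional lets both kinds of segment share one type, so code that only needs timing (for example, overlap checks) does not have to distinguish them.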
Source: Epic-Sounds: A Large-scale Dataset of Actions That Sound
Variants: EPIC-SOUNDS
This dataset is used in 2 benchmarks:
| Task | Model | Paper | Date |
|---|---|---|---|
| Audio Classification | CA2ST(B/16) | CA^2ST: Cross-Attention in Audio, Space, … | 2025-03-30 |
| Audio Classification | CAVA(B/16) | CA^2ST: Cross-Attention in Audio, Space, … | 2025-03-30 |
| Audio Classification | Mirasol3B | Mirasol3B: A Multimodal Autoregressive model … | 2023-11-09 |
| Human Interaction Recognition | Slow-Fast(Finetune by Fivewin team) | Slow-Fast Auditory Streams For Audio … | 2021-03-05 |
Recent papers with results on this dataset: