AudioCaps

Dataset Information
Modalities: Texts, Audio
Introduced: 2019
License: Unknown

Overview

AudioCaps is a dataset of audio clips paired with natural-language event descriptions, introduced for the task of audio captioning. The clips are sourced from the AudioSet dataset. Annotators were given the audio tracks together with category hints, and additional video hints when needed.

Source: Audio Retrieval with Natural Language Queries
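
As a rough illustration of how such clip-and-caption pairs might be handled, the sketch below groups captions by clip. It assumes a CSV caption file with the columns `audiocap_id`, `youtube_id`, `start_time`, and `caption`; this layout is an assumption, not something this page states, and the sample rows and the `load_captions` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The column names are an assumed layout for the
# caption file; check the dataset's own files before relying on them.
SAMPLE_CSV = """audiocap_id,youtube_id,start_time,caption
1,abc123XYZ_0,30,A dog barks while a car passes by
2,abc123XYZ_0,30,"A car drives past, and a dog is barking"
3,def456UVW_1,10,Rain falls on a metal roof
"""


def load_captions(csv_text):
    """Group captions by (youtube_id, start_time), i.e. by audio clip."""
    captions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        clip = (row["youtube_id"], int(row["start_time"]))
        captions[clip].append(row["caption"])
    return dict(captions)


clips = load_captions(SAMPLE_CSV)
for (video_id, start), texts in clips.items():
    print(video_id, start, texts)
```

Keying on the video identifier plus start time keeps multiple captions for the same clip together, which is convenient when a clip has more than one reference caption.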

Image source: https://audiocaps.github.io/

Variants: AudioCaps

Associated Benchmarks

This dataset is used in 4 benchmarks:

Recent Benchmark Submissions

| Task | Model | Paper | Date |
|---|---|---|---|
| Audio captioning | LAVCap | LAVCap: LLM-based Audio-Visual Captioning using … | 2025-01-16 |
| Audio Generation | TangoFlux-base | TangoFlux: Super Fast and Faithful … | 2024-12-30 |
| Audio Generation | TangoFlux | TangoFlux: Super Fast and Faithful … | 2024-12-30 |
| Audio Generation | ETTA | ETTA: Elucidating the Design Space … | 2024-12-26 |
| Audio Generation | ETTA-FT-AC-100k | ETTA: Elucidating the Design Space … | 2024-12-26 |
| Audio captioning | MQ-Cap | Enhancing Retrieval-Augmented Audio Captioning with … | 2024-10-14 |
| Audio captioning | SLAM-AAC | SLAM-AAC: Enhancing Audio Captioning with … | 2024-10-12 |
| Audio captioning | EnCLAP++-base | EnCLAP++: Analyzing the EnCLAP Framework … | 2024-09-02 |
| Audio captioning | EnCLAP++-large | EnCLAP++: Analyzing the EnCLAP Framework … | 2024-09-02 |
| Audio Generation | Stable Audio Open | Stable Audio Open | 2024-07-19 |
| Audio captioning | AutoCap | Taming Data and Transformers for … | 2024-06-27 |
| Audio Generation | GenAu-Large | Taming Data and Transformers for … | 2024-06-27 |
| Audio captioning | LOAE | Enhancing Automated Audio Captioning via … | 2024-06-19 |
| Audio Generation | Tango-AF&AC-FT-AC | Improving Text-To-Audio Models with Synthetic … | 2024-06-18 |
| Audio Generation | Stable Audio 2.0 | Long-form music generation with latent … | 2024-04-16 |
| Text to Audio Retrieval | InternVideo2-6B | InternVideo2: Scaling Foundation Models for … | 2024-03-22 |
| Target Sound Extraction | CLAPSep | CLAPSep: Leveraging Contrastive Pre-trained Model … | 2024-02-27 |
| Audio Generation | Stable Audio | Fast Timing-Conditioned Latent Audio Diffusion | 2024-02-07 |
| Audio captioning | EnCLAP-large | EnCLAP: Combining Neural Audio Codec … | 2024-01-31 |
| Audio captioning | EnCLAP-base | EnCLAP: Combining Neural Audio Codec … | 2024-01-31 |

Research Papers

Recent papers with results on this dataset: