ESC-50

Audio Classification Benchmark

26 results | Metric: Top-1 Accuracy (%)
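Entries are ranked by top-1 accuracy. This is the percentage of test clips for which the model's highest-scoring class is the correct label. On ESC-50 the figure is commonly reported as the mean over the dataset's five predefined cross-validation folds. The table does not state the exact protocol behind each entry, though.

The sketch below is a minimal pure-Python illustration. It assumes each clip comes with a list of per-class scores and an integer label. The function names and the `(scores, labels)` fold structure are hypothetical; they are not taken from any of the listed code repositories.

```python
from typing import List, Sequence, Tuple


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Percentage of clips whose highest-scoring class matches the true label."""
    if len(scores) != len(labels) or len(labels) == 0:
        raise ValueError("need one non-empty score vector per label")
    correct = 0
    for row, label in zip(scores, labels):
        # argmax over class scores; max() keeps the first index on ties
        pred = max(range(len(row)), key=row.__getitem__)
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


def mean_fold_accuracy(
    folds: List[Tuple[Sequence[Sequence[float]], Sequence[int]]]
) -> float:
    """Average top-1 accuracy over held-out folds (hypothetical (scores, labels) pairs)."""
    accs = [top1_accuracy(scores, labels) for scores, labels in folds]
    return sum(accs) / len(accs)
```

For example, take one fold where one of two clips is classified correctly (50.0) and another where the single clip is correct (100.0). `mean_fold_accuracy` returns 75.0 for these two folds. Averaging per-fold accuracies treats each fold equally. Because ESC-50's folds are the same size, this matches the pooled accuracy over all clips.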

Top Performing Models

| Rank | Model | Paper | Top-1 Accuracy (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | InternVideo2 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 98.60 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 2 | M2D2 AS+ | M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | 98.50 | 2025-03-28 | nttcslab/m2d, nttcslab/eval-audio-repr |
| 3 | OmniVec | OmniVec: Learning robust representations with cross modal sharing | 98.40 | 2023-11-07 | - |
| 4 | BEATs | BEATs: Audio Pre-Training with Acoustic Tokenizers | 98.10 | 2022-12-18 | microsoft/unilm, Yui010206/CREMA, qingyuliu0521/icsd, phuriches/genrepasd |
| 5 | mn40_as | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | 97.45 | 2022-11-09 | fschmid56/efficientat, fschmid56/efficientat_hear |
| 6 | DyMN-L | Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | 97.40 | 2023-10-24 | fschmid56/efficientat |
| 7 | M2D-CLAP/0.7 | M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation | 97.40 | 2024-06-04 | nttcslab/m2d, nttcslab/eval-audio-repr |
| 8 | M2D-AS/0.7 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | 97.20 | 2024-04-09 | nttcslab/m2d, nttcslab/eval-audio-repr |
| 9 | HTS-AT | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | 97.00 | 2022-02-02 | retrocirce/hts-audio-transformer |
| 10 | EAT-M | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | 96.30 | 2022-04-25 | Alibaba-MIIL/AudioClassfication |

All Papers (26)