| Model | Paper | Accuracy (%) | Date |
|---|---|---|---|
| InternVideo2 | InternVideo2: Scaling Foundation Models for Multi… | 98.60 | 2024-03-22 |
| M2D2 AS+ | M2D2: Exploring General-purpose Audio-Language Re… | 98.50 | 2025-03-28 |
| OmniVec | OmniVec: Learning robust representations with cro… | 98.40 | 2023-11-07 |
| BEATs | BEATs: Audio Pre-Training with Acoustic Tokenizers | 98.10 | 2022-12-18 |
| mn40_as | Efficient Large-scale Audio Tagging via Transform… | 97.45 | 2022-11-09 |
| DyMN-L | Dynamic Convolutional Neural Networks as Efficien… | 97.40 | 2023-10-24 |
| M2D-CLAP/0.7 | M2D-CLAP: Masked Modeling Duo Meets CLAP for Lear… | 97.40 | 2024-06-04 |
| M2D-AS/0.7 | Masked Modeling Duo: Towards a Universal Audio Pr… | 97.20 | 2024-04-09 |
| HTS-AT | HTS-AT: A Hierarchical Token-Semantic Audio Trans… | 97.00 | 2022-02-02 |
| EAT-M | End-to-End Audio Strikes Back: Boosting Augmentat… | 96.30 | 2022-04-25 |
| LHGNN | LHGNN: Local-Higher Order Graph Neural Networks F… | 96.20 | 2025-01-07 |
| M2D/0.7 | Masked Modeling Duo: Towards a Universal Audio Pr… | 96.00 | 2024-04-09 |
| EAT | EAT: Self-Supervised Pre-Training with Efficient … | 96.00 | 2024-01-07 |
| Audio Spectrogram Transformer | AST: Audio Spectrogram Transformer | 95.70 | 2021-04-05 |
| EAT-S | End-to-End Audio Strikes Back: Boosting Augmentat… | 95.25 | 2022-04-25 |
| MATPAC (SSL model, linear eval) | Masked Latent Prediction and Classification for S… | 93.50 | 2025-02-17 |
| EAT-S (scratch) | End-to-End Audio Strikes Back: Boosting Augmentat… | 92.15 | 2022-04-25 |
| SepTr + LeRaC | Learning Rate Curriculum | 91.58 | 2022-05-18 |
| SepTr | SepTr: Separable Transformer for Audio Spectrogra… | 91.13 | 2022-03-17 |
| Multi-Format Contrastive | Multi-Format Contrastive Learning of Audio Repres… | 90.50 | 2021-03-11 |
| AVID | Audio-Visual Instance Discrimination with Cross-M… | 89.20 | 2020-04-27 |
| ACDNet | Environmental Sound Classification on the Edge: A… | 87.10 | 2021-03-05 |
| XDC | Self-Supervised Learning by Cross-Modal Audio-Vid… | 85.40 | 2019-11-28 |
| XDC | Self-Supervised Learning by Cross-Modal Audio-Vid… | 84.80 | 2019-11-28 |
| AVTS | Cooperative Learning of Audio and Video Models fr… | 82.30 | 2018-06-30 |
| L3 | Look, Listen and Learn | 79.30 | 2017-05-23 |