| OmniVec | OmniVec: Learning robust representations with cro… | 0.55 | 2023-11-07 |
| EquiAV | EquiAV: Leveraging Equivariance for Audio-Visual … | 0.55 | 2024-03-14 |
| Audiovisual Masked Autoencoder (Audiovisual, Single) | Audiovisual Masked Autoencoders | 0.52 | 2022-12-09 |
| CAV-MAE (Audio-Visual) | Contrastive Audio-Visual Masked Autoencoder | 0.51 | 2022-10-02 |
| BEATs (Audio-only, Ensemble) | BEATs: Audio Pre-Training with Acoustic Tokenizers | 0.51 | 2022-12-18 |
| UAVM (Audio + Video) | UAVM: Towards Unifying Audio and Visual Models | 0.50 | 2022-07-29 |
| SSLAM (Audio-Only, Single) | SSLAM: Enhancing Self-Supervised Models with Audi… | 0.50 | 2025-06-13 |
| mn40_as (Ensemble) | Efficient Large-scale Audio Tagging via Transform… | 0.50 | 2022-11-09 |
| ATST-C2F(Single) | Self-supervised Audio Teacher-Student Transformer… | 0.50 | 2023-06-07 |
| MBT (AS-500K training + Video) | Attention Bottlenecks for Multimodal Fusion | 0.50 | 2021-06-30 |
| PaSST (Ensemble) | Efficient Training of Audio Transformers with Pat… | 0.50 | 2021-10-11 |
| DyMN-L (Audio-Only, Single) | Dynamic Convolutional Neural Networks as Efficien… | 0.49 | 2023-10-24 |
| M2D2 | M2D2: Exploring General-purpose Audio-Language Re… | 0.49 | 2025-03-28 |
| HTS-AT (Ensemble) | HTS-AT: A Hierarchical Token-Semantic Audio Trans… | 0.49 | 2022-02-02 |
| BEATs (Audio-only, Single) | BEATs: Audio Pre-Training with Acoustic Tokenizers | 0.49 | 2022-12-18 |
| EAT | EAT: Self-Supervised Pre-Training with Efficient … | 0.49 | 2024-01-07 |
| AST (Ensemble) | AST: Audio Spectrogram Transformer | 0.49 | 2021-04-05 |
| M2D-CLAP/0.7 | M2D-CLAP: Masked Modeling Duo Meets CLAP for Lear… | 0.49 | 2024-06-04 |
| M2D-AS/0.7 | Masked Modeling Duo: Towards a Universal Audio Pr… | 0.49 | 2024-04-09 |
| mn40_as (Single) | Efficient Large-scale Audio Tagging via Transform… | 0.48 | 2022-11-09 |
| ATST-Frame | Self-supervised Audio Teacher-Student Transformer… | 0.48 | 2023-06-07 |
| M2D/0.7 | Masked Modeling Duo: Towards a Universal Audio Pr… | 0.48 | 2024-04-09 |
| PlayItBackX3 | Play It Back: Iterative Attention for Audio Recog… | 0.48 | 2022-10-20 |
| DASS-Medium (Audio-only, single) | DASS: Distilled Audio State Space Models Are Stro… | 0.48 | 2024-07-04 |
| PSLA (Ensemble) | PSLA: Improving Audio Tagging with Pretraining, S… | 0.47 | 2021-02-02 |
| DASS-Small (Audio-only, single) | DASS: Distilled Audio State Space Models Are Stro… | 0.47 | 2024-07-04 |
| PaSST-S (Single) | Efficient Training of Audio Transformers with Pat… | 0.47 | 2021-10-11 |
| CAV-MAE (Audio-Only) | Contrastive Audio-Visual Masked Autoencoder | 0.47 | 2022-10-02 |
| Audiovisual Masked Autoencoder (Audio-only, Single) | Audiovisual Masked Autoencoders | 0.47 | 2022-12-09 |
| AudioVisual Fusion Net | Large Scale Audiovisual Learning of Sounds with W… | 0.46 | 2020-05-29 |
| AST (Single) | AST: Audio Spectrogram Transformer | 0.46 | 2021-04-05 |
| Perceiver | Perceiver: General Perception with Iterative Atte… | 0.45 | 2021-03-04 |
| PSLA (Single) | PSLA: Improving Audio Tagging with Pretraining, S… | 0.44 | 2021-02-02 |
| EAT-M | End-to-End Audio Strikes Back: Boosting Augmentat… | 0.43 | 2022-04-25 |
| Conformer (AS-2M) | Conformer-Based Self-Supervised Learning for Non-… | 0.41 | 2021-10-14 |
| EAT-S | End-to-End Audio Strikes Back: Boosting Augmentat… | 0.41 | 2022-04-25 |
| WEANet-SUSTAIN | A Sequential Self Teaching Approach for Improving… | 0.40 | 2020-06-30 |
| VATT-Base | VATT: Transformers for Multimodal Self-Supervised… | 0.39 | 2021-04-22 |
| Multi-Format Contrastive | Multi-Format Contrastive Learning of Audio Repres… | 0.38 | 2021-03-11 |
| MMV | Self-Supervised MultiModal Versatile Networks | 0.31 | 2020-06-29 |
| CAV-MAE (Visual-Only) | Contrastive Audio-Visual Masked Autoencoder | 0.26 | 2022-10-02 |
| L3 | Look, Listen and Learn | 0.25 | 2017-05-23 |
| Triplet | Unsupervised Learning of Semantic Audio Represent… | 0.24 | 2017-11-06 |