PaSST-RoBERTa & Estimated Audio–Caption Correspondences | Estimated Audio-Caption Correspondences Improve L… | 27.69 | 2024-08-21
InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multi… | 27.20 | 2024-03-22
VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality … | 26.90 | 2023-05-29
PaSST–RoBERTa & GPT-augment | Advancing Natural-Language Based Audio Retrieval … | 26.07 | 2023-08-08
ONE-PEACE | ONE-PEACE: Exploring One General Representation M… | 22.40 | 2023-05-18
VALOR | VALOR: Vision-Audio-Language Omni-Perception Pret… | 17.50 | 2023-04-17
CE (pretraining:AudioCaps) | Audio Retrieval with Natural Language Queries | | 2021-05-05
MoEE (pretraining:AudioCaps) | Audio Retrieval with Natural Language Queries | | 2021-05-05
CE | Audio Retrieval with Natural Language Queries | | 2021-05-05
MMT | Audio Retrieval with Natural Language Queries: A … | | 2021-12-17
CE(pretraining:SoundDescs) | Audio Retrieval with Natural Language Queries: A … | | 2021-12-17
MoEE | Audio Retrieval with Natural Language Queries | | 2021-05-05