Showing 8 results | Metric: BLEU-4
Rank | Model | Paper | BLEU-4 | Date | Code |
---|---|---|---|---|---|
1 | VALOR | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | 45.60 | 2023-04-17 | TXH-mercury/VALOR |
2 | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | 45.00 | 2023-05-29 | TXH-mercury/VALOR, txh-mercury/vast |
3 | COSA | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | 43.70 | 2023-06-15 | txh-mercury/cosa |
4 | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 39.70 | 2022-12-09 | - |
5 | VASTA (Kinetics-backbone) | Diverse Video Captioning by Adaptive Spatio-temporal Attention | 36.25 | 2022-08-19 | zohrehghaderi/vasta |
6 | CoCap (ViT/L14) | Accurate and Fast Compressed Video Captioning | 35.80 | 2023-09-22 | acherstyx/CoCap |
7 | ORG-TRL | Object Relational Graph with Teacher-Recommended Learning for Video Captioning | 32.10 | 2020-02-26 | - |
8 | NITS-VC | NITS-VC System for VATEX Video Captioning Challenge 2020 | 20.00 | 2020-06-07 | - |