📏 Metric: BLEU4
Rank | Model | Paper | BLEU4 | Date | Code |
---|---|---|---|---|---|
1 | VLTinT (ae-test split) C3D/Ling | VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning | 36.56 | 2022-11-28 | 📦 uark-aicv/vltint |
2 | VLCap (ae-test split) - Appearance + Language | VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | 35.99 | 2022-06-26 | 📦 UARK-AICV/VLCAP |
3 | VideoCoCa 📚 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 35.00 | 2022-12-09 | - |
4 | COOT (ae-test split) - Only Appearance features | COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | 31.45 | 2020-11-01 | 📦 gingsi/coot-videotext |
5 | MART (ae-test split) - Appearance + Flow | MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning | 10.33 | 2020-05-11 | 📦 jayleicn/recurrent-transformer |