MSVD-Indonesian is derived from the MSVD dataset by translating its English text into Indonesian with the help of a machine translation service. The dataset can be used for multimodal video-text tasks, including text-to-video retrieval, video-to-text retrieval, and video captioning. Like the original English dataset, MSVD-Indonesian contains about 80k video-text pairs.
Variants: MSVD-Indonesian
This dataset is used in 3 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Video Captioning | VNS-GRU (Cross-Lingual) | MSVD-Indonesian: A Benchmark for Multimodal … | 2023-06-20 |
Video Retrieval | X-CLIP (Cross-Lingual) | MSVD-Indonesian: A Benchmark for Multimodal … | 2023-06-20 |
Text to Video Retrieval | X-CLIP (Cross-Lingual) | MSVD-Indonesian: A Benchmark for Multimodal … | 2023-06-20 |
Recent papers with results on this dataset: