VATEX

Video And TEXt

Dataset Information
Modalities: Videos, Texts
Languages: English, Chinese
Introduced: 2019

Overview

VATEX is a large, multilingual, linguistically complex, and diverse dataset in terms of both its videos and its natural language descriptions. It supports two tasks for video-and-language research: (1) Multilingual Video Captioning, which aims to describe a video in several languages with a single compact captioning model, and (2) Video-guided Machine Translation, which translates a source-language description into the target language using the video as additional spatiotemporal context.

Source: https://arxiv.org/pdf/1904.03493.pdf
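
To make the difference between the two tasks concrete, the following is a minimal sketch of how a single captioned video could be turned into training examples for each. The record type `CaptionedVideo`, its field names, the helper functions, and the sample captions are all hypothetical and chosen for illustration; they are not the dataset's actual file format. The sketch also assumes that the English and Chinese captions at the same index are translations of each other.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class CaptionedVideo:
    """Hypothetical record: one video with captions in both languages."""
    video_id: str
    en_captions: List[str]
    zh_captions: List[str]  # assumed index-aligned with en_captions


def captioning_examples(v: CaptionedVideo) -> Iterator[Tuple[str, str, str]]:
    """Task 1 (multilingual captioning): (video, language) -> caption."""
    for cap in v.en_captions:
        yield (v.video_id, "en", cap)
    for cap in v.zh_captions:
        yield (v.video_id, "zh", cap)


def translation_examples(v: CaptionedVideo) -> Iterator[Tuple[str, str, str]]:
    """Task 2 (video-guided translation): (video, source caption) -> target caption."""
    for en, zh in zip(v.en_captions, v.zh_captions):
        yield (v.video_id, en, zh)


# Illustrative sample only; not taken from the dataset.
video = CaptionedVideo(
    video_id="demo_clip",
    en_captions=["A man is playing a guitar."],
    zh_captions=["一个男人正在弹吉他。"],
)
print(list(captioning_examples(video)))
print(list(translation_examples(video)))
```

In this framing, captioning treats each language as a separate output for the same video, while video-guided translation pairs a source sentence with its target and carries the video along as context.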

Variants: VATEX

Associated Benchmarks

This dataset is used in 3 benchmarks:

  • Video Captioning
  • Video Retrieval
  • Zero-Shot Video Retrieval

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Retrieval | GRAM | Gramian Multimodal Representation Learning and … | 2024-12-16
Zero-Shot Video Retrieval | GRAM | Gramian Multimodal Representation Learning and … | 2024-12-16
Zero-Shot Video Retrieval | InternVideo2-6B | InternVideo2: Scaling Foundation Models for … | 2024-03-22
Video Retrieval | InternVideo2-6B | InternVideo2: Scaling Foundation Models for … | 2024-03-22
Zero-Shot Video Retrieval | InternVideo2-1B | InternVideo2: Scaling Foundation Models for … | 2024-03-22
Video Retrieval | Side4Video | Side4Video: Spatial-Temporal Side Network for … | 2023-11-27
Video Captioning | CoCap (ViT/L14) | Accurate and Fast Compressed Video … | 2023-09-22
Video Captioning | COSA | COSA: Concatenated Sample Pretrained Vision-Language … | 2023-06-15
Video Captioning | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation … | 2023-05-29
Video Retrieval | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation … | 2023-05-29
Video Retrieval | VALOR | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model … | 2023-04-17
Video Captioning | VALOR | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model … | 2023-04-17
Video Retrieval | Unmasked Teacher | Unmasked Teacher: Towards Training-Efficient Video … | 2023-03-28
Video Retrieval | Cap4Video | Cap4Video: What Can Auxiliary Captions … | 2022-12-31
Video Captioning | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot … | 2022-12-09
Zero-Shot Video Retrieval | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot … | 2022-12-09
Video Retrieval | InternVideo | InternVideo: General Video Foundation Models … | 2022-12-06
Zero-Shot Video Retrieval | InternVideo | InternVideo: General Video Foundation Models … | 2022-12-06
Video Captioning | VASTA (Kinetics-backbone) | Diverse Video Captioning by Adaptive … | 2022-08-19
Video Retrieval | TS2-Net | TS2-Net: Token Shift and Selection … | 2022-07-16

Research Papers

Recent papers with results on this dataset: