| Model | Paper | Score | Date |
|---|---|---|---|
| LLaMA-VQA | Large Language Models are Temporal and Causal Rea… | 82.20 | 2023-10-24 |
| FrozenBiLM | Zero-Shot Video Question Answering via Frozen Bid… | 82.00 | 2022-06-16 |
| VindLU | VindLU: A Recipe for Effective Video-and-Language… | 79.00 | 2022-12-09 |
| iPerceive (Chadha et al., 2020) | iPerceive: Applying Common-Sense Reasoning to Mul… | 76.96 | 2020-11-16 |
| Hero w/ pre-training | HERO: Hierarchical Encoder for Video+Language Omn… | 74.24 | 2020-05-01 |
| STAGE (Lei et al., 2019) | TVQA+: Spatio-Temporal Grounding for Video Questi… | 70.50 | 2019-04-25 |