| Text + Text (no Multimodal Pretext Training) | Towards Fast Adaptation of Pretrained Contrastive… | 40.20 | 2022-06-05 |
| FrozenBiLM | Zero-Shot Video Question Answering via Frozen Bid… | 39.60 | 2022-06-16 |
| VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot Tra… | 39.00 | 2022-12-09 |
| Co-Tokenization | Video Question Answering with Iterative Video-Tex… | 38.20 | 2022-08-01 |
| Just Ask (fine-tune) | Just Ask: Learning to Answer Questions from Milli… | 35.40 | 2020-12-01 |
| FrozenBiLM (0-shot) | Zero-Shot Video Question Answering via Frozen Bid… | 26.80 | 2022-06-16 |
| Just Ask (0-shot) | Just Ask: Learning to Answer Questions from Milli… | 12.20 | 2020-12-01 |