| Text + Text (no Multimodal Pretext Training) | Towards Fast Adaptation of Pretrained Contrastive… | 93.20 | 2022-06-05 |
| FrozenBiLM | Zero-Shot Video Question Answering via Frozen Bid… | 86.70 | 2022-06-16 |
| Just Ask | Just Ask: Learning to Answer Questions from Milli… | 84.40 | 2020-12-01 |
| Hero w/ pre-training | HERO: Hierarchical Encoder for Video+Language Omn… | 77.75 | 2020-05-01 |
| ATP | Revisiting the "Video" in Video-Language Understa… | 65.10 | 2022-06-03 |
| FrozenBiLM (0-shot) | Zero-Shot Video Question Answering via Frozen Bid… | 58.40 | 2022-06-16 |
| Just Ask (0-shot) | Just Ask: Learning to Answer Questions from Milli… | 51.10 | 2020-12-01 |