| SMoLA-PaLI-X Specialist Model | Omni-SMoLA: Boosting Generalist Multimodal Models… | 83.75 | 2023-12-01 |
| PaLI-X-VPD | Visual Program Distillation: Distilling Tools and… | 80.40 | 2023-12-05 |
| Prophet | Prophet: Prompting Large Language Models with Com… | 75.10 | 2023-03-03 |
| PromptCap | PromptCap: Prompt-Guided Task-Aware Image Caption… | 73.20 | 2022-11-15 |
| MC-CoT | Boosting the Power of Small Multimodal Reasoning … | 71.00 | 2023-11-23 |
| A Simple Baseline for KB-VQA | A Simple Baseline for Knowledge-Based Visual Ques… | 57.50 | 2023-10-20 |
| HYDRA | HYDRA: A Hyper Agent for Dynamic Compositional Vi… | 56.35 | 2024-03-19 |
| GPV-2 | Webly Supervised Concept Expansion for General Pu… | 53.70 | 2022-02-04 |
| KRISP | KRISP: Integrating Implicit and Symbolic Knowledg… | 42.20 | 2020-12-20 |
| ViLBERT - VQA | ViLBERT: Pretraining Task-Agnostic Visiolinguisti… | 42.10 | 2019-08-06 |
| LXMERT | LXMERT: Learning Cross-Modality Encoder Represent… | 41.60 | 2019-08-20 |
| ViLBERT | ViLBERT: Pretraining Task-Agnostic Visiolinguisti… | 41.50 | 2019-08-06 |
| Pythia | Pythia v0.1: the Winning Entry to the VQA Challen… | 40.10 | 2018-07-26 |
| VLC-BERT | VLC-BERT: Visual Question Answering with Contextu… | 38.05 | 2022-10-24 |
| ViLBERT - OK-VQA | ViLBERT: Pretraining Task-Agnostic Visiolinguisti… | 34.10 | 2019-08-06 |