Task | Model | Paper | Date
Visual Question Answering (VQA) | ChatGPT 3.5 with LAPDoc Prompt (SpatialFormat) | LAPDoc: Layout-Aware Prompting for Documents | 2024-02-15
Visual Question Answering (VQA) | ScreenAI 5B (4.62 B params, w/ OCR) | ScreenAI: A Vision-Language Model for … | 2024-02-07
Visual Question Answering (VQA) | Gemini Ultra (pixel only) | Gemini: A Family of Highly … | 2023-12-19
Visual Question Answering (VQA) | SMoLA-PaLI-X Specialist | Omni-SMoLA: Boosting Generalist Multimodal Models … | 2023-12-01
Visual Question Answering (VQA) | SMoLA-PaLI-X Generalist | Omni-SMoLA: Boosting Generalist Multimodal Models … | 2023-12-01
Visual Question Answering (VQA) | PaLI-3 (w/ OCR) | PaLI-3 Vision Language Models: Smaller, … | 2023-10-13
Visual Question Answering (VQA) | PaLI-3 | PaLI-3 Vision Language Models: Smaller, … | 2023-10-13
Visual Question Answering (VQA) | DocFormerv2-large | DocFormerv2: Local Features for Document … | 2023-06-02
Visual Question Answering (VQA) | Claude + LATIN-Prompt | Layout and Task Aware Instruction … | 2023-06-01
Visual Question Answering (VQA) | GPT-3.5 + LATIN-Prompt | Layout and Task Aware Instruction … | 2023-06-01
Visual Question Answering (VQA) | PaLI-X (Single-task FT w/ OCR) | PaLI-X: On Scaling up a … | 2023-05-29
Visual Question Answering (VQA) | PaLI-X (Multi-task FT) | PaLI-X: On Scaling up a … | 2023-05-29
Visual Question Answering (VQA) | PaLI-X (Single-task FT) | PaLI-X: On Scaling up a … | 2023-05-29
Visual Question Answering (VQA) | DUBLIN (variable resolution) | DUBLIN -- Document Understanding By … | 2023-05-23
Visual Question Answering (VQA) | DUBLIN | DUBLIN -- Document Understanding By … | 2023-05-23
Visual Question Answering (VQA) | MatCha | MatCha: Enhancing Visual Language Pretraining … | 2022-12-19
Visual Question Answering (VQA) | UDOP (aux) | Unifying Vision, Text, and Layout … | 2022-12-05
Visual Question Answering (VQA) | UDOP | Unifying Vision, Text, and Layout … | 2022-12-05
Visual Question Answering (VQA) | Pix2Struct-large | Pix2Struct: Screenshot Parsing as Pretraining … | 2022-10-07
Visual Question Answering (VQA) | Pix2Struct-base | Pix2Struct: Screenshot Parsing as Pretraining … | 2022-10-07