Visual Question Answering (VQA) | HYDRA | HYDRA: A Hyper Agent for … | 2024-03-19 |
Visual Question Answering (VQA) | Lyrics | Lyrics: Boosting Fine-grained Language-Vision Alignment … | 2023-12-08 |
Visual Question Answering (VQA) | PaLI-X-VPD | Visual Program Distillation: Distilling Tools … | 2023-12-05 |
Visual Question Answering (VQA) | A Simple Baseline for KB-VQA | A Simple Baseline for Knowledge-Based … | 2023-10-20 |
Retrieval | FLMR | Fine-grained Late-interaction Multi-modal Retrieval for … | 2023-09-29 |
Visual Question Answering (VQA) | RA-VQA-v2 (T5-large) | Fine-grained Late-interaction Multi-modal Retrieval for … | 2023-09-29 |
Visual Question Answering (VQA) | RA-VQA-v2 (BLIP 2) | Fine-grained Late-interaction Multi-modal Retrieval for … | 2023-09-29 |
Visual Question Answering (VQA) | PaLI-X (Single-task FT) | PaLI-X: On Scaling up a … | 2023-05-29 |
Visual Question Answering (VQA) | PaLM-E-562B | PaLM-E: An Embodied Multimodal Language … | 2023-03-06 |
Visual Question Answering (VQA) | Prophet | Prophet: Prompting Large Language Models … | 2023-03-03 |
Visual Question Answering (VQA) | VK-OOD | Differentiable Outlier Detection Enable Robust … | 2023-02-11 |
Visual Question Answering (VQA) | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30 |
Visual Question Answering (VQA) | BLIP-2 ViT-G FlanT5 XL (zero-shot) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30 |
Visual Question Answering (VQA) | BLIP-2 ViT-L OPT 2.7B (zero-shot) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30 |
Visual Question Answering (VQA) | BLIP-2 ViT-G OPT 2.7B (zero-shot) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30 |
Visual Question Answering (VQA) | BLIP-2 ViT-G OPT 6.7B (zero-shot) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30 |
Visual Question Answering (VQA) | BLIP-2 ViT-L FlanT5 XL (zero-shot) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30 |
Visual Question Answering (VQA) | ReVeaL WIT + CC12M + Wikidata + VQA-2 | REVEAL: Retrieval-Augmented Visual-Language Pre-Training with … | 2022-12-10 |
Visual Question Answering (VQA) | PromptCap | PromptCap: Prompt-Guided Task-Aware Image Captioning | 2022-11-15 |
Visual Question Answering (VQA) | VLC-BERT | VLC-BERT: Visual Question Answering with … | 2022-10-24 |