SQA3D

Situated Question Answering in 3D Scenes

Dataset Information
Modalities: Images, Videos, Texts, 3D
Languages: English
Introduced: 2023
License
Homepage

Overview

SQA3D is a dataset for embodied scene understanding, in which an agent must comprehend the scene it is situated in from a first-person perspective and answer questions about it. The questions are designed to be situated, embodied, and knowledge-intensive. We offer three modalities for representing a 3D scene: a 3D scan, an egocentric video, and a bird's-eye-view (BEV) picture.
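As a rough illustration of the description above, one sample could be modeled as a record that pairs a question with the agent's situation and one of the three scene representations. This is a minimal sketch only: the class names and every field name (`scene_id`, `situation`, `question`, `answer`, `modality`) are hypothetical and do not reflect the dataset's actual file schema.

```python
from dataclasses import dataclass
from enum import Enum


class SceneModality(Enum):
    """The three scene representations named in the overview."""
    SCAN_3D = "3D scan"
    EGOCENTRIC_VIDEO = "egocentric video"
    BEV_PICTURE = "BEV picture"


@dataclass(frozen=True)
class SituatedQuestion:
    """Hypothetical record layout; not the dataset's real schema."""
    scene_id: str            # identifier of the 3D scene (assumed field)
    situation: str           # text describing where the agent is (assumed field)
    question: str            # question posed from the first-person viewpoint
    answer: str              # reference answer text (assumed field)
    modality: SceneModality  # which scene representation accompanies the sample


# Build one illustrative sample; all values are placeholders, not real data.
sample = SituatedQuestion(
    scene_id="scene_demo",
    situation="placeholder situation text",
    question="placeholder question text",
    answer="placeholder answer text",
    modality=SceneModality.BEV_PICTURE,
)
```

Keeping the modality as an explicit field makes it easy to evaluate the same question against each scene representation separately, if the real data permits that.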

Variants: SQA3D

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Question Answering | Lexicon3D | Lexicon3D: Probing Visual Foundation Models … | 2024-09-05
Question Answering | Situation3D | Situational Awareness Matters in 3D … | 2024-06-11
Question Answering | CREMA | CREMA: Generalizable and Efficient Video-Language … | 2024-02-08
Question Answering | LM4VisualEncoding | Frozen Transformers in Language Models … | 2023-10-19
Referring Expression | Random | SQA3D: Situated Question Answering in … | 2022-10-14
Question Answering | ScanQA (w/ auxiliary loss) | SQA3D: Situated Question Answering in … | 2022-10-14
Question Answering | ScanQA | SQA3D: Situated Question Answering in … | 2022-10-14
Question Answering | MCAN | Deep Modular Co-Attention Networks for … | 2019-06-25

Research Papers

Recent papers with results on this dataset: