SIQA

Social Interaction QA

Dataset Information
Modalities: Texts
Introduced: 2019
License: Unknown

Overview

Social Interaction QA (SIQA, also written Social IQa) is a question-answering benchmark for testing social commonsense intelligence. Unlike many prior benchmarks, which focus on physical or taxonomic knowledge, Social IQa focuses on reasoning about people’s actions and their social implications. For example, given an action like "Jesse saw a concert" and a question like "Why did Jesse do this?", humans can easily infer that Jesse wanted "to see their favorite performer" or "to enjoy the music", and not "to see what's happening inside" or "to see if it works". The actions in Social IQa span a wide variety of social situations, and the answer candidates include both human-curated answers and adversarially filtered machine-generated ones. Social IQa contains over 37,000 QA pairs for evaluating models’ abilities to reason about the social implications of everyday events and situations.

Source: Social IQA
Image Source: https://arxiv.org/pdf/1904.09728.pdf
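
The example above can be written down as a small multiple-choice record and scored with plain accuracy. The following is only a minimal sketch: the class name, field names, and the choice of three candidates per question are assumptions made for illustration, not the dataset's official schema or loader.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class SocialQAExample:
    """One multiple-choice item (hypothetical layout, not the official schema)."""
    context: str                  # the action or situation
    question: str                 # question about motivation, reaction, etc.
    candidates: Tuple[str, ...]   # answer candidates shown to the model
    label: int                    # index of the correct candidate


def accuracy(examples: Sequence[SocialQAExample],
             predict: Callable[[SocialQAExample], int]) -> float:
    """Fraction of examples where the predicted index matches the label."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(1 for ex in examples if predict(ex) == ex.label)
    return correct / len(examples)


# The item from the overview text; candidate order is an arbitrary choice.
example = SocialQAExample(
    context="Jesse saw a concert.",
    question="Why did Jesse do this?",
    candidates=(
        "to see their favorite performer",
        "to see what's happening inside",
        "to see if it works",
    ),
    label=0,
)


def always_first(ex: SocialQAExample) -> int:
    """Trivial placeholder predictor: always picks the first candidate."""
    return 0


print(accuracy([example], always_first))
```

A real evaluation would replace `always_first` with a model-backed predictor, for instance one that scores each candidate and returns the index of the highest-scoring one.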

Variants: SIQA

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Question Answering | LLaMA-3 8B+MoSLoRA (fine-tuned) | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16
Question Answering | LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
Question Answering | LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
Question Answering | LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
Question Answering | phi-1.5 1.3B (zero-shot) | Textbooks Are All You Need … | 2023-09-11
Question Answering | phi-1.5-web 1.3B (zero-shot) | Textbooks Are All You Need … | 2023-09-11
Question Answering | LLaMA 13B (zero-shot) | LLaMA: Open and Efficient Foundation … | 2023-02-27
Question Answering | LLaMA 33B (zero-shot) | LLaMA: Open and Efficient Foundation … | 2023-02-27
Question Answering | LLaMA 7B (zero-shot) | LLaMA: Open and Efficient Foundation … | 2023-02-27
Question Answering | LLaMA 65B (zero-shot) | LLaMA: Open and Efficient Foundation … | 2023-02-27
Question Answering | DeBERTa-Large 304M | Two is Better than Many? … | 2022-10-29
Question Answering | DeBERTa-Large 304M (classification-based) | Two is Better than Many? … | 2022-10-29
Question Answering | CompassMTL 567M | Task Compass: Scaling Multi-task Pre-training … | 2022-10-12
Question Answering | ExDeBERTa 567M | Task Compass: Scaling Multi-task Pre-training … | 2022-10-12
Question Answering | CompassMTL 567M with Tailor | Task Compass: Scaling Multi-task Pre-training … | 2022-10-12
Question Answering | Chinchilla (zero-shot) | Training Compute-Optimal Large Language Models | 2022-03-29
Question Answering | Gopher (zero-shot) | Scaling Language Models: Methods, Analysis … | 2021-12-08
Question Answering | Unicorn 11B (fine-tuned) | UNICORN on RAINBOW: A Universal … | 2021-03-24
Question Answering | UnifiedQA 3B | UnifiedQA: Crossing Format Boundaries With … | 2020-05-02
Question Answering | RoBERTa-Large 355M (fine-tuned) | RoBERTa: A Robustly Optimized BERT … | 2019-07-26

Research Papers

Recent papers with results on this dataset: