HellaSwag

Dataset Information
License: MIT

Overview

HellaSwag is a challenge dataset for evaluating commonsense natural language inference (NLI). It is especially hard for state-of-the-art models, even though its questions are trivial for humans (>95% accuracy).
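
The accuracy figure above is a multiple-choice accuracy: each item pairs a context with several candidate endings, and a prediction counts as correct when the chosen ending matches the labeled one. The following is a minimal sketch of that computation, not the official evaluation code. The record keys (`context`, `endings`, `label`), the toy examples, and the word-overlap scorer are all hypothetical stand-ins; a real evaluation would plug in a model-based score such as the log-likelihood of each ending.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[str, str], float]


def pick_ending(context: str, endings: Sequence[str], score: Scorer) -> int:
    """Return the index of the highest-scoring ending (first one wins on ties)."""
    return max(range(len(endings)), key=lambda i: score(context, endings[i]))


def accuracy(examples: List[Dict], score: Scorer) -> float:
    """Fraction of examples where the picked ending equals the labeled one."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(
        pick_ending(ex["context"], ex["endings"], score) == ex["label"]
        for ex in examples
    )
    return correct / len(examples)


def overlap_score(context: str, ending: str) -> float:
    """Hypothetical toy scorer: number of distinct words shared with the context."""
    return float(len(set(context.lower().split()) & set(ending.lower().split())))


# Made-up items for illustration only; these are not taken from the dataset.
toy_examples = [
    {
        "context": "She put the kettle on the stove",
        "endings": [
            "and waited for the kettle to boil",
            "and swam across the lake",
            "and painted a fence",
            "and flew to the moon",
        ],
        "label": 0,
    },
    {
        "context": "He opened the umbrella",
        "endings": [
            "because the sun was out",
            "because it started to rain",
            "and sang loudly",
            "while eating soup",
        ],
        "label": 1,
    },
]

if __name__ == "__main__":
    print(accuracy(toy_examples, overlap_score))
```

On the second toy item the overlap scorer prefers the wrong ending because it shares the word "the" with the context, which illustrates why surface cues alone do not solve this kind of task.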

Variants: HellaSwag, HellaSwag (10-Shot), HellaSwag TR

Associated Benchmarks

This dataset is used in 4 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Question Answering | Shakti-LLM (2.5B) | SHAKTI: A 2.5 Billion Parameter … | 2024-10-15
parameter-efficient fine-tuning | LLaMA2-7b | GIFT-SW: Gaussian noise Injected Fine-Tuning … | 2024-08-27
Sentence Completion | LLaMA3+MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16
Sentence Completion | LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
Sentence Completion | LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
Sentence Completion | LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
parameter-efficient fine-tuning | LLaMA2-7b | DoRA: Weight-Decomposed Low-Rank Adaptation | 2024-02-14
Sentence Completion | Camelidae-8×34B (10-shot) | Parameter-Efficient Sparsity Crafting from Dense … | 2024-01-05
Sentence Completion | Qwen2idae-16x14B (10-shot) | Parameter-Efficient Sparsity Crafting from Dense … | 2024-01-05
Sentence Completion | OPT-6.7B | LLM in a flash: Efficient … | 2023-12-12
Sentence Completion | LLM in a Flash (OPT-6.7B with Predictor) | LLM in a flash: Efficient … | 2023-12-12
Sentence Completion | Mamba-1.4B | Mamba: Linear-Time Sequence Modeling with … | 2023-12-01
Sentence Completion | Mamba-2.8B | Mamba: Linear-Time Sequence Modeling with … | 2023-12-01
Sentence Completion | Falcon-180B (0-shot) | The Falcon Series of Open … | 2023-11-28
Sentence Completion | Falcon-7B (0-shot) | The Falcon Series of Open … | 2023-11-28
Sentence Completion | Falcon-40B (0-shot) | The Falcon Series of Open … | 2023-11-28
Sentence Completion | Open-LLaMA-3B-v2 | Sheared LLaMA: Accelerating Language Model … | 2023-10-10
Sentence Completion | Sheared-LLaMA-1.3B (50B) | Sheared LLaMA: Accelerating Language Model … | 2023-10-10
Sentence Completion | Mistral 7B (0-shot) | Mistral 7B | 2023-10-10
Sentence Completion | Sheared-LLaMA-2.7B (50B) | Sheared LLaMA: Accelerating Language Model … | 2023-10-10

Research Papers

Recent papers with results on this dataset: