HellaSwag

Name: HellaSwag
License: MIT

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

HellaSwag is a challenge dataset for evaluating commonsense natural language inference (NLI). It is especially hard for state-of-the-art models, even though its questions are trivial for humans (>95% accuracy).

Variants: HellaSwag, HellaSwag (10-Shot), HellaSwag TR

Associated Benchmarks

This dataset is used in 4 benchmarks:

Question Answering - Metrics: Accuracy
Text Generation - Metrics: Accuracy, acc
parameter-efficient fine-tuning - Metrics: Accuracy (%)
Sentence Completion - Metrics: Accuracy
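
The Accuracy metric listed for these benchmarks can be illustrated with a short sketch. It assumes the usual multiple-choice setup, in which a model assigns a score (for example, a log-likelihood) to each candidate ending and the highest-scoring ending is taken as the prediction. The function names and the example scores are hypothetical and are not taken from any particular evaluation harness.

```python
from typing import Sequence


def pick_ending(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate ending."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(all_scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of items whose top-scoring ending matches the gold label."""
    if len(all_scores) != len(gold):
        raise ValueError("scores and gold labels must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(pick_ending(s) == g for s, g in zip(all_scores, gold))
    return correct / len(gold)


# Hypothetical scores for three items with four candidate endings each
# (illustrative numbers only, not real model outputs).
example_scores = [
    [-4.1, -2.3, -5.0, -3.7],
    [-1.2, -3.3, -2.8, -4.0],
    [-2.9, -2.5, -3.1, -2.6],
]
example_gold = [1, 0, 3]

print(f"accuracy = {accuracy(example_scores, example_gold):.2%}")
```

Reported numbers can differ between papers depending on how the per-ending score is computed and on the number of few-shot examples, so the sketch shows only the final aggregation step.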

Recent Benchmark Submissions

Task	Model	Paper	Date
Question Answering	Shakti-LLM (2.5B)	SHAKTI: A 2.5 Billion Parameter …	2024-10-15
parameter-efficient fine-tuning	LLaMA2-7b	GIFT-SW: Gaussian noise Injected Fine-Tuning …	2024-08-27
Sentence Completion	LLaMA3+MoSLoRA	Mixture-of-Subspaces in Low-Rank Adaptation	2024-06-16
Sentence Completion	LLaMA-2 13B + MixLoRA	MixLoRA: Enhancing Large Language Models …	2024-04-22
Sentence Completion	LLaMA-3 8B + MixLoRA	MixLoRA: Enhancing Large Language Models …	2024-04-22
Sentence Completion	LLaMA-2 7B + MixLoRA	MixLoRA: Enhancing Large Language Models …	2024-04-22
parameter-efficient fine-tuning	LLaMA2-7b	DoRA: Weight-Decomposed Low-Rank Adaptation	2024-02-14
Sentence Completion	Camelidae-8×34B (10-shot)	Parameter-Efficient Sparsity Crafting from Dense …	2024-01-05
Sentence Completion	Qwen2idae-16x14B (10-shot)	Parameter-Efficient Sparsity Crafting from Dense …	2024-01-05
Sentence Completion	OPT-6.7B	LLM in a flash: Efficient …	2023-12-12
Sentence Completion	LLM in a Flash (OPT-6.7B with Predictor)	LLM in a flash: Efficient …	2023-12-12
Sentence Completion	Mamba-1.4B	Mamba: Linear-Time Sequence Modeling with …	2023-12-01
Sentence Completion	Mamba-2.8B	Mamba: Linear-Time Sequence Modeling with …	2023-12-01
Sentence Completion	Falcon-180B (0-shot)	The Falcon Series of Open …	2023-11-28
Sentence Completion	Falcon-7B (0-shot)	The Falcon Series of Open …	2023-11-28
Sentence Completion	Falcon-40B (0-shot)	The Falcon Series of Open …	2023-11-28
Sentence Completion	Open-LLaMA-3B-v2	Sheared LLaMA: Accelerating Language Model …	2023-10-10
Sentence Completion	Sheared-LLaMA-1.3B (50B)	Sheared LLaMA: Accelerating Language Model …	2023-10-10
Sentence Completion	Mistral 7B (0-shot)	Mistral 7B	2023-10-10
Sentence Completion	Sheared-LLaMA-2.7B (50B)	Sheared LLaMA: Accelerating Language Model …	2023-10-10

Research Papers

Recent papers with results on this dataset:
