
HellaSwag

Sentence Completion Benchmark

Performance Over Time

(Interactive chart omitted. 86 results; metric: Accuracy.)

Top Performing Models

| Rank | Model | Paper | Accuracy | Date | Code |
|------|-------|-------|----------|------|------|
| 1 | CompassMTL 567M with Tailor | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 96.10 | 2022-10-12 | 📦 cooelf/compassmtl |
| 2 | CompassMTL 567M | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 95.60 | 2022-10-12 | 📦 cooelf/compassmtl |
| 3 | DeBERTa-Large 304M (classification-based) | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | 95.60 | 2022-10-29 | 📦 declare-lab/team |
| 4 | GPT-4 (10-shot) | GPT-4 Technical Report | 95.30 | 2023-03-15 | 📦 openai/evals · 📦 shmsw25/factscore · 📦 unispac/visual-adversarial-examples-jailbreak-large-language-models |
| 5 | LLaMA3+MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 95.00 | 2024-06-16 | 📦 wutaiqiang/moslora |
| 6 | DeBERTa-Large 304M | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | 94.70 | 2022-10-29 | 📦 declare-lab/team |
| 7 | LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 94.70 | 2024-04-22 | 📦 TUDB-Labs/MixLoRA · 📦 mikecovlee/mLoRA |
| 8 | Unicorn 11B (fine-tuned) | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | 93.90 | 2021-03-24 | 📦 allenai/rainbow |
| 9 | LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 93.30 | 2024-04-22 | 📦 TUDB-Labs/MixLoRA · 📦 mikecovlee/mLoRA |
| 10 | LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 93.10 | 2024-04-22 | 📦 TUDB-Labs/MixLoRA · 📦 mikecovlee/mLoRA |
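HellaSwag is a four-way multiple-choice task: a model scores each candidate ending for a context and picks one, and the Accuracy column above is the fraction of examples where the chosen index matches the gold label. A minimal sketch of that metric, using made-up stand-in rows (the `ctx`/`label` field names follow the commonly distributed dataset layout, but verify against the actual release):

```python
def accuracy(predictions, labels):
    """Fraction of examples where the predicted ending index matches the gold label."""
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)

# Toy stand-ins for dataset rows: a context and a gold ending index in [0, 3].
examples = [
    {"ctx": "A man is sitting on a roof. He", "label": 3},
    {"ctx": "A woman pours batter into a pan. She", "label": 0},
]

# A real evaluator would score all four endings per example (e.g. by
# length-normalized log-likelihood) and take the argmax; here the
# predicted indices are simply hard-coded for illustration.
predictions = [3, 1]
gold = [ex["label"] for ex in examples]
print(f"accuracy = {accuracy(predictions, gold):.2f}")  # one of two correct -> 0.50
```

Leaderboard entries differ mainly in how the per-ending score is produced (zero-shot likelihood, few-shot prompting, or fine-tuned classification), not in this metric.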

All Papers (86)

- DiscoSense: Commonsense Reasoning with Discourse Connectives (2022) - ELECTRA-Large 335M (fine-tuned on DiscoSense and HellaSwag)
- Stay on topic with Classifier-Free Guidance (2023) - LLaMA 65B + CFG (0-shot)
- Stay on topic with Classifier-Free Guidance (2023) - LLaMA 30B + CFG (0-shot)
- Stay on topic with Classifier-Free Guidance (2023) - LLaMA 13B + CFG (0-shot)
- Language Models are Few-Shot Learners (2020) - GPT-3 175B (few-shot, k=32)
- Efficient Language Modeling with Sparse all-MLP (2022) - sMLP – deterministic 9.4B (0-shot)
- Language Models are Few-Shot Learners (2020) - GPT-3 Large 760M (0-shot)