
StoryCloze

Question Answering Benchmark

Performance Over Time

Metric: Accuracy | 20 results

Top Performing Models

| Rank | Model | Paper | Accuracy (%) | Date | Code |
|------|-------|-------|--------------|------|------|
| 1 | BLOOMZ | Crosslingual Generalization through Multitask Finetuning | 96.30 | 2022-11-03 | bigscience-workshop/xmtf |
| 2 | Flipped-3B | Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | 95.88 | 2022-10-06 | seonghyeonye/flipped-learning |
| 3 | FLAN 137B (few-shot, k=10) | Finetuned Language Models Are Zero-Shot Learners | 94.70 | 2021-09-03 | hiyouga/llama-efficient-tuning, bigcode-project/starcoder, bigscience-workshop/promptsource |
| 4 | T0-3B (CoT fine-tuned) | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | 94.50 | 2023-05-23 | kaistai/cot-collection, kaist-lklab/cot-collection |
| 5 | KiC-770M | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | 94.40 | 2022-10-28 | - |
| 6 | FLAN 137B (zero-shot) | Finetuned Language Models Are Zero-Shot Learners | 93.40 | 2021-09-03 | hiyouga/llama-efficient-tuning, bigcode-project/starcoder, bigscience-workshop/promptsource |
| 7 | Reading Strategies Model | Improving Machine Reading Comprehension with General Reading Strategies | 88.30 | 2018-10-31 | nlpdata/strategy |
| 8 | RoE-3B | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | 86.33 | 2023-02-07 | joeljang/rlphf, joeljang/elm |
| 9 | OPT-175B | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 79.82 | 2023-01-02 | nvidia/tensorrt-model-optimizer, ist-daslab/sparsegpt, nvlabs/maskllm |
| 10 | SparseGPT (175B, 50% Sparsity) | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 78.87 | 2023-01-02 | nvidia/tensorrt-model-optimizer, ist-daslab/sparsegpt, nvlabs/maskllm |

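In the Story Cloze Test, a model is given a four-sentence story context and must choose the correct fifth-sentence ending from two candidates; the reported accuracy is the fraction of stories for which it prefers the right ending. Zero-shot and few-shot entries in the table above typically obtain this by comparing the language model's likelihood of each candidate ending given the context. The following is a minimal sketch of that scoring scheme, not any particular paper's evaluation code; it assumes a Hugging Face causal LM ("gpt2" stands in for any model) and illustrative field names.

```python
# Sketch: likelihood-based StoryCloze scoring (assumptions: any Hugging Face
# causal LM; "gpt2" is only a placeholder, and the data layout is illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def ending_logprob(context: str, ending: str) -> float:
    """Sum of log-probabilities of the ending tokens, conditioned on the context."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + " " + ending, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position t predicts token t+1, so drop the last logit and the first target.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    # Approximate boundary between context and ending tokens; joint tokenization
    # of the concatenated string can shift it by a token or so.
    start = ctx_ids.shape[1] - 1
    return log_probs[start:].gather(1, targets[start:, None]).sum().item()

def predict_ending(context: str, ending1: str, ending2: str) -> int:
    """Return 1 or 2 for whichever candidate ending the model scores higher."""
    return 1 if ending_logprob(context, ending1) >= ending_logprob(context, ending2) else 2

# Accuracy over a list of (context, ending1, ending2, gold_label) examples:
# acc = sum(predict_ending(c, e1, e2) == gold for c, e1, e2, gold in data) / len(data)
```

Summing token log-probabilities favors shorter endings; some evaluations instead normalize by ending length or by the unconditional likelihood of the ending, which can shift accuracy by a point or two.
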
All Papers (20)

Efficient Language Modeling with Sparse all-MLP (2022): sMLP – deterministic 9.4B (0-shot)

Language Models are Few-Shot Learners (2020): GPT-3 Large 760M (zero-shot)