BIG-bench

Beyond the Imitation Game Benchmark

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2022

Overview

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. BIG-bench includes more than 200 tasks.

Image source: https://arxiv.org/pdf/2206.04615.pdf

Variants: BBH-nlp, BBH-alg, Big-bench Hard, BIG-bench (Logical Sequence), BIG-bench (Logical Fallacy Detection), BIG-bench (Known Unknowns), BIG-bench (Hindu Knowledge), BIG-bench (Novel Concepts), BIG-bench (StrategyQA), BIG-bench (Winowhy), BIG-bench (Logic Grid Puzzle), BIG-bench (Anachronisms), BIG-bench (Temporal Sequences), BIG-bench (Sports Understanding), BIG-bench (SNARKS), BIG-bench (Ruin Names), BIG-bench (Reasoning About Colored Objects), BIG-bench (Penguins In A Table), BIG-bench (Navigate), BIG-bench (Movie Recommendation), BIG-bench (Hyperbaton), BIG-bench (Formal Fallacies Syllogisms Negation), BIG-bench (Disambiguation QA), BIG-bench (Date Understanding), BIG-bench (Causal Judgment), BIG-bench-lite, Big-bench Lite, BIG-bench

Associated Benchmarks

This dataset is used in 41 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
General Knowledge | Chinchilla-70B (few-shot, k=5) | Training Compute-Optimal Large Language Models | 2022-03-29
Intent Recognition | Chinchilla-70B (few-shot, k=5) | Training Compute-Optimal Large Language Models | 2022-03-29
Human Organs Senses Multiple Choice | Chinchilla-70B (few-shot, k=5) | Training Compute-Optimal Large Language Models | 2022-03-29
Identify Odd Metaphor | Chinchilla-70B (few-shot, k=5) | Training Compute-Optimal Large Language Models | 2022-03-29
Analogical Similarity | Chinchilla-70B (few-shot, k=5) | Training Compute-Optimal Large Language Models | 2022-03-29
Odd One Out | Chinchilla-70B (few-shot, k=5) | Training Compute-Optimal Large Language Models | 2022-03-29
Logical Fallacies | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
International Law | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
Odd One Out | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
High School World History | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
World Religions | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
Jurisprudence | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
High School US History | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
Management | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
Prehistory | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
High School European History | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
Philosophy | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
Professional Law | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
Marketing | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
General Knowledge | Gopher-280B (few-shot, k=5) | Scaling Language Models: Methods, Analysis … | 2021-12-08
