Massive Multitask Language Understanding
MMLU (Massive Multitask Language Understanding) is a benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to the way we evaluate humans. The benchmark covers 57 subjects across STEM, the humanities, the social sciences, and more. It ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem-solving ability. Subjects range from traditional areas, such as mathematics and history, to more specialized areas such as law and ethics. The granularity and breadth of the subjects make the benchmark ideal for identifying a model's blind spots.
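To make the evaluation setup concrete, here is a minimal sketch of MMLU-style few-shot scoring. It assumes the Hugging Face `cais/mmlu` dataset mirror (per-subject configs whose items carry `question`, `choices`, and an integer `answer` index, with a 5-item `dev` split used for the shots) and a hypothetical `model_choose` callable standing in for a real model query; neither is part of the benchmark paper itself.

```python
# Minimal sketch of MMLU few-shot evaluation.
# Assumptions: the "cais/mmlu" Hugging Face mirror of the dataset, and a
# user-supplied `model_choose(prompt) -> "A".."D"` function (hypothetical).
from datasets import load_dataset

LETTERS = ["A", "B", "C", "D"]

def format_question(example, with_answer=False):
    """Render one multiple-choice item in the standard MMLU prompt style."""
    lines = [example["question"]]
    for letter, choice in zip(LETTERS, example["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append(f"Answer: {LETTERS[example['answer']]}" if with_answer else "Answer:")
    return "\n".join(lines)

def build_prompt(subject, dev_examples, test_example, k=5):
    """k-shot prompt: k solved dev items followed by the unanswered test item."""
    header = f"The following are multiple choice questions (with answers) about {subject}.\n\n"
    shots = "\n\n".join(format_question(ex, with_answer=True) for ex in dev_examples[:k])
    return header + shots + "\n\n" + format_question(test_example)

def evaluate_subject(subject, model_choose, k=5):
    """Accuracy of `model_choose` on one MMLU subject (k=0 gives zero-shot)."""
    dev = load_dataset("cais/mmlu", subject, split="dev")    # 5 held-out shots
    test = load_dataset("cais/mmlu", subject, split="test")
    correct = 0
    for ex in test:
        prompt = build_prompt(subject.replace("_", " "), list(dev), ex, k=k)
        if model_choose(prompt) == LETTERS[ex["answer"]]:
            correct += 1
    return correct / len(test)
```

Setting `k=0` drops the solved examples and yields the zero-shot variant; overall MMLU scores are typically reported as the average accuracy across all 57 subject configs.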
Image source: https://arxiv.org/pdf/2009.03300v3.pdf
Variants: mmlu, MMLU TR v0.2, MMLU TR, MMLU (5-Shot), MMLU (College Medicine), MMLU (Professional medicine), MMLU (Anatomy), MMLU (Clinical Knowledge), MMLU (Medical Genetics), MMLU (Mathematics), MMLU (Machine Learning), MMLU (High School Statistics), MMLU (High School Physics), MMLU (High School Mathematics), MMLU (High School Computer Science), MMLU (High School Chemistry), MMLU (High School Biology), MMLU (Formal Logic), MMLU (Elementary Mathematics), MMLU (Electrical Engineer), MMLU (Econometrics), MMLU (College Physics), MMLU (College Mathematics), MMLU (College Computer Science), MMLU (College Chemistry), MMLU (College Biology), MMLU (Astronomy), MMLU (Abstract Algebra), MML
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Question Answering | qwen-LLM 7B | SHAKTI: A 2.5 Billion Parameter … | 2024-10-15 |