The MMLU-Pro dataset is an enhanced version of the Massive Multitask Language Understanding (MMLU) benchmark. It is designed to be more robust and more challenging than the original, and to rigorously test large language models' language comprehension and reasoning across diverse domains.
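As a rough illustration of how a multiple-choice benchmark of this kind is typically scored, the sketch below renders a question with lettered options as a prompt and computes accuracy over predicted letters. The record fields (`question`, `options`, `answer`), the prompt layout, and the two toy records are assumptions made for this example; they are not taken from MMLU-Pro and may not match its actual schema or official evaluation script.

```python
from string import ascii_uppercase


def format_prompt(question: str, options: list[str]) -> str:
    """Render a question and its lettered options as a plain-text prompt."""
    lines = [f"Question: {question}"]
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of predicted letters that match the gold letters."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p.strip().upper() == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical toy records for illustration only (not MMLU-Pro data).
records = [
    {"question": "What is 2 + 3?", "options": ["4", "5", "6"], "answer": "B"},
    {"question": "Which is a prime number?", "options": ["8", "9", "11"], "answer": "C"},
]

prompts = [format_prompt(r["question"], r["options"]) for r in records]
predictions = ["B", "A"]  # stand-in for model outputs; the second is wrong
score = accuracy(predictions, [r["answer"] for r in records])
```

In practice the predictions would come from parsing a model's response to each prompt; the hard-coded list here only keeps the example self-contained.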
Variants: MMLU-Pro
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
MMLU | Orange-mini | MyGO Multiplex CoT: A Method … | 2025-01-20 |
Recent papers with results on this dataset: