This dataset is a benchmark for complex reasoning abilities in large language models, drawing on United Kingdom Linguistics Olympiad problems that cover a wide range of languages.
Variants: LingOly
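For readers who want to inspect the problems directly, below is a minimal sketch of loading the dataset with the Hugging Face `datasets` library, assuming the benchmark is published on the Hugging Face Hub; the dataset identifier and field names are placeholders, not confirmed by this page.

```python
# Minimal sketch: load and preview the LingOly benchmark, assuming a
# Hugging Face Hub release. The identifier below is a placeholder; replace
# it with the repository name published by the benchmark authors.
from datasets import load_dataset

DATASET_ID = "your-org/lingoly"  # placeholder identifier, not confirmed

dataset = load_dataset(DATASET_ID, split="test")

# Preview the first record; exact field names depend on the release,
# but each problem is expected to include a preamble, context, and questions.
example = dataset[0]
for key, value in example.items():
    print(f"{key}: {str(value)[:200]}")
```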
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Logical Reasoning | Claude Opus | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | GPT-4o | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | Gemini 1.5 Pro | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | GPT-4 | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | Command R+ | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | GPT-3.5 | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | Mixtral 8x7B | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | Llama 3 8B | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | Llama 3 70B | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | Gemma 7B | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
| Logical Reasoning | Llama 2 70B | LINGOLY: A Benchmark of Olympiad-Level … | 2024-06-10 |
Recent papers with results on this dataset: