Bamboogle is a dataset of questions constructed to investigate the ability of language models to perform compositional (multi-hop) reasoning. Every question is one that Google answers incorrectly, so it cannot be solved by a single direct lookup. The questions cover a wide range of topics and question types, each written in its own way.
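The set is small, consisting of hand-written two-hop questions paired with short answers. Below is a minimal sketch of loading it, assuming a local CSV copy with `Question` and `Answer` columns; the file name and column names are assumptions about that local copy, not a documented schema.

```python
import csv

def load_bamboogle(path="bamboogle.csv"):
    """Load Bamboogle question/answer pairs from a local CSV export.

    The "Question"/"Answer" column names are assumed, not documented.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {"question": row["Question"], "answer": row["Answer"]}
            for row in csv.DictReader(f)
        ]

if __name__ == "__main__":
    data = load_bamboogle()
    print(f"{len(data)} questions loaded")
    print(data[0])  # a two-hop question and its short gold answer
```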
Variants: Bamboogle
This dataset is used in 1 benchmark; a minimal evaluation sketch follows the results table:
Task | Model | Paper | Date |
---|---|---|---|
Question Answering | ReST meets ReAct (PaLM 2-L + Google Search) | ReST meets ReAct: Self-Improvement for … | 2023-12-15 |
Question Answering | FireAct | FireAct: Toward Language Agent Fine-tuning | 2023-10-09 |
Question Answering | RALM (LLaMA2-13B + Google Search) | Making Retrieval-Augmented Language Models Robust … | 2023-10-02 |
Question Answering | MCR (code-davinci-002) + Google Search | Answering Questions by Meta-Reasoning over … | 2023-04-25 |
Question Answering | Direct Prompting (GPT-3; davinci-002) | Measuring and Narrowing the Compositionality … | 2022-10-07 |
Question Answering | Google Search | Measuring and Narrowing the Compositionality … | 2022-10-07 |
Question Answering | Self-ask (GPT-3; davinci-002) + Google Search | Measuring and Narrowing the Compositionality … | 2022-10-07 |
Question Answering | Self-ask (GPT-3; davinci-002) | Measuring and Narrowing the Compositionality … | 2022-10-07 |
Question Answering | Chain-of-Thought (GPT-3; davinci-002) | Measuring and Narrowing the Compositionality … | 2022-10-07 |
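The table aggregates question-answering results, which papers typically report as answer accuracy on Bamboogle. The sketch below shows an exact-match scoring loop under that assumption; `answer_question` is a hypothetical stand-in for whatever model or retrieval pipeline is being evaluated, and the SQuAD-style answer normalization is an assumption rather than a documented Bamboogle metric.

```python
import re
import string

def normalize(text: str) -> str:
    # Lowercase, drop punctuation and articles, collapse whitespace --
    # the usual open-domain QA normalization, assumed here.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)

def evaluate(examples, answer_question):
    # `answer_question` is a placeholder for the system under test,
    # e.g. a prompted LLM with or without a search tool.
    correct = sum(
        exact_match(answer_question(ex["question"]), ex["answer"])
        for ex in examples
    )
    return correct / len(examples)
```

Used together with the loader above, `evaluate(load_bamboogle(), my_system)` returns the fraction of the dataset answered correctly under this scoring assumption.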