TruthfulQA

Dataset Information
Introduced: 2021
License: Unknown

Overview

TruthfulQA is a benchmark for measuring whether a language model generates truthful answers to questions. It comprises 817 questions spanning 38 categories, including health, law, finance and politics. The authors crafted questions that some humans would answer falsely because of a false belief or misconception.
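
Because the questions are grouped into categories, results on a benchmark like this are often easier to read as a per-category truthfulness rate alongside the overall rate. The following is a minimal sketch of that tally, not the benchmark's own evaluation code: the record layout, the `category` and `truthful` field names, and the sample judgments are all hypothetical, and how an answer gets judged truthful is outside what this page describes.

```python
from collections import defaultdict


def truthful_rate_by_category(records):
    """Return {category: fraction of answers judged truthful}.

    `records` is a hypothetical list of dicts with a "category" string
    and a boolean "truthful" judgment for one model answer each.
    """
    totals = defaultdict(int)
    truthful = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        if rec["truthful"]:
            truthful[rec["category"]] += 1
    return {cat: truthful[cat] / totals[cat] for cat in totals}


# Made-up judgments for illustration only; these are not benchmark results.
records = [
    {"category": "health", "truthful": True},
    {"category": "health", "truthful": False},
    {"category": "law", "truthful": True},
]

rates = truthful_rate_by_category(records)
overall = sum(rec["truthful"] for rec in records) / len(records)
```

Reporting both figures helps show whether a model's untruthful answers are concentrated in a few categories or spread evenly across them.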

Image source: https://arxiv.org/pdf/2109.07958v1.pdf

Variants: TruthfulQA, TruthfulQA (0-shot), TruthfulQA TR v0.2, TruthfulQA TR

Associated Benchmarks

This dataset is used in 1 benchmark:

  • Question Answering

Recent Benchmark Submissions

| Task | Model | Paper | Date |
|---|---|---|---|
| Question Answering | Shakti-LLM (2.5B) | SHAKTI: A 2.5 Billion Parameter … | 2024-10-15 |
| Question Answering | CoA w/o actions | Chain-of-Action: Faithful and Multimodal Question … | 2024-03-26 |
| Question Answering | CoA | Chain-of-Action: Faithful and Multimodal Question … | 2024-03-26 |
| Question Answering | Mistral-7B-Instruct-v0.2 + TruthX | TruthX: Alleviating Hallucinations by Editing … | 2024-02-27 |
| Question Answering | LLaMa-2-7B-Chat + TruthX | TruthX: Alleviating Hallucinations by Editing … | 2024-02-27 |
| Question Answering | LLaMA-2-Chat-13B + Representation Control (Contrast Vector) | Representation Engineering: A Top-Down Approach … | 2023-10-02 |
| Question Answering | LLaMA-2-Chat-7B + Representation Control (Contrast Vector) | Representation Engineering: A Top-Down Approach … | 2023-10-02 |
| Question Answering | ToT | Tree of Thoughts: Deliberate Problem … | 2023-05-17 |
| Question Answering | GPT-4 (RLHF) | GPT-4 Technical Report | 2023-03-15 |
| Question Answering | LLaMA 13B | LLaMA: Open and Efficient Foundation … | 2023-02-27 |
| Question Answering | LLaMA 65B | LLaMA: Open and Efficient Foundation … | 2023-02-27 |
| Question Answering | LLaMA 7B | LLaMA: Open and Efficient Foundation … | 2023-02-27 |
| Question Answering | LLaMA 33B | LLaMA: Open and Efficient Foundation … | 2023-02-27 |
| Question Answering | GAL 120B | Galactica: A Large Language Model … | 2022-11-16 |
| Question Answering | GAL 30B | Galactica: A Large Language Model … | 2022-11-16 |
| Question Answering | OPT 175B | Galactica: A Large Language Model … | 2022-11-16 |
| Question Answering | GAL 125M | Galactica: A Large Language Model … | 2022-11-16 |
| Question Answering | GAL 1.3B | Galactica: A Large Language Model … | 2022-11-16 |
| Question Answering | GAL 6.7B | Galactica: A Large Language Model … | 2022-11-16 |
| Question Answering | Auto-CoT | Automatic Chain of Thought Prompting … | 2022-10-07 |

Research Papers

Recent papers with results on this dataset: