TruthfulQA

Name: TruthfulQA
Published: 2021-09-08
License: Unknown

Dataset Information

Introduced

2021

License

Unknown

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. The authors crafted questions that some humans would answer falsely due to a false belief or misconception.

Image source: https://arxiv.org/pdf/2109.07958v1.pdf

Variants: TruthfulQA, TruthfulQA (0-shot), TruthfulQA TR v0.2, TruthfulQA TR

Associated Benchmarks

This dataset is used in 1 benchmark:

Question Answering - Metrics: MC1, MC2, % true, % info, % true (GPT-judge), BLEURT, ROUGE, BLEU, EM, Accuracy

Recent Benchmark Submissions

Task	Model	Paper	Date
Question Answering	Shakti-LLM (2.5B)	SHAKTI: A 2.5 Billion Parameter …	2024-10-15
Question Answering	CoA w/o actions	Chain-of-Action: Faithful and Multimodal Question …	2024-03-26
Question Answering	CoA	Chain-of-Action: Faithful and Multimodal Question …	2024-03-26
Question Answering	Mistral-7B-Instruct-v0.2 + TruthX	TruthX: Alleviating Hallucinations by Editing …	2024-02-27
Question Answering	LLaMa-2-7B-Chat + TruthX	TruthX: Alleviating Hallucinations by Editing …	2024-02-27
Question Answering	LLaMA-2-Chat-13B + Representation Control (Contrast Vector)	Representation Engineering: A Top-Down Approach …	2023-10-02
Question Answering	LLaMA-2-Chat-7B + Representation Control (Contrast Vector)	Representation Engineering: A Top-Down Approach …	2023-10-02
Question Answering	ToT	Tree of Thoughts: Deliberate Problem …	2023-05-17
Question Answering	GPT-4 (RLHF)	GPT-4 Technical Report	2023-03-15
Question Answering	LLaMA 13B	LLaMA: Open and Efficient Foundation …	2023-02-27
Question Answering	LLaMA 65B	LLaMA: Open and Efficient Foundation …	2023-02-27
Question Answering	LLaMA 7B	LLaMA: Open and Efficient Foundation …	2023-02-27
Question Answering	LLaMA 33B	LLaMA: Open and Efficient Foundation …	2023-02-27
Question Answering	GAL 120B	Galactica: A Large Language Model …	2022-11-16
Question Answering	GAL 30B	Galactica: A Large Language Model …	2022-11-16
Question Answering	OPT 175B	Galactica: A Large Language Model …	2022-11-16
Question Answering	GAL 125M	Galactica: A Large Language Model …	2022-11-16
Question Answering	GAL 1.3B	Galactica: A Large Language Model …	2022-11-16
Question Answering	GAL 6.7B	Galactica: A Large Language Model …	2022-11-16
Question Answering	Auto-CoT	Automatic Chain of Thought Prompting …	2022-10-07

Research Papers

Recent papers with results on this dataset:

External Links:

TruthfulQA

Overview edit

Associated Benchmarks

Recent Benchmark Submissions

Research Papers

Edit Dataset Information

Overview