
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

Lei Huang, Weijiang Yu*, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen*, Weihua Peng*, Xiaocheng Feng (xcfeng@ir.hit.edu.cn), Bing Qin, Ting Liu (2023). Harbin Institute of Technology, Harbin, China; *Huawei Inc., Shenzhen, China.

Paper Information
arXiv ID: 2311.05232
Venue: ACM Transactions on Information Systems
Domain: Artificial Intelligence, Machine Learning, Natural Language Processing
SOTA Claim: Yes
Reproducibility: 8/10

Abstract

The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses substantial challenges to their practical deployment and raises concerns over the reliability of LLMs in real-world scenarios, attracting increasing attention to the detection and mitigation of these hallucinations. In this survey, we aim to provide a thorough and in-depth overview of recent advances in the field of LLM hallucinations. We begin with an innovative taxonomy of LLM hallucinations, then delve into the factors contributing to hallucinations. Subsequently, we present a comprehensive overview of hallucination detection methods and benchmarks. Additionally, representative approaches designed to mitigate hallucinations are introduced accordingly. Finally, we analyze the challenges that highlight the current limitations and formulate open questions, aiming to delineate pathways for future research on hallucinations in LLMs.

Summary

This academic survey investigates hallucination in large language models (LLMs), highlighting their tendency to generate plausible but factually incorrect content and the implications for their reliability in real-world scenarios. It presents a new taxonomy of hallucination types: factuality hallucinations (discrepancies with verifiable real-world facts) and faithfulness hallucinations (divergences from user instructions or provided context). The survey outlines the factors contributing to these hallucinations, spanning data flaws, training issues, and inference complications, and reviews detection methods and evaluation benchmarks. It then discusses mitigation strategies aimed at better aligning LLM outputs with factual correctness and user directives, noting the challenges that remain in ensuring LLM reliability and safety in practical applications. Finally, the paper identifies open questions for future research on LLM hallucinations, advocating for a deeper understanding of their sources and solutions.
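
Among the detection approaches the survey covers are sampling-based consistency checks, which flag an answer as a likely factuality hallucination when independently sampled responses to the same prompt disagree with it. The snippet below is a minimal, self-contained sketch of that general idea, not the survey's own implementation; the answers are hypothetical stand-ins for outputs an LLM API would produce, and the overlap threshold is an assumed illustrative value.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for rough comparison."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two answers (0.0 to 1.0)."""
    ta, tb = set(_normalize(a).split()), set(_normalize(b).split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def consistency_score(answer: str, samples: list[str], threshold: float = 0.5) -> float:
    """Fraction of sampled answers that roughly agree with the main answer.

    A low score suggests the main answer may be a factuality hallucination.
    """
    if not samples:
        return 0.0
    agree = sum(token_overlap(answer, s) >= threshold for s in samples)
    return agree / len(samples)


if __name__ == "__main__":
    # Hypothetical samples for "Who proposed the theory of general relativity?"
    main_answer = "Albert Einstein proposed the theory of general relativity in 1915."
    samples = [
        "The theory of general relativity was proposed by Albert Einstein.",
        "Albert Einstein proposed general relativity in 1915.",
        "It was developed by Isaac Newton.",  # an inconsistent sample
    ]
    print(f"consistency = {consistency_score(main_answer, samples):.2f}")
```

In practice, published methods of this family compare samples with stronger semantic measures (e.g., NLI or LLM judges) rather than token overlap; the lexical check above is only meant to make the underlying intuition concrete.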

Models Used

  • GPT-3
  • PaLM
  • Galactica
  • LLaMA
  • GPT-4

Datasets

The following benchmark datasets are discussed or used for evaluation in this survey:

  • TruthfulQA
  • REALTIMEQA
  • Med-HALT
  • FACTOR
  • ChineseFactEval
  • HalluQA

Evaluation Metrics

  • ROUGE
  • Accuracy
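
As a rough illustration of how these two metrics are typically applied in hallucination benchmarks, the sketch below implements ROUGE-L (an LCS-based F1 over tokens) and plain exact-match accuracy from scratch. It is a self-contained example under my own assumptions, not the official scorer of any benchmark listed above.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence between two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a, 1):
        for j, tb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ta == tb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of exact matches, as used in multiple-choice style evaluation."""
    assert len(gold) == len(predicted) and gold
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    print(f"ROUGE-L F1: {rouge_l_f1('the cat sat on the mat', 'the cat lay on the mat'):.3f}")
    print(f"Accuracy:   {accuracy(['A', 'B', 'C'], ['A', 'B', 'D']):.3f}")
```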

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Hallucination, Large Language Models, Detection, Mitigation, Evaluation Benchmarks, Factuality, Faithfulness

External Resources