Yifan Li (Gaoling School of Artificial Intelligence, Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods), Yifan Du (Gaoling School of Artificial Intelligence, Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods), Kun Zhou (School of Information, Renmin University of China), Jinpeng Wang (Meituan Group, [email protected]), Wayne Xin Zhao (Gaoling School of Artificial Intelligence, Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods), Ji-Rong Wen (Gaoling School of Artificial Intelligence and School of Information, Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods, [email protected]) (2023)
This paper investigates object hallucination in large vision-language models (LVLMs): the tendency of these models to generate objects that do not exist in the images they describe. The authors conduct a systematic study of several representative LVLMs, and their experiments indicate that LVLMs suffer from severe hallucination, even more so than smaller vision-language models. They also show that visual instructions influence hallucination: objects that appear frequently in the instruction data are more prone to being erroneously generated. To address the instability of existing metrics such as Caption Hallucination Assessment with Image Relevance (CHAIR), they propose a new evaluation method, Polling-based Object Probing Evaluation (POPE), which probes a model with simple yes/no questions about whether individual objects exist in an image and offers improved stability and flexibility. The authors conclude with a discussion of the implications and limitations of their findings, noting that further investigation into more LVLMs is necessary.
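The POPE protocol can be illustrated with a short sketch. The question template ("Is there a {object} in the image?") and the reported metrics (accuracy, precision, recall, F1, and the yes-ratio) follow the paper; everything else, including the function names and the way model answers are passed in, is a hypothetical stand-in rather than the authors' released implementation.

```python
from typing import Dict, Iterable, List

def build_pope_questions(object_names: Iterable[str]) -> List[str]:
    """Turn probed object names into yes/no polling questions,
    using the prompt template described in the paper."""
    return [f"Is there a {name} in the image?" for name in object_names]

def pope_metrics(answers: List[str], labels: List[str]) -> Dict[str, float]:
    """Score free-form yes/no answers against ground-truth labels.
    'yes' is treated as the positive class, as in the paper's setup."""
    is_yes = lambda s: s.strip().lower().startswith("yes")
    preds = [is_yes(a) for a in answers]
    golds = [is_yes(g) for g in labels]
    tp = sum(p and g for p, g in zip(preds, golds))
    fp = sum(p and not g for p, g in zip(preds, golds))
    fn = sum((not p) and g for p, g in zip(preds, golds))
    tn = sum((not p) and (not g) for p, g in zip(preds, golds))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": sum(preds) / len(preds),
    }

# Example: probe two objects, one present ("dog") and one absent ("umbrella").
questions = build_pope_questions(["dog", "umbrella"])
print(pope_metrics(["Yes, there is.", "Yes."], ["yes", "no"]))
# -> accuracy 0.5, yes_ratio 1.0: the model over-answers "yes",
#    a failure mode the yes-ratio is designed to surface.
```

Because the questions are binary and sampled per object, POPE does not depend on parsing long generated captions, which is what makes it more stable than caption-level metrics.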
This paper employs the following methods:
- Polling-based Object Probing Evaluation (POPE), the proposed yes/no polling protocol for probing object hallucination (illustrated in the sketch above)
- Caption Hallucination Assessment with Image Relevance (CHAIR), the existing caption-level hallucination metric used for comparison (a sketch of its computation follows this list)
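For reference, CHAIR itself is straightforward to compute once objects have been extracted from the generated captions. The following is a minimal sketch assuming that object extraction (normally done by matching caption words against an annotated object vocabulary) has already produced per-caption object sets; the function names are illustrative and not taken from the original CHAIR codebase.

```python
from typing import Dict, List, Set

def chair_scores(caption_objects: List[Set[str]],
                 gt_objects: List[Set[str]]) -> Dict[str, float]:
    """Instance-level CHAIR_I: hallucinated object mentions / all mentions.
    Sentence-level CHAIR_S: captions with any hallucination / all captions."""
    mentioned = hallucinated = bad_captions = 0
    for objs, truth in zip(caption_objects, gt_objects):
        fake = objs - truth               # mentioned but not in the image
        mentioned += len(objs)
        hallucinated += len(fake)
        bad_captions += bool(fake)
    return {
        "CHAIR_I": hallucinated / max(mentioned, 1),
        "CHAIR_S": bad_captions / max(len(caption_objects), 1),
    }

# Example: the first caption hallucinates "umbrella"; the second is clean.
print(chair_scores([{"dog", "umbrella"}, {"cat"}],
                   [{"dog"}, {"cat", "sofa"}]))
# -> CHAIR_I = 1/3, CHAIR_S = 1/2
```

The paper's observation is that both scores vary with the wording and length of the generated captions, which motivates the switch to POPE's polling formulation.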
The following datasets were used in this research:
- MSCOCO, whose object annotations support both the CHAIR evaluation and the construction of POPE probing questions
The authors identified the following limitations:
- Only a small set of representative LVLMs is evaluated; further investigation into more LVLMs is necessary.