SELF-RAG: LEARNING TO RETRIEVE, GENERATE, AND CRITIQUE THROUGH SELF-REFLECTION

Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi (2023)
Affiliations: University of Washington; Allen Institute for AI; IBM Research AI

Paper Information

arXiv ID

2310.11511

Venue

International Conference on Learning Representations

Domain

Natural Language Processing

SOTA Claim

Yes

Code

Available

Reproducibility

8/10

Contents

Abstract
Methods
Datasets
Results
Related Work
External Resources

Abstract

Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (SELF-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that SELF-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, SELF-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models. Our code and trained models are available at https://selfrag.github.io/.

Summary

The paper presents a new framework named Self-Reflective Retrieval-Augmented Generation (SELF-RAG) which aims to improve large language models (LLMs) by integrating adaptive retrieval and self-reflection components. The framework addresses the limitation of traditional Retrieval-Augmented Generation (RAG) methods that may introduce irrelevant information, thereby reducing versatility and quality in output generation. SELF-RAG introduces reflection tokens that guide the model in determining when retrieval is necessary and critiquing its own responses. The framework trains an LLM end-to-end to better reflect on prior outputs and retrieved passages, enabling on-demand retrieval and enhancing factual accuracy in various tasks. Empirical evaluations demonstrate that SELF-RAG significantly outperforms state-of-the-art models like ChatGPT and Llama2-chat in tasks including open-domain question answering, reasoning, and long-form generation, showcasing improvements in factuality and citation accuracy.
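The control flow described above (decide whether to retrieve, generate from each passage, then keep the best self-critiqued output) can be illustrated with a toy sketch. This is not the authors' implementation: in SELF-RAG a single LM emits the reflection tokens itself, whereas here the decision, retrieval, and generation steps are split into separate stub callables for readability. The token strings, the `Candidate` type, the `answer` function, and the word-overlap critique score are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical reflection-token strings; placeholders, not the paper's vocabulary.
RETRIEVE = "[Retrieve]"
NO_RETRIEVE = "[No Retrieve]"


@dataclass
class Candidate:
    passage: Optional[str]  # passage the output was conditioned on, if any
    text: str               # generated output
    critique: float         # score standing in for the model's critique tokens


def answer(
    query: str,
    decide: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, Optional[str]], Candidate],
) -> Candidate:
    """Retrieve only on demand, then keep the best-critiqued candidate."""
    if decide(query) != RETRIEVE:
        return generate(query, None)
    passages = retrieve(query)
    if not passages:
        return generate(query, None)
    candidates = [generate(query, p) for p in passages]
    return max(candidates, key=lambda c: c.critique)


# --- Toy stand-ins for the model and retriever (assumptions for the demo) ---

def toy_decide(query: str) -> str:
    # Pretend the model asks for retrieval on factual-looking questions.
    factual = query.lower().startswith(("who", "when", "where"))
    return RETRIEVE if factual else NO_RETRIEVE


def toy_retrieve(query: str) -> List[str]:
    return ["Paris is the capital of France.", "Bananas are yellow."]


def toy_generate(query: str, passage: Optional[str]) -> Candidate:
    if passage is None:
        return Candidate(None, "(output from parametric knowledge)", 0.0)
    # Toy critique: fraction of query words that also occur in the passage.
    q_words = set(query.lower().strip("?").split())
    p_words = set(passage.lower().strip(".").split())
    return Candidate(passage, f"Based on: {passage}", len(q_words & p_words) / len(q_words))


best = answer("Where is the capital of France?", toy_decide, toy_retrieve, toy_generate)
free = answer("Write a haiku about rain", toy_decide, toy_retrieve, toy_generate)
print(best.passage)
print(free.passage)
```

In this sketch the factual question triggers retrieval and the passage with the higher critique score is kept, while the creative prompt skips retrieval entirely, which mirrors the on-demand behavior the summary describes.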

Methods

This paper employs the following methods:

RAG
SELF-RAG

Models Used

ChatGPT
Llama2
SELF-RAG

Datasets

The following datasets were used in this research:

PubHealth
ARC-Challenge
PopQA
TriviaQA-unfiltered
ALCE-ASQA

Evaluation Metrics

Accuracy
FactScore
MAUVE
ROUGE
citation precision
citation recall

Results

SELF-RAG outperforms ChatGPT and Llama2-chat on Open-domain QA and fact verification tasks
SELF-RAG improves citation accuracy for long-form generations

Technical Requirements

Number of GPUs: 4
GPU Type: Nvidia A100 80GB

Keywords

retrieval-augmented generation
self-reflection
self-critique
controllable generation
factuality
learning to retrieve and critique

External Resources

Funding: Various including DARPA, NSF, IBM, OpenAI, and others
References: 57
Influential Citations: 92
