
REPLUG: Retrieval-Augmented Black-Box Language Models

Weijia Shi (University of Washington), Sewon Min (University of Washington), Michihiro Yasunaga (Stanford University), Minjoon Seo (KAIST), Rich James (Meta AI), Mike Lewis (Meta AI), Luke Zettlemoyer (University of Washington, Meta AI), Wen-tau Yih (Meta AI) (2023)

Paper Information
arXiv ID
2301.12652
Venue
North American Chapter of the Association for Computational Linguistics
Domain
natural language processing
Reproducibility
6/10

Abstract

We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special cross attention mechanisms to encode the retrieved text, REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. This simple design can be easily applied to any existing retrieval and language models. Furthermore, we show that the LM can be used to supervise the retrieval model, which can then find documents that help the LM make better predictions. Our experiments demonstrate that REPLUG with the tuned retriever significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by 5.1%.

Summary

The paper introduces REPLUG, a retrieval-augmented language modeling framework that improves large language models (LLMs) by treating them as black boxes and pairing them with a tuneable retrieval model. Unlike prior approaches that modify the LM itself, REPLUG prepends each retrieved document to the input context and ensembles the LM's output probabilities across the retrieved documents, so it can be applied to any existing black-box LM. The authors further propose REPLUG LSR (LM-Supervised Retrieval), a training scheme in which the frozen language model itself supervises the retriever, teaching it to surface documents that help the LM make better predictions. With the tuned retriever, REPLUG improves GPT-3 (175B) language modeling by 6.3% and Codex five-shot MMLU performance by 5.1%, with consistent gains on downstream tasks such as open-domain question answering.
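The inference-time ensemble at the heart of REPLUG can be illustrated with a short sketch. This is a minimal illustration rather than the authors' released code: `lm_next_token_logprobs` is a hypothetical stand-in for a black-box LM API that returns next-token log-probabilities, and `docs` / `doc_scores` are assumed to come from any dense retriever.

```python
import numpy as np

def replug_next_token_distribution(query, docs, doc_scores, lm_next_token_logprobs):
    """REPLUG inference: prepend each retrieved document to the query, call the
    frozen black-box LM once per document, then ensemble the k output
    distributions, weighted by the retrieval likelihood (a softmax over the
    document similarity scores)."""
    scores = np.asarray(doc_scores, dtype=np.float64)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()  # retrieval likelihood over the top-k documents

    ensembled = None
    for doc, weight in zip(docs, weights):
        # One black-box LM call per document, with the document prepended.
        logprobs = np.asarray(lm_next_token_logprobs(doc + "\n\n" + query))
        probs = np.exp(logprobs)
        ensembled = weight * probs if ensembled is None else ensembled + weight * probs
    return ensembled  # weighted mixture of next-token distributions
```

Because the LM is only queried, never updated, this works with API-only models; the cost is k forward passes instead of one.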

Methods

This paper employs the following methods:

  • REPLUG (retrieval plus prepend-and-ensemble inference with a frozen black-box LM)
  • REPLUG LSR (LM-Supervised Retrieval; its training objective is sketched below)
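REPLUG LSR trains only the retriever: its distribution over the k retrieved documents is pushed, via a KL-divergence loss, toward the distribution induced by how much each document improves the frozen LM's likelihood of the ground-truth continuation. A minimal PyTorch sketch of that objective, assuming hypothetical tensors `retriever_scores` and `lm_gold_logprobs` and treating the temperatures `gamma` and `beta` as hyperparameters:

```python
import torch
import torch.nn.functional as F

def lsr_loss(retriever_scores, lm_gold_logprobs, gamma=1.0, beta=1.0):
    """REPLUG LSR objective: KL(P_retrieval || Q_LM) over k retrieved docs.

    retriever_scores: (k,) trainable similarity scores s(d, x) from the retriever.
    lm_gold_logprobs: (k,) log P_LM(y | d, x) of the ground-truth continuation y
                      with each document prepended; the LM is frozen, so these
                      carry no gradient.
    """
    log_p = F.log_softmax(retriever_scores / gamma, dim=-1)          # retrieval likelihood
    log_q = F.log_softmax(lm_gold_logprobs.detach() / beta, dim=-1)  # LM likelihood
    # KL(P || Q) = sum_d P(d) * (log P(d) - log Q(d)); gradients reach only
    # the retriever, through log_p.
    return (log_p.exp() * (log_p - log_q)).sum()
```

Since the retriever's document embeddings go stale as it trains, the paper periodically re-encodes the datastore during LSR training.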

Models Used

  • GPT-3
  • Codex
  • OPT
  • BLOOM

Datasets

The following datasets were used in this research:

  • The Pile
  • MMLU
  • Natural Questions
  • TriviaQA

Evaluation Metrics

  • Bits per UTF-8 encoded byte (BPB; see the sketch after this list)
  • Language Model Perplexity
  • Accuracy
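Of these, BPB is the least standard: it converts the model's total negative log-likelihood of a text into bits and normalizes by the text's UTF-8 byte length, making scores comparable across models with different tokenizers. A small illustrative helper, assuming the per-token negative log-likelihoods (in nats) are given:

```python
import math

def bits_per_byte(token_nlls_nats, text):
    """Bits per UTF-8 encoded byte (BPB): total NLL of `text` (summed over
    tokens, in nats), converted to bits and divided by the UTF-8 byte count."""
    total_bits = sum(token_nlls_nats) / math.log(2)  # nats -> bits
    return total_bits / len(text.encode("utf-8"))
```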

Results

  • REPLUG with the tuned retriever improves GPT-3 (175B) language modeling by 6.3%
  • REPLUG LSR improves Codex on five-shot MMLU by 5.1%
  • REPLUG achieves a 7.7% improvement across various language models

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: 72
  • GPU Type: A100 80GB

Keywords

retrieval-augmented models, black-box language models, large language models, retrieval training scheme

External Resources