
Lost in the Middle: How Language Models Use Long Contexts

Nelson F. Liu (Stanford University, USA), Kevin Lin (University of California, Berkeley, USA), John Hewitt (Stanford University, USA), Ashwin Paranjape (Samaya AI), Michele Bevilacqua (Samaya AI, UK), Fabio Petroni (Samaya AI, UK), Percy Liang (Stanford University, USA) (2023)

Paper Information
arXiv ID: 2307.03172
Venue: Transactions of the Association for Computational Linguistics
Domain: natural language processing
Reproducibility: 7/10

Abstract

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
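
The key-value retrieval task mentioned above is a synthetic probe: the model receives a JSON object of random key-value pairs and must return the value associated with one specified key. Below is a minimal sketch of how such inputs can be constructed, assuming the paper's described setup of UUID keys and values; the prompt wording approximates the paper's description rather than reproducing the authors' exact template.

```python
import json
import uuid

def make_kv_prompt(num_pairs: int, gold_position: int):
    """Build a synthetic key-value retrieval prompt: a JSON object of random
    UUID keys and values, querying the key at index `gold_position`."""
    pairs = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(num_pairs)]
    gold_key, gold_value = pairs[gold_position]
    kv_json = json.dumps(dict(pairs), indent=1)  # dict preserves insertion order
    prompt = (
        "Extract the value corresponding to the specified key in the "
        "JSON object below.\n\n"
        f"JSON data:\n{kv_json}\n\n"
        f'Key: "{gold_key}"\n'
        "Corresponding value:"
    )
    return prompt, gold_value

# Sweeping gold_position over 0..num_pairs-1 reproduces the positional analysis.
prompt, expected_value = make_kv_prompt(num_pairs=75, gold_position=37)
```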

Summary

In "Lost in the Middle: How Language Models Use Long Contexts," the authors analyze how effectively language models use long input contexts. Experiments on multi-document question answering and key-value retrieval show that performance degrades significantly when the relevant information sits in the middle of the input context: performance is highest when that information appears at the beginning (a primacy bias) or the end (a recency bias) of the context. The study examines factors that might explain this behavior, including model architecture (decoder-only versus encoder-decoder), query-aware contextualization, and instruction fine-tuning, and finds that extended-context models are not necessarily better at using long contexts than their shorter-context counterparts. A case study on open-domain question answering further shows that simply retrieving more documents does not guarantee better answers: reader performance saturates before retriever recall does. The paper concludes by proposing new evaluation protocols for future long-context models and releasing code and evaluation data for further research.
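
The paper's central manipulation can be pictured concretely: hold the set of input documents fixed and vary only the index at which the answer-bearing document appears. A minimal sketch of building such prompts (the instruction text approximates the paper's description and is not the authors' released code):

```python
def build_qa_prompt(question: str, distractors: list[str], gold_doc: str,
                    gold_position: int) -> str:
    """Insert the answer-bearing document at `gold_position` among the
    distractors, then serialize everything into a single prompt."""
    docs = list(distractors)
    docs.insert(gold_position, gold_doc)
    context = "\n".join(f"Document [{i + 1}] {d}" for i, d in enumerate(docs))
    return (
        "Write a high-quality answer for the given question using only the "
        f"provided search results.\n\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Sweeping gold_position from the first slot to the last while keeping everything else identical is what produces the U-shaped accuracy curves the paper reports.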

Methods

This paper employs the following methods:

  • Transformer
  • query-aware contextualization (sketched after this list)
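
Query-aware contextualization places the query both before and after the documents, so that even a decoder-only model (which cannot attend to future tokens) processes the context with the question already in scope. A minimal sketch, with illustrative prompt wording:

```python
def query_aware_prompt(question: str, context: str) -> str:
    """Repeat the question on both sides of the long context so a decoder-only
    model contextualizes every document token with the query visible."""
    return f"Question: {question}\n\n{context}\n\nQuestion: {question}\nAnswer:"
```

The paper reports that this helps dramatically on key-value retrieval but changes the multi-document QA trends only marginally.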

Models Used

  • MPT-30B-Instruct
  • LongChat-13B (16K)
  • GPT-3.5-Turbo
  • Claude-1.3

Datasets

The following datasets were used in this research:

  • NaturalQuestions-Open

Evaluation Metrics

  • Accuracy (whether any gold answer appears in the model output; see the sketch below)
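
For the QA tasks, accuracy is judged by whether any annotated gold answer appears in the model's output. A minimal sketch of that substring-match check; the normalization follows common open-domain QA practice and is a simplifying assumption here:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def is_correct(prediction: str, gold_answers: list[str]) -> bool:
    """True if any normalized gold answer is a substring of the normalized
    prediction."""
    pred = normalize(prediction)
    return any(normalize(answer) in pred for answer in gold_answers)
```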

Results

  • Performance degrades significantly when relevant information is in the middle of the input context
  • High performance is observed when relevant information is at the start or end (primacy and recency bias)
  • Extended-context models are no better than their regular-context counterparts at using the information in their input context
  • In the open-domain QA case study, reader performance saturates before retriever recall does

Limitations

The authors identified the following limitations:

  • Current language models do not robustly access and utilize information in long input contexts
  • Longer input contexts give the model more content to reason over, which can itself reduce accuracy
  • Limited exploration of decoding methods beyond greedy decoding (see the sketch after this list)
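
On the last point: all results are reported with greedy decoding. A minimal sketch of greedy generation via the Hugging Face transformers API (the model name is a small placeholder, not one of the paper's evaluated models):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper evaluates e.g. MPT-30B-Instruct
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Question: Where was the answer placed?\nAnswer:",
                   return_tensors="pt")
# do_sample=False selects the argmax token at each step (greedy decoding);
# alternatives such as nucleus sampling (do_sample=True, top_p=0.9) were not explored.
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```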

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

long input context, transformers, model architecture, question answering, retrieval
