
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim (2024). ♠ Johns Hopkins University, ♡ Microsoft

Paper Information
arXiv ID
2401.08417
Venue
International Conference on Machine Learning
Domain
Not specified
SOTA Claim
Yes

Abstract

Moderate-sized large language models (LLMs), those with 7B or 13B parameters, exhibit promising machine translation (MT) performance. However, they do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4 (OpenAI, 2023). In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to supervised fine-tuning, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA (Xu et al., 2023) models with only 22K parallel sentences and tuning only 0.1% of parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.

Summary

This paper introduces Contrastive Preference Optimization (CPO), a novel method aimed at enhancing the performance of moderate-sized large language models (LLMs) in machine translation (MT). It identifies the limitations of supervised fine-tuning (SFT) for MT, which stem from its reliance on potentially flawed human-generated reference data. Rather than teaching models to replicate imperfect translations, CPO pushes them toward higher-quality outputs. The authors apply CPO to the ALMA models, which had already achieved competitive results through prior fine-tuning on non-English monolingual data and then on high-quality parallel data. Their enhanced model, ALMA-R, matches or exceeds the performance of state-of-the-art systems, including GPT-4 and past WMT competition winners, across multiple test datasets. The study also critically assesses the quality of reference translations and advocates the use of reference-free evaluation metrics.
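
To make the preference-data step concrete: for each source sentence, the paper scores several candidate translations (the gold reference plus GPT-4 and ALMA outputs) with reference-free metrics and keeps the highest-scoring one as the preferred output and the lowest-scoring as the dispreferred one. The snippet below is a minimal sketch of that selection logic, not the authors' code; the function name, data layout, and `score_fn` wrapper are illustrative assumptions.

```python
# Illustrative sketch: build (preferred, dispreferred) translation pairs by
# ranking candidate translations with a reference-free quality score.
# `score_fn` and the example layout are hypothetical placeholders.
from typing import Callable, Dict, List, Tuple

def build_preference_pairs(
    examples: List[Dict],                   # each: {"src": str, "candidates": List[str]}
    score_fn: Callable[[str, str], float],  # reference-free metric, e.g. a COMET-KIWI wrapper
) -> List[Tuple[str, str, str]]:
    """Return (source, preferred, dispreferred) triplets."""
    triplets = []
    for ex in examples:
        # Rank all candidate translations for this source by estimated quality.
        ranked = sorted(ex["candidates"], key=lambda mt: score_fn(ex["src"], mt))
        best, worst = ranked[-1], ranked[0]
        if best != worst:  # skip degenerate cases where best and worst coincide
            triplets.append((ex["src"], best, worst))
    return triplets
```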

Methods

This paper employs the following methods:

  • Contrastive Preference Optimization (see the loss sketch after this list)
  • Supervised Fine-Tuning
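
As described in the paper, the CPO objective combines a DPO-style preference loss with the frozen reference model dropped and a negative log-likelihood term on the preferred translation. The PyTorch sketch below reflects that reading; the function signature, the use of summed (rather than length-normalized) sequence log-probabilities, and the default beta are assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the CPO objective: a reference-model-free preference term
# plus an NLL term on the preferred (chosen) translation. Shapes are (batch,).
import torch
import torch.nn.functional as F

def cpo_loss(policy_chosen_logps: torch.Tensor,    # log-probs of preferred outputs under the policy
             policy_rejected_logps: torch.Tensor,  # log-probs of dispreferred outputs under the policy
             beta: float = 0.1) -> torch.Tensor:
    # Preference term: push preferred outputs above dispreferred ones (sigmoid-log loss),
    # computed from the policy alone, i.e. no frozen reference model as in DPO.
    prefer_loss = -F.logsigmoid(beta * (policy_chosen_logps - policy_rejected_logps))
    # Behaviour-cloning term: standard negative log-likelihood of the preferred translation.
    nll_loss = -policy_chosen_logps
    return (prefer_loss + nll_loss).mean()
```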

Models Used

  • ALMA
  • ALMA-R
  • GPT-4
  • ALMA-13B-LoRA

Datasets

The following datasets were used in this research:

  • FLORES-200
  • WMT'21
  • WMT'22
  • WMT'23

Evaluation Metrics

  • BLEU
  • COMET
  • KIWI-XXL
  • XCOMET
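
KIWI-XXL and XCOMET, listed above, are neural metrics from the COMET family that can score translations without a reference, which is central to the paper's argument about flawed gold data. Below is a minimal scoring sketch assuming Unbabel's `comet` package; the checkpoint name is an assumption (the XXL variants are larger, gated checkpoints) and should be verified against the model hub.

```python
# Sketch: scoring MT output with a COMET-family, reference-free metric via
# Unbabel's `comet` package (pip install unbabel-comet). The checkpoint name
# is an assumption and may require Hugging Face Hub access.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-cometkiwi-da")  # reference-free QE variant
model = load_from_checkpoint(model_path)

data = [{"src": "Der Hund bellt.", "mt": "The dog is barking."}]  # no "ref" field needed
output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)        # per-segment scores
print(output.system_score)  # corpus-level average
```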

Results

  • ALMA-R matches or exceeds the performance of GPT-4 and WMT competition winners
  • ALMA-R demonstrates significant improvements over the original ALMA model and other state-of-the-art models

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Papers Using Similar Methods

External Resources