
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim (2024). ♠ Johns Hopkins University, ♡ Microsoft

Paper Information
arXiv ID
2401.08417
Venue
International Conference on Machine Learning
Domain
Not specified
SOTA Claim
Yes

Abstract

Moderate-sized large language models (LLMs), those with 7B or 13B parameters, exhibit promising machine translation (MT) performance. However, they do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4 (OpenAI, 2023). In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to supervised fine-tuning, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA (Xu et al., 2023) models with only 22K parallel sentences and tuning only 0.1% of parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.

Summary

This paper introduces Contrastive Preference Optimization (CPO), a novel method aimed at enhancing the performance of moderate-sized large language models (LLMs) in machine translation (MT). It identifies the limitations of supervised fine-tuning (SFT) for MT, which stem from its reliance on potentially flawed human-generated reference data. Rather than teaching models to replicate imperfect translations, CPO pushes them toward higher-quality outputs. The authors apply CPO to the ALMA models, which had already achieved competitive results through prior fine-tuning on non-English monolingual data and then on high-quality parallel data. Their enhanced model, ALMA-R, matches or exceeds the performance of state-of-the-art systems, including GPT-4 and past WMT competition winners, across multiple test datasets. The study also critically assesses the quality of reference translations and advocates the use of reference-free evaluation metrics.
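
To make the preference-data step concrete: for each source sentence, the paper scores several candidate translations (the gold reference plus GPT-4 and ALMA outputs) with reference-free metrics and keeps the highest-scoring one as the preferred output and the lowest-scoring as the dispreferred one. The snippet below is a minimal sketch of that selection logic, not the authors' code; the function name, data layout, and `score_fn` wrapper are illustrative assumptions.

```python
# Illustrative sketch: build (preferred, dispreferred) translation pairs by
# ranking candidate translations with a reference-free quality score.
# `score_fn` and the example layout are hypothetical placeholders.
from typing import Callable, Dict, List, Tuple

def build_preference_pairs(
    examples: List[Dict],                   # each: {"src": str, "candidates": List[str]}
    score_fn: Callable[[str, str], float],  # reference-free metric, e.g. a COMET-KIWI wrapper
) -> List[Tuple[str, str, str]]:
    """Return (source, preferred, dispreferred) triplets."""
    triplets = []
    for ex in examples:
        # Rank all candidate translations for this source by estimated quality.
        ranked = sorted(ex["candidates"], key=lambda mt: score_fn(ex["src"], mt))
        best, worst = ranked[-1], ranked[0]
        if best != worst:  # skip degenerate cases where best and worst coincide
            triplets.append((ex["src"], best, worst))
    return triplets
```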

Methods

This paper employs the following methods:

  • Contrastive Preference Optimization (see the loss sketch after this list)
  • Supervised Fine-Tuning
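
As described in the paper, the CPO objective combines a DPO-style preference loss with the frozen reference model dropped and a negative log-likelihood term on the preferred translation. The PyTorch sketch below reflects that reading; the function signature, the use of summed (rather than length-normalized) sequence log-probabilities, and the default beta are assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the CPO objective: a reference-model-free preference term
# plus an NLL term on the preferred (chosen) translation. Shapes are (batch,).
import torch
import torch.nn.functional as F

def cpo_loss(policy_chosen_logps: torch.Tensor,    # log-probs of preferred outputs under the policy
             policy_rejected_logps: torch.Tensor,  # log-probs of dispreferred outputs under the policy
             beta: float = 0.1) -> torch.Tensor:
    # Preference term: push preferred outputs above dispreferred ones (sigmoid-log loss),
    # computed from the policy alone, i.e. no frozen reference model as in DPO.
    prefer_loss = -F.logsigmoid(beta * (policy_chosen_logps - policy_rejected_logps))
    # Behaviour-cloning term: standard negative log-likelihood of the preferred translation.
    nll_loss = -policy_chosen_logps
    return (prefer_loss + nll_loss).mean()
```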

Models Used

  • ALMA
  • ALMA-R
  • GPT-4
  • ALMA-13B-LoRA

Datasets

The following datasets were used in this research:

  • FLORES-200
  • WMT'21
  • WMT'22
  • WMT'23

Evaluation Metrics

  • BLEU
  • COMET
  • KIWI-XXL
  • XCOMET
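
KIWI-XXL and XCOMET, listed above, are neural metrics from the COMET family that can score translations without a reference, which is central to the paper's argument about flawed gold data. Below is a minimal scoring sketch assuming Unbabel's `comet` package; the checkpoint name is an assumption (the XXL variants are larger, gated checkpoints) and should be verified against the model hub.

```python
# Sketch: scoring MT output with a COMET-family, reference-free metric via
# Unbabel's `comet` package (pip install unbabel-comet). The checkpoint name
# is an assumption and may require Hugging Face Hub access.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-cometkiwi-da")  # reference-free QE variant
model = load_from_checkpoint(model_path)

data = [{"src": "Der Hund bellt.", "mt": "The dog is barking."}]  # no "ref" field needed
output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)        # per-segment scores
print(output.system_score)  # corpus-level average
```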

Results

  • ALMA-R matches or exceeds the performance of GPT-4 and WMT competition winners
  • ALMA-R demonstrates significant improvements over the original ALMA model and other state-of-the-art models

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Papers Using Similar Methods

External Resources