Haoran Xu♠*, Amr Sharaf♡*, Yunmo Chen♠, Weiting Tan♠, Lingfeng Shen♠, Benjamin Van Durme♠, Kenton Murray♠, Young Jin Kim♡ (2024). *Equal contribution. ♠ Johns Hopkins University, ♡ Microsoft
This paper introduces Contrastive Preference Optimization (CPO), a novel method for improving the machine translation (MT) performance of moderate-sized large language models (LLMs). It identifies a key limitation of supervised fine-tuning (SFT) for MT: because SFT trains models to imitate reference translations, performance is capped by the quality of human-generated references, which are themselves imperfect. CPO instead trains models to avoid merely reproducing adequate but flawed translations and to prefer higher-quality outputs. The authors apply CPO to the ALMA models, which had already achieved competitive results through fine-tuning first on non-English monolingual data and then on high-quality parallel data. The resulting model, ALMA-R, matches or exceeds state-of-the-art systems, including GPT-4 and past WMT competition winners, across multiple test datasets. The study also critically assesses the quality of reference translations and advocates the use of reference-free evaluation metrics.
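For intuition, the CPO objective combines a reference-model-free preference term (a DPO-style logistic loss without the frozen reference policy) with a negative log-likelihood term on the preferred translation. Below is a minimal PyTorch sketch of that objective, assuming per-sequence log-probabilities for the preferred and dispreferred translations have already been computed; the function name `cpo_loss` and the `beta` default are illustrative, not taken from the paper's released code.

```python
import torch
import torch.nn.functional as F

def cpo_loss(logp_chosen: torch.Tensor,
             logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Sketch of the CPO objective.

    logp_chosen / logp_rejected: summed token log-probabilities of the
    preferred and dispreferred translations under the current policy,
    each of shape (batch,).
    """
    # Preference term: logistic loss on the log-probability margin,
    # like DPO but with the frozen reference model approximated away.
    prefer = -F.logsigmoid(beta * (logp_chosen - logp_rejected))
    # Regularizer: negative log-likelihood of the preferred translation,
    # keeping the policy anchored to high-quality outputs.
    nll = -logp_chosen
    return (prefer + nll).mean()
```

Dropping the frozen reference model is what separates this loss from standard DPO: only one policy's log-probabilities are needed per batch, which reduces memory and compute during preference training.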
This paper employs the following methods:
- Contrastive Preference Optimization (CPO)
- Supervised fine-tuning (SFT), analyzed as the baseline objective whose reliance on reference translations CPO is designed to overcome
The following datasets were used in this research:
- FLORES-200 (used to build the triplet preference data)
- WMT'21 and WMT'22 test sets (used for evaluation)
The authors identified the following limitations:
- Supervised fine-tuning caps model performance at the quality of the reference translations, which are imperfect even in human-generated datasets
- Reference-based evaluation metrics can be unreliable when the underlying reference translations are of low quality