Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio (2014). Affiliations: Université de Montréal; Jacobs University, Germany; Université du Maine, France; Yoshua Bengio is a CIFAR Senior Fellow.

Paper Information

arXiv ID: 1406.1078
Venue: Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)
Domain: Natural Language Processing, Machine Translation
SOTA Claim: Yes

Abstract

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNNs). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
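
The architecture can be summarized in a few lines of code. Below is a minimal sketch in PyTorch, not the paper's exact model: the original uses its proposed gated hidden unit and also feeds the summary vector c into every decoder step and into the output layer, whereas this sketch only uses c to initialize the decoder. All layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encode the source sequence into a fixed-length vector c
        # (the final hidden state of the encoder RNN).
        _, c = self.encoder(self.src_emb(src))
        # Decode conditioned on c, predicting each next target symbol.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_in), c)
        return self.out(dec_states)  # logits over the target vocabulary

# Joint training maximizes log p(target | source) via cross-entropy.
model = EncoderDecoder(src_vocab=100, tgt_vocab=100)
src = torch.randint(0, 100, (2, 7))      # batch of 2 source sequences
tgt_in = torch.randint(0, 100, (2, 5))   # decoder inputs (shifted targets)
tgt_out = torch.randint(0, 100, (2, 5))  # gold next symbols
logits = model(src, tgt_in)
loss = nn.functional.cross_entropy(logits.reshape(-1, 100), tgt_out.reshape(-1))
loss.backward()
```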

Summary

This paper proposes a neural network model called the RNN Encoder-Decoder for statistical machine translation. The model consists of two recurrent neural networks (RNNs): one encodes a source sequence into a fixed-length vector, and the other decodes that vector into a target sequence. The two networks are trained jointly to maximize the conditional probability of the target sequence given the source sequence. Used to score phrase pairs as an additional feature in the log-linear model of an existing phrase-based system, the trained model improves translation performance. It also captures semantically and syntactically meaningful phrase representations, and qualitative analyses show that it handles linguistic regularities better than the traditional phrase-table scores. The evaluation focuses on English-to-French translation using data from the WMT'14 workshop and reports improved BLEU scores when the RNN Encoder-Decoder feature is added.
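
The integration into the SMT system is straightforward: phrase-based decoders score translations with a weighted sum of log feature values, and the RNN Encoder-Decoder simply contributes one more feature, log p(target phrase | source phrase). Below is a minimal sketch of this log-linear scoring, with illustrative feature values and uniform weights rather than the paper's tuned ones:

```python
def loglinear_score(features, weights):
    """Unnormalized log-linear model score: sum over features of w_n * f_n."""
    return sum(weights[name] * value for name, value in features.items())

# Standard phrase-pair features plus the proposed additional feature:
# the log-probability assigned to the pair by the RNN Encoder-Decoder.
features = {
    "phrase_translation_logprob": -1.2,
    "lexical_weighting_logprob": -0.8,
    "language_model_logprob": -2.1,
    "rnn_encdec_logprob": -0.9,  # score from the trained RNN Encoder-Decoder
}
weights = {name: 1.0 for name in features}  # uniform, for illustration only

print(loglinear_score(features, weights))  # higher is better
```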

Methods

This paper employs the following methods:

  • RNN Encoder-Decoder
  • Recurrent Neural Networks with a gated hidden unit (sketched below)
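
The paper's RNNs use a hidden unit with a reset gate and an update gate that lets each unit adaptively remember or forget; this unit is now widely known as the GRU. A numpy sketch of one step, with illustrative weight shapes and initialization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_unit_step(x, h_prev, W, U, Wr, Ur, Wz, Uz):
    r = sigmoid(Wr @ x + Ur @ h_prev)            # reset gate
    z = sigmoid(Wz @ x + Uz @ h_prev)            # update gate
    h_tilde = np.tanh(W @ x + U @ (r * h_prev))  # candidate activation
    return z * h_prev + (1.0 - z) * h_tilde      # interpolate old and new state

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
params = [rng.normal(scale=0.1, size=shape)
          for shape in [(d_h, d_in), (d_h, d_h)] * 3]  # W, U, Wr, Ur, Wz, Uz
h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):  # run over a toy sequence of length 5
    h = gated_unit_step(x, h, *params)
print(h.shape)  # (8,) -- the evolving hidden state
```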

Models Used

  • RNN Encoder-Decoder

Datasets

The following WMT'14 English-French corpora were used in this research:

  • Europarl
  • News Commentary
  • UN
  • Crawled corpora

Evaluation Metrics

  • BLEU
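
A toy example of computing BLEU with NLTK; the tokenized sentences and smoothing choice here are illustrative, not the paper's evaluation setup:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[["the", "cat", "sat", "on", "the", "mat"]]]  # one list of refs per sentence
hypotheses = [["the", "cat", "sat", "on", "a", "mat"]]
smooth = SmoothingFunction().method1  # avoids zero scores on short toy data
print(corpus_bleu(references, hypotheses, smoothing_function=smooth))
```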

Results

  • Adding the RNN Encoder-Decoder phrase scores as a feature to the baseline phrase-based system improves BLEU on WMT'14 English-to-French translation.
  • Qualitative analysis shows that the RNN Encoder-Decoder captures linguistic regularities in phrase pairs better than the baseline phrase-table scores.

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

RNN Encoder-Decoder, sequence-to-sequence models, neural language models, statistical machine translation
