OpenSubtitles

Name: OpenSubtitles
License: Unknown

Dataset Information

Languages

Russian

License

Unknown

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

OpenSubtitles is a collection of multilingual parallel corpora. The dataset is compiled from a large database of movie and TV subtitles and comprises 1,689 bitexts totaling 2.6 billion sentences across 60 languages.
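To put the bitext count in context, the arithmetic below compares it with the number of language pairs that 60 languages could form. This is only a rough sketch: it assumes each bitext corresponds to one distinct unordered pair of languages, which the description above does not state explicitly, and the function name `bitext_coverage` is made up for illustration.

```python
from math import comb


def bitext_coverage(num_languages: int, num_bitexts: int) -> float:
    """Return the fraction of unordered language pairs that have a bitext.

    Assumption (not confirmed by the dataset description): every bitext
    covers exactly one distinct unordered pair of languages.
    """
    possible_pairs = comb(num_languages, 2)  # n * (n - 1) / 2
    return num_bitexts / possible_pairs


# Figures taken from the overview: 60 languages, 1689 bitexts.
possible = comb(60, 2)
coverage = bitext_coverage(60, 1689)
print(f"{possible} possible pairs, {coverage:.1%} covered")
# prints "1770 possible pairs, 95.4% covered"
```

Under that assumption, nearly every pair of the 60 languages would have some aligned data, although the overview says nothing about how the 2.6 billion sentences are distributed across pairs.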

Variants: OpenSubtitles

Associated Benchmarks

This dataset is used in 2 benchmarks:

Machine Translation - Metrics: BLEU score, METEOR
Language Identification - Metrics: Accuracy
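As an illustration of the Accuracy metric listed for language identification, the following minimal sketch computes the share of sentences whose predicted language matches the reference label. The function name and the sample labels are hypothetical and are not taken from either benchmark submission; the benchmarks themselves may compute the metric with different tooling.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the fraction of predictions that equal the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one example is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical language codes for four subtitle lines (illustration only).
reference = ["ru", "ru", "en", "fr"]
predicted = ["ru", "uk", "en", "fr"]
print(accuracy(predicted, reference))  # prints 0.75
```

BLEU and METEOR, the metrics listed for machine translation, compare system output with reference translations and are usually computed with an established evaluation toolkit rather than reimplemented by hand.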

Recent Benchmark Submissions

Task	Model	Paper	Date
Machine Translation	Fine tuned MarianMT	Crossing Language Borders: A Pipeline …	2025-01-03
Language Identification	Apple bi-LSTM	A reproduction of Apple's bi-directional …	2021-02-11

Research Papers

Recent papers with results on this dataset:

External Links:

OpenSubtitles
