OpenSubtitles is a collection of multilingual parallel corpora. The dataset is compiled from a large database of movie and TV subtitles and includes a total of 1,689 bitexts spanning 2.6 billion sentences across 60 languages.
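As a rough sanity check on these figures, the sketch below compares the reported number of bitexts with the maximum number of language pairs that 60 languages can form. It assumes that each bitext corresponds to one unordered language pair, which the description above does not state explicitly. The constant and function names are illustrative only.

```python
from math import comb

# Figures taken from the dataset description above.
NUM_LANGUAGES = 60
NUM_BITEXTS = 1689


def max_language_pairs(n_languages: int) -> int:
    """Number of unordered pairs that can be formed from n_languages."""
    return comb(n_languages, 2)


# Assumption: one bitext per unordered language pair.
max_pairs = max_language_pairs(NUM_LANGUAGES)  # 60 * 59 / 2 = 1770
coverage = NUM_BITEXTS / max_pairs

print(f"possible pairs: {max_pairs}")
print(f"pairs covered:  {coverage:.1%}")
```

Under that assumption, the collection covers most but not all of the possible language pairs.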
Variants: OpenSubtitles
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Machine Translation | Fine-tuned MarianMT | Crossing Language Borders: A Pipeline … | 2025-01-03 |
Language Identification | Apple bi-LSTM | A reproduction of Apple's bi-directional … | 2021-02-11 |
Recent papers with results on this dataset: