Yinhan Liu (Facebook AI)*, Myle Ott (Facebook AI)*, Naman Goyal (Facebook AI)*, Jingfei Du (Facebook AI)*, Mandar Joshi (Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA), Danqi Chen (Facebook AI), Omer Levy (Facebook AI), Mike Lewis (Facebook AI), Luke Zettlemoyer (Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA; Facebook AI), Veselin Stoyanov (Facebook AI). *Equal contribution. (2019)
This paper presents RoBERTa, an improved recipe for BERT pretraining that emphasizes the importance of training duration, data size, and hyperparameter choices. The authors perform a replication study of BERT pretraining, find that BERT was significantly undertrained, and propose modifications that yield substantially better performance on downstream benchmarks. The key changes are training the model longer with larger batches over more data, removing the next sentence prediction (NSP) objective, training on longer input sequences, and dynamically changing the masking pattern applied to the training data. The resulting model, RoBERTa, is pretrained on a large text corpus that includes the newly collected CC-NEWS dataset and matches or exceeds state-of-the-art results on GLUE, RACE, and SQuAD. The authors evaluate the effect of each change in detail and release their models and code for further research.
This paper employs the following methods:
- Masked language model pretraining with dynamic masking, where the masking pattern is regenerated each time a sequence is fed to the model (see the sketch after this list)
- Removal of the next sentence prediction (NSP) objective
- Training on longer input sequences
- Training for longer, with larger mini-batches, over more data
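To make the dynamic masking change concrete, here is a minimal sketch that regenerates the masking pattern every time a sequence is drawn, using BERT's published 80%/10%/10% corruption scheme. The names `dynamic_mask` and `MASK_TOKEN`, and the plain token-list representation, are illustrative assumptions, not the authors' released implementation.

```python
import random

# Hypothetical mask token string; in practice this comes from the tokenizer.
MASK_TOKEN = "[MASK]"


def dynamic_mask(tokens, vocab, mask_prob=0.15, rng=None):
    """Generate a fresh masking pattern for one training sequence.

    Called every time the sequence is sampled, so the model sees a
    different mask on each pass over the data, rather than the single
    static mask produced once during preprocessing.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)  # None = position not selected for prediction
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels
```

In the static-masking baseline described in the paper, this corruption is applied once during preprocessing and reused across epochs; the authors approximate more variety by duplicating the training data with different masks, and find dynamic masking to be comparable or slightly better while avoiding that duplication.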
The following datasets were used in this research:
- Pretraining corpora: BOOKCORPUS and English Wikipedia (as in BERT), the newly collected CC-NEWS, OPENWEBTEXT, and STORIES
- Evaluation benchmarks: GLUE, SQuAD, and RACE
The authors identified the following limitations: