Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu (2019)

Paper Information

  • arXiv ID: 1910.10683
  • Venue: Journal of Machine Learning Research
  • Domain: artificial intelligence, machine learning, NLP
  • SOTA Claim: Yes
  • Reproducibility: 7/10

Abstract

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.

Summary

This paper presents a comprehensive examination of transfer learning in natural language processing (NLP) using a unified text-to-text framework. It introduces the Text-to-Text Transfer Transformer (T5), which casts every NLP task as a text-to-text problem, so that the same model architecture, training procedure, and decoding process can be used for tasks as varied as translation, summarization, and question answering. The paper describes extensive experiments that explore the impact of different pre-training objectives, architectures, model sizes, and datasets on performance across multiple NLP benchmarks. The authors also introduce the Colossal Clean Crawled Corpus (C4), a large cleaned web-crawl dataset, for unsupervised pre-training. Their findings indicate that scaling up model size and training data significantly improves performance, leading to state-of-the-art results on several benchmarks. Finally, the authors emphasize the importance of transferring knowledge from large pre-trained models to downstream tasks and release their pre-trained models, code, and dataset to support future research.
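
To make the unified text-to-text interface concrete, the sketch below runs a few task-prefixed inputs through a released checkpoint. This is a minimal illustration assuming the Hugging Face transformers port of T5 and the public t5-small checkpoint (neither is mentioned on this page; the original release is built on TensorFlow and Mesh TensorFlow). The task prefixes follow the conventions described in the paper.

```python
# Minimal sketch of the text-to-text interface: every task is expressed as
# "task prefix + input text" -> "output text". Assumes the Hugging Face
# `transformers` port of T5 and the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    "translate English to German: That is good.",               # WMT-style translation
    "summarize: state authorities dispatched emergency crews "
    "tuesday to survey damage after an onslaught of severe weather.",  # CNN/DM-style
    "cola sentence: The course is jumping well.",                # GLUE (CoLA) classification
]

for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**batch, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because classification labels are also emitted as literal text strings, no task-specific output layer or classification head is needed.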

Methods

This paper employs the following methods:

  • Text-to-Text Transfer Transformer (T5): all tasks are cast to a text-to-text format, and an encoder-decoder Transformer is pre-trained on C4 with a span-corruption (denoising) objective before being fine-tuned on each downstream task (sketched below)
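
A minimal sketch of the span-corruption objective mentioned above. It is an illustration only, operating on word strings for readability; the paper's implementation works on SentencePiece token IDs in Mesh TensorFlow, and the `span_corrupt` helper, its defaults, and the span-sampling scheme here are simplifications.

```python
# Illustrative sketch of the span-corruption (denoising) objective: random
# contiguous spans of the input are replaced by sentinel tokens, and the
# target reconstructs the dropped spans in order, each preceded by its
# sentinel. Not the authors' implementation.
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    """Return (corrupted_input, target) as whitespace-joined strings."""
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * corruption_rate))
    masked = set()
    while len(masked) < n_to_mask:
        span_len = max(1, round(rng.expovariate(1.0 / mean_span_len)))
        start = rng.randrange(len(tokens))
        masked.update(range(start, min(start + span_len, len(tokens))))

    inputs, targets, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            inputs.append(f"<extra_id_{sentinel}>")   # one sentinel per dropped span
            targets.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")          # final sentinel closes the target
    return " ".join(inputs), " ".join(targets)

print(span_corrupt("Thank you for inviting me to your party last week .".split()))
```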

Models Used

  • T5-11B
  • T5-3B

Datasets

The following datasets were used in this research:

  • Colossal Clean Crawled Corpus (C4)
  • SQuAD (cast into the text-to-text format as sketched after this list)
  • GLUE
  • SuperGLUE
  • CNN/Daily Mail
  • WMT
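
As an example of how these datasets are fed to the model, the hedged sketch below casts a SQuAD-style record into the text-to-text format. The "question: ... context: ..." prefix form follows the paper's task preprocessing; the record layout mirrors the public SQuAD schema, and the example record itself is invented for illustration.

```python
# Hedged sketch: casting a SQuAD-style record into the text-to-text format.
# The record below is invented purely for illustration.
def squad_to_text_to_text(example):
    source = f"question: {example['question']} context: {example['context']}"
    # SQuAD stores a list of acceptable answer strings; train against the first.
    target = example["answers"]["text"][0]
    return {"inputs": source, "targets": target}

record = {
    "question": "What corpus does the paper introduce for pre-training?",
    "context": "The paper introduces the Colossal Clean Crawled Corpus (C4), "
               "a cleaned subset of Common Crawl used for unsupervised pre-training.",
    "answers": {"text": ["the Colossal Clean Crawled Corpus (C4)"]},
}
print(squad_to_text_to_text(record))
```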

Evaluation Metrics

  • GLUE
  • Exact Match (SQuAD; see the sketch after this list)
  • F1 (SQuAD; see the sketch after this list)
  • BLEU (WMT)
  • ROUGE (CNN/Daily Mail)
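
A hedged sketch of the SQuAD-style Exact Match and F1 metrics listed above, following the standard SQuAD evaluation recipe (lowercase, strip punctuation and articles, then compare). This is an illustrative reimplementation, not the paper's evaluation code.

```python
# Standard SQuAD-style answer normalization, Exact Match, and token-level F1.
import re
import string
from collections import Counter

def normalize(text):
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)   # drop English articles
    return " ".join(text.split())                 # collapse whitespace

def exact_match(prediction, reference):
    return float(normalize(prediction) == normalize(reference))

def f1(prediction, reference):
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Colossal Clean Crawled Corpus", "Colossal Clean Crawled Corpus"))  # 1.0
print(f1("a cleaned subset of Common Crawl", "cleaned version of Common Crawl"))          # ~0.8
```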

Results

  • Achieved state-of-the-art performance on 18 of 24 tasks
  • Achieved an average GLUE score of 90.3
  • Improved the SQuAD exact-match score beyond previously published results
  • Exceeded human performance on certain reading comprehension tasks in SuperGLUE

Limitations

The authors identified the following limitations:

  • High computational resources required for model training and inference
  • Dependence on large-scale clean datasets for effective training

Technical Requirements

  • Number of GPUs: None specified (the paper reports training on Cloud TPU Pods rather than GPUs)
  • GPU Type: None specified

Keywords

transfer learning, text-to-text transfer transformer, T5, unsupervised learning, pre-training, fine-tuning, scale, benchmark

Papers Using Similar Methods

External Resources