Jingfeng Yang (Amazon, USA), Hongye Jin (Department of Computer Science and Engineering, Texas A&M University, USA), Ruixiang Tang (Department of Computer Science, Rice University, USA), Xiaotian Han (Department of Computer Science and Engineering, Texas A&M University, USA), Qizhang Feng (Department of Computer Science and Engineering, Texas A&M University, USA), Haoming Jiang (Amazon, USA), Bing Yin (Amazon, USA), and Xia Hu (Department of Computer Science, Rice University, USA). 2023.
This paper is a comprehensive, practical guide for practitioners applying Large Language Models (LLMs) such as ChatGPT to natural language processing (NLP) tasks. It surveys the available models, the influence of pre-training and downstream data, and the major downstream use cases. The paper categorizes LLMs by architecture, emphasizing the distinction between encoder-only and decoder-only models and comparing their training strategies, architectures, and practical applications. It highlights the importance of pre-training data, examines how the choice of approach depends on whether zero, few, or abundant annotated examples are available, and reviews performance across natural language understanding (NLU) and natural language generation (NLG) tasks. It also discusses biases, efficiency concerns, and the challenges of real-world deployment, and closes with considerations for model evaluation and open problems such as alignment with human values and the implications of continued scaling. The authors aim to help readers better understand and apply LLMs across NLP tasks and to encourage further innovation.
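To make the encoder-only versus decoder-only distinction and the role of annotated data concrete, here is a minimal illustrative sketch, not taken from the paper: it contrasts zero-shot prompting of a decoder-only model with an encoder-only model that has already been fine-tuned on abundant labeled data. It assumes the Hugging Face transformers library and the publicly available gpt2 and distilbert-base-uncased-finetuned-sst-2-english checkpoints; the task and model choices are assumptions for illustration, not the authors' setup.

```python
# Illustrative sketch only: contrasts the two usage patterns discussed in the survey.
# Assumes `pip install transformers torch`; model choices are examples, not the paper's.
from transformers import pipeline

review = "The movie was wonderful."

# 1) Decoder-only model (GPT-style), used zero-shot via prompting:
#    no task-specific annotated data; the task is expressed in the prompt itself.
generator = pipeline("text-generation", model="gpt2")
prompt = f"Review: {review}\nSentiment (positive or negative):"
zero_shot_output = generator(prompt, max_new_tokens=3, do_sample=False)
print("Decoder-only, zero-shot:", zero_shot_output[0]["generated_text"])

# 2) Encoder-only model (BERT-style), fine-tuned on abundant annotated data (SST-2):
#    the task is baked into a classification head during supervised fine-tuning.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print("Encoder-only, fine-tuned:", classifier(review))
```

The first pattern relies entirely on the model's pre-training, which is why the survey stresses the quality of pre-training data; the second requires labeled examples, which is why the amount of available annotated data drives the choice between the two model families.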