
BloombergGPT: A Large Language Model for Finance

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann (Bloomberg, New York, NY, USA; Mark Dredze also with Computer Science, Johns Hopkins University, Baltimore, MD, USA) (2023)

Paper Information
arXiv ID
2303.17564
Venue
arXiv.org
Domain
natural language processing, finance, machine learning
SOTA Claim
Yes
Reproducibility
6/10

Abstract

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in the literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. As a next step, we plan to release training logs (Chronicles) detailing our experience in training BloombergGPT.

Summary

The paper presents BloombergGPT, a 50-billion-parameter large language model designed for financial content, trained on a mixed dataset of 363 billion tokens of domain-specific financial documents and 345 billion tokens from general-purpose corpora. The authors emphasize that prior LLMs were not specialized for finance and that BloombergGPT surpasses existing models on financial benchmarks while remaining competitive on general LLM tasks. Key contributions include the FinPile dataset, an evaluation methodology spanning public and internal benchmarks, and insights into training and tokenization strategies. Future work includes releasing training logs and further evaluations.
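
To make the mixed-dataset training idea concrete, here is a minimal sketch that samples training examples from a domain corpus and a general corpus in proportion to their reported token counts (363B vs. 345B). The corpus names, the sampling scheme, and the proportional weighting are illustrative assumptions, not the paper's actual data pipeline.

```python
import random

# Token counts reported for BloombergGPT's training mix (in billions).
FINPILE_TOKENS = 363   # domain-specific financial data (FinPile)
GENERAL_TOKENS = 345   # general-purpose data

# Assumed scheme: sample each training example from a corpus with
# probability proportional to that corpus's share of the token budget.
total = FINPILE_TOKENS + GENERAL_TOKENS
weights = {
    "finpile": FINPILE_TOKENS / total,   # ~0.51
    "general": GENERAL_TOKENS / total,   # ~0.49
}

def sample_corpus(rng: random.Random) -> str:
    """Pick which corpus the next training example is drawn from."""
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]

rng = random.Random(0)
counts = {"finpile": 0, "general": 0}
for _ in range(10_000):
    counts[sample_corpus(rng)] += 1
print(counts)  # roughly a 51% / 49% split between the two corpora
```

The point of the sketch is only that the financial and general data contribute in roughly equal proportion, which is what lets the model improve on financial tasks without giving up general-purpose performance.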

Methods

This paper employs the following methods:

  • Transformer

Models Used

  • BloombergGPT

Datasets

The following datasets were used in this research:

  • FinPile

Evaluation Metrics

  • F1 score
  • Exact match accuracy
  • Weighted F1 score (see the worked sketch below)
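
As a worked illustration of the metrics listed above, the sketch below computes exact match accuracy, macro F1, and weighted F1 on toy predictions. The labels and the use of scikit-learn are assumptions for illustration only, not the paper's evaluation harness.

```python
from sklearn.metrics import f1_score

# Toy gold labels and model predictions for a 3-class task
# (e.g., negative / neutral / positive sentiment). Labels are illustrative.
y_true = ["pos", "neg", "neu", "pos", "neu", "neg", "pos"]
y_pred = ["pos", "neu", "neu", "pos", "neg", "neg", "neu"]

# Exact match accuracy: fraction of predictions identical to the gold label.
exact_match = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Macro F1 averages per-class F1 equally; weighted F1 weights each class
# by its support, which matters on imbalanced financial datasets.
macro_f1 = f1_score(y_true, y_pred, average="macro")
weighted_f1 = f1_score(y_true, y_pred, average="weighted")

print(f"exact match: {exact_match:.2f}")
print(f"macro F1:    {macro_f1:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
```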

Results

  • BloombergGPT outperforms existing models on financial tasks
  • Competitive performance on general LLM benchmarks

Limitations

The authors identified the following limitations:

  • BloombergGPT is trained on proprietary data and therefore cannot be released publicly
  • Challenges in evaluating domain-specific benchmarks

Technical Requirements

  • Number of GPUs: 512
  • GPU Type: NVIDIA A100 (40 GB) (see the back-of-envelope sketch below)
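
A back-of-envelope sketch of why a 50B-parameter model requires a sharded multi-GPU setup at this scale. The 2 bytes/parameter (bf16) and ~16 bytes/parameter training-state figures are common rules of thumb, not numbers taken from the paper.

```python
# Reported setup: 512 x NVIDIA A100 (40 GB) for a 50B-parameter model.
params = 50e9
bytes_bf16 = 2

weights_gb = params * bytes_bf16 / 1e9     # ~100 GB just for the weights
# Rule of thumb for mixed-precision Adam training: roughly 16 bytes per
# parameter once gradients and optimizer states are included.
train_state_gb = params * 16 / 1e9         # ~800 GB

cluster_gpu_memory_gb = 512 * 40           # 20,480 GB across the cluster
print(f"weights only:       ~{weights_gb:.0f} GB")
print(f"training state:     ~{train_state_gb:.0f} GB")
print(f"cluster GPU memory:  {cluster_gpu_memory_gb} GB")
# The training state alone far exceeds a single 40 GB GPU, so the model
# must be sharded across many GPUs (e.g., tensor/pipeline parallelism
# and/or ZeRO-style optimizer-state sharding).
```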

Keywords

BloombergGPT, large language model, finance, domain-specific dataset, training process, evaluation
