Shijie Wu (Bloomberg, New York, NY, USA), Ozan İrsoy (Bloomberg, New York, NY, USA), Steven Lu (Bloomberg, New York, NY, USA), Vadim Dabravolski (Bloomberg, New York, NY, USA), Mark Dredze (Bloomberg, New York, NY, USA; Computer Science, Johns Hopkins University, Baltimore, MD, USA), Sebastian Gehrmann (Bloomberg, New York, NY, USA), Prabhanjan Kambadur (Bloomberg, New York, NY, USA), David Rosenberg (Bloomberg, New York, NY, USA), Gideon Mann (Bloomberg, New York, NY, USA) (2023)
The paper presents BloombergGPT, a 50-billion-parameter large language model designed specifically for financial content, trained on a mixed dataset comprising 363 billion tokens of domain-specific financial documents and 345 billion tokens from general-purpose datasets. The authors note that prior LLMs were not specialized for finance and report that BloombergGPT outperforms existing models on financial benchmarks while remaining competitive on general-purpose LLM tasks. Key contributions include the construction of the FinPile dataset, an evaluation methodology incorporating public and internal benchmarks, and insights into training and tokenization strategies. Future work includes releasing training logs and conducting further evaluations.
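The reported corpus sizes imply a roughly even split between financial and general-purpose tokens in the training mix. The following is a minimal illustrative sketch (not the authors' code) that turns the stated token counts into approximate mixture proportions; the constant names are assumptions introduced here for clarity.

```python
# Illustrative sketch only: derive approximate training-mixture weights
# from the token counts reported in the paper.
FINPILE_TOKENS = 363e9   # domain-specific financial documents (FinPile)
GENERAL_TOKENS = 345e9   # general-purpose datasets

total = FINPILE_TOKENS + GENERAL_TOKENS
mix = {
    "finpile": FINPILE_TOKENS / total,   # ~0.513
    "general": GENERAL_TOKENS / total,   # ~0.487
}
print({name: round(weight, 3) for name, weight in mix.items()})
```

Under these numbers, slightly more than half of the training tokens come from financial sources, which is consistent with the paper's description of a mixed domain-specific and general-purpose corpus.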
This paper employs the following methods: pretraining a 50-billion-parameter language model on a mixed corpus of domain-specific financial and general-purpose text, domain-aware tokenization choices, and evaluation on public financial benchmarks, general-purpose LLM benchmarks, and internal Bloomberg benchmarks.
The following datasets were used in this research: FinPile (363 billion tokens of domain-specific financial documents) and general-purpose datasets totaling 345 billion tokens.