Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei (2024). https://aka.ms/GeneralAI
The paper introduces BitNet b1.58, a 1-bit variant of Large Language Models (LLMs) in which every weight is ternary, taking values in {-1, 0, 1}. It matches full-precision (FP16) Transformer baselines of the same size and training tokens in perplexity and end-task performance, while being substantially more efficient in latency, memory footprint, throughput, and energy consumption. Because the weights are ternary, matrix multiplication reduces largely to additions and subtractions, which cuts the arithmetic cost and energy consumption of matrix operations. By retaining performance on a range of natural language tasks, BitNet b1.58 enables deployment of LLMs on resource-constrained devices, opens new avenues for hardware designed around 1-bit computation, and defines a new cost-performance trade-off for scaling and training LLMs. Overall, it suggests that 1.58-bit LLMs can achieve performance on par with full-precision models at much lower cost, with the advantage growing at larger model sizes.
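The weight-quantization step the summary alludes to can be sketched in a few lines. The following is a minimal NumPy illustration under the absmean scheme the paper describes, not the authors' implementation: `absmean_quantize` and `ternary_matmul` are hypothetical helper names, and a real inference kernel would also quantize activations (e.g., to 8 bits) and use specialized integer kernels rather than dense float matmuls.

```python
import numpy as np

def absmean_quantize(W: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to ternary values {-1, 0, 1}.

    Scales by the mean absolute value (absmean), then rounds and clips,
    following the weight-quantization scheme described for BitNet b1.58.
    Returns the ternary matrix and the scale needed to restore magnitude.
    """
    gamma = np.mean(np.abs(W)) + eps                      # absmean scale
    W_ternary = np.clip(np.round(W / gamma), -1, 1).astype(np.int8)
    return W_ternary, gamma

def ternary_matmul(x: np.ndarray, W_ternary: np.ndarray, gamma: float):
    """Multiply activations by a ternary weight matrix.

    Because every weight is -1, 0, or +1, each output is just a sum of
    activations with +1 weights minus a sum of activations with -1 weights;
    the single multiplication by `gamma` restores the original scale.
    """
    pos = (W_ternary == 1).astype(x.dtype)
    neg = (W_ternary == -1).astype(x.dtype)
    return gamma * (x @ pos - x @ neg)

# Example: the addition-only path matches a dense matmul with the quantized weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4)).astype(np.float32)
x = rng.normal(size=(2, 8)).astype(np.float32)
W_t, gamma = absmean_quantize(W)
assert np.allclose(ternary_matmul(x, W_t, gamma), gamma * (x @ W_t), atol=1e-5)
```

The sketch shows why the paper's efficiency claims follow directly from the weight format: every inner product becomes signed accumulation of activation entries, which is far cheaper in energy than the floating-point multiply-adds of an FP16 model.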
This paper employs the following methods: training a Transformer LLM from scratch with ternary (1.58-bit) weights obtained by absmean quantization (weights are scaled by their mean absolute value, then rounded and clipped to {-1, 0, 1}), 8-bit absmax quantization of activations, and a LLaMA-alike architecture (RMSNorm, SwiGLU, rotary embeddings, no bias terms), compared against FP16 LLaMA baselines of matched size and training tokens.
The following datasets were used in this research: the RedPajama dataset for pretraining, with zero-shot evaluation on ARC-Easy, ARC-Challenge, HellaSwag, Winogrande, PIQA, OpenBookQA, and BoolQ, and validation perplexity measured on WikiText2 and C4.
The authors identified the following limitations: