Venue
International Conference on Machine Learning
Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from our findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
In the paper, the authors propose a new parameter-efficient fine-tuning method called Weight-Decomposed Low-Rank Adaptation (DoRA), which aims to close the accuracy gap between full fine-tuning (FT) and the popular LoRA approach. Motivated by a weight decomposition analysis of FT and LoRA, DoRA separates the pre-trained weights into magnitude and directional components and fine-tunes both, enhancing learning capacity and training stability while avoiding the inference overhead associated with other adapter-style methods. The experimental results demonstrate that DoRA consistently outperforms LoRA on various tasks, including commonsense reasoning and visual instruction tuning, and that it remains compatible with other LoRA variants.
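To make the decomposition concrete, below is a minimal sketch of the idea described above, assuming a PyTorch-style linear layer. The class and hyperparameter names (`DoRALinear`, `r`, `lora_alpha`) and the per-output-row normalization convention are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    """Sketch: weight-decomposed low-rank adaptation of a frozen linear layer."""

    def __init__(self, base: nn.Linear, r: int = 16, lora_alpha: int = 32):
        super().__init__()
        out_f, in_f = base.weight.shape
        # Frozen pre-trained weight W0 (the bias, if any, is kept as-is).
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = base.bias
        # Trainable magnitude vector m, initialized to the per-row norm of W0.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=1))
        # Trainable LoRA factors that parameterize the directional update.
        self.lora_A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Direction: (W0 + scaled low-rank update), normalized row-wise.
        directed = self.weight + self.scaling * (self.lora_B @ self.lora_A)
        direction = directed / directed.norm(p=2, dim=1, keepdim=True)
        # Recombine magnitude and direction into the adapted weight W'.
        adapted = self.magnitude.unsqueeze(1) * direction
        return F.linear(x, adapted, self.bias)
```

Because `lora_B` starts at zero and the magnitude starts at the norm of the frozen weight, the layer initially reproduces the pre-trained layer exactly; only `magnitude`, `lora_A`, and `lora_B` receive gradients during fine-tuning.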
This paper employs the following methods:
- Weight-Decomposed Low-Rank Adaptation (DoRA)
- LoRA (Low-Rank Adaptation)
- Weight Normalization
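Taken together, the methods listed above reparameterize rather than extend the network: weight normalization supplies the magnitude/direction split, and LoRA supplies the low-rank directional update. This is why the adaptation can be merged away after training. The sketch below, reusing the hypothetical `DoRALinear` from the earlier example, is one way such a merge could look; it is an illustration of the "no additional inference overhead" claim, not the authors' code.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def merge_dora(module: "DoRALinear") -> nn.Linear:
    """Fold magnitude, direction, and LoRA factors into a plain nn.Linear."""
    directed = module.weight + module.scaling * (module.lora_B @ module.lora_A)
    direction = directed / directed.norm(p=2, dim=1, keepdim=True)
    merged_weight = module.magnitude.unsqueeze(1) * direction

    out_f, in_f = merged_weight.shape
    merged = nn.Linear(in_f, out_f, bias=module.bias is not None)
    merged.weight.copy_(merged_weight)
    if module.bias is not None:
        merged.bias.copy_(module.bias)
    # A standard linear layer remains: no extra parameters or latency at inference.
    return merged
```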
The following evaluation metrics were used in this research:
- Accuracy
- Exact match score
The paper reports the following key results:
- DoRA outperforms LoRA on commonsense reasoning tasks by 3.7% on LLaMA-7B/13B and by 2.9% on LLaMA2-7B.
- DoRA improves performance on visual instruction tuning by 0.6% on LLaVA-7B.
- DoRA achieves higher accuracy than LoRA on image/video-text understanding tasks with VL-BART.
The following hardware details were reported:
- Number of GPUs: None specified
- GPU Type: None specified