Venue
International Conference on Machine Learning
Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from our findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
In the paper, the authors propose a new parameter-efficient fine-tuning method called Weight-Decomposed Low-Rank Adaptation (DoRA), which aims to close the accuracy gap between full fine-tuning (FT) and the popular LoRA approach. Motivated by a weight decomposition analysis of FT and LoRA, DoRA separates the pre-trained weights into magnitude and directional components and fine-tunes both, enhancing learning capacity and training stability while avoiding the inference overhead associated with other adapter-style methods. The experimental results demonstrate that DoRA consistently outperforms LoRA on various tasks, including commonsense reasoning and visual instruction tuning, and that it remains compatible with other LoRA variants.
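To make the decomposition concrete, below is a minimal sketch of the idea described above, assuming a PyTorch-style linear layer. The class and hyperparameter names (`DoRALinear`, `r`, `lora_alpha`) and the per-output-row normalization convention are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    """Sketch: weight-decomposed low-rank adaptation of a frozen linear layer."""

    def __init__(self, base: nn.Linear, r: int = 16, lora_alpha: int = 32):
        super().__init__()
        out_f, in_f = base.weight.shape
        # Frozen pre-trained weight W0 (the bias, if any, is kept as-is).
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = base.bias
        # Trainable magnitude vector m, initialized to the per-row norm of W0.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=1))
        # Trainable LoRA factors that parameterize the directional update.
        self.lora_A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Direction: (W0 + scaled low-rank update), normalized row-wise.
        directed = self.weight + self.scaling * (self.lora_B @ self.lora_A)
        direction = directed / directed.norm(p=2, dim=1, keepdim=True)
        # Recombine magnitude and direction into the adapted weight W'.
        adapted = self.magnitude.unsqueeze(1) * direction
        return F.linear(x, adapted, self.bias)
```

Because `lora_B` starts at zero and the magnitude starts at the norm of the frozen weight, the layer initially reproduces the pre-trained layer exactly; only `magnitude`, `lora_A`, and `lora_B` receive gradients during fine-tuning.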
This paper employs the following methods:
- Weight-Decomposed Low-Rank Adaptation (DoRA)
- LoRA (Low-Rank Adaptation)
- Weight Normalization
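Taken together, the methods listed above reparameterize rather than extend the network: weight normalization supplies the magnitude/direction split, and LoRA supplies the low-rank directional update. This is why the adaptation can be merged away after training. The sketch below, reusing the hypothetical `DoRALinear` from the earlier example, is one way such a merge could look; it is an illustration of the "no additional inference overhead" claim, not the authors' code.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def merge_dora(module: "DoRALinear") -> nn.Linear:
    """Fold magnitude, direction, and LoRA factors into a plain nn.Linear."""
    directed = module.weight + module.scaling * (module.lora_B @ module.lora_A)
    direction = directed / directed.norm(p=2, dim=1, keepdim=True)
    merged_weight = module.magnitude.unsqueeze(1) * direction

    out_f, in_f = merged_weight.shape
    merged = nn.Linear(in_f, out_f, bias=module.bias is not None)
    merged.weight.copy_(merged_weight)
    if module.bias is not None:
        merged.bias.copy_(module.bias)
    # A standard linear layer remains: no extra parameters or latency at inference.
    return merged
```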
The following evaluation metrics were used in this research:
- Accuracy
- Exact match score
The paper reports the following key results:
- DoRA outperforms LoRA on commonsense reasoning tasks by 3.7% on LLaMA-7B/13B and by 2.9% on LLaMA2-7B.
- DoRA improves performance on visual instruction tuning by 0.6% on LLaVA-7B.
- DoRA achieves higher accuracy than LoRA on image/video-text understanding tasks with VL-BART.
The following hardware details were reported:
- Number of GPUs: None specified
- GPU Type: None specified