-
Parameter-Efficient and Stable Singular Value Adaptation for Pre-Trained Models
We observe that the Q, K, O, and V matrices in attention layers can be absorbed and decomposed, without any loss, into four head-wise orthogonal matrices and two sets of singular values. After this orthogonalization, we freeze the singular vectors and fine-tune only the singular values, enabling stable fine-tuning constrained to the model's original latent space; this yields a 5.4% improvement over LoRA across eight commonsense reasoning datasets. The same absorb-decompose operation also eliminates redundant vectors losslessly, reducing the encoder parameters of Whisper-large-v3 by 46.42% without any additional training.
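A minimal sketch of the absorb-decompose idea for a single attention head, under stated assumptions (PyTorch; the helper name `absorb_decompose`, the tensor shapes, and the rank-truncation tolerance are illustrative, not the authors' released code). The intuition: the attention scores depend on Q and K only through the product W_q W_k^T, and the head's output depends on V and O only through W_v W_o, so each pair can be absorbed into one matrix and SVD-decomposed losslessly into orthogonal singular vectors (frozen) and singular values (fine-tuned).

```python
import torch

def absorb_decompose(W_q, W_k, W_v, W_o, tol=1e-6):
    """Absorb one head's Q/K and V/O projection pairs into a single
    product matrix each, then decompose via SVD.

    Assumed shapes: W_q, W_k, W_v are (d_model, d_head) and W_o is the
    head's (d_head, d_model) slice of the output projection.
    Returns the frozen orthogonal factors and trainable singular values.
    """
    # Attention scores use only W_q @ W_k.T, so the pair absorbs into
    # one (d_model, d_model) matrix that decomposes losslessly.
    U_qk, S_qk, Vh_qk = torch.linalg.svd(W_q @ W_k.T, full_matrices=False)
    # Likewise, the head's output path uses only W_v @ W_o.
    U_vo, S_vo, Vh_vo = torch.linalg.svd(W_v @ W_o, full_matrices=False)

    # Each product has rank at most d_head, so singular values that are
    # numerically zero index redundant vectors that can be dropped
    # without changing the network's function.
    r_qk = int((S_qk > tol * S_qk[0]).sum())
    r_vo = int((S_vo > tol * S_vo[0]).sum())

    frozen = (U_qk[:, :r_qk], Vh_qk[:r_qk], U_vo[:, :r_vo], Vh_vo[:r_vo])
    trainable = (S_qk[:r_qk].clone().requires_grad_(True),
                 S_vo[:r_vo].clone().requires_grad_(True))
    return frozen, trainable

# Example with random weights for one head (d_model=512, d_head=64):
d_model, d_head = 512, 64
W_q, W_k, W_v = (torch.randn(d_model, d_head) for _ in range(3))
W_o = torch.randn(d_head, d_model)
frozen, trainable = absorb_decompose(W_q, W_k, W_v, W_o)
```

Freezing the orthogonal factors and updating only the two small vectors of singular values is what keeps the adaptation both parameter-efficient and confined to the pre-trained latent space; the rank truncation is what removes redundant vectors without retraining.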