---
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
---

# Model Card for LoRI-D_code_llama3_rank_64

This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).

**LoRI** (LoRA with Reduced Interference) is a simple yet effective approach that freezes the projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting.

<div align="center">
<img src="https://github.com/juzhengz/LoRI/raw/main/LoRI.png" alt="LoRI" width="80%">
</div>

## Model Details

### Model Description

LoRI-D_code_llama3_rank_64 is an adapter for the `meta-llama/Meta-Llama-3-8B` base model, fine-tuned with the LoRI (LoRA with Reduced Interference) framework for code generation. LoRI is a parameter-efficient fine-tuning (PEFT) method designed to reduce the overhead and parameter interference of traditional LoRA in multi-task scenarios. It does so by freezing the projection matrices `A` as random projections and sparsifying the matrices `B` with task-specific masks, which significantly reduces the number of trainable parameters while maintaining strong performance. This adapter uses rank 64.

- **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
- **Model type:** Low-Rank Adaptation (LoRA) adapter for Causal Language Models
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`
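
To make the mechanism concrete, here is a minimal pure-Python sketch of a single LoRI-style update. It is not the authors' implementation. It assumes the adapter output has the form `x @ A @ (B * M)`, where `A` is a fixed random projection, `B` is the trainable matrix, and `M` is a binary task-specific mask. The shapes, the Gaussian initialization, the helper names (`matmul`, `lori_delta`), and the toy values are all illustrative assumptions.

```python
import random


def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def lori_delta(x, A, B, M):
    """Adapter output for input x: project with frozen A, then apply masked B."""
    masked_B = [[b * m for b, m in zip(b_row, m_row)] for b_row, m_row in zip(B, M)]
    return matmul(matmul(x, A), masked_B)


rng = random.Random(0)
d_in, rank, d_out = 4, 2, 3

# A is drawn once and never updated (a Gaussian draw is assumed here).
A = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(d_in)]
# B is the only trainable matrix; toy values stand in for learned weights.
B = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# M is a binary mask: entries set to 0 are pruned and contribute nothing.
M = [[1, 0, 0], [0, 0, 1]]

x = [1.0, 0.0, -1.0, 2.0]
delta = lori_delta(x, A, B, M)

# Only the unmasked entries of B count as trainable parameters.
trainable = sum(sum(row) for row in M)
print(delta, trainable)
```

In this toy setup the middle column of `B` is fully masked, so the corresponding output coordinate is zero regardless of `A` or `x`.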

### Model Sources

- **Repository:** [https://github.com/juzhengz/LoRI/](https://github.com/juzhengz/LoRI/)
- **Paper:** [https://huggingface.co/papers/2504.07448](https://huggingface.co/papers/2504.07448)
- **Hugging Face Collection:** [LoRI Adapters](https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011)

## Uses

### Direct Use

This model is intended to be loaded as a PEFT adapter on top of the `meta-llama/Meta-Llama-3-8B` base model to enhance its performance on code generation tasks. It provides an efficient way to fine-tune large language models with significantly fewer trainable parameters.

### Downstream Use

LoRI adapters facilitate effective adapter merging and continual learning across various tasks, including natural language understanding, mathematical reasoning, code generation, and safety alignment. This makes them suitable for multi-task learning environments and adaptive model deployments.

### Out-of-Scope Use

This model is not intended for generating harmful, biased, or unethical content. Users should exercise caution and implement appropriate safeguards when deploying it in real-world applications, especially in sensitive domains.

## Bias, Risks, and Limitations

As an adaptation method built upon pre-trained Large Language Models, LoRI models inherit biases and risks present in their base models (e.g., Meta-Llama-3-8B) and the datasets they were fine-tuned on. Users should be aware of potential issues related to fairness, toxicity, and factual accuracy. Specific limitations include:
- Performance might vary depending on the chosen base model and the sparsity level.
- While LoRI significantly reduces cross-task interference, perfect isolation of knowledge across tasks is not guaranteed during adapter merging.
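
The caveat about imperfect isolation can be illustrated numerically. The sketch below draws two independent random Gaussian vectors, standing in for directions taken from two adapters' frozen `A` matrices, and measures their cosine similarity. In high dimensions the value is close to zero but not exactly zero, which is the intuition behind reduced rather than eliminated interference. The dimension (4096, the hidden size of Llama-3-8B), the seeds, and the Gaussian distribution are assumptions chosen for illustration.

```python
import math
import random


def random_direction(dim, seed):
    """Draw a random Gaussian vector, standing in for one frozen projection direction."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


dim = 4096
task_a = random_direction(dim, seed=1)  # hypothetical adapter for task A
task_b = random_direction(dim, seed=2)  # hypothetical adapter for task B

similarity = cosine_similarity(task_a, task_b)
print(f"cosine similarity between independent directions: {similarity:+.4f}")
```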

### Recommendations

Users (both direct and downstream) should refer to the original `meta-llama/Meta-Llama-3-8B` model card for inherent biases and risks. It is recommended to perform task-specific evaluations and careful validation when deploying models fine-tuned with LoRI in sensitive applications.

## How to Get Started with the Model

Pretrained LoRI adapters are available via the Hugging Face collection and can be loaded as follows:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto"  # or specify your device, e.g., "cuda"
)

# Load the LoRI adapter on top of the base model
adapter = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_code_llama3_rank_64")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Example for text generation (code generation).
# The multi-line prompt is built from adjacent string literals with explicit newlines.
prompt = (
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    else:\n"
    "        "
)
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)

# Generate text
with torch.no_grad():
    outputs = adapter.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Training Details

### Training Data

LoRI adapters were trained and evaluated on task-specific datasets. For code generation, as in this model, the `CodeAlpaca` dataset was primarily used. Other tasks included:
- Mathematical reasoning: `GSM8K`
- Safety alignment: `Saferpaca`
- Natural language understanding: the specific datasets are not listed in this card

### Training Procedure

LoRI is implemented with Fully Sharded Data Parallel (FSDP) and supports multi-GPU training. Training proceeds in two stages:
1. **LoRI-D (dense `B`):** Initial training in which the projection matrices `A` are frozen as random projections and the matrices `B` are learned. Sparse masks are also extracted in this stage.
2. **LoRI-S (sparse `B`):** Continued training with the extracted sparse masks (e.g., 90% sparsity) applied to the matrices `B`, further reducing trainable parameters and promoting orthogonality.
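
A minimal sketch of how a sparse mask could be extracted from a trained `B` matrix follows. It assumes magnitude-based selection, keeping the largest-magnitude 10% of entries for 90% sparsity. This card does not specify the selection rule, so the criterion, the per-matrix scope, and the helper name `magnitude_mask` are assumptions rather than a description of the released training code.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude entries of a 2D list."""
    flat = sorted((abs(w) for row in weights for w in row), reverse=True)
    keep = max(1, round(len(flat) * (1.0 - sparsity)))
    threshold = flat[keep - 1]  # smallest magnitude that is still kept
    return [[1 if abs(w) >= threshold else 0 for w in row] for row in weights]


rng = random.Random(0)
# Toy 10 x 20 matrix standing in for a trained B.
B = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(10)]

mask = magnitude_mask(B, sparsity=0.9)
kept = sum(sum(row) for row in mask)

# Applying the mask zeroes the pruned entries; later training would update only the kept ones.
sparse_B = [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(B, mask)]
print(f"kept {kept} of {10 * 20} entries")
```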

#### Training Hyperparameters

- **Adapter ranks:** Models were trained with adapter ranks of 32 and 64 (this model uses rank 64).
- **Sparsity:** 90% (for `LoRI-S` stage).
- **Base models used:** LLaMA-3-8B and Mistral-7B.
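
As a rough worked example of what these settings imply, the arithmetic below counts trainable parameters for one adapted weight matrix. It assumes a square 4096 x 4096 projection (4096 is the hidden size of Llama-3-8B; real layers vary in shape), that standard LoRA trains both `A` and `B`, and that LoRI trains only `B`. The function name and the `"lora"` / `"lori"` labels are hypothetical. Under those assumptions the 90% sparse variant trains about 5% of what LoRA does, which is consistent with the "up to 95% fewer" figure quoted in the Evaluation section.

```python
def trainable_params(d_in, d_out, rank, method, sparsity=0.0):
    """Count trainable adapter parameters for a single weight matrix."""
    a_params = d_in * rank   # projection matrix A
    b_params = rank * d_out  # matrix B
    if method == "lora":
        return a_params + b_params  # LoRA trains both A and B
    if method == "lori":
        return round(b_params * (1.0 - sparsity))  # A frozen, B optionally masked
    raise ValueError(f"unknown method: {method}")


d, rank = 4096, 64
lora = trainable_params(d, d, rank, "lora")
lori_dense = trainable_params(d, d, rank, "lori")
lori_sparse = trainable_params(d, d, rank, "lori", sparsity=0.9)

for name, count in [("LoRA", lora), ("LoRI, dense B", lori_dense), ("LoRI, 90% sparse B", lori_sparse)]:
    print(f"{name:>20}: {count:>7} trainable ({count / lora:.1%} of LoRA)")
```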

## Evaluation

According to the paper, LoRI outperforms full fine-tuning and existing PEFT methods while using up to 95% fewer trainable parameters than standard LoRA. For code generation, performance was evaluated on the HumanEval benchmark. In multi-task experiments, LoRI enabled effective adapter merging and continual learning with reduced cross-task interference. Detailed results and comparisons are in the accompanying paper.
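
HumanEval results are conventionally reported as pass@k. This card does not state which values of k were used, so the following is only a sketch of the standard unbiased pass@k estimator, with made-up sample counts. The paper and the evaluation harness remain the reference for the actual protocol.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for a single problem: 20 samples, 5 pass the unit tests.
print(pass_at_k(n=20, c=5, k=1))   # 0.25
print(pass_at_k(n=20, c=5, k=10))
```

A benchmark score is the mean of this quantity over all problems.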

## Acknowledgements

This project builds on the codebase of [dpo-rlaif](https://github.com/architsharma97/dpo-rlaif) and incorporates code from [lottery-ticket-adaptation](https://github.com/kiddyboots216/lottery-ticket-adaptation). Code generation performance on HumanEval is evaluated using the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

## Citation

If you use LoRI in your work, please cite:

```bibtex
@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}
```

## Framework versions

- PEFT 0.12.0