---
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
---
# Model Card for LoRI-D_nlu_llama3_rank_64
This model is an adapter presented in the paper [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448), which introduces a simple yet effective variant of Low-Rank Adaptation (LoRA) for Large Language Models (LLMs). LoRI freezes the projection matrices `A` as random projections and sparsifies the matrices `B` using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance, minimizes cross-task interference when adapters are merged, and supports continual learning by using sparsity to mitigate catastrophic forgetting.
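In terms of the standard LoRA update, the design can be summarized as follows (notation ours, restating the description above):

$$\Delta W = (M \odot B)\,A, \qquad A \in \mathbb{R}^{r \times d_{\text{in}}} \text{ frozen at random initialization}, \qquad M \in \{0,1\}^{d_{\text{out}} \times r}$$

where $M$ is the task-specific binary mask and only the unmasked entries of $B$ are trained.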
<div align="center">
<img src="https://github.com/juzhengz/LoRI/raw/main/LoRI.png" alt="LoRI Framework" width="80%">
</div>
### ✨ Key Highlights
* **Scalable & Efficient**: Uses up to 95% fewer trainable parameters than traditional LoRA while maintaining performance.
* **Reduced Interference**: Minimizes cross-task interference in multi-task scenarios by leveraging orthogonality between adapter subspaces.
* **Continual Learning**: Supports continual learning by using sparsity to mitigate catastrophic forgetting.
* **Universal Applicability**: Evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks.
## Model Details
### Model Description
The `LoRI-D_nlu_llama3_rank_64` model is a LoRA adapter specifically designed for Natural Language Understanding (NLU) tasks, fine-tuned on the `meta-llama/Meta-Llama-3-8B` base model with a rank of 64. It is part of the LoRI family of models, which aims to provide parameter-efficient fine-tuning with reduced cross-task interference.
- **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
- **Model type:** Low-Rank Adaptation (LoRI) adapter (PEFT method for LLMs)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`
### Model Sources
- **Repository:** [https://github.com/juzhengz/LoRI/](https://github.com/juzhengz/LoRI/)
- **Paper:** [https://arxiv.org/abs/2504.07448](https://arxiv.org/abs/2504.07448)
- **HuggingFace Collection:** [https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011](https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011)
## Uses
### Direct Use
This model is intended to be used as a PEFT adapter on top of the `meta-llama/Meta-Llama-3-8B` base model for natural language understanding tasks, leveraging its efficient design for reduced parameter overhead and improved multi-task performance.
### Downstream Use
LoRI adapters can be merged for multi-task applications or sequentially applied for continual learning without significant performance degradation. This makes LoRI suitable for building generalist agents or systems that need to learn new skills over time.
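As a concrete illustration, PEFT's built-in adapter arithmetic can be used to merge LoRI adapters. The sketch below is a hedged example: the second adapter ID is assumed from this collection's naming pattern and may differ, and `combination_type="linear"` is one of several merging strategies PEFT offers.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Load the NLU adapter, then a second LoRI adapter
# (the second ID is assumed from the collection's naming pattern)
model = PeftModel.from_pretrained(
    base, "tomg-group-umd/LoRI-D_nlu_llama3_rank_64", adapter_name="nlu"
)
model.load_adapter("tomg-group-umd/LoRI-D_code_llama3_rank_64", adapter_name="code")

# Combine the two adapters with equal weights and activate the result
model.add_weighted_adapter(
    adapters=["nlu", "code"],
    weights=[0.5, 0.5],
    adapter_name="nlu_code",
    combination_type="linear",
)
model.set_adapter("nlu_code")
```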
### Out-of-Scope Use
This model is not intended for use in high-stakes or safety-critical applications without further rigorous testing and validation. Given its focus on NLU tasks, its performance on other domains or tasks without specific fine-tuning is not guaranteed.
## Bias, Risks, and Limitations
As with any language model, this model may inherit biases present in its training data, including the base model (`Llama-3-8B`) and the datasets used for LoRI fine-tuning. Potential risks include generating biased, inaccurate, or harmful content.
### Recommendations
Users should carefully evaluate the model's output for their specific application and consider fine-tuning on domain-specific, curated data to mitigate potential biases or limitations.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,  # or torch.float16, depending on your hardware
    device_map="auto",
)

# Load the LoRI adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_nlu_llama3_rank_64")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Example: general text generation; for specific NLU tasks, adjust the prompt
# and post-process the output to match the task's expected format
prompt = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
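If you do not need to switch adapters at runtime, the adapter can be folded into the base weights so inference runs at the base model's native speed. `merge_and_unload()` is standard PEFT functionality; the output path below is just an example:

```python
# Fold the adapter weights into the base model and drop the PEFT wrappers
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./llama3-8b-lori-nlu-merged")  # example path
```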
## Training Details
### Training Data
The LoRI models are trained on various datasets depending on the task:
- **Natural language understanding (NLU):** a collection of NLU datasets (the training domain of this model).
- **Code generation:** CodeAlpaca dataset.
- **Mathematical reasoning:** GSM8K dataset.
- **Safety alignment:** Saferpaca dataset.
More details on specific datasets can be found in the [GitHub repository](https://github.com/juzhengz/LoRI/).
### Training Procedure
LoRI is implemented using Fully Sharded Data Parallel (FSDP) for multi-GPU training. Training proceeds in two main stages (a simplified sketch follows the list):
1. **LoRI-D (Dense) training**: Adapters are trained with random projection matrices `A` frozen and `B` matrices dense. Sparse masks are then extracted.
2. **LoRI-S (Sparse) training**: Training continues with the extracted sparse masks applied to matrices `B`, typically at 90% sparsity.
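The following PyTorch sketch illustrates the core mechanics of the two stages on a single linear layer. It is a simplified illustration of the idea described on this card, not the repository's implementation; tensor shapes, the initialization scale, and the magnitude-based mask extraction are assumptions.

```python
import torch
import torch.nn as nn

class LoRILinear(nn.Module):
    """Toy LoRI adapter on a frozen linear layer: A is frozen random, B is trainable and maskable."""

    def __init__(self, base: nn.Linear, r: int = 64, alpha: int = 128):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.scaling = alpha / r
        # A: frozen random projection; B: trainable, starts dense (LoRI-D)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.02, requires_grad=False)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.register_buffer("mask", torch.ones_like(self.B))  # all-ones until extraction

    def extract_mask(self, sparsity: float = 0.9):
        """After LoRI-D training: keep only the largest-magnitude entries of B (the LoRI-S mask)."""
        k = int(self.B.numel() * (1 - sparsity))
        threshold = self.B.abs().flatten().topk(k).values.min()
        self.mask.copy_((self.B.abs() >= threshold).float())

    def forward(self, x):
        delta = (self.mask * self.B) @ self.A  # task-specific sparse low-rank update
        return self.base(x) + self.scaling * (x @ delta.T)
```

During LoRI-S training, the multiplication by `mask` zeroes both the contribution and the gradients of the masked entries of `B`, so only the retained 10% of entries keep learning.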
#### Training Hyperparameters
- **Training regime:** mixed precision (`bfloat16` for Llama-3-based models).
- **Adapter Rank (`r`):** 64 (for this `LoRI-D_nlu_llama3_rank_64` model).
- **LoRA Alpha (`lora_alpha`):** 128 (from `adapter_config.json`).
- **LoRA Dropout (`lora_dropout`):** 0.05 (from `adapter_config.json`).
- **Target Modules (`target_modules`):** `o_proj`, `k_proj`, `up_proj`, `q_proj`, `v_proj`, `down_proj`, `gate_proj` (from `adapter_config.json`).
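For reference, these values correspond to a PEFT configuration along these lines (a reconstruction from the hyperparameters above, not a copy of the repository's `adapter_config.json`):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj", "gate_proj"],
    task_type="CAUSAL_LM",
)
```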
## Evaluation
### Testing Data, Factors & Metrics
LoRI's performance has been extensively evaluated across natural language understanding, mathematical reasoning, code generation (e.g., HumanEval), and safety alignment tasks.
#### Metrics
Performance is measured using relevant metrics for each task. The paper demonstrates that LoRI consistently outperforms full fine-tuning and existing PEFT methods across various tasks, while using up to 95% fewer trainable parameters than traditional LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results, please refer to the [paper](https://arxiv.org/abs/2504.07448).
## Technical Specifications
### Model Architecture and Objective
LoRI modifies standard LoRA by freezing the projection matrices `A` as random projections and sparsifying the matrices `B` with task-specific masks. Because each task updates only a sparse subset of `B` against a shared random `A`, adapters for different tasks remain largely orthogonal, which reduces trainable parameters and minimizes cross-task interference. The training objective is unchanged from standard supervised fine-tuning; LoRI only alters which parameters are trainable.
### Compute Infrastructure
#### Hardware
Training was performed on multiple GPUs using PyTorch Fully Sharded Data Parallel (FSDP).
#### Software
The implementation uses Python, PyTorch, and the Hugging Face `transformers` and `peft` libraries.
## Acknowledgements
This project builds on the codebase of [dpo-rlaif](https://github.com/architsharma97/dpo-rlaif) and incorporates code from [lottery-ticket-adaptation](https://github.com/kiddyboots216/lottery-ticket-adaptation). Code generation performance on HumanEval is evaluated using the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).
## Citation
If you use LoRI in your work, please cite:
```bibtex
@article{zhang2025lori,
title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
journal={arXiv preprint arXiv:2504.07448},
year={2025}
}
```
## Model Card Contact
For questions or inquiries, please refer to the contact information provided in the original [repository](https://github.com/juzhengz/LoRI/).
### Framework versions
- PEFT 0.12.0