---
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
---
|
|
|
# Model Card for LoRI-D_nlu_llama3_rank_64 |
|
|
|
This model is a LoRI adapter released with the paper [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448). LoRI (LoRA with Reduced Interference) is a simple yet effective variant of Low-Rank Adaptation (LoRA) for Large Language Models (LLMs): it freezes the projection matrices `A` as random projections and sparsifies the matrices `B` using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance, minimizes cross-task interference when adapters are merged, and supports continual learning by using sparsity to mitigate catastrophic forgetting.
|
|
|
<div align="center"> |
|
<img src="https://github.com/juzhengz/LoRI/raw/main/LoRI.png" alt="LoRI Framework" width="80%"> |
|
</div> |
|
|
|
### ✨ Key Highlights |
|
|
|
* **Scalable & Efficient**: Uses up to 95% fewer trainable parameters than traditional LoRA while maintaining performance. |
|
* **Reduced Interference**: Minimizes cross-task interference in multi-task scenarios by leveraging orthogonality between adapter subspaces. |
|
* **Continual Learning**: Supports continual learning by using sparsity to mitigate catastrophic forgetting. |
|
* **Universal Applicability**: Evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks. |
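
As a rough illustration of where the parameter savings come from, the sketch below counts trainable parameters for a single 4096 × 4096 projection at rank 64. The layer shape and the helper names are illustrative assumptions, and the 90% sparsity level is the typical value quoted for LoRI-S in this card; none of these numbers are taken from the paper's tables.

```python
def lora_trainable(d_in: int, d_out: int, r: int) -> int:
    """Standard LoRA trains both A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


def lori_trainable(d_out: int, r: int, sparsity: float = 0.0) -> int:
    """LoRI freezes A, so only the unmasked entries of B are trained."""
    return round(d_out * r * (1.0 - sparsity))


# Illustrative layer shape (an assumption, not a value from the paper).
d_in = d_out = 4096
r = 64

lora = lora_trainable(d_in, d_out, r)
lori_d = lori_trainable(d_out, r)                # dense B
lori_s = lori_trainable(d_out, r, sparsity=0.9)  # 90% of B masked out

reduction = 1.0 - lori_s / lora
print(f"LoRA:   {lora} trainable parameters")
print(f"LoRI-D: {lori_d} trainable parameters")
print(f"LoRI-S: {lori_s} trainable parameters ({reduction:.1%} fewer than LoRA)")
```

For a square layer, freezing `A` halves the trainable parameters, and masking 90% of `B` leaves roughly 5% of the LoRA count, which is consistent with the "up to 95% fewer" figure above.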
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
`LoRI-D_nlu_llama3_rank_64` is a LoRI-D adapter (frozen random `A`, dense `B`) for Natural Language Understanding (NLU) tasks, trained on top of the `meta-llama/Meta-Llama-3-8B` base model with rank 64. It belongs to the LoRI family of adapters, which aims to provide parameter-efficient fine-tuning with reduced cross-task interference.
|
|
|
- **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
- **Model type:** LoRI (LoRA with Reduced Interference) adapter, a PEFT method for LLMs
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`
|
|
|
### Model Sources |
|
|
|
- **Repository:** [https://github.com/juzhengz/LoRI/](https://github.com/juzhengz/LoRI/) |
|
- **Paper:** [https://arxiv.org/abs/2504.07448](https://arxiv.org/abs/2504.07448) |
|
- **HuggingFace Collection:** [https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011](https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011) |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
This model is intended to be used as a PEFT adapter on top of the `meta-llama/Meta-Llama-3-8B` base model for natural language understanding tasks, leveraging its efficient design for reduced parameter overhead and improved multi-task performance. |
|
|
|
### Downstream Use |
|
|
|
LoRI adapters can be merged for multi-task applications or sequentially applied for continual learning without significant performance degradation. This makes LoRI suitable for building generalist agents or systems that need to learn new skills over time. |
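
One intuition for why merged LoRI adapters interfere little is that independently drawn random projections are nearly orthogonal in high dimensions. The toy sketch below checks this for two Gaussian vectors standing in for single rows of two tasks' frozen `A` matrices. The dimension, the seed, and the helper names are assumptions for illustration; this is not the paper's merging procedure.

```python
import math
import random


def random_vector(dim, rng):
    """Draw a Gaussian vector, standing in for one row of a frozen random A."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)
dim = 4096  # illustrative hidden size

a_task1 = random_vector(dim, rng)  # hypothetical row of task 1's A
a_task2 = random_vector(dim, rng)  # hypothetical row of task 2's A

similarity = cosine(a_task1, a_task2)
print(f"cosine similarity between independent random rows: {similarity:.4f}")
```

The similarity is expected to be on the order of `1 / sqrt(dim)`, so updates routed through different random projections overlap only slightly when they are summed.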
|
|
|
### Out-of-Scope Use |
|
|
|
This model is not intended for use in high-stakes or safety-critical applications without further rigorous testing and validation. Given its focus on NLU tasks, its performance on other domains or tasks without specific fine-tuning is not guaranteed. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
As with any language model, this adapter may inherit biases present in its training data, both from the base model (`meta-llama/Meta-Llama-3-8B`) and from the datasets used for LoRI fine-tuning. Potential risks include generating biased, inaccurate, or harmful content.
|
|
|
### Recommendations |
|
|
|
Users should carefully evaluate the model's output for their specific application and consider fine-tuning on domain-specific, curated data to mitigate potential biases or limitations. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "meta-llama/Meta-Llama-3-8B"

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,  # or torch.float16, depending on your hardware
    device_map="auto",
)

# Attach the LoRI adapter to the base model
model = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_nlu_llama3_rank_64")
model.eval()  # inference mode: disables dropout

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Example: general text generation (adapt the prompt for specific NLU use cases)
prompt = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

# For NLU tasks, the prompt and the expected output format depend on the task;
# post-process the generated text accordingly.
```
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The LoRI models are trained on different datasets depending on the task:

- **Natural Language Understanding (NLU):** the NLU datasets described in the paper (the task this adapter was trained on).
- **Code generation:** CodeAlpaca dataset.
- **Mathematical reasoning:** GSM8K dataset.
- **Safety alignment:** Saferpaca dataset.
|
|
|
More details on specific datasets can be found in the [GitHub repository](https://github.com/juzhengz/LoRI/). |
|
|
|
### Training Procedure |
|
|
|
LoRI is implemented with Fully Sharded Data Parallel (FSDP) for multi-GPU training. Training has two stages:

1. **LoRI-D (dense) training**: Adapters are trained with the random projection matrices `A` frozen and the matrices `B` dense. Sparse masks are then extracted from the trained `B` matrices. This adapter (`LoRI-D`) corresponds to this stage.
2. **LoRI-S (sparse) training**: Training continues with the extracted sparse masks applied to the matrices `B`, typically at 90% sparsity.
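
The two stages can be sketched on a toy matrix. This card does not say how the masks are extracted, so the sketch assumes a simple magnitude-based rule (keep the largest 10% of entries of `B`); the helper names, the toy shape, and the placeholder gradient are all hypothetical, and the repository should be consulted for the actual procedure.

```python
import random


def magnitude_mask(b, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude entries of b."""
    magnitudes = sorted((abs(x) for row in b for x in row), reverse=True)
    keep = max(1, round(len(magnitudes) * (1.0 - sparsity)))
    threshold = magnitudes[keep - 1]
    return [[1 if abs(x) >= threshold else 0 for x in row] for row in b]


def masked_sgd_step(b, grad, mask, lr):
    """Update only the entries of b that the mask keeps."""
    return [
        [x - lr * g * m for x, g, m in zip(b_row, g_row, m_row)]
        for b_row, g_row, m_row in zip(b, grad, mask)
    ]


rng = random.Random(0)
rows, cols = 8, 50  # toy shape for B

# Stage 1 (LoRI-D): pretend this dense B came out of training with A frozen.
b_dense = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
mask = magnitude_mask(b_dense, sparsity=0.9)  # assumed extraction rule

# Stage 2 (LoRI-S): B is restricted to the mask; only kept entries are updated.
b_sparse = [[x * m for x, m in zip(b_row, m_row)] for b_row, m_row in zip(b_dense, mask)]
fake_grad = [[1.0] * cols for _ in range(rows)]  # placeholder gradient
b_sparse = masked_sgd_step(b_sparse, fake_grad, mask, lr=0.01)

kept = sum(sum(row) for row in mask)
print(f"kept {kept} of {rows * cols} entries of B")
```

Multiplying the gradient by the mask is what keeps the sparsity pattern fixed during the second stage: masked-out entries receive no update and stay at zero.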
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** Mixed precision (e.g., `bfloat16` for Llama-3) is typically used for training large models. |
|
- **Adapter Rank (`r`):** 64 (for this `LoRI-D_nlu_llama3_rank_64` model). |
|
- **LoRA Alpha (`lora_alpha`):** 128 (from `adapter_config.json`). |
|
- **LoRA Dropout (`lora_dropout`):** 0.05 (from `adapter_config.json`). |
|
- **Target Modules (`target_modules`):** `o_proj`, `k_proj`, `up_proj`, `q_proj`, `v_proj`, `down_proj`, `gate_proj` (from `adapter_config.json`). |
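
To make these values concrete, the sketch below collects them in a plain dictionary and derives the LoRA scaling factor. In PEFT's standard LoRA (without rank-stabilized scaling) the low-rank update is multiplied by `lora_alpha / r`; that LoRI's training code keeps this convention is an assumption here. The dictionary name and the attention/MLP grouping are illustrative only.

```python
# Hyperparameters as listed above (reported from adapter_config.json).
adapter_hparams = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "o_proj", "k_proj", "up_proj", "q_proj",
        "v_proj", "down_proj", "gate_proj",
    ],
}

# Standard LoRA scaling: the update is multiplied by lora_alpha / r.
scaling = adapter_hparams["lora_alpha"] / adapter_hparams["r"]

# Split the targets into attention and MLP projections for readability.
attention_names = {"q_proj", "k_proj", "v_proj", "o_proj"}
attention_targets = [m for m in adapter_hparams["target_modules"] if m in attention_names]
mlp_targets = [m for m in adapter_hparams["target_modules"] if m not in attention_names]

print(f"scaling factor: {scaling}")
print(f"attention projections: {attention_targets}")
print(f"MLP projections: {mlp_targets}")
```

With these values the adapter targets every linear projection in both the attention and MLP blocks, and under the stated assumption the adapter update is scaled by 2.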
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
LoRI's performance has been extensively evaluated across natural language understanding, mathematical reasoning, code generation (e.g., HumanEval), and safety alignment tasks. |
|
|
|
#### Metrics |
|
|
|
Performance is measured using relevant metrics for each task. The paper demonstrates that LoRI consistently outperforms full fine-tuning and existing PEFT methods across various tasks, while using up to 95% fewer trainable parameters than traditional LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results, please refer to the [paper](https://arxiv.org/abs/2504.07448). |
|
|
|
## Technical Specifications |
|
|
|
### Model Architecture and Objective |
|
|
|
LoRI introduces a novel architecture where projection matrices `A` in LoRA are frozen as random projections, and matrices `B` are sparsified using task-specific masks. This design is intended to achieve monosemantic experts, reduce trainable parameters, and minimize cross-task interference. The objective remains focused on improving performance on downstream tasks while promoting parameter efficiency and modularity. |
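
In symbols, using a common LoRA convention (the paper's notation and matrix orientation may differ), a frozen weight $W_0 \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$ receives a low-rank update in which both factors are trained:

$$
\Delta W_{\text{LoRA}} = B A, \qquad B \in \mathbb{R}^{d_{\text{out}} \times r}, \quad A \in \mathbb{R}^{r \times d_{\text{in}}}
$$

For task $t$, LoRI keeps $A_t$ fixed at its random initialization and restricts $B_t$ with a binary task-specific mask $M_t$:

$$
\Delta W_t = (B_t \odot M_t)\, A_t, \qquad M_t \in \{0, 1\}^{d_{\text{out}} \times r}
$$

LoRI-D corresponds to $M_t$ being all ones (dense $B_t$), and LoRI-S to a sparse $M_t$. One simple way to merge adapters, shown here only as an assumed example with hypothetical weights $\alpha_t$, is a weighted sum of the per-task updates:

$$
\Delta W_{\text{merge}} = \sum_t \alpha_t \, (B_t \odot M_t)\, A_t
$$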
|
|
|
### Compute Infrastructure |
|
|
|
#### Hardware |
|
|
|
Training was performed in a multi-GPU environment using Fully Sharded Data Parallel (FSDP).
|
|
|
#### Software |
|
|
|
The implementation uses Python, PyTorch, and the Hugging Face `transformers` and `peft` libraries. |
|
|
|
## Acknowledgements |
|
|
|
This project builds on the codebase of [dpo-rlaif](https://github.com/architsharma97/dpo-rlaif) and incorporates code from [lottery-ticket-adaptation](https://github.com/kiddyboots216/lottery-ticket-adaptation). Code generation performance on HumanEval is evaluated using the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness). |
|
|
|
## Citation |
|
|
|
If you use LoRI in your work, please cite: |
|
|
|
```bibtex
@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}
```
|
|
|
## Model Card Contact |
|
|
|
For questions or inquiries, please refer to the contact information provided in the original [repository](https://github.com/juzhengz/LoRI/). |
|
|
|
### Framework versions |
|
|
|
- PEFT 0.12.0 |