Enhance model card for LoRI-S_nlu_llama3_rank_64 with comprehensive details
This PR significantly enhances the model card for `tomg-group-umd/LoRI-S_nlu_llama3_rank_64` by incorporating detailed information from the paper '[LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://huggingface.co/papers/2504.07448)' and its associated GitHub repository.
Key updates include:
- Adding the paper's abstract and key highlights to provide a quick overview.
- Specifying the license as Apache 2.0 in both metadata and content.
- Adding relevant tags to the metadata for improved discoverability on the Hugging Face Hub.
- Populating sections like 'Model Details', 'Uses', 'Bias, Risks, and Limitations', 'How to Get Started with the Model', 'Training Details', and 'Evaluation'.
- Providing a runnable Python code example for model inference.
- Including explicit links to the GitHub repository and the project's Hugging Face collection.
- Adding the full BibTeX citation.
These changes make the model card much more informative and user-friendly, allowing researchers and practitioners to understand and use the model effectively.
---
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- peft
- lora
- fine-tuning
- multi-task
- continual-learning
- natural-language-understanding
- causal-lm
---

# Model Card for LoRI-S_nlu_llama3_rank_64

This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).

**Abstract:**

Low-Rank Adaptation (LoRA) has emerged as a popular parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), yet it still incurs notable overhead and suffers from parameter interference in multi-task scenarios. We propose LoRA with Reduced Interference (LoRI), a simple yet effective approach that freezes the projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than LoRA. Code is available at: https://github.com/juzhengz/LoRI

**Key Highlights:**

- **Reduced Trainable Parameters**: LoRI substantially reduces the number of trainable parameters (up to 95% fewer than standard LoRA) while maintaining strong task performance.
- **Minimized Cross-Task Interference**: By leveraging the orthogonality between adapter subspaces, LoRI minimizes interference when merging adapters.
- **Continual Learning Support**: LoRI uses sparsity to mitigate catastrophic forgetting, supporting effective continual learning.

## Model Details

### Model Description

LoRI-S_nlu_llama3_rank_64 is an adapter for `meta-llama/Meta-Llama-3-8B` fine-tuned for Natural Language Understanding (NLU) tasks using the LoRI (LoRA with Reduced Interference) method. LoRI is a parameter-efficient fine-tuning (PEFT) approach that freezes the LoRA projection matrices `A` as random projections and sparsifies the matrices `B` using task-specific masks. This design drastically reduces the number of trainable parameters while maintaining robust task performance. This model instance is trained with an adapter rank of 64.

- **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
- **Model type:** Low-Rank Adaptation (LoRA) with Reduced Interference (LoRI) adapter
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`

### Model Sources

- **Repository:** https://github.com/juzhengz/LoRI
- **Paper:** https://arxiv.org/abs/2504.07448
- **Hugging Face Collection:** https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011

## Uses

### Direct Use

LoRI is intended for parameter-efficient fine-tuning (PEFT) of Large Language Models (LLMs), particularly for single-task performance, multi-task scenarios (adapter merging), and continual learning. This specific adapter (`LoRI-S_nlu_llama3_rank_64`) is optimized for Natural Language Understanding (NLU) tasks.

### Downstream Use

LoRI can be used to efficiently fine-tune LLMs for various tasks, including:

- Natural Language Understanding (NLU)
- Mathematical Reasoning
- Code Generation
- Safety Alignment

It is designed to outperform full fine-tuning and other PEFT methods while being highly parameter-efficient. Its reduced interference property makes it suitable for scenarios involving adapter merging and continual learning across different tasks.

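The reduced-interference intuition behind adapter merging can be illustrated with a small toy example (a sketch for exposition, not the paper's exact merging procedure): two adapters that share a frozen random projection `A` but place their task-specific `B` weights on disjoint sparse supports can be merged by simple addition without either task's update corrupting the other.

```python
import torch

torch.manual_seed(0)
d, r = 16, 4
A = torch.randn(r, d) / r**0.5  # shared frozen random projection

# Two task-specific B matrices with disjoint sparse (row) supports
B1 = torch.zeros(d, r)
B1[:8] = torch.randn(8, r)
B2 = torch.zeros(d, r)
B2[8:] = torch.randn(8, r)

delta1, delta2 = B1 @ A, B2 @ A  # each task's low-rank weight update
merged = delta1 + delta2         # merge adapters by addition

# With disjoint supports, each task's update survives merging exactly
assert torch.allclose(merged[:8], delta1[:8])
assert torch.allclose(merged[8:], delta2[8:])
```

In practice the masks need not be perfectly disjoint; the point is that sparsity plus a shared random `A` limits the overlap between the subspaces the two updates occupy.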
### Out-of-Scope Use

The model should not be used for any illegal or unethical purposes. Users should be aware that the base model's limitations and biases may still be present. As a language model adapter, it should not be used in safety-critical applications without thorough additional testing and validation.

## Bias, Risks, and Limitations

The inherent biases, risks, and limitations of the base model (`meta-llama/Meta-Llama-3-8B`) apply to this adapter. Additionally, while LoRI aims to reduce cross-task interference, complete elimination of such interference is not guaranteed across all possible task combinations. The paper focuses on specific benchmarks and tasks; performance on other tasks or distributions may vary.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Detailed evaluation on specific deployment scenarios and for diverse user groups is recommended to ensure responsible and fair usage.

## How to Get Started with the Model

Pretrained LoRI adapters are available via the Hugging Face collection and can be loaded as follows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model_id = "meta-llama/Meta-Llama-3-8B"
lori_adapter_id = "tomg-group-umd/LoRI-S_nlu_llama3_rank_64"

# Load the base model with an appropriate dtype and device mapping.
# Adjust torch_dtype (e.g., torch.float16) to suit your hardware.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto",  # automatically distribute the model across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRI adapter and attach it to the base model
model = PeftModel.from_pretrained(base_model, lori_adapter_id)

# Optional: merge the adapter weights into the base model for a single
# consolidated model. This removes the PEFT wrapper.
# model = model.merge_and_unload()

# Example inference
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text using the model with the loaded adapter
outputs = model.generate(
    **inputs,
    max_new_tokens=50,                    # maximum number of new tokens to generate
    temperature=0.7,                      # sampling temperature
    do_sample=True,                       # enable sampling
    eos_token_id=tokenizer.eos_token_id,  # stop at the end-of-sequence token
)

# Decode the generated tokens, skipping the input prompt
generated_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"Prompt: {prompt}\nGenerated: {generated_text}")
```

## Training Details

### Training Data

LoRI models are trained on various datasets across different tasks. For Natural Language Understanding (NLU) tasks, this model was trained on relevant NLU datasets. Other tasks supported by LoRI include:

- **Code generation:** CodeAlpaca
- **Mathematical reasoning:** GSM8K
- **Safety alignment:** Saferpaca

### Training Procedure

LoRI employs a two-stage training procedure, as outlined in the paper and GitHub repository:

1. **LoRI-D (dense) training:** The projection matrices `A` are frozen as random projections, and the matrices `B` are trained.
2. **LoRI-S (sparse) training:** Sparse masks are extracted from the trained LoRI-D model, and training continues with `B` constrained to those masks at a specified sparsity level (e.g., 90%).

Training is implemented using [Fully Sharded Data Parallel (FSDP)](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html) and is designed for execution in a multi-GPU environment.

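The mechanism can be sketched in a few lines of PyTorch (a minimal illustration, not the authors' implementation; the `LoRILinear` class, its initialization scale, and the magnitude-based mask extraction are assumptions for exposition):

```python
import torch
import torch.nn as nn

class LoRILinear(nn.Module):
    """Illustrative LoRI-style adapted linear layer: frozen random A, masked trainable B."""

    def __init__(self, base: nn.Linear, rank: int = 64, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        in_f, out_f = base.in_features, base.out_features
        # A is a frozen random projection (never trained in LoRI)
        self.A = nn.Parameter(torch.randn(rank, in_f) / rank**0.5, requires_grad=False)
        # B is trainable and zero-initialized, as in standard LoRA
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # LoRI-S: a fixed 0/1 mask restricting which entries of B are active
        self.register_buffer("mask", torch.ones_like(self.B))
        self.scaling = alpha / rank

    def set_sparsity(self, sparsity: float):
        """Keep only the largest-magnitude entries of B (LoRI-D -> LoRI-S transition)."""
        flat = self.B.detach().abs().flatten()
        k = max(1, int(flat.numel() * (1 - sparsity)))
        thresh = flat.topk(k).values.min()
        self.mask.copy_((self.B.detach().abs() >= thresh).float())

    def forward(self, x):
        delta = (self.B * self.mask) @ self.A  # masked low-rank weight update
        return self.base(x) + self.scaling * (x @ delta.T)
```

Because `B` starts at zero, a freshly wrapped layer computes exactly the same output as the frozen base layer; only the masked entries of `B` ever receive gradients.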
#### Training Hyperparameters

- **Adapter ranks:** 32 and 64 (this model is rank 64)
- **Sparsity (LoRI-S):** 90%
- Specific training scripts and hyperparameters for various tasks are available in the [LoRI GitHub repository](https://github.com/juzhengz/LoRI/tree/main/scripts).

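The "up to 95% fewer trainable parameters" figure can be sanity-checked with back-of-envelope arithmetic for a single adapted 4096x4096 projection (illustrative dimensions; actual per-layer counts depend on which modules are targeted):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Standard LoRA trains both A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

def lori_params(d_out: int, r: int, sparsity: float) -> int:
    """LoRI freezes A and trains only the unmasked fraction of B."""
    return int((1 - sparsity) * r * d_out)

d, r, s = 4096, 64, 0.9
print(lora_params(d, d, r))   # 524288 trainable parameters for LoRA
print(lori_params(d, r, s))   # 26214 trainable parameters for LoRI-S
print(1 - lori_params(d, r, s) / lora_params(d, d, r))  # ~0.95 reduction
```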
## Evaluation

### Testing Data, Factors & Metrics

Evaluation was conducted across a wide range of tasks, including natural language understanding, mathematical reasoning, code generation, and safety alignment. For code generation, HumanEval was used as a benchmark, evaluated with the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

### Results

Extensive experiments demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than standard LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results and specific metrics, please refer to the [original paper](https://arxiv.org/abs/2504.07448).

## Technical Specifications

### Model Architecture and Objective

LoRI modifies the standard LoRA architecture by freezing the projection matrices `A` as random projections and sparsifying the matrices `B` using task-specific masks. This design substantially reduces trainable parameters and minimizes cross-task interference during adapter merging and continual learning, while maintaining strong task performance.

### Compute Infrastructure

#### Hardware

Training and evaluation are designed for multi-GPU environments, leveraging techniques like Fully Sharded Data Parallel (FSDP).

#### Software

The implementation relies on PyTorch and the PEFT library, along with other dependencies specified in the project's `requirements.txt`.

- **PEFT version:** 0.12.0

## Citation

If you use LoRI in your work, please cite:

```bibtex
@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}
```

## More Information

Feel free to reach out to the authors listed in the paper or refer to the [project's GitHub repository](https://github.com/juzhengz/LoRI) if you have any questions.