Improve model card for LoRI-S_code_llama3_rank_64
This PR significantly enhances the model card for `tomg-group-umd/LoRI-S_code_llama3_rank_64` by adding comprehensive information and improving its discoverability and usability.
Key updates include:
- **Metadata Enrichment:** Adding `license: apache-2.0`, and relevant `tags` such as `peft`, `lora`, `code-generation`, and `llama`, along with the specific `datasets` used for training this model.
- **Detailed Model Description:** Populating the "Model Details" section with information about the developers, model type, language, and the base model, based on the paper abstract and GitHub repository.
- **Complete Model Sources:** Adding direct links to the official GitHub repository, the Hugging Face paper page, the project page, and the Hugging Face collection.
- **Elaborated Usage Instructions:** Filling in "Uses" sections (Direct Use, Downstream Use, Out-of-Scope) to clarify the model's intended applications and limitations.
- **Executable Code Snippet:** Providing a runnable Python code example in "How to Get Started" for quick inference using `transformers` and `peft`.
- **Training Information:** Detailing the "Training Data" and "Training Procedure" (LoRI-D and LoRI-S stages, FSDP) and "Training Hyperparameters" (rank, sparsity, etc.).
- **Evaluation Summary:** Summarizing key evaluation aspects and directing users to the paper for detailed results.
- **Citation:** Including the BibTeX entry from the paper.
- **Visual Aid:** Embedding the LoRI architecture diagram from the GitHub repository.
This update makes the model card much more informative and user-friendly for researchers and practitioners.
@@ -2,192 +2,203 @@
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- peft
- lora
- code-generation
- llama
datasets:
- CodeAlpaca
---

# Model Card for LoRI-S_code_llama3_rank_64

This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).

LoRI (LoRA with Reduced Interference) is a simple yet effective parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs). It targets two problems that LoRA exhibits in multi-task scenarios, notable parameter overhead and parameter interference, by freezing the projection matrices `A` as random projections and sparsifying the matrices `B` using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance, minimizes cross-task interference in adapter merging, and supports continual learning by mitigating catastrophic forgetting.

<div align="center">
<img src="https://github.com/juzhengz/LoRI/raw/main/LoRI.png" alt="LoRI Architecture" width="80%">
</div>

## Model Details

### Model Description

LoRI-S_code_llama3_rank_64 is a LoRI-S (sparse) adapter trained for code generation. It is built on the `meta-llama/Meta-Llama-3-8B` base model with an adapter rank of 64. According to the paper, LoRI outperforms full fine-tuning and existing PEFT methods while using up to 95% fewer trainable parameters than standard LoRA. This adapter is part of a broader set of LoRI adapters that cover natural language understanding, mathematical reasoning, code generation, and safety alignment.

- **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
- **Model type:** LoRI-S adapter (a Low-Rank Adaptation variant) for parameter-efficient fine-tuning (PEFT) of causal language models
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`

### Model Sources

- **Repository:** [https://github.com/juzhengz/LoRI](https://github.com/juzhengz/LoRI)
- **Paper:** [https://huggingface.co/papers/2504.07448](https://huggingface.co/papers/2504.07448)
- **Project Page:** [https://juzhengz.github.io/](https://juzhengz.github.io/)
- **Hugging Face Collection:** [https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011](https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011)

## Uses

### Direct Use

This model is a PEFT adapter that adapts the `meta-llama/Meta-Llama-3-8B` base model for code generation. Load it with the Hugging Face `peft` library on top of the base model.

### Downstream Use

LoRI adapters are designed for multi-task scenarios and continual learning, where they enable effective adapter merging and reduce cross-task interference. This model can be combined with LoRI adapters trained on other tasks to build multi-task systems.

### Out-of-Scope Use

This model is not intended for standalone use; it requires `meta-llama/Meta-Llama-3-8B` as its base model. Like all large language models, it may generate biased, harmful, or factually incorrect content, and it should not be used in critical applications without thorough evaluation and additional safeguards.

## Bias, Risks, and Limitations

Although LoRI aims to reduce interference and parameter overhead, the model may still inherit biases present in its pre-training or fine-tuning data (e.g., CodeAlpaca, Meta-Llama-3-8B's pre-training data). Potential risks and limitations include:
- **Generalization:** Performance may degrade on code generation tasks that differ significantly from the training distribution.
- **Factual Accuracy:** Generated code or comments may not always be logically sound or factually correct.
- **Safety:** The model may generate insecure or malicious code, or outputs that perpetuate stereotypes or harmful content, if not properly constrained.

### Recommendations

Users (both direct and downstream) should be aware of these potential issues and should validate and filter the model's outputs before relying on them. Responsible AI practices and task-specific evaluations are recommended.
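
One lightweight form of output validation is to check that generated Python at least parses before it is used. The sketch below is only an illustration: the helper name `is_valid_python` is hypothetical and not part of the LoRI repository, and a syntax check says nothing about whether the code is correct or secure.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python. The code is never executed."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Two hand-written stand-ins for model output (assumed examples).
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b)\n    return a + b\n"  # missing colon

print(is_valid_python(good))
print(is_valid_python(bad))
```

A check like this is best treated as a first filter, followed by tests, linting, and a security review of anything that will actually run.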

## How to Get Started with the Model

Use the code below to get started with the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Load the base model
base_model_name = "meta-llama/Meta-Llama-3-8B"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,  # Llama 3 models often use bfloat16
    device_map="auto",           # Load model onto available devices (GPU if available)
    low_cpu_mem_usage=True,      # Optimize CPU memory usage
)

# 2. Load the LoRI adapter on top of the base model
adapter_model_id = "tomg-group-umd/LoRI-S_code_llama3_rank_64"
adapter_model = PeftModel.from_pretrained(base_model, adapter_model_id)

# 3. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Set pad_token if not already set, crucial for batching/generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Or another appropriate token

# 4. Set the model to evaluation mode
adapter_model.eval()

# 5. Prepare your input prompt for code generation
prompt = '''
def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

# Write a docstring for the function above, describing its purpose and parameters.
'''

# Tokenize the prompt (input IDs plus attention mask) and move to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(adapter_model.device)

# 6. Generate output
with torch.no_grad():
    output_ids = adapter_model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,    # Sample outputs
        temperature=0.01,  # Low temperature for less randomness, more deterministic code
        top_p=0.95,        # Nucleus sampling
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode and print the generated text (prompt included)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)

# Optional: Merge adapter weights into the base model for easier deployment
# merged_model = adapter_model.merge_and_unload()
# merged_model.save_pretrained("path/to/merged-lori-model")
```

## Training Details

### Training Data

This `LoRI-S_code_llama3_rank_64` adapter was fine-tuned on the **CodeAlpaca** dataset for code generation. The LoRI paper also describes experiments on:
- **Natural Language Understanding (NLU):** GLUE benchmark
- **Mathematical Reasoning:** GSM8K dataset
- **Safety Alignment:** Saferpaca dataset

### Training Procedure

LoRI training typically consists of two training stages separated by a mask-extraction step, implemented using Fully Sharded Data Parallel (FSDP) for efficient multi-GPU training:
1. **LoRI-D (Dense) Training:** An initial phase in which the projection matrices `A` are frozen as random projections and the `B` matrices are trained densely.
2. **Mask Extraction:** After `LoRI-D` training, sparse masks are extracted from the learned `B` matrices. For `LoRI-S` models, a high sparsity level (e.g., 90%) is typically applied.
3. **LoRI-S (Sparse) Training:** Training continues with these sparse masks applied to `B`. This particular model, `LoRI-S_code_llama3_rank_64`, is the result of this sparse training phase.
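
The mask-extraction step can be illustrated with a small pure-Python sketch. It assumes the mask keeps the largest-magnitude entries of `B`, and it works on a single toy matrix; the function names `extract_mask` and `apply_mask` are hypothetical, and the actual selection rule and its scope (per matrix or across the whole model) are defined in the paper and repository.

```python
def extract_mask(b_matrix, sparsity):
    """Build a 0/1 mask that keeps the largest-magnitude entries of B.

    `sparsity` is the fraction of entries to drop (0.9 keeps about 10%).
    Entries tied with the threshold are all kept, so ties can keep extras.
    """
    magnitudes = sorted((abs(v) for row in b_matrix for v in row), reverse=True)
    keep = max(1, round(len(magnitudes) * (1.0 - sparsity)))
    threshold = magnitudes[keep - 1]
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in b_matrix]


def apply_mask(b_matrix, mask):
    """Zero out masked entries of B (elementwise product B * M)."""
    return [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(b_matrix, mask)]


# Toy dense B, standing in for the result of the LoRI-D stage.
B = [
    [0.10, -2.00, 0.30, 0.05, -0.20],
    [0.40, 0.00, -0.60, 0.70, 0.01],
]

mask_90 = extract_mask(B, sparsity=0.90)  # keeps 1 of 10 entries
mask_50 = extract_mask(B, sparsity=0.50)  # keeps 5 of 10 entries
print(mask_90)
print(apply_mask(B, mask_50))
```

In the sparse stage, only the entries where the mask is 1 would continue to receive updates; the remaining entries stay at zero.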

#### Training Hyperparameters

- **Base Model:** `meta-llama/Meta-Llama-3-8B`
- **Adapter Rank (`r`):** 64
- **LoRA Alpha (`lora_alpha`):** 128
- **LoRA Dropout (`lora_dropout`):** 0.05
- **Sparsity (for LoRI-S phase):** 90%
- **Training Regime:** Mixed precision (bf16 for Llama 3 models)
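
To see how these settings relate to the "up to 95% fewer trainable parameters" figure, the arithmetic below counts trainable parameters for one adapted layer. It is a back-of-the-envelope sketch: it assumes a single square 4096 x 4096 projection (4096 is the Llama 3 8B hidden size) and a uniform 90% sparsity in that layer, whereas the real adapter covers many layers and its mask need not be uniform.

```python
d_in = d_out = 4096  # assumed square projection (Llama 3 8B hidden size)
r = 64               # adapter rank
lora_alpha = 128
sparsity = 0.90      # fraction of B that is masked out in LoRI-S

lora_params = r * (d_in + d_out)  # standard LoRA trains both A and B
lori_d_params = r * d_out         # LoRI-D trains only B; A stays frozen
lori_s_params = round(lori_d_params * (1 - sparsity))  # LoRI-S keeps ~10% of B
scaling = lora_alpha / r          # standard LoRA scaling factor alpha / r

reduction = 1 - lori_s_params / lora_params
print(lora_params, lori_d_params, lori_s_params)
print(f"scaling = {scaling}, reduction vs. LoRA = {reduction:.0%}")
```

Freezing `A` halves the trainable parameters for a square layer, and masking 90% of `B` removes nine tenths of what remains, which is where a reduction of roughly 95% relative to LoRA comes from under these assumptions.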

## Evaluation

LoRI models have been evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks. The paper reports that LoRI outperforms full fine-tuning and existing PEFT methods while using significantly fewer trainable parameters (up to 95% fewer than LoRA). In multi-task settings, LoRI enables effective adapter merging and continual learning with reduced cross-task interference.

### Results

For detailed quantitative results, specific metrics (e.g., HumanEval for code generation, SuperGLUE for NLU, GSM8K for math), and comprehensive comparisons against baselines, please refer to the [official paper](https://huggingface.co/papers/2504.07448).

## Technical Specifications

### Model Architecture and Objective

LoRI modifies the standard LoRA architecture: the projection matrices `A` are fixed as random projections, and the matrices `B` are sparsified using task-specific masks. This design aims to reduce cross-task interference in multi-task learning and to mitigate catastrophic forgetting in continual learning.
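
The update described above can be sketched in a few lines of pure Python. This is a toy illustration under stated assumptions, not the repository's implementation: it assumes the weight update is `scaling * A @ (B * M)` with `A` of shape `d_in x r` and `B` of shape `r x d_out`, and the helper names `matmul` and `lori_delta` are hypothetical.

```python
import random


def matmul(x, y):
    """Plain nested-list matrix product of x (n x k) and y (k x m)."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lori_delta(a_frozen, b, mask, scaling):
    """Weight update from a frozen random A and a masked, trainable B."""
    masked_b = [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(b, mask)]
    return [[scaling * v for v in row] for row in matmul(a_frozen, masked_b)]


rng = random.Random(0)
d_in, r, d_out = 4, 2, 3

# A is drawn once at random and never trained.
A = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(d_in)]
# B is the only trainable matrix; the task-specific mask zeroes most of it.
B = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]
M = [[1, 0, 0], [0, 0, 1]]

delta = lori_delta(A, B, M, scaling=2.0)
print(len(delta), len(delta[0]))       # shape of the update: d_in x d_out
print([row[1] for row in delta])       # column 1 of B is fully masked
```

Because the second column of `B` is entirely masked here, the corresponding column of the update is zero, which gives a feel for how sparse, task-specific masks limit where each adapter can change the base weights.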

### Compute Infrastructure

#### Software

- PEFT 0.12.0
- Transformers (a version that supports Llama 3 and is compatible with PEFT)

## Citation

If you use LoRI in your work, please cite:

```bibtex
@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}
```

## Model Card Authors

Niels Rogge (Hugging Face Community Science Team)

## Model Card Contact