Enhance model card for LoRI-S_nlu_llama3_rank_64 with comprehensive details

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +116 -136
README.md CHANGED
@@ -2,192 +2,172 @@
2
  base_model: meta-llama/Meta-Llama-3-8B
3
  library_name: peft
4
  pipeline_tag: text-generation
5
  ---
6
 
7
  # Model Card for LoRI-S_nlu_llama3_rank_64
8
 
9
  This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).
10
 
11
- <!-- Provide a quick summary of what the model is/does. -->
12
-
13
 
14
 
15
  ## Model Details
16
 
17
  ### Model Description
 
18
 
19
- <!-- Provide a longer summary of what this model is. -->
20
-
21
-
22
-
23
- - **Developed by:** [More Information Needed]
24
- - **Funded by [optional]:** [More Information Needed]
25
- - **Shared by [optional]:** [More Information Needed]
26
- - **Model type:** [More Information Needed]
27
- - **Language(s) (NLP):** [More Information Needed]
28
- - **License:** [More Information Needed]
29
- - **Finetuned from model [optional]:** [More Information Needed]
30
 
31
- ### Model Sources [optional]
32
 
33
- <!-- Provide the basic links for the model. -->
34
-
35
- - **Repository:** [More Information Needed]
36
- - **Paper [optional]:** [More Information Needed]
37
- - **Demo [optional]:** [More Information Needed]
38
 
39
  ## Uses
40
 
41
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
42
-
43
  ### Direct Use
 
44
 
45
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
46
-
47
- [More Information Needed]
48
-
49
- ### Downstream Use [optional]
 
50
 
51
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
52
-
53
- [More Information Needed]
54
 
55
  ### Out-of-Scope Use
56
-
57
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
58
-
59
- [More Information Needed]
60
 
61
  ## Bias, Risks, and Limitations
62
-
63
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
64
-
65
- [More Information Needed]
66
 
67
  ### Recommendations
68
-
69
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
70
-
71
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
72
 
73
  ## How to Get Started with the Model
74
 
75
- Use the code below to get started with the model.
76
-
77
- [More Information Needed]
78
 
79
  ## Training Details
80
 
81
  ### Training Data
82
-
83
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
84
-
85
- [More Information Needed]
86
 
87
  ### Training Procedure
88
 
89
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
90
-
91
- #### Preprocessing [optional]
92
-
93
- [More Information Needed]
94
-
95
 
96
  #### Training Hyperparameters
97
-
98
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
99
-
100
- #### Speeds, Sizes, Times [optional]
101
-
102
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
103
-
104
- [More Information Needed]
105
 
106
  ## Evaluation
107
 
108
- <!-- This section describes the evaluation protocols and provides the results. -->
109
-
110
  ### Testing Data, Factors & Metrics
111
-
112
- #### Testing Data
113
-
114
- <!-- This should link to a Dataset Card if possible. -->
115
-
116
- [More Information Needed]
117
-
118
- #### Factors
119
-
120
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
121
-
122
- [More Information Needed]
123
-
124
- #### Metrics
125
-
126
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
127
-
128
- [More Information Needed]
129
 
130
  ### Results
 
131
 
132
- [More Information Needed]
133
-
134
- #### Summary
135
-
136
-
137
-
138
- ## Model Examination [optional]
139
-
140
- <!-- Relevant interpretability work for the model goes here -->
141
-
142
- [More Information Needed]
143
-
144
- ## Technical Specifications [optional]
145
 
146
  ### Model Architecture and Objective
147
-
148
- [More Information Needed]
149
 
150
  ### Compute Infrastructure
151
 
152
- [More Information Needed]
153
-
154
  #### Hardware
155
-
156
- [More Information Needed]
157
 
158
  #### Software
159
-
160
- [More Information Needed]
161
-
162
- ## Citation [optional]
163
-
164
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
165
-
166
- **BibTeX:**
167
-
168
- [More Information Needed]
169
-
170
- **APA:**
171
-
172
- [More Information Needed]
173
-
174
- ## Glossary [optional]
175
-
176
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
177
-
178
- [More Information Needed]
179
-
180
- ## More Information [optional]
181
-
182
- [More Information Needed]
183
-
184
- ## Model Card Authors [optional]
185
-
186
- [More Information Needed]
187
-
188
- ## Model Card Contact
189
-
190
- [More Information Needed]
191
- ### Framework versions
192
-
193
- - PEFT 0.12.0
 
2
  base_model: meta-llama/Meta-Llama-3-8B
3
  library_name: peft
4
  pipeline_tag: text-generation
5
+ license: apache-2.0
6
+ tags:
7
+ - peft
8
+ - lora
9
+ - fine-tuning
10
+ - multi-task
11
+ - continual-learning
12
+ - natural-language-understanding
13
+ - causal-lm
14
  ---
15
 
16
  # Model Card for LoRI-S_nlu_llama3_rank_64
17
 
18
  This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).
19
 
20
+ **Abstract:**
21
+ Low-Rank Adaptation (LoRA) has emerged as a popular parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), yet it still incurs notable overhead and suffers from parameter interference in multi-task scenarios. We propose LoRA with Reduced Interference (LoRI), a simple yet effective approach that freezes the projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. Code is available at: https://github.com/juzhengz/LoRI
22
 
23
+ **Key Highlights:**
24
+ - **Reduced Trainable Parameters**: LoRI substantially reduces the number of trainable parameters (up to 95% fewer than standard LoRA) while maintaining strong task performance.
25
+ - **Minimized Cross-Task Interference**: By leveraging the orthogonality between adapter subspaces, LoRI minimizes interference when merging adapters.
26
+ - **Continual Learning Support**: LoRI uses sparsity to mitigate catastrophic forgetting, supporting effective continual learning.
27
 
28
  ## Model Details
29
 
30
  ### Model Description
31
+ LoRI-S_nlu_llama3_rank_64 is a specific adapter for `meta-llama/Meta-Llama-3-8B` fine-tuned for Natural Language Understanding (NLU) tasks using the LoRI (LoRA with Reduced Interference) method. LoRI is a parameter-efficient fine-tuning (PEFT) approach that freezes the LoRA projection matrices A as random projections and sparsifies the matrices B using task-specific masks. This design drastically reduces the number of trainable parameters while maintaining robust task performance. This model instance is trained with an adapter rank of 64.
32
 
33
+ - **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
34
+ - **Model type:** Low-Rank Adaptation (LoRA) with Reduced Interference (LoRI) adapter
35
+ - **Language(s) (NLP):** English
36
+ - **License:** Apache 2.0
37
+ - **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`
38
 
39
+ ### Model Sources
40
 
41
+ - **Repository:** https://github.com/juzhengz/LoRI
42
+ - **Paper:** https://arxiv.org/abs/2504.07448
43
+ - **HuggingFace Collection:** https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011
44
 
45
  ## Uses
46
 
47
  ### Direct Use
48
+ LoRI is intended for parameter-efficient fine-tuning (PEFT) of Large Language Models (LLMs), particularly for single-task performance, multi-task scenarios (adapter merging), and continual learning. This specific adapter (`LoRI-S_nlu_llama3_rank_64`) is optimized for Natural Language Understanding (NLU) tasks.
49
 
50
+ ### Downstream Use
51
+ LoRI can be used to efficiently fine-tune LLMs for various tasks, including:
52
+ - Natural Language Understanding (NLU)
53
+ - Mathematical Reasoning
54
+ - Code Generation
55
+ - Safety Alignment
56
 
57
+ It is designed to outperform full fine-tuning and other PEFT methods while being highly parameter-efficient. Its reduced interference property makes it suitable for scenarios involving adapter merging and continual learning across different tasks.
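As a toy illustration of the reduced-interference claim (a schematic only, not the paper's actual merging procedure), sparse task-specific updates on `B` tend to occupy mostly disjoint coordinates, so combining two adapters' updates leaves most of each task's update untouched:

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, r = 256, 16

def sparse_update(density=0.1):
    """A task's masked B update: roughly 90% of entries are zero."""
    B = rng.standard_normal((d_out, r))
    M = (rng.random((d_out, r)) < density).astype(float)
    return M * B

dB1, dB2 = sparse_update(), sparse_update()
merged = dB1 + dB2  # naive summation of two task updates

# With two independent ~10%-dense masks, only about 1% of entries collide,
# so each task's update largely survives the merge.
overlap = np.mean((dB1 != 0) & (dB2 != 0))
print(f"fraction of overlapping entries: {overlap:.3f}")
```

The actual merging schemes evaluated in the paper operate on full adapters; this sketch only shows why sparsity limits the overlap between task updates.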
58
 
59
  ### Out-of-Scope Use
60
+ The model should not be used for any illegal or unethical purposes. Users should be aware that the base model's limitations and biases may still be present. As a language model adapter, it should not be used in safety-critical applications without thorough additional testing and validation.
61
 
62
  ## Bias, Risks, and Limitations
63
+ The inherent biases, risks, and limitations of the base model (`meta-llama/Meta-Llama-3-8B`) apply to this adapter. Additionally, while LoRI aims to reduce cross-task interference, complete elimination of such interference may not be guaranteed across all possible task combinations. The paper focuses on specific benchmarks and tasks; performance on unaddressed tasks or distributions might vary.
64
 
65
  ### Recommendations
66
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Detailed evaluation on specific deployment scenarios and for diverse user groups is recommended to ensure responsible and fair usage.
67
 
68
  ## How to Get Started with the Model
69
 
70
+ Pretrained LoRI adapters are available via the HuggingFace collection and can be loaded as follows:
71
+
72
+ ```python
73
+ from transformers import AutoModelForCausalLM, AutoTokenizer
74
+ from peft import PeftModel
75
+ import torch
76
+
77
+ # Load the base model and tokenizer
78
+ base_model_id = "meta-llama/Meta-Llama-3-8B"
79
+ # This model card is for tomg-group-umd/LoRI-S_nlu_llama3_rank_64
80
+ lori_adapter_id = "tomg-group-umd/LoRI-S_nlu_llama3_rank_64"
81
+
82
+ # Load the base model with appropriate dtype and device mapping
83
+ # Adjust torch_dtype (e.g., torch.float16) as per your hardware/model requirements
84
+ base_model = AutoModelForCausalLM.from_pretrained(
85
+     base_model_id,
86
+     torch_dtype=torch.bfloat16,
87
+     low_cpu_mem_usage=True,
88
+     device_map="auto"  # Automatically distribute model across available GPUs
89
+ )
90
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
91
+
92
+ # Load the LoRI adapter and attach it to the base model
93
+ model = PeftModel.from_pretrained(base_model, lori_adapter_id)
94
+
95
+ # Optional: Merge the adapter weights into the base model for a single consolidated model
96
+ # This makes the model a standard Transformers model, removing the PEFT wrapper.
97
+ # model = model.merge_and_unload()
98
+
99
+ # Example inference
100
+ prompt = "What is the capital of France?"
101
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
102
+
103
+ # Generate text using the model with the loaded adapter
104
+ outputs = model.generate(
105
+     **inputs,
106
+     max_new_tokens=50,  # Maximum number of new tokens to generate
107
+     temperature=0.7,  # Sampling temperature
108
+     do_sample=True,  # Enable sampling
109
+     eos_token_id=tokenizer.eos_token_id,  # Stop generation at end-of-sequence token
110
+ )
111
+
112
+ # Decode the generated tokens, skipping the input prompt
113
+ generated_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
114
+ print(f"Prompt: {prompt}\nGenerated: {generated_text}")
116
+ ```
117
 
118
  ## Training Details
119
 
120
  ### Training Data
121
+ LoRI adapters are trained on task-specific datasets. This model targets Natural Language Understanding (NLU); the exact NLU datasets are listed in the paper and repository. Datasets for the other supported tasks include:
122
+ - **Code generation:** CodeAlpaca
123
+ - **Mathematical reasoning:** GSM8K
124
+ - **Safety alignment:** Saferpaca
125
 
126
  ### Training Procedure
127
+ LoRI employs a two-stage training procedure as outlined in the paper and GitHub repository:
128
+ 1. **LoRI-D (Dense) training:** An initial phase where the projection matrices `A` are frozen as random projections, and matrices `B` are trained.
129
+ 2. **LoRI-S (Sparse) training:** Sparse masks are extracted from the trained `LoRI-D` models, and training continues with `LoRI-S` at a specified sparsity level (e.g., 90%).
130
 
131
+ The training is implemented using [Fully Sharded Data Parallel (FSDP)](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html) and is designed for execution in a multi-GPU environment.
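The mask-extraction step in stage 2 can be sketched as follows. This is an illustrative NumPy mock-up, not the project's actual implementation (the real training scripts live in the repository): keep the largest-magnitude entries of the trained `B` and zero out the rest.

```python
import numpy as np

def extract_mask(B, sparsity=0.9):
    """Binary mask keeping the top (1 - sparsity) fraction of |B| entries."""
    k = max(1, int(round(B.size * (1.0 - sparsity))))  # number of entries to keep
    threshold = np.sort(np.abs(B).ravel())[-k]         # k-th largest magnitude
    return (np.abs(B) >= threshold).astype(B.dtype)

rng = np.random.default_rng(0)
B = rng.standard_normal((4096, 64))   # stand-in for a trained LoRI-D B matrix (rank 64)
mask = extract_mask(B, sparsity=0.9)

# LoRI-S then continues training only the entries selected by the mask
print(f"retained fraction: {mask.mean():.3f}")  # ~0.10 at 90% sparsity
```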
132
 
133
  #### Training Hyperparameters
134
+ - **Adapter ranks:** 32 and 64 (this model is rank 64).
135
+ - **Sparsity (LoRI-S):** 90%.
136
+ - Specific training scripts and hyperparameters for various tasks are available in the [LoRI GitHub repository](https://github.com/juzhengz/LoRI/tree/main/scripts).
137
 
138
  ## Evaluation
139
 
 
140
  ### Testing Data, Factors & Metrics
141
+ Evaluation was conducted across a wide range of tasks, including natural language understanding, mathematical reasoning, code generation, and safety alignment. For code generation performance, HumanEval was used as a benchmark, evaluated with the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).
142
 
143
  ### Results
144
+ Extensive experiments demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than standard LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results and specific metrics, please refer to the [original paper](https://arxiv.org/abs/2504.07448).
145
 
146
+ ## Technical Specifications
147
 
148
  ### Model Architecture and Objective
149
+ LoRI modifies the standard LoRA architecture by freezing the projection matrices `A` as random projections and by sparsifying the matrices `B` using task-specific masks. This design aims to substantially reduce trainable parameters and minimize cross-task interference during adapter merging and continual learning, while maintaining strong task performance.
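The update rule described above can be sketched numerically. The snippet below is an illustrative NumPy mock-up (not the actual PEFT implementation, and with toy dimensions rather than this adapter's rank of 64): `A` is a frozen random projection, and only the entries of `B` selected by a task-specific binary mask `M` are trainable.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 32, 32, 4

W = rng.standard_normal((d_out, d_in))               # frozen pretrained weight
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)   # frozen random projection (never trained)
B = rng.standard_normal((d_out, r)) * 0.01           # low-rank update matrix
M = (rng.random((d_out, r)) < 0.1).astype(float)     # task-specific mask, ~90% sparsity

def lori_forward(x):
    """y = W x + (M * B) A x -- only masked entries of B carry the task update."""
    return W @ x + (M * B) @ (A @ x)

x = rng.standard_normal(d_in)
y = lori_forward(x)

# Trainable parameters are just the unmasked entries of B
print(int(M.sum()), "trainable entries out of", B.size)
```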
 
150
 
151
  ### Compute Infrastructure
152
 
153
  #### Hardware
154
+ Training and evaluation are designed for multi-GPU environments, leveraging techniques like Fully Sharded Data Parallel (FSDP).
 
155
 
156
  #### Software
157
+ The implementation relies on PyTorch and the PEFT library, along with other dependencies specified in the project's `requirements.txt`.
158
+ - **PEFT version:** 0.12.0
159
+
160
+ ## Citation
161
+ If you use LoRI in your work, please cite:
162
+
163
+ ```bibtex
164
+ @article{zhang2025lori,
165
+   title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
166
+   author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
167
+   journal={arXiv preprint arXiv:2504.07448},
168
+   year={2025}
169
+ }
170
+ ```
171
+
172
+ ## More Information
173
+ Feel free to reach out to the authors listed in the paper or refer to the [project's GitHub repository](https://github.com/juzhengz/LoRI) if you have any questions.