Improve model card with full details and usage for LoRI-D_nlu_llama3_rank_64 (#1)

- Improve model card with full details and usage for LoRI-D_nlu_llama3_rank_64 (663383aac9e7478eceaed5f8c05b67cc0ec1c955)

Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)

README.md +89 -107

README.md CHANGED

@@ -2,192 +2,174 @@
 base_model: meta-llama/Meta-Llama-3-8B
 library_name: peft
 pipeline_tag: text-generation
 ---
 # Model Card for LoRI-D_nlu_llama3_rank_64
 This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
 ### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
 ## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
 ### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
 ## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
 ### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
 ### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
 #### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Technical Specifications [optional]
 ### Model Architecture and Objective
-[More Information Needed]
 ### Compute Infrastructure
-[More Information Needed]
 #### Hardware
-[More Information Needed]
 #### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
 ## Model Card Contact
-[More Information Needed]
 ### Framework versions
 - PEFT 0.12.0

 base_model: meta-llama/Meta-Llama-3-8B
 library_name: peft
 pipeline_tag: text-generation
+license: apache-2.0
 ---
 # Model Card for LoRI-D_nlu_llama3_rank_64
 This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).
+This is an adapter model based on the paper **LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation**, which introduces a simple yet effective approach to Low-Rank Adaptation (LoRA) for Large Language Models (LLMs). LoRI freezes the projection matrices A as random projections and sparsifies the matrices B using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance, minimizes cross-task interference in adapter merging, and supports continual learning by using sparsity to mitigate catastrophic forgetting.
+<div align="center">
+    <img src="https://github.com/juzhengz/LoRI/raw/main/LoRI.png" alt="LoRI Framework" width="80%">
+</div>
+### ✨ Key Highlights
+*   **Scalable & Efficient**: Uses up to 95% fewer trainable parameters than traditional LoRA while maintaining performance.
+*   **Reduced Interference**: Minimizes cross-task interference in multi-task scenarios by leveraging orthogonality between adapter subspaces.
+*   **Continual Learning**: Supports continual learning by using sparsity to mitigate catastrophic forgetting.
+*   **Universal Applicability**: Evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks.
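The orthogonality point above can be illustrated with a toy experiment. This is a minimal sketch, not code from the LoRI repository: it assumes Gaussian random directions and a hypothetical dimension of 4096, and only shows that independently drawn random directions in a high-dimensional space are close to orthogonal.

```python
import math
import random


def random_vector(dim, rng):
    """Draw a vector with i.i.d. standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)
dim = 4096  # hypothetical size, chosen only for illustration

# Two independently drawn directions, standing in for two tasks' random projections.
task_a_direction = random_vector(dim, rng)
task_b_direction = random_vector(dim, rng)

similarity = cosine(task_a_direction, task_b_direction)
print(f"cosine similarity between two random directions: {similarity:.4f}")
```

The similarity should be close to zero, which is the intuition behind merging adapters built on independent random projections with little interference.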
 ## Model Details
 ### Model Description
+The `LoRI-D_nlu_llama3_rank_64` model is a LoRA adapter specifically designed for Natural Language Understanding (NLU) tasks, fine-tuned on the `meta-llama/Meta-Llama-3-8B` base model with a rank of 64. It is part of the LoRI family of models, which aims to provide parameter-efficient fine-tuning with reduced cross-task interference.
+-   **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
+-   **Model type:** LoRI (LoRA with Reduced Interference) adapter, a PEFT method for LLMs
+-   **Language(s) (NLP):** English
+-   **License:** Apache 2.0
+-   **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`
+### Model Sources
+-   **Repository:** [https://github.com/juzhengz/LoRI/](https://github.com/juzhengz/LoRI/)
+-   **Paper:** [https://arxiv.org/abs/2504.07448](https://arxiv.org/abs/2504.07448)
+-   **Hugging Face Collection:** [https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011](https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011)
 ## Uses
 ### Direct Use
+This model is intended to be used as a PEFT adapter on top of the `meta-llama/Meta-Llama-3-8B` base model for natural language understanding tasks, leveraging its efficient design for reduced parameter overhead and improved multi-task performance.
+### Downstream Use
+LoRI adapters can be merged for multi-task applications or sequentially applied for continual learning without significant performance degradation. This makes LoRI suitable for building generalist agents or systems that need to learn new skills over time.
 ### Out-of-Scope Use
+This model is not intended for use in high-stakes or safety-critical applications without further rigorous testing and validation. Given its focus on NLU tasks, its performance on other domains or tasks without specific fine-tuning is not guaranteed.
 ## Bias, Risks, and Limitations
+As with any language model, this model may inherit biases from the base model (`meta-llama/Meta-Llama-3-8B`) and from the datasets used for LoRI fine-tuning. Potential risks include generating biased, inaccurate, or harmful content.
 ### Recommendations
+Users should carefully evaluate the model's output for their specific application and consider fine-tuning on domain-specific, curated data to mitigate potential biases or limitations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+import torch
+# Load the base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Meta-Llama-3-8B",
+    torch_dtype=torch.bfloat16, # or torch.float16 depending on your hardware
+    device_map="auto"
+)
+# Attach the LoRI adapter; PeftModel.from_pretrained returns the base model wrapped with the adapter
+model = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_nlu_llama3_rank_64")
+model.eval()
+# Load the tokenizer
+tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
+# Example usage for a general text generation task (adjust for specific NLU use-cases)
+prompt = "The quick brown fox jumps over the lazy dog."
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+# Generate text
+outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
+generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(generated_text)
+# For specific NLU tasks, the prompt and expected output format would vary.
+# You would then apply relevant NLU processing to the generated text or use the model's output directly.
+```
+## Training Details
+### Training Data
+The LoRI models are trained on various datasets depending on the task:
+-   **Natural Language Understanding (NLU):** NLU datasets (the task this adapter was trained for).
+-   **Code generation:** CodeAlpaca dataset.
+-   **Mathematical reasoning:** GSM8K dataset.
+-   **Safety alignment:** Saferpaca dataset.
+More details on specific datasets can be found in the [GitHub repository](https://github.com/juzhengz/LoRI/).
+### Training Procedure
+LoRI is implemented using Fully Sharded Data Parallel (FSDP) for multi-GPU training. The training involves two main stages:
+1.  **LoRI-D (Dense) training**: Adapters are trained with random projection matrices `A` frozen and `B` matrices dense. Sparse masks are then extracted.
+2.  **LoRI-S (Sparse) training**: Training continues with the extracted sparse masks applied to matrices `B`, typically at 90% sparsity.
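The two stages above can be sketched on a single toy layer. This is an illustrative sketch under stated assumptions, not the repository's implementation: the matrix sizes are hypothetical, `B_dense` is a random stand-in for a trained dense `B`, and the mask is assumed to keep the largest-magnitude entries of `B`.

```python
import random


def random_matrix(rows, cols, rng, scale=1.0):
    """Matrix with i.i.d. Gaussian entries, stored as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def extract_mask(B, sparsity):
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of B.

    Assumption: magnitude-based selection, used here only for illustration.
    """
    magnitudes = sorted((abs(v) for row in B for v in row), reverse=True)
    keep = max(1, round(len(magnitudes) * (1.0 - sparsity)))
    threshold = magnitudes[keep - 1]
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in B]


def apply_mask(B, mask):
    """Zero out every entry of B that the mask does not keep."""
    return [[v * m for v, m in zip(row, mask_row)] for row, mask_row in zip(B, mask)]


rng = random.Random(0)
d_out, d_in, r = 8, 8, 4  # toy sizes (hypothetical)

# Stage 1 (LoRI-D): A is a frozen random projection; only the dense B would be trained.
A = random_matrix(r, d_in, rng)
B_dense = random_matrix(d_out, r, rng, scale=0.1)  # stand-in for a trained dense B

# Mask extraction, then Stage 2 (LoRI-S): only unmasked entries of B stay trainable.
mask = extract_mask(B_dense, sparsity=0.9)
B_sparse = apply_mask(B_dense, mask)

kept = sum(sum(row) for row in mask)
print(f"kept {kept} of {d_out * r} entries in B")
```

In an actual run, gradient updates in the second stage would be restricted to the entries where the mask is 1, while `A` stays fixed throughout both stages.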
+#### Training Hyperparameters
+-   **Training regime:** Mixed precision (e.g., `bfloat16` for Llama-3) is typically used for training large models.
+-   **Adapter Rank (`r`):** 64 (for this `LoRI-D_nlu_llama3_rank_64` model).
+-   **LoRA Alpha (`lora_alpha`):** 128 (from `adapter_config.json`).
+-   **LoRA Dropout (`lora_dropout`):** 0.05 (from `adapter_config.json`).
+-   **Target Modules (`target_modules`):** `o_proj`, `k_proj`, `up_proj`, `q_proj`, `v_proj`, `down_proj`, `gate_proj` (from `adapter_config.json`).
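As a rough worked example, the rank and target modules listed above determine the trainable parameter count. The sketch below assumes the layer shapes from the public Llama-3-8B configuration (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 layers); these values are not stated in this card. It also assumes that standard LoRA trains both `A` and `B`, that LoRI-D trains only `B`, and that LoRI-S keeps 10% of `B`.

```python
# Assumed Llama-3-8B shapes (taken from its public config, not from this card).
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 4096, 14336, 1024, 32
RANK = 64        # adapter rank from this card
SPARSITY = 0.9   # sparsity applied to B in LoRI-S

# (d_in, d_out) of each targeted linear layer in one decoder block.
target_modules = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Standard LoRA: A is (r x d_in) and B is (d_out x r), both trainable.
lora_params = NUM_LAYERS * sum(
    RANK * (d_in + d_out) for d_in, d_out in target_modules.values()
)

# LoRI-D: A is frozen, so only B (d_out x r) is trainable.
lori_d_params = NUM_LAYERS * sum(RANK * d_out for _, d_out in target_modules.values())

# LoRI-S: only the unmasked fraction of B is trainable.
lori_s_params = round(lori_d_params * (1 - SPARSITY))

reduction_vs_lora = 1 - lori_s_params / lora_params

print(f"LoRA:   {lora_params:,}")
print(f"LoRI-D: {lori_d_params:,}")
print(f"LoRI-S: {lori_s_params:,} ({reduction_vs_lora:.1%} fewer than LoRA)")
```

Under these assumptions the counts come to about 168M for LoRA, 88M for LoRI-D, and 8.8M for LoRI-S, which is in line with the "up to 95% fewer trainable parameters" figure quoted in this card.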
 ## Evaluation
 ### Testing Data, Factors & Metrics
+LoRI's performance has been extensively evaluated across natural language understanding, mathematical reasoning, code generation (e.g., HumanEval), and safety alignment tasks.
 #### Metrics
+Performance is measured using relevant metrics for each task. The paper demonstrates that LoRI consistently outperforms full fine-tuning and existing PEFT methods across various tasks, while using up to 95% fewer trainable parameters than traditional LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results, please refer to the [paper](https://arxiv.org/abs/2504.07448).
+## Technical Specifications
 ### Model Architecture and Objective
+LoRI introduces a novel architecture where projection matrices `A` in LoRA are frozen as random projections, and matrices `B` are sparsified using task-specific masks. This design is intended to achieve monosemantic experts, reduce trainable parameters, and minimize cross-task interference. The objective remains focused on improving performance on downstream tasks while promoting parameter efficiency and modularity.
 ### Compute Infrastructure
 #### Hardware
+Training was performed in a multi-GPU environment using Fully Sharded Data Parallel (FSDP).
 #### Software
+The implementation uses Python, PyTorch, and the Hugging Face `transformers` and `peft` libraries.
+## Acknowledgements
+This project builds on the codebase of [dpo-rlaif](https://github.com/architsharma97/dpo-rlaif) and incorporates code from [lottery-ticket-adaptation](https://github.com/kiddyboots216/lottery-ticket-adaptation). Code generation performance on HumanEval is evaluated using the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).
+## Citation
+If you use LoRI in your work, please cite:
+```bibtex
+@article{zhang2025lori,
+  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
+  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
+  journal={arXiv preprint arXiv:2504.07448},
+  year={2025}
+}
+```
 ## Model Card Contact
+For questions or inquiries, please refer to the contact information provided in the original [repository](https://github.com/juzhengz/LoRI/).
 ### Framework versions
 - PEFT 0.12.0