---
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- peft
- lora
- fine-tuning
- multi-task
- continual-learning
- natural-language-understanding
- causal-lm
---

# Model Card for LoRI-S_nlu_llama3_rank_64

This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).

**Abstract:** Low-Rank Adaptation (LoRA) has emerged as a popular parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), yet it still incurs notable overhead and suffers from parameter interference in multi-task scenarios. We propose LoRA with Reduced Interference (LoRI), a simple yet effective approach that freezes the projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. Code is available at: https://github.com/juzhengz/LoRI

**Key Highlights:**

- **Reduced Trainable Parameters**: LoRI substantially reduces the number of trainable parameters (up to 95% fewer than standard LoRA) while maintaining strong task performance.
- **Minimized Cross-Task Interference**: By leveraging the orthogonality between adapter subspaces, LoRI minimizes interference when merging adapters.
- **Continual Learning Support**: LoRI uses sparsity to mitigate catastrophic forgetting, supporting effective continual learning.

## Model Details

### Model Description

LoRI-S_nlu_llama3_rank_64 is an adapter for `meta-llama/Meta-Llama-3-8B` fine-tuned for Natural Language Understanding (NLU) tasks using the LoRI (LoRA with Reduced Interference) method. LoRI is a parameter-efficient fine-tuning (PEFT) approach that freezes the LoRA projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design drastically reduces the number of trainable parameters while maintaining strong task performance. This adapter was trained with rank 64.

- **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
- **Model type:** Low-Rank Adaptation (LoRA) with Reduced Interference (LoRI) adapter
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`

### Model Sources

- **Repository:** https://github.com/juzhengz/LoRI
- **Paper:** https://arxiv.org/abs/2504.07448
- **HuggingFace Collection:** https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011

## Uses

### Direct Use

LoRI is intended for parameter-efficient fine-tuning (PEFT) of Large Language Models (LLMs), covering single-task fine-tuning, multi-task scenarios (adapter merging), and continual learning. This specific adapter (`LoRI-S_nlu_llama3_rank_64`) is optimized for Natural Language Understanding (NLU) tasks.
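Conceptually, a LoRI adapter replaces the standard LoRA update with a frozen random projection `A` and a trainable matrix `B` restricted by a task-specific binary mask. The sketch below is purely illustrative of that design: the class name, initialization scale, and scaling convention are assumptions made for exposition and do not reproduce the reference implementation in the LoRI repository.

```python
import torch
import torch.nn as nn

class LoRILinear(nn.Module):
    """Illustrative LoRI-style wrapper around a frozen linear layer.

    The low-rank update is (mask * B) @ A, where A is a frozen random
    projection and only the entries of B selected by a task-specific
    binary mask receive gradients.
    """

    def __init__(self, base_layer: nn.Linear, rank: int = 64, alpha: float = 128.0):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen

        in_f, out_f = base_layer.in_features, base_layer.out_features
        # A: frozen random projection (never trained)
        self.A = nn.Parameter(torch.randn(rank, in_f) / rank ** 0.5, requires_grad=False)
        # B: trainable matrix, sparsified by a task-specific binary mask
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.register_buffer("mask", torch.ones(out_f, rank))  # set per task
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = (self.mask * self.B) @ self.A  # sparse low-rank update, shape (out_f, in_f)
        return self.base(x) + self.scaling * (x @ delta_w.T)
```

In this sketch only the unmasked entries of `B` receive non-zero gradients, which is where LoRI's parameter savings over standard LoRA come from. In practice you would load this adapter through PEFT as shown in the "How to Get Started with the Model" section below rather than build it by hand.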
### Downstream Use

LoRI can be used to efficiently fine-tune LLMs for a variety of tasks, including:

- Natural Language Understanding (NLU)
- Mathematical Reasoning
- Code Generation
- Safety Alignment

It is designed to outperform full fine-tuning and other PEFT methods while being highly parameter-efficient. Its reduced-interference property makes it suitable for scenarios involving adapter merging and continual learning across different tasks.

### Out-of-Scope Use

The model should not be used for illegal or unethical purposes. Users should be aware that the base model's limitations and biases may still be present. As a language model adapter, it should not be used in safety-critical applications without thorough additional testing and validation.

## Bias, Risks, and Limitations

The inherent biases, risks, and limitations of the base model (`meta-llama/Meta-Llama-3-8B`) apply to this adapter. Additionally, while LoRI aims to reduce cross-task interference, complete elimination of such interference is not guaranteed across all possible task combinations. The paper evaluates specific benchmarks and tasks; performance on other tasks or data distributions may vary.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Detailed evaluation in specific deployment scenarios and for diverse user groups is recommended to ensure responsible and fair usage.

## How to Get Started with the Model

Pretrained LoRI adapters are available via the HuggingFace collection and can be loaded as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model_id = "meta-llama/Meta-Llama-3-8B"
# This model card is for tomg-group-umd/LoRI-S_nlu_llama3_rank_64
lori_adapter_id = "tomg-group-umd/LoRI-S_nlu_llama3_rank_64"

# Load the base model with appropriate dtype and device mapping
# Adjust torch_dtype (e.g., torch.float16) as per your hardware/model requirements
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto"  # Automatically distribute the model across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRI adapter and attach it to the base model
model = PeftModel.from_pretrained(base_model, lori_adapter_id)

# Optional: merge the adapter weights into the base model for a single consolidated model.
# This turns the model back into a standard Transformers model, removing the PEFT wrapper.
# model = model.merge_and_unload()

# Example inference
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text using the model with the loaded adapter
outputs = model.generate(
    **inputs,
    max_new_tokens=50,                    # Maximum number of new tokens to generate
    temperature=0.7,                      # Sampling temperature
    do_sample=True,                       # Enable sampling
    eos_token_id=tokenizer.eos_token_id,  # Stop generation at the end-of-sequence token
)

# Decode the generated tokens, skipping the input prompt
generated_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"Prompt: {prompt}\nGenerated: {generated_text}")
```

## Training Details

### Training Data

LoRI models are trained on different datasets depending on the task. For Natural Language Understanding (NLU), this adapter was trained on the NLU datasets used in the LoRI paper.
Other tasks supported by LoRI include:

- **Code generation:** CodeAlpaca
- **Mathematical reasoning:** GSM8K
- **Safety alignment:** Saferpaca

### Training Procedure

LoRI employs a two-stage training procedure, as outlined in the paper and GitHub repository:

1. **LoRI-D (Dense) training:** An initial phase in which the projection matrices `A` are frozen as random projections and the matrices `B` are trained.
2. **LoRI-S (Sparse) training:** Sparse masks are extracted from the trained LoRI-D models, and training continues with `LoRI-S` at a specified sparsity level (e.g., 90%).

Training is implemented with [Fully Sharded Data Parallel (FSDP)](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html) and is designed for execution in a multi-GPU environment.

#### Training Hyperparameters

- **Adapter ranks:** 32 and 64 (this model uses rank 64).
- **Sparsity (LoRI-S):** 90%.
- Task-specific training scripts and hyperparameters are available in the [LoRI GitHub repository](https://github.com/juzhengz/LoRI/tree/main/scripts).

## Evaluation

### Testing Data, Factors & Metrics

Evaluation was conducted across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks. Code generation performance was benchmarked on HumanEval, evaluated with the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

### Results

Extensive experiments show that LoRI outperforms full fine-tuning and existing PEFT methods while using up to 95% fewer trainable parameters than standard LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results and specific metrics, please refer to the [original paper](https://arxiv.org/abs/2504.07448).

## Technical Specifications

### Model Architecture and Objective

LoRI modifies the standard LoRA architecture by freezing the projection matrices `A` as random projections and sparsifying the matrices `B` with task-specific masks. This design substantially reduces the number of trainable parameters and minimizes cross-task interference during adapter merging and continual learning, while maintaining strong task performance.

### Compute Infrastructure

#### Hardware

Training and evaluation are designed for multi-GPU environments, using techniques such as Fully Sharded Data Parallel (FSDP).

#### Software

The implementation relies on PyTorch and the PEFT library, along with the other dependencies listed in the project's `requirements.txt`.

- **PEFT version:** 0.12.0

## Citation

If you use LoRI in your work, please cite:

```bibtex
@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}
```

## More Information

If you have questions, reach out to the authors listed in the paper or consult the [project's GitHub repository](https://github.com/juzhengz/LoRI).
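As a supplement to the two-stage procedure described under Training Details, the sketch below shows one way a task-specific sparsity mask could be derived from a trained LoRI-D adapter before LoRI-S training. It is a minimal, hypothetical sketch assuming a simple per-matrix magnitude criterion; the actual mask-extraction procedure in the LoRI codebase may differ.

```python
import torch

def extract_sparsity_mask(B: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Return a binary mask keeping the largest-magnitude entries of B.

    Illustrative only: assumes a per-matrix magnitude criterion, which may
    differ from the exact procedure used in the LoRI repository.
    """
    num_keep = int((1.0 - sparsity) * B.numel())  # entries retained at the target sparsity
    threshold = B.abs().flatten().kthvalue(B.numel() - num_keep).values
    return (B.abs() > threshold).to(B.dtype)

# Example: derive a 90%-sparse mask from a trained LoRI-D B matrix and
# use the masked B as the starting point for LoRI-S training.
B_dense = torch.randn(4096, 64)  # placeholder for a trained B (out_features x rank)
mask = extract_sparsity_mask(B_dense, sparsity=0.9)
B_sparse_start = mask * B_dense
print(f"kept {int(mask.sum())} of {mask.numel()} entries")
```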