---
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
---
Model Card for LoRI-D_nlu_llama3_rank_64
This model is part of LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation.
This is an adapter model from that paper, which introduces a simple yet effective approach to Low-Rank Adaptation (LoRA) for Large Language Models (LLMs): LoRI freezes the projection matrices `A` as random projections and sparsifies the matrices `B` using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance, minimizes cross-task interference when adapters are merged, and supports continual learning by using sparsity to mitigate catastrophic forgetting.
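To make the design concrete, the following is a minimal, illustrative PyTorch sketch of a LoRI-style layer (not the authors' implementation): `A` is a frozen random projection, and only the entries of `B` selected by a fixed mask contribute to the update. The random mask here is a stand-in for the task-specific masks that LoRI extracts after dense training.

```python
import torch
import torch.nn as nn

class LoRILinearSketch(nn.Module):
    """Illustrative LoRI-style adapter around a frozen linear layer (sketch only)."""
    def __init__(self, base: nn.Linear, r: int = 64, sparsity: float = 0.9):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        # A: frozen random projection (never trained)
        self.A = nn.Parameter(torch.randn(r, base.in_features) / r**0.5, requires_grad=False)
        # B: trainable, but only through a fixed sparse mask (random placeholder here)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.register_buffer("mask", (torch.rand(base.out_features, r) > sparsity).float())

    def forward(self, x):
        delta = (self.B * self.mask) @ self.A  # sparse B times frozen random A
        return self.base(x) + x @ delta.T
```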

✨ Key Highlights
- Scalable & Efficient: Uses up to 95% fewer trainable parameters than traditional LoRA while maintaining performance.
- Reduced Interference: Minimizes cross-task interference in multi-task scenarios by leveraging orthogonality between adapter subspaces.
- Continual Learning: Supports continual learning by using sparsity to mitigate catastrophic forgetting.
- Universal Applicability: Evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks.
Model Details
Model Description
The `LoRI-D_nlu_llama3_rank_64` model is a LoRA adapter specifically designed for Natural Language Understanding (NLU) tasks, fine-tuned on the `meta-llama/Meta-Llama-3-8B` base model with a rank of 64. It is part of the LoRI family of models, which aims to provide parameter-efficient fine-tuning with reduced cross-task interference.
- Developed by: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
- Model type: Low-Rank Adaptation (LoRI) adapter (PEFT method for LLMs)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: `meta-llama/Meta-Llama-3-8B`
Model Sources
- Repository: https://github.com/juzhengz/LoRI/
- Paper: https://arxiv.org/abs/2504.07448
- HuggingFace Collection: https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011
Uses
Direct Use
This model is intended to be used as a PEFT adapter on top of the `meta-llama/Meta-Llama-3-8B` base model for natural language understanding tasks, leveraging its efficient design for reduced parameter overhead and improved multi-task performance.
Downstream Use
LoRI adapters can be merged for multi-task applications or sequentially applied for continual learning without significant performance degradation. This makes LoRI suitable for building generalist agents or systems that need to learn new skills over time.
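Because the released adapters share the same base model and rank, they can be combined with the standard `peft` adapter utilities. The sketch below is hedged: the second adapter ID is a placeholder for another LoRI adapter from the collection, and the merge call is the generic LoRA merging API rather than anything LoRI-specific.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach two LoRI adapters under distinct names
model = PeftModel.from_pretrained(
    base, "tomg-group-umd/LoRI-D_nlu_llama3_rank_64", adapter_name="nlu"
)
model.load_adapter("tomg-group-umd/LoRI-D_code_llama3_rank_64", adapter_name="code")  # placeholder ID

# Merge them into a single multi-task adapter and activate it
model.add_weighted_adapter(
    adapters=["nlu", "code"], weights=[1.0, 1.0],
    adapter_name="nlu_code", combination_type="linear",
)
model.set_adapter("nlu_code")
```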
Out-of-Scope Use
This model is not intended for use in high-stakes or safety-critical applications without further rigorous testing and validation. Given its focus on NLU tasks, its performance on other domains or tasks without specific fine-tuning is not guaranteed.
Bias, Risks, and Limitations
As with any language model, this model may inherit biases from its base model (`Meta-Llama-3-8B`) and from the datasets used for LoRI fine-tuning. Potential risks include generating biased, inaccurate, or harmful content.
Recommendations
Users should carefully evaluate the model's output for their specific application and consider fine-tuning on domain-specific, curated data to mitigate potential biases or limitations.
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,  # or torch.float16, depending on your hardware
    device_map="auto",
)

# Load the LoRI adapter on top of the base model
adapter = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_nlu_llama3_rank_64")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Example usage for a general text generation task (adjust the prompt for specific NLU use cases)
prompt = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(prompt, return_tensors="pt").to(adapter.device)

# Generate text
outputs = adapter.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

# For specific NLU tasks, the prompt and expected output format will vary:
# apply the relevant NLU post-processing to the generated text.
```
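If you do not need to swap adapters at runtime, the adapter weights can be folded into the base model and served as a plain `transformers` model. This uses the standard `peft` merge utility and continues from the snippet above; the output path is a placeholder.

```python
# Fold the adapter weights into the base model for standalone inference
merged_model = adapter.merge_and_unload()
merged_model.save_pretrained("llama3-8b-lori-nlu-merged")  # hypothetical output path
tokenizer.save_pretrained("llama3-8b-lori-nlu-merged")
```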
Training Details
Training Data
The LoRI models are trained on various datasets depending on the task:
- Natural Language Understanding (NLU): a collection of NLU datasets (the task this adapter targets).
- Code generation: CodeAlpaca dataset.
- Mathematical reasoning: GSM8K dataset.
- Safety alignment: Saferpaca dataset.
More details on specific datasets can be found in the GitHub repository.
Training Procedure
LoRI is implemented using Fully Sharded Data Parallel (FSDP) for multi-GPU training. The training involves two main stages:
- LoRI-D (Dense) training: adapters are trained with the random projection matrices `A` frozen and the `B` matrices dense; task-specific sparse masks are then extracted from the trained `B` matrices.
- LoRI-S (Sparse) training: training continues with the extracted sparse masks applied to the `B` matrices, typically at 90% sparsity (a magnitude-based sketch of mask extraction is shown below).
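As a rough illustration of the mask-extraction step, one could keep only the largest-magnitude entries of each trained `B` matrix. The exact selection criterion used by LoRI is described in the paper and repository, so treat this as a sketch.

```python
import torch

def extract_sparse_mask(B: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Keep the top (1 - sparsity) fraction of entries of B by magnitude."""
    k = int((1.0 - sparsity) * B.numel())                      # number of entries to keep
    threshold = B.abs().flatten().kthvalue(B.numel() - k).values
    return (B.abs() > threshold).to(B.dtype)                   # binary mask, same shape as B

# Example: B for a rank-64 adapter on a 4096-dim projection
B = torch.randn(4096, 64)
mask = extract_sparse_mask(B, sparsity=0.9)
print(f"kept {mask.float().mean().item():.1%} of entries")    # ~10%
```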
Training Hyperparameters
- Training regime: Mixed precision (e.g., `bfloat16` for Llama-3) is typically used for training large models.
- Adapter Rank (`r`): 64 (for this `LoRI-D_nlu_llama3_rank_64` model).
- LoRA Alpha (`lora_alpha`): 128 (from `adapter_config.json`).
- LoRA Dropout (`lora_dropout`): 0.05 (from `adapter_config.json`).
- Target Modules (`target_modules`): `o_proj`, `k_proj`, `up_proj`, `q_proj`, `v_proj`, `down_proj`, `gate_proj` (from `adapter_config.json`); these values are gathered into the `LoraConfig` sketch below.
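For reference, the hyperparameters above correspond roughly to the following `peft` `LoraConfig` (a sketch; the released `adapter_config.json` is authoritative, and LoRI's frozen-`A`/sparse-`B` behavior comes from the LoRI codebase rather than from this config alone).

```python
from peft import LoraConfig

config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "o_proj", "k_proj", "up_proj", "q_proj",
        "v_proj", "down_proj", "gate_proj",
    ],
    task_type="CAUSAL_LM",
)
```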
Evaluation
Testing Data, Factors & Metrics
LoRI's performance has been extensively evaluated across natural language understanding, mathematical reasoning, code generation (e.g., HumanEval), and safety alignment tasks.
Metrics
Performance is measured using relevant metrics for each task. The paper demonstrates that LoRI consistently outperforms full fine-tuning and existing PEFT methods across various tasks, while using up to 95% fewer trainable parameters than traditional LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results, please refer to the paper.
Technical Specifications
Model Architecture and Objective
LoRI modifies the LoRA architecture by freezing the projection matrices `A` as random projections and sparsifying the matrices `B` with task-specific masks. This design reduces the number of trainable parameters and minimizes cross-task interference while promoting parameter efficiency and modularity; the training objective remains the standard objective of the downstream task.
Compute Infrastructure
Hardware
Training was performed in a multi-GPU environment using technologies like Fully Sharded Data Parallel (FSDP).
Software
The implementation uses Python, PyTorch, and the Hugging Face `transformers` and `peft` libraries.
Acknowledgements
This project builds on the codebase of dpo-rlaif and incorporates code from lottery-ticket-adaptation. Code generation performance on HumanEval is evaluated using the bigcode-evaluation-harness.
Citation
If you use LoRI in your work, please cite:
@article{zhang2025lori,
title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
journal={arXiv preprint arXiv:2504.07448},
year={2025}
}
Model Card Contact
For questions or inquiries, please refer to the contact information provided in the original repository.
Framework versions
- PEFT 0.12.0