base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0

Model Card for LoRI-D_nlu_llama3_rank_64

This adapter accompanies the paper LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation, which proposes a simple modification of Low-Rank Adaptation (LoRA) for Large Language Models (LLMs). LoRI freezes the projection matrices A as random projections and sparsifies the matrices B using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance, minimizes cross-task interference in adapter merging, and supports continual learning by using sparsity to mitigate catastrophic forgetting.

[Figure: LoRI Framework]

✨ Key Highlights

  • Scalable & Efficient: Uses up to 95% fewer trainable parameters than traditional LoRA while maintaining performance.
  • Reduced Interference: Minimizes cross-task interference in multi-task scenarios by leveraging orthogonality between adapter subspaces.
  • Continual Learning: Supports continual learning by using sparsity to mitigate catastrophic forgetting.
  • Universal Applicability: Evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks.
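The orthogonality highlight above rests on a general property of high-dimensional random vectors: independently drawn Gaussian directions are nearly orthogonal. The sketch below illustrates that property using only the standard library. The dimension 4096 and the helper names `cosine_similarity` and `random_direction` are illustrative assumptions and are not taken from the LoRI code.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_direction(dim, rng):
    """Draw a vector with i.i.d. standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)  # fixed seed so the run is repeatable
dim = 4096  # illustrative; this is also the hidden size of Llama-3-8B
u = random_direction(dim, rng)
v = random_direction(dim, rng)

# For independent Gaussian vectors the cosine concentrates around 0,
# with a standard deviation of roughly 1 / sqrt(dim).
similarity = cosine_similarity(u, v)
print(f"cosine similarity: {similarity:+.4f}")
print(f"1 / sqrt(dim):     {1 / math.sqrt(dim):.4f}")
```

Because each task's adapter draws its own random A, updates from different tasks tend to occupy nearly orthogonal subspaces, which is the intuition behind the reduced interference.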

Model Details

Model Description

LoRI-D_nlu_llama3_rank_64 is a LoRI-D adapter (frozen random A matrices, dense B matrices) for Natural Language Understanding (NLU) tasks, trained on top of the meta-llama/Meta-Llama-3-8B base model with a rank of 64. It is part of the LoRI family of adapters, which aims to provide parameter-efficient fine-tuning with reduced cross-task interference.

  • Developed by: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
  • Model type: LoRI adapter, a variant of Low-Rank Adaptation (LoRA); a PEFT method for LLMs
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B

Model Sources

Uses

Direct Use

This adapter is intended to be loaded with PEFT on top of the meta-llama/Meta-Llama-3-8B base model for natural language understanding tasks. Its design targets low parameter overhead and good behavior when combined with other task adapters.

Downstream Use

LoRI adapters can be merged for multi-task applications or sequentially applied for continual learning without significant performance degradation. This makes LoRI suitable for building generalist agents or systems that need to learn new skills over time.
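As a rough illustration of adapter merging, the sketch below combines several low-rank updates as a weighted sum of the per-task products B @ A. This is a minimal sketch under stated assumptions, not the merging code used for LoRI: the function name `merge_deltas`, the toy shapes, and the equal weights are all hypothetical.

```python
import torch


def merge_deltas(adapters, weights):
    """Return sum_i w_i * (B_i @ A_i) for a list of (A, B) pairs.

    A has shape (r, d_in) and B has shape (d_out, r), following the
    usual LoRA convention delta_W = B @ A.
    """
    if not adapters:
        raise ValueError("need at least one adapter")
    if len(adapters) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    merged = None
    for (A, B), w in zip(adapters, weights):
        delta = w * (B @ A)
        merged = delta if merged is None else merged + delta
    return merged


torch.manual_seed(0)
d_in, d_out, r = 32, 16, 4  # toy sizes, chosen only for illustration

# Two hypothetical task adapters, each with its own random A.
task_a = (torch.randn(r, d_in), torch.randn(d_out, r))
task_b = (torch.randn(r, d_in), torch.randn(d_out, r))

delta_w = merge_deltas([task_a, task_b], weights=[0.5, 0.5])
print(delta_w.shape)  # torch.Size([16, 32])
```

PEFT's LoRA implementation also exposes `add_weighted_adapter` for combining adapters that are already loaded on a model; whether its combination types match the merging procedure used in the paper is not stated in this card.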

Out-of-Scope Use

This model is not intended for use in high-stakes or safety-critical applications without further rigorous testing and validation. Given its focus on NLU tasks, its performance on other domains or tasks without specific fine-tuning is not guaranteed.

Bias, Risks, and Limitations

As with any language model, this adapter may inherit biases from the base model (Llama-3-8B) and from the datasets used for LoRI fine-tuning. Potential risks include generating biased, inaccurate, or harmful content.

Recommendations

Users should carefully evaluate the model's output for their specific application and consider fine-tuning on domain-specific, curated data to mitigate potential biases or limitations.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16, # or torch.float16 depending on your hardware
    device_map="auto"
)

# Load the LoRI adapter
adapter = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_nlu_llama3_rank_64")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Example usage for a general text generation task (adjust for specific NLU use-cases)
prompt = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(prompt, return_tensors="pt").to(adapter.device)

# Generate text
outputs = adapter.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

# For specific NLU tasks, the prompt and expected output format would vary.
# You would then apply relevant NLU processing to the generated text or use the adapter's output directly.

Training Details

Training Data

The LoRI models are trained on various datasets depending on the task:

  • Natural Language Understanding (NLU): a collection of NLU datasets (the task this adapter was trained for).
  • Code generation: CodeAlpaca dataset.
  • Mathematical reasoning: GSM8K dataset.
  • Safety alignment: Saferpaca dataset.

More details on specific datasets can be found in the GitHub repository.

Training Procedure

LoRI is implemented using Fully Sharded Data Parallel (FSDP) for multi-GPU training. The training involves two main stages:

  1. LoRI-D (Dense) training: Adapters are trained with random projection matrices A frozen and B matrices dense. Sparse masks are then extracted.
  2. LoRI-S (Sparse) training: Training continues with the extracted sparse masks applied to matrices B, typically at 90% sparsity.
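The card does not say how the sparse masks are extracted after LoRI-D training. The sketch below assumes magnitude-based selection, keeping the largest 10% of entries of B by absolute value to reach 90% sparsity. Both that criterion and the function name `extract_magnitude_mask` are assumptions made for illustration; the GitHub repository is the reference for the actual procedure.

```python
import torch


def extract_magnitude_mask(B, sparsity=0.9):
    """Boolean mask keeping the largest-magnitude (1 - sparsity) share of B."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    num_keep = max(1, round(B.numel() * (1.0 - sparsity)))
    # Indices of the num_keep largest absolute values in the flattened matrix.
    top_idx = torch.topk(B.abs().flatten(), num_keep).indices
    mask = torch.zeros(B.numel(), dtype=torch.bool, device=B.device)
    mask[top_idx] = True
    return mask.view_as(B)


torch.manual_seed(0)
B_dense = torch.randn(20, 50)  # stand-in for a trained LoRI-D matrix B
mask = extract_magnitude_mask(B_dense, sparsity=0.9)
B_sparse = B_dense * mask  # entries outside the mask are zeroed

print(int(mask.sum()), "of", mask.numel(), "entries kept")  # 100 of 1000 entries kept
```

In a LoRI-S style stage, the same mask would be applied to B on every forward pass so that only the kept entries receive gradient updates.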

Training Hyperparameters

  • Training regime: Mixed precision (e.g., bfloat16 for Llama-3).
  • Adapter Rank (r): 64 (for this LoRI-D_nlu_llama3_rank_64 model).
  • LoRA Alpha (lora_alpha): 128 (from adapter_config.json).
  • LoRA Dropout (lora_dropout): 0.05 (from adapter_config.json).
  • Target Modules (target_modules): o_proj, k_proj, up_proj, q_proj, v_proj, down_proj, gate_proj (from adapter_config.json).
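To make these hyperparameters concrete, the arithmetic below estimates the adapter's scaling factor and trainable parameter count. It assumes the standard LoRA scaling `lora_alpha / r`, the Llama-3-8B layer shapes from the base model's public configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128), and that only the B matrices are trained, as described for LoRI-D above. The resulting counts are estimates derived from those assumptions, not figures reported in this card.

```python
# Hyperparameters listed above (from adapter_config.json).
RANK = 64
LORA_ALPHA = 128

# Assumed Llama-3-8B shapes (from its public config, not from this card).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads of dimension 128
NUM_LAYERS = 32

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def trainable_params(rank, train_a):
    """Count adapter parameters: B is (out, rank), A is (rank, in)."""
    per_layer = 0
    for d_in, d_out in TARGET_SHAPES.values():
        per_layer += d_out * rank  # B is always trained
        if train_a:
            per_layer += rank * d_in  # A is trained only in standard LoRA
    return per_layer * NUM_LAYERS


scaling = LORA_ALPHA / RANK
b_only = trainable_params(RANK, train_a=False)
a_and_b = trainable_params(RANK, train_a=True)

print(f"scaling factor: {scaling}")           # 2.0
print(f"B only (LoRI-D style): {b_only:,}")   # 88,080,384
print(f"A and B (LoRA style):  {a_and_b:,}")  # 167,772,160
```

Under these assumptions, freezing A roughly halves the trainable parameters relative to a standard LoRA adapter of the same rank, before any sparsification of B.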

Evaluation

Testing Data, Factors & Metrics

LoRI's performance has been extensively evaluated across natural language understanding, mathematical reasoning, code generation (e.g., HumanEval), and safety alignment tasks.

Metrics

Performance is measured with task-appropriate metrics. The paper reports that LoRI outperforms full fine-tuning and existing PEFT methods across these tasks while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results, please refer to the paper.

Technical Specifications

Model Architecture and Objective

LoRI modifies LoRA in two ways: the projection matrices A are frozen as random projections, and the matrices B are sparsified using task-specific masks. This design is intended to achieve monosemantic experts, reduce trainable parameters, and minimize cross-task interference. The objective remains focused on improving performance on downstream tasks while promoting parameter efficiency and modularity.
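The structure described above can be sketched as a small PyTorch module: a frozen base layer, a frozen random A, and a trainable B multiplied by a fixed mask. This is a minimal sketch for illustration, not the authors' implementation. The class name `LoRILinearSketch`, the Gaussian initialization and scaling of A, and the zero initialization of B are assumptions.

```python
import torch
import torch.nn as nn


class LoRILinearSketch(nn.Module):
    """Frozen base layer plus a low-rank update with frozen A and masked B."""

    def __init__(self, base, rank, alpha, mask=None):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        # A: fixed random projection, stored as a buffer so it is never trained.
        # The 1/sqrt(rank) scale is an assumption made for this sketch.
        self.register_buffer("A", torch.randn(rank, base.in_features) / rank ** 0.5)
        # B: the only trainable tensor, initialised to zero so the update starts at zero.
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        if mask is None:
            mask = torch.ones(base.out_features, rank)  # dense B (LoRI-D style)
        self.register_buffer("mask", mask.to(self.B.dtype))
        self.scaling = alpha / rank

    def forward(self, x):
        # Masked entries of B contribute nothing and receive zero gradient.
        delta = (self.B * self.mask) @ self.A
        return self.base(x) + self.scaling * nn.functional.linear(x, delta)


torch.manual_seed(0)
layer = LoRILinearSketch(nn.Linear(16, 8), rank=4, alpha=8)
x = torch.randn(2, 16)
out = layer(x)

trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # ['B']
```

Passing a sparse 0/1 `mask` turns the same module into a LoRI-S style layer, since the masked entries of B then stay at their initial value during training.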

Compute Infrastructure

Hardware

Training was performed on multiple GPUs using Fully Sharded Data Parallel (FSDP).

Software

The implementation uses Python, PyTorch, and the Hugging Face transformers and peft libraries.

Acknowledgements

This project builds on the codebase of dpo-rlaif and incorporates code from lottery-ticket-adaptation. Code generation performance on HumanEval is evaluated using the bigcode-evaluation-harness.

Citation

If you use LoRI in your work, please cite:

@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}

Model Card Contact

For questions or inquiries, please refer to the contact information provided in the original repository.

Framework versions

  • PEFT 0.12.0