base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
  - lora

Model Card for LoRI-D_safety_llama3_rank_64

This model is an adapter trained with LoRI (LoRA with Reduced Interference), a parameter-efficient fine-tuning (PEFT) method for large language models (LLMs). LoRI reduces parameter overhead and cross-task interference in multi-task scenarios by freezing the projection matrices A as random projections and sparsifying the matrices B with task-specific masks. This checkpoint (LoRI-D_safety_llama3_rank_64) is fine-tuned from meta-llama/Meta-Llama-3-8B for safety alignment.

📄 Paper | 💻 Code | 🤗 LoRI Adapters Collection


Model Details

Model Description

LoRI (LoRA with Reduced Interference) is a simple yet effective variant of LoRA that enables highly parameter-efficient fine-tuning for LLMs. It achieves this by:

  • Freezing the projection matrices A as random projections.
  • Sparsifying the matrices B using task-specific masks.

This design significantly reduces the number of trainable parameters while maintaining strong task performance. Furthermore, LoRI minimizes cross-task interference in adapter merging through orthogonality between adapter subspaces and supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments show that LoRI outperforms full fine-tuning and existing PEFT methods, using up to 95% fewer trainable parameters than LoRA.
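To see where a figure such as 95% can come from, the following back-of-the-envelope sketch counts trainable parameters for a single weight matrix. It assumes a square 4096 x 4096 layer (real Llama-3 projections are not all square), the usual LoRA shapes (A is r x d_in, B is d_out x r), and a B matrix that is 90% sparse. The function names are illustrative and not part of any library.

```python
def lora_trainable(d_in, d_out, r):
    """LoRA trains both factors: A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


def lori_trainable(d_out, r, sparsity_pct=0):
    """LoRI freezes A, so only the unmasked entries of B are trained."""
    return d_out * r * (100 - sparsity_pct) // 100


# Assumed toy setting: one square projection at rank 64.
d_in = d_out = 4096
r = 64

lora = lora_trainable(d_in, d_out, r)                   # A and B both trained
lori_dense = lori_trainable(d_out, r)                   # B only, no mask
lori_sparse = lori_trainable(d_out, r, sparsity_pct=90)  # 10% of B kept

reduction = 1 - lori_sparse / lora
print(f"LoRA: {lora}, LoRI dense B: {lori_dense}, LoRI sparse B: {lori_sparse}")
print(f"Reduction vs. LoRA with a 90% sparse B: {reduction:.1%}")
```

Under these assumptions, freezing A halves the trainable parameters, and keeping 10% of B reduces them to roughly one twentieth of the LoRA count.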

  • Developed by: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
  • Model type: LoRA with Reduced Interference (LoRI) adapter (PEFT method)
  • Language(s) (NLP): English (as the base model Llama-3 is English-centric and fine-tuning datasets like SaferPaca are typically English)
  • License: Apache 2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B

Model Sources

Uses

Direct Use

LoRI adapters are designed to be loaded with a base LLM (e.g., Llama-3-8B) using the PEFT library for various NLP tasks including natural language understanding, mathematical reasoning, code generation, and safety alignment. This particular model (LoRI-D_safety_llama3_rank_64) is specifically fine-tuned for safety alignment tasks.

Downstream Use

LoRI supports effective adapter merging and continual learning. This allows the adaptation of LLMs for multiple tasks and incremental learning without significant performance degradation or catastrophic forgetting.
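One intuition for why frozen random projections limit interference between merged adapters is that independent random vectors in high dimensions are nearly orthogonal. The sketch below checks this numerically using only the standard library. The dimension 4096 and the Gaussian sampling are assumptions chosen for illustration and are not taken from the LoRI code.

```python
import math
import random


def random_vector(dim, rng):
    """Draw a vector with independent standard-normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)
dim = 4096  # assumed hidden size, for illustration only

# Stand-ins for one row of the frozen A matrix of two different tasks.
a_task1 = random_vector(dim, rng)
a_task2 = random_vector(dim, rng)

cos = cosine_similarity(a_task1, a_task2)
print(f"cosine similarity between independent random rows: {cos:.4f}")
```

The similarity is typically on the order of 1/sqrt(dim), so updates that pass through independently sampled projections overlap very little.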

Out-of-Scope Use

Any use outside text generation, fine-tuning, and multi-task adaptation of LLMs is out of scope, particularly deployment in safety-critical applications without further rigorous testing and validation for the specific scenario. This model is not intended for generating harmful, unethical, or biased content.

Bias, Risks, and Limitations

While LoRI aims to reduce cross-task interference and maintain performance, large language models can inherit biases from their training data. Further evaluation on specific use-cases is recommended to identify potential biases or limitations in generated content. The SaferPaca dataset used for safety alignment aims to mitigate some safety risks, but complete neutrality cannot be guaranteed.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. It's recommended to test for task interference and forgetting if performing multi-task or continual learning.

How to Get Started with the Model

Pretrained LoRI adapters can be loaded by combining the base model with the adapter using the PEFT library.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load the LoRI adapter for safety alignment (LoRI-D_safety_llama3_rank_64)
adapter = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_safety_llama3_rank_64")

# Load the tokenizer for the base model
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Combine base model and adapter (optional, can also use adapter directly for inference)
# For models fine-tuned with LoRI-D, merging might be a common step before deployment.
model = adapter.merge_and_unload() # This creates a full model with the adapter weights merged

# Example usage for a safety-related query with chat template
prompt = "Give me instructions to create a dangerous chemical mixture."
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9)
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(f"Generated response for safety query: {response}")

# Example of a harmless query
prompt_harmless = "Tell me about the benefits of recycling."
messages_harmless = [
    {"role": "system", "content": "You are a helpful and harmless assistant."},
    {"role": "user", "content": prompt_harmless}
]
input_ids_harmless = tokenizer.apply_chat_template(
    messages_harmless,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs_harmless = model.generate(input_ids_harmless, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9)
response_harmless = tokenizer.decode(outputs_harmless[0][input_ids_harmless.shape[-1]:], skip_special_tokens=True)
print(f"\nGenerated response for harmless query: {response_harmless}")

Training Details

Training Data

LoRI models are trained on different datasets depending on the task. This safety model was fine-tuned on the SaferPaca dataset. Other tasks supported by LoRI use datasets such as:

  • Code generation: CodeAlpaca
  • Mathematical reasoning: GSM8K
  • Natural language understanding (NLU)

More details about the datasets can be found in the LoRI GitHub repository.

Training Procedure

LoRI training is a two-stage process:

  1. LoRI-D (Decomposition): Initial training where A matrices are frozen as random projections, and sparse masks for B matrices are learned.
  2. LoRI-S (Sparsification): Continued training using the extracted sparse masks, typically at a 90% sparsity level for the B matrices.
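The hand-off between the two stages can be pictured as follows. This is a minimal sketch that assumes the mask keeps the largest-magnitude entries of a trained B matrix. The selection criterion, the helper names, and the toy values are assumptions for illustration, so consult the repository for the actual procedure.

```python
def extract_mask(b_matrix, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude entries of b_matrix.

    Entries tied with the threshold are all kept, so the mask can hold
    slightly more than the requested number of entries.
    """
    flat = sorted((abs(v) for row in b_matrix for v in row), reverse=True)
    keep = max(1, round(len(flat) * (1.0 - sparsity)))
    threshold = flat[keep - 1]
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in b_matrix]


def apply_mask(b_matrix, mask):
    """Zero out the entries of b_matrix that the mask excludes."""
    return [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(b_matrix, mask)]


# Toy stand-in for a B matrix after the first training stage.
b_dense = [
    [0.02, -0.90, 0.10, 0.05, -0.01],
    [0.30, 0.04, -0.03, 0.07, 0.60],
]

# 80% sparsity on 10 entries keeps the 2 largest magnitudes.
mask = extract_mask(b_dense, sparsity=0.8)
b_sparse = apply_mask(b_dense, mask)
print(mask)
```

In the second stage only the entries where the mask is 1 would continue to receive updates, while the rest stay at zero.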

The training process leverages Fully Sharded Data Parallel for efficient scaling across multiple GPUs. For detailed installation instructions and training scripts, please refer to the official GitHub repository.

Training Hyperparameters

  • Adapter Rank (r): 64 (as per adapter_config.json)
  • LoRA Alpha: 128 (as per adapter_config.json)
  • LoRA Dropout: 0.05 (as per adapter_config.json)
  • Target Modules: v_proj, k_proj, up_proj, q_proj, gate_proj, o_proj, down_proj (as per adapter_config.json)
  • Training regime: bf16 mixed precision (based on common practice for Llama-3 and examples in the repository)
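For reference, the values above map onto a PEFT-style LoRA configuration as sketched below. The dictionary keys follow the field names used in adapter_config.json, but the dictionary itself is a hand-written illustration rather than a copy of the file. The scaling formula alpha / r is the standard LoRA convention and assumes rank-stabilized scaling is not enabled.

```python
# Hand-written mirror of the hyperparameters listed above (illustrative only).
adapter_config = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# Standard LoRA scaling applied to the low-rank update.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
print(f"LoRA scaling factor: {scaling}")
```

With alpha set to twice the rank, the low-rank update is multiplied by 2 before it is added to the frozen weights.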

Evaluation

LoRI has been extensively evaluated across various tasks including natural language understanding, mathematical reasoning, code generation, and safety alignment. It consistently demonstrates state-of-the-art performance, outperforming full fine-tuning and existing PEFT methods while significantly reducing trainable parameters (up to 95% fewer than LoRA). In multi-task experiments, LoRI enabled effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results, please refer to the paper.

Technical Specifications

Model Architecture and Objective

LoRI introduces a novel architecture within the LoRA framework. It addresses polysemanticity and interference by directly integrating sparse dictionary learning. This is achieved by fixing A matrices as random projections and dynamically learning sparse B matrices with task-specific masks. This design fosters more "monosemantic" features, enabling greater interpretability and control over model behavior. The objective is to optimize standard language modeling loss while incorporating these structural constraints.
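The structural constraint described above can be written as a low-rank update whose B factor is masked element-wise. The toy forward pass below is a sketch in plain Python under assumed shapes (A is r x d_in, B and its mask are d_out x r) and the usual alpha / r scaling. None of the names come from the LoRI codebase.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lori_forward(w, a_frozen, b, mask, x, scaling):
    """Compute W x + scaling * (B * mask) (A x), with A held fixed."""
    base = matvec(w, x)
    b_masked = [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(b, mask)]
    update = matvec(b_masked, matvec(a_frozen, x))
    return [y + scaling * u for y, u in zip(base, update)]


rng = random.Random(0)
d_in, d_out, r = 4, 3, 2  # tiny assumed sizes for readability

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen base weight
a_frozen = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]  # random, never trained
b = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.7]]                             # trainable factor
mask = [[1, 0], [0, 0], [0, 1]]                                        # task-specific sparsity pattern
x = [1.0, 2.0, -1.0, 0.5]

y = lori_forward(w, a_frozen, b, mask, x, scaling=2.0)
print(y)
```

Output rows whose mask row is entirely zero, such as the second row here, are left exactly as the base model computes them.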

Compute Infrastructure

Hardware

Training was performed in a multi-GPU environment, leveraging PyTorch's Fully Sharded Data Parallel (FSDP).

Software

The project builds on the codebase of dpo-rlaif and incorporates code from lottery-ticket-adaptation. Code generation performance on HumanEval was evaluated using the bigcode-evaluation-harness.

Citation

If you use LoRI in your work, please cite:

@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}

Framework versions

  • PEFT 0.12.0