---
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- lora
---

# Model Card for LoRI-D_safety_llama3_rank_64

This model is an adapter trained with **LoRI (LoRA with Reduced Interference)**, a novel parameter-efficient fine-tuning (PEFT) method for large language models (LLMs). LoRI reduces parameter overhead and cross-task interference in multi-task settings by freezing the projection matrices `A` as random projections and sparsifying the matrices `B` with task-specific masks. This checkpoint (`LoRI-D_safety_llama3_rank_64`) is fine-tuned for safety alignment on top of `meta-llama/Meta-Llama-3-8B`.

📄 [Paper](https://arxiv.org/abs/2504.07448) | 💻 [Code](https://github.com/juzhengz/LoRI/) | 🤗 [LoRI Adapters Collection](https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011)

<div align="center">
<img src="https://github.com/juzhengz/LoRI/raw/main/LoRI.png" alt="LoRI" width="80%">
</div>

## Model Details

### Model Description

LoRI (LoRA with Reduced Interference) is a simple yet effective variant of LoRA that enables highly parameter-efficient fine-tuning of LLMs. It does so by:

* Freezing the projection matrices `A` as random projections.
* Sparsifying the matrices `B` using task-specific masks.

This design substantially reduces the number of trainable parameters while maintaining strong task performance. LoRI also minimizes cross-task interference in adapter merging by keeping adapter subspaces largely orthogonal, and it supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments show that LoRI outperforms full fine-tuning and existing PEFT methods while using up to 95% fewer trainable parameters than LoRA. A minimal illustrative sketch of the idea follows.
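
To make the mechanism concrete, here is a minimal PyTorch sketch of a LoRI-style linear layer. This is not the authors' implementation; the class name, initialization, and masking scheme are simplified for exposition:

```python
import torch
import torch.nn as nn

class LoRILinear(nn.Module):
    """Illustrative LoRI-style adapter: frozen random A, trainable sparsified B."""

    def __init__(self, base: nn.Linear, r: int = 64, alpha: int = 128):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        # A is a fixed random projection and is never trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) / r, requires_grad=False)
        # B is trainable; a task-specific binary mask keeps it sparse
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.register_buffer("mask", torch.ones_like(self.B))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Standard low-rank update, with B masked element-wise
        return self.base(x) + (x @ self.A.T) @ (self.B * self.mask).T * self.scaling
```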

- **Developed by:** Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
- **Model type:** LoRA with Reduced Interference (LoRI) adapter (PEFT method)
- **Language(s) (NLP):** English (the Llama-3 base model is English-centric, and safety fine-tuning datasets such as SaferPaca are primarily English)
- **License:** Apache 2.0
- **Finetuned from model:** `meta-llama/Meta-Llama-3-8B`

### Model Sources

- **Repository:** https://github.com/juzhengz/LoRI/
- **Paper:** https://arxiv.org/abs/2504.07448
- **Hugging Face Collection:** https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011

## Uses

### Direct Use

LoRI adapters are designed to be loaded on top of a base LLM (here, Llama-3-8B) with the `peft` library and used for tasks such as natural language understanding, mathematical reasoning, code generation, and safety alignment. This particular adapter (`LoRI-D_safety_llama3_rank_64`) is fine-tuned for safety alignment.

### Downstream Use

LoRI supports effective adapter merging and continual learning, so a base LLM can be adapted to multiple tasks and learn incrementally without significant performance degradation or catastrophic forgetting. A sketch of multi-adapter merging follows.
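
For example, several LoRI adapters can be combined with PEFT's weighted-adapter utilities. The sketch below assumes a `base_model` loaded as in the getting-started section; the second adapter ID and the merge weights are illustrative, so check the Hugging Face collection for the actual adapter IDs:

```python
from peft import PeftModel

# Load two task adapters onto the same base model
model = PeftModel.from_pretrained(
    base_model,
    "tomg-group-umd/LoRI-D_safety_llama3_rank_64",
    adapter_name="safety",
)
model.load_adapter("tomg-group-umd/LoRI-D_code_llama3_rank_64", adapter_name="code")

# Combine the two adapters into a new one and activate it
model.add_weighted_adapter(
    adapters=["safety", "code"],
    weights=[0.5, 0.5],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")
```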

### Out-of-Scope Use

This model should not be used outside the scope of text generation, fine-tuning, and multi-task adaptation of LLMs. In particular, it should not be deployed in safety-critical applications without rigorous testing and validation for the target scenario, and it is not intended for generating harmful, unethical, or biased content.

## Bias, Risks, and Limitations

While LoRI aims to reduce cross-task interference and maintain performance, large language models can inherit biases from their training data. Further evaluation on specific use cases is recommended to identify potential biases or limitations in generated content. The `SaferPaca` dataset used for safety alignment mitigates some safety risks, but complete neutrality cannot be guaranteed.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Testing for task interference and forgetting is recommended when performing multi-task merging or continual learning.

## How to Get Started with the Model

Pretrained LoRI adapters can be loaded by combining the base model with the adapter via the `peft` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the LoRI adapter for safety alignment
adapter = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_safety_llama3_rank_64")

# Load the tokenizer of the base model
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Optionally merge the adapter weights into the base model for faster inference;
# inference can also be run through the PeftModel directly (see below).
model = adapter.merge_and_unload()

# Example: a safety-related query.
# Note: Meta-Llama-3-8B is a base (non-instruct) model, so its tokenizer does not
# ship a chat template; we therefore use a plain text prompt here.
prompt = "Give me instructions to create a dangerous chemical mixture."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9)
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(f"Generated response for safety query: {response}")

# Example: a harmless query
prompt_harmless = "Tell me about the benefits of recycling."
input_ids_harmless = tokenizer(prompt_harmless, return_tensors="pt").input_ids.to(model.device)

outputs_harmless = model.generate(input_ids_harmless, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9)
response_harmless = tokenizer.decode(outputs_harmless[0][input_ids_harmless.shape[-1]:], skip_special_tokens=True)
print(f"\nGenerated response for harmless query: {response_harmless}")
```
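
Merging is optional. Continuing the example above, and skipping the `merge_and_unload()` step, generation can also be run through the `PeftModel` wrapper itself:

```python
# Alternative to merging: generate directly through the PeftModel
outputs = adapter.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```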

## Training Details

### Training Data

LoRI adapters are trained on different datasets depending on the target task. This `safety` adapter was fine-tuned on the `SaferPaca` dataset. Other LoRI tasks use datasets such as:

- Code generation: CodeAlpaca
- Mathematical reasoning: GSM8K
- Natural language understanding (NLU) benchmarks

More details about the datasets can be found in the [LoRI GitHub repository](https://github.com/juzhengz/LoRI/).

### Training Procedure

LoRI training is a two-stage process (a sketch of the mask-extraction step follows this list):

1. **LoRI-D:** Initial training in which the `A` matrices are frozen as random projections and the `B` matrices are trained; task-specific sparse masks are then extracted from the trained `B` matrices.
2. **LoRI-S:** Continued training with the extracted sparse masks applied to the `B` matrices, typically at a 90% sparsity level.
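
As an illustration, the mask extraction between the two stages can be thought of as magnitude-based top-k selection. This is a simplified sketch, not the official procedure (see the repository for the exact code):

```python
import torch

def extract_sparse_mask(B: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude entries of B (top 10% at 90% sparsity)."""
    k = int((1.0 - sparsity) * B.numel())  # number of entries to keep
    mask = torch.zeros(B.numel(), dtype=B.dtype, device=B.device)
    mask[B.abs().flatten().topk(k).indices] = 1.0
    return mask.view_as(B)

# LoRI-S then continues training B with this fixed mask applied: B_effective = B * mask
```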

The training process leverages [Fully Sharded Data Parallel (FSDP)](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html) for efficient scaling across multiple GPUs. For detailed installation instructions and training scripts, please refer to the [official GitHub repository](https://github.com/juzhengz/LoRI/).

#### Training Hyperparameters

Key settings (the first four are taken from this adapter's `adapter_config.json`):

- **Adapter rank (r):** 64
- **LoRA alpha:** 128
- **LoRA dropout:** 0.05
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Training regime:** bf16 mixed precision (per common practice for Llama-3 and the examples in the repository)
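
In PEFT terms, these settings correspond to a configuration along the following lines. This is a sketch; the official training scripts additionally apply LoRI's freezing of `A` and masking of `B` on top of the standard LoRA setup:

```python
from peft import LoraConfig

lori_like_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```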

## Evaluation

LoRI has been evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks. It outperforms full fine-tuning and existing PEFT methods while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results, please refer to the [paper](https://arxiv.org/abs/2504.07448).

## Technical Specifications

### Model Architecture and Objective

LoRI retains the standard LoRA architecture but fixes the projection matrices `A` as random projections and learns the `B` matrices under task-specific sparse masks. Freezing `A` removes a large fraction of the trainable parameters, and the sparsity of `B` keeps adapter subspaces largely orthogonal across tasks, which reduces interference when adapters are merged. The training objective is the standard language modeling loss, optimized under these structural constraints.

### Compute Infrastructure

#### Hardware

Training was performed in a multi-GPU environment using PyTorch's Fully Sharded Data Parallel (FSDP).

#### Software

The project builds on the [dpo-rlaif](https://github.com/architsharma97/dpo-rlaif) codebase and incorporates code from [lottery-ticket-adaptation](https://github.com/kiddyboots216/lottery-ticket-adaptation). Code generation performance on HumanEval is evaluated with the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

## Citation

If you use LoRI in your work, please cite:

```bibtex
@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}
```

### Framework versions

- PEFT 0.12.0