	---
	base_model: meta-llama/Meta-Llama-3-8B
	library_name: peft
	pipeline_tag: text-generation
	license: apache-2.0
	---

	# Model Card for LoRI-D_code_llama3_rank_64

	This adapter accompanies the paper [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).

	LoRI (LoRA with Reduced Interference) is a simple yet effective approach that freezes the projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting.

	<div align="center">
	<img src="https://github.com/juzhengz/LoRI/raw/main/LoRI.png" alt="LoRI" width="80%">
	</div>

	## Model Details

	### Model Description

	LoRI-D_code_llama3_rank_64 is an adapter for the `meta-llama/Meta-Llama-3-8B` base model, fine-tuned for code generation with the LoRI (LoRA with Reduced Interference) framework. LoRI is a parameter-efficient fine-tuning (PEFT) method designed to reduce the parameter overhead and cross-task interference that arise when standard LoRA is used in multi-task settings. It does so by freezing the projection matrices `A` as random projections and sparsifying the matrices `B` with task-specific masks, which significantly reduces the number of trainable parameters while maintaining strong performance. This adapter uses rank 64.
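The description above can be made concrete with a toy sketch. Assuming the usual LoRA convention in which the weight update is the product of `B` and `A`, a LoRI-style update would be $(B \odot M) A$, where `A` is drawn once at random and never trained and `M` is a 0/1 task mask. The helper names, the toy sizes, and the mask values below are illustrative assumptions, not taken from the LoRI codebase, and `B` is filled with random numbers only to stand in for trained weights.

```python
import random

def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lori_delta(b, mask, a):
    """Compute the adapter update (B * M) @ A, with * taken elementwise."""
    masked_b = [
        [b_val * m_val for b_val, m_val in zip(b_row, m_row)]
        for b_row, m_row in zip(b, mask)
    ]
    return matmul(masked_b, a)

rng = random.Random(0)
d_out, d_in, rank = 6, 5, 2  # toy sizes, far smaller than real model layers

a = random_matrix(rank, d_in, rng)   # frozen random projection, never updated
b = random_matrix(d_out, rank, rng)  # the only matrix that would be trained
mask = [[1, 0], [0, 1], [0, 0], [1, 1], [0, 0], [0, 0]]  # hypothetical task mask

delta_w = lori_delta(b, mask, a)  # d_out x d_in update added to the frozen weight
```

Rows of `B` that the mask zeroes out contribute nothing to the update, which is why a sparse mask leaves most of the base weight untouched.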

	- Developed by: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
	- Model type: Low-Rank Adaptation (LoRA) adapter for Causal Language Models
	- Language(s) (NLP): English
	- License: Apache-2.0
	- Finetuned from model: `meta-llama/Meta-Llama-3-8B`

	### Model Sources

	- Repository: [https://github.com/juzhengz/LoRI/](https://github.com/juzhengz/LoRI/)
	- Paper: [https://huggingface.co/papers/2504.07448](https://huggingface.co/papers/2504.07448)
	- Hugging Face Collection: [LoRI Adapters](https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011)

	## Uses

	### Direct Use

	This model is intended to be loaded as a PEFT adapter on top of the `meta-llama/Meta-Llama-3-8B` base model to enhance its performance on code generation tasks. It provides an efficient way to fine-tune large language models with significantly fewer trainable parameters.

	### Downstream Use

	LoRI adapters facilitate effective adapter merging and continual learning across various tasks, including natural language understanding, mathematical reasoning, code generation, and safety alignment. This makes them suitable for multi-task learning environments and adaptive model deployments.
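One intuition for why merging works is that independently drawn random directions in a high-dimensional space are almost orthogonal, so adapters built on separate random projections overlap very little. The following is a toy illustration of that geometric fact only, not the analysis from the paper; the dimension and the variable names are assumptions chosen for the example.

```python
import math
import random

def random_vector(dim, rng):
    """Draw a vector of independent standard Gaussian entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

rng = random.Random(0)
dim = 4096  # illustrative width for a projection row

code_direction = random_vector(dim, rng)  # stands in for one adapter's projection
math_direction = random_vector(dim, rng)  # stands in for another adapter's projection

overlap = cosine(code_direction, math_direction)
print(f"cosine similarity: {overlap:.4f}")
```

The similarity is typically on the order of `1 / sqrt(dim)`, so it shrinks as the model width grows. It is small rather than exactly zero, which is consistent with the caveat that merging does not guarantee perfect isolation between tasks.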

	### Out-of-Scope Use

	This model is not intended for generating harmful, biased, or unethical content. Users should exercise caution and implement appropriate safeguards when deploying it in real-world applications, especially in sensitive domains.

	## Bias, Risks, and Limitations

	As an adaptation method built upon pre-trained Large Language Models, LoRI models inherit biases and risks present in their base models (e.g., Meta-Llama-3-8B) and the datasets they were fine-tuned on. Users should be aware of potential issues related to fairness, toxicity, and factual accuracy. Specific limitations include:
	- Performance might vary depending on the chosen base model and the sparsity level.
	- While LoRI significantly reduces cross-task interference, perfect isolation of knowledge across tasks is not guaranteed during adapter merging.

	### Recommendations

	Users (both direct and downstream) should refer to the original `meta-llama/Meta-Llama-3-8B` model card for the biases and risks inherited from the base model. Before deploying a LoRI adapter in a sensitive application, run task-specific evaluations and validate the outputs carefully.

	## How to Get Started with the Model

	Pretrained LoRI adapters are available via the Hugging Face collection and can be loaded as follows:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import torch

	# Load the base model in bfloat16
	base_model = AutoModelForCausalLM.from_pretrained(
	    "meta-llama/Meta-Llama-3-8B",
	    torch_dtype=torch.bfloat16,
	    device_map="auto",  # or specify your device, e.g., "cuda"
	)

	# Attach the LoRI adapter to the base model
	adapter = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_code_llama3_rank_64")

	# Load the tokenizer
	tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

	# Example prompt for code generation: a partial function for the model to complete
	prompt = (
	    "def factorial(n):\n"
	    "    if n == 0:\n"
	    "        return 1\n"
	    "    else:\n"
	)
	inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)

	# Generate a sampled completion without tracking gradients
	with torch.no_grad():
	    outputs = adapter.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

	generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(generated_text)
	```

	## Training Details

	### Training Data

	LoRI adapters were trained and evaluated on task-specific datasets. This code generation adapter was trained primarily on the `CodeAlpaca` dataset. Adapters for the other tasks used:
	- Mathematical reasoning: `GSM8K`
	- Safety alignment: `Saferpaca`
	- Natural language understanding: datasets not specified in this card

	### Training Procedure

	LoRI is implemented using Fully Sharded Data Parallel (FSDP) and supports multi-GPU training environments. The training process involves two main stages:
	1. LoRI-D (Decomposition): Initial training where projection matrices `A` are frozen as random projections, and matrices `B` are learned. This stage also extracts sparse masks.
	2. LoRI-S (Sparsity): Continued training with the learned sparse masks (e.g., 90% sparsity) applied to matrices `B`, further reducing parameters and promoting orthogonality.
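The card does not say how the sparse masks are extracted after the first stage. A minimal sketch, assuming magnitude-based selection (keep the largest-magnitude entries of a trained `B` and zero the rest), might look like the following. The function name `magnitude_mask`, the per-matrix scope, and the example values are all hypothetical.

```python
def magnitude_mask(matrix, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude entries of `matrix`.

    `sparsity` is the fraction of entries to zero out, e.g. 0.9 keeps 10%.
    Ties at the threshold are all kept, so the mask can be slightly denser.
    """
    magnitudes = sorted((abs(v) for row in matrix for v in row), reverse=True)
    keep = max(1, round(len(magnitudes) * (1.0 - sparsity)))
    threshold = magnitudes[keep - 1]
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in matrix]

# Hypothetical values standing in for a B matrix after the first training stage.
b = [
    [0.90, -0.10, 0.05, 0.20, -0.30],
    [0.01, 0.70, -0.02, 0.03, 0.04],
    [-0.15, 0.06, 0.25, -0.08, 0.11],
    [0.12, -0.95, 0.07, 0.13, -0.14],
]

mask = magnitude_mask(b, sparsity=0.9)  # keeps 2 of the 20 entries

# Applying the mask gives the sparse B that the second stage would keep training.
sparse_b = [[v * m for v, m in zip(b_row, m_row)] for b_row, m_row in zip(b, mask)]
```

In the second stage only the entries where the mask is 1 would receive updates; everything else stays at zero.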

	#### Training Hyperparameters

	- Adapter ranks: Models were trained with adapter ranks of 32 and 64 (this model uses rank 64).
	- Sparsity: 90% (for `LoRI-S` stage).
	- Base models used: LLaMA-3-8B and Mistral-7B.
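To see how rank and sparsity translate into trainable parameter counts, the arithmetic for a single adapted layer can be worked through directly. This sketch assumes a square 4096 x 4096 projection, that standard LoRA trains both `A` and `B`, and that LoRI trains only the unmasked entries of `B`. Real layers are not all square, so per-model totals will differ; the function names are illustrative.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters for standard LoRA: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)

def lori_params(d_out, rank, sparsity=0.0):
    """Trainable parameters when A is frozen and a fraction `sparsity` of B is masked."""
    return round(rank * d_out * (1.0 - sparsity))

d_in = d_out = 4096  # assumed square projection, for illustration only
rank = 64

lora = lora_params(d_in, d_out, rank)
lori_dense = lori_params(d_out, rank)                 # A frozen, B fully trainable
lori_sparse = lori_params(d_out, rank, sparsity=0.9)  # A frozen, 90% of B masked

reduction = 1.0 - lori_sparse / lora
print(f"LoRA: {lora}, dense B only: {lori_dense}, 90% sparse B: {lori_sparse}")
print(f"reduction vs. LoRA: {reduction:.1%}")
```

Under these assumptions, freezing `A` halves the count and 90% sparsity on `B` removes a further factor of ten, which lines up with the "up to 95% fewer trainable parameters" figure quoted in the Evaluation section.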

	## Evaluation

	Extensive experiments demonstrated that LoRI outperforms full fine-tuning and existing PEFT methods while using up to 95% fewer trainable parameters than standard LoRA. For code generation, performance was evaluated on the HumanEval benchmark. In multi-task experiments, LoRI enabled effective adapter merging and continual learning with reduced cross-task interference. Detailed evaluation results and comparisons can be found in the accompanying paper.

	## Acknowledgements

	This project builds on the codebase of [dpo-rlaif](https://github.com/architsharma97/dpo-rlaif) and incorporates code from [lottery-ticket-adaptation](https://github.com/kiddyboots216/lottery-ticket-adaptation). Code generation performance on HumanEval is evaluated using the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

	## Citation

	If you use LoRI in your work, please cite:

	```bibtex
	@article{zhang2025lori,
	title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
	author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
	journal={arXiv preprint arXiv:2504.07448},
	year={2025}
	}
	```

	## Framework versions

	- PEFT 0.12.0