	---
	base_model: meta-llama/Meta-Llama-3-8B
	library_name: peft
	pipeline_tag: text-generation
	license: apache-2.0
	---

	# Model Card for LoRI-D_nlu_llama3_rank_64

	This model is part of [LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation](https://arxiv.org/abs/2504.07448).

	LoRI is a simple variant of Low-Rank Adaptation (LoRA) for Large Language Models (LLMs): it freezes the projection matrices `A` as random projections and sparsifies the matrices `B` using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance, reduces cross-task interference when adapters are merged, and supports continual learning by using sparsity to mitigate catastrophic forgetting.

	<div align="center">
	<img src="https://github.com/juzhengz/LoRI/raw/main/LoRI.png" alt="LoRI Framework" width="80%">
	</div>

	### ✨ Key Highlights

	* Scalable & Efficient: Uses up to 95% fewer trainable parameters than traditional LoRA while maintaining performance.
	* Reduced Interference: Minimizes cross-task interference in multi-task scenarios by leveraging orthogonality between adapter subspaces.
	* Continual Learning: Supports continual learning by using sparsity to mitigate catastrophic forgetting.
	* Universal Applicability: Evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks.
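
To make the parameter savings concrete, here is a back-of-the-envelope count for a single adapted weight matrix. It is a sketch under stated assumptions, not the paper's own accounting: the 4096 x 4096 shape (matching the hidden size of Llama-3-8B) and the helper name `trainable_params` are chosen for illustration, and the 90% sparsity is the typical LoRI-S value mentioned later in this card.

```python
def trainable_params(d_in: int, d_out: int, r: int, sparsity: float = 0.0) -> dict:
    """Count trainable adapter parameters for one weight matrix (illustrative)."""
    lora = r * (d_in + d_out)                  # LoRA trains A (r x d_in) and B (d_out x r)
    lori_d = r * d_out                         # LoRI-D: A is frozen, B is dense
    lori_s = round(lori_d * (1.0 - sparsity))  # LoRI-S: only unmasked entries of B
    return {"lora": lora, "lori_d": lori_d, "lori_s": lori_s}


# Assumed shape: a square 4096 x 4096 projection with rank 64.
counts = trainable_params(d_in=4096, d_out=4096, r=64, sparsity=0.9)
for name, n in counts.items():
    print(f"{name}: {n} ({n / counts['lora']:.1%} of LoRA)")
```

For a square matrix, freezing `A` halves the trainable count, and masking 90% of `B` leaves about 5% of LoRA's count, which is consistent with the "up to 95% fewer" figure. Non-square projections give different ratios.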

	## Model Details

	### Model Description

	The `LoRI-D_nlu_llama3_rank_64` model is a LoRI-D adapter (frozen random `A`, dense trainable `B`) for Natural Language Understanding (NLU) tasks, trained on top of the `meta-llama/Meta-Llama-3-8B` base model with rank 64. It belongs to the LoRI family of adapters, which aim to provide parameter-efficient fine-tuning with reduced cross-task interference.

	- Developed by: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
	- Model type: LoRI adapter (a LoRA-based PEFT method for LLMs)
	- Language(s) (NLP): English
	- License: Apache 2.0
	- Finetuned from model: `meta-llama/Meta-Llama-3-8B`

	### Model Sources

	- Repository: [https://github.com/juzhengz/LoRI/](https://github.com/juzhengz/LoRI/)
	- Paper: [https://arxiv.org/abs/2504.07448](https://arxiv.org/abs/2504.07448)
	- HuggingFace Collection: [https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011](https://huggingface.co/collections/tomg-group-umd/lori-adapters-67f795549d792613e1290011)

	## Uses

	### Direct Use

	This model is intended to be used as a PEFT adapter on top of the `meta-llama/Meta-Llama-3-8B` base model for natural language understanding tasks, leveraging its efficient design for reduced parameter overhead and improved multi-task performance.

	### Downstream Use

	LoRI adapters can be merged for multi-task applications or sequentially applied for continual learning without significant performance degradation. This makes LoRI suitable for building generalist agents or systems that need to learn new skills over time.
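
One intuition for why merged adapters interfere little is that independently drawn random directions in a high-dimensional space are nearly orthogonal. The sketch below illustrates only that geometric fact, using the standard library; it assumes Gaussian entries and a dimension of 4096, and it does not reproduce the repository's merging code. The names `random_direction` and `cosine` are hypothetical.

```python
import math
import random


def random_direction(dim: int, rng: random.Random) -> list:
    """Draw a Gaussian random vector, standing in for one row of a frozen A."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)  # fixed seed so the run is repeatable
dim = 4096              # assumed dimension, matching the Llama-3-8B hidden size
task_a = random_direction(dim, rng)  # direction used by one task's adapter
task_b = random_direction(dim, rng)  # direction used by another task's adapter
similarity = cosine(task_a, task_b)
print(f"cosine similarity: {similarity:+.4f}")
```

The printed value is expected to be close to zero, since the typical magnitude scales roughly as one over the square root of the dimension.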

	### Out-of-Scope Use

	This model is not intended for use in high-stakes or safety-critical applications without further rigorous testing and validation. Given its focus on NLU tasks, its performance on other domains or tasks without specific fine-tuning is not guaranteed.

	## Bias, Risks, and Limitations

	As with any language model, this model may inherit biases from the base model (`meta-llama/Meta-Llama-3-8B`) and from the datasets used for LoRI fine-tuning. Potential risks include generating biased, inaccurate, or harmful content.

	### Recommendations

	Users should carefully evaluate the model's output for their specific application and consider fine-tuning on domain-specific, curated data to mitigate potential biases or limitations.

	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import torch

	# Load the base model
	base_model = AutoModelForCausalLM.from_pretrained(
	    "meta-llama/Meta-Llama-3-8B",
	    torch_dtype=torch.bfloat16,  # or torch.float16, depending on your hardware
	    device_map="auto",
	)

	# Load the LoRI adapter
	adapter = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_nlu_llama3_rank_64")

	# Load the tokenizer
	tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

	# Example usage for a general text generation task (adjust for specific NLU use-cases)
	prompt = "The quick brown fox jumps over the lazy dog."
	inputs = tokenizer(prompt, return_tensors="pt").to(adapter.device)

	# Generate text
	outputs = adapter.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
	generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(generated_text)

	# Note: outputs[0] holds the prompt tokens followed by the generated continuation.
	# For a specific NLU task, format the prompt as that task expects and parse the answer from the continuation.
	```
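
For a multiple-choice NLU task, the prompt formatting and answer parsing mentioned in the comments above could look like the following sketch. The template, the helper names `build_prompt` and `parse_answer`, and the example question are all hypothetical; the prompt format actually used to train this adapter is defined in the repository and may differ.

```python
import re
from typing import List, Optional

LETTERS = "ABCDEFGH"


def build_prompt(question: str, choices: List[str]) -> str:
    """Format a multiple-choice question as plain text (hypothetical template)."""
    lines = [f"Question: {question}"]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer(generated: str, num_choices: int) -> Optional[str]:
    """Return the first standalone choice letter in the continuation, or None."""
    valid = LETTERS[:num_choices]
    match = re.search(rf"\b([{valid}])\b", generated)
    return match.group(1) if match else None


prompt = build_prompt("Which object is heaviest?", ["A feather", "A brick", "A leaf"])
print(prompt)

# Pass only the generated continuation, not the prompt, to the parser.
answer = parse_answer(" B. A brick", num_choices=3)
print(answer)
```

When used with the generation code above, only the continuation should be parsed, for example by decoding `outputs[0][inputs["input_ids"].shape[1]:]` instead of the full sequence. The parser is deliberately naive: it picks the first standalone capital letter, so a continuation that starts with a word such as "A" would be read as choice A.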

	## Training Details

	### Training Data

	The LoRI models are trained on various datasets depending on the task:
	- Natural Language Understanding (NLU): a collection of NLU datasets (the task this adapter was trained on).
	- Code generation: CodeAlpaca dataset.
	- Mathematical reasoning: GSM8K dataset.
	- Safety alignment: Saferpaca dataset.

	More details on specific datasets can be found in the [GitHub repository](https://github.com/juzhengz/LoRI/).

	### Training Procedure

	LoRI is implemented using Fully Sharded Data Parallel (FSDP) for multi-GPU training. The training involves two main stages:
	1. LoRI-D (Dense) training: Adapters are trained with random projection matrices `A` frozen and `B` matrices dense. Sparse masks are then extracted.
	2. LoRI-S (Sparse) training: Training continues with the extracted sparse masks applied to matrices `B`, typically at 90% sparsity.
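
The mask-extraction step between the two stages can be sketched as follows. This is a minimal illustration that assumes masks are chosen by keeping the largest-magnitude entries of the dense `B`; the card itself only says that masks are extracted, so the selection rule, the toy matrix, and the helper names `magnitude_mask` and `apply_mask` are assumptions rather than the repository's implementation.

```python
def magnitude_mask(b_matrix, sparsity):
    """Build a 0/1 mask that keeps the largest-magnitude entries of B.

    Assumed criterion: magnitude ranking over the whole matrix.
    Ties at the threshold are all kept, so slightly more entries may survive.
    """
    flat = sorted((abs(v) for row in b_matrix for v in row), reverse=True)
    keep = max(1, round(len(flat) * (1.0 - sparsity)))
    threshold = flat[keep - 1]
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in b_matrix]


def apply_mask(b_matrix, mask):
    """Zero out the masked entries of B (elementwise product)."""
    return [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(b_matrix, mask)]


# Toy dense B from a hypothetical LoRI-D run: 2 x 5, so 80% sparsity keeps 2 entries.
b = [
    [0.9, -0.05, 0.1, 0.02, -0.3],
    [0.01, 0.6, -0.07, 0.2, 0.04],
]
mask = magnitude_mask(b, sparsity=0.8)
sparse_b = apply_mask(b, mask)
print(mask)
```

In LoRI-S training, only the entries where the mask is 1 would continue to receive updates.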

	#### Training Hyperparameters

	- Training regime: not stated explicitly for this adapter; mixed precision (e.g., `bfloat16` for Llama-3) is typical for models of this size.
	- Adapter Rank (`r`): 64 (for this `LoRI-D_nlu_llama3_rank_64` model).
	- LoRA Alpha (`lora_alpha`): 128 (from `adapter_config.json`).
	- LoRA Dropout (`lora_dropout`): 0.05 (from `adapter_config.json`).
	- Target Modules (`target_modules`): `o_proj`, `k_proj`, `up_proj`, `q_proj`, `v_proj`, `down_proj`, `gate_proj` (from `adapter_config.json`).
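
The listed values can be collected into a small dictionary to see what they imply. This sketch assumes the standard LoRA convention in which the low-rank update is scaled by `lora_alpha / r`; the name `adapter_hparams` is hypothetical, and `adapter_config.json` in this repository remains the authoritative source.

```python
# Values copied from the hyperparameter list above.
adapter_hparams = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# Assumed standard LoRA scaling of the low-rank update: lora_alpha / r.
scaling = adapter_hparams["lora_alpha"] / adapter_hparams["r"]
print(scaling)  # 2.0
```

The target modules cover the four attention projections and the three MLP projections of each transformer block.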

	## Evaluation

	### Testing Data, Factors & Metrics

	LoRI's performance has been extensively evaluated across natural language understanding, mathematical reasoning, code generation (e.g., HumanEval), and safety alignment tasks.

	#### Metrics

	Performance is measured with task-appropriate metrics. The paper reports that LoRI outperforms full fine-tuning and existing PEFT methods across the evaluated tasks, while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results, please refer to the [paper](https://arxiv.org/abs/2504.07448).

	## Technical Specifications

	### Model Architecture and Objective

	LoRI introduces a novel architecture where projection matrices `A` in LoRA are frozen as random projections, and matrices `B` are sparsified using task-specific masks. This design is intended to achieve monosemantic experts, reduce trainable parameters, and minimize cross-task interference. The objective remains focused on improving performance on downstream tasks while promoting parameter efficiency and modularity.
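
Written out, the adapted layer described above can be sketched as follows. The notation is an assumption that follows the PEFT convention (`A` projects down, `B` projects up, and the update is scaled by $\alpha / r$); the paper may order or name the factors differently.

$$
h = W_0 x + \frac{\alpha}{r}\,(B \odot M)\,A\,x,
\qquad
A \in \mathbb{R}^{r \times d_{\text{in}}},\quad
B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
M \in \{0, 1\}^{d_{\text{out}} \times r}
$$

Here $W_0$ is the frozen base weight, $A$ is fixed at its random initialization, $B$ is trained, and $M$ is the task-specific mask. For a LoRI-D adapter such as this one, $M$ is all ones, so $B$ is dense.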

	### Compute Infrastructure

	#### Hardware

	Training was performed in a multi-GPU environment using Fully Sharded Data Parallel (FSDP).

	#### Software

	The implementation uses Python, PyTorch, and the Hugging Face `transformers` and `peft` libraries.

	## Acknowledgements

	This project builds on the codebase of [dpo-rlaif](https://github.com/architsharma97/dpo-rlaif) and incorporates code from [lottery-ticket-adaptation](https://github.com/kiddyboots216/lottery-ticket-adaptation). Code generation performance on HumanEval is evaluated using the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

	## Citation

	If you use LoRI in your work, please cite:

	```bibtex
	@article{zhang2025lori,
	title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
	author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
	journal={arXiv preprint arXiv:2504.07448},
	year={2025}
	}
	```

	## Model Card Contact

	For questions or inquiries, please refer to the contact information provided in the original [repository](https://github.com/juzhengz/LoRI/).

	### Framework versions

	- PEFT 0.12.0