---
license: apache-2.0
language:
- en
- fr
- ja
- fi
- id
- ru
- ar
- it
- uk
- es
- pt
- ko
- 'no'
- vi
- tr
- da
- ca
- zh
- nl
- et
metrics:
- code_eval
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
pipeline_tag: text-generation
quantized: true
tags:
- code
- causal-lm
- text-generation
- chatbot
- qwen
- deepseek
- lora
- 4bit
- bitsandbytes
datasets:
- lewishamilton21/LLM_Multilingual_dataset
---

# Qwen_1.5B_multilingual_Fine-Tuned_LLM — LoRA 4-bit Fine-Tuned Model

This is a conversational language model based on [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), fine-tuned with [LoRA adapters](https://github.com/huggingface/peft) for efficient training and inference. The model is loaded using **4-bit quantization (NF4)** through [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes), enabling memory-efficient inference on consumer-grade GPUs.

---

## Model Details

- **Base model**: `Qwen/Qwen2.5-1.5B-Instruct`
- **Fine-tuning technique**: LoRA (Low-Rank Adaptation)
- **Quantization**: 4-bit NF4 via BitsAndBytes
- **Framework**: Hugging Face Transformers + PEFT
- **Pipeline**: `text-generation`

---

## Intended Use

This model is designed for **multi-turn chatbot applications**, creative writing, instruction following, and general-purpose text generation tasks within responsible use guidelines.

---

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "lewishamilton21/Qwen_1.5B_multilingual_Fine-Tuned_LLM"

# 4-bit NF4 quantization for memory-efficient inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Hello, how are you today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Evaluation Metrics

| Setting           | Value (example) |
| :---------------- | :-------------- |
| Quantization type | 4-bit NF4       |
| LoRA rank         | 8 or 16         |
| Max length tested | 2048 tokens     |
| VRAM (A100 40 GB) | ~3.5 GB         |

*Custom benchmarks coming soon.*

---

## Training & Fine-Tuning

The model was fine-tuned with LoRA adapters using PEFT. The skeleton below shows the setup; an illustrative `LoraConfig` is sketched in the appendix at the end of this card.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, Trainer, TrainingArguments

base_model_id = "Qwen/Qwen2.5-1.5B-Instruct"

# Same 4-bit NF4 quantization config as in the usage example above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

# Load the base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training and attach LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(...))  # fill in LoRA hyperparameters

# Trainer setup
trainer = Trainer(
    model=model,
    args=TrainingArguments(...),  # fill in training arguments
    train_dataset=dataset,        # your tokenized training dataset
)
trainer.train()
```

---

## License

Apache 2.0 — free for research and commercial use within the license terms.

---

## Acknowledgements

* DeepSeek AI
* Hugging Face Transformers
* BitsAndBytes by Tim Dettmers
* Hugging Face PEFT

---
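## Appendix: Illustrative LoRA Configuration

The exact LoRA hyperparameters used for this fine-tune are not published in the card. The sketch below is only a plausible configuration consistent with the rank of 8 or 16 noted in the table above; in particular, the `target_modules` list (the usual Qwen2.5 attention and MLP projection layers) is an assumption, not a confirmed detail of this model. It can be passed to `get_peft_model` in the training snippet above in place of `LoraConfig(...)`.

```python
from peft import LoraConfig

# Illustrative only: these values are assumptions consistent with the card
# (LoRA rank 8 or 16), not the authors' published recipe.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections (assumed)
        "gate_proj", "up_proj", "down_proj",     # MLP projections (assumed)
    ],
)
```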