---
license: apache-2.0
language:
- en
- fr
- ja
- fi
- id
- ru
- ar
- it
- uk
- es
- pt
- ko
- 'no'
- vi
- tr
- da
- ca
- zh
- nl
- et
metrics:
- code_eval
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
pipeline_tag: text-generation
quantized: true
tags:
- code
- causal-lm
- text-generation
- chatbot
- qwen
- deepseek
- lora
- 4bit
- bitsandbytes
datasets:
- lewishamilton21/LLM_Multilingual_dataset
---

# Qwen_1.5B_multilingual_Fine-Tuned_LLM — LoRA 4-bit Fine-Tuned Model

This is a conversational language model based on [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), fine-tuned with [LoRA adapters](https://github.com/huggingface/peft) for efficient training and inference. The model is loaded using **4-bit quantization (NF4)** through [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes), enabling memory-efficient inference on consumer-grade GPUs.

---

## Model Details

- **Base model**: `Qwen/Qwen2.5-1.5B-Instruct`
- **Fine-tuning technique**: LoRA (Low-Rank Adaptation)
- **Quantization**: 4-bit NF4 via BitsAndBytes
- **Framework**: Hugging Face Transformers + PEFT
- **Pipeline**: `text-generation`

---

## Intended Use

This model is designed for **multi-turn chatbot applications**, creative writing, instruction following, and general-purpose text generation tasks within responsible use guidelines.

---

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "lewishamilton21/Qwen_1.5B_multilingual_Fine-Tuned_LLM"

# 4-bit NF4 quantization for memory-efficient inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Hello, how are you today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Evaluation Metrics

| Setting           | Value (example) |
| :---------------- | :-------------- |
| Quantization type | 4-bit NF4       |
| LoRA rank         | 8 or 16         |
| Max length tested | 2048 tokens     |
| VRAM (A100 40 GB) | ~3.5 GB         |

*Custom benchmarks coming soon.*

---

## Training & Fine-Tuning

The model was fine-tuned with LoRA adapters using PEFT. The skeleton below shows the setup; an illustrative `LoraConfig` is sketched in the appendix at the end of this card.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, Trainer, TrainingArguments

base_model_id = "Qwen/Qwen2.5-1.5B-Instruct"

# Same 4-bit NF4 quantization config as in the usage example above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

# Load the base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training and attach LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(...))  # fill in LoRA hyperparameters

# Trainer setup
trainer = Trainer(
    model=model,
    args=TrainingArguments(...),  # fill in training arguments
    train_dataset=dataset,        # your tokenized training dataset
)
trainer.train()
```

---

## License

Apache 2.0 — free for research and commercial use within the license terms.

---

## Acknowledgements

* DeepSeek AI
* Hugging Face Transformers
* BitsAndBytes by Tim Dettmers
* Hugging Face PEFT

---
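## Appendix: Illustrative LoRA Configuration

The exact LoRA hyperparameters used for this fine-tune are not published in the card. The sketch below is only a plausible configuration consistent with the rank of 8 or 16 noted in the table above; in particular, the `target_modules` list (the usual Qwen2.5 attention and MLP projection layers) is an assumption, not a confirmed detail of this model. It can be passed to `get_peft_model` in the training snippet above in place of `LoraConfig(...)`.

```python
from peft import LoraConfig

# Illustrative only: these values are assumptions consistent with the card
# (LoRA rank 8 or 16), not the authors' published recipe.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections (assumed)
        "gate_proj", "up_proj", "down_proj",     # MLP projections (assumed)
    ],
)
```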