Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)

This is a LoRA adapter for the microsoft/Phi-3.5-mini-instruct model, fine-tuned with QLoRA on medical instruction-following datasets. This is NOT a standalone model; you must load it on top of the base model.

🔥 How to Use the LoRA Adapter

To use this adapter, you need the base model microsoft/Phi-3.5-mini-instruct together with the peft library.
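
The script below relies on transformers, peft, bitsandbytes (for 4-bit quantization), and accelerate (required for device_map="auto"). A typical install:

pip install transformers peft bitsandbytes accelerate

With those installed, load the base model in 4-bit and attach the adapter: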

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Base model and the fine-tuned LoRA adapter checkpoint
base_model_name = "microsoft/Phi-3.5-mini-instruct"
lora_model_path = "syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load model with proper 4-bit quantization settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,  
    bnb_4bit_quant_type="nf4"
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# Attach the LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base_model, lora_model_path)

# Optionally merge the adapter into the base weights for faster inference.
# Merging into a 4-bit base re-quantizes the merged weights, so peft may warn
# about small rounding differences in generations.
model = model.merge_and_unload()
model.eval()
# device_map="auto" already placed the model on the GPU; calling .to(device)
# on a bitsandbytes-quantized model is not supported and would raise an error.

print("Model successfully loaded!")

# Inference function
def generate_response(user_query, system_message=None, max_new_tokens=1024):
    if system_message is None:
        system_message = ("You are a trusted AI-powered medical assistant. "
                          "Analyze patient queries carefully and provide accurate, professional, and empathetic responses. "
                          "Prioritize patient safety, adhere to medical best practices, and recommend consulting a healthcare provider when necessary.")

    # Build the prompt in the Phi-3.5 chat format
    prompt = f"<|system|>\n{system_message}<|end|>\n<|user|>\n{user_query}<|end|>\n<|assistant|>\n"

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, excluding the prompt
    # (skip_special_tokens=True strips <|assistant|> and <|end|>, so splitting
    # on those markers would not work)
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

if __name__ == "__main__":
    res = generate_response("Hi, how can someone get rid of a fever?")
    print(res)
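
As an alternative to hand-building the prompt string, the tokenizer's built-in chat template can render the Phi-3.5 format for you. A minimal sketch, reusing the model and tokenizer loaded above (the function name here is illustrative, not part of the original script):

def generate_response_templated(user_query, system_message, max_new_tokens=1024):
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_query},
    ]
    # Render the messages with the model's own chat template and append
    # the assistant generation prompt
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(device)
    outputs = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the tokens generated after the prompt
    return tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True).strip()

Using the chat template avoids subtle formatting mismatches (spaces vs. newlines around the special tokens) between inference prompts and the format the model was trained on.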

💡 Training Details

  • Base Model: microsoft/Phi-3.5-mini-instruct
  • Fine-Tuned On: Medical conversations & instruction-based datasets
  • Fine-Tuning Method: QLoRA (a configuration sketch follows after this list)
  • Precision: 4-bit (bitsandbytes)
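
The exact training hyperparameters are not published with this adapter. For orientation only, a typical QLoRA setup for this base model looks like the sketch below; the LoRA rank, alpha, dropout, and target module names are illustrative assumptions, not the recorded values for this adapter:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, mirroring the inference config above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Cast norms to fp32 and prepare the quantized model for gradient updates
model = prepare_model_for_kbit_training(model)

# Hypothetical LoRA hyperparameters, for illustration only
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3.5 attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

Training itself would then proceed with a standard supervised fine-tuning loop (e.g. a transformers Trainer or trl SFTTrainer) over the medical instruction data.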

📌 License & Credits

  • This adapter is released under the Apache-2.0 License.
  • Credits: syubraj for fine-tuning.

🚀 Citation

If you use this model, please cite:

@misc{syubraj2024phi3.5medical,
  title={Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)},
  author={syubraj},
  year={2024},
  url={https://huggingface.co/syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA}
}