---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- grpo
- deepseek
license: apache-2.0
language:
- en
datasets:
- gretelai/symptom_to_diagnosis
---

A Qwen2.5 3-billion-parameter model trained with GRPO to "think" like DeepSeek's R1, so that it can deduce a disease from a patient's complaints in one shot! It is a tiny but really impressive model. Training it to think and reason has also resulted in a significant boost in the general Elo of the model.

# Uploaded model

- **Developed by:** dumbequation
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
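
For readers new to GRPO, the snippet below is a minimal sketch of how a reward signal for a symptom-to-diagnosis task could be scored and turned into group-relative advantages. It is not the training code used for this model. The `<think>`/`<answer>` output format, the reward weights, and every function name here are assumptions made for illustration, and the sample complaint and label are made up.

```python
import re
import statistics

# ASSUMPTION: completions are expected to look like
#   <think> ...reasoning... </think> <answer> ...diagnosis... </answer>
# The real training format for this model is not documented in the card.
PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag format, else 0.0."""
    return 1.0 if PATTERN.match(completion) else 0.0


def extract_answer(completion: str):
    """Return the normalized text inside <answer>, or None if the format is off."""
    match = PATTERN.match(completion)
    return match.group(2).strip().lower() if match else None


def correctness_reward(completion: str, gold: str) -> float:
    """Return 2.0 (hypothetical weight) if the answer equals the gold label."""
    return 2.0 if extract_answer(completion) == gold.strip().lower() else 0.0


def total_reward(completion: str, gold: str) -> float:
    """Sum of the format and correctness rewards for one completion."""
    return format_reward(completion) + correctness_reward(completion, gold)


def group_advantages(rewards, eps: float = 1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    GRPO scores each sample relative to its group: (r - mean) / std.
    Population std is used here; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    gold = "psoriasis"  # illustrative label
    group = [
        "<think>Itchy, scaly patches on the skin.</think><answer>Psoriasis</answer>",
        "<think>Could be an allergic reaction.</think><answer>Allergy</answer>",
        "Psoriasis",  # right label, but missing the assumed tags
    ]
    rewards = [total_reward(c, gold) for c in group]
    print(rewards)
    print(group_advantages(rewards))
```

Splitting the reward into a format term and a correctness term means a completion that reasons in the expected structure but picks the wrong diagnosis still scores above one that ignores the structure entirely.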