A Qwen2.5 3B-parameter model trained with GRPO to "think" like DeepSeek's R1, so that it can deduce a disease from a patient's complaints in one shot!

A tiny but impressive model. Training it to think and reason has also significantly boosted the model's general Elo rating.
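GRPO (Group Relative Policy Optimization) drops the learned value model used by PPO. It samples a group of completions for the same prompt and scores each one relative to the rest of its group. The sketch below shows only that advantage step, in plain Python. It is an illustration, not this model's training code. It uses the population standard deviation and a small epsilon; real implementations may differ on both details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Compute GRPO-style advantages for one group of completions.

    All rewards belong to completions sampled for the SAME prompt
    (e.g. one patient's complaints). Each advantage is the reward minus
    the group mean, divided by the group standard deviation. No learned
    critic is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption here)
    # eps prevents division by zero when every completion got the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled diagnoses for one prompt: two correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get roughly +1, wrong ones roughly -1
```

When every completion in a group earns the same reward, all advantages are zero. That group then contributes no learning signal, which is why GRPO relies on sampling several diverse completions per prompt.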

Uploaded model

  • Developed by: dumbequation
  • License: apache-2.0
  • Finetuned from model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

This Qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
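TRL's `GRPOTrainer` trains against reward functions, not a learned reward model. Each function receives the sampled completions, with extra dataset columns passed as keyword arguments, and returns one float per completion. The card does not publish the reward functions used here. The sketch below is therefore hypothetical. The `<reasoning>`/`<answer>` output format, the `disease` column name, and the 0.5/1.0 reward weights are all assumptions chosen for illustration. The code uses only the standard library and handles plain-string completions.

```python
import re

# HYPOTHETICAL output format: free-form reasoning, then the final diagnosis.
ANSWER_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>", re.DOTALL)


def diagnosis_reward(completions: list[str], disease: list[str], **kwargs) -> list[float]:
    """Score each completion for format and diagnostic correctness.

    `disease` is a HYPOTHETICAL dataset column holding the ground-truth
    diagnosis for each prompt. Extra columns arrive via **kwargs and are ignored.
    """
    rewards = []
    for completion, label in zip(completions, disease):
        match = ANSWER_RE.search(completion)
        if match is None:
            rewards.append(0.0)  # malformed output: no credit at all
            continue
        score = 0.5  # reward for following the reasoning/answer format
        if match.group(1).strip().lower() == label.strip().lower():
            score += 1.0  # reward for naming the correct disease
        rewards.append(score)
    return rewards


print(diagnosis_reward(
    ["<reasoning>fever, cough, aches</reasoning><answer>Influenza</answer>",
     "<reasoning>headache</reasoning>\n<answer>Migraine</answer>",
     "Influenza"],
    disease=["influenza", "influenza", "influenza"],
))
```

Splitting the score into a format component and a correctness component is a common GRPO pattern. The format reward teaches the model to emit its reasoning before committing to an answer. The correctness reward supplies the actual diagnostic signal.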
