Base Model: Qwen/DeepSeek-R1-Distill-Qwen-7B
Training Epochs: 3
Training Objective: RL only
Training Data: ReasoningEval/Huatuo-RL