Base Model: Qwen/DeepSeek-R1-Distill-Qwen-7B
Training Epochs: 3
Training Objective: RL only
Training Data: ReasoningEval/Huatuo-RL