Committed by cihangxie and nielsr (HF Staff)
Commit 9d605f0 · verified · 1 parent: e390f03

Add/improve model card (#1)

- Add/improve model card (cb1ccb7c9c86e9d4836e723d08c93ba50909df3f)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1):
  1. README.md +53 -3
README.md CHANGED
@@ -1,3 +1,9 @@
+ ---
+ license: mit
+ library_name: transformers
+ pipeline_tag: question-answering
+ ---
+
  <div align="center">
  <h1>
  <b>m1</b>: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
@@ -9,13 +15,57 @@ A simple test-time scaling strategy, with minimal fine-tuning, can unlock strong
 
  ## ⚡ Introduction
 
+ ![](assets/teaser.png)
+
- Hi! Welcome to the huggingface repository for m1 (https://github.com/UCSC-VLAA/m1)!
+ Hi! Welcome to the repository for **m1** (📃 [Paper](https://arxiv.org/abs/2504.00869))!
 
  **m1** is a medical LLM designed to enhance reasoning through efficient test-time scaling. It enables lightweight models to match or exceed the performance of much larger counterparts by extending inference-time “thinking.” Unlike methods that rely on complex RL or expert supervision, m1 achieves strong results through:
 
- - **Fine-tuning on a small, high-quality set of verified medical reasoning examples**, showing that even with just 1K–23K examples, m1-7B *surpasses* models like HuatuoGPT-o1-7B and UltraMedical-8B, and m1-32B *rivals* 70B-scale models.
+ - **Fine-tuning on a small, high-quality set of verified medical reasoning examples**, showing that even with just 1K–23K examples, m1-7B *surpasses* previous SOTA models like HuatuoGPT-o1-7B and UltraMedical-8B, and m1-32B *rivals* 70B-scale models.
 
- - **Scaling reasoning at inference using token budgets**, which consistently improves performance across medical QA tasksup to an optimal ~4K token budget, beyond which performance may degrade due to overthinking.
+ - **Scaling reasoning at inference using token budgets**, which consistently improves performance across medical QA tasks up to an optimal ~4K token budget, beyond which performance may degrade due to overthinking (a minimal sketch follows this list).
 
  - **Identifying medical knowledge as the key bottleneck**, revealing that additional reasoning alone cannot overcome knowledge gaps; instead, improvements require better data quality and increased model capacity.
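To make the token-budget bullet concrete, here is a minimal sketch of budget-constrained generation with Hugging Face `transformers`. It is an illustration, not the repository's inference code: the two-stage loop and the `"Final answer:"` cue are assumptions (in the spirit of s1-style budget forcing); only the `UCSC-VLAA/m1-7B-23K` checkpoint name comes from the model table below.

```python
# A minimal sketch (not the official m1 inference code) of test-time scaling
# with a "thinking" token budget: let the model reason up to ~4K tokens, then
# force it to conclude. The "Final answer:" cue is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "UCSC-VLAA/m1-7B-23K"  # checkpoint name from the table below

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def answer_with_budget(question: str, thinking_budget: int = 4096) -> str:
    """Reason for at most `thinking_budget` new tokens, then force an answer."""
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Stage 1: capped reasoning. ~4K tokens is the sweet spot reported above;
    # larger budgets can degrade accuracy through overthinking.
    thinking = model.generate(**inputs, max_new_tokens=thinking_budget)

    # Stage 2: append an answer cue so the model commits to an answer even if
    # the budget cut it off mid-thought, then decode only the continuation.
    # (A real implementation would strip any end-of-turn token first.)
    cue = tokenizer("\nFinal answer:", return_tensors="pt").input_ids.to(model.device)
    cued = torch.cat([thinking, cue], dim=-1)
    output = model.generate(input_ids=cued, max_new_tokens=256)
    return tokenizer.decode(output[0, cued.shape[-1]:], skip_special_tokens=True)
```

The key design point is that the budget caps only the reasoning phase; a short second pass ensures a usable answer even when the budget truncates the chain of thought.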
+
+ We open-sourced our models, data, and code here.
+
+ ****************************************************************
+
+ **Updates:**
+
+ * 2025-03: We release our code, data, models, and paper!
+
+ ****************************************************************
+
+ ### 🌍 Environment
+
+ Please refer to [docs/ENV.md](docs/ENV.md).
+
+ ### 👨‍⚕️ Models and Data
+
+ | Model         | Backbone             | Training Data                                                    | Link                                                  |
+ | ------------- | -------------------- | ---------------------------------------------------------------- | ----------------------------------------------------- |
+ | **m1-32b-1k** | Qwen2.5-32B-Instruct | [m1k](https://huggingface.co/datasets/UCSC-VLAA/m1k-tokenized)   | [HF Link](https://huggingface.co/UCSC-VLAA/m1-32B-1K) |
+ | **m1-7b-1k**  | Qwen2.5-7B-Instruct  | [m1k](https://huggingface.co/datasets/UCSC-VLAA/m1k-tokenized)   | [HF Link](https://huggingface.co/UCSC-VLAA/m1-7B-1K)  |
+ | **m1-7b-23k** | Qwen2.5-7B-Instruct  | [m23k](https://huggingface.co/datasets/UCSC-VLAA/m23k-tokenized) | [HF Link](https://huggingface.co/UCSC-VLAA/m1-7B-23K) |
+
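For the checkpoints above, a hedged quickstart using the standard `transformers` chat pipeline; the example question and decoding settings are illustrative assumptions, not the repository's official snippet:

```python
# Illustrative quickstart (a sketch, not the repository's official example).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="UCSC-VLAA/m1-7B-1K",  # any checkpoint from the table above
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the first-line treatment for uncomplicated hypertension?"}]
# max_new_tokens ~4K mirrors the optimal thinking budget noted above.
result = generator(messages, max_new_tokens=4096)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```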
+ ### 🏃 Inference
+
+ (... same content as original README ...)
+
+ ### 📖 Citation
+
+ ```
+ @misc{huang2025m1UnleashPotential,
+       title={m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models},
+       author={Xiaoke Huang and Juncheng Wu and Hui Liu and Xianfeng Tang and Yuyin Zhou},
+       year={2025},
+       eprint={2504.00869},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2504.00869},
+ }
+ ```