Add model card metadata and link to code (#1)
- Add model card metadata and link to code (5acf4fea4bf58c84cc44c7390251823af88bd7ef)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,3 +1,10 @@
+---
+license: mit
+library_name: transformers
+pipeline_tag: question-answering
+---
+
+```markdown
 <div align="center">
 <h1>
 <b>m1</b>: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
@@ -7,10 +14,13 @@ A simple test-time scaling strategy, with minimal fine-tuning, can unlock strong
 </p>
 </div>
 
-
+This repository contains the model presented in the paper [m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models](https://huggingface.co/papers/2504.00869).
 
+Code: https://github.com/UCSC-VLAA/m1
+
+## ⚡ Introduction
 
-Hi! Welcome to the huggingface repository for m1 (https://github.com/UCSC-VLAA/m1)
+Hi! Welcome to the huggingface repository for m1!
 
 **m1** is a medical LLM designed to enhance reasoning through efficient test-time scaling. It enables lightweight models to match or exceed the performance of much larger counterparts by extending inference-time “thinking.” Unlike methods that rely on complex RL or expert supervision, m1 achieves strong results through:
 
@@ -19,3 +29,4 @@ Hi! Welcome to the huggingface repository for m1 (https://github.com/UCSC-VLAA/m1)
 - **Scaling reasoning at inference using token budgets**, which consistently improves performance across medical QA tasks—up to an optimal ~4K token budget, beyond which performance may degrade due to overthinking.
 
 - **Identifying medical knowledge as the key bottleneck**, revealing that additional reasoning alone cannot overcome knowledge gaps; instead, improvements require better data quality and increased model capacity.
+```
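
For readers of this commit, a minimal usage sketch of the metadata being added (`library_name: transformers`) together with the ~4K-token reasoning budget the README describes. It assumes the checkpoint loads through transformers' standard causal-LM classes and ships a chat template; the repo id, prompt, and sampling settings below are placeholders, not taken from the diff.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCSC-VLAA/m1"  # placeholder repo id; substitute the actual model repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A medical QA prompt in chat format (assumes the tokenizer ships a chat template).
question = (
    "A 54-year-old man presents with crushing chest pain radiating to the left arm. "
    "What is the most appropriate initial diagnostic test?"
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Test-time scaling knob: a larger generation budget gives the model more room to
# "think"; the README reports gains up to roughly 4K tokens, then degradation.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Here the budget is enforced with a plain `max_new_tokens` cap; the paper's actual test-time scaling procedure may differ, so treat this as an assumption rather than the authors' method.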
|