Polaris is an open-source post-training method that uses reinforcement learning (RL) scaling to refine and enhance models with advanced reasoning abilities. Our research shows that even top-tier models like Qwen3-4B can achieve significant improvements on challenging reasoning tasks when optimized with Polaris.

By leveraging open-source data and academic-level resources, Polaris pushes the capabilities of open-recipe reasoning models to unprecedented heights. In benchmark tests, our method even surpasses top commercial systems, including Claude-4-Opus, Grok-3-Beta, and o3-mini-high (2025/01/03).

<div align="center">
<img src="https://raw.githubusercontent.com/ChenxinAn-fdu/POLARIS/main/figs/aime25.png" alt="performance" style="width:60%;">
</div>

## Polaris's Recipe
- **Data Difficulty:** Before training, Polaris analyzes and maps the distribution of data difficulty. The dataset should not be overwhelmed by either overly difficult or trivially easy problems. We recommend using a data distribution with a slight bias toward challenging problems, which typically exhibits a mirrored J-shaped distribution (a filtering sketch follows this list).
- **Diversity-Based Rollout:** We leverage the *diversity among rollouts* to initialize the sampling temperature, which is then progressively increased throughout the RL training stages (a temperature-initialization sketch follows this list).
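
To make the **Data Difficulty** point concrete, here is a minimal Python sketch of pre-training difficulty filtering. It assumes difficulty is measured as the base model's pass rate over a few sampled rollouts; the function names, thresholds, and data layout are illustrative assumptions, not the official POLARIS implementation.

```python
# A minimal sketch of difficulty filtering, assuming `model` is a callable
# that returns one sampled final answer per call. Cutoffs are illustrative.
def estimate_pass_rate(model, question, reference, n_rollouts=8):
    """Fraction of sampled rollouts whose final answer matches the reference."""
    hits = sum(model(question) == reference for _ in range(n_rollouts))
    return hits / n_rollouts

def filter_by_difficulty(dataset, model, easy_cut=0.9, hard_cut=0.0):
    """Drop problems the model always solves (trivial) or never solves
    (currently unlearnable). What remains skews toward low pass rates,
    i.e. the mirrored J-shaped difficulty distribution described above."""
    kept = []
    for ex in dataset:  # each ex: {"question": ..., "answer": ...}
        p = estimate_pass_rate(model, ex["question"], ex["answer"])
        if hard_cut < p < easy_cut:  # keep only informative problems
            kept.append({**ex, "pass_rate": p})
    return kept
```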
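For the **Diversity-Based Rollout** point, the sketch below shows one way the idea could work: measure rollout diversity at several candidate temperatures and pick the lowest one that reaches a target diversity, then raise the temperature at later RL stages. The diversity metric (distinct final answers among rollouts), the candidate grid, and the target value are assumptions for illustration only.

```python
# A hedged sketch of diversity-based temperature initialization; the metric
# and target below are assumptions, not the paper's exact procedure.
def rollout_diversity(model, question, temperature, n_rollouts=16):
    """Diversity = fraction of distinct final answers among the rollouts."""
    answers = [model(question, temperature=temperature) for _ in range(n_rollouts)]
    return len(set(answers)) / n_rollouts

def init_temperature(model, probe_questions,
                     candidates=(0.6, 0.8, 1.0, 1.2, 1.4),
                     target_diversity=0.6):
    """Pick the lowest candidate temperature whose average rollout
    diversity over a small probe set reaches the target."""
    for t in candidates:  # candidates assumed sorted ascending
        avg = sum(rollout_diversity(model, q, t)
                  for q in probe_questions) / len(probe_questions)
        if avg >= target_diversity:
            return t
    return candidates[-1]

# The chosen value serves as the stage-1 temperature; later RL stages then
# increase it progressively (e.g. t_stage = t0 + 0.1 * stage_index).
```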