Commit 5ae511a (verified), committed by Chancy
1 parent: c5763ac

Update README.md

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -33,6 +33,10 @@ base_model:
  Polaris is an open-source post-training method that uses reinforcement learning (RL) scaling to refine and enhance models with advanced reasoning abilities. Our research shows that even top-tier models like Qwen3-4B can achieve significant improvements on challenging reasoning tasks when optimized with Polaris.
  By leveraging open-source data and academic-level resources, Polaris pushes the capabilities of open-recipe reasoning models to unprecedented heights. In benchmark tests, our method even surpasses top commercial systems, including Claude-4-Opus, Grok-3-Beta, and o3-mini-high (2025/01/03).
 
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/ChenxinAn-fdu/POLARIS/main/figs/aime25.png" alt="performance" style="width:60%;">
+ </div>
+
  ## Polaris's Recipe
  - **Data Difficulty:** Before training, Polaris analyzes and maps the distribution of data difficulty. The dataset should not be overwhelmed by either overly difficult or trivially easy problems. We recommend using a data distribution with a slight bias toward challenging problems, which typically exhibits a mirrored J-shaped distribution.
  - **Diversity-Based Rollout:** We leverage the *diversity among rollouts* to initialize the sampling temperature, which is then progressively increased throughout the RL training stages.
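
The data-difficulty step in the README excerpt above can be illustrated with a minimal sketch. It assumes difficulty is estimated as the pass rate over several sampled rollouts per problem, which this diff does not confirm. The function names, the bin count, and the sample data are all hypothetical and are not taken from the Polaris codebase.

```python
def pass_rate(outcomes):
    """Fraction of sampled rollouts that solved the problem."""
    return sum(outcomes) / len(outcomes)


def difficulty_histogram(rollout_outcomes, num_bins=4):
    """Count problems per pass-rate bin; bin 0 holds the hardest problems."""
    counts = [0] * num_bins
    for outcomes in rollout_outcomes.values():
        rate = pass_rate(outcomes)
        # Clamp so a pass rate of exactly 1.0 falls in the last bin.
        index = min(int(rate * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical data: problem id -> per-rollout correctness (True = solved).
outcomes = {
    "p1": [False, False, False, False],
    "p2": [False, False, False, True],
    "p3": [False, False, True, False],
    "p4": [True, False, True, False],
    "p5": [True, True, True, True],
}

print(difficulty_histogram(outcomes))  # [1, 2, 1, 1]
```

A histogram with more mass in the low pass-rate bins than in the high ones would correspond to the slight bias toward challenging problems that the README recommends. Large counts in the first or last bin alone would suggest the dataset is dominated by overly difficult or trivially easy problems.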