zjunlp/KnowRL-Skywork-OR1-7B-Preview

BaochangRen committed on Jul 1

Commit c36ad99 (verified) · 1 Parent(s): 9495b39

Update README.md

Files changed (1)

README.md +3 -2

@@ -74,7 +74,8 @@ zjunlp/KnowRL-Train-Data` dataset.
 * **Stage 1: Cold-Start SFT**: The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
 * **Stage 2: Knowledgeable RL**: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base. This stage uses the `knowrl_RLdata.json` and `KnowRL_RLtrain_data_withknowledge.json` files.
-For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).
+For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL
+).
 ---
@@ -82,7 +83,7 @@ For complete details on the training configuration and hyperparameters, please r
 If you find this model useful in your research, please consider citing our paper:
 ```bibtex
 @article{ren2025knowrl,
-  title={{KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality}},
+  title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
   author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
   journal={arXiv preprint arXiv:2506.19807},
   year={2025}