Commit c36ad99 by BaochangRen · verified · 1 parent: 9495b39

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -74,7 +74,8 @@ zjunlp/KnowRL-Train-Data` dataset.
  * **Stage 1: Cold-Start SFT**: The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
  * **Stage 2: Knowledgeable RL**: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base. This stage uses the `knowrl_RLdata.json` and `KnowRL_RLtrain_data_withknowledge.json` files.

- For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).
+ For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL
+ ).

  ---

@@ -82,7 +83,7 @@ For complete details on the training configuration and hyperparameters, please r
  If you find this model useful in your research, please consider citing our paper:
  ```bibtex
  @article{ren2025knowrl,
- title={{KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality}},
+ title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}