Update README.md
README.md
@@ -74,7 +74,8 @@ zjunlp/KnowRL-Train-Data` dataset.
 * **Stage 1: Cold-Start SFT**: The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
 * **Stage 2: Knowledgeable RL**: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base. This stage uses the `knowrl_RLdata.json` and `KnowRL_RLtrain_data_withknowledge.json` files.
 
-For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL
+For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL
+).
 
 ---
 
@@ -82,7 +83,7 @@ For complete details on the training configuration and hyperparameters, please r
 If you find this model useful in your research, please consider citing our paper:
 ```bibtex
 @article{ren2025knowrl,
-title={
+title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
 author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
 journal={arXiv preprint arXiv:2506.19807},
 year={2025}
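The Stage 2 description in this README combines a correctness reward with a factuality reward and optimizes the result with GRPO. The minimal Python sketch below shows how such a combined reward can feed GRPO's group-relative advantages. It is an illustration under stated assumptions, not KnowRL's implementation. The exact-match correctness check, the hypothetical `factuality_reward` (which only counts reasoning claims that appear verbatim in a toy knowledge-base set, standing in for real verification against an external knowledge base), and the weight `w_fact=0.5` are all placeholders. The last function implements the standard GRPO step, which standardizes each sampled response's reward against the other responses drawn for the same prompt.

```python
from statistics import mean, pstdev


def correctness_reward(answer: str, gold: str) -> float:
    """Assumed correctness check: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0


def factuality_reward(claims: list[str], knowledge_base: set[str]) -> float:
    """Hypothetical factuality score: fraction of reasoning claims found in the KB.

    Exact string membership is a stand-in; the real verification against an
    external knowledge base is not reproduced here.
    """
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if claim in knowledge_base)
    return supported / len(claims)


def combined_reward(answer: str, gold: str, claims: list[str],
                    knowledge_base: set[str], w_fact: float = 0.5) -> float:
    """Correctness plus a weighted factuality term (w_fact is an assumed value)."""
    return (correctness_reward(answer, gold)
            + w_fact * factuality_reward(claims, knowledge_base))


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample in the group scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy knowledge base and four sampled responses for a single prompt.
    kb = {"Paris is the capital of France", "The Seine flows through Paris"}
    samples = [
        ("Paris", ["Paris is the capital of France"]),
        ("Paris", ["Paris is the capital of France", "Paris has 50 million residents"]),
        ("Lyon", ["Lyon is the capital of France"]),
        ("Lyon", []),
    ]
    rewards = [combined_reward(ans, "Paris", claims, kb) for ans, claims in samples]
    print(rewards)  # [1.5, 1.25, 0.0, 0.0]
    print(group_relative_advantages(rewards))
```

Under these assumptions, the second response is correct but one of its two claims is unsupported. It therefore earns a smaller advantage than the fully grounded first response, which is the kind of pressure a factuality term is meant to add on top of answer correctness.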