Update README.md
README.md
CHANGED
@@ -220,6 +220,14 @@ print(f"The better response is response{max(set(res), key=res.count)} in {k} vot
 Tips: To accelerate inference, GRAM-R^2 can be run with [vLLM](https://github.com/vllm-project/vllm) using multiple processes and threads. We also provide this script as a reference implementation at [this](https://github.com/wangclnlp/GRAM/tree/main/extensions/GRAM-RR).
 
 ### Citation
 ```
-
+@misc{wang2025gramr2,
+      title={GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning},
+      author={Chenglong Wang and Yongyu Mu and Hang Zhou and Yifu Huo and Ziming Zhu and Jiali Zeng and Murun Yang and Bei Li and Tong Xiao and Xiaoyang Hao and Chunliang Zhang and Fandong Meng and Jingbo Zhu},
+      year={2025},
+      eprint={2509.02492},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2509.02492},
+}
 ```
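The hunk context above references a majority-vote aggregation over `k` sampled judgments, `max(set(res), key=res.count)`. A minimal self-contained sketch of that voting step is below; the vote list `res` is hypothetical data, and `majority_vote` is an illustrative helper, not part of the GRAM-RR scripts:

```python
from collections import Counter

def majority_vote(res):
    """Return the response index preferred by the most votes.

    `res` is a list of per-vote choices, e.g. [1, 2, 1] means two
    votes for response 1 and one for response 2.
    """
    # Equivalent to max(set(res), key=res.count), but O(n) via Counter.
    return Counter(res).most_common(1)[0][0]

res = [1, 2, 1, 1, 2]  # hypothetical votes from k = 5 sampled judgments
k = len(res)
print(f"The better response is response{majority_vote(res)} in {k} votes.")
```

With multiple vLLM worker processes, each worker would contribute one judgment to `res` before this aggregation runs.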