Update README.md
README.md
CHANGED
@@ -220,6 +220,14 @@ print(f"The better response is response{max(set(res), key=res.count)} in {k} vot
 Tips: To accelerate inference, GRAM-R^2 can be run with [vLLM](https://github.com/vllm-project/vllm) using multiple processes and threads. We also provide this script as a reference implementation at [this](https://github.com/wangclnlp/GRAM/tree/main/extensions/GRAM-RR).
 
 ### Citation
 ```
-
+@misc{wang2025gramr2,
+      title={GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning},
+      author={Chenglong Wang and Yongyu Mu and Hang Zhou and Yifu Huo and Ziming Zhu and Jiali Zeng and Murun Yang and Bei Li and Tong Xiao and Xiaoyang Hao and Chunliang Zhang and Fandong Meng and Jingbo Zhu},
+      year={2025},
+      eprint={2509.02492},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2509.02492},
+}
 ```
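The hunk context above references a majority-vote aggregation over `k` sampled judgments, `max(set(res), key=res.count)`. A minimal self-contained sketch of that voting step is below; the vote list `res` is hypothetical data, and `majority_vote` is an illustrative helper, not part of the GRAM-RR scripts:

```python
from collections import Counter

def majority_vote(res):
    """Return the response index preferred by the most votes.

    `res` is a list of per-vote choices, e.g. [1, 2, 1] means two
    votes for response 1 and one for response 2.
    """
    # Equivalent to max(set(res), key=res.count), but O(n) via Counter.
    return Counter(res).most_common(1)[0][0]

res = [1, 2, 1, 1, 2]  # hypothetical votes from k = 5 sampled judgments
k = len(res)
print(f"The better response is response{majority_vote(res)} in {k} votes.")
```

With multiple vLLM worker processes, each worker would contribute one judgment to `res` before this aggregation runs.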