Update README.md
README.md
CHANGED
@@ -6,19 +6,28 @@ language:
 base_model:
 - Qwen/Qwen2.5-3B-Instruct
 ---
+
 # Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)
 
 > **Author · Seungyoun Shin**
 > 🤗 Model Hub: [https://huggingface.co/Seungyoun/qwen2.5-3b-it\_searchR1-like-multiturn](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
+> 📈 W\&B Report: [https://wandb.ai/yoon1001/search\_r1\_like\_async\_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)
 
-A faithful re‑implementation of the *Search‑R1* on **Qwen 2.5‑3B**, trained purely on the Wikipedia‑based Search‑R1 corpus
+A faithful re‑implementation of the *Search‑R1* retrieval‑augmented QA agent on **Qwen 2.5‑3B**, trained purely on the Wikipedia‑based Search‑R1 corpus with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
+During inference we replace the original SGLang runtime with a compact DuckDuckGo‑powered tool loop implemented in a single script.
 
 ---
 
 ## 🚀 Quick start
 
+```bash
+pip install "transformers>=4.41" torch duckduckgo_search>=6.3.5 accelerate
+
+# one‑command demo
+python search_r1_infer.py "현재 대한민국 대통령은 누구야?"
+```
 
-### Full inference script
+### Full inference script
 
 Below is the exact script used in our experiments—drop it next to the model weights and run.
 
@@ -132,7 +141,7 @@ if __name__ == "__main__":
 
 ---
 
-## 📊 Evaluation (
+## 📊 Evaluation (Exact Match)
 
 | Dataset | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
 | --------- | ------------------------------- | ------------- |
@@ -144,6 +153,7 @@ if __name__ == "__main__":
 | Musique | **0.124** | 0.111 |
 | Bamboogle | 0.232 | **0.296** |
 
+*Metrics computed with the official Search‑R1 evaluation scripts.*
 
 ---
 
@@ -168,4 +178,4 @@ Code is released under **MIT**; model weights under the original **Qwen open‑s
 howpublished = {HuggingFace Model Hub},
 url = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
 }
-```
+```
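
The evaluation heading in this change now names the metric as Exact Match, and the added footnote says the numbers come from the official Search‑R1 evaluation scripts. Those scripts are not part of this diff, so the following is only a minimal sketch of how an Exact Match score is commonly computed. It assumes SQuAD‑style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace), and the function names `normalize_answer`, `exact_match`, and `em_score` are hypothetical, chosen for illustration. The official scripts may differ in detail.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gold) for gold in gold_answers))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Mean exact match over a dataset (assumed aggregation, one score per question)."""
    if not predictions:
        return 0.0
    total = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return total / len(predictions)
```

Under these assumptions, a table entry such as 0.296 for Bamboogle would mean that about 29.6% of predictions matched a gold answer after normalization.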