Seungyoun commited on
Commit
b3d0c6e
·
verified ·
1 Parent(s): 5b965c6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +14 -4
README.md CHANGED
@@ -6,19 +6,28 @@ language:
6
  base_model:
7
  - Qwen/Qwen2.5-3B-Instruct
8
  ---
 
9
  # Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)
10
 
11
  > **Author · Seungyoun Shin**
12
  > 🤗 Model Hub: [https://huggingface.co/Seungyoun/qwen2.5-3b-it\_searchR1-like-multiturn](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
 
13
 
14
- A faithful re‑implementation of the *Search‑R1* on **Qwen 2.5‑3B**, trained purely on the Wikipedia‑based Search‑R1 corpus (HotpotQA train split) with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
 
15
 
16
  ---
17
 
18
  ## 🚀 Quick start
19
 
 
 
 
 
 
 
20
 
21
- ### Full inference script (Using duck-duck-go)
22
 
23
  Below is the exact script used in our experiments—drop it next to the model weights and run.
24
 
@@ -132,7 +141,7 @@ if __name__ == "__main__":
132
 
133
  ---
134
 
135
- ## 📊 Evaluation (Pass@1)
136
 
137
  | Dataset | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
138
  | --------- | ------------------------------- | ------------- |
@@ -144,6 +153,7 @@ if __name__ == "__main__":
144
  | Musique | **0.124** | 0.111 |
145
  | Bamboogle | 0.232 | **0.296** |
146
 
 
147
 
148
  ---
149
 
@@ -168,4 +178,4 @@ Code is released under **MIT**; model weights under the original **Qwen open‑s
168
  howpublished = {HuggingFace Model Hub},
169
  url = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
170
  }
171
- ```
 
6
  base_model:
7
  - Qwen/Qwen2.5-3B-Instruct
8
  ---
9
+
10
  # Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)
11
 
12
  > **Author · Seungyoun Shin**
13
  > 🤗 Model Hub: [https://huggingface.co/Seungyoun/qwen2.5-3b-it\_searchR1-like-multiturn](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
14
+ > 📈 W\&B Report: [https://wandb.ai/yoon1001/search\_r1\_like\_async\_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)
15
 
16
+ A faithful re‑implementation of the *Search‑R1* retrieval‑augmented QA agent on **Qwen 2.5‑3B**, trained purely on the Wikipedia‑based Search‑R1 corpus with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
17
+ During inference we replace the original SGLang runtime with a compact DuckDuckGo‑powered tool loop implemented in a single script.
18
 
19
  ---
20
 
21
  ## 🚀 Quick start
22
 
23
+ ```bash
24
+ pip install "transformers>=4.41" torch duckduckgo_search>=6.3.5 accelerate
25
+
26
+ # one‑command demo
27
+ python search_r1_infer.py "현재 대한민국 대통령은 누구야?"
28
+ ```
29
 
30
+ ### Full inference script
31
 
32
  Below is the exact script used in our experiments—drop it next to the model weights and run.
33
 
 
141
 
142
  ---
143
 
144
+ ## 📊 Evaluation (Exact Match)
145
 
146
  | Dataset | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
147
  | --------- | ------------------------------- | ------------- |
 
153
  | Musique | **0.124** | 0.111 |
154
  | Bamboogle | 0.232 | **0.296** |
155
 
156
+ *Metrics computed with the official Search‑R1 evaluation scripts.*
157
 
158
  ---
159
 
 
178
  howpublished = {HuggingFace Model Hub},
179
  url = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
180
  }
181
+ ```