Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn


Seungyoun committed on Jun 3 · commit b3d0c6e (verified) · 1 parent: 5b965c6

Update README.md

Files changed (1)

README.md +14 -4

@@ -6,19 +6,28 @@ language:
 base_model:
 - Qwen/Qwen2.5-3B-Instruct
 ---
 # Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)
 > **Author · Seungyoun Shin**
 > 🤗 Model Hub: [https://huggingface.co/Seungyoun/qwen2.5-3b-it\_searchR1-like-multiturn](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
-A faithful re‑implementation of the *Search‑R1* on **Qwen 2.5‑3B**, trained purely on the Wikipedia‑based Search‑R1 corpus (HotpotQA train split) with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
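GRPO (Group Relative Policy Optimization) does not use a learned value critic. Instead, it scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that group-relative advantage step. It assumes scalar outcome rewards (for example 1.0 for a correct final answer, 0.0 otherwise) and the common mean/std normalization. The function name `group_relative_advantages` and the `eps` value are illustrative and are not taken from VERL's code.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize the rewards of responses sampled for the same prompt.

    Each advantage is (r_i - mean) / (std + eps). Responses that beat the
    group average get a positive advantage, worse ones a negative one.
    If all rewards are equal, every advantage is 0 (no learning signal).
    """
    mean = statistics.fmean(rewards)
    # Population std here; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for one question: two answered correctly, two did not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the baseline is the group mean, a prompt where every rollout succeeds (or every rollout fails) contributes no gradient. This is why group sampling needs some diversity in outcomes.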
 ---
 ## 🚀 Quick start
-### Full inference script (Using duck-duck-go)
 Below is the exact script used in our experiments—drop it next to the model weights and run.
@@ -132,7 +141,7 @@ if __name__ == "__main__":
 ---
-## 📊 Evaluation (Pass@1)
 | Dataset   | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
 | --------- | ------------------------------- | ------------- |
@@ -144,6 +153,7 @@ if __name__ == "__main__":
 | Musique   | **0.124**                       | 0.111         |
 | Bamboogle | 0.232                           | **0.296**     |
 ---
@@ -168,4 +178,4 @@ Code is released under **MIT**; model weights under the original **Qwen open‑s
   howpublished = {HuggingFace Model Hub},
   url          = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
 }
-```

 base_model:
 - Qwen/Qwen2.5-3B-Instruct
 ---
 # Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)
 > **Author · Seungyoun Shin**
 > 🤗 Model Hub: [https://huggingface.co/Seungyoun/qwen2.5-3b-it\_searchR1-like-multiturn](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
+> 📈 W\&B Report: [https://wandb.ai/yoon1001/search\_r1\_like\_async\_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)
+A faithful re‑implementation of the *Search‑R1* retrieval‑augmented QA agent on **Qwen 2.5‑3B**, trained purely on the Wikipedia‑based Search‑R1 corpus with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
+During inference we replace the original SGLang runtime with a compact DuckDuckGo‑powered tool loop implemented in a single script.
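A tool loop of this kind alternates between generation and retrieval. The model writes until it emits a search query. The loop then runs the search, appends the results to the transcript, and lets the model continue until it produces an answer. The sketch below illustrates that control flow only and is not the author's `search_r1_infer.py`. It assumes Search-R1-style tags (`<search>query</search>`, `<information>results</information>`, `<answer>text</answer>`). The callables `generate` and `search` are hypothetical stand-ins for the Qwen model and the DuckDuckGo client.

```python
import re
from typing import Callable, Optional

# Assumed Search-R1-style tags; the real prompt format may differ.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_tool_loop(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical: model's next turn given the transcript
    search: Callable[[str], str],    # hypothetical: formatted search results for a query
    max_turns: int = 4,
) -> Optional[str]:
    """Alternate model turns and search calls until an <answer> tag appears."""
    transcript = prompt
    for _ in range(max_turns):
        turn = generate(transcript)
        transcript += turn
        answer = ANSWER_RE.search(turn)
        if answer:
            return answer.group(1).strip()
        query = SEARCH_RE.search(turn)
        if query is None:
            return None  # the model neither searched nor answered
        results = search(query.group(1).strip())
        # Feed retrieved text back so the next turn can condition on it.
        transcript += f"\n<information>{results}</information>\n"
    return None  # turn budget exhausted without an answer


if __name__ == "__main__":
    # Scripted stand-ins so the loop runs without a model or network access.
    def fake_generate(transcript: str) -> str:
        if "<information>" not in transcript:
            return "<think>I should look this up.</think><search>capital of France</search>"
        return "<think>The results say Paris.</think><answer>Paris</answer>"

    def fake_search(query: str) -> str:
        return f"Doc 1: Paris is the capital of France. (query: {query})"

    print(run_tool_loop("Question: What is the capital of France?\n", fake_generate, fake_search))
```

Capping the loop with `max_turns` keeps a model that keeps searching from running forever. Returning `None` makes "no answer" distinguishable from an empty answer.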
 ---
 ## 🚀 Quick start
+```bash
+pip install "transformers>=4.41" torch "duckduckgo_search>=6.3.5" accelerate
+# one‑command demo
+python search_r1_infer.py "현재 대한민국 대통령은 누구야?"  # "Who is the current president of South Korea?"
+```
+### Full inference script
 Below is the exact script used in our experiments—drop it next to the model weights and run.
 ---
+## 📊 Evaluation (Exact Match)
 | Dataset   | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
 | --------- | ------------------------------- | ------------- |
 | Musique   | **0.124**                       | 0.111         |
 | Bamboogle | 0.232                           | **0.296**     |
+*Metrics computed with the official Search‑R1 evaluation scripts.*
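Exact Match counts a prediction as correct only if it equals one of the gold answers after normalization. The per-dataset score is the mean over questions. The sketch below assumes SQuAD-style normalization: lower-casing, stripping punctuation, removing the articles "a", "an", "the", and collapsing whitespace. Check the official Search-R1 scripts for the exact rules, since small normalization differences can shift reported scores. The function names are illustrative.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lower-case, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def dataset_em(predictions: list[str], golds: list[list[str]]) -> float:
    """Average EM over a dataset (one score per dataset, as in the table)."""
    scores = [exact_match(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)
```

EM is strict. "Paris, France" does not match the gold answer "Paris", so EM can understate answers a human would accept.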
 ---
   howpublished = {HuggingFace Model Hub},
   url          = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
 }
+```