Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn

Seungyoun committed on Jun 3

Commit 8352556 · verified · 1 Parent(s): 7fab937

Update README.md

Files changed (1)

README.md +2 -3

README.md CHANGED

@@ -15,8 +15,7 @@ base_model:
 >
 > 📈 W\&B Report: [wandb](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)
-A faithful re‑implementation of the *Search‑R1* retrieval‑augmented QA agent on **Qwen 2.5‑3B**, trained purely on the Wikipedia‑based Search‑R1 corpus with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
-During inference we replace the original SGLang runtime with a compact DuckDuckGo‑powered tool loop implemented in a single script.
+A faithful re‑implementation of the *Search‑R1* on **Qwen 2.5‑3B-instruct**, trained purely on `nq-hotpotqa-train` with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
 ---
@@ -26,7 +25,7 @@ During inference we replace the original SGLang runtime with a compact DuckDuckG
 pip install "transformers>=4.41" torch duckduckgo_search>=6.3.5 accelerate
 # one‑command demo
-python search_r1_infer.py "현재 대한민국 대통령은 누구야?"
+python search_r1_infer.py "how's the weather in seoul?"
 ```
 ### Full inference script
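
Reviewer note: the diff cuts off before the script body, so `search_r1_infer.py` itself is not shown here. For readers unfamiliar with the "tool loop" the README mentions, the following is a minimal sketch of the general pattern, not the author's script. It assumes the model emits `<search>…</search>` and `<answer>…</answer>` tags and that retrieved text is fed back inside `<information>…</information>`. The names `run_tool_loop`, `generate`, and `search` are hypothetical. Both callables are stubbed so the sketch runs without a model or network access.

```python
import re
from typing import Callable, List, Optional

# Assumed tag format; the actual script may use different markers.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_tool_loop(
    question: str,
    generate: Callable[[str], str],      # hypothetical: transcript -> next model completion
    search: Callable[[str], List[str]],  # hypothetical: query -> list of text snippets
    max_turns: int = 4,
) -> Optional[str]:
    """Alternate between generation and search until an answer tag appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        completion = generate(transcript)
        transcript += completion

        # A final answer ends the loop.
        answer = ANSWER_RE.search(completion)
        if answer:
            return answer.group(1).strip()

        # Otherwise look for a search request; stop if there is none.
        query = SEARCH_RE.search(completion)
        if query is None:
            break

        # Feed the retrieved snippets back to the model on the next turn.
        snippets = search(query.group(1).strip())
        transcript += "\n<information>\n" + "\n".join(snippets) + "\n</information>\n"
    return None


# Stubs standing in for the model and the search backend.
def fake_generate(transcript: str) -> str:
    if "<information>" in transcript:
        return "<answer>Seoul</answer>"
    return "<search>capital of South Korea</search>"


def fake_search(query: str) -> List[str]:
    return [f"stub result for: {query}"]


result = run_tool_loop("What is the capital of South Korea?", fake_generate, fake_search)
print(result)
```

In a real run, `generate` would wrap a call to the Qwen model and `search` would wrap the `duckduckgo_search` package listed in the install line. Passing both in as callables keeps the loop testable offline.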