Update README.md
README.md CHANGED
@@ -15,8 +15,7 @@ base_model:
 >
 > 📈 W\&B Report: [wandb](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)
 
-A faithful re‑implementation of the *Search‑R1*
-During inference we replace the original SGLang runtime with a compact DuckDuckGo‑powered tool loop implemented in a single script.
+A faithful re‑implementation of the *Search‑R1* on **Qwen 2.5‑3B-instruct**, trained purely on `nq-hotpotqa-train` with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
 
 ---
 
@@ -26,7 +25,7 @@ During inference we replace the original SGLang runtime with a compact DuckDuckG
 pip install "transformers>=4.41" torch duckduckgo_search>=6.3.5 accelerate
 
 # one‑command demo
-python search_r1_infer.py "
+python search_r1_infer.py "how's the weather in seoul?"
 ```
 
 ### Full inference script