---
datasets:
- PeterJinGo/nq_hotpotqa_train
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---
# Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)
> **Author · Seungyoun Shin**
>
> 🤗 Model Hub: [hf](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
>
> 📈 W\&B Report: [wandb](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)

A faithful re‑implementation of *Search‑R1* on **Qwen2.5‑3B‑Instruct**, trained only on `PeterJinGo/nq_hotpotqa_train` with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.

---
## 🚀 Quick start
```bash
pip install "transformers>=4.41" torch "duckduckgo_search>=6.3.5" accelerate
```
### Full inference script
Below is the script used in our experiments. Save it to a file and run it with the question as the first argument; the weights are loaded from the Hub via `MODEL_NAME`.
```python
#!/usr/bin/env python3
"""
Minimal **multi‑turn tool‑calling** demo for the Qwen2.5‑3B Search‑R1 model

Highlights
----------
* Presents the `search` function schema via `tools=[…]` so the model emits JSON calls.
* Detects `<tool_call>` → parses `{name:"search", arguments:{query_list:[…]}}` and runs DuckDuckGo.
* Feeds results back in `<tool_response>` until the model stops calling the tool
  (normally when it emits `<answer>`) or `MAX_TURNS` is reached.
"""
from __future__ import annotations

import json
import re
import sys
from typing import List

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from duckduckgo_search import DDGS

DEFAULT_SYSTEM_CONTENT = "You are a helpful and harmless assistant."
DEFAULT_USER_CONTENT_PREFIX = (
    "Answer the given question. You must conduct reasoning inside <think> and "
    "</think> first every time you get new information. After reasoning, if you "
    "find you lack some knowledge, you can call a search engine by <tool_call> "
    "query </tool_call> and it will return the top searched results between "
    "<tool_response> and </tool_response>. You can search as many times as your "
    "want. If you find no further external knowledge needed, you can directly "
    "provide the answer inside <answer> and </answer>, without detailed "
    "illustrations. For example, <answer> Beijing </answer>. Question: "
)

MODEL_NAME = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
MAX_TURNS, MAX_RESPONSE_TOKENS = 4, 512
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Tool schema handed to the chat template so the model knows how to call `search`.
SEARCH_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "DuckDuckGo web search",
        "parameters": {
            "type": "object",
            "properties": {
                "query_list": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Fully‑formed semantic queries.",
                }
            },
            "required": ["query_list"],
        },
    },
}


def create_prompt(q: str) -> List[dict]:
    """Build the system + user messages for a single question."""
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_CONTENT},
        {"role": "user", "content": DEFAULT_USER_CONTENT_PREFIX + q},
    ]


def ddg_search(query: str, k: int = 5) -> str:
    """Run one DuckDuckGo text search and format the top-k hits as a numbered list."""
    with DDGS() as ddgs:
        hits = list(ddgs.text(query, safesearch="moderate", max_results=k))
    return "\n".join(
        f"{i+1}. {h['title']} – {h['body']} ({h['href']})" for i, h in enumerate(hits)
    )


def extract_queries(raw: str) -> List[str]:
    """Parse the JSON inside <tool_call>; fall back to the raw text as a single query."""
    try:
        payload = json.loads(raw)
        # Guard against valid JSON that is not an object (e.g. a bare string or list).
        if isinstance(payload, dict) and payload.get("name") == "search":
            return payload.get("arguments", {}).get("query_list", [])
    except json.JSONDecodeError:
        pass
    return [raw]


def main() -> None:
    q = sys.argv[1] if len(sys.argv) > 1 else "How is the weather in Seoul?"
    tok = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
    )

    msgs = create_prompt(q)
    # `history` is the full text transcript; each turn is appended to it and re-encoded.
    history = tok.apply_chat_template(
        msgs, tools=[SEARCH_SCHEMA], add_generation_prompt=True, tokenize=False
    )
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)

    for turn in range(MAX_TURNS):
        enc = tok(history, return_tensors="pt").to(DEVICE)
        out = model.generate(
            **enc, max_new_tokens=MAX_RESPONSE_TOKENS, temperature=0.7, do_sample=True
        )
        # Decode only the newly generated tokens, not the prompt.
        new = tok.decode(out[0][enc.input_ids.shape[1]:], skip_special_tokens=True)
        print(f"\n===== Assistant (turn {turn+1}) =====\n{new}\n")
        history += new

        m = pattern.search(new)
        if not m:
            break  # no tool call in this turn, so the model is done

        # Run every requested query and return the results to the model.
        results = "\n---\n".join(
            ddg_search(query, 5) for query in extract_queries(m.group(1))
        )
        history += f"<tool_response>\n{results}\n</tool_response>"


if __name__ == "__main__":
    main()
```
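
The script prints each assistant turn but does not pull the final answer out of the transcript. The sketch below shows one way that could be done, assuming the answer is wrapped in `<answer>` tags as the prompt requests. `extract_answer` is a hypothetical helper and not part of the original script.

```python
import re
from typing import Optional

# Same tag convention as the prompt: <answer> ... </answer>
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.S)


def extract_answer(text: str) -> Optional[str]:
    """Return the content of the last <answer> block, or None if there is none.

    Hypothetical helper (assumption): not part of the original script.
    """
    matches = ANSWER_RE.findall(text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    sample = "<think> enough information </think>\n<answer> Beijing </answer>"
    print(extract_answer(sample))
```

Taking the last match rather than the first is a design choice: if the model revises itself within a turn, the final block is the one it committed to.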
---
## 🧠 Reasoning style
```
<think> … chain‑of‑thought … </think>
<tool_call>{"name":"search", "arguments":{"query_list":["…"]}}</tool_call>
<tool_response>
1. web result
…
</tool_response>
<answer> final concise answer </answer>
```
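
For inspecting or logging a transcript in this format, it can help to split it into its tagged blocks. This is a minimal sketch assuming the four tags shown above are well-formed and not nested. `split_trace` and `tool_queries` are hypothetical names introduced here for illustration.

```python
import json
import re
from typing import List, Tuple

# Matches any of the four block types; \1 forces the closing tag to match the opening one.
BLOCK_RE = re.compile(
    r"<(think|tool_call|tool_response|answer)>\s*(.*?)\s*</\1>", re.S
)


def split_trace(trace: str) -> List[Tuple[str, str]]:
    """Return (tag, content) pairs in the order they appear in the transcript."""
    return [(m.group(1), m.group(2)) for m in BLOCK_RE.finditer(trace)]


def tool_queries(trace: str) -> List[str]:
    """Collect every query from the JSON tool calls (raises on malformed JSON)."""
    queries: List[str] = []
    for tag, body in split_trace(trace):
        if tag == "tool_call":
            queries.extend(json.loads(body)["arguments"]["query_list"])
    return queries


if __name__ == "__main__":
    demo = (
        "<think> need the capital </think>\n"
        '<tool_call>{"name":"search", "arguments":{"query_list":["capital of China"]}}</tool_call>\n'
        "<tool_response>\n1. web result\n</tool_response>\n"
        "<answer> Beijing </answer>"
    )
    for tag, body in split_trace(demo):
        print(f"{tag}: {body}")
```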
---
## 📊 Evaluation (Pass@1)
| Dataset | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
| --------- | ------------------------------- | ------------- |
| NQ | 0.397 | **0.406** |
| TriviaQA | 0.565 | **0.582** |
| PopQA | 0.391 | **0.420** |
| HotpotQA | 0.331 | **0.338** |
| 2Wiki | 0.310 | **0.332** |
| MuSiQue   | **0.124**                       | 0.111         |
| Bamboogle | 0.232 | **0.296** |
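
The table reports per-dataset scores only. As a quick summary, the sketch below computes an unweighted macro-average and counts the datasets where this model scores higher. The numbers are copied from the table; the averaging itself is a derived convenience and not a figure reported by the author.

```python
from statistics import mean
from typing import Dict, Tuple

# dataset -> (original Search-R1, this work), copied from the table above
SCORES: Dict[str, Tuple[float, float]] = {
    "NQ": (0.397, 0.406),
    "TriviaQA": (0.565, 0.582),
    "PopQA": (0.391, 0.420),
    "HotpotQA": (0.331, 0.338),
    "2Wiki": (0.310, 0.332),
    "Musique": (0.124, 0.111),
    "Bamboogle": (0.232, 0.296),
}


def macro_average(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Unweighted mean over datasets for (original, this work)."""
    original = mean([o for o, _ in scores.values()])
    ours = mean([t for _, t in scores.values()])
    return original, ours


def datasets_improved(scores: Dict[str, Tuple[float, float]]) -> int:
    """Number of datasets where this work scores strictly higher."""
    return sum(1 for o, t in scores.values() if t > o)


if __name__ == "__main__":
    original_avg, ours_avg = macro_average(SCORES)
    print(f"original: {original_avg:.3f}  this work: {ours_avg:.3f}")
    print(f"improved on {datasets_improved(SCORES)} of {len(SCORES)} datasets")
```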
---
## 🤝 Acknowledgements
* [Qwen LM](https://github.com/QwenLM) for the base model.
* [Search‑R1 authors](https://github.com/PeterGriffinJin/Search-R1) for the dataset & baseline.
* [Volcengine **VERL**](https://github.com/volcengine/verl) for the GRPO training toolkit.
* Hugging Face for the open ecosystem.
---
## 📄 License & citation
Code is released under **MIT**; model weights under the original **Qwen open‑source license**.
```bibtex
@misc{shin2025qwen25_searchr1_multiturn,
author = {Seungyoun Shin},
title = {Qwen2.5-3B Search-R1-Multiturn (reproduce)},
year = 2025,
howpublished = {HuggingFace Model Hub},
url = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
}
```