---
datasets:
- PeterJinGo/nq_hotpotqa_train
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)

> **Author · Seungyoun Shin**
> 
> 🤗 Model Hub: [hf](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
> 
> 📈 W\&B Report: [wandb](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)

A faithful re‑implementation of *Search‑R1* on **Qwen2.5‑3B‑Instruct**, trained purely on `nq_hotpotqa_train` with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
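
For context on the objective: GRPO samples a group of G rollouts per question and normalizes the outcome reward within that group, removing the need for a separate value network. Below is a sketch of the standard group-relative advantage (the exact training configuration lives in the W&B report above):

$$
\hat{A}_i = \frac{r_i - \mathrm{mean}(r_1, \dots, r_G)}{\mathrm{std}(r_1, \dots, r_G)}
$$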

---

## 🚀 Quick start

```bash
pip install "transformers>=4.41" "duckduckgo_search>=6.3.5" torch accelerate
```
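
To sanity-check that the weights load and generate (no search loop yet), a minimal single-turn call looks like the sketch below; it assumes only the standard `transformers` generation API:

```python
# Minimal single-turn smoke test -- no tool calling, just load and generate.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

msgs = [{"role": "user", "content": "Who wrote 'The Old Man and the Sea'?"}]
prompt = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```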

### Full inference script

Below is the exact script used in our experiments; save it anywhere and run it (the weights are downloaded from the Hub on first use).

```python
#!/usr/bin/env python3
"""
Minimal **multi‑turn tool‑calling** demo for the Qwen2.5‑3B Search‑R1 model

Highlights
-----------
* Presents the `search` function schema via `tools=[…]` so the model emits JSON calls.
* Detects `<tool_call>` → parses `{name:"search", arguments:{query_list:[…]}}` and runs DuckDuckGo.
* Streams results back in `<tool_response>` until an `<answer>` block appears.
"""
from __future__ import annotations
import json, re, sys
from typing import List
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from duckduckgo_search import DDGS

DEFAULT_SYSTEM_CONTENT = "You are a helpful and harmless assistant."
# NOTE: the wording below (including the "as your want" typo) is kept verbatim
# so that inference matches the prompt the model saw during training.
DEFAULT_USER_CONTENT_PREFIX = (
    "Answer the given question. You must conduct reasoning inside <think> and "
    "</think> first every time you get new information. After reasoning, if you "
    "find you lack some knowledge, you can call a search engine by <tool_call> "
    "query </tool_call> and it will return the top searched results between "
    "<tool_response> and </tool_response>. You can search as many times as your "
    "want. If you find no further external knowledge needed, you can directly "
    "provide the answer inside <answer> and </answer>, without detailed "
    "illustrations. For example, <answer> Beijing </answer>. Question: "
)
MODEL_NAME = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
MAX_TURNS, MAX_RESPONSE_TOKENS = 4, 512
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

SEARCH_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "DuckDuckGo web search",
        "parameters": {
            "type": "object",
            "properties": {
                "query_list": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Fully‑formed semantic queries."
                }
            },
            "required": ["query_list"],
        },
    },
}

def create_prompt(q: str) -> List[dict]:
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_CONTENT},
        {"role": "user", "content": DEFAULT_USER_CONTENT_PREFIX + q},
    ]

def ddg_search(query: str, k: int = 5) -> str:
    with DDGS() as ddgs:
        hits = list(ddgs.text(query, safesearch="moderate", max_results=k))
    return "\n".join(f"{i+1}. {h['title']} – {h['body']} ({h['href']})" for i,h in enumerate(hits))

def extract_queries(raw: str) -> List[str]:
    try:
        payload = json.loads(raw)
        if payload.get("name") == "search":
            return payload.get("arguments", {}).get("query_list", [])
    except json.JSONDecodeError:
        pass
    return [raw]

def main() -> None:
    q = sys.argv[1] if len(sys.argv) > 1 else "How is the weather in Seoul?"
    tok = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto")
    msgs = create_prompt(q)
    history = tok.apply_chat_template(msgs, tools=[SEARCH_SCHEMA], add_generation_prompt=True, tokenize=False)
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)
    for turn in range(MAX_TURNS):
        enc = tok(history, return_tensors="pt").to(DEVICE)
        out = model.generate(**enc, max_new_tokens=MAX_RESPONSE_TOKENS, temperature=0.7, do_sample=True)
        new = tok.decode(out[0][enc.input_ids.shape[1]:], skip_special_tokens=True)
        print(f"\n===== Assistant (turn {turn+1}) =====\n{new}\n")
        history += new
        m = pattern.search(new)
        if not m: break
        # Use a distinct loop variable so the original question `q` stays readable.
        results = "\n---\n".join(ddg_search(query, 5) for query in extract_queries(m.group(1)))
        history += f"<tool_response>\n{results}\n</tool_response>"

if __name__ == "__main__":
    main()
```
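
Save the script under any name (we use the hypothetical `search_r1_demo.py` here) and pass a question as the first CLI argument:

```bash
python search_r1_demo.py "Who won the 2022 FIFA World Cup?"
```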

---

## 🧠 Reasoning style

```
<think> … chain‑of‑thought … </think>
<tool_call>{"name":"search", "arguments":{"query_list":["…"]}}</tool_call>
<tool_response>
1. web result

</tool_response>
<answer> final concise answer </answer>
```
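
If you post-process traces in this format, the final answer can be pulled out with a small regex helper (a sketch over the tags above; `trace` is the accumulated assistant text):

```python
import re

def extract_answer(trace: str) -> str | None:
    """Return the content of the first <answer>...</answer> block, or None."""
    m = re.search(r"<answer>\s*(.*?)\s*</answer>", trace, re.S)
    return m.group(1) if m else None

print(extract_answer("<think>...</think>\n<answer> Beijing </answer>"))  # -> "Beijing"
```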

---

## 📊 Evaluation (Pass@1)

| Dataset   | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
| --------- | ------------------------------- | ------------- |
| NQ        | 0.397                           | **0.406**     |
| TriviaQA  | 0.565                           | **0.582**     |
| PopQA     | 0.391                           | **0.420**     |
| HotpotQA  | 0.331                           | **0.338**     |
| 2Wiki     | 0.310                           | **0.332**     |
| Musique   | **0.124**                       | 0.111         |
| Bamboogle | 0.232                           | **0.296**     |


---

## 🤝 Acknowledgements

* [Qwen LM](https://github.com/QwenLM) for the base model.
* [Search‑R1 authors](https://github.com/PeterGriffinJin/Search-R1) for the dataset & baseline.
* [Volcengine **VERL**](https://github.com/volcengine/verl) for the GRPO training toolkit.
* [Hugging Face](https://huggingface.co) for the open ecosystem.

---

## 📄 License & citation

Code is released under **MIT**; model weights under the original **Qwen open‑source license**.

```bibtex
@misc{shin2025qwen25_searchr1_multiturn,
  author       = {Seungyoun Shin},
  title        = {Qwen2.5-3B Search-R1-Multiturn (reproduce)},
  year         = 2025,
  howpublished = {Hugging Face Model Hub},
  url          = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
}
```