---
datasets:
- PeterJinGo/nq_hotpotqa_train
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)

> **Author · Seungyoun Shin**
>
> 🤗 Model Hub: [hf](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
>
> 📈 W&B Report: [wandb](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)

A faithful re‑implementation of *Search‑R1* on **Qwen2.5‑3B‑Instruct**, trained purely on `nq-hotpotqa-train` with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.

---

## 🚀 Quick start

```bash
pip install "transformers>=4.41" torch "duckduckgo_search>=6.3.5" accelerate
```

### Full inference script

Below is the exact script used in our experiments; drop it next to the model weights and run.

```python
#!/usr/bin/env python3
"""Minimal multi‑turn tool‑calling demo for the Qwen2.5‑3B Search‑R1 model.

Highlights
----------
* Presents the `search` function schema via `tools=[…]` so the model emits JSON tool calls.
* Detects `<tool_call>` → parses `{"name": "search", "arguments": {"query_list": […]}}` and runs DuckDuckGo.
* Streams results back inside `<tool_response>` tags until an `<answer>` block appears.
"""
from __future__ import annotations

import json
import re
import sys
from typing import List

import torch
from duckduckgo_search import DDGS
from transformers import AutoModelForCausalLM, AutoTokenizer

DEFAULT_SYSTEM_CONTENT = "You are a helpful and harmless assistant."

# Kept verbatim from the Search‑R1 training prompt (including the
# "as many times as your want" typo) so inference matches training.
DEFAULT_USER_CONTENT_PREFIX = (
    "Answer the given question. You must conduct reasoning inside <think> and "
    "</think> first every time you get new information. After reasoning, if you "
    "find you lack some knowledge, you can call a search engine by "
    "<search> query </search> and it will return the top searched results between "
    "<information> and </information>. You can search as many times as your "
    "want. If you find no further external knowledge needed, you can directly "
    "provide the answer inside <answer> and </answer>, without detailed "
    "illustrations. For example, <answer> Beijing </answer>. Question: "
)

MODEL_NAME = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
MAX_TURNS, MAX_RESPONSE_TOKENS = 4, 512
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

SEARCH_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "DuckDuckGo web search",
        "parameters": {
            "type": "object",
            "properties": {
                "query_list": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Fully‑formed semantic queries.",
                }
            },
            "required": ["query_list"],
        },
    },
}


def create_prompt(q: str) -> List[dict]:
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_CONTENT},
        {"role": "user", "content": DEFAULT_USER_CONTENT_PREFIX + q},
    ]


def ddg_search(query: str, k: int = 5) -> str:
    """Run a DuckDuckGo text search and format the top‑k hits."""
    with DDGS() as ddgs:
        hits = list(ddgs.text(query, safesearch="moderate", max_results=k))
    return "\n".join(
        f"{i + 1}. {h['title']} – {h['body']} ({h['href']})" for i, h in enumerate(hits)
    )


def extract_queries(raw: str) -> List[str]:
    """Parse a <tool_call> JSON payload; fall back to the raw text as one query."""
    try:
        payload = json.loads(raw)
        if payload.get("name") == "search":
            return payload.get("arguments", {}).get("query_list", [])
    except json.JSONDecodeError:
        pass
    return [raw]


def main() -> None:
    q = sys.argv[1] if len(sys.argv) > 1 else "How is the weather in Seoul?"

    tok = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
    )

    msgs = create_prompt(q)
    history = tok.apply_chat_template(
        msgs, tools=[SEARCH_SCHEMA], add_generation_prompt=True, tokenize=False
    )
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)

    for turn in range(MAX_TURNS):
        enc = tok(history, return_tensors="pt").to(DEVICE)
        out = model.generate(
            **enc,
            max_new_tokens=MAX_RESPONSE_TOKENS,
            temperature=0.7,
            do_sample=True,
        )
        new = tok.decode(out[0][enc.input_ids.shape[1]:], skip_special_tokens=True)
        print(f"\n===== Assistant (turn {turn + 1}) =====\n{new}\n")
        history += new

        m = pattern.search(new)
        if not m:  # no tool call → the model produced its final answer
            break

        # Run every requested query and feed the results back to the model.
        results = "\n---\n".join(ddg_search(q, 5) for q in extract_queries(m.group(1)))
        history += f"\n<tool_response>\n{results}\n</tool_response>\n"


if __name__ == "__main__":
    main()
```

---

## 🧠 Reasoning style

```
<think> … chain‑of‑thought … </think>
<tool_call>
{"name": "search", "arguments": {"query_list": ["…"]}}
</tool_call>
<tool_response>
1. web result …
</tool_response>
<answer> final concise answer </answer>
```
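To consume a transcript programmatically, pull the final `<answer>` block out of the generated text. A minimal sketch (the `extract_answer` helper below is illustrative, not part of the released script):

```python
from __future__ import annotations

import re


def extract_answer(transcript: str) -> str | None:
    """Return the contents of the last <answer>…</answer> block, if any."""
    matches = re.findall(r"<answer>\s*(.*?)\s*</answer>", transcript, re.S)
    return matches[-1] if matches else None


print(extract_answer("<think>…</think>\n<answer> Beijing </answer>"))  # -> Beijing
```

Taking the *last* match matters: the model may reason across several turns, and only the block emitted after its final search round is the answer.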
---

## 📊 Evaluation (Pass@1)

| Dataset   | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
| --------- | ------------------------------- | ------------- |
| NQ        | 0.397                           | **0.406**     |
| TriviaQA  | 0.565                           | **0.582**     |
| PopQA     | 0.391                           | **0.420**     |
| HotpotQA  | 0.331                           | **0.338**     |
| 2Wiki     | 0.310                           | **0.332**     |
| Musique   | **0.124**                       | 0.111         |
| Bamboogle | 0.232                           | **0.296**     |

---

## 🤝 Acknowledgements

* [Qwen LM](https://github.com/QwenLM) for the base model.
* [Search‑R1 authors](https://github.com/PeterGriffinJin/Search-R1) for the dataset & baseline.
* [Volcengine **VERL**](https://github.com/volcengine/verl) for the GRPO training toolkit.
* Hugging Face for the open ecosystem.

---

## 📄 License & citation

Code is released under **MIT**; model weights are under the original **Qwen open‑source license**.

```bibtex
@misc{shin2025qwen25_searchr1_multiturn,
  author       = {Seungyoun Shin},
  title        = {Qwen2.5-3B Search-R1-Multiturn (reproduce)},
  year         = 2025,
  howpublished = {HuggingFace Model Hub},
  url          = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
}
```
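---

## 🧪 Scoring note

Pass@1 in the table above is typically computed as exact match of a single sampled answer against the gold answers. A minimal sketch of such a scorer (the normalisation rules here follow common QA‑evaluation conventions and are an assumption, not the exact script behind the table):

```python
from __future__ import annotations

import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop articles and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def pass_at_1(prediction: str, gold_answers: list[str]) -> bool:
    """True if the single prediction exactly matches any gold answer."""
    return normalize(prediction) in {normalize(g) for g in gold_answers}


print(pass_at_1("The Beijing.", ["Beijing"]))  # -> True
```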