---
datasets:
- PeterJinGo/nq_hotpotqa_train
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)

> **Author · Seungyoun Shin**
> 
> 🤗 Model Hub: [hf](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
> 
> 📈 W\&B Report: [wandb](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)

A faithful re‑implementation of *Search‑R1* on **Qwen2.5‑3B‑Instruct**, trained purely on `nq_hotpotqa_train` with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.
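
For context on the objective: GRPO samples a group of G rollouts per question and normalizes the outcome reward within that group, removing the need for a separate value network. Below is a sketch of the standard group-relative advantage (the exact training configuration lives in the W&B report above):

$$
\hat{A}_i = \frac{r_i - \mathrm{mean}(r_1, \dots, r_G)}{\mathrm{std}(r_1, \dots, r_G)}
$$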

---

## 🚀 Quick start

```bash
pip install "transformers>=4.41" "duckduckgo_search>=6.3.5" torch accelerate
```
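
To sanity-check that the weights load and generate (no search loop yet), a minimal single-turn call looks like the sketch below; it assumes only the standard `transformers` generation API:

```python
# Minimal single-turn smoke test -- no tool calling, just load and generate.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

msgs = [{"role": "user", "content": "Who wrote 'The Old Man and the Sea'?"}]
prompt = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```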

### Full inference script

Below is the exact script used in our experiments; save it anywhere and run it (the weights are downloaded from the Hub on first use).

```python
#!/usr/bin/env python3
"""
Minimal **multi‑turn tool‑calling** demo for the Qwen2.5‑3B Search‑R1 model

Highlights
-----------
* Presents the `search` function schema via `tools=[…]` so the model emits JSON calls.
* Detects `<tool_call>` → parses `{name:"search", arguments:{query_list:[…]}}` and runs DuckDuckGo.
* Streams results back in `<tool_response>` until an `<answer>` block appears.
"""
from __future__ import annotations
import json, re, sys
from typing import List
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from duckduckgo_search import DDGS

DEFAULT_SYSTEM_CONTENT = "You are a helpful and harmless assistant."
# NOTE: the wording below (including the "as your want" typo) is kept verbatim
# so that inference matches the prompt the model saw during training.
DEFAULT_USER_CONTENT_PREFIX = (
    "Answer the given question. You must conduct reasoning inside <think> and "
    "</think> first every time you get new information. After reasoning, if you "
    "find you lack some knowledge, you can call a search engine by <tool_call> "
    "query </tool_call> and it will return the top searched results between "
    "<tool_response> and </tool_response>. You can search as many times as your "
    "want. If you find no further external knowledge needed, you can directly "
    "provide the answer inside <answer> and </answer>, without detailed "
    "illustrations. For example, <answer> Beijing </answer>. Question: "
)
MODEL_NAME = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
MAX_TURNS, MAX_RESPONSE_TOKENS = 4, 512
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

SEARCH_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "DuckDuckGo web search",
        "parameters": {
            "type": "object",
            "properties": {
                "query_list": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Fully‑formed semantic queries."
                }
            },
            "required": ["query_list"],
        },
    },
}

def create_prompt(q: str) -> List[dict]:
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_CONTENT},
        {"role": "user", "content": DEFAULT_USER_CONTENT_PREFIX + q},
    ]

def ddg_search(query: str, k: int = 5) -> str:
    with DDGS() as ddgs:
        hits = list(ddgs.text(query, safesearch="moderate", max_results=k))
    return "\n".join(f"{i+1}. {h['title']} – {h['body']} ({h['href']})" for i,h in enumerate(hits))

def extract_queries(raw: str) -> List[str]:
    try:
        payload = json.loads(raw)
        if payload.get("name") == "search":
            return payload.get("arguments", {}).get("query_list", [])
    except json.JSONDecodeError:
        pass
    return [raw]

def main() -> None:
    q = sys.argv[1] if len(sys.argv) > 1 else "How is the weather in Seoul?"
    tok = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto")
    msgs = create_prompt(q)
    history = tok.apply_chat_template(msgs, tools=[SEARCH_SCHEMA], add_generation_prompt=True, tokenize=False)
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)
    for turn in range(MAX_TURNS):
        enc = tok(history, return_tensors="pt").to(DEVICE)
        out = model.generate(**enc, max_new_tokens=MAX_RESPONSE_TOKENS, temperature=0.7, do_sample=True)
        new = tok.decode(out[0][enc.input_ids.shape[1]:], skip_special_tokens=True)
        print(f"\n===== Assistant (turn {turn+1}) =====\n{new}\n")
        history += new
        m = pattern.search(new)
        if not m: break
        # Use a distinct loop variable so the original question `q` stays readable.
        results = "\n---\n".join(ddg_search(query, 5) for query in extract_queries(m.group(1)))
        history += f"<tool_response>\n{results}\n</tool_response>"

if __name__ == "__main__":
    main()
```
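
Save the script under any name (we use the hypothetical `search_r1_demo.py` here) and pass a question as the first CLI argument:

```bash
python search_r1_demo.py "Who won the 2022 FIFA World Cup?"
```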

---

## 🧠 Reasoning style

```
<think> … chain‑of‑thought … </think>
<tool_call>{"name":"search", "arguments":{"query_list":["…"]}}</tool_call>
<tool_response>
1. web result

</tool_response>
<answer> final concise answer </answer>
```
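
If you post-process traces in this format, the final answer can be pulled out with a small regex helper (a sketch over the tags above; `trace` is the accumulated assistant text):

```python
import re

def extract_answer(trace: str) -> str | None:
    """Return the content of the first <answer>...</answer> block, or None."""
    m = re.search(r"<answer>\s*(.*?)\s*</answer>", trace, re.S)
    return m.group(1) if m else None

print(extract_answer("<think>...</think>\n<answer> Beijing </answer>"))  # -> "Beijing"
```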

---

## 📊 Evaluation (Pass@1)

| Dataset   | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
| --------- | ------------------------------- | ------------- |
| NQ        | 0.397                           | **0.406**     |
| TriviaQA  | 0.565                           | **0.582**     |
| PopQA     | 0.391                           | **0.420**     |
| HotpotQA  | 0.331                           | **0.338**     |
| 2Wiki     | 0.310                           | **0.332**     |
| Musique   | **0.124**                       | 0.111         |
| Bamboogle | 0.232                           | **0.296**     |


---

## 🤝 Acknowledgements

* [Qwen LM](https://github.com/QwenLM) for the base model.
* [Search‑R1 authors](https://github.com/PeterGriffinJin/Search-R1) for the dataset & baseline.
* [Volcengine **VERL**](https://github.com/volcengine/verl) for the GRPO training toolkit.
* [Hugging Face](https://huggingface.co) for the open ecosystem.

---

## 📄 License & citation

Code is released under **MIT**; model weights under the original **Qwen open‑source license**.

```bibtex
@misc{shin2025qwen25_searchr1_multiturn,
  author       = {Seungyoun Shin},
  title        = {Qwen2.5-3B Search-R1-Multiturn (reproduce)},
  year         = 2025,
  howpublished = {Hugging Face Model Hub},
  url          = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
}
```