Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)

Author · Seungyoun Shin

🤗 Model Hub: hf

📈 W&B Report: wandb

A faithful re‑implementation of Search‑R1 on Qwen2.5‑3B‑Instruct, trained only on the nq‑hotpotqa‑train data with GRPO using the open‑source VERL framework.
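GRPO (Group Relative Policy Optimization) samples several rollouts per question and scores each one relative to the rest of its group, so no learned value model is needed. The sketch below illustrates that idea under stated assumptions. It uses an exact-match outcome reward, which is assumed here in the spirit of Search‑R1's rule-based reward. Advantages are standardised with the group's population standard deviation. The helpers normalize, em_reward and grpo_advantages are hypothetical and are not VERL's API. VERL's own implementation may differ in details such as the standard-deviation estimator or the epsilon.

```python
import re
import statistics
import string
from typing import List


def normalize(text: str) -> str:
    """SQuAD-style answer normalisation: lowercase, strip punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def em_reward(prediction: str, gold_answers: List[str]) -> float:
    """Assumed outcome reward: 1.0 on a normalised exact match with any gold answer, else 0.0."""
    pred = normalize(prediction)
    return float(any(pred == normalize(g) for g in gold_answers))


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantage: (r_i - mean(group)) / (std(group) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations may use another estimator
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical rollouts for one question whose gold answer is "Seoul".
    rollouts = ["Seoul", "Busan", "the Seoul", "Incheon"]
    rewards = [em_reward(r, ["Seoul"]) for r in rollouts]
    print(rewards, grpo_advantages(rewards))
```

Correct rollouts receive a positive advantage and incorrect ones a negative one (roughly +1 and -1 here). If every rollout in a group earns the same reward, all advantages are zero and that question contributes no policy-gradient signal.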

🚀 Quick start

pip install "transformers>=4.41" torch "duckduckgo_search>=6.3.5" accelerate

Full inference script

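Below is the inference script used in our experiments. Save it to a file and pass your question as the first command-line argument. The weights are downloaded from the Hugging Face Hub via MODEL_NAME.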

#!/usr/bin/env python3
"""
Minimal **multi‑turn tool‑calling** demo for the Qwen2.5‑3B Search‑R1 model

Highlights
-----------
* Presents the `search` function schema via `tools=[…]` so the model emits JSON calls.
* Detects `<tool_call>` → parses `{name:"search", arguments:{query_list:[…]}}` and runs DuckDuckGo.
* Feeds search results back inside `<tool_response>` and stops once a turn contains no `<tool_call>` (normally the turn with the `<answer>` block).
"""
from __future__ import annotations
import json, re, sys
from typing import List
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from duckduckgo_search import DDGS

DEFAULT_SYSTEM_CONTENT = "You are a helpful and harmless assistant."
DEFAULT_USER_CONTENT_PREFIX = (
    "Answer the given question. You must conduct reasoning inside <think> and "
    "</think> first every time you get new information. After reasoning, if you "
    "find you lack some knowledge, you can call a search engine by <tool_call> "
    "query </tool_call> and it will return the top searched results between "
    "<tool_response> and </tool_response>. You can search as many times as your "
    "want. If you find no further external knowledge needed, you can directly "
    "provide the answer inside <answer> and </answer>, without detailed "
    "illustrations. For example, <answer> Beijing </answer>. Question: "
)
MODEL_NAME = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
MAX_TURNS, MAX_RESPONSE_TOKENS = 4, 512
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

SEARCH_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "DuckDuckGo web search",
        "parameters": {
            "type": "object",
            "properties": {
                "query_list": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Fully‑formed semantic queries."
                }
            },
            "required": ["query_list"],
        },
    },
}

def create_prompt(q: str) -> List[dict]:
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_CONTENT},
        {"role": "user", "content": DEFAULT_USER_CONTENT_PREFIX + q},
    ]

def ddg_search(query: str, k: int = 5) -> str:
    with DDGS() as ddgs:
        hits = list(ddgs.text(query, safesearch="moderate", max_results=k))
    return "\n".join(f"{i+1}. {h['title']} – {h['body']} ({h['href']})" for i,h in enumerate(hits))

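def extract_queries(raw: str) -> List[str]:
    """Parse a JSON tool-call payload; fall back to treating the raw text as one query."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return [raw]
    if isinstance(payload, dict) and payload.get("name") == "search":
        args = payload.get("arguments") or {}
        queries = args.get("query_list", []) if isinstance(args, dict) else []
        # Tolerate a bare string where a list of queries was expected.
        return [queries] if isinstance(queries, str) else list(queries)
    return [raw]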

def main() -> None:
    question = sys.argv[1] if len(sys.argv) > 1 else "How is the weather in Seoul?"
    tok = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto")
    msgs = create_prompt(question)
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)
    for turn in range(MAX_TURNS):
        # Re-render the whole conversation each turn so search results land in proper
        # chat-template turns (Qwen2.5's template wraps role="tool" in <tool_response> tags).
        prompt = tok.apply_chat_template(msgs, tools=[SEARCH_SCHEMA], add_generation_prompt=True, tokenize=False)
        # With device_map="auto", send inputs to wherever the model was placed.
        enc = tok(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**enc, max_new_tokens=MAX_RESPONSE_TOKENS, temperature=0.7, do_sample=True)
        new = tok.decode(out[0][enc.input_ids.shape[1]:], skip_special_tokens=True)
        print(f"\n===== Assistant (turn {turn+1}) =====\n{new}\n")
        msgs.append({"role": "assistant", "content": new})
        m = pattern.search(new)
        if not m:  # no tool call: the model answered (or gave up), so stop
            break
        results = "\n---\n".join(ddg_search(query, 5) for query in extract_queries(m.group(1)))
        msgs.append({"role": "tool", "content": results})

if __name__ == "__main__":
    main()
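The multi-turn control flow can be exercised without a GPU or network access. The harness below is a hypothetical offline sketch. fake_generate and fake_search stand in for the model and the DuckDuckGo search, and the system prompt and tool schema are omitted for brevity. Search results are stored as role="tool" messages. The assumption is that this matches how Qwen2.5's chat template expects tool output.

```python
import json
import re
from typing import Callable, Dict, List, Optional, Tuple

TOOL_CALL = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)
ANSWER = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.S)

Message = Dict[str, str]


def run_episode(
    question: str,
    generate: Callable[[List[Message]], str],
    search: Callable[[str], str],
    max_turns: int = 4,
) -> Tuple[Optional[str], List[Message]]:
    """Alternate assistant and tool turns until the model stops calling the tool."""
    msgs: List[Message] = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = generate(msgs)
        msgs.append({"role": "assistant", "content": reply})
        call = TOOL_CALL.search(reply)
        if call is None:
            break  # no tool call: treat this turn as final
        queries = json.loads(call.group(1))["arguments"]["query_list"]
        results = "\n---\n".join(search(q) for q in queries)
        msgs.append({"role": "tool", "content": results})
    last = msgs[-1]
    match = ANSWER.search(last["content"]) if last["role"] == "assistant" else None
    return (match.group(1) if match else None), msgs


# Hypothetical stubs: a scripted "model" that searches once, then answers.
def fake_generate(msgs: List[Message]) -> str:
    if msgs[-1]["role"] == "user":
        return ("<think> I should look this up. </think>\n"
                '<tool_call>{"name": "search", "arguments": '
                '{"query_list": ["capital of South Korea"]}}</tool_call>')
    return "<think> The search result says Seoul. </think>\n<answer> Seoul </answer>"


def fake_search(query: str) -> str:
    return f"1. {query}: Seoul is the capital of South Korea."


if __name__ == "__main__":
    answer, transcript = run_episode("What is the capital of South Korea?", fake_generate, fake_search)
    print(answer)  # Seoul
```

If max_turns runs out while the last turn is still a tool response, run_episode returns None as the answer. This mirrors the real loop, which simply stops after MAX_TURNS.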

🧠 Reasoning style

<think> … chain‑of‑thought … </think>
<tool_call>{"name":"search", "arguments":{"query_list":["…"]}}</tool_call>
<tool_response>
1. web result
…
</tool_response>
<answer> final concise answer </answer>
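The following minimal sketch post-processes transcripts in this format, for example to pull out the issued queries and the final answer. It assumes tags are never nested and every opening tag is closed. parse_trajectory, tool_queries and final_answer are hypothetical helpers and are not part of the released code.

```python
import json
import re
from typing import List, Optional, Tuple

# An opening tag, its body, and the matching closing tag (\1 back-references the tag name).
SEGMENT = re.compile(r"<(think|tool_call|tool_response|answer)>\s*(.*?)\s*</\1>", re.S)


def parse_trajectory(text: str) -> List[Tuple[str, str]]:
    """Return (tag, body) pairs in the order they appear in the transcript."""
    return [(m.group(1), m.group(2)) for m in SEGMENT.finditer(text)]


def tool_queries(text: str) -> List[str]:
    """Collect query_list entries from well-formed JSON search calls; skip malformed ones."""
    queries: List[str] = []
    for tag, body in parse_trajectory(text):
        if tag != "tool_call":
            continue
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and payload.get("name") == "search":
            args = payload.get("arguments") or {}
            if isinstance(args, dict):
                queries.extend(args.get("query_list", []))
    return queries


def final_answer(text: str) -> Optional[str]:
    """Body of the last <answer> block, or None if the model never answered."""
    answers = [body for tag, body in parse_trajectory(text) if tag == "answer"]
    return answers[-1] if answers else None


sample = """<think> I need the capital of South Korea. </think>
<tool_call>{"name":"search", "arguments":{"query_list":["capital of South Korea"]}}</tool_call>
<tool_response>
1. Seoul is the capital of South Korea.
</tool_response>
<answer> Seoul </answer>"""

if __name__ == "__main__":
    print(tool_queries(sample), final_answer(sample))
```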

📊 Evaluation (Pass@1)

Dataset	Original Search‑R1 (Qwen2.5‑3B)	This work
NQ	0.397	0.406
TriviaQA	0.565	0.582
PopQA	0.391	0.420
HotpotQA	0.331	0.338
2Wiki	0.310	0.332
Musique	0.124	0.111
Bamboogle	0.232	0.296
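The short script below summarises the table by computing the unweighted mean Pass@1 per column and the per-dataset change. "Unweighted" means each dataset counts equally regardless of its size. The numbers are copied from the table above, and SCORES, macro_average and deltas are illustrative names.

```python
from typing import Dict, Tuple

# Pass@1 from the table: dataset -> (original Search-R1, this work).
SCORES: Dict[str, Tuple[float, float]] = {
    "NQ": (0.397, 0.406),
    "TriviaQA": (0.565, 0.582),
    "PopQA": (0.391, 0.420),
    "HotpotQA": (0.331, 0.338),
    "2Wiki": (0.310, 0.332),
    "Musique": (0.124, 0.111),
    "Bamboogle": (0.232, 0.296),
}


def macro_average(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Unweighted mean over datasets for each column."""
    n = len(scores)
    base = sum(b for b, _ in scores.values()) / n
    ours = sum(o for _, o in scores.values()) / n
    return base, ours


def deltas(scores: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Per-dataset change (this work minus original), rounded to 3 decimals."""
    return {name: round(o - b, 3) for name, (b, o) in scores.items()}


if __name__ == "__main__":
    base, ours = macro_average(SCORES)
    print(f"original {base:.3f}  this work {ours:.3f}")
    for name, d in deltas(SCORES).items():
        print(f"{name:10s} {d:+.3f}")
```

The averages come out to about 0.336 for the original and 0.355 for this work. Six of the seven datasets improve, Bamboogle gains the most (+0.064), and Musique is the only regression (-0.013).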

🤝 Acknowledgements

Qwen LM for the base model.
Search‑R1 authors for the dataset & baseline.
Volcengine VERL for the GRPO training toolkit.
HuggingFace for the open ecosystem.

📄 License & citation

Code is released under the MIT License; model weights remain under the original Qwen open‑source license.

@misc{shin2025qwen25_searchr1_multiturn,
  author       = {Seungyoun Shin},
  title        = {Qwen2.5-3B Search-R1-Multiturn (reproduce)},
  year         = 2025,
  howpublished = {HuggingFace Model Hub},
  url          = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
}
