Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)
Author · Seungyoun Shin
🤗 Model Hub: hf
📈 W&B Report: wandb
A faithful re‑implementation of Search‑R1 on Qwen2.5‑3B‑Instruct, trained purely on nq-hotpotqa-train
with GRPO via the open‑source VERL framework.
🚀 Quick start
pip install "transformers>=4.41" torch "duckduckgo_search>=6.3.5" accelerate
Full inference script
Below is the exact script used in our experiments. Drop it next to the model weights and run it.
#!/usr/bin/env python3
"""
Minimal **multi‑turn tool‑calling** demo for the Qwen2.5‑3B Search‑R1 model
Highlights
-----------
* Presents the `search` function schema via `tools=[…]` so the model emits JSON calls.
* Detects `<tool_call>` → parses `{name:"search", arguments:{query_list:[…]}}` and runs DuckDuckGo.
* Streams results back in `<tool_response>` until an `<answer>` block appears.
"""
from __future__ import annotations
import json, re, sys
from typing import List
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from duckduckgo_search import DDGS
DEFAULT_SYSTEM_CONTENT = "You are a helpful and harmless assistant."
DEFAULT_USER_CONTENT_PREFIX = (
    "Answer the given question. You must conduct reasoning inside <think> and "
    "</think> first every time you get new information. After reasoning, if you "
    "find you lack some knowledge, you can call a search engine by <tool_call> "
    "query </tool_call> and it will return the top searched results between "
    "<tool_response> and </tool_response>. You can search as many times as your "
    "want. If you find no further external knowledge needed, you can directly "
    "provide the answer inside <answer> and </answer>, without detailed "
    "illustrations. For example, <answer> Beijing </answer>. Question: "
)
MODEL_NAME = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
MAX_TURNS, MAX_RESPONSE_TOKENS = 4, 512
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
SEARCH_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "DuckDuckGo web search",
        "parameters": {
            "type": "object",
            "properties": {
                "query_list": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Fully‑formed semantic queries."
                }
            },
            "required": ["query_list"],
        },
    },
}
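def create_prompt(q: str) -> List[dict]:
    """Build the system + user messages for a single question."""
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_CONTENT},
        {"role": "user", "content": DEFAULT_USER_CONTENT_PREFIX + q},
    ]


def ddg_search(query: str, k: int = 5) -> str:
    """Return the top-k DuckDuckGo hits as a numbered, newline-separated list."""
    with DDGS() as ddgs:
        hits = list(ddgs.text(query, safesearch="moderate", max_results=k))
    return "\n".join(f"{i+1}. {h['title']} – {h['body']} ({h['href']})" for i, h in enumerate(hits))


def extract_queries(raw: str) -> List[str]:
    """Parse a JSON tool call; fall back to treating the raw text as one query."""
    try:
        payload = json.loads(raw)
        # Guard against valid JSON that is not an object (e.g. a bare list or string).
        if isinstance(payload, dict) and payload.get("name") == "search":
            return payload.get("arguments", {}).get("query_list", [])
    except json.JSONDecodeError:
        pass
    return [raw]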
def main() -> None:
    q = sys.argv[1] if len(sys.argv) > 1 else "How is the weather in Seoul?"
    tok = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto")
    msgs = create_prompt(q)
    # Render the chat template once; later turns are appended to this string.
    history = tok.apply_chat_template(msgs, tools=[SEARCH_SCHEMA], add_generation_prompt=True, tokenize=False)
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)
    for turn in range(MAX_TURNS):
        enc = tok(history, return_tensors="pt").to(DEVICE)
        out = model.generate(**enc, max_new_tokens=MAX_RESPONSE_TOKENS, temperature=0.7, do_sample=True)
        # Decode only the newly generated tokens, not the prompt.
        new = tok.decode(out[0][enc.input_ids.shape[1]:], skip_special_tokens=True)
        print(f"\n===== Assistant (turn {turn+1}) =====\n{new}\n")
        history += new
        m = pattern.search(new)
        if not m:
            break  # no tool call in this turn, so stop the loop
        # Run every requested query and feed the results back to the model.
        results = "\n---\n".join(ddg_search(query, 5) for query in extract_queries(m.group(1)))
        history += f"<tool_response>\n{results}\n</tool_response>"


if __name__ == "__main__":
    main()
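
The tool-call handling can be checked on its own, without loading the model or touching the network. The following is a minimal sketch and not part of the original script: `parse_tool_call` is a hypothetical helper name and the sample string is made up for illustration. It applies the same regular expression and JSON fallback that the script's docstring describes.

```python
import json
import re
from typing import List

# Same pattern as in the script: capture whatever sits between the tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)


def parse_tool_call(text: str) -> List[str]:
    """Hypothetical helper: return the search queries in the first <tool_call> block."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return []  # no tool call in this turn
    raw = match.group(1)
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return [raw]  # plain-text query rather than a JSON call
    if isinstance(payload, dict) and payload.get("name") == "search":
        return list(payload.get("arguments", {}).get("query_list", []))
    return [raw]


# Illustrative assistant output (not a real model generation).
sample = (
    "<think>I need current data.</think>\n"
    '<tool_call>{"name":"search", "arguments":{"query_list":["Seoul weather today"]}}</tool_call>'
)
print(parse_tool_call(sample))
```

An empty list signals that the turn contained no tool call, which is the condition under which the loop in the script stops.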
🧠 Reasoning style
<think> … chain‑of‑thought … </think>
<tool_call>{"name":"search", "arguments":{"query_list":["…"]}}</tool_call>
<tool_response>
1. web result
…
</tool_response>
<answer> final concise answer </answer>
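
A finished trace in this format can be post-processed with two regular expressions. The sketch below is an illustration only, assuming the tags are never nested and that the last `<answer>` block is the one that counts; `summarize_trace` and the example trace are hypothetical and do not come from the author's code.

```python
import re
from typing import Optional, Tuple

ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.S)
TOOL_CALL_RE = re.compile(r"<tool_call>.*?</tool_call>", re.S)


def summarize_trace(trace: str) -> Tuple[int, Optional[str]]:
    """Hypothetical helper: return (number of tool calls, final answer or None)."""
    answers = ANSWER_RE.findall(trace)
    n_calls = len(TOOL_CALL_RE.findall(trace))
    return n_calls, (answers[-1] if answers else None)


# Made-up trace that follows the layout shown above.
trace = (
    "<think> I should look this up. </think>\n"
    '<tool_call>{"name":"search", "arguments":{"query_list":["capital of China"]}}</tool_call>\n'
    "<tool_response>\n1. web result\n</tool_response>\n"
    "<think> The result is clear. </think>\n"
    "<answer> Beijing </answer>"
)
print(summarize_trace(trace))
```

Apply this to the generated text only. The user prompt in the script itself contains an example `<answer>` block, so parsing the full history of a run that never answers would pick up that example.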
📊 Evaluation (Pass@1)
| Dataset | Original Search‑R1 (Qwen2.5‑3B) | This work |
|---|---|---|
| NQ | 0.397 | 0.406 |
| TriviaQA | 0.565 | 0.582 |
| PopQA | 0.391 | 0.420 |
| HotpotQA | 0.331 | 0.338 |
| 2Wiki | 0.310 | 0.332 |
| Musique | 0.124 | 0.111 |
| Bamboogle | 0.232 | 0.296 |
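
For a single summary number, the table can be reduced to an unweighted mean per column. This is a sketch derived only from the values above; the averages are not figures reported by the author, and `macro_average` is a hypothetical helper name.

```python
# (baseline, this work) Pass@1 per dataset, copied from the table above.
scores = {
    "NQ": (0.397, 0.406),
    "TriviaQA": (0.565, 0.582),
    "PopQA": (0.391, 0.420),
    "HotpotQA": (0.331, 0.338),
    "2Wiki": (0.310, 0.332),
    "Musique": (0.124, 0.111),
    "Bamboogle": (0.232, 0.296),
}


def macro_average(column: int) -> float:
    """Unweighted mean over datasets for one column (0 = baseline, 1 = this work)."""
    return sum(row[column] for row in scores.values()) / len(scores)


baseline_avg = macro_average(0)
repro_avg = macro_average(1)
# Datasets where the reproduction scores higher than the baseline.
wins = [name for name, (base, ours) in scores.items() if ours > base]

print(f"baseline {baseline_avg:.3f} | this work {repro_avg:.3f}")
print("higher than baseline on:", ", ".join(wins))
```

On these numbers the reproduction is ahead on six of the seven datasets and behind only on Musique.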
🤝 Acknowledgements
- Qwen LM for the base model.
- Search‑R1 authors for the dataset & baseline.
- Volcengine VERL for the GRPO training toolkit.
- HuggingFace for the open ecosystem.
📄 License & citation
Code is released under MIT; model weights under the original Qwen open‑source license.
@misc{shin2025qwen25_searchr1_multiturn,
author = {Seungyoun Shin},
title = {Qwen2.5-3B Search-R1-Multiturn (reproduce)},
year = 2025,
howpublished = {HuggingFace Model Hub},
url = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
}