---
datasets:
- PeterJinGo/nq_hotpotqa_train
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# Qwen2.5‑3B Search‑R1‑Multiturn (reproduce)

> **Author · Seungyoun Shin**
>
> 🤗 Model Hub: [hf](https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn)
>
> 📈 W&B Report: [wandb](https://wandb.ai/yoon1001/search_r1_like_async_rl/reports/Qwen2-5-3b-it-search-r1-reproduce--VmlldzoxMzA2NzA2NA)

A faithful re‑implementation of *Search‑R1* on **Qwen2.5‑3B‑Instruct**, trained purely on `nq-hotpotqa-train` with GRPO via the open‑source [VERL](https://github.com/volcengine/verl) framework.

---

## 🚀 Quick start

```bash
pip install "transformers>=4.41" torch "duckduckgo_search>=6.3.5" accelerate
```

### Full inference script

Below is the exact script used in our experiments; drop it next to the model weights and run.

```python
#!/usr/bin/env python3
"""Minimal multi‑turn tool‑calling demo for the Qwen2.5‑3B Search‑R1 model.

Highlights
----------
* Presents the `search` function schema via `tools=[…]` so the model emits JSON tool calls.
* Detects `<tool_call>` → parses `{"name": "search", "arguments": {"query_list": […]}}` and runs DuckDuckGo.
* Streams results back inside `<tool_response>` tags until an `<answer>` block appears.
"""
from __future__ import annotations

import json
import re
import sys
from typing import List

import torch
from duckduckgo_search import DDGS
from transformers import AutoModelForCausalLM, AutoTokenizer

DEFAULT_SYSTEM_CONTENT = "You are a helpful and harmless assistant."

# Kept verbatim from the Search‑R1 training prompt (including the
# "as many times as your want" typo) so inference matches training.
DEFAULT_USER_CONTENT_PREFIX = (
    "Answer the given question. You must conduct reasoning inside <think> and "
    "</think> first every time you get new information. After reasoning, if you "
    "find you lack some knowledge, you can call a search engine by "
    "<search> query </search> and it will return the top searched results between "
    "<information> and </information>. You can search as many times as your "
    "want. If you find no further external knowledge needed, you can directly "
    "provide the answer inside <answer> and </answer>, without detailed "
    "illustrations. For example, <answer> Beijing </answer>. Question: "
)

MODEL_NAME = "Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn"
MAX_TURNS, MAX_RESPONSE_TOKENS = 4, 512
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

SEARCH_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "DuckDuckGo web search",
        "parameters": {
            "type": "object",
            "properties": {
                "query_list": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Fully‑formed semantic queries.",
                }
            },
            "required": ["query_list"],
        },
    },
}


def create_prompt(q: str) -> List[dict]:
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_CONTENT},
        {"role": "user", "content": DEFAULT_USER_CONTENT_PREFIX + q},
    ]


def ddg_search(query: str, k: int = 5) -> str:
    """Run a DuckDuckGo text search and format the top‑k hits."""
    with DDGS() as ddgs:
        hits = list(ddgs.text(query, safesearch="moderate", max_results=k))
    return "\n".join(
        f"{i + 1}. {h['title']} – {h['body']} ({h['href']})" for i, h in enumerate(hits)
    )


def extract_queries(raw: str) -> List[str]:
    """Parse a <tool_call> JSON payload; fall back to the raw text as one query."""
    try:
        payload = json.loads(raw)
        if payload.get("name") == "search":
            return payload.get("arguments", {}).get("query_list", [])
    except json.JSONDecodeError:
        pass
    return [raw]


def main() -> None:
    q = sys.argv[1] if len(sys.argv) > 1 else "How is the weather in Seoul?"

    tok = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
    )

    msgs = create_prompt(q)
    history = tok.apply_chat_template(
        msgs, tools=[SEARCH_SCHEMA], add_generation_prompt=True, tokenize=False
    )
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.S)

    for turn in range(MAX_TURNS):
        enc = tok(history, return_tensors="pt").to(DEVICE)
        out = model.generate(
            **enc,
            max_new_tokens=MAX_RESPONSE_TOKENS,
            temperature=0.7,
            do_sample=True,
        )
        new = tok.decode(out[0][enc.input_ids.shape[1]:], skip_special_tokens=True)
        print(f"\n===== Assistant (turn {turn + 1}) =====\n{new}\n")
        history += new

        m = pattern.search(new)
        if not m:  # no tool call → the model produced its final answer
            break

        # Run every requested query and feed the results back to the model.
        results = "\n---\n".join(ddg_search(q, 5) for q in extract_queries(m.group(1)))
        history += f"\n<tool_response>\n{results}\n</tool_response>\n"


if __name__ == "__main__":
    main()
```

---

## 🧠 Reasoning style

```
<think> … chain‑of‑thought … </think>
<tool_call>
{"name": "search", "arguments": {"query_list": ["…"]}}
</tool_call>
<tool_response>
1. web result …
</tool_response>
<answer> final concise answer </answer>
```
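To consume a transcript programmatically, pull the final `<answer>` block out of the generated text. A minimal sketch (the `extract_answer` helper below is illustrative, not part of the released script):

```python
from __future__ import annotations

import re


def extract_answer(transcript: str) -> str | None:
    """Return the contents of the last <answer>…</answer> block, if any."""
    matches = re.findall(r"<answer>\s*(.*?)\s*</answer>", transcript, re.S)
    return matches[-1] if matches else None


print(extract_answer("<think>…</think>\n<answer> Beijing </answer>"))  # -> Beijing
```

Taking the *last* match matters: the model may reason across several turns, and only the block emitted after its final search round is the answer.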
---

## 📊 Evaluation (Pass@1)

| Dataset   | Original Search‑R1 (Qwen2.5‑3B) | **This work** |
| --------- | ------------------------------- | ------------- |
| NQ        | 0.397                           | **0.406**     |
| TriviaQA  | 0.565                           | **0.582**     |
| PopQA     | 0.391                           | **0.420**     |
| HotpotQA  | 0.331                           | **0.338**     |
| 2Wiki     | 0.310                           | **0.332**     |
| Musique   | **0.124**                       | 0.111         |
| Bamboogle | 0.232                           | **0.296**     |

---

## 🤝 Acknowledgements

* [Qwen LM](https://github.com/QwenLM) for the base model.
* [Search‑R1 authors](https://github.com/PeterGriffinJin/Search-R1) for the dataset & baseline.
* [Volcengine **VERL**](https://github.com/volcengine/verl) for the GRPO training toolkit.
* Hugging Face for the open ecosystem.

---

## 📄 License & citation

Code is released under **MIT**; model weights are under the original **Qwen open‑source license**.

```bibtex
@misc{shin2025qwen25_searchr1_multiturn,
  author       = {Seungyoun Shin},
  title        = {Qwen2.5-3B Search-R1-Multiturn (reproduce)},
  year         = 2025,
  howpublished = {HuggingFace Model Hub},
  url          = {https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn}
}
```
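---

## 🧪 Scoring note

Pass@1 in the table above is typically computed as exact match of a single sampled answer against the gold answers. A minimal sketch of such a scorer (the normalisation rules here follow common QA‑evaluation conventions and are an assumption, not the exact script behind the table):

```python
from __future__ import annotations

import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop articles and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def pass_at_1(prediction: str, gold_answers: list[str]) -> bool:
    """True if the single prediction exactly matches any gold answer."""
    return normalize(prediction) in {normalize(g) for g in gold_answers}


print(pass_at_1("The Beijing.", ["Beijing"]))  # -> True
```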