ZhuofengLi/octo-search-qwen2.5-7b-grpo-155-step-v1


## Training Details
+ Dataset: nq_search
+ Training Curve: https://wandb.ai/1004271927-SHU/search_r1_qa_em/runs/k1ilsvfj?nw=nwuser1004271927

## Prompt Format
```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with this model repository.
tokenizer = AutoTokenizer.from_pretrained(
    "ZhuofengLi/octo-search-qwen2.5-7b-grpo-155-step-v1"
)

# Instruction template; {question} is filled in with the user's question.
question = (
    'Answer the given question by calling the Search tool. '
    'You must perform your reasoning within <think> and </think> before each tool call. '
    'After reasoning, call the Search tool (described as: a tool that performs web search based on a given text query) '
    'by passing the query inside <tool_query>...</tool_query>. '
    'The tool will return its result between <tool_result> and </tool_result>. '
    'You may call the tool as many times as needed. '
    'If no further tool calls are required, provide the final answer directly within <answer>...</answer>, '
    'without additional explanation. For example: <answer> Beijing </answer>. '
    'Question: {question}'
)

user_question = "What is the capital of France?"

formatted_question = question.format(question=user_question)

messages = [{"role": "user", "content": formatted_question}]

# Render the chat template to a plain string (no tokenization) so it can be inspected.
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

print(prompt)
```
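
The prompt above implies a multi-turn loop: the model emits a `<tool_query>`, the caller runs a search and appends the result inside `<tool_result>` tags, and generation resumes until an `<answer>` appears. The following is a minimal sketch of that loop, not the pipeline used in training. The names `run_episode`, `generate`, and `search` are hypothetical: `generate` stands in for whatever call produces the model's next completion, and `search` for whatever retrieval backend is available. The turn limit and the choice to parse only the latest completion are assumptions as well.

```python
import re

# Tag patterns taken from the prompt format above.
TOOL_QUERY_RE = re.compile(r"<tool_query>(.*?)</tool_query>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_tool_query(text):
    """Return the last search query found in text, or None if there is none."""
    matches = TOOL_QUERY_RE.findall(text)
    return matches[-1].strip() if matches else None


def extract_answer(text):
    """Return the last final answer found in text, or None if there is none."""
    matches = ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None


def wrap_tool_result(result):
    """Wrap a search result in the tags the prompt tells the model to expect."""
    return f"<tool_result>{result}</tool_result>"


def run_episode(generate, search, prompt, max_turns=4):
    """Alternate between generation and search until an answer appears.

    generate and search are hypothetical callables (str -> str) supplied by
    the caller; max_turns is an assumed safety limit, not a documented value.
    """
    transcript = prompt
    for _ in range(max_turns):
        completion = generate(transcript)
        transcript += completion
        # Parse only the new completion: the prompt itself contains example tags.
        answer = extract_answer(completion)
        if answer is not None:
            return answer
        query = extract_tool_query(completion)
        if query is None:
            return None  # neither a tool call nor an answer was produced
        transcript += wrap_tool_result(search(query))
    return None


# Usage with scripted stand-ins for the model and the search backend.
scripted = iter([
    "<think>I should look this up.</think><tool_query>capital of France</tool_query>",
    "<think>The result says Paris.</think><answer> Paris </answer>",
])
result = run_episode(
    generate=lambda _transcript: next(scripted),
    search=lambda query: f"stub result for: {query}",
    prompt="Question: What is the capital of France?",
)
print(result)  # prints "Paris"
```

In a real setup, `generate` would typically stop decoding once `</tool_query>` or `</answer>` is produced so that the model does not write its own `<tool_result>`; how to configure that depends on the inference stack in use.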