callanwu committed
Commit 396282a · verified · 1 Parent(s): f8de16f

Update README.md

  1. README.md +3 -0
README.md CHANGED
@@ -3,6 +3,9 @@ license: mit
  base_model:
  - Qwen/QwQ-32B
  ---
+
+ You can download the model and then run the inference scripts from https://github.com/Alibaba-NLP/WebAgent.
+
  - A native agentic search and reasoning model built on the ReAct framework, aimed at autonomous information seeking and Deep Research-like capabilities.
  - We introduce a four-stage training paradigm comprising browsing data construction, trajectory sampling, supervised fine-tuning for effective cold start, and reinforcement learning for improved generalization, enabling the agent to autonomously acquire search and reasoning skills.
  - Our data-centric approach integrates trajectory-level supervised fine-tuning and reinforcement learning (DAPO) to develop a scalable pipeline for training agentic systems via SFT or RL.