|
--- |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
license: apache-2.0 |
|
language: |
|
- en |
|
base_model: |
|
- miromind-ai/MiroThinker-14B-SFT-v0.1 |
|
tags: |
|
- agent |
|
- open-source |
|
- miromind |
|
--- |
|
|
|
<div align="center"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/68525b342230a897a65cc1c0/87mYQ_a-4jpnMkVR4hrgm.png" width="55%" alt="MiroThinker" /> |
|
</div> |
|
<!-- <hr> --> |
|
<div align="center"> |
|
|
|
[Demo](https://dr.miromind.ai/)

[Models](https://huggingface.co/collections/miromind-ai/mirothinker-v01-689301b6d0563321862d44a1)

[Dataset](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1)

[Blog](https://miromind.ai/blog/miromind-open-deep-research)


[GitHub](https://github.com/MiroMindAI/MiroThinker)

[Discord](https://discord.com/invite/GPqEnkzQZd)

[WeChat](https://cdn-uploads.huggingface.co/production/uploads/68525b342230a897a65cc1c0/SGK70isvVpeJwk_fny9sb.png)

[RedNote](https://www.xiaohongshu.com/user/profile/663098830000000003033edc)

[Website](https://miromind.ai/)
|
|
|
</div> |
|
|
|
## Introduction |
|
|
|
MiroThinker is an open-source agentic model series built on top of Qwen3. Designed for deep research and complex, long-horizon problem solving, it integrates strong capabilities in task decomposition, multi-hop reasoning, retrieval-augmented generation, code execution, web browsing, and document/file processing, making it suitable for a wide range of real-world applications. |
|
|
|
We have released the MiroThinker-v0.1 series, including both SFT and DPO variants at parameter scales of 8B, 14B, and 32B. Notably, MiroThinker v0.1 achieves state-of-the-art performance among open-source models on the [GAIA benchmark](https://huggingface.co/datasets/gaia-benchmark/GAIA), a rigorous evaluation suite for advanced agentic capabilities, demonstrating its strength in long-context, decision-intensive, and real-world task scenarios. |
|
|
|
## Online Demo |
|
|
|
Welcome to try our online demo [here](https://dr.miromind.ai/). The demo runs [MiroThinker-32B-DPO-v0.1](https://huggingface.co/miromind-ai/MiroThinker-32B-DPO-v0.1) together with commercial tools (see our [GitHub](https://github.com/MiroMindAI/MiroThinker) for details) to deliver the best possible experience.
|
|
|
## Performance |
|
|
|
### GAIA Benchmark |
|
|
|
| **Method** | Text-103<br>Best Pass@1 | Text-103<br>Pass@1 (Avg@8) | Val-165<br>Best Pass@1 | Val-165<br>Pass@1 (Avg@8) | |
|
| ----------------------------------------------------------------- | :--: | :--: | :--: | :--: | |
|
| Search-o1-7B | 17.5 | - | - | - | |
|
| R1-Searcher-7B | 20.4 | - | - | - | |
|
| WebDancer-7B | 31.0 | - | - | - | |
|
| WebSailor-7B | 37.9 | - | - | - | |
|
| CK-Pro-8B | 40.3 | - | 32.7 | - | |
|
| MiroThinker-8B-SFT-v0.1 | 44.7 | 40.1 | 34.6 | 31.8 | |
|
| + Commercial Tools | 46.6 | 42.1 | 37.6 | 33.9 | |
|
| MiroThinker-8B-DPO-v0.1 | 46.6 | 44.8 | 37.0 | 35.4 | |
|
| + Commercial Tools | 50.5 | 46.7 | 38.2 | 35.9 | |
|
| | | | | | |
|
| Search-o1-32B | 28.2 | - | - | - | |
|
| WebThinker-32B-RL | 48.5 | - | - | - | |
|
| WebDancer-QwQ-32B | 51.5 | - | - | - | |
|
| WebSailor-32B | 53.2 | - | - | - | |
|
| WebShaper-QwQ-32B | 53.3 | - | - | - | |
|
| WebShaper-72B | 60.1 | - | - | - | |
|
| MiroThinker-14B-SFT-v0.1 | 47.6 | 44.4 | 37.0 | 34.4 | |
|
| + Commercial Tools | 49.5 | 47.5 | 41.8 | 39.8 | |
|
| MiroThinker-14B-DPO-v0.1 | 48.5 | 46.6 | 42.4 | 39.2 | |
|
| + Commercial Tools | 52.4 | 48.5 | 45.5 | 42.0 | |
|
| MiroThinker-32B-SFT-v0.1 | 55.3 | 51.3 | 44.9 | 42.7 | |
|
| + Commercial Tools | 58.3 | 54.2 | 48.5 | 45.8 | |
|
| <span style="white-space:nowrap;">MiroThinker-32B-DPO-v0.1</span> | 57.3 | 54.1 | 48.5 | 45.9 | |
|
| + Commercial Tools | **60.2** | **57.9** | **50.9** | **48.9** | |
|
|
|
1. Following the practices of WebThinker, WebAgents, and CognitiveKernel, we report Best Pass@1, the highest score across three runs; this often reflects stronger performance but can be variable across runs. As a more stable measure, we additionally report Pass@1 (Avg@8), the average over eight runs, which is more consistent at the cost of slightly lower scores.
|
|
|
2. For consistency with prior open-source works, we evaluate GAIA-Text-103 using the WebAgents LLM-as-judge template, and report results on GAIA-Val-165 using the official GAIA scorer script. |
|
|
|
3. By default, we use open-source tools wherever possible, except for the code tool [E2B](https://github.com/e2b-dev/E2B) and the Google search tool [Serper](https://serper.dev/). We use [Whisper](https://huggingface.co/openai/whisper-large-v3-turbo), [Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct), and [Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507) in our implementation. The framework can be easily extended to other open-source tools of your choice. |
|
|
|
4. Commercial tools were mainly used for multimodal capabilities and certain complex reasoning subtasks. The majority of tasks, including planning, browsing, refinement, navigation, and more, were handled by our models. |
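The two metrics in the table above can be sketched as follows; the helper names and run scores here are illustrative, not taken from the MiroThinker evaluation code.

```python
def best_pass_at_1(run_scores):
    """Best Pass@1: the highest score across independent runs (three in the table)."""
    return max(run_scores)

def avg_pass_at_1(run_scores):
    """Pass@1 (Avg@8): the mean score across runs (eight in the table)."""
    return sum(run_scores) / len(run_scores)

# Hypothetical per-run scores for one model/benchmark pair.
three_runs = [54.1, 57.3, 55.0]
eight_runs = [54.1, 57.3, 55.0, 53.2, 54.8, 52.9, 54.6, 50.9]

print(best_pass_at_1(three_runs))              # 57.3
print(round(avg_pass_at_1(eight_runs), 1))     # 54.1
```

As the example shows, Best Pass@1 is at least as high as the average, which is why the Avg@8 columns are slightly lower throughout.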
|
|
|
### More Benchmarks |
|
|
|
| Method | HLE<br>Pass@1 | Frames<br>Pass@1 | BrowseComp<br>Pass@1 | <span style="white-space:nowrap;">BrowseComp-ZH</span><br>Pass@1 | WebWalkerQA<br>Pass@1 | |
|
|-------------------------------------------------------------------|:-------------:|:----------------:|:--------------------:|:----------------------------------------------------------------:|:---------------------:| |
|
| OpenAI Deep Research | 26.6 | - | 51.5 | 42.9 | - | |
|
| Gemini Deep Research | 26.9 | - | - | - | - | |
|
| Kimi-Researcher | 26.9 | 78.8 | - | - | - | |
|
| | | | | | | |
|
| WebDancer-7B | - | - | - | - | 36.0 | |
|
| WebSailor-7B | - | - | 6.7 | 14.2 | - | |
|
| MiroThinker-8B-SFT-v0.1 | - | 58.0 | 5.5 | 9.3 | 41.3 | |
|
| MiroThinker-8B-DPO-v0.1 | - | 64.4 | 8.7 | 13.6 | 45.7 | |
|
| | | | | | | |
|
| WebThinker-32B-RL | - | - | - | - | 46.5 | |
|
| WebDancer-QwQ-32B | - | - | 3.8 | 18.0 | 47.9 | |
|
| WebSailor-32B | - | - | 10.5 | 25.5 | - | |
|
| WebShaper-32B | - | - | - | - | 51.4 | |
|
| MiroThinker-32B-SFT-v0.1 | 10.2 | 70.4 | 10.6 | 13.8 | 45.7 | |
|
| <span style="white-space:nowrap;">MiroThinker-32B-DPO-v0.1</span> | 11.8 | 71.7 | 13.0 | 17.0 | 49.3 | |
|
|
|
1. MiroThinker’s performance was tested with [this repository](https://github.com/MiroMindAI/MiroThinker) and open-source tools; other models’ results are from their papers and official sites. |
|
|
|
2. As [MiroVerse-v0.1](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1) mainly contains English data, the model’s Chinese capability is limited. We plan to add more Chinese data in the next version. |
|
|
|
## Quick Start |
|
|
|
MiroThinker-v0.1 is trained on our large-scale, high-quality trajectory and preference dataset [MiroVerse-v0.1](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1) using the efficient training framework [MiroTrain](https://github.com/MiroMindAI/MiroTrain), and is equipped with tool-use capabilities through our agentic framework [MiroFlow](https://github.com/MiroMindAI/MiroFlow).
|
|
|
To promote reproducibility and benefit the community, we have open-sourced the entire suite above. For technical details, evaluation results, and usage tutorials, please visit our [GitHub repository](https://github.com/MiroMindAI/MiroThinker).
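For plain (tool-free) chat, the model can be loaded with the `transformers` library as in the sketch below; the checkpoint id, dtype, and generation settings are illustrative, and the full agentic setup with tools lives in the GitHub repository.

```python
# Minimal chat sketch with Hugging Face transformers (no agentic tools).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "miromind-ai/MiroThinker-14B-DPO-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread weights across available devices
)

messages = [{"role": "user", "content": "Summarize the GAIA benchmark in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that this bypasses the MiroFlow tool loop entirely; for deep-research tasks with search, browsing, and code execution, use the agentic framework instead.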
|
|
|
## License |
|
|
|
MiroThinker-v0.1 is licensed under Apache 2.0. |
|
|
|
## Contact Us |
|
|
|
MiroThinker is developed by the MiroMind Foundation Model Team.

If you would like to leave us a message, feel free to get in touch

via [GitHub](https://github.com/MiroMindAI/),

[Discord](https://discord.com/invite/GPqEnkzQZd),

[WeChat](https://cdn-uploads.huggingface.co/production/uploads/68525b342230a897a65cc1c0/SGK70isvVpeJwk_fny9sb.png),

or [RedNote](https://www.xiaohongshu.com/user/profile/663098830000000003033edc),

or reach us via email at [email protected].