---
license: apache-2.0
---

# Klear-Qwen3-Thinking-Preview

## 1. Introduction

Improving the reasoning capabilities of large language models (LLMs) has recently attracted significant attention in the AI community. The current paradigm for developing strong reasoning models typically involves a two-stage approach: supervised fine-tuning (SFT) on distilled data, followed by reinforcement learning (RL). While the open-source community has flourished with increasingly available open-source datasets, many critical training details remain unclear. In this study, we present a comprehensive, open-source pipeline for training a high-performance reasoning model, named `Klear-Qwen3-Thinking`, starting from `Qwen3-8B-Base`. We balance training stability and exploratory behavior in RL through multiple strategies. `Klear-Qwen3-Thinking-Preview` achieves 76.4% on AIME 2025 and 63.9% on LiveCodeBench V5, improving by +13.7% and +8.8% over its SFT baseline, respectively. Notably, `Klear-Qwen3-Thinking-Preview` outperforms `Qwen3-8B` (Thinking mode) and achieves performance competitive with `DeepSeek-R1-0528-Qwen3-8B` on math and coding, without distilling from DeepSeek-R1-0528.

👨‍💻 [Github](https://github.com/Kwai-Klear/Klear-Qwen3-Thinking-Preview), 🤗 [HF Model](https://huggingface.co/Kwai-Klear/Klear-Qwen3-Thinking-Preview), 🤗 [HF Dataset](https://huggingface.co/datasets/Kwai-Klear/Klear-Qwen3-Thinking-Preview), 📖 Tech Report (coming soon), 🔎 [Evaluation results](https://west-mask-4fa.notion.site/Klear-Qwen3-Thinking-Preview-23aab5f955ec8063b115da7d59dd9427?pvs=143)
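The released checkpoint can be queried like other Qwen3-family chat models. The snippet below is a minimal inference sketch, assuming the standard Hugging Face `transformers` interface and a Qwen3-style chat template; the prompt and generation settings are illustrative only and are not part of the official pipeline.

```python
# Minimal inference sketch (assumption: standard Qwen3-style chat template
# exposed via Hugging Face transformers).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Kwai-Klear/Klear-Qwen3-Thinking-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on available devices automatically
)

# Illustrative math prompt; a thinking model emits its reasoning trace
# before the final answer.
messages = [{"role": "user", "content": "Solve x^2 - 5x + 6 = 0."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# A generous max_new_tokens leaves room for the chain of thought.
output_ids = model.generate(**inputs, max_new_tokens=8192)
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(response)
```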

## 2. Evaluation Results

Performance in comparison with SOTA models on AIME 24 & 25 and LiveCodeBench v5. Klear-SFT and Klear-Preview refer to Klear-Qwen3-Thinking-SFT and Klear-Qwen3-Thinking-Preview, respectively. Among 7B and 8B models, we outperform [AceReason-Nemotron-1.1-7B](https://arxiv.org/pdf/2506.13284) (AceReason) and [Qwen3-8B](https://arxiv.org/pdf/2505.09388). Although we do not use the [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) dataset, we achieve results comparable to [DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B). We also show clear advantages over larger models such as [Qwen3-32B](https://arxiv.org/pdf/2505.09388) and [DeepSeek-R1 (0120)](https://huggingface.co/deepseek-ai/DeepSeek-R1).

## 3. Citation

```latex
@misc{Klear-thinking,
  title  = {Klear-Qwen3-Thinking-Preview},
  url    = {https://west-mask-4fa.notion.site/Klear-Qwen3-Thinking-Preview-23aab5f955ec8063b115da7d59dd9427},
  author = {Zhang, Jingyuan and Fu, Kai and Yue, Yang and Sun, Chenxi and Zhang, Hongzhi and Liu, Yahui and Ji, Xingguang and Fu, Jia and Zhang, Tinghai and Li, Yan and Wang, Qi and Zhang, Fuzheng and Zhou, Guorui and Gai, Kun},
  year   = {2025}
}
```