---
license: apache-2.0
---

# Klear-Qwen3-Thinking-Preview

## 1. Introduction

Improving the reasoning capabilities of large language models (LLMs) has recently attracted significant attention in the AI community. The current paradigm for developing strong reasoning models typically involves a two-stage approach: supervised fine-tuning (SFT) on distilled data, followed by reinforcement learning (RL). While open-source datasets for both stages are increasingly available, many critical training details remain unclear.

In this study, we present a comprehensive, open-source pipeline for training a high-performance reasoning model, named `Klear-Qwen3-Thinking`, starting from `Qwen3-8B-Base`. We balance training stability and exploratory behavior in RL through multiple strategies. `Klear-Qwen3-Thinking-Preview` achieves 76.4% on AIME 2025 and 63.9% on LiveCodeBench V5, improving over its SFT baseline by +13.7% and +8.8%, respectively. Notably, `Klear-Qwen3-Thinking-Preview` outperforms `Qwen3-8B` (thinking mode) and is competitive with `DeepSeek-R1-0528-Qwen3-8B` on math and coding, without distilling from DeepSeek-R1-0528.

👨💻 [Github](https://github.com/Kwai-Klear/Klear-Qwen3-Thinking-Preview), 🤗 [HF Model](https://huggingface.co/Kwai-Klear/Klear-Qwen3-Thinking-Preview), 🤗 [HF Dataset](https://huggingface.co/datasets/Kwai-Klear/Klear-Qwen3-Thinking-Preview), 📖 Tech Report (coming soon), 🔎 [Evaluation results](https://west-mask-4fa.notion.site/Klear-Qwen3-Thinking-Preview-23aab5f955ec8063b115da7d59dd9427?pvs=143)
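Below is a minimal usage sketch for loading the released checkpoint with Hugging Face `transformers`, assuming the model follows the standard Qwen3-style causal-LM and chat-template interface; the prompt and generation settings are illustrative, not the authors' recommended configuration:

```python
# Minimal sketch: run Klear-Qwen3-Thinking-Preview via transformers.
# Assumes the checkpoint exposes a standard Qwen3-style chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kwai-Klear/Klear-Qwen3-Thinking-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Thinking models emit a long reasoning trace before the final answer,
# so allow a generous generation budget.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## 2. Evaluation Results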