
Klear-Qwen3-Thinking-Preview

1. Introduction

Improving the reasoning capabilities of large language models (LLMs) has recently attracted significant attention in the AI community. The current paradigm for developing strong reasoning models typically involves a two-stage approach: supervised fine-tuning (SFT) with distilled data, followed by reinforcement learning (RL). While the open-source community has flourished and open-source datasets for this purpose are increasingly available, many critical training details remain undisclosed.

In this study, we present a comprehensive and open-source pipeline for training a high-performance reasoning model, named Klear-Qwen3-Thinking, starting from Qwen3-8B-Base. We balance training stability and exploratory behavior in RL through multiple strategies. Klear-Qwen3-Thinking-Preview achieves 76.4% on AIME 2025 and 63.9% on LiveCodeBench V5, improving by +13.7% and +8.8% over its SFT baseline, respectively. Notably, Klear-Qwen3-Thinking-Preview outperforms Qwen3-8B (Thinking mode) and achieves performance competitive with DeepSeek-R1-0528-Qwen3-8B on math and coding, without distilling from DeepSeek-R1-0528.

👨‍💻 Github, 🤗 HF Model, 🤗 HF Dataset, 📖 Tech Report (coming soon), 🔎 Evaluation results
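
Below is a minimal inference sketch using 🤗 Transformers, assuming the checkpoint `Kwai-Klear/Klear-Qwen3-Thinking-Preview` on the Hub; the sampling settings and token budget are illustrative assumptions rather than official recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID taken from this card; generation settings below are assumptions.
model_id = "Kwai-Klear/Klear-Qwen3-Thinking-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the released weights are stored in BF16
    device_map="auto",
)

messages = [
    {"role": "user", "content": "How many positive integers less than 100 are divisible by 3 or 5?"}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# A large max_new_tokens leaves room for the thinking trace before the final answer.
outputs = model.generate(
    **inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```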

2. Evaluation Results

Performance comparison with SOTA models on AIME 24 & 25 and LiveCodeBench v5. Klear-SFT and Klear-Preview refer to Klear-Qwen3-Thinking-SFT and Klear-Qwen3-Thinking-Preview, respectively. Among 7B and 8B models, Klear-Preview outperforms AceReason-Nemotron-1.1-7B (AceReason) and Qwen3-8B. Although we do not use the DeepSeek-R1-0528 dataset, we achieve results comparable to DeepSeek-R1-0528-Qwen3-8B. Klear-Preview also shows clear advantages over larger models such as Qwen3-32B and DeepSeek-R1 (0120).

3. Citation

@misc{Klear-thinking,
    title = {Klear-Qwen3-Thinking-Preview},
    url = {https://west-mask-4fa.notion.site/Klear-Qwen3-Thinking-Preview-23aab5f955ec8063b115da7d59dd9427},
    author = {Zhang, Jingyuan and Fu, Kai and Yue, Yang and Sun, Chenxi and Zhang, Hongzhi and Liu, Yahui and Ji, Xingguang and Fu, Jia and Zhang, Tinghai and Li, Yan and Wang, Qi and Zhang, Fuzheng and Zhou, Guorui and Gai, Kun},
    year = {2025}
}