Update README.md
README.md
CHANGED
@@ -16,6 +16,7 @@ In this study, we present a comprehensive and open-source pipeline for training
<p align="center">
<img width="100%" src="https://west-mask-4fa.notion.site/image/attachment%3A49aa5b9e-0fbc-49aa-b3e2-eddea14e6c47%3Abenchmark_comparison_panels.png?table=block&id=23bab5f9-55ec-80b0-a536-e347209ebde5&spaceId=ac3ab5f9-55ec-815c-b1fd-0003d8804c06&width=1420&userId=&cache=v2">
</p>
+
Performance comparison with SOTA models on AIME 24&25 and LiveCodeBench v5. Klear-SFT and Klear-Preview refer to Klear-Qwen3-Thinking-SFT and Klear-Qwen3-Thinking-Preview, respectively. Among 7B and 8B models, we outperform [AceReason-Nemotron-1.1-7B](https://arxiv.org/pdf/2506.13284) (AceReason) and [Qwen3-8B](https://arxiv.org/pdf/2505.09388). Although we do not use the [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) dataset, we achieve results comparable to [DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B). Additionally, we demonstrate significant advantages over larger models such as [Qwen3-32B](https://arxiv.org/pdf/2505.09388) and [DeepSeek-R1 (0120)](https://huggingface.co/deepseek-ai/DeepSeek-R1).