Klear

🔥News

  • 2025.09.05: We’ve released the Klear-46B-A2.5B series, which currently includes a base model and an instruction-tuned model with DPO. A reasoning-enhanced variant is also in training — stay tuned for upcoming updates!

1. Introduction

Klear-46B-A2.5B is a sparse Mixture-of-Experts (MoE) large language model developed by the Kwai-Klear Team at Kuaishou, designed to deliver both high performance and inference efficiency. Each MoE layer contains 256 routed experts plus 1 shared expert, and only 8 routed experts together with the shared expert are activated per token during the forward pass, giving 46 billion total parameters but just 2.5 billion active, and achieving dense-level performance at a fraction of the computational cost.
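For intuition, the routing pattern described above can be sketched in a few lines of PyTorch. The toy layer below is only illustrative, not the actual Klear implementation (which ships with the checkpoint and is loaded via trust_remote_code): each token's router scores select 8 of the 256 routed experts, their outputs are combined with renormalized routing weights, and the shared expert processes every token. The two-matrix expert FFN and the softmax-then-renormalize router are simplifying assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoELayer(nn.Module):
    """Toy top-k MoE layer: 8 of 256 routed experts plus 1 always-on shared expert."""

    def __init__(self, hidden_size=2048, expert_size=896, num_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            self._make_expert(hidden_size, expert_size) for _ in range(num_experts)
        )
        self.shared_expert = self._make_expert(hidden_size, expert_size)

    @staticmethod
    def _make_expert(hidden_size, expert_size):
        # Simplified two-matrix FFN; the real model likely uses a gated (SwiGLU-style) FFN.
        return nn.Sequential(
            nn.Linear(hidden_size, expert_size, bias=False),
            nn.SiLU(),
            nn.Linear(expert_size, hidden_size, bias=False),
        )

    def forward(self, x):  # x: [num_tokens, hidden_size]
        scores = F.softmax(self.router(x), dim=-1)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)    # pick 8 experts per token
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)     # renormalize routing weights
        out = self.shared_expert(x)                         # shared expert sees every token
        for tok in range(x.size(0)):                        # naive per-token dispatch, for clarity
            for w, idx in zip(top_w[tok], top_idx[tok]):
                out[tok] = out[tok] + w * self.experts[idx](x[tok])
        return out


with torch.no_grad():
    # Scaled-down instantiation so the demo runs instantly; the real configuration is
    # hidden_size=2048, expert_size=896, num_experts=256, top_k=8.
    layer = ToyMoELayer(hidden_size=64, expert_size=32, num_experts=16, top_k=4)
    print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])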

The model was trained on over 22 trillion tokens using a three-stage progressive curriculum:

1. Foundational Knowledge Learning (12T tokens): General-purpose datasets such as CommonCrawl were processed with stratified quality filters, following a curriculum learning strategy that progresses from lower to higher data quality.

2. Data Complexity Enhancement (8T tokens): The proportion of mathematical, coding, and STEM-related data was gradually increased to strengthen the model's reasoning and problem-solving capabilities.

3. Reasoning Enhancement and Long-Context Stage (2T tokens): Training focused on synthetic and reasoning-intensive data, combined with a fast learning-rate annealing strategy to maximize data efficiency and optimize final performance.

As a result, Klear-46B-A2.5B-Base matches or surpasses the performance of dense models with several times more active parameters, while offering significantly better efficiency and cost-effectiveness for real-world deployment.

Model Summary

The base and instruction-tuned (+ DPO) models share the following architecture:

key value
hidden_size 2048
moe_intermediate_size 896
n_shared_experts 1
num_attention_heads 32
num_experts 256
num_experts_per_tok 8
num_hidden_layers 32
num_key_value_heads 4
vocab_size 151936
tie_word_embeddings false
context length 65536
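As a rough cross-check of the 46B-total / 2.5B-active split, the expert parameters can be estimated directly from the table above. The sketch below assumes three projection matrices (gate/up/down) per expert FFN and ignores attention, embedding, and router weights, so the figures are approximate.

# Back-of-the-envelope parameter estimate from the configuration table above.
# Assumes 3 projection matrices per expert FFN; attention, embeddings and
# router weights are ignored, so the totals are approximate.
hidden_size = 2048
moe_intermediate_size = 896
num_experts = 256
num_experts_per_tok = 8
n_shared_experts = 1
num_hidden_layers = 32

params_per_expert = 3 * hidden_size * moe_intermediate_size  # gate/up/down (assumed), ~5.5M
total_expert_params = num_hidden_layers * num_experts * params_per_expert
active_expert_params = num_hidden_layers * (num_experts_per_tok + n_shared_experts) * params_per_expert

print(f"expert params (total):  {total_expert_params / 1e9:.1f}B")   # ~45.1B
print(f"expert params (active): {active_expert_params / 1e9:.2f}B")  # ~1.59B

Attention, the untied input/output embeddings (151936 x 2048 each), and the routers contribute roughly another 1B parameters, all of which are active for every token, which brings these estimates close to the reported 46B total and 2.5B active.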

Model Downloads

Model #Total Params #Activated Params Context Length Download Link
Klear-46B-A2.5B-Base 46B 2.5B 64K 🤗 Hugging Face
Klear-46B-A2.5B-Instruct 46B 2.5B 64K 🤗 Hugging Face
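To fetch a checkpoint programmatically, a snapshot_download call along the following lines should work. The Base repo id Kwai-Klear/Klear-46B-A2.5B-Base is the one this card belongs to; the Instruct repo id is assumed to follow the same naming pattern.

from huggingface_hub import snapshot_download

# Download the base model; swap in "Kwai-Klear/Klear-46B-A2.5B-Instruct"
# (assumed repo id) for the instruction-tuned checkpoint.
local_dir = snapshot_download(
    repo_id="Kwai-Klear/Klear-46B-A2.5B-Base",
    local_dir="/path/to/Klear-Base",  # matches the paths used in the examples below
)
print(local_dir)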

2. Benchmark Evaluation

Klear-46B-A2.5B-Base Evaluation Results

Ability Benchmark Klear-46B-A2.5B-Base MiMo-7B-Base Qwen3-8B-Base Qwen3-14B-Base Ling-lite-1.5-Base Qwen3-30B-A3B-Base
# Total Params 46B 7B 8B 14B 16.8B 30B
# Activated Params 2.5B 7B 8B 14B 2.75B 3B
Code HumanEval† (0-shot) 89 - 84.1 87.8 83.5 90.9
MBPP (3-shot) 76 69.2* 69 74 66.6 75.6
Math MATH (4-shot, cot) 55.7 38.8 60.8* 62.02* 59.9 59.04*
CMATH (3-shot) 87.83 78.5 88.3 90.7 85.7 89.7
GSM8K (4-shot, cot) 87.3 78.47 89.4 90.3 87.6 91.1
General MMLU-Pro (5-shot, cot) 57.6 43.1 55.2 58.1 49.9 58.8
MMLU (5-shot) 80.5 69.24 77.1 80.6 73.7 80.4
CEval (5-shot) 89.8 67.98 81.9 84.8 78.2 87.4
CMMLU (5-shot) 88 70.79 82 85.6 81.2 87.1
GPQA (0-shot) 35.3 31.03 33.9 35.7 30.1 35.5
AGIEval (0-shot) 52.3 48.3* 51.7 55.7 54.3 56
BBH (3-shot, cot) 77.9 75.6 78.1 80.1 75.4 81.2
HellaSwag (0-shot) 80.5 80* 78.7 81.5 80 81.2
TriviaQA (5-shot) 69.6 60.8* 56.3 62.1 60.9 65.6
NaturalQuestions (5-shot) 37.5 23.46 25.7 29.1 28 30.7
PIQA (0-shot) 81.6 80.14 79.5 81.9 82 80.7
OpenBookQA (0-shot) 37.8 34.2 35 35.6 38.2 34.6
Average 69.66 - 66.62 69.60 65.60 70.41

Note:

  1. Results marked with * are sourced from the corresponding models' public reports; all other results were obtained with our internal evaluation framework.
  2. During pretraining, we found that the HumanEval metric fluctuated significantly and was extremely sensitive to formatting. We therefore modified the original HumanEval prompt following the Ling-series paper; the results in the table use this modified setup.

Klear-46B-A2.5B-Instruct Evaluation Results

Ability Benchmark Klear-46B-A2.5B-Instruct InternLM3-8B-Instruct MiniCPM4-8B Qwen3-8B (NoThink) gemma3-12b-it Phi4-14B Qwen3-30B-A3B-2507
# Total Params 46B 8B 8B 8B 12B 14B 30B
# Activated Params 2.5B 8B 8B 8B 12B 14B 3B
General MMLU-Redux 81.95 74.65 77.63 79.32 78.39 83.09 88.11
MMLU-Pro 63.61 50.87 54.69 63.8 60.69 67.25 78.22
GPQA-Diamond 49.12 38.76 38.51 51.77 39.02 59.47 71.21
SimpleQA 6.2 4.44 3.51 5.5 6.22 3.28 23.39
CLUEWSC 88.49 77.63 81.91 82.89 91.12 88.16 92.11
CEval 85.98 84.26 81.78 81.66 60.81 64.79 88.57
C-SimpleQA 42.8 25.87 23.13 37.07 28.97 24.77 75.37
LiveBench 1125 50 26.3 25.5 52.1 43.1 40 68.4
Math MATH500 86.4 68.4 79.8 85 86.8 80.6 97.2
AIME24 28.33 11.25 22.92 28.33 23.96 15.83 75
AIME25 19.17 8.12 15.21 20.62 18.33 18.75 61.88
Code HumanEval 86.59 82.3* 78.05 83.54 82.32 85.37 81.71
HumanEval+ 79.27 - 73.17 76.83 75.61 83.54 76.83
MBPP (EvalPlus) 79.9 62.4 83.3 76.2 85.7 77.5 89.4
MBPP+ (EvalPlus) 68.8 50.4 71.7 66.1 74.1 66.7 75.1
LiveCodeBench v5 (2408-2501) 27.96 14.7 12.19 27.24 24.73 23.66 41.22
Alignment IFEval 81.89 79.3 73.01 84.47 81.52 59.33 83.92
Multi-IF (en+zh) 78.46 61.83 61.79 78.95 76.56 62.7 77.75
MTBench 8.42 7.86 6.875 8.21 8.68 8.62 9.33
MT-Eval 8.13 7.36 6.7 8.18 8.45 8.12 -
AlignBench v1.1 7 6.13 5.99 6.95 6.3 6.33 7.06
Average 53.74 - 46.54 52.61 50.54 48.95 -

Note:

  1. For InternLM3-8B-Instruct, results marked with * are sourced from its official website; all other evaluations were conducted with our internal evaluation framework.
  2. For Multi-IF, we report the overall average computed across all three rounds, pooling the Chinese and English metrics.

3. Quick Start

Inference with Hugging Face Transformers

You can run inference with Transformers starting from version 4.56.0; set trust_remote_code=True when loading the model.

Klear-46B-A2.5B-Base

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "/path/to/Klear-Base"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", dtype=torch.bfloat16, trust_remote_code=True)

# Prompt: "The largest lake in the world is"
text = "世界上最大的湖是"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=256)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Klear-46B-A2.5B-Instruct

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "/path/to/Klear-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", dtype=torch.bfloat16, trust_remote_code=True)

# Prompt: "Please write me a calculator program in Python."
messages = [
    {"role": "user", "content": "帮我用 python 写一个计算器的代码吧。"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=1024)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)

Inference with vLLM

vLLM is a high-throughput and memory-efficient inference framework. We provide our own fork of vLLM at https://github.com/Kwai-Klear/vllm.

git clone https://github.com/Kwai-Klear/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 pip install --editable .
vllm serve /path/to/Klear-Instruct --port 8000 --tensor-parallel-size 8 --trust-remote-code

An OpenAI-compatible API will be available at http://localhost:8000/v1.
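Once the server is running, any OpenAI-compatible client can query it. Below is a minimal sketch using the official openai Python package; by default vLLM serves the model under the name passed to vllm serve (here the local path), and the API key can be any placeholder string since the server above was started without authentication.

from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="/path/to/Klear-Instruct",  # the name vLLM serves the model under by default
    messages=[{"role": "user", "content": "Write a short calculator program in Python."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=1024,
)
print(response.choices[0].message.content)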

Alternatively, you can use the following Python script for offline inference:

import torch  # used for torch.cuda.device_count() below
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_path = "/path/to/Klear-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

llm = LLM(
    model=model_path,
    trust_remote_code=True,
    tensor_parallel_size=torch.cuda.device_count(),
    gpu_memory_utilization=0.7
)
# Prompt: "Please write me a calculator program in Python."
messages = [
    {"role": "user", "content": "帮我用 python 写一个计算器的代码吧。"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

sampling_params = SamplingParams(
    temperature=0.6, top_p=0.95, top_k=40, max_tokens=1024
)

outputs = llm.generate([prompt], sampling_params)

print(outputs[0].outputs[0].text)