This is the DeepSeek-R1-Distill-Qwen-32B model quantized to Int4 with the GPTQ Marlin algorithm (W4A16, group size 128). Recommended deployment is two RTX 4090 24GB GPUs, where inference reaches about 1700 tokens/s with an optimal concurrency of 128 simultaneous requests. Quality is comparable to the FP16 version (C-Eval score 82.3) while GPU memory usage is reduced by roughly 50%.
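Below is a minimal sketch of serving this checkpoint with vLLM across two GPUs, assuming the repository ID from this card; the context length and sampling settings are illustrative, not recommended values from the authors.

```python
# Sketch: loading the W4A16 GPTQ Marlin checkpoint with vLLM on 2x RTX 4090 (assumed setup).
from vllm import LLM, SamplingParams

llm = LLM(
    model="ExceedZhang/DeepSeek-R1-Distill-Qwen-32B-W4A16-G128",
    tensor_parallel_size=2,      # split weights across the two 24 GB GPUs
    quantization="gptq_marlin",  # use the Marlin GPTQ kernels
    max_model_len=8192,          # assumed context limit; tune to fit memory
)

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Briefly explain GPTQ weight quantization."], params)
print(outputs[0].outputs[0].text)
```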
Model: ExceedZhang/DeepSeek-R1-Distill-Qwen-32B-W4A16-G128
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B