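An Int4 version of DeepSeek-R1-Distill-Qwen-32B (DeepSeek-R1 distilled into Qwen2.5 32B), quantized with the Int4 GPTQ Marlin algorithm. Recommended setup: inference on two RTX 4090 24 GB GPUs, which reaches a throughput of 1700 tokens/s at the optimal concurrency of 128 simultaneous requests. Accuracy is comparable to the FP16 version (C-Eval score 82.3), while GPU memory usage is reduced by 50%.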
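The "W4A16-G128" suffix in the model name denotes 4-bit weights, 16-bit activations, and one quantization scale per group of 128 weights. The sketch below illustrates only that storage format, assuming symmetric quantization. It uses plain round-to-nearest, which is a deliberate simplification: GPTQ itself uses calibration data and second-order information to compensate rounding error, and the Marlin kernel fuses dequantization into the FP16 matrix multiply on the GPU. All function names here are hypothetical.

```python
import numpy as np

GROUP_SIZE = 128  # "G128": one scale per 128 consecutive weights along the input dimension


def quantize_w4_g128(w: np.ndarray, group_size: int = GROUP_SIZE):
    """Symmetric round-to-nearest int4 quantization with per-group scales (illustrative, NOT GPTQ)."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # Map the largest magnitude in each group onto int4 value 7 (int4 range is -8..7).
    scales = np.abs(groups).max(axis=-1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # guard against all-zero groups
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q.reshape(out_features, in_features), scales.squeeze(-1).astype(np.float16)


def dequantize(q: np.ndarray, scales: np.ndarray, group_size: int = GROUP_SIZE) -> np.ndarray:
    """Expand int4 codes back to FP16 using each group's scale."""
    out_features, in_features = q.shape
    groups = q.reshape(out_features, in_features // group_size, group_size).astype(np.float16)
    return (groups * scales[..., None]).reshape(out_features, in_features)


def w4a16_linear(x: np.ndarray, q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Activations stay 16-bit ("A16"); weights are dequantized before the matmul."""
    return x.astype(np.float16) @ dequantize(q, scales).T


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)
x = rng.standard_normal((4, 512)).astype(np.float16)

q, scales = quantize_w4_g128(w)
y = w4a16_linear(x, q, scales)
ref = x.astype(np.float32) @ w.T
rel_err = np.linalg.norm(y.astype(np.float32) - ref) / np.linalg.norm(ref)
print("int4 range:", q.min(), q.max(), "scales:", scales.shape, "relative error:", float(rel_err))
```

Storing one FP16 scale per 128 weights adds only about 2 bytes per 128 weights on top of the 4-bit codes, which is why group size 128 is a common trade-off between accuracy and memory.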

Safetensors · Model size: 5.7B params · Tensor types: I64, I32, BF16
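The reported 5.7B parameter count is lower than 32B because the Hub counts stored tensor elements, not logical weights. A plausible explanation, assuming a GPTQ-style layout, is that eight 4-bit values are packed into each 32-bit integer (hence the I32 tensor type), while the embeddings stay in BF16. The sketch below shows such packing and a rough element-count estimate. The helper names are hypothetical, and the architecture figures (32.5B total, 31.0B non-embedding parameters, as published for Qwen2.5-32B) are assumed to carry over to the distilled model; the exact tensor layout of this checkpoint is not confirmed here.

```python
import numpy as np


def pack_int4(values: np.ndarray) -> np.ndarray:
    """Pack unsigned 4-bit values (0..15), eight per 32-bit word, lowest nibble first."""
    assert values.size % 8 == 0 and values.min() >= 0 and values.max() <= 15
    v = values.reshape(-1, 8).astype(np.uint32)
    shifts = np.arange(8, dtype=np.uint32) * 4
    packed = np.bitwise_or.reduce(v << shifts, axis=1)
    return packed.view(np.int32)  # stored as an I32 tensor in the checkpoint


def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: recover the eight 4-bit values from each 32-bit word."""
    p = packed.view(np.uint32)[:, None]
    shifts = np.arange(8, dtype=np.uint32) * 4
    return ((p >> shifts) & 0xF).reshape(-1).astype(np.uint8)


# Round trip on random 4-bit codes.
rng = np.random.default_rng(0)
codes = rng.integers(0, 16, size=64)
packed = pack_int4(codes)
assert np.array_equal(unpack_int4(packed), codes)

# Rough element-count estimate (assumed Qwen2.5-32B figures).
TOTAL_PARAMS = 32.5e9
NON_EMBEDDING = 31.0e9                     # linear layers, assumed all quantized
EMBEDDING = TOTAL_PARAMS - NON_EMBEDDING   # input embedding + untied lm_head, kept in BF16
GROUP = 128

packed_weights = NON_EMBEDDING / 8         # eight int4 values per I32 element
group_scales = NON_EMBEDDING / GROUP       # one 16-bit scale per group of 128 weights
estimate = packed_weights + group_scales + EMBEDDING
print(f"~{estimate / 1e9:.2f}B stored elements")
```

The estimate ignores zero points, norms, and any index tensors, so it only roughly matches the 5.7B figure; it is meant to show why the count shrinks, not to reproduce it exactly.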

Model tree for ExceedZhang/DeepSeek-R1-Distill-Qwen-32B-W4A16-G128: this model is one of 116 quantized versions of its base model.

Dataset used to train ExceedZhang/DeepSeek-R1-Distill-Qwen-32B-W4A16-G128