BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B, built from commit 393119fcd6a873e5776c79b0db01c96911f5f0fc.
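
For reference, a checkpoint like this can be produced with the transformers BitsAndBytesConfig API. The snippet below is an illustrative sketch only: the quantization settings (quant type, double quantization, compute dtype) are assumptions and may differ from the ones actually used for this repository.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical 4-bit settings; the actual configuration of this repo may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    revision="393119fcd6a873e5776c79b0db01c96911f5f0fc",  # commit pinned above
    quantization_config=bnb_config,
    device_map="auto",
)

# Serializing 4-bit weights requires recent transformers/bitsandbytes releases.
model.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-BnB-4bits")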

Tested successfully with vLLM 0.7.2 using the following parameters:

import torch
from vllm import LLM

llm_model = LLM(
    "MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
    task="generate",
    dtype=torch.bfloat16,
    max_num_seqs=8192,
    max_model_len=8192,
    trust_remote_code=True,
    quantization="bitsandbytes",   # enable BnB dequantization kernels
    load_format="bitsandbytes",    # load the pre-quantized BnB weights as-is
    enforce_eager=True,            # disable CUDA graphs; required for the vLLM V1 engine
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    seed=42
)
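
A minimal generation call with the model loaded above, using the standard vLLM SamplingParams API. The prompt and sampling values are illustrative, not settings recommended by this card:

from vllm import SamplingParams

sampling_params = SamplingParams(
    temperature=0.6,   # illustrative; tune for your workload
    top_p=0.95,
    max_tokens=2048,
)

outputs = llm_model.generate(
    ["Explain the difference between supervised and unsupervised learning."],
    sampling_params,
)

print(outputs[0].outputs[0].text)
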
Model size: 4.45B params (Safetensors); tensor types: FP16, F32, U8.