BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B, built from commit 393119fcd6a873e5776c79b0db01c96911f5f0fc.
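
For reference, a checkpoint like this can be produced with the transformers BitsAndBytesConfig API. The snippet below is an illustrative sketch only: the quantization settings (quant type, double quantization, compute dtype) are assumptions and may differ from the ones actually used for this repository.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical 4-bit settings; the actual configuration of this repo may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    revision="393119fcd6a873e5776c79b0db01c96911f5f0fc",  # commit pinned above
    quantization_config=bnb_config,
    device_map="auto",
)

# Serializing 4-bit weights requires recent transformers/bitsandbytes releases.
model.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-BnB-4bits")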

Tested successfully with vLLM 0.7.2 using the following parameters:

import torch
from vllm import LLM

llm_model = LLM(
    "MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
    task="generate",
    dtype=torch.bfloat16,
    max_num_seqs=8192,
    max_model_len=8192,
    trust_remote_code=True,
    quantization="bitsandbytes",   # enable BnB dequantization kernels
    load_format="bitsandbytes",    # load the pre-quantized BnB weights as-is
    enforce_eager=True,            # disable CUDA graphs; required for the vLLM V1 engine
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    seed=42
)
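
A minimal generation call with the model loaded above, using the standard vLLM SamplingParams API. The prompt and sampling values are illustrative, not settings recommended by this card:

from vllm import SamplingParams

sampling_params = SamplingParams(
    temperature=0.6,   # illustrative; tune for your workload
    top_p=0.95,
    max_tokens=2048,
)

outputs = llm_model.generate(
    ["Explain the difference between supervised and unsupervised learning."],
    sampling_params,
)

print(outputs[0].outputs[0].text)
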
Model size: 4.45B params (Safetensors); tensor types: FP16, F32, U8.