---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B, commit 393119fcd6a873e5776c79b0db01c96911f5f0fc.

Tested successfully with vLLM 0.7.2 using the following parameters:

```python
import torch
from vllm import LLM

llm_model = LLM(
    "MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
    task="generate",
    dtype=torch.bfloat16,
    max_num_seqs=8192,
    max_model_len=8192,
    trust_remote_code=True,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # Required for vLLM architecture V1
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    seed=42
)
```
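Once the model is loaded, a generation call follows the standard vLLM API. The sketch below is illustrative only: the prompt text and sampling values are assumptions, not settings from this model card, and it reuses the `llm_model` object created above.

```python
from vllm import SamplingParams

# Illustrative sampling settings; tune temperature/top_p for your workload.
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=2048,
)

# Hypothetical prompt, for demonstration only.
prompts = ["Explain the difference between 4-bit and 8-bit quantization."]

outputs = llm_model.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

Note that `max_tokens` is bounded by the `max_model_len=8192` set when the model was loaded.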