---
license: apache-2.0
tags:
- gptq
- quantization
- vllm
- text-generation
- transformer
inference: false
library_name: vllm
model_creator: menlo
base_model: Menlo/Jan-nano
---
# Jan-nano GPTQ 4bit (vLLM-ready)
This is a 4-bit GPTQ quantized version of Menlo/Jan-nano, optimized for fast inference with vLLM.
- Quantization: GPTQ (4-bit)
- Group size: 128
- Dtype: float16
- Backend: `gptqmodel`
- Max context length: 4096 tokens
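These settings are recorded in the checkpoint's `quantize_config.json`. A minimal sketch for inspecting it, assuming the local directory name used in the serve command below; field names beyond `bits` and `group_size` follow the common GPTQ convention and are assumptions:

```python
import json

# Inspect the quantization settings shipped with the checkpoint.
# The path is the local checkpoint directory used with `vllm serve` below.
with open("./jan-nano-4b-gptqmodel-4bit/quantize_config.json") as f:
    cfg = json.load(f)

# Expected values based on the card above; any other fields in the file
# (e.g. desc_act, sym) are backend defaults, not stated here.
print(cfg.get("bits"))        # 4
print(cfg.get("group_size"))  # 128
```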
## 🔧 Usage with vLLM
```bash
vllm serve ./jan-nano-4b-gptqmodel-4bit \
  --quantization gptq \
  --dtype half \
  --max-model-len 4096
```
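Once running, the server exposes an OpenAI-compatible API. A minimal client sketch, assuming the `openai` Python package and vLLM's default host and port:

```python
# Query the OpenAI-compatible endpoint started by `vllm serve` above.
# Assumes the default http://localhost:8000; adjust if you pass --host/--port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    # The model name must match the path passed to `vllm serve`.
    model="./jan-nano-4b-gptqmodel-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```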
## 📁 Files
- Sharded `.safetensors` model weights
- `model.safetensors.index.json`
- `tokenizer.json`, `tokenizer_config.json`
- `config.json`, `generation_config.json`, `quantize_config.json` (if available)
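Together these files form a standard GPTQ checkpoint, so vLLM can also load them offline without a server. A sketch using vLLM's Python API with the same settings as the serve command above:

```python
# Offline inference with vLLM's Python API, mirroring the serve flags above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="./jan-nano-4b-gptqmodel-4bit",
    quantization="gptq",
    dtype="half",
    max_model_len=4096,
)
params = SamplingParams(max_tokens=64)
outputs = llm.generate(["Hello!"], params)
print(outputs[0].outputs[0].text)
```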