---
license: apache-2.0
tags:
  - gptq
  - quantization
  - vllm
  - text-generation
  - transformer
inference: false
library_name: vllm
model_creator: menlo
base_model: Menlo/Jan-nano
---

# Jan-nano GPTQ 4bit (vLLM-ready)

This is a 4-bit GPTQ-quantized version of [Menlo/Jan-nano](https://huggingface.co/Menlo/Jan-nano), optimized for fast inference with vLLM.

- Quantization: GPTQ (4-bit)
- Group size: 128
- Dtype: float16
- Backend: gptqmodel
- Max context length: 4096 tokens
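
The settings above map directly onto vLLM's offline Python API. A minimal sketch, assuming the repository has been downloaded to a local `./jan-nano-4b-gptqmodel-4bit` directory (the same path the serve command below uses); the prompt and sampling values are illustrative:

```python
from vllm import LLM, SamplingParams

# Load the checkpoint with the settings listed above:
# 4-bit GPTQ weights, float16 compute, 4096-token context window.
llm = LLM(
    model="./jan-nano-4b-gptqmodel-4bit",  # local path to this repo's files
    quantization="gptq",
    dtype="float16",
    max_model_len=4096,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain GPTQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```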

## 🔧 Usage with vLLM

```bash
vllm serve ./jan-nano-4b-gptqmodel-4bit \
  --quantization gptq \
  --dtype half \
  --max-model-len 4096
```
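
Once the server is running it exposes an OpenAI-compatible API, on port 8000 by default. A minimal client sketch, assuming the default host and port and the `openai` Python package; the `model` value must match the path passed to `vllm serve`:

```python
from openai import OpenAI

# vLLM's server speaks the OpenAI protocol; the API key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./jan-nano-4b-gptqmodel-4bit",  # must match the served model path
    messages=[{"role": "user", "content": "Summarize GPTQ in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```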

## 📁 Files

- Sharded `.safetensors` model weights
- `model.safetensors.index.json`
- `tokenizer.json`, `tokenizer_config.json`
- `config.json`, `generation_config.json`, `quantize_config.json` (if available)
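
To verify that a download is complete, the weight map in `model.safetensors.index.json` can be checked against the shards on disk. A minimal sketch, assuming the files live in `./jan-nano-4b-gptqmodel-4bit`:

```python
import json
from pathlib import Path

repo = Path("./jan-nano-4b-gptqmodel-4bit")  # local copy of this repo

# The index maps every tensor name to the shard file that stores it.
index = json.loads((repo / "model.safetensors.index.json").read_text())
shards = set(index["weight_map"].values())

missing = [name for name in sorted(shards) if not (repo / name).exists()]
print(f"{len(shards)} shard(s) referenced, {len(missing)} missing")
for name in missing:
    print("missing:", name)
```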

## 🙏 Credits

- Original model by Menlo
- Quantized and shared by ramgpt