---
license: apache-2.0
tags:
- gptq
- quantization
- vllm
- text-generation
- transformer
inference: false
library_name: vllm
model_creator: menlo
base_model: Menlo/Jan-nano
---
# Jan-nano GPTQ 4-bit (vLLM-ready)
This is a 4-bit GPTQ quantized version of [Menlo/Jan-nano](https://huggingface.co/Menlo/Jan-nano), optimized for fast inference with [vLLM](https://github.com/vllm-project/vllm).
- **Quantization**: GPTQ (4-bit)
- **Group size**: 128
- **Dtype**: float16
- **Backend**: `gptqmodel`
- **Max context length**: 4096 tokens
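
For reference, a quantization with these settings could be produced with the `gptqmodel` backend roughly as sketched below. This is an illustrative sketch only: the exact `gptqmodel` API can differ between versions, and the calibration dataset, sample count, and output path here are assumptions, not the ones used for this release.

```python
# Illustrative sketch: 4-bit GPTQ quantization with group size 128 via gptqmodel.
# API names may vary by gptqmodel version; calibration data and paths are assumed.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# Matches the settings listed above: 4-bit weights, group size 128
quant_config = QuantizeConfig(bits=4, group_size=128)

# Small calibration corpus (assumed); any representative text works
calibration_data = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(512))["text"]

model = GPTQModel.load("Menlo/Jan-nano", quant_config)
model.quantize(calibration_data)
model.save("./jan-nano-4b-gptqmodel-4bit")
```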
---
## 🔧 Usage with vLLM
```bash
vllm serve ./jan-nano-4b-gptqmodel-4bit \
  --quantization gptq \
  --dtype half \
  --max-model-len 4096
```
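
Once the server is running, it exposes an OpenAI-compatible API. A minimal Python query sketch, assuming the default port (8000) and that the model name matches the path passed to `vllm serve`:

```python
# Minimal sketch: query the vLLM OpenAI-compatible server started above.
# Assumes the default port 8000 and the served path as the model name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./jan-nano-4b-gptqmodel-4bit",
    messages=[{"role": "user", "content": "Give a one-sentence summary of GPTQ."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```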
---
## 📁 Files
- Sharded `.safetensors` model weights
- `model.safetensors.index.json`
- `tokenizer.json`, `tokenizer_config.json`
- `config.json`, `generation_config.json`, `quantize_config.json` (if available)
---
## 🙏 Credits
- Original model by [Menlo](https://huggingface.co/Menlo)
- Quantized and shared by [ramgpt](https://huggingface.co/ramgpt)