# Phi-4 GPTQ (4-bit Quantized)

## Model Description
This is a 4-bit GPTQ-quantized version of the Phi-4 transformer model, reducing the VRAM needed for inference while preserving most of the base model's generation quality.
- Base Model: [microsoft/phi-4](https://huggingface.co/microsoft/phi-4)
- Quantization: GPTQ (4-bit)
- Format: `safetensors`
- Tokenizer: uses the standard `vocab.json` and `merges.txt`
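
The quantized checkpoint can be loaded through the standard `transformers` API. A minimal sketch, assuming a GPTQ backend such as `auto-gptq` (or `gptqmodel`) and `optimum` are installed alongside `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fhamborg/phi-4-4bit-gptq"

# The GPTQ quantization config stored in the repo is picked up automatically
# by transformers, provided a GPTQ backend is installed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the 4-bit weights on the available GPU
)
```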
## Intended Use
- Fast inference with minimal VRAM usage
- Deployment in resource-constrained environments
- Optimized for low-latency text generation
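
As a concrete example of the low-latency generation use case, here is a minimal sketch reusing the `model` and `tokenizer` loaded above; the prompt is purely illustrative:

```python
import torch

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,  # greedy decoding for deterministic output
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```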
## Model Details
| Attribute    | Value                           |
|--------------|---------------------------------|
| Model Name   | Phi-4 GPTQ                      |
| Quantization | 4-bit (GPTQ)                    |
| File Format  | `.safetensors`                  |
| Tokenizer    | `phi-4-tokenizer.json`          |
| VRAM Usage   | ~X GB (depending on batch size) |
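
Since peak VRAM depends on batch size and sequence length, it is easiest to measure on your own workload. A minimal sketch using PyTorch's CUDA memory statistics, reusing `model` and `inputs` from the snippets above:

```python
import torch

torch.cuda.reset_peak_memory_stats()  # clear the peak-allocation counter

with torch.no_grad():
    model.generate(**inputs, max_new_tokens=64)

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM during generation: {peak_gb:.2f} GB")
```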