# Phi-4 GPTQ (4-bit Quantized)

## Model Description
This is a 4-bit GPTQ-quantized version of the Phi-4 transformer model, reducing the VRAM needed for inference while preserving most of the base model's generation quality.
- Base Model: [microsoft/phi-4](https://huggingface.co/microsoft/phi-4)
- Quantization: GPTQ (4-bit)
- Format: `safetensors`
- Tokenizer: uses the standard `vocab.json` and `merges.txt`
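
The quantized checkpoint can be loaded through the standard `transformers` API. A minimal sketch, assuming a GPTQ backend such as `auto-gptq` (or `gptqmodel`) and `optimum` are installed alongside `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fhamborg/phi-4-4bit-gptq"

# The GPTQ quantization config stored in the repo is picked up automatically
# by transformers, provided a GPTQ backend is installed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the 4-bit weights on the available GPU
)
```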
## Intended Use
- Fast inference with minimal VRAM usage
- Deployment in resource-constrained environments
- Optimized for low-latency text generation
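
As a concrete example of the low-latency generation use case, here is a minimal sketch reusing the `model` and `tokenizer` loaded above; the prompt is purely illustrative:

```python
import torch

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,  # greedy decoding for deterministic output
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```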
## Model Details
| Attribute    | Value                           |
|--------------|---------------------------------|
| Model Name   | Phi-4 GPTQ                      |
| Quantization | 4-bit (GPTQ)                    |
| File Format  | `.safetensors`                  |
| Tokenizer    | `phi-4-tokenizer.json`          |
| VRAM Usage   | ~X GB (depending on batch size) |
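
Since peak VRAM depends on batch size and sequence length, it is easiest to measure on your own workload. A minimal sketch using PyTorch's CUDA memory statistics, reusing `model` and `inputs` from the snippets above:

```python
import torch

torch.cuda.reset_peak_memory_stats()  # clear the peak-allocation counter

with torch.no_grad():
    model.generate(**inputs, max_new_tokens=64)

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM during generation: {peak_gb:.2f} GB")
```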