Quantization: ExLlamaV2 (ExL2) at 8.0 bits per weight.

Overview

This is an ExLlamaV2 (ExL2) 8.0 bpw quantized version of microsoft/phi-4.
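A minimal loading sketch using the exllamav2 Python API is shown below. It follows the upstream exllamav2 examples; class names and call signatures may differ between library versions, and the model path is a placeholder for a local download of this repository.

```python
# Minimal sketch: load this 8.0 bpw ExL2 quant with the exllamav2 library
# and generate a short completion. Path and sampler settings are placeholders.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/path/to/phi-4-exl2-8.0bpw"  # local copy of this repo

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # allocate KV cache as layers load
model.load_autosplit(cache)                # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(
    prompt="Explain 8-bit weight quantization in one sentence.",
    max_new_tokens=128,
)
print(output)
```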

Quantized by

I often have idle A100 GPUs while building and testing the RolePlai app, so I put them to use quantizing models.

I hope the community finds these quantizations useful.

Andrew Webby @ RolePlai
