JamePeng2023/Llama-3.2-3B-Instruct-abliterated-SpinQuant-w4a8

This model is a quantized version of Llama-3.2-3B-Instruct-abliterated (https://huggingface.co/huihui-ai/Llama-3.2-3B-Instruct-abliterated), produced with the SpinQuant quantization method (https://github.com/facebookresearch/SpinQuant).

The quantization targets on-device deployment in Android apps with ExecuTorch. The w4a8 suffix in the model name denotes 4-bit weights and 8-bit activations.

To make the ExecuTorch on-device demo quick to test and deploy, I have also converted the quantized PTH checkpoint to PTE format and uploaded it.

WikiText-2 (wiki2) perplexity, as logged by the SpinQuant evaluation run:

2025-02-16 19:40:24,099 - spinquant - INFO - wiki2 ppl is: 11.502239227294922
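For readers unfamiliar with the metric, here is a minimal sketch of how a perplexity figure like the one above relates to per-token loss. It assumes the standard definition (the exponential of the mean negative log-likelihood per token, in nats), which is how such scores are commonly computed; it is not taken from the SpinQuant code. The `perplexity` helper and the example loss values are hypothetical and for illustration only.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood), with NLLs given in nats.

    Hypothetical helper for illustration; not part of SpinQuant.
    """
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Made-up per-token NLLs, only to show the calculation.
example_nlls = [2.1, 2.6, 2.5, 2.4]
print("example perplexity:", perplexity(example_nlls))

# Going the other way: the mean per-token NLL implied by the logged value,
# under the assumption that the log uses the standard definition.
reported_ppl = 11.502239227294922
print("implied mean NLL (nats):", math.log(reported_ppl))
```

Lower perplexity means the model assigns higher probability to the evaluation text, so comparing this value against the unquantized model on the same dataset gives a rough sense of how much quality the quantization costs.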
