Saving to q5_k_m GGUF

#1
by sasha1234567 - opened

This model is fine-tunable, but you can't export it to GGUF. Is this going to be fixed?
As I understand it, you can only save these models as F16 and cannot quantize them?
Does that mean you need 24GB+ VRAM if you want to fine-tune it?
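
For context, the export being asked about is roughly the following Unsloth call (a sketch assuming a `model`/`tokenizer` pair loaded through Unsloth, with a made-up output directory; as this thread discusses, it currently fails for this model because llama.cpp does not support the architecture):

```python
# Hypothetical output directory; quantization_method is the quant level
# this thread is about. This call only works for architectures that
# llama.cpp can convert.
model.save_pretrained_gguf(
    "finetuned_model",
    tokenizer,
    quantization_method="q5_k_m",
)
```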

sasha1234567 changed discussion title from "Saving to GGUF" to "Saving to q5_k_m GGUF"
Unsloth AI org


You cannot export it to GGUF because llama.cpp does not support it. :(

Hopefully they will soon
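
For architectures llama.cpp does support, the usual fallback for the F16-only concern above is a two-step path: save merged F16 weights, then convert and quantize with llama.cpp. A sketch (directory names are made up, and the llama.cpp script/binary names vary between versions):

```python
# Save the fine-tuned model merged back into 16-bit weights
# (Unsloth's merged-save helper; the directory name is hypothetical).
model.save_pretrained_merged(
    "finetuned_f16",
    tokenizer,
    save_method="merged_16bit",
)

# Then, from a llama.cpp checkout (names vary by version):
#   python convert_hf_to_gguf.py finetuned_f16 --outfile model-f16.gguf
#   ./llama-quantize model-f16.gguf model-q5_k_m.gguf Q5_K_M
```

None of this applies to this model yet, since llama.cpp cannot convert it at all.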


Thanks! Do you know by chance if there are viable vision model alternatives that can be fine-tuned on an A100, exported to GGUF, and then run on an RTX 3060 Ti (8GB VRAM)?

Are you talking about NF4 support in llama.cpp?
I found mentions of NF4 double quantization. Does anyone have a detailed description of the algorithm and the bitstream format?
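
"NF4 double quant" most likely refers to the bitsandbytes 4-bit loading used for QLoRA-style fine-tuning. A minimal sketch of how it is enabled (this shows the configuration only, not the on-disk bitstream format being asked about):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization: the quantization
# constants themselves are quantized a second time to save extra memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```

For the actual algorithm and storage layout, the QLoRA paper and the bitsandbytes source are the usual references.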

Has anyone found a way to convert it to GGUF yet? :)
