Saving to q5_k_m GGUF

#1
by sasha1234567 - opened

This model is fine-tunable, but you can't export it to GGUF. Is this going to be fixed?
As I understand it, you can only save these models as F16 and cannot quantize them?
Does that mean you need 24GB+ VRAM if you want to fine-tune it?
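
For context, the export being asked about is roughly the following Unsloth call (a sketch assuming a `model`/`tokenizer` pair loaded through Unsloth, with a made-up output directory; as this thread discusses, it currently fails for this model because llama.cpp does not support the architecture):

```python
# Hypothetical output directory; quantization_method is the quant level
# this thread is about. This call only works for architectures that
# llama.cpp can convert.
model.save_pretrained_gguf(
    "finetuned_model",
    tokenizer,
    quantization_method="q5_k_m",
)
```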

sasha1234567 changed discussion title from "Saving to GGUF" to "Saving to q5_k_m GGUF"
Unsloth AI org


You cannot export it to GGUF because llama.cpp does not support it. :(

Hopefully they will soon
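
For architectures llama.cpp does support, the usual fallback for the F16-only concern above is a two-step path: save merged F16 weights, then convert and quantize with llama.cpp. A sketch (directory names are made up, and the llama.cpp script/binary names vary between versions):

```python
# Save the fine-tuned model merged back into 16-bit weights
# (Unsloth's merged-save helper; the directory name is hypothetical).
model.save_pretrained_merged(
    "finetuned_f16",
    tokenizer,
    save_method="merged_16bit",
)

# Then, from a llama.cpp checkout (names vary by version):
#   python convert_hf_to_gguf.py finetuned_f16 --outfile model-f16.gguf
#   ./llama-quantize model-f16.gguf model-q5_k_m.gguf Q5_K_M
```

None of this applies to this model yet, since llama.cpp cannot convert it at all.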


Thanks! Do you know by chance if there are viable vision model alternatives that can be fine-tuned on an A100, exported to GGUF, and then run on an RTX 3060 Ti (8GB VRAM)?

Are you talking about NF4 support in llama.cpp?
I found mentions of NF4 double quantization. Does anyone have a detailed description of the algorithm and the bitstream format?
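
"NF4 double quant" most likely refers to the bitsandbytes 4-bit loading used for QLoRA-style fine-tuning. A minimal sketch of how it is enabled (this shows the configuration only, not the on-disk bitstream format being asked about):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization: the quantization
# constants themselves are quantized a second time to save extra memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```

For the actual algorithm and storage layout, the QLoRA paper and the bitsandbytes source are the usual references.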

Has anyone found a way to convert it to GGUF yet? :)
