Llama.cpp support

#2
by wsbagnsv1 - opened

Would it be possible to add support for this model to llama.cpp? Since it is based on Qwen3 and SigLIP, which are both already implemented in llama.cpp, it shouldn't be too hard. The reason I ask is that at 7B this model would fit nicely on a 12 GB VRAM card even at Q8 quant, but full precision is just too big for that :/
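
For context, here's a rough back-of-envelope estimate of the weight footprint (a minimal sketch; the ~7B parameter count is an assumption, and it ignores the KV cache, activations, and the SigLIP vision tower). llama.cpp's Q8_0 stores 32 int8 weights plus one fp16 scale per block, i.e. about 8.5 bits per weight:

```python
# Back-of-envelope VRAM estimate for the LLM weights alone.
# Assumes ~7B parameters; ignores KV cache, activations, and the vision tower.

PARAMS = 7e9  # assumed parameter count

# Nominal bits-per-weight: F16 is 16; Q8_0 is (32*8 + 16)/32 = 8.5
for name, bpw in [("F16", 16.0), ("Q8_0", 8.5)]:
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name:5s} ~{gib:5.1f} GiB")
# F16   ~ 13.0 GiB  -> over a 12 GB card before any runtime overhead
# Q8_0  ~  6.9 GiB  -> leaves headroom on 12 GB
```

That lines up with Q8 fitting comfortably in 12 GB while full precision, at ~13 GiB of weights alone, does not.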
