Working quants for Qwen2.5 VL 7B.
We'll be uploading benchmark results along with the quants here.
The models have been tested on the latest llama.cpp, built with CLIP hardware acceleration manually enabled!
Consult the following post for more details: https://github.com/ggml-org/llama.cpp/issues/11483#issuecomment-2676422772
For now, you can only run single prompts via the CLI:
llama-qwen2vl-cli -m ~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf --mmproj ~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf --n_gpu_layers 9999 -p "Describe the image." --image ~/Pictures/test_small.png
We're working on a wrapper API solution until multimodal support is added back to llama.cpp.
The API will be published here: https://github.com/Independent-AI-Labs/local-super-agents
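Until that API lands, one possible stopgap is to shell out to the CLI from a short Python wrapper. The sketch below is illustrative only: it reuses the paths and flags from the command above, and the helper name and defaults are our own, not part of the upcoming API.

```python
import subprocess
from pathlib import Path

# Illustrative stopgap: call llama-qwen2vl-cli from Python until server-side
# multimodal support returns to llama.cpp. Paths and flags mirror the command above.
def describe_image(image: str,
                   prompt: str = "Describe the image.",
                   model: str = "~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf",
                   mmproj: str = "~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf") -> str:
    cmd = [
        "llama-qwen2vl-cli",
        "-m", str(Path(model).expanduser()),
        "--mmproj", str(Path(mmproj).expanduser()),
        "--n_gpu_layers", "9999",
        "-p", prompt,
        "--image", str(Path(image).expanduser()),
    ]
    # check=True raises CalledProcessError if the CLI exits non-zero
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(describe_image("~/Pictures/test_small.png"))
```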
Let us know if you need a specific quant!
Benchmarking Update:
The latest main branch looks stable with Vulkan CLIP and every model we've thrown at it so far. Some preliminary insights:
- 1200x1200 is the largest image you can encode with 16 GB of VRAM (a resize sketch follows at the end of this update). clip.cpp does not seem to support multi-GPU Vulkan yet.
- A 4060 Ti-class GPU delivers 20-30 t/s with the Q8_0 quant and roughly double that with Q4 at 16-32K context.
- Batching (multiple images) in a single CLI call seems to be working fine:
llama-qwen2vl-cli --ctx-size 16000 -n 16000 -m ~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf --mmproj ~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf --n_gpu_layers 9999 -p "Describe the image in detail. Extract all textual information from it. Output as detailed JSON." -p "Analyze the image." --image ~/Pictures/test_small.png --image ~/Pictures/test_small.png
Output quality looks very promising! We'll release all of the benchmark code when ready, so the process can be streamlined for other models.
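As a side note on the 1200x1200 limit above, a quick way to keep encodes within a 16 GB VRAM budget is to downscale images first. This is just a sketch using Pillow; the file names are placeholders and 1200 px reflects what we observed, not a hard constant.

```python
from pathlib import Path
from PIL import Image

# Downscale so neither side exceeds 1200 px, which kept CLIP encodes
# within 16 GB of VRAM in our tests. Purely illustrative pre-processing.
def fit_for_encode(src: str, dst: str, max_side: int = 1200) -> None:
    img = Image.open(Path(src).expanduser())
    img.thumbnail((max_side, max_side))  # preserves aspect ratio; only shrinks
    img.save(Path(dst).expanduser())

fit_for_encode("~/Pictures/test_large.png", "~/Pictures/test_large_1200.png")
```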