Update README.md
README.md
CHANGED
@@ -12,14 +12,14 @@ base_model: Qwen/Qwen2.5-VL-72B-Instruct
 library_name: transformers
 ---
 
-# Qwen2.5-VL-72B-Instruct-quantized-
+# Qwen2.5-VL-72B-Instruct-quantized-w4a16
 
 ## Model Overview
 - **Model Architecture:** Qwen/Qwen2.5-VL-72B-Instruct
   - **Input:** Vision-Text
   - **Output:** Text
 - **Model Optimizations:**
-  - **Weight quantization:**
+  - **Weight quantization:** INT4
   - **Activation quantization:** FP16
 - **Release Date:** 2/24/2025
 - **Version:** 1.0
@@ -29,7 +29,7 @@ Quantized version of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct)
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) to
+This model was obtained by quantizing the weights of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) to INT4 data type, ready for inference with vLLM >= 0.5.2.
 
 ## Deployment
 
@@ -203,10 +203,10 @@ The model was evaluated using [mistral-evals](https://github.com/neuralmagic/mistral-evals)
 - chartqa
 
 ```
-vllm serve
+vllm serve RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16 --tensor_parallel_size 1 --max_model_len 25000 --trust_remote_code --max_num_seqs 8 --gpu_memory_utilization 0.9 --dtype float16 --limit_mm_per_prompt image=7
 
 python -m eval.run eval_vllm \
-  --model_name
+  --model_name RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16 \
   --url http://0.0.0.0:8000 \
   --output_dir ~/tmp \
   --eval_name <vision_task_name>
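The `vllm serve` command added in the last hunk launches an OpenAI-compatible server for the quantized checkpoint. As a complementary illustration, not part of this commit, the sketch below runs the same model through vLLM's offline `LLM` API with the same engine options; the chat-style image message and the placeholder image URL are assumptions about how a vision request would be passed, while the repo id and flags are taken from the diff above.

```python
# Minimal offline-inference sketch (assumes vLLM >= 0.5.2, as stated in the card,
# and a GPU setup that can host the 72B INT4 checkpoint at tensor_parallel_size=1).
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16",
    max_model_len=25000,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.9,
    dtype="float16",
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 7},  # same per-prompt image cap as the serve command
)

# OpenAI-style chat message; the image URL is a placeholder, not from the card.
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        {"type": "text", "text": "Describe this image."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(temperature=0.0, max_tokens=256))
print(outputs[0].outputs[0].text)
```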