nm-research committed
Commit 0354d13 · verified · Parent: 285c7b3

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -12,14 +12,14 @@ base_model: Qwen/Qwen2.5-VL-72B-Instruct
 library_name: transformers
 ---
 
-# Qwen2.5-VL-72B-Instruct-quantized-w8a8
+# Qwen2.5-VL-72B-Instruct-quantized-w4a16
 
 ## Model Overview
 - **Model Architecture:** Qwen/Qwen2.5-VL-72B-Instruct
 - **Input:** Vision-Text
 - **Output:** Text
 - **Model Optimizations:**
-  - **Weight quantization:** INT8
+  - **Weight quantization:** INT4
   - **Activation quantization:** FP16
 - **Release Date:** 2/24/2025
 - **Version:** 1.0
@@ -29,7 +29,7 @@ Quantized version of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct)
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) to INT8 data type, ready for inference with vLLM >= 0.5.2.
+This model was obtained by quantizing the weights of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) to INT4 data type, ready for inference with vLLM >= 0.5.2.
 
 ## Deployment
 
@@ -203,10 +203,10 @@ The model was evaluated using [mistral-evals](https://github.com/neuralmagic/mistral-evals)
 - chartqa
 
 ```
-vllm serve neuralmagic/pixtral-12b-quantized.w8a8 --tensor_parallel_size 1 --max_model_len 25000 --trust_remote_code --max_num_seqs 8 --gpu_memory_utilization 0.9 --dtype float16 --limit_mm_per_prompt image=7
+vllm serve RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16 --tensor_parallel_size 1 --max_model_len 25000 --trust_remote_code --max_num_seqs 8 --gpu_memory_utilization 0.9 --dtype float16 --limit_mm_per_prompt image=7
 
 python -m eval.run eval_vllm \
-  --model_name neuralmagic/pixtral-12b-quantized.w8a8 \
+  --model_name RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16 \
   --url http://0.0.0.0:8000 \
   --output_dir ~/tmp \
   --eval_name <vision_task_name>
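
The `vllm serve` command in the updated hunk exposes an OpenAI-compatible endpoint, so a quick way to sanity-check the deployment is to query it with the `openai` client. A minimal sketch, assuming the server started by the command above is reachable at http://0.0.0.0:8000 and using a placeholder image URL that is not from the card:

```python
# Minimal sketch: query the OpenAI-compatible endpoint exposed by `vllm serve`.
# The image URL below is a placeholder; replace it with a real image.
from openai import OpenAI

# vLLM does not check the API key, but the client requires one to be set.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "Summarize this chart."},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```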
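
For a rough sense of what the w8a8 to w4a16 rename means in practice: halving weight precision roughly halves the weight footprint. A back-of-envelope sketch, assuming the nominal 72B parameter count and ignoring quantization scales and any layers kept in higher precision:

```python
# Back-of-envelope weight footprint for a nominal 72B-parameter model.
# Real checkpoints add scales/zero-points and may keep some modules
# (e.g. lm_head, vision tower) in higher precision, so treat these as lower bounds.
params = 72e9
for name, bits in [("FP16", 16), ("INT8 (w8a8)", 8), ("INT4 (w4a16)", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name:>12}: ~{gib:.0f} GiB of weights")
```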