Update README.md
README.md CHANGED
````diff
@@ -12,7 +12,7 @@ tags:
 - FP8
 ---
 
-# Qwen3-32B-FP8-
+# Qwen3-32B-FP8-Dynamic
 
 ## Model Overview
 - **Model Architecture:** Qwen3ForCausalLM
@@ -30,7 +30,7 @@ tags:
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws).
 - **Release Date:** 05/02/2025
 - **Version:** 1.0
-- **Model Developers:**
+- **Model Developers:** BC Card, Redhat
 
 ### Model Optimizations
 
@@ -51,7 +51,7 @@ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/
 from vllm import LLM, SamplingParams
 from transformers import AutoTokenizer
 
-model_id = "
+model_id = "BCCard/Qwen3-32B-FP8-dynamic"
 number_gpus = 1
 sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, min_p=0, max_tokens=256)
 
@@ -128,7 +128,7 @@ The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-
 ```
 lm_eval \
 --model vllm \
---model_args pretrained="
+--model_args pretrained="BCCard/Qwen3-32B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=8192,enable_chunk_prefill=True,tensor_parallel_size=1 \
 --tasks openllm \
 --apply_chat_template\
 --fewshot_as_multiturn \
````
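For context on the "FP8-dynamic" suffix this commit adds to the title: in dynamic quantization the FP8 scale is derived from each tensor's runtime maximum rather than from an offline calibration pass. The sketch below is a simplified pure-Python illustration of per-tensor dynamic FP8 (E4M3) scaling and rounding; it is not the compressed-tensors or vLLM kernel actually used by this checkpoint, and the helper names are invented for illustration.

```python
import math

# Largest finite magnitude representable in FP8 E4M3.
FP8_E4M3_MAX = 448.0

def _round_significand(x: float, mantissa_bits: int = 3) -> float:
    """Round x to 1 implicit + `mantissa_bits` significand bits (E4M3 has 3)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    step = 1 << (mantissa_bits + 1)   # 4 significant bits in total
    return round(m * step) / step * 2.0 ** e

def quantize_dynamic(values):
    """Quantize a list of floats with a per-tensor *dynamic* scale.

    The scale is computed from the tensor's own max at runtime, which is
    what distinguishes "dynamic" FP8 from statically calibrated FP8.
    """
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    q = [_round_significand(max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)))
         for v in values]
    return q, scale

def dequantize(q, scale):
    """Map quantized values back to the original range."""
    return [v * scale for v in q]
```

Values whose significand fits in 4 bits round-trip almost exactly; others pick up a small relative error from the 3-bit mantissa, which is the precision trade-off FP8 weight/activation quantization accepts in exchange for roughly halved memory versus FP16.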