RedHatAI/Qwen2.5-7B-Instruct-quantized.w4a16


alexmarques committed on May 14

Commit 625e5e5 (verified) · Parent: 3540dcb

Update README.md

Files changed (1)

README.md +1 -3

@@ -134,9 +134,7 @@ The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Qwen2.5-7B-Instruct-quantized.w4a16",dtype=auto,gpu_memory_utilization=0.5,max_model_len=4096,enable_chunk_prefill=True,tensor_parallel_size=1 \
-  --apply_chat_template \
-  --fewshot_as_multiturn \
+  --model_args pretrained="neuralmagic/Qwen2.5-7B-Instruct-quantized.w4a16",dtype=auto,gpu_memory_utilization=0.5,max_model_len=4096,add_bos_token=True,enable_chunk_prefill=True,tensor_parallel_size=1 \
   --tasks openllm \
   --batch_size auto
 ```
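The commit makes two changes to the evaluation command. It drops the `--apply_chat_template` and `--fewshot_as_multiturn` flags, and it adds one entry to the comma-separated `--model_args` string. The bash sketch below shows one way to confirm that `add_bos_token=True` is the only new `--model_args` entry. It splits the old and new strings on commas and lists the entries that appear only in the second one. The function name `added_model_args` is hypothetical, and so are the `OLD_ARGS`/`NEW_ARGS` variables. The double quotes around the `pretrained` value are left out because the shell strips them before `lm_eval` receives the argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the key=value entries present in the second
# comma-separated model_args string ($2) but absent from the first ($1).
added_model_args() {
  # comm needs sorted input; LC_ALL=C keeps sort and comm on the same collation.
  LC_ALL=C comm -13 \
    <(tr ',' '\n' <<<"$1" | LC_ALL=C sort) \
    <(tr ',' '\n' <<<"$2" | LC_ALL=C sort)
}

# --model_args values before and after commit 625e5e5 (shell quotes removed).
OLD_ARGS='pretrained=neuralmagic/Qwen2.5-7B-Instruct-quantized.w4a16,dtype=auto,gpu_memory_utilization=0.5,max_model_len=4096,enable_chunk_prefill=True,tensor_parallel_size=1'
NEW_ARGS='pretrained=neuralmagic/Qwen2.5-7B-Instruct-quantized.w4a16,dtype=auto,gpu_memory_utilization=0.5,max_model_len=4096,add_bos_token=True,enable_chunk_prefill=True,tensor_parallel_size=1'

added_model_args "$OLD_ARGS" "$NEW_ARGS"   # prints: add_bos_token=True
added_model_args "$NEW_ARGS" "$OLD_ARGS"   # prints nothing: no entry was removed
```

This sketch compares the entries as sets, so it ignores the position where `add_bos_token=True` was inserted. The removed `--apply_chat_template` and `--fewshot_as_multiturn` flags are separate command-line options, so they fall outside `--model_args` and this comparison does not cover them.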