Commit a87b234 (verified) by mgoin · Parent: fa2ec82

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
 
 
 Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
-This model checkpoint also includes per-tensor scales for FP8 quantized KV Cache, accessed through the `--kv-cache-dtype fp8` argument in vLLM.
+This model checkpoint also includes experimental per-tensor scales for FP8 quantized KV Cache, accessed through the `--kv-cache-dtype fp8` argument in vLLM.
 
 ```python
 from vllm import LLM
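The edited sentence leads into the README's vLLM usage example. A minimal sketch of how the FP8 KV cache scales could be used offline, assuming vLLM >= 0.5.0 on a GPU with FP8 support; the model repo id below is a placeholder guess (the diff does not name the repository), and the `kv_cache_dtype` keyword is the offline-API counterpart of the `--kv-cache-dtype fp8` server flag:

```python
from vllm import LLM, SamplingParams

# Placeholder repo id: substitute the actual checkpoint this commit belongs to.
# kv_cache_dtype="fp8" tells vLLM to quantize the KV cache, picking up the
# per-tensor scales shipped in the checkpoint (same effect as the
# --kv-cache-dtype fp8 server argument mentioned in the README).
llm = LLM(
    model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8",  # assumption, not from the diff
    kv_cache_dtype="fp8",
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What does FP8 KV-cache quantization change?"], params)
print(outputs[0].outputs[0].text)
```

Because the scales are per-tensor and marked experimental, accuracy with the FP8 KV cache should be validated against the bf16-cache baseline before production use.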