RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV


mgoin committed on Jun 11, 2024

Commit a87b234 (verified) · 1 Parent(s): fa2ec82

Update README.md

Files changed (1)

README.md +1 -1


@@ -6,7 +6,7 @@ tags:
 Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
-This model checkpoint also includes per-tensor scales for FP8 quantized KV Cache, accessed through the `--kv-cache-dtype fp8` argument in vLLM.
+This model checkpoint also includes experimental per-tensor scales for FP8 quantized KV Cache, accessed through the `--kv-cache-dtype fp8` argument in vLLM.
 ```python
 from vllm import LLM