Update README.md
README.md
```
model.save_pretrained(output_dir, save_safetensors=True, save_compressed=False)
tokenizer.save_pretrained(output_dir)
```

## Inference

### Prerequisite

Install the latest vLLM nightly pre-release:
```
pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly
```
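
To confirm that the nightly wheel was picked up, a quick version check works (a minimal sketch; the exact version string will vary):

```
# prints the installed vLLM version, e.g. a dev/nightly tag
python -c "import vllm; print(vllm.__version__)"
```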

### vLLM

For Ampere devices, please use the `TRITON_ATTN_VLLM_V1` attention backend, e.g.:
```
VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve cpatonn/gpt-oss-20b-BF16 --async-scheduling
```
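
Once the server is up, it exposes an OpenAI-compatible API. A minimal smoke test with curl, assuming the default port 8000 and that the served model name matches the model path:

```
# vllm serve listens on port 8000 by default; adjust if you pass --port
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "cpatonn/gpt-oss-20b-BF16",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```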

For further information, please visit this [guide](https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html).

# gpt-oss-20b

<p align="center">
<img alt="gpt-oss-20b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg">
</p>