cpatonn commited on
Commit
0b9aa1f
·
verified ·
1 Parent(s): fe39fe4

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +19 -0
README.md CHANGED
@@ -31,8 +31,27 @@ model.save_pretrained(output_dir, save_safetensors=True, save_compressed=False)
31
  tokenizer.save_pretrained(output_dir)
32
  ```
33
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
34
 
35
  # gpt-oss-20b
 
36
  <p align="center">
37
  <img alt="gpt-oss-20b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg">
38
  </p>
 
31
  tokenizer.save_pretrained(output_dir)
32
  ```
33
 
34
+ ## Inference
35
+
36
+ ### Prerequisite
37
+ Install the latest vllm version:
38
+ ```
39
+ pip install -U vllm \
40
+ --pre \
41
+ --extra-index-url https://wheels.vllm.ai/nightly
42
+ ```
43
+
44
+ ### vllm
45
+
46
+ For Ampere devices, please use TRITON_ATTN_VLLM_V1 attention backend i.e.,
47
+ ```
48
+ VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve cpatonn/gpt-oss-20b-BF16 --async-scheduling
49
+ ```
50
+
51
+ For further information, please visit this [guide](https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html).
52
 
53
  # gpt-oss-20b
54
+
55
  <p align="center">
56
  <img alt="gpt-oss-20b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg">
57
  </p>