Update README.md
README.md
```
model.save_pretrained(output_dir, save_safetensors=True, save_compressed=False)
tokenizer.save_pretrained(output_dir)
```

## Inference

### Prerequisite

Install the latest vLLM nightly pre-release:
```
pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly
```
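
To confirm that the nightly wheel was picked up, a quick version check works (a minimal sketch; the exact version string will vary):

```
# prints the installed vLLM version, e.g. a dev/nightly tag
python -c "import vllm; print(vllm.__version__)"
```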

### vLLM

For Ampere devices, please use the `TRITON_ATTN_VLLM_V1` attention backend, e.g.:
```
VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve cpatonn/gpt-oss-20b-BF16 --async-scheduling
```
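
Once the server is up, it exposes an OpenAI-compatible API. A minimal smoke test with curl, assuming the default port 8000 and that the served model name matches the model path:

```
# vllm serve listens on port 8000 by default; adjust if you pass --port
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "cpatonn/gpt-oss-20b-BF16",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```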

For further information, please visit this [guide](https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html).

# gpt-oss-20b

<p align="center">
<img alt="gpt-oss-20b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg">
</p>