Deploy gpt-oss models in your own AWS account using vLLM and Tensorfuse


Hi all,

We've released a guide for deploying OpenAI's latest open-weight gpt-oss models in your own AWS account. What's included:

- An optimized Dockerfile built on the latest `vllm-openai:gptoss` image, covering both the 20B and 120B models (a minimal sketch follows this list)
- Throughput of 240 tokens/sec on 1×H100 with the 20B model and 200 tokens/sec on 8×H100 with the 120B model
- Serving at the full 130k-token context length
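For reference, here is a minimal sketch of such an image. The `vllm/vllm-openai:gptoss` tag comes from the post, and the 131,072-token window (the "130k" figure above) is an assumption about the model's full context; the optimized Dockerfile in the guide may set further flags.

```dockerfile
# Minimal sketch; the guide's optimized Dockerfile may differ.
FROM vllm/vllm-openai:gptoss

# The image's entrypoint is vLLM's OpenAI-compatible API server, so model
# arguments go in CMD. For the 120B model, swap in openai/gpt-oss-120b and
# add "--tensor-parallel-size", "8" to shard across 8 GPUs.
CMD ["--model", "openai/gpt-oss-20b", \
     "--max-model-len", "131072", \
     "--port", "8000"]
```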
Follow the guide to run it in your AWS account: https://tensorfuse.io/docs/guides/modality/text/openai_oss
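Once the deployment is up, it exposes the standard OpenAI-compatible API, so any OpenAI client works against it. A quick smoke test with curl (the endpoint URL and API key below are placeholders; substitute the values from your Tensorfuse deployment):

```bash
# Placeholder endpoint and key; use your deployment's actual values.
curl https://your-deployment.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -d '{
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'
```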

Get started with Tensorfuse here: https://app.tensorfuse.io/

It would also be awesome to see metrics on consumer hardware.
