Enterprise AI factory OS
#47 opened by DavidSteinbauer
First of all, thank you, OpenAI, for releasing gpt-oss-120B: this is a major enabler for building IP-sovereign, enterprise-grade AI factories. It aligns perfectly with our mission to deploy affordable, high-efficiency, private infrastructure.
Quick question:
What are the recommended hardware and software best practices to run gpt-oss-120B efficiently in a containerized H100 setup, ideally with support for MXFP4, multi-GPU, and low-latency inference?
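For context, here is the rough shape of what we have been experimenting with so far. This is only a minimal sketch, assuming vLLM's offline Python API on a 2x H100 node; the parallelism and memory values are our own guesses rather than recommendations, and as far as we can tell vLLM picks up the MXFP4 quantization from the checkpoint itself:

```python
# Minimal sketch, assuming vLLM's offline Python API on a 2x H100 node.
# tensor_parallel_size and the memory settings are illustrative guesses,
# not validated recommendations.
from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-120b",   # MXFP4 weights; quantization appears to be
                                   # detected from the checkpoint config
    tensor_parallel_size=2,        # shard the model across both H100s
    gpu_memory_utilization=0.90,   # leave headroom for activations / KV cache
    max_model_len=8192,            # cap context to bound KV-cache memory
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize our deployment goals in one sentence."], params)
print(outputs[0].outputs[0].text)
```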
We’d also appreciate any pointers on:
• Preferred inference stack (vLLM, DeepSpeed-MoE, etc.)
• Token latency vs. throughput tuning (see the sketch after this list)
• Any considerations for SaaS-style deployments (also sketched below)
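To make the last two bullets concrete, here is what we have been sketching on our side. Both snippets are assumptions, not validated configs: `max_num_seqs` and `max_num_batched_tokens` are the vLLM engine arguments that we believe trade per-token latency against batch throughput, and the client snippet simply assumes the model is exposed through vLLM's OpenAI-compatible server (e.g. `vllm serve openai/gpt-oss-120b`).

```python
# Sketch of the latency-vs-throughput knobs we are looking at, assuming
# the vLLM Python API. Values are placeholders to illustrate the trade-off.
from vllm import LLM

llm = LLM(
    model="openai/gpt-oss-120b",
    tensor_parallel_size=2,
    max_num_seqs=32,              # fewer in-flight sequences -> lower per-token latency
    max_num_batched_tokens=8192,  # larger token budget per step -> higher throughput
)
```

```python
# Sketch of a SaaS-style client path, assuming an OpenAI-compatible
# endpoint served by vLLM on localhost; URL and API key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello from our tenant sandbox"}],
)
print(resp.choices[0].message.content)
```

The appeal of the OpenAI-compatible server for us is that downstream tenants could keep using standard OpenAI SDKs unchanged.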
Thanks again — this is a huge step forward for the open AI ecosystem.
Best,
David
— HPC Data