Two clarifications on gpt-oss-120B hardware (fine-tuning vs inference, MoE VRAM)
Hello everyone,
I would appreciate official clarification on two points, for both development and production use:
Fine-tuning vs inference: Does fine-tuning gpt-oss-120B require the same GPU/VRAM as inference? Specifically, is a forward-only “development” run identical to inference (a single 80 GB GPU with MXFP4), and if not, what is the minimum recommended GPU/VRAM for fine-tuning, given the extra memory needed for gradients, optimizer state, and activations?
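For context, here is my back-of-envelope reasoning for why I suspect fine-tuning needs more than the inference setup. All numbers below are my own assumptions (bf16 weights/gradients, fp32 AdamW moments, a hypothetical ~100M-parameter adapter), not anything from an official doc:

```python
# Rough per-parameter training-state memory, assuming bf16 weights/grads and
# fp32 AdamW moments (my assumptions, not official figures for gpt-oss-120B).
def training_state_gb(n_trainable: float,
                      bytes_weight: int = 2,
                      bytes_grad: int = 2,
                      bytes_optim: int = 8) -> float:
    """GB needed just for weights + gradients + optimizer state of the trainable params."""
    return n_trainable * (bytes_weight + bytes_grad + bytes_optim) / 1e9

print(training_state_gb(120e9))  # full fine-tune: ~1440 GB of training state alone
print(training_state_gb(0.1e9))  # LoRA-style adapters (~100M params): ~1.2 GB extra
```

If this rough math is right, full fine-tuning clearly cannot match the single-GPU inference footprint, which is why I am asking what the officially recommended setup is (full fine-tune vs adapter-based).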
MoE memory at inference: Please confirm that at inference time all experts’ weights must still reside in VRAM (even though only the top-k experts, 4 by default, are active per token), i.e., MoE reduces compute per token but does not reduce the loaded VRAM footprint of the 120B MXFP4 checkpoint. Does inference therefore still require ≥ 60 GB of VRAM?
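To show where my ≥ 60 GB figure comes from, here is the quick calculation I am using. The ~4.25 bits/param effective rate for MXFP4 (including scales) and the ~120B total parameter count are my assumptions; the real checkpoint mixes precisions, so treat this only as a rough lower bound:

```python
# Back-of-envelope resident-weight footprint if every expert must stay in VRAM,
# assuming ~120B total params at an effective ~4.25 bits/param for MXFP4
# (my assumption; excludes KV cache, activations, and framework overhead).
total_params = 120e9
bits_per_param = 4.25
resident_gb = total_params * bits_per_param / 8 / 1e9
print(f"resident weights ≈ {resident_gb:.0f} GB")  # ~64 GB before KV cache / activations
```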
If there’s an official doc that covers this, a link would be great. Looking forward to a response, @OpenAIDevs.

Thanks!