So it's a bit confusing... is the model on Hugging Face an MoE version or a dense version?
The description says "Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU and the gpt-oss-20b model run within 16GB of memory." But the files on Hugging Face are .safetensors, so is it a dense version on Hugging Face?
There is no dense version anywhere, mate. Both MoE and dense models can be stored in safetensors format. You are overcomplicating it.
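If you want to see that for yourself, here's a minimal sketch using the `safetensors` Python package: it just lists the tensor names stored in a shard, where an MoE checkpoint shows up as per-expert/router tensors sitting in the same file format a dense model would use. The shard filename and the "expert"/"router" substrings are assumptions for illustration, not something specific to this model.

```python
# Minimal sketch: peek at tensor names in a safetensors shard without
# loading the weights. MoE vs dense is visible in the tensor names,
# not in the file format itself.
from safetensors import safe_open

# Filename is a placeholder for one of the downloaded shards.
with safe_open("model-00001-of-00003.safetensors", framework="pt") as f:
    for name in f.keys():
        # MoE checkpoints typically contain per-expert or router tensors;
        # the substrings below are illustrative, not guaranteed names.
        if "expert" in name or "router" in name:
            print(name, f.get_slice(name).get_shape())
```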
Hmm, OK. I guess someone must have manually merged the MoE experts and figured out the dense equivalent to get this GGUF then: https://huggingface.co/ggml-org/gpt-oss-20b-GGUF
But MoE models don't have to be merged to be provided as one single file. The number of files is arbitrary: a big file gets split to make downloads easier to retry if one fails, to be more robust against file corruption, or sometimes to comply with filesystem limitations. Here the big safetensors file has simply been split into 3, but it could have been 1 big file instead. The number of files is independent of the number of experts, or of the model architecture at all. GGUFs are often packaged as 1 file because they are smaller files, that's all (that's not the case here, but Unsloth chose to serve them as single files because the model isn't that big anyway).
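To make the splitting concrete, here's a sketch assuming the usual Hugging Face layout, where a `model.safetensors.index.json` next to the shards records which tensor lives in which file:

```python
# Sketch: the shards are tied together by an index file that maps each
# tensor name to the shard containing it. The split is purely about
# file handling, nothing to do with experts or architecture.
import json
from collections import Counter

with open("model.safetensors.index.json") as f:
    index = json.load(f)

# weight_map: {tensor name -> shard filename}
per_shard = Counter(index["weight_map"].values())
for shard, n_tensors in sorted(per_shard.items()):
    print(f"{shard}: {n_tensors} tensors")
```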
It's already complicated enough overall, so the best is to keep it simple when we can :D
I had thought llama.cpp didn't support MoE, but it looks like that's changed. I just wanted a GGUF version to run locally on llama.cpp and was wondering how to convert to GGUF. The one in ggml-org works, so I haven't looked into it further as it's working for me :)
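For anyone landing here who does want to convert it themselves, the usual llama.cpp flow is roughly the sketch below. Paths and the quant type are placeholders, and it assumes your llama.cpp checkout is recent enough to know the gpt-oss architecture; check `python convert_hf_to_gguf.py --help` for the exact options in your version.

```sh
# 1. Download the safetensors checkpoint from Hugging Face.
huggingface-cli download openai/gpt-oss-20b --local-dir gpt-oss-20b

# 2. Convert the checkpoint directory to a single GGUF file.
python convert_hf_to_gguf.py gpt-oss-20b --outfile gpt-oss-20b-f16.gguf --outtype f16

# 3. Optionally quantize for a smaller file / lower memory use.
./llama-quantize gpt-oss-20b-f16.gguf gpt-oss-20b-Q4_K_M.gguf Q4_K_M
```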