Running mxfp4 on H100 using transformers with triton_kernels: make_default_matmul_mxfp4_w_layout not found
Has anyone gotten mxfp4 to run on an H100 using transformers and triton kernels?
System Info
- transformers version: 4.55.0
- Platform: Linux-5.15.0-144-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.34.3
- Safetensors version: 0.5.3
- Accelerate version: 1.9.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA H100 80GB HBM3
Reproduction
I tried to run the OpenAI gpt-oss-120B model in mxfp4 on an H100, following the setup instructions given at this link:
pip install -U transformers accelerate torch triton kernels
pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
I ran the script provided there. (And I had to manually upgrade triton to 3.4.0.)
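For context, generate.py is essentially the standard transformers loading snippet along these lines (the model id and prompt here are illustrative, not my exact script; the failure happens inside the `from_pretrained` call, as the traceback below shows):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"  # the mxfp4 checkpoint under test

tokenizer = AutoTokenizer.from_pretrained(model_id)
# loading fails here: the mxfp4 quantizer tries to swizzle the weights
# via triton_kernels and hits the AttributeError shown below
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```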
The error message states:

```
Traceback (most recent call last):
  File "/workspace/projects/gpt_oss/generate.py", line 6, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 316, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5061, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5524, in _load_pretrained_model
    _error_msgs, disk_offload_index, cpu_offload_index = load_shard_file(args)
                                                         ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 974, in load_shard_file
    disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 882, in _load_state_dict_into_meta_model
    hf_quantizer.create_quantized_param(
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/quantizers/quantizer_mxfp4.py", line 223, in create_quantized_param
    load_and_swizzle_mxfp4(
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/integrations/mxfp4.py", line 375, in load_and_swizzle_mxfp4
    triton_weight_tensor, weight_scale = swizzle_mxfp4(
                                         ^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/integrations/mxfp4.py", line 64, in swizzle_mxfp4
    value_layout, value_layout_opts = layout.make_default_matmul_mxfp4_w_layout(mx_axis=1)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'triton_kernels.tensor_details.layout' has no attribute 'make_default_matmul_mxfp4_w_layout'
```
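A quick way to check whether the installed triton_kernels snapshot actually exposes the symbol that transformers looks up (module and attribute names taken verbatim from the traceback above; given the AttributeError, this should print False here):

```python
import triton
# the traceback shows this import itself succeeds; only the attribute is missing
import triton_kernels.tensor_details.layout as layout

print(triton.__version__)
# transformers' mxfp4 integration calls this function during weight loading;
# False means the installed triton_kernels commit does not match what this
# transformers version expects
print(hasattr(layout, "make_default_matmul_mxfp4_w_layout"))
```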
Can you run this on a separate line by itself?
pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
Hi, yes, I did run this on a separate line by itself. The joined commands seem to be a typo in the original post, but I copied them over verbatim for consistency.
I think I got it! You need torch 2.8:
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/test/cu128
And I'm reasonably sure you need Python 3.12.
I actually installed the torch nightly: torch==2.9.0.dev20250804+cu128
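Either way, it's worth confirming the active environment actually picks up the expected builds (a plain sanity check, nothing mxfp4-specific):

```python
import sys
import torch

# print the interpreter and torch builds actually in use; a stale torch or
# Python in the environment is an easy way to end up with mismatched kernels
print(sys.version)
print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
```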
I checked your other packages and the versions match mine. I have an H100 96GB and it works with vLLM. Below is my vLLM install command:
uv pip install --pre vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/ --extra-index-url https://download.pytorch.org/whl/nightly/cu128 --index-strategy unsafe-best-match
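Once that install goes through, a minimal smoke test via the vLLM Python API might look like this (my sketch, not an official example; serving over HTTP with `vllm serve` is the other option):

```python
from vllm import LLM, SamplingParams

# vLLM reads the quantization config from the checkpoint, so loading the
# mxfp4 model should not need extra flags here (assumption from my setup)
llm = LLM(model="openai/gpt-oss-120b")

params = SamplingParams(max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```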
With transformers main, it should even work on a T4! Please try the following Google Colab: https://colab.research.google.com/drive/15DJv6QWgc49MuC7dlNS9ifveXBDjCWO5?usp=sharing
I got the error "No module named 'triton.tools.ragged_tma'", and for some reason I can't build triton from source. Has anyone solved this issue? Thanks a lot!
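In case it helps narrow this down, the missing module can be probed directly (module name taken from the error message above):

```python
import importlib.util

import triton

# triton.tools.ragged_tma is what the import error complains about; if
# find_spec returns None, the installed triton build simply does not ship it
print(triton.__version__)
print(importlib.util.find_spec("triton.tools.ragged_tma"))
```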