Running mxfp4 on H100 using transformers with triton_kernels: make_default_matmul_mxfp4_w_layout not found
Has anyone gotten mxfp4 to run on an H100 using transformers and triton kernels?
System Info
- transformers version: 4.55.0
- Platform: Linux-5.15.0-144-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.34.3
- Safetensors version: 0.5.3
- Accelerate version: 1.9.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA H100 80GB HBM3
Reproduction
I tried to run the OpenAI gpt-oss-120B model in mxfp4 on an H100, following the setup instructions given at this link:
pip install -U transformers accelerate torch triton kernels
pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
I ran the script provided there. (And I had to manually upgrade triton to 3.4.0.)
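For context, generate.py is essentially the standard transformers loading snippet along these lines (the model id and prompt here are illustrative, not my exact script; the failure happens inside the `from_pretrained` call, as the traceback below shows):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"  # the mxfp4 checkpoint under test

tokenizer = AutoTokenizer.from_pretrained(model_id)
# loading fails here: the mxfp4 quantizer tries to swizzle the weights
# via triton_kernels and hits the AttributeError shown below
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```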
The error message states:

```
Traceback (most recent call last):
  File "/workspace/projects/gpt_oss/generate.py", line 6, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 316, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5061, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5524, in _load_pretrained_model
    _error_msgs, disk_offload_index, cpu_offload_index = load_shard_file(args)
                                                         ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 974, in load_shard_file
    disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 882, in _load_state_dict_into_meta_model
    hf_quantizer.create_quantized_param(
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/quantizers/quantizer_mxfp4.py", line 223, in create_quantized_param
    load_and_swizzle_mxfp4(
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/integrations/mxfp4.py", line 375, in load_and_swizzle_mxfp4
    triton_weight_tensor, weight_scale = swizzle_mxfp4(
                                         ^^^^^^^^^^^^^^
  File "/workspace/projects/trainnew/lib/python3.11/site-packages/transformers/integrations/mxfp4.py", line 64, in swizzle_mxfp4
    value_layout, value_layout_opts = layout.make_default_matmul_mxfp4_w_layout(mx_axis=1)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'triton_kernels.tensor_details.layout' has no attribute 'make_default_matmul_mxfp4_w_layout'
```
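A quick way to check whether the installed triton_kernels snapshot actually exposes the symbol that transformers looks up (module and attribute names taken verbatim from the traceback above; given the AttributeError, this should print False here):

```python
import triton
# the traceback shows this import itself succeeds; only the attribute is missing
import triton_kernels.tensor_details.layout as layout

print(triton.__version__)
# transformers' mxfp4 integration calls this function during weight loading;
# False means the installed triton_kernels commit does not match what this
# transformers version expects
print(hasattr(layout, "make_default_matmul_mxfp4_w_layout"))
```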
Can you run this on a separate line by itself?
pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
Hi, yes, I did run this on a separate line by itself. The joined commands seem to be a typo in the original post, but I copied them over verbatim for consistency.
I think I got it! You need torch 2.8:
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/test/cu128
And I'm reasonably sure you need Python 3.12.
I actually installed the torch nightly: torch==2.9.0.dev20250804+cu128
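Either way, it's worth confirming the active environment actually picks up the expected builds (a plain sanity check, nothing mxfp4-specific):

```python
import sys
import torch

# print the interpreter and torch builds actually in use; a stale torch or
# Python in the environment is an easy way to end up with mismatched kernels
print(sys.version)
print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
```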
I checked your other packages and the versions match mine. I have an H100 96GB and it works with vLLM. Below is my vLLM install command:
uv pip install --pre vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/ --extra-index-url https://download.pytorch.org/whl/nightly/cu128 --index-strategy unsafe-best-match
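Once that install goes through, a minimal smoke test via the vLLM Python API might look like this (my sketch, not an official example; serving over HTTP with `vllm serve` is the other option):

```python
from vllm import LLM, SamplingParams

# vLLM reads the quantization config from the checkpoint, so loading the
# mxfp4 model should not need extra flags here (assumption from my setup)
llm = LLM(model="openai/gpt-oss-120b")

params = SamplingParams(max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```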
With transformers main, it should even work on a T4! Please try the following Google Colab: https://colab.research.google.com/drive/15DJv6QWgc49MuC7dlNS9ifveXBDjCWO5?usp=sharing
I got the error "No module named 'triton.tools.ragged_tma'", and for some reason I can't build triton from source. Has anyone solved this issue? Thanks a lot!
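In case it helps narrow this down, the missing module can be probed directly (module name taken from the error message above):

```python
import importlib.util

import triton

# triton.tools.ragged_tma is what the import error complains about; if
# find_spec returns None, the installed triton build simply does not ship it
print(triton.__version__)
print(importlib.util.find_spec("triton.tools.ragged_tma"))
```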