int32 overflow detected for operation add in `_keyed_add` kernel under TRITON_DEBUG=1

#135
by findhao - opened

Hi experts,

When running with TRITON_DEBUG=1, I hit a device-side assertion inside routing_details/_routing_compute.py:

unknown: block: [130,0,0], thread: [121,0,0] Assertion `int32 overflow detected for operation add` failed.

This happens in the following function:



@triton.jit
def _keyed_add(x, y):
    # we keep the key in the upper 16 bits of a uint32:
    key_mask: tl.constexpr = 0xffff0000

    kx = x & key_mask
    ky = y & key_mask
==>    z = tl.where(kx == ky, x + y - kx, y)
    return z

I'm running the example from the GitHub README with TRITON_DEBUG=1.

Can you tell me whether this overflow is intentional, or is it a real bug?
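To make the question concrete, here is a minimal pure-Python sketch of the arithmetic in `_keyed_add`. It is not the kernel itself: the sample values, the helper names (`keyed_add_steps`, `keyed_add_wrapped`, `keyed_add_reordered`), and the assumption that the debug check treats the operands as signed 32-bit integers are mine, not taken from the library. It shows that the intermediate `x + y` can leave the int32 range even when the final `x + y - kx` fits, and that wrapping arithmetic still produces the expected value.

```python
INT32_MAX = 2**31 - 1
UINT32_MASK = 0xFFFFFFFF
KEY_MASK = 0xFFFF0000  # key lives in the upper 16 bits, as in _keyed_add


def keyed_add_steps(x, y):
    """Return (x + y, x + y - kx) as unbounded Python ints (no wrapping)."""
    kx = x & KEY_MASK
    return x + y, x + y - kx


def keyed_add_wrapped(x, y):
    """Emulate the same expression with 32-bit wraparound after each step."""
    kx = x & KEY_MASK
    ky = y & KEY_MASK
    if kx != ky:
        return y
    total = (x + y) & UINT32_MASK          # may wrap here
    return (total - kx) & UINT32_MASK      # wraps back to the expected value


def keyed_add_reordered(x, y):
    """Hypothetical reordering: subtract the key before adding."""
    kx = x & KEY_MASK
    ky = y & KEY_MASK
    if kx != ky:
        return y
    return x + (y - kx)                    # y - kx is just the low 16 bits of y


# Assumed sample values: same key 0x4000, small payloads 3 and 5.
key = 0x4000 << 16
x = key | 3
y = key | 5

intermediate, final = keyed_add_steps(x, y)
print(intermediate > INT32_MAX)   # the sum x + y exceeds the int32 range
print(final <= INT32_MAX)         # the final result fits again
print(hex(keyed_add_wrapped(x, y)), hex(keyed_add_reordered(x, y)))
```

If the operands really are unsigned or the wraparound is relied upon, the assertion would be a false positive from the debug check; if they are signed, evaluating `x + (y - kx)` instead would avoid the overflowing intermediate in this case. Which of these applies to the real kernel is exactly what I am asking.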
