int32 overflow detected for operation add in `_keyed_add` kernel under TRITON_DEBUG=1
#135, opened by findhao
Hi experts,
When running with TRITON_DEBUG=1, I hit a device-side assertion inside routing_details/_routing_compute.py:
unknown: block: [130,0,0], thread: [121,0,0] Assertion `int32 overflow detected for operation add` failed.
This happens in the following function:
@triton.jit
def _keyed_add(x, y):
    # we keep the key in the upper 16 bits of a uint32:
    key_mask: tl.constexpr = 0xffff0000
    kx = x & key_mask
    ky = y & key_mask
==> z = tl.where(kx == ky, x + y - kx, y)
    return z
I'm running the example from the GitHub README with TRITON_DEBUG=1.
Can you tell me whether this overflow is intentional or a real bug?
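In case it helps the discussion, here is a minimal pure-Python sketch of the arithmetic on the flagged line. It is not the kernel itself: it emulates 32-bit values with Python integers, and the helper names `keyed_add_wrapping` and `keyed_add_reordered` are hypothetical. It models the unsigned case described in the code comment, even though the assertion message mentions int32, and it assumes the sum is computed for every element regardless of which branch `tl.where` selects. Under those assumptions, the intermediate `x + y` can leave the 32-bit range when both keys have their top bit set, while the final result after subtracting `kx` is still correct modulo 2^32. Reordering to `x + (y - ky)` gives the same result without the out-of-range intermediate, as long as the low 16 bits do not carry into the key. I have not verified that this reordering silences the assertion in the real kernel.

```python
MASK32 = 0xFFFFFFFF    # emulate 32-bit wraparound with Python ints
KEY_MASK = 0xFFFF0000  # key lives in the upper 16 bits, as in _keyed_add


def keyed_add_wrapping(x, y):
    """Mirror `x + y - kx`; also report whether `x + y` left the 32-bit range."""
    kx = x & KEY_MASK
    ky = y & KEY_MASK
    total = x + y                # assumed to be computed for every element
    overflowed = total > MASK32  # this is what a debug overflow check would flag
    z = (total - kx) & MASK32 if kx == ky else y
    return z, overflowed


def keyed_add_reordered(x, y):
    """Hypothetical reordering: strip y's key first, then add."""
    kx = x & KEY_MASK
    ky = y & KEY_MASK
    total = x + (y - ky)         # y - ky is just the low 16 bits of y
    overflowed = total > MASK32
    z = total & MASK32 if kx == ky else y
    return z, overflowed


# Same key 0x8000 (top bit set), payloads 1 and 2.
x, y = 0x80000001, 0x80000002
for fn in (keyed_add_wrapping, keyed_add_reordered):
    z, overflowed = fn(x, y)
    print(fn.__name__, hex(z), "intermediate overflow:", overflowed)
```

Both helpers return the same value for this input; only the first one produces an intermediate that exceeds 32 bits.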