memory allocation exceeded error

#19
by aditya-shinde - opened

Why do I get a memory allocation error on a 4090 (24 GB VRAM) when it clearly says 16 GB is enough?
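A quick way to see where the memory actually goes is to print what PyTorch reports right before and after the failing call (a minimal sketch using standard torch.cuda calls):

```python
import torch

# Print what the GPU actually reports: total/free memory from the driver,
# plus how much PyTorch itself has allocated and reserved.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"free:      {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
    print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.1f} GB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.1f} GB")
```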

same here.


The model card says 12B, BF16, so it needs at least 24 GB of VRAM for the weights alone. Or did I forget to enable native MXFP4 somewhere?
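A rough back-of-the-envelope check (assuming 12B parameters and ignoring activations, KV cache, and CUDA context overhead) shows why BF16 overflows a 24 GB card while MXFP4 would fit in the quoted 16 GB:

```python
# Rough weight-memory estimate for a 12B-parameter model.
# Assumption: activations, KV cache, and CUDA overhead are ignored,
# so real usage will be somewhat higher than these numbers.
params = 12e9

bf16_bytes  = params * 2      # BF16: 2 bytes per parameter
mxfp4_bytes = params * 0.5    # MXFP4: ~4 bits per parameter (plus small scale overhead)

print(f"BF16 weights : {bf16_bytes / 1e9:.1f} GB")   # ~24 GB -> weights alone fill a 24 GB 4090
print(f"MXFP4 weights: {mxfp4_bytes / 1e9:.1f} GB")  # ~6 GB  -> well under the quoted 16 GB
```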

I installed this version of Triton, which fixed the error on my RTX 5090:
pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
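To confirm the install actually picked up the kernels (assuming the subdirectory installs as a package named `triton_kernels`), a quick import check like this can help:

```python
# Sanity check that Triton and the extra kernels package are importable.
# Assumption: the pip command above installs the subdirectory as "triton_kernels".
import triton
print("triton version:", triton.__version__)

try:
    import triton_kernels  # noqa: F401
    print("triton_kernels import OK")
except ImportError as exc:
    print("triton_kernels not available:", exc)
```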
