OOM on 3090

#60
by TheBigBlockPC - opened

I tried running this LLM on my dual-3090 PC, but it runs out of memory on a single 3090 and even across both. Quantizing with bitsandbytes doesn't work either. I'm using transformers. Can someone help me fix this?
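For context, the kind of bitsandbytes 4-bit load that was failing presumably looked something like this (a minimal sketch; the model id is a placeholder, not the actual checkpoint from this thread):

```python
# Sketch of a typical transformers + bitsandbytes 4-bit load.
# "some-org/some-model" is a placeholder, not the real repo id.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize weights to 4-bit on load
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",          # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",              # shard layers across both 3090s
)
```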

Have you tried MXFP4 precision? It's suggested in their cookbook; that way it should fit in 16 GB.
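If the checkpoint ships MXFP4 weights, the load might look like the sketch below. The model id is an assumption (the thread doesn't name it), and note that keeping the native MXFP4 precision generally depends on compatible Triton kernels being installed; if they're missing, transformers may upcast the weights, which would reproduce the OOM.

```python
# Minimal sketch of loading a checkpoint with native MXFP4 weights.
# The model id is an assumption; substitute the actual repo from this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumption, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # "auto" keeps the checkpoint's stored precision
    device_map="auto",   # place layers across the available GPUs
)
```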

Fixed it. Here is what I found out: link

TheBigBlockPC changed discussion status to closed
