Can this version with quantization w4a16 run on V100?
#2 by underkongkong - opened
I have tried llm-compressor with the compressed-tensors library and chose the W4A16 scheme. However, the quantized model cannot run, since the V100's compute capability is only 7.0.
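For context, the W4A16 kernels used for compressed-tensors models (Marlin-style kernels in vLLM) are generally reported to require CUDA compute capability 8.0 or newer (Ampere), while the V100 reports 7.0. A minimal sketch of that check, where `supports_w4a16` and `MIN_W4A16_CC` are hypothetical helpers for illustration (not part of llm-compressor or vLLM):

```python
# Assumption: Marlin-style W4A16 kernels need compute capability >= 8.0.
# MIN_W4A16_CC and supports_w4a16 are illustrative names, not library APIs.
MIN_W4A16_CC = (8, 0)

def supports_w4a16(capability: tuple) -> bool:
    """True if a (major, minor) compute capability meets the assumed minimum."""
    return capability >= MIN_W4A16_CC

print(supports_w4a16((7, 0)))  # V100  -> False
print(supports_w4a16((8, 0)))  # A100  -> True
```

On a live machine, the capability tuple can be read with `torch.cuda.get_device_capability()`, which returns `(7, 0)` on a V100.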