Can this version with quantization w4a16 run on V100?
#2 by underkongkong - opened
I have tried llm-compressor with the compressed-tensors library and chose the W4A16 scheme. However, the quantized model cannot run, since the V100's compute capability is only 7.0.
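For context, the W4A16 kernels used for compressed-tensors models (Marlin-style kernels in vLLM) are generally reported to require CUDA compute capability 8.0 or newer (Ampere), while the V100 reports 7.0. A minimal sketch of that check, where `supports_w4a16` and `MIN_W4A16_CC` are hypothetical helpers for illustration (not part of llm-compressor or vLLM):

```python
# Assumption: Marlin-style W4A16 kernels need compute capability >= 8.0.
# MIN_W4A16_CC and supports_w4a16 are illustrative names, not library APIs.
MIN_W4A16_CC = (8, 0)

def supports_w4a16(capability: tuple) -> bool:
    """True if a (major, minor) compute capability meets the assumed minimum."""
    return capability >= MIN_W4A16_CC

print(supports_w4a16((7, 0)))  # V100  -> False
print(supports_w4a16((8, 0)))  # A100  -> True
```

On a live machine, the capability tuple can be read with `torch.cuda.get_device_capability()`, which returns `(7, 0)` on a V100.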