int4 quantization destroys function_call accuracy

#99
by opter - opened

Quantization is a lossy compression technique: it aims to preserve the model's semantics, but it cannot guarantee that the output tokens exactly match those of the original model. As a result, int4 quantization can alter the tokens of function calls, for example changing "name": "Alice" to name: Alice. The altered calls are no longer well-formed, which garbles the output when function calls are incorporated into a conversation. How can quantization be adapted to properly support function calling?
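
To make the failure concrete, here is a minimal Python sketch of a consumer-side check. It is only a stopgap, not a fix for the quantization itself, and it assumes the function-call arguments are supposed to be a flat JSON object. The helper name parse_function_call and the repair rules are hypothetical and not part of any model or library API.

```python
import json
import re

def _quote_value(match):
    """Quote a bare word value, leaving JSON literals untouched."""
    value = match.group(2)
    if value in ("true", "false", "null"):
        return match.group(0)
    return f'{match.group(1)}"{value}"{match.group(3)}'

def parse_function_call(text):
    """Hypothetical helper: return the arguments as a dict, or None.

    Strict JSON parsing is tried first. Only if that fails does the
    lenient fallback quote bare keys and bare word values.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Quote bare keys: {name: ...} becomes {"name": ...}
    repaired = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', text)
    # Quote bare word values: ... : Alice} becomes ... : "Alice"}
    repaired = re.sub(r'(:\s*)([A-Za-z_][\w ]*?)(\s*[,}])', _quote_value, repaired)

    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None  # unrecoverable; the caller should retry or reject

original = parse_function_call('{"name": "Alice"}')   # well-formed output
degraded = parse_function_call('{name: Alice}')       # output with quotes dropped
```

With this sketch both calls yield the same dict, while text that cannot be repaired returns None so the caller can retry or reject it. The regex repair is deliberately narrow: it does not handle nested strings that contain colons or commas, so it should be treated as an illustration of the problem rather than a robust parser.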
