int4 quantization destroys function_call accuracy
#99, opened by opter
Quantization is a lossy compression technique: it largely preserves the model's semantics, but it cannot guarantee that the output tokens exactly match those of the original model. As a result, quantization can alter the tokens of a function call, for example turning "name": "Alice" into name: Alice. When such function calls are fed back into the conversation, the output becomes garbled. How can quantization be adapted to properly support function calling?
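To make the failure concrete, here is a minimal sketch of how the problem could be detected and measured. It assumes the function-call arguments are expected to be a JSON object; the helper names `is_valid_call_arguments` and `valid_call_rate` are hypothetical and not part of any library, and the sample strings are illustrative rather than real model output.

```python
import json


def is_valid_call_arguments(text: str) -> bool:
    """Return True if `text` parses as a JSON object (assumed argument format)."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def valid_call_rate(outputs: list[str]) -> float:
    """Fraction of outputs that are well-formed JSON objects."""
    if not outputs:
        return 0.0
    return sum(is_valid_call_arguments(o) for o in outputs) / len(outputs)


# Illustrative strings only: the first mimics the original model's output,
# the second mimics the degraded form described above (quotes dropped).
original = '{"name": "Alice"}'
degraded = '{name: Alice}'

print(is_valid_call_arguments(original))
print(is_valid_call_arguments(degraded))
print(valid_call_rate([original, degraded]))
```

Running a check like this over the same prompts with the original and the int4 model would give a rough number for how much function-call formatting degrades, though it only catches syntax breakage and not wrong argument values.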