int4 quantization destroys function_call accuracy

#99

by opter - opened Jun 17

Jun 17

Quantization is a lossy compression technique: it largely preserves the model's semantics, but it cannot guarantee that the output tokens exactly match those of the original model. As a result, int4 quantization changes the tokens emitted for function calls, for example turning "name": "Alice" into name: Alice. With the quotes dropped, the arguments are no longer valid JSON, so the output is garbled whenever function calls are used in a conversation. How can quantization be adapted so that function calling works correctly?
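To make the failure concrete, here is a minimal sketch, assuming the function-call arguments are expected to be a strict JSON object. The helper name `parse_function_call` and the two sample strings are hypothetical, made up for illustration from the example above; they are not real model output or part of any library.

```python
import json
from typing import Optional


def parse_function_call(raw: str) -> Optional[dict]:
    """Return the parsed arguments if `raw` is a strict JSON object, else None."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Unquoted keys or values (e.g. name: Alice) are rejected here.
        return None
    # Only a JSON object is usable as a set of function-call arguments.
    return parsed if isinstance(parsed, dict) else None


# Hypothetical outputs, modeled on the example in this post.
full_precision_output = '{"name": "Alice"}'  # quotes intact
int4_output = '{name: Alice}'                # quotes dropped

print(parse_function_call(full_precision_output))  # {'name': 'Alice'}
print(parse_function_call(int4_output))            # None
```

A check like this only detects the problem after generation; it does not fix the quantized model's output.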

