IQ1_S model gives poor generation quality after merging and deploying on Ollama
#26
by
gaozj
- opened
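I deployed the IQ1_S model in Ollama and ran inference through the OpenAI API endpoint. The model generates many answers that are unrelated to the question (poor generation quality). Is this a common problem with quantized models? I have seen the same thing with other quantized models I deployed: they also generate too many irrelevant answers.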
Are you sure you're using the correct chat template? Did you set the temperature to 0.6?
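For reference, here is a minimal sketch of a request against Ollama's OpenAI-compatible endpoint with the temperature set explicitly. It assumes Ollama is listening on its default local port (11434); the function names are illustrative, and any model name you pass in should be whatever name you gave to `ollama create`.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if your server differs.
OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model, question, temperature=0.6):
    """Build an OpenAI-style chat completion payload with an explicit temperature."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
    }


def send_chat_request(payload, url=OLLAMA_CHAT_URL):
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a running server, `send_chat_request(build_chat_request("my-model", "What is 1 + 1?"))` would return the reply text, where `my-model` is a hypothetical placeholder. Setting the temperature in the request means it does not depend on whatever default the server applies.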
LLM newbie here.
I ran into a similar issue, which I solved by setting the TEMPLATE in the Ollama Modelfile, the file that is passed to ollama create <custom_model_name> -f <ModelFile name>.
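One way to confirm which template a created model actually ended up with is to ask the server. The sketch below assumes Ollama's native `/api/show` endpoint on the default port and that the request body uses the `model` key (older releases used `name`, so treat that as an assumption). The helper names are hypothetical, and the marker strings should be copied from your own Modelfile.

```python
import json
import urllib.request

# Assumption: Ollama's default local address for its native API.
OLLAMA_SHOW_URL = "http://localhost:11434/api/show"


def fetch_template(model, url=OLLAMA_SHOW_URL):
    """Return the prompt template the server reports for the given model."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("template", "")


def missing_markers(template, markers):
    """List the chat markers that do not appear anywhere in the template."""
    return [marker for marker in markers if marker not in template]
```

If `missing_markers(fetch_template("my-model"), [...])` returns a non-empty list for the role markers you expect, the model was probably created without the intended TEMPLATE (`my-model` is again a placeholder).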
Here's the Modelfile I used (the template and parameters are copied from https://ollama.com/library/deepseek-r1):
FROM .\llama-b4754-bin-win-cuda-cu12.4-x64\DeepSeek-R1-GGUF\DeepSeek-R1-Merged\DeepSeek-R1-UD-IQ1_S-merged.gguf
TEMPLATE """{{- if .System }}{{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}<|User|>{{ .Content }}
{{- else if eq .Role "assistant" }}<|Assistant|>{{ .Content }}{{- if not $last }}<|end▁of▁sentence|>{{- end }}
{{- end }}
{{- if and $last (ne .Role "assistant") }}<|Assistant|>{{- end }}
{{- end }}
"""
PARAMETER stop "<|begin_of_sentence|>"
PARAMETER stop "<|end_of_sentence|>"
PARAMETER stop "<|User|>"
PARAMETER stop "<|Assistant"