model quality issues
Are there ways to make the model perform better? It seems to have quality issues where it breaks while coding or performs badly at creative tasks. I won't be able to run the 120b version because I have dual 3090s, which is only 2x 24GB (effectively 48GB) and won't be enough for it. Are there any settings to help the model work properly?
I hear a lot of people saying that configuring the Harmony chat template perfectly is crucial to maintaining quality. However, I suspect the performance issues you're experiencing will persist no matter how it's configured, mainly because I used their official demo and the performance was very unreliable. For example...
It often never stops thinking, or it freezes.
It's broadly and profoundly ignorant outside of coding, math, and STEM in general, and is completely missing entire pockets of popular information, which makes the model fall off the rails across numerous use cases.
It's aggressively aligned for safety, which not only results in a flood of unreasonable refusals but also commonly pushes responses away from what users are asking for.
Like Microsoft's Phi series, it's trained primarily on opt-in, synthetic data. That is, instead of training on a snapshot of humanity (e.g., web rips and Wikipedia), they selectively added what they deemed appropriate information to the corpus, with most of it being synthetic (taken from other AI models). All models that do this end up with real-world performance that is notably worse than what the benchmark scores suggest.
My honest opinion is that this model is crap, and it was designed to be crap. OpenAI wanted to make a symbolic contribution to the open-source community that is frustratingly lacking relative to their proprietary offerings.
Is the model somewhat fixable on dual 3090s, and is it even worth it, or is running Qwen3 in 4-bit a better solution?
For chat templates I use this code:
# assumes the openai-harmony package (pip install openai-harmony)
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    ReasoningEffort,
    Role,
    SystemContent,
    load_harmony_encoding,
)

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)


def tokenize(chat):
    # Harmony system message: model identity, reasoning effort, dates, required channels.
    system_message = (
        SystemContent.new()
        .with_model_identity(
            "You are ChatGPT, a large language model trained by OpenAI."
        )
        .with_reasoning_effort(ReasoningEffort.MEDIUM)
        .with_conversation_start_date("2025-06-28")
        .with_knowledge_cutoff("2024-06")
        .with_required_channels(["analysis", "commentary", "final"])
    )
    # Map incoming OpenAI-style roles onto Harmony roles.
    role_mapping = {
        "user": Role.USER,
        "assistant": Role.ASSISTANT,
        "system": Role.DEVELOPER,
    }
    # Channel attached to each message (only assistant turns get one).
    set_channel = {
        "user": None,
        "assistant": "final",
        "system": None,
    }
    msgs = [
        Message.from_role_and_content(Role.SYSTEM, system_message)
    ]
    for data in chat:
        msg = Message.from_role_and_content(role_mapping[data["role"]], data["content"])
        if set_channel[data["role"]] is not None:
            msg = msg.with_channel(set_channel[data["role"]])
        msgs.append(msg)
    convo = Conversation.from_messages(msgs)
    tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
    return tokens
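In case it helps, this is roughly how I call it (a minimal sketch; the messages are just placeholders):

chat = [
    {"role": "system", "content": "You write short, vivid poetry."},
    {"role": "user", "content": "Write four lines about rain on a tin roof."},
]
prompt_tokens = tokenize(chat)
# prompt_tokens is a list of Harmony token IDs ending with the assistant header,
# which then gets handed to the backend that does the actual generation.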
I don't know if that chat template is valid or not, but out of curiosity I asked gpt-oss 120b to check for errors and pasted its response below.
The main thing it pointed out is that role_mapping maps "system" to Role.DEVELOPER, but the system message is created with Role.SYSTEM.
"{ "analysis": "The template contains several issues that will likely cause runtime errors or incorrect behavior when used with Harmony AI:\n\n1. Incorrect role mapping for system messages\n - The role_mapping dictionary maps the string "system" to Role.DEVELOPER, yet the template adds a system message with Role.SYSTEM (Message.from_role_and_content(Role.SYSTEM, system_message)). This inconsistency will raise a KeyError or produce a message with the wrong role if the input chat contains a "system" entry. The mapping should use Role.SYSTEM (or, if "system" is meant to be a developer instruction, the system message should be created with Role.DEVELOPER).\n\n2. Channel specification uses plain strings\n - The set_channel dictionary assigns the string "final" for assistant messages. In Harmony AI, channels are typically represented by the Channel enum (e.g., Channel.FINAL). Passing a raw string may result in a type‑error when Message.with_channel validates the argument.\n\n3. SystemContent usage may be wrong\n - SystemContent is a special builder that produces a structured system payload. Message.from_role_and_content generally expects a plain string for the content field. Supplying a SystemContent object directly may cause a serialization error. The correct pattern is often system_message.render() or system_message.to_string() to obtain the textual representation before embedding it in a Message.\n\n4. Date and channel arguments are strings, not typed objects\n - .with_conversation_start_date("2025-06-28") likely expects a datetime.date or a formatted string that the library parses internally. If it expects a date object, passing a raw string will raise a validation error.\n - .with_required_channels(["analysis", "commentary", "final"]) should probably receive a list of Channel enum values ([Channel.ANALYSIS, Channel.COMMENTARY, Channel.FINAL]). Using plain strings could again break type checking.\n\n5. Missing imports / undefined variables\n - The snippet references SystemContent, ReasoningEffort, Role, Message, Conversation, and encoding without showing imports. In practice you need to import these from the Harmony SDK (from harmony import SystemContent, ReasoningEffort, Role, Message, Conversation, encoding). Forgetting to import will lead to NameErrors.\n\n6. Future conversation start date\n - The start date is set to "2025-06-28", which is in the future relative to most current runtimes. While not a code error, it may cause the model to reject the payload if it validates that the start date must not be later than the current date.\n\n7. Potential key‑error on unknown roles\n - The loop accesses role_mapping[data["role"]] and set_channel[data["role"]] without validation. 
If the incoming chat list contains an unexpected role (e.g., "tool"), a KeyError will be raised.\n\nOverall, the most critical errors are the mismatched role mapping for "system" and the misuse of raw strings for channels and system content.\n\n---\n\nSuggested fixes (illustrative):\npython\nfrom harmony import (\n SystemContent, ReasoningEffort, Role, Message,\n Conversation, encoding, Channel\n)\n\ndef tokenize(chat):\n system_message = (\n SystemContent.new()\n .with_model_identity(\n "You are ChatGPT, a large language model trained by OpenAI."\n )\n .with_reasoning_effort(ReasoningEffort.MEDIUM)\n .with_conversation_start_date(date(2025, 6, 28))\n .with_knowledge_cutoff("2024-06")\n .with_required_channels([Channel.ANALYSIS, Channel.COMMENTARY, Channel.FINAL])\n )\n\n role_mapping = {\n "user": Role.USER,\n "assistant": Role.ASSISTANT,\n "system": Role.SYSTEM, # corrected\n }\n set_channel = {\n "user": None,\n "assistant": Channel.FINAL, # use enum\n "system": None,\n }\n\n msgs = [Message.from_role_and_content(Role.SYSTEM, system_message.render())]\n for data in chat:\n role = role_mapping.get(data["role"]) # safe lookup\n if role is None:\n continue # or raise a descriptive error\n msg = Message.from_role_and_content(role, data["content"])\n if set_channel[data["role"]] is not None:\n msg = msg.with_channel(set_channel[data["role"]])\n msgs.append(msg)\n\n convo = Conversation.from_messages(msgs)\n return encoding.render_conversation_for_completion(convo, Role.ASSISTANT)\n\n\n---\n\nConclusion: Yes, the template has multiple obvious errors—most notably the wrong role mapping for system messages, the use of plain strings where enums or rendered strings are required, and missing imports. Adjusting those points will make the tokenization function work as intended.", "commentary": "The code mixes up system roles, passes raw strings where enums are expected, and feeds a SystemContent object directly into a message without rendering it. Fix the role map, use Channel enums, render the system payload to a string, and ensure all required imports are present. After those changes the tokenizer should run without errors.", "final": "Yes – the template has obvious mistakes (wrong system role mapping, raw strings for channels, unrendered SystemContent, missing imports, etc.). Correct those issues for the function to work." }"
For issues related to model execution, you can refer to OpenAI's Harmony. If using llama.cpp, check out Unsloth's quantized versions and documentation.
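If it helps, here is a minimal sketch of loading a 20b GGUF through the llama-cpp-python bindings (the filename, context size, and GPU split below are placeholders, and temperature 1.0 / top_p 1.0 is the sampling OpenAI reportedly suggests for gpt-oss):

from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-20b-Q4_K_M.gguf",  # placeholder path to an Unsloth quant
    n_gpu_layers=-1,                        # offload all layers to the GPUs
    n_ctx=8192,
    # tensor_split=[0.5, 0.5],              # optionally balance across two 3090s
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a limerick about GPUs."}],
    temperature=1.0,  # sampling settings reportedly recommended for gpt-oss
    top_p=1.0,
)
print(out["choices"][0]["message"]["content"])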
Fixed the issue: it was my chat formatting, because the developer message was passed as plain text instead of in the expected developer message format. Now the creative parts work.
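For anyone hitting the same thing, the change amounts to wrapping the developer text in Harmony's DeveloperContent builder instead of passing the raw string, roughly like this (a minimal sketch; the helper name is made up):

from openai_harmony import DeveloperContent, Message, Role

def to_developer_message(text):
    # Build the structured developer payload instead of passing plain text.
    dev_content = DeveloperContent.new().with_instructions(text)
    return Message.from_role_and_content(Role.DEVELOPER, dev_content)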