20B Parameters vs ChatGPT 4

#123
by Maria99934 - opened

It is impossible to work with, and extremely inconvenient: no matter what you ask, everything is prohibited. The ChatGPT website, by contrast, answered the same question immediately.

Lower the censorship threshold; otherwise this is a useless 10 GB file.

image.png

Try the abliterated version:

https://huggingface.co/bartowski/huihui-ai_Huihui-gpt-oss-20b-BF16-abliterated-GGUF/blob/main/huihui-ai_Huihui-gpt-oss-20b-BF16-abliterated-MXFP4_MOE.gguf
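For reference, a minimal sketch of loading that GGUF outside LM Studio with the llama-cpp-python bindings (assuming the file above sits in the working directory; n_ctx is an arbitrary placeholder, not a recommendation):

```python
# Minimal sketch: loading the abliterated GGUF with llama-cpp-python.
# The file name comes from the link above; the context size is a
# placeholder assumption to be tuned to available RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="huihui-ai_Huihui-gpt-oss-20b-BF16-abliterated-MXFP4_MOE.gguf",
    n_ctx=8192,  # context window; lower it if you run out of RAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```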

Thank you, I'll try it now!
Thank you for your feedback.

Perhaps you know if there is a way to control its reasoning level?

I have LM Studio; it doesn't expose the Jinja chat template in an editable form the way other models do.

I'm using llama.cpp (the backend of LM Studio) directly,

and this flag controls the reasoning: --chat-template-kwargs "{\"reasoning_effort\":\"high\"}"
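The flag above applies at server launch. For per-request control, here is a hedged sketch against llama-server's OpenAI-compatible endpoint, assuming a server on localhost:8080 and a recent build that forwards a chat_template_kwargs field:

```python
# Hedged sketch: requesting high reasoning effort per request.
# Assumes a llama-server on localhost:8080 and a recent build that
# forwards "chat_template_kwargs" to the chat template; on older
# builds only the launch flag above applies.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Explain RAID levels briefly."}],
        "chat_template_kwargs": {"reasoning_effort": "high"},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```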

image.png

Wow, unbelievable, I didn't think it could be so smart.
It even wrote all the commands correctly,
and explained how to install the Linux kernel headers and the drivers for the network adapter...

Thank You!

I'm laughing sooo hard here, looking at what Qwen3 30B A3B has to say about this prompt. Four words, guys: don't waste your time.

image.png (11 screenshots)

In the screenshot, did you use Qwen3 30B A3B Thinking or the plain Instruct version?

The 20B writes short code even in maximum reasoning mode; it doesn't give me more than 400 lines.

Qwen gave me 800-900.

First off, I'm using my own custom, heavily modified agentic ecosystem based on llama-server. And with the correct llama-server settings, pretty much any model will write as much output as you allow it / as it deems necessary. The model in the example was Qwen3 30B A3B 2507 Instruct (not that the reasoning model would fall short; if anything it would do a better job, tbh, but at the moment I read this post my thinking model was loaded, so I simply fired it up with that) (abliterated, of course). So yeah, I write 1000+ lines of code in a single prompt using 30B A3B (actually the Coder variant in this case), all day long. What is your method of inference? What user interface are you using? I highly recommend you build yourself llama.cpp if you want to play seriously with LLMs.
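To make the "as much output as you allow" part concrete, a minimal sketch against llama-server's OpenAI-compatible endpoint; the URL and max_tokens value are assumptions, and the hard ceiling is the context size (-c) the server was launched with:

```python
# Minimal sketch: raising the output cap on llama-server's
# OpenAI-compatible endpoint. max_tokens is the per-request limit;
# the real ceiling is the context size the server started with.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Write a complete CLI todo app in Python."}],
        "max_tokens": 8192,  # raise this if long answers get cut off
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```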

I have the Qwen Coder Q5_K_XL UD quant.

It does a great job, but I have 32 GB of RAM and 4 GB of video memory. GPT-OSS turned out to be useless in the end, although I had hoped it would be lighter and consume less memory. Even at the high reasoning level, after 20 minutes of thinking I got non-working code :)

Even to simple medical questions it replied: "I'm sorry, but I can't help with that."
It's more censored than the online version of GPT.
I thought the uncensored version would help, but no, its answers are just as castrated: endless tables and hot air. Qwen will stay in the lead for a long time.

The funny thing is that the uncensored version of gpt-oss can think in Russian, even though it wasn't trained separately on a Russian dataset.

I use LM Studio, which works best for me: on Qwen3 30B A3B Coder I get 10-12 tokens per second, and there's a convenient GPU offload control. In other solutions (Msty, Jan) I get 4-8 tokens per second on the same model.
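(For comparison, LM Studio's GPU offload slider corresponds to llama.cpp's layer offloading; in the llama-cpp-python bindings the equivalent knob is n_gpu_layers. A hedged sketch; the file name and layer count are guesses for a 4 GB card, not recommendations:)

```python
# Hedged sketch: partial GPU offload with llama-cpp-python.
# n_gpu_layers sets how many transformer layers go to VRAM; 12 is an
# arbitrary starting point for a 4 GB card, and the file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL.gguf",  # hypothetical file name
    n_gpu_layers=12,  # raise until VRAM is full for more speed
    n_ctx=4096,
)
```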

image.png

GPT-OSS turned out to be useless in the end...

Well, abliteration is, one way or another, a form of lobotomy. I have yet to see an abliterated model perform as well as, or even close to, the original model. So for anything serious you should avoid abliterated models. As for gpt-oss being DOA, that's no surprise; OpenAI just went about $500B into debt with SoftBank and they have to pay it off somehow. They are not going to give you something for free and let you be satisfied with it, I can guarantee that. However, their ROI might end up bursting the AI bubble globally, because Chinese open weights are undermining it heavily.

I mean, right now my entire ecosystem is built on Qwen: Qwen3 30B 2507 (and all its variants for different tasks) running as my main LLM; Qwen3 Embedding 0.6B for my RAG memory operations; Qwen 2.5 Omni as a multimodal child LLM for the main LLM's voice input, audio analysis, image analysis, OCR, etc.; Qwen Image as the main LLM's image generator; Qwen Image Edit for its editing actions; and WAN 2.2 for its video outputs. So I'm running an entire Alibaba / Tongyi Lab ecosystem, quite frankly at (I'm not going to be too humble here) unparalleled flexibility and versatility, on a consumer GPU, with cutting-edge state-of-the-art results, AND completely for free. So that brings us back to OpenAI's $500B ROI: the more people realize what the Chinese are offering for free, the less hope there is for that $500B ROI.

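(If anyone is curious about the RAG-memory piece of such a setup, a minimal sketch with sentence-transformers and the Qwen3 embedding model; the model ID is real, but the documents, query, and in-memory search are illustrative only — a real setup would use a vector store:)

```python
# Minimal sketch of embedding-based RAG retrieval with Qwen3 Embedding.
# Documents and query are illustrative; cosine similarity works here
# because the embeddings are normalized.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
docs = ["llama.cpp build notes", "network driver install log", "kernel header fix"]
doc_emb = model.encode(docs, normalize_embeddings=True)

query_emb = model.encode(["how did I fix the network adapter?"], normalize_embeddings=True)
scores = query_emb @ doc_emb.T  # cosine similarity matrix, shape (1, len(docs))
print(docs[scores.argmax()])    # best-matching memory entry
```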

Thank you for sharing your experience and settings. Basically, I already understand that this gpt-oss is just a placeholder, released simply "because something had to be there."
Qwen3 8B and 14B are more useful than gpt-oss.
If you ask gpt-oss how to give an injection, it assumes you want to hurt someone, not help them. It is inherently negative. If you ask it about making a knife, it assumes you want to hurt someone, not make a useful household tool, because in its understanding anything that could be used to kill or harm must be blocked.

Therefore, I fully support your words: “don't waste your time.”

image.png

lmao, I actually chuckled here ;) Thanks for that. If you want, you can contact me at u/Not4Fame.
