GPT-OSS-20B fine-tuned on the adamo1139/HESOYAM_v0.4 dataset for 1 epoch, using a ChatML format that erases reasoning. QLoRA with rank 1024 and alpha 128, trained with Unsloth. It will undergo further preference alignment once the issues currently blocking that are patched out.

Total batch size of 16, learning rate 0.0002 with a cosine schedule, and sample packing enabled. Training took about 8 hours on a single RTX 3090 Ti.
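
For reference, a minimal Unsloth + TRL sketch of that kind of run, under stated assumptions: the 4-bit base checkpoint name, max sequence length, the 4 x 4 batch split, and the target modules are not specified above, and the ChatML formatting of the dataset (which is where reasoning gets dropped) is omitted. Exact argument names also vary a bit between TRL versions.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit for QLoRA (checkpoint name and seq length are assumptions).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="openai/gpt-oss-20b",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters: rank 1024, alpha 128, as described above.
model = FastLanguageModel.get_peft_model(
    model,
    r=1024,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Dataset would first be mapped into ChatML-formatted text (not shown here).
dataset = load_dataset("adamo1139/HESOYAM_v0.4", split="train")

# 1 epoch, total batch size 16 (4 x 4 accumulation), lr 2e-4, cosine schedule, sample packing.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        packing=True,
        output_dir="gpt-oss-20b-hesoyam-chatml",
    ),
)
trainer.train()
```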

Loss curve looks a bit underwhelming.

(training loss curve plot)

I tried merging this LoRA with huizimao/gpt-oss-20b-uncensored-mxfp4, but that didn't produce great results.
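
For anyone who wants to try that themselves, the generic PEFT merge pattern looks like the sketch below. This is not the exact command used here: the adapter path is a placeholder, and merging into an MXFP4-quantized base may require dequantizing the weights first rather than calling `merge_and_unload` directly.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the alternative base and apply this LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "huizimao/gpt-oss-20b-uncensored-mxfp4",
    torch_dtype="auto",
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "path/to/hesoyam-lora")  # placeholder adapter path
merged = merged.merge_and_unload()  # fold LoRA weights into the base
merged.save_pretrained("gpt-oss-20b-hesoyam-uncensored-merge")

tokenizer = AutoTokenizer.from_pretrained("huizimao/gpt-oss-20b-uncensored-mxfp4")
tokenizer.save_pretrained("gpt-oss-20b-hesoyam-uncensored-merge")
```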

No reasoning is present, and the model definitely learns something from the dataset, but it feels pretty dumb, so this could be the wrong path.
