Wur doomed!

#14
by jukofyork - opened

Continuation of THE THREAD OF DOOM.

What do you and the others think of the distilled R1 models for writing?

The llama3 / qwen models SFT'd on R1 outputs? I only tried 2 of them.

R1 Qwen (32b) - Lacks knowledge of fiction (same as the official Qwen release), so its writing is no better.

R1 Llama3 - This is generally the worst of them (not just for writing). It'll generate the CoT and then write something completely different.

CoT traces won't let the model do anything out of distribution, so they're not very useful if the base model doesn't already have a lot of it in its training data.

Yeah, I have tried the same two and felt the same way.

I also felt that any attempt to add an R1 distill to the merge recipe of an existing merge project made it worse...so far...

@gghfez @BigHuggyD that has been my experience as well, which is a shame as I had a go with R1 on OpenRouter and was blown away.

What model comes anywhere close that's usable on a 24GB VRAM machine with 32GB of RAM, in your experience?

There's nothing like it for now. I'm running R1 slowly on my ThreadRipper:

prompt eval time =   14026.61 ms /   918 tokens (   15.28 ms per token,    65.45 tokens per second)
       eval time =  398806.12 ms /  1807 tokens (  220.70 ms per token,     4.53 tokens per second)
      total time =  412832.73 ms /  2725 tokens

I tried training Wizard2 8x22b MoE on R1 data, but it doesn't really work well. It will plan ahead in think tags, e.g.:

I need to ensure the story maintains its gritty, realistic tone without becoming overly melodramatic. The characters' growth should be subtle but significant. Also, the ending should leave a sense of hope but not be too neat—their redemption is fragile, and the future is uncertain.

Let me outline the next few chapters:

Chapter 5: Nightmares and Trust
...

But it doesn't backtrack like R1 does. It just kind of agrees with itself and ends up writing how it usually would:

“I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead.

lol

Ahhh that's a shame :-(

"I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead."

Oh god!

I'll have to keep an eye on this thread.

I did enjoy Ppoyaa/MythoNemo-L3.1-70B-v1.0

But my tastes are probably not as refined as others on this thread ;-)

Chapter 5 - meta payoff. It all comes together.

Nice work.

Does your MLA version keep the context memory footprint smaller (like the ik_llama fork) or does it balloon like mainline llama.cpp does?

It was fairydreaming who added it to mainline llama.cpp, and then someone else ported it to ik_llama:

https://github.com/ggml-org/llama.cpp/pull/11446

I think ik_llama has merged the MLA branch, so it should be the same and only use a couple of GB for 32k context.

It does seem to trade this for extra compute though and you lose a couple of tokens per second.
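
For a rough sanity check on that "couple of GB" figure, here's the back-of-the-envelope maths. The dimensions are my assumptions based on the published DeepSeek-V3/R1 config (61 layers, a 512-dim compressed KV latent plus a 64-dim decoupled RoPE key per token, fp16 cache), so treat it as a sketch rather than exactly what llama.cpp allocates:

```python
# Back-of-the-envelope MLA KV-cache size (assumed DeepSeek-V3/R1 dimensions).
n_layers      = 61     # transformer layers
kv_lora_rank  = 512    # compressed KV latent cached per token per layer
rope_head_dim = 64     # decoupled RoPE key cached alongside the latent
bytes_per_val = 2      # fp16
n_ctx         = 32 * 1024

bytes_per_token = n_layers * (kv_lora_rank + rope_head_dim) * bytes_per_val
total_gib = bytes_per_token * n_ctx / 1024**3
print(f"{bytes_per_token / 1024:.1f} KiB per token, {total_gib:.2f} GiB at 32k context")
# ~68.6 KiB/token and ~2.1 GiB at 32k -- a tiny fraction of what caching the
# full per-head K/V would need, which is why mainline ballooned without MLA.
```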

I think all the problems I've had may come from a flag fairydreaming added to transpose one of the tensors for a small gain on his CPU system, but it seems the CUDA backend has a lot of checks that dispatch the operations to different functions depending on the shape and float type, so the transposed tensor likely takes a different code path.

@jukofyork, are you using deterministic samplers or something else? Other than your user request, is your prompt otherwise empty?

I'm using DeepSeek's own suggested parameters, which somebody posted on Reddit a week or so ago (there's a rough example request sketched after the list):

  • No system message (I tend to not use one anyway)
  • Temperature of 0.6 (--temp 0.6 )
  • All other samplers disabled (--min-p 0.0 --top-p 1.0 --top-k 0 )
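
If anyone wants to reproduce that against a local llama.cpp server (llama-server exposes an OpenAI-compatible endpoint), this is roughly what those settings look like as a request. The URL, port and model name are placeholders, not anything from this thread:

```python
import requests

# Rough sketch: DeepSeek's suggested sampling settings as a chat request to a
# local llama.cpp server. No system message, temp 0.6, other samplers disabled.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",   # placeholder endpoint
    json={
        "model": "deepseek-r1",                    # placeholder model name
        "messages": [
            {"role": "user", "content": "Write the opening of a grimdark chapter."},
        ],
        "temperature": 0.6,
        "top_p": 1.0,   # disabled
        "top_k": 0,     # disabled
        "min_p": 0.0,   # disabled
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
```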

I suspect that there are probably lots of buggy instances of this being served. When it wasn't working properly, it had the feeling of being "slightly off", as though it was reading the previous text in some kind of corrupted/distorted way (likely due to the RoPE encoding being mangled and/or some other numerical error). For example:

The gallows creaked like a dying man’s breath. Varis Coldwater stood atop the splintered platform, the noose in his hands as familiar as sin. Below, the town of Blackmoor huddled—a carcass of leaning timber and crumbling stone, its people gaunt shadows gathered in the square. They reeked of desperation, their eyes hollow as the sockets of the skulls nailed above the butcher’s stall. Autumn had stripped the oaks bare, and the wind carried the stench of rot from the moors, where the peat devoured the unwary.

“Last chance,” Varis growled, the hemp rough against his palms. He’d hanged seven men on this rope. Their faces blurred now, but their screams never did. “Confess, and your soul stays your own.”

Jurl dangled, wrists bound, his boyish frame too slight for the iron chains. Blood crusted his temple where the mob had struck him—before they lost their nerve. Before they saw the black veins spidering beneath his skin.

The thing inside Jurl laughed, a sound like drowned lungs heaving. “You’re knee-deep in it, Coldwater.” The boy’s jaw cracked sideways, teeth splintering as the grin split his face. “The Great Wolf’s howlin’ again. The Dead’s Gate’s rusted through…”

You can see here it's used all the given dialogue, and in the correct order. Before, it would use some of the dialogue in a "Chinese Whispers" sort of way: not the actual phrases, and often not in the correct order (e.g. mentioning something about "knees" later on).

Now that it's working properly, it's razor sharp about things like this.

My second attempt to buy an M2 Ultra looks to be yet another "delayed dispatch" scam attempt - sigh.

I have a feeling it could even be the very same person as last time who delayed for a week then tried to say they were "unexpectedly working in Africa" until the end of February... :/

If anybody is interested in running this then you can probably pick up an old "HP Z8 G4" or "Dell T7920" quite cheaply off eBay. Just make sure to get one with Cascade Lake CPUs (i.e. with a "2" as the second digit of the model number) and populate all 24 DIMM slots with 32GB 2933MHz DDR4.

It will have around 2x the memory bandwidth of my old E5-2699v4, and paired with a 24GB GPU it should work well (i.e. keeping only the essential 32MB attn_k_b tensor unquantised until the overflow problem is found/fixed).
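
Rough numbers behind the "around 2x" claim, assuming dual-socket Cascade Lake-SP with 6 DDR4-2933 channels per socket versus dual E5-2699v4 with 4 DDR4-2400 channels per socket (theoretical peaks, and the channel counts/speeds are my assumptions rather than measured figures):

```python
# Theoretical peak memory bandwidth: channels x MT/s x 8 bytes per transfer.
def peak_gb_s(channels: int, mt_s: int, sockets: int = 2) -> float:
    return channels * mt_s * 8 * sockets / 1000

cascade_lake = peak_gb_s(channels=6, mt_s=2933)  # Z8 G4 / T7920 with Cascade Lake
broadwell    = peak_gb_s(channels=4, mt_s=2400)  # dual E5-2699v4
print(f"Cascade Lake ~{cascade_lake:.0f} GB/s vs Broadwell ~{broadwell:.0f} GB/s "
      f"(~{cascade_lake / broadwell:.1f}x)")
# ~282 GB/s vs ~154 GB/s, i.e. a bit under 2x in theory.
```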

I suspect this will get almost the same speed as people running purely from CPU with higher RAM bandwidth, as the actual MLA calculations seem to make token generation much more compute-bound than normal (obviously higher RAM bandwidth AND a 24/48GB GPU will beat this, but it will also cost 10x more for perhaps 1.5-2x the performance).

Thanks, yeah. My biggest challenge right now is that it gets stuck on a pattern. So if chapter one had someone slap a table in paragraph one and someone leave the room in anger in the last paragraph, then chapter two will have someone bang their knee in paragraph one and someone leave for the bathroom in the last paragraph, and chapter three will be some other similar pair of events... bump a shoulder and head to the gym, and so on.

Try adding some extra semi-random text at the end of your prompts.
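
A trivial sketch of what that could look like (the gibberish-word approach and the "[scratch: ...]" wrapper are just one guess at how to do it, not anything specific I'm running):

```python
import random
import string

# Append a short, semi-random suffix to each prompt so consecutive chapters
# don't condition on an identical-looking context and repeat the same pattern.
def with_random_suffix(prompt: str, n_words: int = 3) -> str:
    words = [
        "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 8)))
        for _ in range(n_words)
    ]
    return f"{prompt}\n\n[scratch: {' '.join(words)}]"
```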

Here is an example from a buggy float16 version I was testing that hallucinated this Korean text at the start, and it completely changes things:

USER

Varis adjusted the noose, its hemp fibers grinding beneath his calluses. “Last chance,” he said, voice like gravel dragged through mud. “Confess, and your soul stays your own.”

Jurl laughed—a wet, gurgling sound. “You’re knee-deep in it, Coldwater.” The thing inside him twisted the boy’s lips into a grin too wide for his face. “The Great Wolf’s howlin’ again. The Dead’s Gate’s rusted through…”

Turn this into the opening chapter of a Grimdark trilogy.

ASSISTANT

韓國

It seems like you're working on a creative writing project that blends dark fantasy and noir elements. Let me help you connect these elements while addressing the "Korean" reference at the end. Here's a streamlined version of your concept with a Korean twist:


Synergy of Elements:
The gritty atmosphere of "Varis adjusting the noose" and Jurl's unnerving laughter could merge with Korean mythos. Imagine Varis as a disillusioned exorcist (gut shaman) in modern Seoul, hunting a parasitic gumiho (nine-tailed fox spirit) that possesses people like Jurl. The "thing inside him" could tie into Korean folklore, such as:

  • A gwisin (vengeful spirit) bound to Varis, demanding justice.
  • A dokkaebi (mischievous goblin) offering cursed deals.

Korean Influence:
Instead of a generic "confess your sins," the story could explore han (Korean collective sorrow) and the duality of modernity/tradition. The "hemp fibers" and "gravel voice" evoke visceral texture, while phrases like "knee-deep in it" could nod to Seoul’s flooded Cheonggyecheon River, haunted by urban legends.

Revised Snippet with Korean Lore:
Varis tightened the noose, its minari fibers scraping his palms—a remedy no shaman had taught him. The boy’s body arched. “Last chance,” Varis growled. “Confess, and I won’t bury you where the gwisin wail.”
*Jurl’s laughter bubbled like a clogged drain. “You smell of grave moss, Coldwater. Still chasing your sister’s ghost
