Any plans to release 120B and 20-30B-level models?

#5 opened by Sunny2038

It would be great if LongCat could have 120B and 20-30B-level models, like gpt-oss-120b or GLM-4.5-Air, and/or gpt-oss-20b, seed-36b, or qwen3-30b.

Yeah... like a 100B A3.3-5.5B and a 32B A1-2B. That would be great.

Also, any dense versions, please.

Why dense?

I'm all for a Mixture of Experts model of roughly 30B total parameters with about 3-6B active. That seems like a good trade-off between speed and performance, and it's especially useful for bringing better-quality models to regular PC users who can't afford more powerful hardware. Recent advancements at this size suggest these models have great potential that's yet to be unleashed.
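For a rough sense of that trade-off, here's a back-of-the-envelope sketch (plain Python; the configs are made-up illustrative numbers, not any specific model): at the same 30B total, an A3B MoE needs about the same RAM for weights, but each token only touches the active slice, which is what drives generation speed on modest hardware.

```python
# Back-of-the-envelope memory/compute sketch for a hypothetical
# 30B-total / 3B-active MoE vs. a 30B dense model.
# Illustrative numbers only -- not any specific model's config.

def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

for name, total_b, active_b in [("30B dense", 30, 30),
                                ("30B-A3B MoE", 30, 3)]:
    mem = weight_gib(total_b, 4.5)      # ~4-bit quant plus some overhead
    flops = 2 * active_b * 1e9          # decode FLOPs/token ~ 2 * active params
    print(f"{name}: ~{mem:.1f} GiB weights, ~{flops / 1e9:.0f} GFLOPs/token")
```

Both fit in ~16 GiB at 4-bit, but the MoE does roughly a tenth of the matmul work per generated token.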

Ohh yeah, I got it, thanks.

TBH I'd rather have them try something more experimental, like a BitNet (or at least partially BitNet) model for cheaper deployment, or an alternate attention scheme for longer context.
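For reference, the core of the BitNet b1.58 recipe is tiny; here's a minimal sketch of its "absmean" ternary weight quantizer (NumPy, forward pass only -- the real thing trains with this in the loop as QAT):

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """BitNet b1.58-style quantization: weights -> {-1, 0, +1} * scale.

    The scale is the mean absolute value of the tensor ("absmean").
    This shows only the quantizer, not the training loop around it.
    """
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / scale), -1, 1)
    return w_q.astype(np.int8), scale

w = np.random.randn(4, 8).astype(np.float32)
w_q, scale = absmean_ternary(w)
print(w_q)                              # ternary matrix
print(np.abs(w - w_q * scale).mean())   # mean quantization error
```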

We have tons of mid-sized options now. I suppose a mid-sized MLA model with a long context would be interesting, though, as everything else seems to use regular GQA.
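On the MLA point, the appeal for long context is the KV cache: GQA caches K and V for every kv-head per layer, while MLA caches one shared low-rank latent per token. A rough comparison with illustrative numbers (the MLA figures loosely follow DeepSeek-V2's published 512-dim latent plus 64-dim decoupled RoPE key; the GQA config is made up):

```python
# Rough KV-cache-per-token comparison: GQA vs. MLA (fp16 cache).

def gqa_cache(n_layers, n_kv_heads, head_dim, bytes_per=2):
    # K and V cached separately for every kv head in every layer.
    return n_layers * n_kv_heads * head_dim * 2 * bytes_per

def mla_cache(n_layers, latent_dim, rope_dim, bytes_per=2):
    # One compressed latent (+ small RoPE key) shared by all heads.
    return n_layers * (latent_dim + rope_dim) * bytes_per

layers = 60
print("GQA:", gqa_cache(layers, 8, 128) / 1024, "KiB/token")   # 240.0
print("MLA:", mla_cache(layers, 512, 64) / 1024, "KiB/token")  #  67.5
# At 128k context, that per-token gap works out to ~20 GiB of VRAM.
```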

Yeah, I agree. Let's hope something comes out with a good trim, like 2-bit or native FP4, as in the Apple foundation models and OpenAI's gpt-oss.
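For anyone curious what "native FP4" means concretely: gpt-oss ships MXFP4 weights, i.e. FP4 (E2M1) values in small blocks sharing one power-of-two scale. A toy NumPy sketch of that block format (illustrative, not the exact spec implementation):

```python
import numpy as np

# E2M1 representable magnitudes; max is 6 = 1.5 * 2**2 (emax = 2).
FP4_MAGS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quant(block: np.ndarray) -> np.ndarray:
    """Quantize one block (the MX spec uses 32 elements) and dequantize."""
    # Shared power-of-two scale chosen from the block's max magnitude.
    scale = 2.0 ** (np.floor(np.log2(np.abs(block).max() + 1e-12)) - 2)
    idx = np.argmin(np.abs(np.abs(block[:, None]) / scale - FP4_MAGS), axis=1)
    return np.sign(block) * FP4_MAGS[idx] * scale

x = np.random.randn(32).astype(np.float32)
print(np.abs(x - mxfp4_quant(x)).mean())  # quantization error
```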

We already got a native 2-bit QAT 300B from Baidu! Yeah, more native QAT would be awesome.
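And for what "native QAT" buys: the model is trained with quantization in the forward pass, using a straight-through estimator so gradients survive the rounding. A toy 2-bit fake-quant sketch in PyTorch (illustrative only, not Baidu's actual recipe):

```python
import torch

class FakeQuant2Bit(torch.autograd.Function):
    """2-bit fake quantization with a straight-through estimator."""

    @staticmethod
    def forward(ctx, w, scale):
        # Round to 4 levels {-2, -1, 0, 1} * scale -- i.e. 2 bits per weight.
        return torch.clamp(torch.round(w / scale), -2, 1) * scale

    @staticmethod
    def backward(ctx, grad_out):
        # STE: pretend the rounding was the identity, pass gradients through.
        return grad_out, None

w = torch.randn(8, requires_grad=True)
scale = w.detach().abs().mean()    # fixed scale for the sketch, not learned
out = FakeQuant2Bit.apply(w, scale)
out.sum().backward()
print(w.grad)  # gradients flow despite the non-differentiable rounding
```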

No one's converted the 300B weights out of PaddlePaddle though, as far as I know.

> PaddlePaddle

To be honest, no one has a clue about this either. Maybe the Chinese-speaking folks have their own llama.cpp alternatives and have been running them on their CPUs? Who knows? 😎
