Any plans for 32B/70B distilled models?
#83 · opened by NanaBanana22
Hey. Any plans to distill Qwen3 32B / Llama 70B?
We want this too!
Qwen3 30B A3B
Please, no more distills. They lag so far behind because they use an entirely different architecture (in this case, Qwen3).
I'd rather have a DeepSeek R1 Lite: the same model, trained on the same data, just scaled down so it can run on consumer hardware.
I'm using SlimMoE to try to make a smaller version.
@ehartford
Do you have a copy of the SlimMoE codebase? I'm trying to find tools for pruning experts, and the Microsoft repository was made private and removed from the READMEs of the relevant models: https://huggingface.co/microsoft/Phi-tiny-MoE-instruct/commit/12110926a833f9a59d8c084c6ae17b938db40eb2, https://github.com/microsoft/MoE-compression
Thank you!
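In case it helps while the repo is down: this is not the SlimMoE code, just a minimal sketch of the basic idea behind expert pruning as I understand it: score each expert by how often the router selects it on a small calibration set, keep the most-used experts, and slice the router weights to match. All class and function names here are made up for illustration, the toy `MoELayer` stands in for a real model's block, and SlimMoE's published approach also involves distillation stages that this skips entirely.

```python
# Hypothetical sketch of frequency-based expert pruning for a top-k MoE layer.
# Not the SlimMoE implementation -- just the general pruning step.

import torch
import torch.nn as nn


class MoELayer(nn.Module):
    """Toy top-2 MoE layer standing in for a real model's MoE block."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                          # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # top-k routing
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


@torch.no_grad()
def expert_usage(layer: MoELayer, calib_batches) -> torch.Tensor:
    """Count how often each expert is selected on calibration data."""
    counts = torch.zeros(len(layer.experts))
    for x in calib_batches:
        _, idx = layer.router(x).topk(layer.top_k, dim=-1)
        counts += torch.bincount(idx.flatten(),
                                 minlength=len(layer.experts)).float()
    return counts


@torch.no_grad()
def prune_experts(layer: MoELayer, keep: int, calib_batches) -> MoELayer:
    """Keep the `keep` most-used experts and slice the router to match."""
    counts = expert_usage(layer, calib_batches)
    kept = counts.topk(keep).indices.sort().values       # preserve ordering
    new_router = nn.Linear(layer.router.in_features, keep, bias=False)
    new_router.weight.copy_(layer.router.weight[kept])   # surviving router rows
    layer.router = new_router
    layer.experts = nn.ModuleList(layer.experts[i] for i in kept.tolist())
    return layer


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = MoELayer(d_model=64, n_experts=8)
    calib = [torch.randn(256, 64) for _ in range(4)]
    prune_experts(layer, keep=4, calib_batches=calib)
    print(len(layer.experts), layer.router.out_features)  # 4 4
```

Pruning alone will hurt quality, so in practice you'd follow this with fine-tuning or distillation on the smaller model to recover performance.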