Any plans for 32B/70B distilled models?
#83 · opened by NanaBanana22
Hey. Any plans to distill Qwen3 32B / Llama 70B?
We want this too!
Qwen3 30B A3B
Please, no more distills. They lag so far behind because they use an entirely different architecture (in this case, Qwen3).
I'd rather have a DeepSeek R1 Lite: the same model, trained on the same data, just scaled down so it can run on consumer hardware.
I'm using SlimMoE to try to make a smaller version.
@ehartford
Do you have a copy of the SlimMoE codebase? I'm trying to find tools for pruning experts, and the Microsoft repository was made private and removed from the READMEs of the relevant models: https://huggingface.co/microsoft/Phi-tiny-MoE-instruct/commit/12110926a833f9a59d8c084c6ae17b938db40eb2, https://github.com/microsoft/MoE-compression
Thank you!
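In case it helps while the repo is down: this is not the SlimMoE code, just a minimal sketch of the basic idea behind expert pruning as I understand it: score each expert by how often the router selects it on a small calibration set, keep the most-used experts, and slice the router weights to match. All class and function names here are made up for illustration, the toy `MoELayer` stands in for a real model's block, and SlimMoE's published approach also involves distillation stages that this skips entirely.

```python
# Hypothetical sketch of frequency-based expert pruning for a top-k MoE layer.
# Not the SlimMoE implementation -- just the general pruning step.

import torch
import torch.nn as nn


class MoELayer(nn.Module):
    """Toy top-2 MoE layer standing in for a real model's MoE block."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                          # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # top-k routing
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


@torch.no_grad()
def expert_usage(layer: MoELayer, calib_batches) -> torch.Tensor:
    """Count how often each expert is selected on calibration data."""
    counts = torch.zeros(len(layer.experts))
    for x in calib_batches:
        _, idx = layer.router(x).topk(layer.top_k, dim=-1)
        counts += torch.bincount(idx.flatten(),
                                 minlength=len(layer.experts)).float()
    return counts


@torch.no_grad()
def prune_experts(layer: MoELayer, keep: int, calib_batches) -> MoELayer:
    """Keep the `keep` most-used experts and slice the router to match."""
    counts = expert_usage(layer, calib_batches)
    kept = counts.topk(keep).indices.sort().values       # preserve ordering
    new_router = nn.Linear(layer.router.in_features, keep, bias=False)
    new_router.weight.copy_(layer.router.weight[kept])   # surviving router rows
    layer.router = new_router
    layer.experts = nn.ModuleList(layer.experts[i] for i in kept.tolist())
    return layer


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = MoELayer(d_model=64, n_experts=8)
    calib = [torch.randn(256, 64) for _ in range(4)]
    prune_experts(layer, keep=4, calib_batches=calib)
    print(len(layer.experts), layer.router.out_features)  # 4 4
```

Pruning alone will hurt quality, so in practice you'd follow this with fine-tuning or distillation on the smaller model to recover performance.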