DeepSeek-R1-Distill-Llama-70B and its derivatives are not DeepSeek R1: they are Meta Llama models fine-tuned to replicate DeepSeek R1's chain-of-thought (CoT) reasoning. This process of transferring knowledge from a larger model to a smaller one is called "distillation", and it has also been applied to other, smaller Llama and Qwen2.5 models.
It's also possible to run a quantized version of the real thing, the 671B DeepSeek R1, on limited hardware (ideally with ~100 GB of RAM): see https://unsloth.ai/blog/deepseekr1-dynamic . I managed to run it on a gaming laptop (an MSI Katana 15 with 64 GB of RAM), but it's very slow, generating around 0.22 tokens per second (meaning 0.7 to 0.9 characters per second). On a 128 GB NVIDIA DIGITS box, promised for May at ~$3,000 (https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips ), throughput should be more acceptable.
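For what it's worth, the character rate above follows directly from the token rate once you assume an average token length. A quick sketch of that arithmetic (the 3.2 to 4.1 characters-per-token range is my assumption, back-derived from the figures in this post, not a measured value):

```python
# Rough throughput arithmetic for the laptop figures above.
# Assumption (mine): generated English text averages roughly
# 3.2 to 4.1 characters per token.

def chars_per_second(tokens_per_second: float, chars_per_token: float) -> float:
    """Convert a token throughput into a character throughput."""
    return tokens_per_second * chars_per_token

laptop_tps = 0.22  # tokens/s observed on the MSI Katana 15

low = chars_per_second(laptop_tps, 3.2)
high = chars_per_second(laptop_tps, 4.1)
print(f"{low:.2f} to {high:.2f} characters per second")  # 0.70 to 0.90
```

The same helper gives a feel for other setups: at a hypothetical 5 tokens/s, the same assumption would yield roughly 16 to 20 characters per second, i.e. faster than most people read.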