1 14

Tranheden

WilhelmT

AI & ML interests

None yet

Recent Activity

liked a Space 42 minutes ago

embedl/Edge-Inference-Benchmarks

reacted to JonnaMat's post with 🔥 4 days ago

🤯 Edge-Grade Vision Reasoning. Now Practically Lossless. 🤯 Introducing 👉 https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16-Edge2 Optimized for Jetson Orin Nano Super and AGX Orin https://huggingface.co/nvidia . 🚄 Try it out on Jetson (image+video+text): ``` docker run --rm -it \ --network host \ --shm-size=8g \ --ulimit memlock=-1 \ --ulimit stack=67108864 \ --runtime=nvidia \ --name=vllm-serve \ -e HF_TOKEN=hf_*** \ -e HF_HOME=/root/.cache/huggingface \ ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \ vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2" \ --max-model-len 8192 \ --gpu-memory-utilization 0.75 \ --max-num-seqs 2 ``` 🤓 What is Edge2? Most weights → INT4 | Activations → FP16 | Select sensitive layers → kept in FP16. Edge2 preserves precision where it matters most; while keeping the model small and fast enough for edge GPUs. 😎

liked a model 5 days ago

embedl/Cosmos-Reason2-2B-NVFP4A16

View all activity

Organizations

liked a Space 42 minutes ago

Edge Inference Benchmarks

🚀

On-Device benchmarks across devices and models.

reacted to JonnaMat's post with 🔥 4 days ago

Post

1581

🤯 Edge-Grade Vision Reasoning. Now Practically Lossless. 🤯

Introducing
👉 embedl/Cosmos-Reason2-2B-W4A16-Edge2
Optimized for Jetson Orin Nano Super and AGX Orin

nvidia .

🚄 Try it out on Jetson (image+video+text):

docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2" \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75 \
    --max-num-seqs 2

🤓 What is Edge2? Most weights → INT4 | Activations → FP16 | Select sensitive layers → kept in FP16.
Edge2 preserves precision where it matters most; while keeping the model small and fast enough for edge GPUs. 😎

liked 2 models 5 days ago

embedl/Cosmos-Reason2-2B-NVFP4A16

Image-Text-to-Text • 2B • Updated 5 days ago • 243 • 2

embedl/Cosmos-Reason2-2B-W4A16-Edge2

Image-Text-to-Text • 2B • Updated about 5 hours ago • 9.57k • 9

reacted to JonnaMat's post with 🔥 6 days ago

Post

2483

⚡ Blackwell-native Vision Reasoning at the edge ⚡

Released a NVFP4A16-variant of nvidia/Cosmos-Reason2-2B:
embedl/Cosmos-Reason2-2B-NVFP4A16

💖 Optimized for Blackwell with minimal accuracy drop compared to its FP16 counterpart.

Thorough on-device benchmarks on AGX Thor in the modelcard. 🤓 📊

Try it out:

docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  nvcr.io/nvidia/vllm:26.01-py3 \
  vllm serve "embedl/Cosmos-Reason2-2B-NVFP4A16" \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 16384 \
    --gpu-memory-utilization 0.9