Cosmos-Reason2-2B running on Jetson Orin Nano Super 8GB
Hi,
I saw the note about upcoming system requirements and performance benchmarks:
“This model requires a minimum of 24 GB of GPU memory. Inference latency across different NVIDIA GPU platforms will be published shortly.”
Are you open to contributions of Jetson benchmarks, including configurations that run under 8GB VRAM?
We’ve been experimenting with Cosmos-Reason2-2B and implemented a W4A16 quantized variant that runs across the full Jetson lineup, including Orin Nano 8GB / Nano Super. We also documented memory usage, setup details (vLLM on Jetson), and basic performance numbers.
Model + setup + benchmarks:
https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16-Edge2
If useful, we’d be happy to:
- Contribute a Jetson section to the docs
- Update our benchmarks if needed to align on recommended serving configurations
Let us know if contributions in this direction would be helpful to you!
Quickstart (vLLM Jetson container):
`--gpu-memory-utilization` and `--max-num-seqs` should be adapted to system specifications (i.e., available RAM).
```shell
docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2" \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75 \
    --max-num-seqs 2
```
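Once the server is up, `vllm serve` exposes an OpenAI-compatible API (by default on port 8000). A minimal sketch of a chat-completions request against that endpoint, assuming the default port and a text-only prompt (uses only the Python standard library):

```python
import json
import urllib.request

# Chat-completions payload for the quantized model served above.
payload = {
    "model": "embedl/Cosmos-Reason2-2B-W4A16-Edge2",
    "messages": [
        {"role": "user", "content": "What should a robot do before picking up a cup of water?"}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}

# vLLM's default OpenAI-compatible chat endpoint.
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the container from the quickstart is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```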