Cosmos-Reason2-2B running on Jetson Orin Nano Super 8GB
Hi,
I saw the note about upcoming system requirements and performance benchmarks:
“This model requires a minimum of 24 GB of GPU memory. Inference latency across different NVIDIA GPU platforms will be published shortly.”
Are you open to contributions of Jetson benchmarks, including configurations that run under 8GB VRAM?
We’ve been experimenting with Cosmos-Reason2-2B and implemented a W4A16 quantized variant that runs across the full Jetson lineup, including Orin Nano 8GB / Nano Super. We also documented memory usage, setup details (vLLM on Jetson), and basic performance numbers.
Model + setup + benchmarks:
https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16-Edge2
If useful, we’d be happy to:
- Contribute a Jetson section to the docs
- Update our benchmarks if needed to align on recommended serving configurations
Let us know if contributions in this direction would be helpful to you!
Quickstart (vLLM Jetson container):
`--gpu-memory-utilization` and `--max-num-seqs` should be adapted to system specifications (i.e., available RAM).
```shell
docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2" \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75 \
    --max-num-seqs 2
```
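Once the server is up, `vllm serve` exposes an OpenAI-compatible API (by default on port 8000). A minimal sketch of a chat-completions request against that endpoint, assuming the default port and a text-only prompt (uses only the Python standard library):

```python
import json
import urllib.request

# Chat-completions payload for the quantized model served above.
payload = {
    "model": "embedl/Cosmos-Reason2-2B-W4A16-Edge2",
    "messages": [
        {"role": "user", "content": "What should a robot do before picking up a cup of water?"}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}

# vLLM's default OpenAI-compatible chat endpoint.
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the container from the quickstart is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```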