---
license: apache-2.0
pipeline_tag: depth-estimation
---

# Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots

This repository contains the Camera Depth Models (CDMs) from the paper [Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots](https://huggingface.co/papers/2509.02530).

## Overview

Camera Depth Models (CDMs) are a simple plugin for daily-use depth cameras. Given an RGB image and the camera's raw depth signal, a CDM outputs denoised, accurate metric depth. By providing near simulation-level depth accuracy on real hardware, CDMs allow policies trained purely in simulation to transfer directly to real robots.

## Links

* **Project Page**: [https://manipulation-as-in-simulation.github.io](https://manipulation-as-in-simulation.github.io/)
* **Code Repository**: [https://github.com/ByteDance-Seed/manip-as-in-sim-suite](https://github.com/ByteDance-Seed/manip-as-in-sim-suite)

## Usage

For detailed installation instructions and further usage examples, please refer to the [CDM documentation in the GitHub repository](https://github.com/ByteDance-Seed/manip-as-in-sim-suite/tree/main/cdm).

### CDM Inference Example

To run depth inference on RGB-D camera data, use the following command:

```bash
cd cdm
python infer.py \
    --encoder vitl \
    --model-path /path/to/model.pth \
    --rgb-image /path/to/rgb.jpg \
    --depth-image /path/to/depth.png \
    --output result.png
```

For a sketch of turning the predicted depth into a point cloud, see the example at the end of this card.

## Citation

If you use this work in your research, please cite:

```bibtex
@article{liu2025manipulation,
  title={Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots},
  author={Liu, Minghuan and Zhu, Zhengbang and Han, Xiaoshen and Hu, Peng and Lin, Haotong and Li, Xinyao and Chen, Jingxiao and Xu, Jiafeng and Yang, Yichu and Lin, Yunfeng and Li, Xinghang and Yu, Yong and Zhang, Weinan and Kong, Tao and Kang, Bingyi},
  journal={arXiv preprint arXiv:2509.02530},
  year={2025}
}
```
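
## Example: From Predicted Depth to a Point Cloud

A common consumer of the denoised metric depth is a point cloud for downstream manipulation. The snippet below is a minimal, repository-independent sketch of that back-projection step using the pinhole camera model. The intrinsics (`fx`, `fy`, `cx`, `cy`), the depth scale, and the assumption that `result.png` stores raw 16-bit depth in millimeters are illustrative placeholders, not guarantees about what `infer.py` writes; check the CDM documentation for the actual output format.

```python
# Minimal sketch (not from the CDM repo): back-project a metric depth map
# into a 3D point cloud with the pinhole camera model.
import cv2
import numpy as np

# Assumed intrinsics for illustration; use your camera's calibration values.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0

# Load the predicted depth (assumed here: 16-bit PNG, millimeters).
depth_mm = cv2.imread("result.png", cv2.IMREAD_UNCHANGED).astype(np.float32)
depth_m = depth_mm / 1000.0  # millimeters -> meters

# Build a pixel grid and back-project each pixel into the camera frame.
h, w = depth_m.shape
u, v = np.meshgrid(np.arange(w), np.arange(h))
x = (u - cx) * depth_m / fx
y = (v - cy) * depth_m / fy
points = np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)

# Discard invalid (zero-depth) pixels.
points = points[points[:, 2] > 0]
print(points.shape)  # (N, 3) metric point cloud in the camera frame
```

Once the depth is metrically accurate, geometry derived from it in this way (point clouds, grasp targets) can match what a simulation-trained policy expects to observe.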