---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
---
# Mantis

> This is the official checkpoint of **Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight**.

- **Paper:** https://arxiv.org/pdf/2511.16175
- **Code:** https://github.com/zhijie-group/Mantis

### 🔥 Highlights

- **Disentangled Visual Foresight** augments action learning without overburdening the backbone.
- **Progressive Training** preserves the understanding capabilities of the backbone.
- **Adaptive Temporal Ensemble** reduces inference cost while maintaining stable control.

### How to use

This is the Mantis model trained on the [LIBERO](https://huggingface.co/datasets/Yysrc/mantis_libero_lerobot/tree/main) Long dataset. For detailed usage, please refer to [our repository](https://github.com/zhijie-group/Mantis).
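Since the card lists `library_name: transformers`, a checkpoint like this would typically be loaded through the `transformers` Auto classes. The sketch below is an assumption rather than the repository's documented API: the checkpoint id is left as a placeholder, and `trust_remote_code=True` is assumed because vision-language-action checkpoints commonly ship custom modeling code alongside the weights. The repository above remains the authoritative reference.

```python
# Hedged sketch: loading a Mantis checkpoint via the transformers Auto
# classes. The processor/model pairing and trust_remote_code flag are
# assumptions; see the Mantis repository for the supported pipeline.

def load_mantis(checkpoint_id: str):
    """Return (processor, model) for a Mantis checkpoint.

    trust_remote_code=True is assumed, since VLA checkpoints often
    bundle custom modeling code with the weights.
    """
    # Imported lazily so the function can be defined without transformers
    # installed; calling it requires `pip install transformers` plus
    # network access to download the weights.
    from transformers import AutoModel, AutoProcessor

    processor = AutoProcessor.from_pretrained(checkpoint_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(checkpoint_id, trust_remote_code=True)
    return processor, model
```

Calling `load_mantis(...)` with the actual Hub repo id would download the weights; the id is deliberately not filled in here.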

### 📝 Citation

If you find our code or models useful in your work, please cite [our paper](https://arxiv.org/pdf/2511.16175):

```bibtex
@article{yang2025mantis,
  title={Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight},
  author={Yang, Yi and Li, Xueqi and Chen, Yiyang and Song, Jin and Wang, Yihan and Xiao, Zipeng and Su, Jiadi and Qiaoben, You and Liu, Pengfei and Deng, Zhijie},
  journal={arXiv preprint arXiv:2511.16175},
  year={2025}
}
```