---
pipeline_tag: video-text-to-text
---
This repo contains model checkpoints for **Vamba-Qwen2-VL-7B**. Vamba is a hybrid Mamba-Transformer model that leverages cross-attention layers and Mamba-2 blocks for efficient hour-long video understanding.
[**🌐 Homepage**](https://tiger-ai-lab.github.io/Vamba/) | [**📖 arXiv**](https://arxiv.org/abs/2503.11579) | [**💻 GitHub**](https://github.com/TIGER-AI-Lab/Vamba) | [**🤗 Model**](https://huggingface.co/TIGER-Lab/Vamba-Qwen2-VL-7B)
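To try the model, the checkpoints can be pulled directly from the Hub. Below is a minimal sketch, assuming the `huggingface_hub` Python client; the actual inference code lives in the GitHub repo linked above:

```
from huggingface_hub import snapshot_download

# Download all checkpoint files for Vamba-Qwen2-VL-7B into the local HF cache.
local_dir = snapshot_download(repo_id="TIGER-Lab/Vamba-Qwen2-VL-7B")
print(f"Checkpoints available at: {local_dir}")
```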
## Vamba Model Architecture
<p align="center"><em>Vamba model architecture overview figure</em></p>

The main computation overhead in the transformer-based LMMs comes from the quadratic complexity of the self-attention over the video tokens. Vamba reduces this cost by combining cross-attention layers with linear-time Mamba-2 blocks instead of relying on full self-attention across all video tokens.
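As a rough illustration of why this matters for hour-long videos, the sketch below compares the growth of pairwise self-attention interactions against a linear-time scan. The frame rate and tokens-per-frame figures are assumed for illustration, not taken from the paper:

```
# Back-of-the-envelope comparison with assumed numbers:
# one hour of video at 1 frame/s, ~196 visual tokens per frame.
frames = 60 * 60            # 3600 frames in one hour
tokens_per_frame = 196      # assumed visual tokens per frame
n = frames * tokens_per_frame

attention_cost = n * n      # self-attention scales quadratically in sequence length
scan_cost = n               # a Mamba-2 style scan scales linearly in sequence length

print(f"video tokens: {n:,}")
print(f"quadratic self-attention ~ {attention_cost:.2e} pairwise interactions")
print(f"linear-time scan         ~ {scan_cost:.2e} steps")
```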
## Citation
If you find our paper useful, please cite us with:
```
@misc{ren2025vambaunderstandinghourlongvideos,
      title={Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers},
      author={Weiming Ren and Wentao Ma and Huan Yang and Cong Wei and Ge Zhang and Wenhu Chen},
      year={2025},
      eprint={2503.11579},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.11579},
}
```