---
pipeline_tag: video-text-to-text
---
This repo contains model checkpoints for **Vamba-Qwen2-VL-7B**. Vamba is a hybrid Mamba-Transformer model that leverages cross-attention layers and Mamba-2 blocks for efficient hour-long video understanding.
[**🌐 Homepage**](https://tiger-ai-lab.github.io/Vamba/) | [**📖 arXiv**](https://arxiv.org/abs/2503.11579) | [**💻 GitHub**](https://github.com/TIGER-AI-Lab/Vamba) | [**🤗 Model**](https://huggingface.co/TIGER-Lab/Vamba-Qwen2-VL-7B)
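To try the model, the checkpoints can be pulled directly from the Hub. Below is a minimal sketch, assuming the `huggingface_hub` Python client; the actual inference code lives in the GitHub repo linked above:

```
from huggingface_hub import snapshot_download

# Download all checkpoint files for Vamba-Qwen2-VL-7B into the local HF cache.
local_dir = snapshot_download(repo_id="TIGER-Lab/Vamba-Qwen2-VL-7B")
print(f"Checkpoints available at: {local_dir}")
```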
## Vamba Model Architecture
<p align="center"><em>Vamba model architecture overview figure</em></p>

The main computation overhead in the transformer-based LMMs comes from the quadratic complexity of the self-attention over the video tokens. Vamba reduces this cost by combining cross-attention layers with linear-time Mamba-2 blocks instead of relying on full self-attention across all video tokens.
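As a rough illustration of why this matters for hour-long videos, the sketch below compares the growth of pairwise self-attention interactions against a linear-time scan. The frame rate and tokens-per-frame figures are assumed for illustration, not taken from the paper:

```
# Back-of-the-envelope comparison with assumed numbers:
# one hour of video at 1 frame/s, ~196 visual tokens per frame.
frames = 60 * 60            # 3600 frames in one hour
tokens_per_frame = 196      # assumed visual tokens per frame
n = frames * tokens_per_frame

attention_cost = n * n      # self-attention scales quadratically in sequence length
scan_cost = n               # a Mamba-2 style scan scales linearly in sequence length

print(f"video tokens: {n:,}")
print(f"quadratic self-attention ~ {attention_cost:.2e} pairwise interactions")
print(f"linear-time scan         ~ {scan_cost:.2e} steps")
```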
## Citation
If you find our paper useful, please cite us with:
```
@misc{ren2025vambaunderstandinghourlongvideos,
      title={Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers},
      author={Weiming Ren and Wentao Ma and Huan Yang and Cong Wei and Ge Zhang and Wenhu Chen},
      year={2025},
      eprint={2503.11579},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.11579},
}
```