Update README.md
README.md CHANGED
@@ -24,10 +24,8 @@ benchmarks, demonstrating an average performance improvement of 2% compared to L
 ## Evaluation Results
 | **Model** | **Size** | **LongVideoBench** | **MLVU** | **VideoMME (Short)** | **VideoMME (Medium)** | **VideoMME (Long)** | **VideoMME (Average)** |
 |-------------------------------------|----------|---------------------|----------|----------------------|-----------------------|----------------------|-------------------------|
-| **
-| **
-| **LongVA-7B [3]** | 7B | 51.3 | 58.8 | 61.3/61.6 | 50.4/53.6 | 46.2/47.6 | 52.6/54.3 |
-| **LongVA-TPO (ours)** | 7B | **54.2** | 61.7 | 63.1/66.6 | 54.8/55.3 | 47.4/47.9 | **55.1**/56.6 |
+| **LongVA-7B [1]** | 7B | 51.3 | 58.8 | 61.3/61.6 | 50.4/53.6 | 46.2/47.6 | 52.6/54.3 |
+| **LongVA-TPO (ours)** | 7B | **54.2** | **61.7** | **63.1/66.6** | **54.8/55.3** | **47.4/47.9** | **55.1/56.6** |
 
 ## Get Started
 
@@ -94,6 +92,5 @@ This project utilizes certain datasets and checkpoints that are subject to their
 
 **References:**
 
-
-[
-[3]. Zhang, P., Zhang, K., Li, B., Zeng, G., Yang, J., Zhang, Y., ... & Liu, Z. (2024). Long context transfer from language to vision. arXiv preprint arXiv:2406.16852.
+
+[1]. Zhang, P., Zhang, K., Li, B., Zeng, G., Yang, J., Zhang, Y., ... & Liu, Z. (2024). Long context transfer from language to vision. arXiv preprint arXiv:2406.16852.