[email protected]
commited on
Commit
·
595112c
1
Parent(s):
87f9654
Update README.md
Browse files- README.md +10 -1
- RTL-GeLM-7B/tokenizer.model +0 -0
- assets/architecture_v3.jpeg +0 -0
README.md
CHANGED
|
@@ -3,4 +3,13 @@ datasets:
|
|
| 3 |
- SurplusDeficit/MultiHop-EgoQA
|
| 4 |
---
|
| 5 |
|
| 6 |
-
# Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 3 |
- SurplusDeficit/MultiHop-EgoQA
|
| 4 |
---
|
| 5 |
|
| 6 |
+
# Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
## GeLM Model
|
| 10 |
+
|
| 11 |
+
We propose a novel architecture, termed as <b><u>GeLM</u></b> for *MH-VidQA*, to leverage the world knowledge reasoning capabilities of multi-modal large language models (LLMs), while incorporating a grounding module to retrieve temporal evidence in the video with flexible grounding tokens.
|
| 12 |
+
|
| 13 |
+
<div align="center">
|
| 14 |
+
<img src="./assets/architecture_v3.jpeg" style="width: 80%;">
|
| 15 |
+
</div>
|
RTL-GeLM-7B/tokenizer.model
CHANGED
|
Binary files a/RTL-GeLM-7B/tokenizer.model and b/RTL-GeLM-7B/tokenizer.model differ
|
|
|
assets/architecture_v3.jpeg
ADDED
|