Improve model card with abstract summary and GitHub link (#3)
- Improve model card with abstract summary and GitHub link (25028e1171ab6df4acede6017ee62449396cdfcf)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED

@@ -8,16 +8,19 @@ tags:
 - multimodal
 ---
 
-# Kwai Keye-VL
+# Kwai Keye-VL 1.5
 
 
 <div align="center">
   <img src="asset/keye_logo_2.png" width="100%" alt="Kwai Keye-VL Logo">
 </div>
 
+Keye-VL-1.5 is a cutting-edge Multimodal Large Language Model (MLLM) that addresses fundamental challenges in video comprehension. It features a novel Slow-Fast video encoding strategy, a progressive four-stage pre-training methodology that extends the context length to 128K tokens, and a comprehensive post-training pipeline focused on reasoning enhancement and human preference alignment. The model delivers significant improvements on video understanding tasks while maintaining competitive performance on general multimodal benchmarks.
+
 <font size=3><div align='center' >
 [[🍎 Home Page](https://kwai-keye.github.io/)]
-[[📖
+[[📖 Technical Report](https://arxiv.org/abs/2509.01563)]
+[[💻 GitHub Repository](https://github.com/Kwai-Keye/Keye)]
 [[📊 Keye-VL-8B-Preview](https://huggingface.co/Kwai-Keye/Keye-VL-8B-Preview) ]
 [[📊 Keye-VL-1.5-8B](https://huggingface.co/Kwai-Keye/Keye-VL-1_5-8B/) ]
 [[🚀 Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)]
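The abstract added in this hunk summarizes features that the card's Quick Start section covers in detail. For orientation, a minimal loading sketch, assuming the checkpoint ships its modeling code via `trust_remote_code` as the Qwen2.5-VL-derived models credited in the Acknowledgement do (the card's own Quick Start remains the authoritative reference for exact entry points):

```python
# Minimal sketch, not the card's official Quick Start: assumes the checkpoint
# bundles its architecture as remote code, as Qwen2.5-VL-derived models do.
import torch
from transformers import AutoModel, AutoProcessor

model_id = "Kwai-Keye/Keye-VL-1_5-8B"
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 8B weights practical on one GPU
    trust_remote_code=True,      # load the modeling code bundled with the repo
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```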
@@ -38,7 +41,7 @@ tags:
 
 ## Contents <!-- omit in toc -->
 
-- [Kwai Keye-VL](#kwai-keye-vl)
+- [Kwai Keye-VL 1.5](#kwai-keye-vl-15)
 - [🔥 News](#-news)
 - [📐 Quick Start](#-quick-start)
   - [Preprocess and Inference](#preprocess-and-inference)
@@ -409,7 +412,7 @@ def prepare_message_for_vllm(content_messages):
         new_content_list = []
         for part_message in message_content_list:
             if 'video' in part_message:
-                video_message = [{'content': [part_message]}]
+                video_message = [{'content': [part_message]}]
                 image_inputs, video_inputs, video_kwargs = process_vision_info(video_message, return_video_kwargs=True)
                 assert video_inputs is not None, "video_inputs should not be None"
                 video_input = (video_inputs.pop()).permute(0, 2, 3, 1).numpy().astype(np.uint8)
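The touched line above wraps a single video part so it can be passed through `process_vision_info`. A small standalone sketch of that step, assuming `process_vision_info` comes from `qwen-vl-utils` as in the Qwen2.5-VL pipeline this card builds on (the video path and fps value below are placeholders):

```python
# Standalone sketch of the video-wrapping step; assumes process_vision_info
# is provided by qwen-vl-utils, as in the Qwen2.5-VL preprocessing pipeline.
from qwen_vl_utils import process_vision_info

# Hypothetical video part: path and fps are placeholders for illustration.
part_message = {"type": "video", "video": "file:///path/to/clip.mp4", "fps": 2.0}

# Wrap the single part exactly as the diffed line does, so the helper sees
# the nested messages -> content -> parts structure it expects.
video_message = [{"content": [part_message]}]

# With return_video_kwargs=True the helper also returns per-video kwargs
# (e.g. the effective fps) alongside the decoded frame tensors.
image_inputs, video_inputs, video_kwargs = process_vision_info(
    video_message, return_video_kwargs=True
)
# Each entry of video_inputs is a (frames, channels, height, width) tensor,
# which the surrounding README code permutes to (frames, height, width, channels).
```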
@@ -515,4 +518,4 @@ If you find our work helpful for your research, please consider citing our work.
 
 ## Acknowledgement
 
-Kwai Keye-VL is developed based on the codebases of the following projects: [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.
+Kwai Keye-VL is developed based on the codebases of the following projects: [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.