Improve model card with abstract summary and GitHub link
This PR enhances the model card for Kwai Keye-VL 1.5 by:
- Updating the main title to "Kwai Keye-VL 1.5" so it matches the model version named in the paper and the news section.
- Adding a concise summary of the model's key innovations and capabilities, derived from the paper's abstract, to provide an immediate overview for users.
- Integrating a direct link to the official GitHub repository (`https://github.com/Kwai-Keye/Keye`) in the prominent initial links section, improving code discoverability.
- Correcting the typo "Technique Report" to "Technical Report" in the introductory links.
All existing metadata, code snippets, and other detailed content remain unchanged to preserve functionality and user experience.
README.md (CHANGED)

```diff
@@ -8,16 +8,19 @@ tags:
   - multimodal
 ---
 
-# Kwai Keye-VL
+# Kwai Keye-VL 1.5
 
 
 <div align="center">
   <img src="asset/keye_logo_2.png" width="100%" alt="Kwai Keye-VL Logo">
 </div>
 
+Keye-VL-1.5 is a cutting-edge Multimodal Large Language Model (MLLM) that addresses fundamental challenges in video comprehension. It features a novel Slow-Fast video encoding strategy, a progressive four-stage pre-training methodology to extend context length up to 128K tokens, and a comprehensive post-training pipeline focusing on reasoning enhancement and human preference alignment. The model demonstrates significant improvements in video understanding tasks and maintains competitive performance on general multimodal benchmarks.
+
 <font size=3><div align='center' >
 [[🍎 Home Page](https://kwai-keye.github.io/)]
-[[📖 Technique Report](…)]
+[[📖 Technical Report](https://arxiv.org/abs/2509.01563)]
+[[💻 GitHub Repository](https://github.com/Kwai-Keye/Keye)]
 [[📊 Keye-VL-8B-Preview](https://huggingface.co/Kwai-Keye/Keye-VL-8B-Preview) ]
 [[📊 Keye-VL-1.5-8B](https://huggingface.co/Kwai-Keye/Keye-VL-1_5-8B/) ]
 [[🚀 Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)]
@@ -38,7 +41,7 @@ tags:
 
 ## Contents <!-- omit in toc -->
 
-- [Kwai Keye-VL](#kwai-keye-vl)
+- [Kwai Keye-VL 1.5](#kwai-keye-vl-15)
 - [🔥 News](#-news)
 - [📐 Quick Start](#-quick-start)
   - [Preprocess and Inference](#preprocess-and-inference)
@@ -409,7 +412,7 @@ def prepare_message_for_vllm(content_messages):
         new_content_list = []
         for part_message in message_content_list:
             if 'video' in part_message:
-                video_message = [{'content': [part_message]}]
+                video_message = [{'content': [part_message]}]
                 image_inputs, video_inputs, video_kwargs = process_vision_info(video_message, return_video_kwargs=True)
                 assert video_inputs is not None, "video_inputs should not be None"
                 video_input = (video_inputs.pop()).permute(0, 2, 3, 1).numpy().astype(np.uint8)
@@ -515,4 +518,4 @@ If you find our work helpful for your research, please consider citing our work.
 
 ## Acknowledgement
 
-Kwai Keye-VL is developed based on the codebases of the following projects: [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.
+Kwai Keye-VL is developed based on the codebases of the following projects: [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.
```
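For context, the third hunk above sits inside the README's `prepare_message_for_vllm` function, which wraps each video entry of a message in its own message list so that `process_vision_info` can decode the frames and return per-video keyword arguments for vLLM. Below is a minimal sketch of that call pattern. The video path and prompt are hypothetical placeholders, the import line is an assumption (the hunk does not show where `process_vision_info` comes from), and only the steps quoted in the hunk are reproduced:

```python
import numpy as np
# Assumed import; the hunk doesn't show it. Use the import from the README's quick start.
from keye_vl_utils import process_vision_info

# Hypothetical input message; the path and prompt are placeholders.
content_messages = [{
    "role": "user",
    "content": [
        {"video": "file:///path/to/video.mp4"},
        {"type": "text", "text": "Describe this video."},
    ],
}]

for message in content_messages:
    for part_message in message["content"]:
        if "video" in part_message:
            # Wrap the single video part as its own message list, as the README's code does.
            video_message = [{"content": [part_message]}]
            image_inputs, video_inputs, video_kwargs = process_vision_info(
                video_message, return_video_kwargs=True
            )
            assert video_inputs is not None, "video_inputs should not be None"
            # Reorder the (frames, channels, H, W) tensor to (frames, H, W, channels)
            # and convert to a uint8 array, matching the hunk's last context line.
            video_input = video_inputs.pop().permute(0, 2, 3, 1).numpy().astype(np.uint8)
```

The surrounding function in the README does more than this excerpt shows (for example, it rebuilds `new_content_list` for the vLLM message), so this sketch is only meant to make the quoted lines easier to follow.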