Improve model card with abstract summary and GitHub link (#3)
- Improve model card with abstract summary and GitHub link (25028e1171ab6df4acede6017ee62449396cdfcf)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED

@@ -8,16 +8,19 @@ tags:
 - multimodal
 ---
 
-# Kwai Keye-VL
+# Kwai Keye-VL 1.5
 
 
 <div align="center">
   <img src="asset/keye_logo_2.png" width="100%" alt="Kwai Keye-VL Logo">
 </div>
 
+Keye-VL-1.5 is a cutting-edge Multimodal Large Language Model (MLLM) that addresses fundamental challenges in video comprehension. It features a novel Slow-Fast video encoding strategy, a progressive four-stage pre-training methodology that extends the context length to 128K tokens, and a comprehensive post-training pipeline focused on reasoning enhancement and human preference alignment. The model delivers significant improvements on video understanding tasks while maintaining competitive performance on general multimodal benchmarks.
+
 <font size=3><div align='center' >
 [[🍎 Home Page](https://kwai-keye.github.io/)]
-[[📖
+[[📖 Technical Report](https://arxiv.org/abs/2509.01563)]
+[[💻 GitHub Repository](https://github.com/Kwai-Keye/Keye)]
 [[📊 Keye-VL-8B-Preview](https://huggingface.co/Kwai-Keye/Keye-VL-8B-Preview) ]
 [[📊 Keye-VL-1.5-8B](https://huggingface.co/Kwai-Keye/Keye-VL-1_5-8B/) ]
 [[🚀 Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)]
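The abstract added in this hunk summarizes features that the card's Quick Start section covers in detail. For orientation, a minimal loading sketch, assuming the checkpoint ships its modeling code via `trust_remote_code` as the Qwen2.5-VL-derived models credited in the Acknowledgement do (the card's own Quick Start remains the authoritative reference for exact entry points):

```python
# Minimal sketch, not the card's official Quick Start: assumes the checkpoint
# bundles its architecture as remote code, as Qwen2.5-VL-derived models do.
import torch
from transformers import AutoModel, AutoProcessor

model_id = "Kwai-Keye/Keye-VL-1_5-8B"
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 8B weights practical on one GPU
    trust_remote_code=True,      # load the modeling code bundled with the repo
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```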
@@ -38,7 +41,7 @@ tags:
 
 ## Contents <!-- omit in toc -->
 
-- [Kwai Keye-VL](#kwai-keye-vl)
+- [Kwai Keye-VL 1.5](#kwai-keye-vl-15)
 - [🔥 News](#-news)
 - [📐 Quick Start](#-quick-start)
   - [Preprocess and Inference](#preprocess-and-inference)
@@ -409,7 +412,7 @@ def prepare_message_for_vllm(content_messages):
         new_content_list = []
         for part_message in message_content_list:
             if 'video' in part_message:
-                video_message = [{'content': [part_message]}]
+                video_message = [{'content': [part_message]}]
                 image_inputs, video_inputs, video_kwargs = process_vision_info(video_message, return_video_kwargs=True)
                 assert video_inputs is not None, "video_inputs should not be None"
                 video_input = (video_inputs.pop()).permute(0, 2, 3, 1).numpy().astype(np.uint8)
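The touched line above wraps a single video part so it can be passed through `process_vision_info`. A small standalone sketch of that step, assuming `process_vision_info` comes from `qwen-vl-utils` as in the Qwen2.5-VL pipeline this card builds on (the video path and fps value below are placeholders):

```python
# Standalone sketch of the video-wrapping step; assumes process_vision_info
# is provided by qwen-vl-utils, as in the Qwen2.5-VL preprocessing pipeline.
from qwen_vl_utils import process_vision_info

# Hypothetical video part: path and fps are placeholders for illustration.
part_message = {"type": "video", "video": "file:///path/to/clip.mp4", "fps": 2.0}

# Wrap the single part exactly as the diffed line does, so the helper sees
# the nested messages -> content -> parts structure it expects.
video_message = [{"content": [part_message]}]

# With return_video_kwargs=True the helper also returns per-video kwargs
# (e.g. the effective fps) alongside the decoded frame tensors.
image_inputs, video_inputs, video_kwargs = process_vision_info(
    video_message, return_video_kwargs=True
)
# Each entry of video_inputs is a (frames, channels, height, width) tensor,
# which the surrounding README code permutes to (frames, height, width, channels).
```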
@@ -515,4 +518,4 @@ If you find our work helpful for your research, please consider citing our work.
 
 ## Acknowledgement
 
-Kwai Keye-VL is developed based on the codebases of the following projects: [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.
+Kwai Keye-VL is developed based on the codebases of the following projects: [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.