jiangchengchengNLP committed
Commit e314b28 · verified · 1 parent: d7987fa

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -6,6 +6,12 @@ base_model:
 
  - Qwen/Qwen2.5-0.5B
  - openai/clip-vit-large-patch14-336
  ---
+
+ # Note: this model repository contains an error.
+ # In subsequent work, I found that my model used only one visual token, a fatal mistake that degraded its performance.
+ # I will revise this repository and release a new model when I have time.
+
+
  # Visual Language Model Based on Qwen and CLIP

  This is a visual-language multimodal model built on the Qwen series language models and the CLIP visual encoder. It was trained for 10 epochs on the LLaVA pre-training dataset and on nearly 800K instruction examples (150K instruction fine-tuning and 665K mixed instruction fine-tuning). However, because the training data is far larger than the model, it can only perform simple question-answering tasks on images and currently supports only English question answering.
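
For readers who want to see how the components named in the README fit together, here is a minimal sketch, not the author's released code, of wiring the CLIP ViT-L/14-336 encoder into Qwen2.5-0.5B through a learned projection. The `projector` module and `encode_image` helper are hypothetical names, and keeping all patch tokens (rather than a single pooled visual token, the mistake described in the note above) is the assumption being illustrated.

```python
# Minimal sketch (assumed wiring, not the author's released code) of a
# Qwen2.5 + CLIP visual language model using Hugging Face transformers.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    CLIPImageProcessor,
    CLIPVisionModel,
)

language_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
vision_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

# Hypothetical projector mapping CLIP hidden states into the LM embedding space.
projector = torch.nn.Linear(
    vision_encoder.config.hidden_size, language_model.config.hidden_size
)


def encode_image(pixel_values: torch.Tensor) -> torch.Tensor:
    """Return one projected embedding per image patch (576 for a 336x336 input)."""
    # Keep every patch token (drop only the CLS token at index 0). Pooling the
    # whole image into a single visual token, as in the bug described above,
    # discards the spatial information the language model needs.
    patch_states = vision_encoder(pixel_values).last_hidden_state[:, 1:, :]
    return projector(patch_states)  # shape: (batch, num_patches, lm_hidden_size)
```

In a full model, the projected patch embeddings would then be spliced into the token embedding sequence (e.g. passed via `inputs_embeds`) in place of an image placeholder token before calling the language model.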