Row11n committed on
Commit 6671a04 · verified · 1 Parent(s): 0c0ac58

Create README.md

Files changed (1):
  1. README.md (+63, -0)
README.md ADDED
 
---
license: apache-2.0
base_model:
- google/siglip-so400m-patch14-384
pipeline_tag: image-feature-extraction
---
# Model Card for CoMP-MM-1B

<!-- Provide a quick summary of what the model is/does. -->
This is a Vision Foundation Model (VFM) that supports <b>native image resolution inputs</b>, continually pre-trained from [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384).

## Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** <https://github.com/SliMM-X/CoMP-MM>
- **Paper:** [CoMP: Continual Multimodal Pre-training for Vision Foundation Models](https://arxiv.org/abs/2503.18931) (arXiv:2503.18931)

## How to Get Started with the Model

Install the [GitHub repository](https://github.com/SliMM-X/CoMP-MM), then use the code below to get started with the model.

```python
import torch
import requests
from io import BytesIO
from PIL import Image

from slimm.model.processor import SliMMQwen2VLProcessor
from slimm.model.utils_vl import process_vision_info  # not used in this minimal example
from slimm.model.vision_encoder import CoMPSiglipVisionModel

model_path = "SliMM-X/CoMP-SigLIP-So400M"

# Load the vision encoder on the GPU in bfloat16.
model = CoMPSiglipVisionModel.from_pretrained(
    model_path, torch_dtype="auto", device_map="cuda", w_merger=False
).to(torch.bfloat16)

processor = SliMMQwen2VLProcessor.from_pretrained(model_path)

# PIL's Image.open expects a local file or file-like object, so download the remote image first.
url = "https://slimm-x.github.io/comp/figs/teaser.png"
image_input = Image.open(BytesIO(requests.get(url).content))

inputs = processor(
    images=image_input,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Extract image features at the image's native resolution.
output_feat = model(inputs.pixel_values.to(torch.bfloat16), inputs.image_grid_thw)
print(output_feat)
```
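
For reference, here is a minimal sketch of encoding a local image with the same objects. It assumes the `model` and `processor` created above are still in scope; `my_image.png` is a placeholder path for any local image, not a file shipped with the repository.

```python
# Minimal sketch: encode a local image instead of the remote teaser image.
# Assumes `model` and `processor` from the snippet above; "my_image.png" is a placeholder path.
import torch
from PIL import Image

image = Image.open("my_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda")

with torch.no_grad():  # inference only, no gradients needed
    feats = model(inputs.pixel_values.to(torch.bfloat16), inputs.image_grid_thw)

print(feats)
```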

## Citation

**BibTeX:**

```bibtex
@article{comp2025,
  title={CoMP: Continual Multimodal Pre-training for Vision Foundation Models},
  author={Chen, Yitong and Meng, Lingchen and Peng, Wujian and Wu, Zuxuan and Jiang, Yu-Gang},
  year={2025},
  journal={arXiv preprint arXiv:2503.18931},
}
```