---
language: en
license: apache-2.0
tags:
- open-vocabulary
- semantic-segmentation
base_model:
- timm/vit_large_patch14_dinov2.lvd142m
- timm/vit_base_patch14_dinov2.lvd142m
---

<div align="center">
<h2>
<img src="./src/favicon.png" height=25> <span style="color: #FF0078;">Free</span><span style="color: #00509A;">DA</span>: Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024) <br>
</h2>

<h3>
<a href="https://lucabarsellotti.github.io//">Luca Barsellotti*</a>&ensp;
<a href="https://www.robertoamoroso.it//">Roberto Amoroso*</a>&ensp;
<a href="https://aimagelab.ing.unimore.it/imagelab/person.asp?idpersona=90/">Marcella Cornia</a>&ensp;
<a href="https://www.lorenzobaraldi.com//">Lorenzo Baraldi</a>&ensp;
<a href="https://aimagelab.ing.unimore.it/imagelab/person.asp?idpersona=1">Rita Cucchiara</a>&ensp;
</h3>

[Project Page](https://aimagelab.github.io/freeda/) | [Paper](https://arxiv.org/abs/2404.06542) | [Code](https://github.com/aimagelab/freeda)

</div>

<div align="center">
<figure>
<img alt="Qualitative results" src="./src/assets/qualitatives1.png">
</figure>
</div>

## Method

FreeDA performs open-vocabulary segmentation without any training: textual-visual reference embeddings (prototypes) are generated offline through a diffusion-augmented pipeline and, at inference time, local visual features are matched against them to assign a category to each region.

<div align="center">
<figure>
<img alt="FreeDA method" src="./src/assets/inference.png">
</figure>
</div>

<br/>

<details>
<summary> Additional qualitative examples </summary>
<p align="center">
<img alt="Additional qualitative results" src="./src/assets/qualitatives.png" width="800" />
</p>
</details>

<details>
<summary> Additional examples <i>in-the-wild</i> </summary>
<p align="center">
<img alt="In-the-wild examples" src="./src/assets/into_the_wild.png" width="800" />
</p>
</details>

## Installation

The `requirements.txt` file is provided in the [official repository](https://github.com/aimagelab/freeda).

```
conda create --name freeda python=3.9
conda activate freeda
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
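
After installing, it can be worth checking that the CUDA-enabled PyTorch build was picked up correctly. A minimal sanity check, assuming only the environment created above:

```
import torch

# Quick sanity check that the CUDA-enabled PyTorch build is active.
print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # expected to report 11.8 with the command above
print(torch.cuda.is_available())  # True if a compatible GPU and driver are present
```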

## How to use

Load a configuration, set the categories and the images to segment, then run the model:

```
import freeda
from PIL import Image
import requests
from io import BytesIO

if __name__ == "__main__":
    # Load a FreeDA configuration (DINOv2 ViT-B backbone with CLIP ViT-B).
    fr = freeda.load("dinov2_vitb_clip_vitb")

    # Download two example images.
    response1 = requests.get("https://farm9.staticflickr.com/8306/7926031760_b313dca06a_z.jpg")
    img1 = Image.open(BytesIO(response1.content))
    response2 = requests.get("https://farm3.staticflickr.com/2207/2157810040_4883738d2d_z.jpg")
    img2 = Image.open(BytesIO(response2.content))

    # Define the open vocabulary and the images to segment.
    fr.set_categories(["cat", "table", "pen", "keyboard", "toilet", "wall"])
    fr.set_images([img1, img2])

    # Run segmentation and save one visualization per image.
    segmentation = fr()
    fr.visualize(segmentation, ["plot.png", "plot1.png"])
```
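
The same interface works with local images. A minimal sketch reusing only the calls shown above (the file path, category list, and output name are placeholders):

```
import freeda
from PIL import Image

if __name__ == "__main__":
    fr = freeda.load("dinov2_vitb_clip_vitb")
    # Read an image from disk instead of downloading it (hypothetical path).
    img = Image.open("my_photo.jpg")
    fr.set_categories(["dog", "grass", "sky"])  # any free-form category names
    fr.set_images([img])
    segmentation = fr()
    fr.visualize(segmentation, ["my_photo_segmented.png"])
```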

## Citation

If you find FreeDA useful for your work, please cite:

```
@inproceedings{barsellotti2024training,
  title={Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation},
  author={Barsellotti, Luca and Amoroso, Roberto and Cornia, Marcella and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}
```