---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
widget:
- text: a photo of a laptop above a dog
  output:
    url: images/laptop-above-dog.jpg
- text: a photo of a bird below a skateboard
  output:
    url: images/bird-below-skateboard.jpg
- text: a photo of a horse to the left of a bottle
  output:
    url: images/horse-left-bottle.jpg
base_model: black-forest-labs/FLUX.1-dev
instance_prompt: null
license: other
license_name: compass-lora-weights-nc-license
license_link: LICENSE
---
# CoMPaSS-FLUX.1

\[[Project Page]\]
\[[code]\]
\[[arXiv]\]

<Gallery />

## Model description 

A LoRA adapter that improves the spatial understanding of the FLUX.1-dev text-to-image
diffusion model. With the adapter applied, the model more reliably generates images that respect
the spatial relationships between objects stated in the prompt (for example, left, right, above, below).

## Model Details

- **Base Model**: FLUX.1-dev
- **LoRA Rank**: 16
- **Training Data**: SCOP dataset (curated from COCO)
- **File Size**: ~50MiB
- **Framework**: Diffusers
- **License**: Non-Commercial (see [./LICENSE])

## ComfyUI Support

We provide a custom node with examples at [comfyui-node-impl]. Use the
ComfyUI-compatible LoRA checkpoint [comfyui-checkpoint] to get started.

## Intended Use

- Generating images with accurate spatial relationships between objects
- Creating compositions that require specific spatial arrangements
- Enhancing the base model's spatial understanding while maintaining its other capabilities

## Performance 

### Key Improvements

- VISOR (unconditional): +98% relative improvement over the base model
- T2I-CompBench Spatial: +67% relative improvement over the base model
- GenEval Position: +131% relative improvement over the base model
- Image fidelity is preserved or improved: both FID and CMMD are lower than the base model's

## Using the Model

See our [GitHub repository][code] to get started.
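
For orientation, here is a minimal sketch of loading the adapter with Diffusers' standard LoRA loader. It is not the authors' reference pipeline, and the official repository may apply additional inference-time components that this sketch omits. Assumptions: the repo id `blurgy/CoMPaSS-FLUX.1` is taken from the download link on this card, the weights are discoverable by `load_lora_weights` without an explicit `weight_name`, and the sampler settings (28 steps, guidance scale 3.5) are illustrative values rather than recommended ones.

```python
BASE_MODEL = "black-forest-labs/FLUX.1-dev"  # base_model from this card's metadata
LORA_REPO = "blurgy/CoMPaSS-FLUX.1"          # assumed from this card's download link


def load_pipeline(device: str = "cuda"):
    """Load FLUX.1-dev and attach the LoRA adapter (sketch, see assumptions above)."""
    # Heavy imports are deferred so that importing this module stays cheap.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    # Assumption: the loader can pick the right file on its own. If the repo
    # holds several .safetensors files, pass weight_name="<file>" explicitly.
    pipe.load_lora_weights(LORA_REPO)
    return pipe.to(device)


def generate(pipe, prompt: str, seed: int = 0):
    """Generate one image for `prompt` with a fixed seed for repeatability."""
    import torch

    generator = torch.Generator("cpu").manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=28,  # illustrative, not an author recommendation
        guidance_scale=3.5,      # illustrative, not an author recommendation
        generator=generator,
    )
    return result.images[0]
```

Typical use would be `pipe = load_pipeline()` followed by `generate(pipe, "a photo of a laptop above a dog").save("out.png")`. FLUX.1-dev is a gated model, so the first call needs an authenticated Hugging Face session.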

### Effective Prompting

The model works well with:
- Clear spatial relationship descriptors (left, right, above, below)
- Pairs of distinct objects
- Explicit spatial relationships (e.g., "a photo of A to the right of B")
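
The prompting pattern above can be captured in a small helper. This is a hypothetical convenience function, not part of the released code; the names `RELATIONS` and `spatial_prompt` are made up for illustration, and the template simply mirrors the example prompts shown on this card.

```python
# Relation keywords mapped to the phrasing used in this card's example prompts.
RELATIONS = {
    "left": "to the left of",
    "right": "to the right of",
    "above": "above",
    "below": "below",
}


def _article(noun: str) -> str:
    """Pick 'a' or 'an' with a simple first-letter heuristic."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def spatial_prompt(subject: str, relation: str, reference: str) -> str:
    """Build 'a photo of A <relation> B' for two distinct objects."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    if subject.strip().lower() == reference.strip().lower():
        raise ValueError("subject and reference should be distinct objects")
    return (
        f"a photo of {_article(subject)} {subject} "
        f"{RELATIONS[relation]} {_article(reference)} {reference}"
    )


print(spatial_prompt("horse", "left", "bottle"))
# a photo of a horse to the left of a bottle
```

Keeping the template fixed makes it easy to sweep over object pairs and relations when checking how reliably a given layout is produced.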

## Training Details

### Training Data

- Built using the SCOP (Spatial Constraints-Oriented Pairing) data engine
- ~28,000 curated object pairs from COCO
- Enforces criteria for:
  - Visual significance
  - Semantic distinction
  - Spatial clarity
  - Object relationships
  - Visual balance

### Training Process

- Trained for 24,000 steps
- Batch size of 4
- Learning rate: 1e-4
- Optimizer: AdamW with β₁=0.9, β₂=0.999
- Weight decay: 1e-2
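
As a rough sense of scale, the figures above imply how many samples the adapter saw during training. The sketch below assumes that the batch size of 4 is the global batch size and that each curated object pair corresponds to one training sample; neither is stated on this card, so treat the epoch count as an estimate only.

```python
steps = 24_000          # training steps, from this card
batch_size = 4          # assumed to be the global batch size
dataset_pairs = 28_000  # approximate number of curated pairs, from this card

# Total samples processed, and the approximate number of passes over the data.
samples_seen = steps * batch_size
approx_epochs = samples_seen / dataset_pairs

print(f"samples seen: {samples_seen:,}")
print(f"approximate epochs: {approx_epochs:.1f}")
```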

## Evaluation Results

| Metric | FLUX.1-dev | +CoMPaSS |
|--------|-------------|-----------|
| VISOR uncond (⬆️) | 37.96% | **75.17%** |
| T2I-CompBench Spatial (⬆️) | 0.18 | **0.30** |
| GenEval Position (⬆️) | 0.26 | **0.60** |
| FID (⬇️) | 27.96 | **26.40** |
| CMMD (⬇️) | 0.8737 | **0.6859** |
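
The relative improvements quoted under Key Improvements follow directly from this table. A short check, using only the values above (the helper name `relative_improvement` is illustrative):

```python
# (baseline, with CoMPaSS) pairs copied from the table; higher is better.
results = {
    "VISOR uncond": (37.96, 75.17),
    "T2I-CompBench Spatial": (0.18, 0.30),
    "GenEval Position": (0.26, 0.60),
}


def relative_improvement(base: float, new: float) -> float:
    """Relative change of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


for name, (base, new) in results.items():
    print(f"{name}: {relative_improvement(base, new):+.0f}%")
# VISOR uncond: +98%
# T2I-CompBench Spatial: +67%
# GenEval Position: +131%
```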

## Citation

If you use this model in your research, please cite:
```bibtex
@inproceedings{zhang2025compass,
  title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
  author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
  booktitle={ICCV},
  year={2025}
}
```

## Contact

For questions about the model, please contact <[email protected]>

## Download model

Weights for this model are available in Safetensors format.

[Download](/blurgy/CoMPaSS-FLUX.1/tree/main) them in the Files & versions tab.

[comfyui-node-impl]: <https://github.com/blurgyy/CoMPaSS-FLUX.1-dev-ComfyUI>
[comfyui-checkpoint]: <./CoMPaSS-FLUX.1-comfyui.safetensors>

[./LICENSE]: <./LICENSE>
[Project page]: <https://compass.blurgy.xyz>
[code]: <https://github.com/blurgyy/CoMPaSS>
[arXiv]: <https://arxiv.org/abs/2412.13195>