---
base_model: black-forest-labs/FLUX.1-dev
license: other
license_name: compass-lora-weights-nc-license
license_link: LICENSE
pipeline_tag: text-to-image
library_name: diffusers
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
widget:
- text: a photo of a laptop above a dog
output:
url: images/laptop-above-dog.jpg
- text: a photo of a bird below a skateboard
output:
url: images/bird-below-skateboard.jpg
- text: a photo of a horse to the left of a bottle
output:
url: images/horse-left-bottle.jpg
---
# CoMPaSS-FLUX.1: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
[Project Page](https://compass.blurgy.xyz) | [Code](https://github.com/blurgyy/CoMPaSS) | [arXiv](https://arxiv.org/abs/2412.13195)
<Gallery />
## Model description
A LoRA adapter that enhances the spatial understanding of the FLUX.1 text-to-image diffusion model. Presented in [CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models](https://arxiv.org/abs/2412.13195), it substantially improves the generation of images with specific spatial relationships between objects.
## Model Details
- **Base Model**: FLUX.1-dev
- **LoRA Rank**: 16
- **Training Data**: SCOP dataset (curated from COCO)
- **File Size**: ~50 MiB
- **Framework**: Diffusers
- **License**: Non-commercial (see [LICENSE](./LICENSE))
## Intended Use
- Generating images with accurate spatial relationships between objects
- Creating compositions that require specific spatial arrangements
- Enhancing the base model's spatial understanding while maintaining its other capabilities
## Performance
### Key Improvements
- VISOR benchmark: +98% relative improvement
- T2I-CompBench Spatial: +67% relative improvement
- GenEval Position: +131% relative improvement
- Maintains or improves the base model's image fidelity (lower FID and CMMD scores)
## Using the Model
See our [GitHub repository](https://github.com/blurgyy/CoMPaSS) to get started.
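A minimal usage sketch with the standard `diffusers` LoRA API is shown below; the sampler settings (`num_inference_steps`, `guidance_scale`) are illustrative defaults rather than the authors' recommended configuration.

```python
import torch
from diffusers import FluxPipeline

# Load the FLUX.1-dev base model in bfloat16.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Attach the CoMPaSS LoRA adapter from this repository.
pipe.load_lora_weights("blurgy/CoMPaSS-FLUX.1")

# Prompt with an explicit spatial relationship between two objects.
prompt = "a photo of a laptop above a dog"
image = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("laptop-above-dog.png")
```

On GPUs with limited memory, `pipe.enable_model_cpu_offload()` can be used instead of `pipe.to("cuda")`.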
### Effective Prompting
The model works well with:
- Clear spatial relationship descriptors (left, right, above, below)
- Pairs of distinct objects
- Explicit spatial relationships (e.g., "a photo of A to the right of B"), as in the sketch below
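For example, batches of prompts can be assembled from that template; the object pairs here are illustrative and match the sample images above:

```python
# Build prompts following the template the adapter responds to best.
# Object pairs are illustrative; any two distinct objects work.
relations = ["to the left of", "to the right of", "above", "below"]
pairs = [("horse", "bottle"), ("bird", "skateboard"), ("laptop", "dog")]

prompts = [
    f"a photo of a {a} {rel} a {b}"
    for a, b in pairs
    for rel in relations
]
```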
## Training Details
### Training Data
- Built using the SCOP (Spatial Constraints-Oriented Pairing) data engine
- ~28,000 curated object pairs from COCO
- Enforces criteria for the following (a simplified illustration is sketched after this list):
  - Visual significance
  - Semantic distinction
  - Spatial clarity
  - Object relationships
  - Visual balance
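The exact SCOP checks and thresholds are implemented in the [GitHub repository](https://github.com/blurgyy/CoMPaSS); the toy function below only illustrates the kind of geometric and semantic filtering these criteria describe. The annotation format and every threshold here are assumptions, not the actual SCOP engine.

```python
# Toy illustration only -- NOT the actual SCOP engine or its thresholds.
# Assumes COCO-style annotations: {"category": str, "bbox": (x, y, w, h)},
# with the y axis growing downward as in image coordinates.

def toy_spatial_pair(a, b, image_area, min_area_frac=0.02):
    """Return a spatial relation of `a` w.r.t. `b`, or None if the pair is unclear."""
    ax, ay, aw, ah = a["bbox"]
    bx, by, bw, bh = b["bbox"]

    # Visual significance: both objects cover a non-trivial share of the image.
    if aw * ah < min_area_frac * image_area or bw * bh < min_area_frac * image_area:
        return None

    # Semantic distinction: the two objects belong to different categories.
    if a["category"] == b["category"]:
        return None

    # Spatial clarity: one axis of separation clearly dominates the other.
    dx = (bx + bw / 2) - (ax + aw / 2)
    dy = (by + bh / 2) - (ay + ah / 2)
    if abs(dx) > 2 * abs(dy):
        return "to the left of" if dx > 0 else "to the right of"
    if abs(dy) > 2 * abs(dx):
        return "above" if dy > 0 else "below"
    return None

# Example: build a caption from a qualifying pair.
# rel = toy_spatial_pair(ann_a, ann_b, width * height)
# if rel: caption = f"a photo of a {ann_a['category']} {rel} a {ann_b['category']}"
```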
### Training Process
- Trained for 24,000 steps
- Batch size of 4
- Learning rate: 1e-4
- Optimizer: AdamW with β₁=0.9, β₂=0.999
- Weight decay: 1e-2
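For reference, these hyperparameters correspond to a standard PyTorch `AdamW` configuration, sketched below with a placeholder parameter list (the actual training script is in the GitHub repository):

```python
import torch

# Placeholder for the trainable LoRA parameters injected into the FLUX.1 transformer.
lora_params = [torch.nn.Parameter(torch.zeros(16, 64))]

optimizer = torch.optim.AdamW(
    lora_params,
    lr=1e-4,             # learning rate
    betas=(0.9, 0.999),  # AdamW beta_1, beta_2
    weight_decay=1e-2,   # weight decay
)
```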
## Evaluation Results
| Metric | FLUX.1 | +CoMPaSS |
|--------|-------------|-----------|
| VISOR uncond (⬆️) | 37.96% | **75.17%** |
| T2I-CompBench Spatial (⬆️) | 0.18 | **0.30** |
| GenEval Position (⬆️) | 0.26 | **0.60** |
| FID (⬇️) | 27.96 | **26.40** |
| CMMD (⬇️) | 0.8737 | **0.6859** |
## Citation
If you use this model in your research, please cite:
```bibtex
@inproceedings{zhang2025compass,
  title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
  author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
  booktitle={ICCV},
  year={2025}
}
```
## Contact
For questions about the model, please contact <[email protected]>
## Download model
Weights for this model are available in Safetensors format.
[Download](/blurgy/CoMPaSS-FLUX.1/tree/main) them in the Files & versions tab.