|
--- |
|
license: mit |
|
language: |
|
- en |
|
base_model: |
|
- facebook/dinov2-base |
|
- facebook/dinov2-small |
|
tags: |
|
- computer_vision |
|
--- |
|
|
|
# Near, far: Patch-ordering enhances vision foundation models' scene understanding |
|
|
|
Welcome to the Hugging Face repository for **NeCo**, an adapted vision encoder that captures the fine-grained details and structural information essential for keypoint matching, semantic segmentation, and more. This repository hosts pretrained checkpoints for NeCo, enabling easy integration into your projects.
|
|
|
The paper describing this work:
|
**"Near, far: Patch-ordering enhances vision foundation models' scene understanding"** |
|
*[Valentinos Pariza](https://vpariza.github.io), [Mohammadreza Salehi](https://smsd75.github.io), [Gertjan J. Burghouts](https://gertjanburghouts.github.io), [Francesco Locatello](https://www.francescolocatello.com/), [Yuki M. Asano](https://yukimasano.github.io)*
|
|
|
🌐 **[Project Page](https://vpariza.github.io/NeCo/)** |
|
⌨️ **[GitHub Repository](https://github.com/vpariza/NeCo)** |
|
📄 **[Read the Paper on arXiv](https://arxiv.org/abs/2408.11054)** |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
NeCo introduces a new self-supervised learning technique for enhancing spatial representations in vision transformers. By leveraging Patch Neighbor Consistency, NeCo captures fine-grained details and structural information that are crucial for various downstream tasks, such as semantic segmentation. |
|
|
|
- **Model type:** Vision encoder (DINO, DINOv2, ...)

- **Language(s):** Not applicable (vision-only model; code in Python/PyTorch)

- **License:** MIT

- **Finetuned from model:** DINOv2, DINOv2 with registers, DINO, ...
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
To use NeCo models on downstream dense prediction tasks, you only need to install `timm` and `torch`. Depending on which checkpoint you use, load it as follows:
|
|
|
The models can be downloaded from our [NeCo Hugging Face repo](https://huggingface.co/FunAILab/NeCo/tree/main).
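
If you prefer to fetch a checkpoint programmatically, the snippet below is a minimal sketch using `huggingface_hub`; the `filename` argument is a placeholder, so substitute the actual checkpoint name from the repository listing.

```python
# Minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`).
from huggingface_hub import hf_hub_download

# The filename below is a placeholder; pick the actual checkpoint name
# from https://huggingface.co/FunAILab/NeCo/tree/main.
path_to_checkpoint = hf_hub_download(
    repo_id="FunAILab/NeCo",
    filename="<checkpoint filename from the repo listing>",
)
print(path_to_checkpoint)  # local path to the cached checkpoint file
```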
|
|
|
#### Models post-trained from DINOv2 (DINOv2 architecture)
|
|
|
##### NeCo on DINOv2
|
```python |
|
import torch |
|
# change to dinov2_vitb14 for base as described in: |
|
# https://github.com/facebookresearch/dinov2 |
|
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14') |
|
path_to_checkpoint = "<your path to downloaded ckpt>" |
|
state_dict = torch.load(path_to_checkpoint, map_location='cpu')
|
model.load_state_dict(state_dict, strict=False) |
|
``` |
|
##### NeCo on DINOv2 with Registers
|
```python |
|
import torch |
|
# change to dinov2_vitb14_reg for base as described in: |
|
# https://github.com/facebookresearch/dinov2 |
|
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14_reg') |
|
path_to_checkpoint = "<your path to downloaded ckpt>" |
|
state_dict = torch.load(path_to_checkpoint, map_location='cpu')
|
model.load_state_dict(state_dict, strict=False) |
|
``` |
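
Once the weights are loaded, the dense patch features used for downstream dense prediction can be extracted from the backbone. The sketch below assumes the `forward_features` output keys of the DINOv2 codebase and a ViT-S/14 backbone; input height and width must be multiples of the patch size (14).

```python
import torch

model.eval()
# Dummy input; height and width must be divisible by the patch size (14).
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    out = model.forward_features(x)

# In the DINOv2 codebase, forward_features returns a dict; the dense
# patch tokens live under "x_norm_patchtokens" with shape (B, H/14 * W/14, C).
patch_tokens = out["x_norm_patchtokens"]
print(patch_tokens.shape)  # e.g. torch.Size([1, 256, 384]) for ViT-S/14 at 224x224
```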
|
#### Models post-trained from DINO or similar (DINO architecture)
|
##### timm ViT-Small and ViT-Base architectures
|
```python |
|
import torch |
|
from timm.models.vision_transformer import vit_small_patch16_224, vit_base_patch16_224 |
|
# Change to vit_base_patch16_224() if you want to use our larger (ViT-Base) model
|
model = vit_small_patch16_224() |
|
path_to_checkpoint = "<your path to downloaded ckpt>" |
|
state_dict = torch.load(path_to_checkpoint, map_location='cpu') |
|
model.load_state_dict(state_dict, strict=False) |
|
``` |
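
For the timm backbones, the patch tokens can be pulled out of `forward_features` and reshaped into a spatial grid. This is a minimal sketch; recent timm versions return the class token as the first token, so adjust the slicing if your timm version behaves differently.

```python
import torch

model.eval()
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    tokens = model.forward_features(x)  # (B, 1 + 14*14, C) for ViT-S/16 at 224x224

# Drop the class token and reshape the remaining patch tokens into a feature map.
patch_tokens = tokens[:, 1:, :]
b, n, c = patch_tokens.shape
h = w = int(n ** 0.5)
feature_map = patch_tokens.reshape(b, h, w, c).permute(0, 3, 1, 2)  # (B, C, h, w)
print(feature_map.shape)  # e.g. torch.Size([1, 384, 14, 14])
```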
|
|
|
**Note:** If you want to load the model weights directly from a Hugging Face URL, run:
|
```python |
|
import torch |
|
state_dict = torch.hub.load_state_dict_from_url("<url to the hugging face checkpoint>") |
|
``` |
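
`load_state_dict_from_url` caches the downloaded file locally and also accepts a `map_location` argument (e.g. `map_location='cpu'`) if you are loading on a machine without a GPU; the resulting `state_dict` is then passed to `model.load_state_dict` as in the examples above.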
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
* We have post-trained our models on the **COCO Dataset**. |
|
|
|
### Training Procedure |
|
|
|
Please see our repository and read our paper for more details.
|
|
|
## Environmental Impact |
|
- **Hardware Type:** NVIDIA A100 GPU |
|
- **Hours used:** 18 (per model) |
|
- **Cloud Provider:** HPC clusters: Helma (NHR FAU, Germany) and Snellius (The Netherlands)

- **Compute Region:** Europe (Germany & The Netherlands)
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
``` |
|
@inproceedings{pariza2025near,
  title={Near, far: Patch-ordering enhances vision foundation models' scene understanding},
  author={Valentinos Pariza and Mohammadreza Salehi and Gertjan J. Burghouts and Francesco Locatello and Yuki M Asano},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=Qro97zWC29}
}
|
|
|
``` |
|
|
|