PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

Project | GitHub | Demo

Model description

PolyFormer is a unified framework for referring image segmentation (RIS) and referring expression comprehension (REC) that formulates both tasks as a sequence-to-sequence (seq2seq) prediction problem. For more details, please refer to our paper:

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation. Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha. CVPR 2023.
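
To give a rough feel for what "segmentation as sequence prediction" can look like, here is a minimal sketch that flattens a bounding box and one or more polygons into a single target sequence of normalized points. This is an illustration under stated assumptions, not the released implementation: the function name `build_target_sequence`, the `<BOS>`/`<SEP>`/`<EOS>` marker strings, and the choice to place the box corners before the polygon vertices are all hypothetical here; see the paper for the actual sequence construction.

```python
from typing import List, Sequence, Tuple, Union

Point = Tuple[float, float]

# Hypothetical marker tokens used only for this illustration.
BOS, SEP, EOS = "<BOS>", "<SEP>", "<EOS>"


def build_target_sequence(
    box: Tuple[float, float, float, float],
    polygons: Sequence[Sequence[Point]],
    width: float,
    height: float,
) -> List[Union[str, Point]]:
    """Flatten a box (x0, y0, x1, y1) and polygons into one sequence.

    Coordinates are normalized to [0, 1] by the image size. Polygons are
    separated by SEP so a decoder could emit several regions in one pass.
    """

    def norm(p: Point) -> Point:
        return (p[0] / width, p[1] / height)

    x0, y0, x1, y1 = box
    seq: List[Union[str, Point]] = [BOS, norm((x0, y0)), norm((x1, y1))]
    for i, poly in enumerate(polygons):
        if i > 0:
            seq.append(SEP)  # boundary between consecutive polygons
        seq.extend(norm(p) for p in poly)
    seq.append(EOS)
    return seq


if __name__ == "__main__":
    target = build_target_sequence(
        box=(50, 100, 150, 300),
        polygons=[[(50, 100), (150, 100), (100, 300)]],
        width=200,
        height=400,
    )
    print(target)
```

With a target of this shape, REC corresponds to predicting the two box corners and RIS to predicting the polygon vertices that follow, which is one way a single seq2seq model could serve both tasks.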

Training data

We pre-train PolyFormer on the REC task using Visual Genome, RefCOCO, RefCOCO+, RefCOCOg, and Flickr30k-entities, and then fine-tune on the joint REC + RIS task using RefCOCO, RefCOCO+, and RefCOCOg.

The two model variants are configured as follows:

  • PolyFormer-B: Swin-B as the visual encoder, BERT-base as the text encoder, 6 transformer encoder layers and 6 decoder layers.
  • PolyFormer-L: Swin-L as the visual encoder, BERT-base as the text encoder, 12 transformer encoder layers and 12 decoder layers.
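
For quick reference, the two variants above can be written down as a small configuration table. This is only a sketch for comparing them side by side: the `PolyFormerVariant` class and its field names are hypothetical and do not correspond to the configuration format used in the PolyFormer repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyFormerVariant:
    """Hypothetical container for the architecture settings listed above."""

    visual_encoder: str
    text_encoder: str
    encoder_layers: int
    decoder_layers: int


# Values copied from the variant descriptions in this model card.
VARIANTS = {
    "PolyFormer-B": PolyFormerVariant("Swin-B", "BERT-base", 6, 6),
    "PolyFormer-L": PolyFormerVariant("Swin-L", "BERT-base", 12, 12),
}

if __name__ == "__main__":
    for name, cfg in VARIANTS.items():
        print(name, cfg)
```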

Citation

If you find PolyFormer useful in your research, please cite the following paper:

@article{liu2023polyformer,
  title={PolyFormer: Referring Image Segmentation as Sequential Polygon Generation},
  author={Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R},
  journal={arXiv preprint arXiv:2302.07387},
  year={2023}
}