jiang committed on
Commit 3c6babc
1 Parent(s): 650c5f6
Files changed (1)
  1. README.md +13 -185
README.md CHANGED
@@ -1,185 +1,13 @@
- # PolyFormer: Referring Image Segmentation as Sequential Polygon Generation (CVPR 2023)
- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/polyformer-referring-image-segmentation-as/referring-expression-segmentation-on-refcocog)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog?p=polyformer-referring-image-segmentation-as)
- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/polyformer-referring-image-segmentation-as/referring-expression-segmentation-on-refcoco)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco?p=polyformer-referring-image-segmentation-as)
- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/polyformer-referring-image-segmentation-as/referring-expression-segmentation-on-refcoco-1)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-1?p=polyformer-referring-image-segmentation-as)
- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/polyformer-referring-image-segmentation-as/referring-expression-comprehension-on-refcoco)](https://paperswithcode.com/sota/referring-expression-comprehension-on-refcoco?p=polyformer-referring-image-segmentation-as)
- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/polyformer-referring-image-segmentation-as/referring-expression-comprehension-on-refcoco-1)](https://paperswithcode.com/sota/referring-expression-comprehension-on-refcoco-1?p=polyformer-referring-image-segmentation-as)
-
-
- \[[Project Page](https://polyformer.github.io/)\] \[[Paper](https://arxiv.org/abs/2302.07387)\]
-
- by [Jiang Liu*](https://joellliu.github.io/), [Hui Ding*](http://www.huiding.org/), [Zhaowei Cai](https://zhaoweicai.github.io/), [Yuting Zhang](https://scholar.google.com/citations?user=9UfZJskAAAAJ&hl=en), [Ravi Kumar Satzoda](https://scholar.google.com.sg/citations?user=4ngycwIAAAAJ&hl=en), [Vijay Mahadevan](https://scholar.google.com/citations?user=n9fRgvkAAAAJ&hl=en), [R. Manmatha](https://ciir.cs.umass.edu/~manmatha/).
-
-
- ## :notes: Introduction
- ![github_figure](pipeline.gif)
- PolyFormer is a unified model for referring image segmentation (predicting a polygon vertex sequence) and referring expression comprehension (predicting bounding box corner points). The predicted polygons are converted to segmentation masks as a final step; a sketch of that conversion is shown below.
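-
- The mask conversion itself is plain rasterization. Here is a minimal sketch (not the repository's exact implementation) using `numpy` and `Pillow`; the function name and the example vertices are ours, chosen only for illustration:
- ```python
- import numpy as np
- from PIL import Image, ImageDraw
-
- def polygon_to_mask(vertices, height, width):
-     """Rasterize one polygon, given as [(x0, y0), (x1, y1), ...], into a binary mask."""
-     canvas = Image.new("L", (width, height), 0)
-     ImageDraw.Draw(canvas).polygon([(float(x), float(y)) for x, y in vertices], outline=1, fill=1)
-     return np.asarray(canvas, dtype=np.uint8)
-
- # A referred object may consist of several polygons (e.g., parts split by occlusion);
- # their individual masks would then be combined with a logical OR.
- mask = polygon_to_mask([(12.3, 40.1), (88.7, 35.5), (60.2, 90.9)], height=128, width=128)
- ```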
-
- **Contributions:**
-
- * State-of-the-art results on referring image segmentation and referring expression comprehension on 6 datasets;
- * A unified framework for referring image segmentation (RIS) and referring expression comprehension (REC) by formulating them as a sequence-to-sequence (seq2seq) prediction problem;
- * A regression-based decoder for accurate coordinate prediction, which outputs continuous 2D coordinates directly, without quantization error (see the sketch below).
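-
- To see what "quantization error" means here, compare a classification-style decoder, which snaps each coordinate to one of a fixed set of bins, against direct regression. A toy illustration (the bin count, image size, and coordinate are made up for the example):
- ```python
- num_bins = 64        # hypothetical size of a discrete coordinate vocabulary
- image_size = 512
-
- x_true = 137.4       # ground-truth x coordinate in pixels
-
- # Classification-style decoding: snap to the center of the nearest bin.
- bin_width = image_size / num_bins            # 8.0 pixels per bin
- x_bin = int(x_true // bin_width)
- x_quantized = (x_bin + 0.5) * bin_width      # 140.0
-
- # Regression-style decoding (PolyFormer): predict the value directly.
- x_regressed = 137.4                          # a perfect regressor can hit the exact value
-
- print(abs(x_quantized - x_true))   # 2.6 pixels off; worst case is bin_width / 2 = 4.0
- print(abs(x_regressed - x_true))   # 0.0
- ```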
-
-
-
- ## Getting Started
- ### Installation
- ```bash
- conda create -n polyformer python=3.7.4
- conda activate polyformer
- python -m pip install -r requirements.txt
- ```
- Note: if you are getting import errors from `fairseq`, try the following:
- ```bash
- python -m pip install pip==21.2.4
- pip uninstall fairseq
- pip install -r requirements.txt
- ```
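-
- After installation, a quick sanity check that the two key dependencies import cleanly can save a failed run later. This is a generic check, not part of the repository:
- ```python
- # Verify that the core dependencies resolve before moving on to data preparation.
- import torch
- import fairseq
-
- print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
- print("fairseq:", fairseq.__version__)
- ```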
-
- ## Datasets
- ### Prepare Pretraining Data
- 1. Create the dataset folders
- ```bash
- mkdir datasets
- mkdir datasets/images
- mkdir datasets/annotations
- ```
- 2. Download the *2014 Train images [83K/13GB]* from [COCO](https://cocodataset.org/#download),
- original [Flickr30K images](http://shannon.cs.illinois.edu/DenotationGraph/),
- [ReferItGame images](https://drive.google.com/file/d/1R6Tm7tQTHCil6A_eOhjudK3rgaBxkD2t/view?usp=sharing),
- and [Visual Genome images](http://visualgenome.org/api/v0/api_home.html), and extract them to `datasets/images`.
- 3. Download the annotation file for pretraining datasets [instances.json](https://drive.google.com/drive/folders/1O4hzL8_s3aUsnj_JZnM3CwANd7TejcJO)
- provided by [SeqTR](https://github.com/sean-zhuh/SeqTR) and store it in `datasets/annotations`.
- The workspace directory should be organized like this:
- ```
- PolyFormer/
- ├── datasets/
- │   ├── images
- │   │   ├── flickr30k/*.jpg
- │   │   ├── mscoco/
- │   │   │   └── train2014/*.jpg
- │   │   ├── saiaprtc12/*.jpg
- │   │   └── visual-genome/*.jpg
- │   └── annotations
- │       └── instances.json
- └── ...
- ```
- 4. Generate the tsv files for pretraining
- ```bash
- python data/create_pretraining_data.py
- ```
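-
- The script writes one sample per line in tab-separated form; the exact column layout is defined inside `data/create_pretraining_data.py`. The sketch below only shows a generic way to confirm the output is well-formed — the file path is illustrative, not the script's actual output name:
- ```python
- import csv
-
- # Rows may embed base64-encoded images, so raise the default field size limit.
- csv.field_size_limit(10**9)
-
- with open("datasets/pretrain/train.tsv") as f:   # illustrative path
-     reader = csv.reader(f, delimiter="\t")
-     for i, row in enumerate(reader):
-         print(f"row {i}: {len(row)} fields, first field = {row[0][:50]!r}")
-         if i == 2:
-             break
- ```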
-
- ### Prepare Finetuning Data
- 1. Follow the instructions in the `./refer` directory to set up subdirectories
- and download annotations.
- This directory is based on the [refer](https://github.com/lichengunc/refer) API.
-
- 2. Generate the tsv files for finetuning
- ```bash
- python data/create_finetuning_data.py
- ```
-
-
-
-
- ## Pretraining
- 1. Create the checkpoints folder
- ```bash
- mkdir weights
- ```
- 2. Download the pretrained weights of [Swin-base](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth),
- [Swin-large](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth),
- and [BERT-base](https://cdn.huggingface.co/bert-base-uncased-pytorch_model.bin),
- and put the weight files in `./pretrained_weights`.
- These weights are needed to initialize the model for training.
-
-
- 3. Run the pretraining scripts for model pretraining on the referring expression comprehension task:
- ```bash
- cd run_scripts/pretrain
- bash pretrain_polyformer_b.sh # for pretraining PolyFormer-B model
- bash pretrain_polyformer_l.sh # for pretraining PolyFormer-L model
- ```
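-
- If a run fails at startup, a common cause is a missing or corrupted backbone checkpoint. A quick, generic way to confirm a downloaded file is a loadable PyTorch state dict (the file name is the one downloaded in step 2 above):
- ```python
- import torch
-
- # Load on CPU just to inspect the checkpoint structure.
- ckpt = torch.load("pretrained_weights/swin_base_patch4_window12_384_22k.pth", map_location="cpu")
- state = ckpt.get("model", ckpt)   # the Swin releases wrap the weights under a "model" key
- print(len(state), "tensors; first keys:", list(state)[:3])
- ```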
-
- ## Finetuning
- Run the finetuning scripts to finetune the model on the referring image segmentation and referring expression comprehension tasks:
- ```bash
- cd run_scripts/finetune
- bash train_polyformer_b.sh # for finetuning PolyFormer-B model
- bash train_polyformer_l.sh # for finetuning PolyFormer-L model
- ```
- Please make sure to point the pretrained weight paths (Line 20) in the finetuning scripts to your best pretraining checkpoints.
-
- ## Evaluation
- Run the evaluation scripts to evaluate on the referring image segmentation and referring expression comprehension tasks:
- ```bash
- cd run_scripts/evaluation
-
- # for evaluating PolyFormer-B model
- bash evaluate_polyformer_b_refcoco.sh
- bash evaluate_polyformer_b_refcoco+.sh
- bash evaluate_polyformer_b_refcocog.sh
-
- # for evaluating PolyFormer-L model
- bash evaluate_polyformer_l_refcoco.sh
- bash evaluate_polyformer_l_refcoco+.sh
- bash evaluate_polyformer_l_refcocog.sh
- ```
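-
- The scripts report the metrics shown in the Model Zoo tables below. As conventionally defined in the referring-segmentation literature: oIoU is the overall IoU (cumulative intersection over cumulative union across the whole split), mIoU is the mean of per-sample IoUs, and [email protected] is the fraction of samples with IoU above 0.5. A minimal sketch of how the three differ:
- ```python
- import numpy as np
-
- def ris_metrics(pred_masks, gt_masks):
-     """pred_masks, gt_masks: lists of boolean arrays, one pair per sample."""
-     inters = np.array([np.logical_and(p, g).sum() for p, g in zip(pred_masks, gt_masks)])
-     unions = np.array([np.logical_or(p, g).sum() for p, g in zip(pred_masks, gt_masks)])
-     iou = inters / np.maximum(unions, 1)
-     return {
-         "oIoU": inters.sum() / max(unions.sum(), 1),  # large objects dominate
-         "mIoU": iou.mean(),                           # every sample weighted equally
-         "[email protected]": (iou > 0.5).mean(),
-     }
- ```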
-
- ## Model Zoo
- Download the model weights to `./weights` if you want to use our trained models for finetuning and evaluation.
-
- |        | Refcoco val |      |         | Refcoco testA |      |         | Refcoco testB |      |         |
- |--------|------|------|---------|------|------|---------|------|------|---------|
- | Model  | oIoU | mIoU | [email protected] | oIoU | mIoU | [email protected] | oIoU | mIoU | [email protected] |
- | [PolyFormer-B](https://drive.google.com/file/d/1K0y-WBO6cL7gBzNnJaHAeNu3pgq4DbJ9/view?usp=share_link) | 74.82 | 75.96 | 89.73 | 76.64 | 77.09 | 91.73 | 71.06 | 73.22 | 86.03 |
- | [PolyFormer-L](https://drive.google.com/file/d/15P6m5RI6HAQE2QXQXMAjw_oBsaPii7b3/view?usp=share_link) | 75.96 | 76.94 | 90.38 | 78.29 | 78.49 | 92.89 | 73.25 | 74.83 | 87.16 |
-
-
- |        | Refcoco+ val |      |         | Refcoco+ testA |      |         | Refcoco+ testB |      |         |
- |--------|------|------|---------|------|------|---------|------|------|---------|
- | Model  | oIoU | mIoU | [email protected] | oIoU | mIoU | [email protected] | oIoU | mIoU | [email protected] |
- | [PolyFormer-B](https://drive.google.com/file/d/12_ylFhsbqGySxDqgeEByn8nKoJtT2n2w/view?usp=share_link) | 67.64 | 70.65 | 83.73 | 72.89 | 74.51 | 88.60 | 59.33 | 64.64 | 76.38 |
- | [PolyFormer-L](https://drive.google.com/file/d/1lUCv7dUPctEz4vEpPr7aI8A8ZmfYCB8y/view?usp=share_link) | 69.33 | 72.15 | 84.98 | 74.56 | 75.71 | 89.77 | 61.87 | 66.73 | 77.97 |
-
-
- |        | Refcocog val |      |         | Refcocog test |      |         |
- |--------|------|------|---------|------|------|---------|
- | Model  | oIoU | mIoU | [email protected] | oIoU | mIoU | [email protected] |
- | [PolyFormer-B](https://drive.google.com/file/d/12_ylFhsbqGySxDqgeEByn8nKoJtT2n2w/view?usp=share_link) | 67.76 | 69.36 | 84.46 | 69.05 | 69.88 | 84.96 |
- | [PolyFormer-L](https://drive.google.com/file/d/1lUCv7dUPctEz4vEpPr7aI8A8ZmfYCB8y/view?usp=share_link) | 69.20 | 71.15 | 85.83 | 70.19 | 71.17 | 85.91 |
-
- * Pretrained weights:
-   * [PolyFormer-B](https://drive.google.com/file/d/1sAzfChYDdHdaeatB2K14lrJjG4uiXAol/view?usp=share_link)
-   * [PolyFormer-L](https://drive.google.com/file/d/1knRxgM1lmEkuZZ-cOm_fmwKP1H0bJGU9/view?usp=share_link)
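-
- The checkpoint links above point at Google Drive. One way to fetch them non-interactively is the third-party `gdown` package (`pip install gdown`); the file id below is taken from the PolyFormer-B link in the first table, while the output file name is just a placeholder:
- ```python
- import os
- import gdown  # third-party helper for Google Drive downloads
-
- os.makedirs("weights", exist_ok=True)
- gdown.download("https://drive.google.com/uc?id=1K0y-WBO6cL7gBzNnJaHAeNu3pgq4DbJ9",
-                output="weights/polyformer_b.pt", quiet=False)
- ```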
155
-
156
- # Acknowlegement
157
- This codebase is developed based on [OFA](https://github.com/OFA-Sys/OFA).
158
- Other related codebases include:
159
- * [Fairseq](https://github.com/pytorch/fairseq)
160
- * [refer](https://github.com/lichengunc/refer)
161
- * [LAVT-RIS](https://github.com/yz93/LAVT-RIS/)
162
- * [SeqTR](https://github.com/sean-zhuh/SeqTR)
163
-
164
-
165
-
166
- # Citation
167
- Please cite our paper if you find this codebase helpful :)
168
-
169
- ```
170
- @inproceedings{liu2023polyformer,
171
- title={PolyFormer: Referring Image Segmentation as Sequential Polygon Generation},
172
- author={Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R},
173
- booktitle={CVPR},
174
- year={2023}
175
- }
176
- ```
177
-
178
- ## Security
179
-
180
- See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
181
-
182
- ## License
183
-
184
- This project is licensed under the Apache-2.0 License.
185
-
 
+ ---
+ title: PolyFormer
+ emoji: 🖌️🎨
+ colorFrom: pink
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 3.14.0
+ app_file: app.py
+ pinned: false
+ license: afl-3.0
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference