Zhiminli committed on
Commit
d8993ab
1 Parent(s): 3939338

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +162 -0
  2. clip_img_encoder.pt +3 -0
  3. ipa.pt +3 -0
README.md ADDED
@@ -0,0 +1,162 @@
## Using HunyuanDiT IP-Adapter

### Instructions

The dependencies and installation are the same as for the base model, and we use the `module` weights for training.
Download the model using the following commands:

```bash
cd HunyuanDiT
# Use the huggingface-cli tool to download the model.
# We recommend using the module weights as the base model for IP-Adapter inference, as our pretrained weights were trained on them.
huggingface-cli download Tencent-Hunyuan/IP-Adapter ipa.pt --local-dir ./ckpts/t2i/model
huggingface-cli download Tencent-Hunyuan/IP-Adapter clip_img_encoder.pt --local-dir ./ckpts/t2i/model/clip_img_encoder

# Quick start
# Prompt: "A tiger is swimming in the ocean, with the ocean as the background. Centered composition, anime style, creating a calm atmosphere."
python3 sample_ipadapter.py --infer-mode fa --ref-image-path ipadapter/input/tiger.png --i-scale 1.0 --prompt "一只老虎在海洋中游泳,背景是海洋。构图方式是居中构图,呈现了动漫风格和文化,营造了平静的氛围。" --infer-steps 100 --is-ipa True --load-key module
```
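If you prefer the Python API over the CLI, the same files can be fetched with `huggingface_hub` (a minimal sketch; the repo id and filenames are taken from the commands above):

```python
from huggingface_hub import hf_hub_download

# Download the IP-Adapter weights and the CLIP image encoder into the
# directory layout expected by sample_ipadapter.py.
hf_hub_download(repo_id="Tencent-Hunyuan/IP-Adapter", filename="ipa.pt",
                local_dir="./ckpts/t2i/model")
hf_hub_download(repo_id="Tencent-Hunyuan/IP-Adapter", filename="clip_img_encoder.pt",
                local_dir="./ckpts/t2i/model/clip_img_encoder")
```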

Examples of reference inputs and IP-Adapter results are shown below:
<table>
  <tr>
    <td colspan="3" align="center">Ref Input</td>
  </tr>

  <tr>
    <td align="center"><img src="asset/input/tiger.png" alt="Image 0" width="200"/></td>
    <td align="center"><img src="asset/input/beauty.png" alt="Image 1" width="200"/></td>
    <td align="center"><img src="asset/input/xunyicao.png" alt="Image 2" width="200"/></td>
  </tr>

  <tr>
    <td colspan="3" align="center">IP-Adapter Output</td>
  </tr>

  <tr>
    <td align="center">一只老虎在奔跑。<br>(A tiger running.)</td>
    <td align="center">一个卡通美女,抱着一只小猪。<br>(A cartoon beauty holding a little pig.)</td>
    <td align="center">一片紫色薰衣草地。<br>(A purple lavender field.)</td>
  </tr>

  <tr>
    <td align="center"><img src="asset/output/tiger_run.png" alt="Image 3" width="200"/></td>
    <td align="center"><img src="asset/output/beauty_pig.png" alt="Image 4" width="200"/></td>
    <td align="center"><img src="asset/output/xunyicao_res.png" alt="Image 5" width="200"/></td>
  </tr>

  <tr>
    <td align="center">一只老虎在看书。<br>(A tiger is reading a book.)</td>
    <td align="center">一个卡通美女,穿着绿色衣服。<br>(A cartoon beauty wearing green clothes.)</td>
    <td align="center">一片紫色薰衣草地,有一只可爱的小狗。<br>(A purple lavender field with a cute puppy.)</td>
  </tr>

  <tr>
    <td align="center"><img src="asset/output/tiger_book.png" alt="Image 6" width="200"/></td>
    <td align="center"><img src="asset/output/beauty_green_cloth.png" alt="Image 7" width="200"/></td>
    <td align="center"><img src="asset/output/xunyicao_dog.png" alt="Image 8" width="200"/></td>
  </tr>

  <tr>
    <td align="center">一只老虎在咆哮。<br>(A tiger is roaring.)</td>
    <td align="center">一个卡通美女,戴着墨镜。<br>(A cartoon beauty wearing sunglasses.)</td>
    <td align="center">水墨风格,一片紫色薰衣草地。<br>(Ink style. A purple lavender field.)</td>
  </tr>

  <tr>
    <td align="center"><img src="asset/output/tiger_roar.png" alt="Image 9" width="200"/></td>
    <td align="center"><img src="asset/output/beauty_glass.png" alt="Image 10" width="200"/></td>
    <td align="center"><img src="asset/output/xunyicao_style.png" alt="Image 11" width="200"/></td>
  </tr>
</table>

### Training

We provide base model weights for IP-Adapter training; you can use the `module` weights as the base.

In the following example, we load the `module` weights into the main model and train only the IP-Adapter.

For multi-resolution training, add the `--multireso` and `--reso-step 64` parameters.

```bash
task_flag="IP_Adapter"       # the task flag is used to identify folders
index_file=path/to/your/index_file
results_dir=./log_EXP        # save root for results
batch_size=1                 # training batch size
image_size=1024              # training image resolution
grad_accu_steps=1            # gradient accumulation steps
warmup_num_steps=0           # warm-up steps
lr=0.0001                    # learning rate
ckpt_every=10                # save a checkpoint every few steps
ckpt_latest_every=10000      # save a checkpoint named `latest.pt` every few steps
ckpt_every_n_epoch=2         # save a checkpoint every few epochs
epochs=8                     # total training epochs

PYTHONPATH=. \
sh $(dirname "$0")/run_g_ipadapter.sh \
    --task-flag ${task_flag} \
    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.018 \
    --predict-type v_prediction \
    --multireso \
    --reso-step 64 \
    --uncond-p 0.22 \
    --uncond-p-t5 0.22 \
    --uncond-p-img 0.05 \
    --index-file ${index_file} \
    --random-flip \
    --lr ${lr} \
    --batch-size ${batch_size} \
    --image-size ${image_size} \
    --global-seed 999 \
    --grad-accu-steps ${grad_accu_steps} \
    --warmup-num-steps ${warmup_num_steps} \
    --use-flash-attn \
    --use-fp16 \
    --extra-fp16 \
    --results-dir ${results_dir} \
    --resume \
    --resume-module-root ckpts/t2i/model/pytorch_model_module.pt \
    --epochs ${epochs} \
    --ckpt-every ${ckpt_every} \
    --ckpt-latest-every ${ckpt_latest_every} \
    --ckpt-every-n-epoch ${ckpt_every_n_epoch} \
    --log-every 10 \
    --deepspeed \
    --use-zero-stage 2 \
    --gradient-checkpointing \
    --no-strict \
    --training-parts ipadapter \
    --is-ipa True \
    --resume-ipa True \
    --resume-ipa-root ckpts/t2i/model/ipa.pt \
    "$@"
```
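For intuition, `--training-parts ipadapter` keeps the base model frozen and updates only the adapter modules. Below is a minimal PyTorch sketch of this selective-freezing pattern; the `"ipa"` name filter is a hypothetical stand-in for however the training script actually selects the IP-Adapter parameters:

```python
import torch

def select_trainable_parts(model: torch.nn.Module, keyword: str = "ipa"):
    """Freeze everything except parameters whose names contain `keyword`.

    NOTE: matching on a name substring is an illustrative assumption,
    not the repo's actual selection rule.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable.append(param)
    return trainable

# Only the selected parameters go to the optimizer, e.g. with the
# recommended learning rate of 0.0001:
# optimizer = torch.optim.AdamW(select_trainable_parts(model), lr=1e-4)
```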

Recommended parameter settings:

|      Parameter      |                   Description                   | Recommended Value |         Note          |
|:-------------------:|:-----------------------------------------------:|:-----------------:|:---------------------:|
|   `--batch-size`    |               Training batch size               |         1         | Depends on GPU memory |
| `--grad-accu-steps` |           Gradient accumulation steps           |         2         |           -           |
|       `--lr`        |                  Learning rate                  |      0.0001       |           -           |
| `--training-parts`  | Parameters to train during IP-Adapter training  |     ipadapter     |           -           |
|     `--is-ipa`      |         Whether to train the IP-Adapter         |       True        |           -           |
| `--resume-ipa-root` | IP-Adapter checkpoint to resume training from   |  ipa model path   |           -           |

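Note that with `--batch-size 1`, the effective batch size is governed by gradient accumulation and the number of GPUs. A quick sanity check (the GPU count is an assumption; adjust to your setup):

```python
# Effective batch size under the recommended settings.
batch_size = 1        # --batch-size (per GPU)
grad_accu_steps = 2   # --grad-accu-steps
num_gpus = 8          # assumed GPU count

effective_batch_size = batch_size * grad_accu_steps * num_gpus
print(effective_batch_size)  # 16
```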
### Inference
Use the following command line for inference.

a. Use the float parameter `--i-scale` to specify the weight of the IP-Adapter reference image; a larger value makes the output follow the reference image more closely.
```bash
python3 sample_ipadapter.py --infer-mode fa --ref-image-path ipadapter/input/beach.png --i-scale 1.0 --prompt "一只老虎在海洋中游泳,背景是海洋。构图方式是居中构图,呈现了动漫风格和文化,营造了平静的氛围。" --infer-steps 100 --is-ipa True --load-key module
```
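To choose a good `--i-scale`, it can help to sweep a few values and compare the outputs side by side. A minimal sketch using only the flags shown above (the value grid and reference image are illustrative):

```python
import subprocess

# Sweep --i-scale to see how strongly outputs follow the reference image.
prompt = "一只老虎在奔跑。"  # "A tiger running."
for scale in (0.4, 0.7, 1.0, 1.3):
    subprocess.run(
        ["python3", "sample_ipadapter.py",
         "--infer-mode", "fa",
         "--ref-image-path", "ipadapter/input/tiger.png",
         "--i-scale", str(scale),
         "--prompt", prompt,
         "--infer-steps", "100",
         "--is-ipa", "True",
         "--load-key", "module"],
        check=True,
    )
```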
clip_img_encoder.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:34b1363abb93bffd1a7d1924054da7c8a5d57800bde67852890d9da06e6014c6
size 6753378451
ipa.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3d4b3ced3b9e648790f19591ee9377de430db8d0c8ee1675f14d55beaa248ee6
size 247745311
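Both weight files are stored with Git LFS; the pointer files above record each file's SHA-256 digest and size. A quick integrity check after downloading (the local paths assume the directory layout from the download commands):

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the Git LFS pointers above.
EXPECTED = {
    "ckpts/t2i/model/ipa.pt":
        "3d4b3ced3b9e648790f19591ee9377de430db8d0c8ee1675f14d55beaa248ee6",
    "ckpts/t2i/model/clip_img_encoder/clip_img_encoder.pt":
        "34b1363abb93bffd1a7d1924054da7c8a5d57800bde67852890d9da06e6014c6",
}

for path, expected in EXPECTED.items():
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Hash in 1 MiB chunks so the ~6.7 GB encoder is not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    print(path, "OK" if h.hexdigest() == expected else "MISMATCH")
```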