license: apache-2.0

**Base Model**: BLIP2-t5 pretrained version
**Finetune data**: LLaVA 150k (for multi-round conversations, one instruction–answer pair is sampled)

**Hyper-parameters**:
v0:
* lr = 2e-5 --> 0.0 with cosine lr scheduler
* gbs = 32
* image size = 480
* weight decay = 0.05
v1 (same as LLaVA):
* lr = 2e-5
* gbs = 32
* image size = 480
* weight decay = 0.0
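The v0 cosine schedule (lr decaying from 2e-5 to 0.0) can be sketched as below; this is a generic cosine-decay formula and an assumption about the exact schedule used, not code from this repo, and the function name `cosine_lr` and the `total_steps` parameter are illustrative:

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over total_steps (v0-style schedule)."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, at the start of training this returns the full 2e-5, at the midpoint roughly half of it, and 0.0 at the final step.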