Fairseq -> Transformers conversion
Thanks for your contribution. Actually, I intended to upload the fairseq version of the caption checkpoint, as users reported that it is hard to download from the Aliyun OSS. I'll upload a new one for Transformers directly, and this one will be marked as the Fairseq version.
Thanks for your OFA checkpoints. With my own inference code, the original checkpoints in ofa-large-caption (https://huggingface.co/OFA-Sys/ofa-large-caption) achieve a lower CIDEr of about 130, but with your checkpoints converted from fairseq, the performance is as expected, with a CIDEr of 146. This means that both the Transformers OFA model code and my own inference code are correct; it looks like the original checkpoints just have some minor issues.
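For context, my inference code follows the pattern from the model card. Here is a minimal sketch, assuming the OFA-Sys fork of Transformers (which provides `OFATokenizer` and `OFAModel`) and a local copy of the converted checkpoint in `ckpt_dir`:

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import OFATokenizer, OFAModel  # OFA-Sys fork of Transformers

ckpt_dir = "./ofa-large-caption"  # local path to the converted checkpoint (assumption)

# Preprocessing per the ofa-large-caption model card: 480x480, 0.5 mean/std
mean, std = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
resolution = 480
patch_resize_transform = transforms.Compose([
    lambda image: image.convert("RGB"),
    transforms.Resize((resolution, resolution), interpolation=Image.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std),
])

tokenizer = OFATokenizer.from_pretrained(ckpt_dir)
model = OFAModel.from_pretrained(ckpt_dir, use_cache=False)

txt = " what does the image describe?"
inputs = tokenizer([txt], return_tensors="pt").input_ids
patch_img = patch_resize_transform(Image.open("example.jpg")).unsqueeze(0)

gen = model.generate(inputs, patch_images=patch_img, num_beams=5, no_repeat_ngram_size=3)
print(tokenizer.batch_decode(gen, skip_special_tokens=True))
```

The generated captions are then scored against the reference captions with the standard CIDEr evaluation.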
Therefore, I would like to ask if it is possible to provide checkpoints converted from fairseq for the other Transformers OFA models as well? For example, the pretrained OFA (https://huggingface.co/OFA-Sys/ofa-large); this would be of great benefit for fine-tuning our own models. Or perhaps the code for converting from fairseq to Transformers?
Thanks a lot!
Hi @cckevinn , here's the code I used for this conversion: https://colab.research.google.com/drive/1LLJewY92LXdeug5m_ceMUHdlqrRQwSQJ?usp=sharing
I can also share a GitHub repo with links to converted pretrained models later this week. I'm also working on sample code for fine-tuning pretrained models directly in Transformers.
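For anyone who just wants the general shape of the conversion before the repo is up: it boils down to loading the fairseq state dict, renaming parameters, and saving through `save_pretrained`. A skeleton sketch, with the rename table left as a placeholder (the real mapping is in the notebook) and `OFAModel` coming from the OFA-Sys fork of Transformers:

```python
import torch
from transformers import OFAModel  # OFA-Sys fork of Transformers (assumption)

# fairseq checkpoints keep the actual weights under the "model" key
fs_state = torch.load("caption_large_best_clean.pt", map_location="cpu")["model"]

# The real fairseq -> Transformers parameter-name mapping lives in the notebook;
# this table is a placeholder to fill in.
RENAMES = {
    # "fairseq.module.prefix": "transformers.module.prefix",
}

def rename(key: str) -> str:
    for old, new in RENAMES.items():
        if key.startswith(old):
            return new + key[len(old):]
    return key

hf_state = {rename(k): v for k, v in fs_state.items()}

# Instantiate the target architecture (its weights get overwritten) and save in HF format
model = OFAModel.from_pretrained("OFA-Sys/ofa-large-caption")
missing, unexpected = model.load_state_dict(hf_state, strict=False)
print("still missing:", missing)  # any keys the rename table didn't cover
print("unexpected:", unexpected)
model.save_pretrained("./ofa-large-caption-converted")
```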
@mys Thank you for sharing your awesome work.
Will the fine-tuning samples include visual grounding?
I would be interested to benchmark OFA vs Donut on the UI RefExp task. Here is my work in progress with Donut:
https://huggingface.co/spaces/ivelin/ui-refexp
I know visual grounding is pre-trained on the RefCOCO family, which mostly contains physical objects, while UI RefExp is primarily RICO Android mobile app screenshots. Nevertheless, I am curious how quickly OFA can transfer-learn on RICO RefExp and what performance it can ultimately reach. Happy to share my results, as I have with Donut.
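In case it is useful for the comparison: my understanding from the OFA paper is that grounding is cast as plain seq2seq over quantized coordinate tokens, so adapting it to RICO should mostly be a data-formatting exercise. A quick sketch of that formatting; the prompt wording and the 1000-bin location vocabulary are my assumptions from the paper, worth double-checking against the repo:

```python
NUM_BINS = 1000  # OFA quantizes coordinates into <bin_0>..<bin_999> tokens (per the paper)

def box_to_bins(x1, y1, x2, y2, width, height):
    """Quantize an absolute pixel box into OFA-style location tokens."""
    def bin_tok(v, size):
        idx = min(int(v / size * NUM_BINS), NUM_BINS - 1)
        return f"<bin_{idx}>"
    return " ".join([
        bin_tok(x1, width), bin_tok(y1, height),
        bin_tok(x2, width), bin_tok(y2, height),
    ])

# One UI RefExp training pair: the referring expression goes into the prompt,
# the quantized bounding box becomes the target sequence.
expr = "the search button in the top bar"
src = f' which region does the text " {expr} " describe?'
tgt = box_to_bins(412, 36, 498, 88, width=1440, height=2560)
print(src)
print(tgt)  # <bin_286> <bin_14> <bin_345> <bin_34>
```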