Merve Noyan PRO

merve

AI & ML interests

VLMs, vision & co

Articles

Organizations

Posts 41

view post
Post
5374
Fine-tune Florence-2 on any task πŸ”₯

Today we release a notebook and a walkthrough blog on fine-tuning Florence-2 on DocVQA dataset @andito @SkalskiP

Blog: https://huggingface.co/blog πŸ“•
Notebook: https://colab.research.google.com/drive/1hKDrJ5AH_o7I95PtZ9__VlCTNAo1Gjpf?usp=sharing πŸ“–
Florence-2 is a great vision-language model thanks to it's massive dataset and small size!

This model requires conditioning through task prefixes and it's not as generalist, requiring fine-tuning on a new task, such as DocVQA πŸ“

We have fine-tuned the model on A100 (and one can also use a smaller GPU with smaller batch size) and saw that model picks up new tasks πŸ₯Ή

See below how it looks like before and after FT 🀩
Play with the demo here andito/Florence-2-DocVQA πŸ„β€β™€οΈ