Fuyu-8B Colab

#1
by nengelmann - opened

Just wanted to let you all know that there is a Colab here where you can try the model and get started!
https://github.com/nengelmann/Fuyu-8B---Exploration/tree/main

Any suggestions on how to perform OCR recognition and localization (i.e., text bounding boxes), as shown in the blog?

I don't think there is any documentation yet.
I'd suggest that you take a look at the processing script and try to figure it out.
https://github.com/huggingface/transformers/blob/aa4198a238f915e7ac04bc43d28ddbcb7fe690df/src/transformers/models/fuyu/processing_fuyu.py#L29

Please let me know if you find out which prompts work for your case 🤓
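
For what it's worth, here is a minimal sketch of how I'd experiment with prompts, using the standard `FuyuProcessor` / `FuyuForCausalLM` classes from `transformers`. The loading and generation calls follow the usual API; the bounding-box prompt wording itself is just a guess on my side, not a documented format:

```python
# Minimal sketch for experimenting with Fuyu-8B prompts (assumes transformers >= 4.35).
# The OCR/bounding-box prompt below is an assumption, not a documented format.
import torch
from PIL import Image
from transformers import FuyuProcessor, FuyuForCausalLM

model_id = "adept/fuyu-8b"
processor = FuyuProcessor.from_pretrained(model_id)
model = FuyuForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)

image = Image.open("example.png").convert("RGB")

# Guessed prompt -- try variations and check how processing_fuyu.py handles
# <box>/<point>-style coordinate strings to see what the model expects.
prompt = "When provided with text, generate the corresponding bounding box.\nFuyu"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=40)

# Decode only the newly generated tokens (generate() returns the prompt tokens too).
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```
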

Unfortunately, I'm unsure how to implement text-to-box functionality. I'm not even certain whether this base version supports features like text-to-box or box-to-text. I can only wait for further details to be released.

Hi, I was wondering if you know whether this version of the model still has advantages over the original if the GPU is able to load both?

It uses less RAM.

I'd recommend checking it out yourself with the notebook above.
You can simply switch between the sharded and normal versions.
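
To be concrete, switching is just a matter of changing the checkpoint ID passed to `from_pretrained`; everything else stays the same. The sharded repo name below is a placeholder (use the one referenced in the notebook). Sharded checkpoints mainly reduce peak CPU RAM during loading, since the weights are read in smaller pieces:

```python
# Switching between the original and a sharded checkpoint only changes the
# repo ID -- the API is identical. The sharded repo name is a placeholder here.
import torch
from transformers import FuyuProcessor, FuyuForCausalLM

# model_id = "adept/fuyu-8b"          # original checkpoint
model_id = "<sharded-fuyu-8b-repo>"   # placeholder: sharded checkpoint from the notebook

# The processor is the same for both checkpoints.
processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")

model = FuyuForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,  # stream shards instead of materializing all weights in CPU RAM
)
```
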
