Here is a simple multimodal-style training script to see the model working.
https://github.com/grahamannett/finetune-fuyu/blob/main/train-simple.py
If anyone would like to test their machine with Fuyu, here is a small script that generates fake text + images but runs a complete training loop. It is fully self-contained and only needs transformers/torch/simple_parsing installed.
The idea is that since you may not know whether the model will fit on your hardware, it is better to try this before digging into FSDP/QLoRA/Accelerate.
I can add an FSDP/Accelerate/QLoRA example as well, since those can be hard to get working with this model on limited resources.
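For reference, the shape of such a fake-data smoke test can be sketched as below. This is not the linked script: `TinyMultimodalLM` and `fake_batch` are stand-ins invented here so the loop runs anywhere without downloading the 8B checkpoint; the real script loads `FuyuForCausalLM`/`FuyuProcessor` from transformers instead.

```python
import torch
from torch import nn

VOCAB, DIM, PATCHES = 256, 32, 16  # tiny sizes, just for the smoke test

class TinyMultimodalLM(nn.Module):
    """Hypothetical stand-in for Fuyu: token embeddings + projected 'image patches'."""
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, DIM)
        self.img_proj = nn.Linear(PATCHES, DIM)  # fake image-patch projection
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, input_ids, image_patches, labels):
        h = self.tok_emb(input_ids) + self.img_proj(image_patches)
        logits = self.head(h)
        return nn.functional.cross_entropy(logits.view(-1, VOCAB), labels.view(-1))

def fake_batch(bsz=2, seq=8):
    # Random token ids and random "image patches" stand in for real data.
    ids = torch.randint(0, VOCAB, (bsz, seq))
    patches = torch.rand(bsz, seq, PATCHES)
    return ids, patches, ids.clone()  # labels = inputs, fine for a smoke test

model = TinyMultimodalLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
for step in range(3):  # a complete, if trivial, training loop
    ids, patches, labels = fake_batch()
    loss = model(ids, patches, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
final_loss = float(loss)
```

If this loop runs (swap in the real model and processor), you know the forward/backward pass fits before layering on FSDP or QLoRA.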
Can FuyuProcessor be modified to handle both multi-resolution and multiple images?
I looked through its code and noticed it only processes one image at a time and doesn't support this feature.
It would be great if the training process could support settings for both multi-resolution and multi-image processing.
FuyuProcessor handles multi-resolution images and multiple images, as long as each image belongs to a different sample in the batch.
The current model does not allow multiple images per sample, but it does seem to work with them if you change gather_continuous_embeddings.
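To make that last point concrete, here is a rough sketch of the scatter that `gather_continuous_embeddings` performs (argument names follow transformers' Fuyu modeling code, but the function below is a simplified illustration, not the library implementation): patch embeddings are written into the token-embedding sequence at the positions marked by `image_patch_input_indices`. Supporting several images per sample would then mostly mean concatenating each image's patch embeddings into one per-sample table before this scatter.

```python
import torch

def gather_continuous_embeddings_sketch(word_embeddings, continuous_embeddings,
                                        image_patch_input_indices):
    """Simplified sketch of Fuyu's patch-embedding scatter.

    word_embeddings: (batch, seq, dim) token embeddings
    continuous_embeddings: list of (num_patches_i, dim) tensors, one per sample
    image_patch_input_indices: (batch, seq); entries >= 0 say which patch row
        fills that sequence position, -1 means a normal text token
    """
    output = word_embeddings.clone()
    for b in range(word_embeddings.shape[0]):
        # Sequence positions that should receive image-patch embeddings...
        dst = torch.nonzero(image_patch_input_indices[b] >= 0, as_tuple=True)[0]
        # ...and which patch row goes into each of them.
        src = image_patch_input_indices[b][dst]
        output[b, dst] = continuous_embeddings[b][src]
    return output

# For multiple images per sample, torch.cat each image's patch embeddings into
# one flat per-sample table so the indices can address all of them.
```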