Efficient inference
#10 · opened by cuongnguyenxuan
I fine-tuned Florence-2-base on my task. I can run inference with the fine-tuned model on both CPU and GPU without flash_attn, but in both cases it takes more than 3 GB of memory. Is this normal, and is there a way to reduce memory usage during inference? By the way, fine-tuning the large version needed more than 20 GB.
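
For reference, would loading the model in half precision help? Something like the sketch below is what I have in mind (assuming the standard transformers loading path from the Florence-2 model card, with trust_remote_code; the image path and the `<OD>` task prompt are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 roughly halves the weight memory on GPU; keep fp32 on CPU
dtype = torch.float16 if device == "cuda" else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base",  # or a local fine-tuned checkpoint path
    torch_dtype=dtype,
    trust_remote_code=True,
).to(device)
model.eval()
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True
)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text="<OD>", images=image, return_tensors="pt").to(device, dtype)

# inference_mode skips autograd bookkeeping, so no activation graph is kept
with torch.inference_mode():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
    )
print(processor.batch_decode(generated_ids, skip_special_tokens=False)[0])
```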