How can I reproduce the results on OKVQA?
#71 opened 5 months ago
by
rayruiyang
Has anyone tried adding positional embeddings to the image patches to improve the model?
2
#70 opened 8 months ago
by
jchiu1234
How to evaluate it on AI2D dataset?
#69 opened 8 months ago
by
boydcheung
Masking the image tokens during training
#68 opened 8 months ago
by
jchiu1234
finetune fuyu-8b model
1
#67 opened 9 months ago
by
yinincanada
Is there any way to use image embeddings as input? (similar to input_embeds param)
#66 opened 10 months ago
by
sanchd
OCR function
#65 opened 11 months ago
by
linxi
Does localization really work?
#64 opened 11 months ago
by
Seungyoun
Fine-tuning fuyu-8b for text location with 1920x1080 images always hits OOM, even on 8x A100
#63 opened 11 months ago
by
Nooodles
Are there special tokens that are ignored during loss computation?
9
#62 opened 11 months ago
by
Nyandwi
Why do the coordinates need to be divided by two in scale_bbox_to_transformed_image?
2
#61 opened 11 months ago
by
Nooodles
Here is a simple multimodal-like training script to see the model working.
3
#60 opened 11 months ago
by
besiktas
GPU requirements
5
#59 opened 11 months ago
by
thightower1
I keep running out of memory. Why don't they just tell us what equipment is required to run these models?
#58 opened 11 months ago
by
alquimista888
crash kernel
6
#57 opened 12 months ago
by
simonbrbx
Tips on resolving this typing.Optional error seemingly related to PIL.Image?
3
#56 opened 12 months ago
by
justinwickett
demo of PDF vqa
1
#55 opened 12 months ago
by
verigle
Upload 2.jpg
#53 opened 12 months ago
by
Aaronx
8B? Or 9B?
#51 opened 12 months ago
by
mrfakename
Memory Spikes while Getting Model Logits
2
#49 opened 12 months ago
by
Nyandwi
Is there a way to run it on a 8GB GPU?
1
#47 opened 12 months ago
by
bobe94
issue with quantization on windows
#46 opened 12 months ago
by
FantasticMrCat
How does the Fuyu model get images?
3
#45 opened 12 months ago
by
VatsaDev
For the vqav2 data set example "fish and carrot", why does the model output a sentence instead of a phrase?
8
#44 opened about 1 year ago
by
changgeli
fine-tuning using FSDP and non 80GB cards?
8
#43 opened about 1 year ago
by
besiktas
Released capabilities
6
#42 opened about 1 year ago
by
ludeksvoboda
Update README.md
1
#41 opened about 1 year ago
by
ybelkada
Colab
1
#39 opened about 1 year ago
by
nengelmann
Is a special instruction needed to trigger the OCR location function?
3
#38 opened about 1 year ago
by
liupei0408
How to get Image embedding using Fuyu
2
#37 opened about 1 year ago
by
oaishi
How to get the detailed description in the fuyu-8b-demo?
1
#35 opened about 1 year ago
by
dwdxdy
The Numbers
1
#33 opened about 1 year ago
by
changgeli
Questions about the examples in the blog
2
#32 opened about 1 year ago
by
AudreyLin
ImportError for FuyuProcessor in Transformers v4.34.1
3
#30 opened about 1 year ago
by
ClaraLovesFunk
hi love it
#29 opened about 1 year ago
by
boinc
The 8b model could get correct results for the case shown on the official blog
2
#28 opened about 1 year ago
by
YuntaoChen
long response times
1
#27 opened about 1 year ago
by
FantasticMrCat42
ValueError: Unable to infer channel dimension format
5
#26 opened about 1 year ago
by
vishal1278
A working demo.py for your reference
1
#25 opened about 1 year ago
by
Colderthanice
Using this model as a QA-tool/OCR on a text heavy document?
2
#24 opened about 1 year ago
by
Techie5879
Loading the model on multi-gpu setup?
1
#23 opened about 1 year ago
by
Techie5879
issue with inference
5
#22 opened about 1 year ago
by
zhangchaosunshine
issue with running the model
5
#21 opened about 1 year ago
by
slay
Possible for quantization other than bitsandbytes?
1
#20 opened about 1 year ago
by
Yhyu13
Run on MBP M1
4
#17 opened about 1 year ago
by
sagar-kris
License question
1
#16 opened about 1 year ago
by
deleted
Warning output
4
#15 opened about 1 year ago
by
dashesy
Bug when deploying to Inference Endpoints
2
#14 opened about 1 year ago
by
gpantalos