Li Zhang

Andcircle

AI & ML interests

None yet

Organizations

None yet

Andcircle's activity

upvoted an article about 20 hours ago
Reacted to BramVanroy's post with 👍 6 months ago
Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory-heavy than Mistral 7B. I know that its vocabulary is much larger (250k tokens), but I'm a bit surprised that the max batch size I can fit on an A100 80GB is only 2, whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller except for the embedding layer. Both runs used FlashAttention, the same sequence length, and the same DeepSpeed ZeRO-3 settings. And yes, I'm using the most recent hotfix of transformers that solves a memory issue with Gemma and others.

Any prior experience that you can share, or suggestions to improve throughput?
  • 4 replies
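As a rough illustration of the kind of setup described in the post (not the poster's actual script), here is a minimal sketch of loading Gemma 2B for finetuning in bf16 with FlashAttention-2 enabled, plus gradient checkpointing as one common memory-saving lever. The checkpoint name `google/gemma-2b` and the specific flags are assumptions, not taken from the post.

```python
# Minimal sketch of the setup described above (assumed details, not the
# poster's actual script). Requires a transformers version with
# FlashAttention-2 support and the flash-attn package installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # halve memory vs fp32
    attn_implementation="flash_attention_2",  # the "FA" mentioned in the post
)

# Gradient checkpointing recomputes activations in the backward pass,
# trading extra compute for a lower peak memory footprint.
model.gradient_checkpointing_enable()
```

One plausible explanation for the observed gap: the logits tensor has shape (batch, sequence length, vocabulary size), so with a 250k-token vocabulary the logits and the embedding/LM-head gradients grow much larger than Mistral's (32k vocabulary), even though Gemma 2B has fewer parameters overall. DeepSpeed ZeRO-3 shards parameters and optimizer states but not these per-GPU activation buffers, which is consistent with the smaller max batch size.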
New activity in mistralai/Mistral-7B-v0.1 about 1 year ago

Does Mistral support accelerate library? (4)
#65 opened about 1 year ago by Sp1der
New activity in tiiuae/falcon-40b over 1 year ago

[Bug] Does not work (58)
#3 opened over 1 year ago by catid