Li Zhang

Andcircle

AI & ML interests

None yet

Organizations

None yet

Andcircle's activity

upvoted an article about 20 hours ago
Reacted to BramVanroy's post with 👍 6 months ago
Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory-heavy than Mistral 7B. I know that its vocabulary is much larger (250k tokens), but I'm a bit surprised that the max batch size I can fit on an A100 80GB is only 2, whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller except for the embedding layer. Both runs used FlashAttention, the same sequence length, and the same DeepSpeed ZeRO-3 settings. And yes, I'm using the most recent hotfix of transformers that solves a memory issue with Gemma and others.

Any prior experience that you can share, or suggestions to improve throughput?
  • 4 replies
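As a rough illustration of the kind of setup described in the post (not the poster's actual script), here is a minimal sketch of loading Gemma 2B for finetuning in bf16 with FlashAttention-2 enabled, plus gradient checkpointing as one common memory-saving lever. The checkpoint name `google/gemma-2b` and the specific flags are assumptions, not taken from the post.

```python
# Minimal sketch of the setup described above (assumed details, not the
# poster's actual script). Requires a transformers version with
# FlashAttention-2 support and the flash-attn package installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # halve memory vs fp32
    attn_implementation="flash_attention_2",  # the "FA" mentioned in the post
)

# Gradient checkpointing recomputes activations in the backward pass,
# trading extra compute for a lower peak memory footprint.
model.gradient_checkpointing_enable()
```

One plausible explanation for the observed gap: the logits tensor has shape (batch, sequence length, vocabulary size), so with a 250k-token vocabulary the logits and the embedding/LM-head gradients grow much larger than Mistral's (32k vocabulary), even though Gemma 2B has fewer parameters overall. DeepSpeed ZeRO-3 shards parameters and optimizer states but not these per-GPU activation buffers, which is consistent with the smaller max batch size.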
New activity in mistralai/Mistral-7B-v0.1 about 1 year ago

Does Mistral support accelerate library? (4)
#65 opened about 1 year ago by Sp1der
New activity in tiiuae/falcon-40b over 1 year ago

[Bug] Does not work (58)
#3 opened over 1 year ago by catid