Suggested Architecture for Small Mistral Model
#66 · opened by mnitin73
I want to pretrain a model from scratch on a specific dataset, but I only have access to a single A100 80GB GPU. Can someone suggest a model architecture that can be trained on this GPU? I have tried the echarlaix/tiny-random-mistral and illuin/tiny-random-MistralForCausalLM models. They work fine, but they are too tiny and simple for my requirements. I would like a somewhat larger architecture that gives better performance on my dataset.
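For illustration, this is roughly the scale I have in mind. Below is a minimal sketch of a custom Mistral-style config (around 1B parameters) instantiated from scratch with transformers; all of the sizes are my own guesses, not tuned values, and would need adjusting for the dataset and memory budget:

```python
# Sketch: a ~1B-parameter Mistral-style model defined from scratch.
# All sizes below are illustrative assumptions, not a vetted recipe.
from transformers import MistralConfig, MistralForCausalLM

config = MistralConfig(
    vocab_size=32000,             # Mistral's default tokenizer vocabulary size
    hidden_size=2048,             # model width (the tiny-random models use ~32)
    intermediate_size=5632,       # MLP width, ~2.75x hidden_size as in Mistral-7B
    num_hidden_layers=24,         # depth
    num_attention_heads=16,       # head_dim = 2048 / 16 = 128
    num_key_value_heads=4,        # grouped-query attention to reduce KV memory
    max_position_embeddings=4096,
    sliding_window=4096,
)

model = MistralForCausalLM(config)  # randomly initialized, ready for pretraining
print(f"{model.num_parameters() / 1e9:.2f}B parameters")
```

A model around this size should fit on a single 80GB A100 with optimizer states, though mixed precision and gradient checkpointing may still be needed depending on batch size and sequence length.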