Proposal to Rename the Gemma-7B Model to Gemma-9B
I am reaching out to discuss a proposal regarding the Gemma-7B language model. Upon detailed analysis, it has come to our attention that the Gemma-7B model comprises a total of 8.54 billion parameters, which diverges from its nominal designation.
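For reference, here is a minimal sketch of how that figure can be reproduced with the transformers library. This is an illustration, not something run as part of this thread; it assumes the public google/gemma-7b checkpoint and a machine with enough memory to load it:

```python
from transformers import AutoModelForCausalLM

# Load the released checkpoint (assumed name: google/gemma-7b).
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")

# Sum the element counts of every parameter tensor in the model.
total = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total / 1e9:.2f}B")  # ~8.54B
```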
In the interest of clarity, and to maintain consistency with the industry convention for naming models (wherein the name reflects the rounded parameter count), I propose renaming the Gemma-7B model to Gemma-9B. This adjustment would not only align the model's name with its actual parameter count but also ensure parity with models from other families, such as Llama-7B, which follow the same rounding convention.
We believe that this change would enhance the transparency and honesty of model representations within the AI community. We would appreciate your consideration of this proposal and look forward to your feedback.
When naming LLMs in papers, you don't count the embedding parameters. You're not considering that Gemma uses tied word embeddings and a tokenizer with a ~250K vocab_size, which means the embedding alone accounts for something like ~1B parameters. Since we don't count those, the model is technically a 7B model.
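The tied-embedding claim is straightforward to check in transformers. A quick sketch, again assuming the google/gemma-7b checkpoint:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")

# The config records whether input and output embeddings are tied.
print(model.config.tie_word_embeddings)  # True

# With tied embeddings, the LM head and the input embedding share
# the same underlying tensor storage, so they are counted once.
emb = model.get_input_embeddings().weight
head = model.get_output_embeddings().weight
print(emb.data_ptr() == head.data_ptr())  # True: one shared tensor
```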
Technically, Gemma-7B is an 8.54-billion-parameter model. Even 7.75B (the 8.54B total minus the 0.79B embedding) rounds to 8B.
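The arithmetic behind those figures, written out as a small sketch (the vocab_size and hidden_size values come from Gemma-7B's published config, which this thread doesn't quote directly):

```python
# Gemma-7B config values (from the model's config.json, assumed here).
vocab_size = 256_000
hidden_size = 3072

embedding_params = vocab_size * hidden_size  # tied, so counted once
total_params = 8.54e9                        # total cited above

non_embedding = total_params - embedding_params
print(f"embeddings:    {embedding_params / 1e9:.2f}B")  # 0.79B
print(f"non-embedding: {non_embedding / 1e9:.2f}B")     # 7.75B
print(f"rounded:       {round(non_embedding / 1e9)}B")  # 8B
```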
Hi all -- posted this in another thread, adding it here as well. Appreciate the conversation and feedback; feel free to reach out with questions. Pasted answer below:
For clarity, many of those are embedding parameters, which we often do not count in the total parameter count for papers and releases. With respect to the emerging 7B class of open models, we've targeted the same use cases as other models in the 7B class from a hardware and software compatibility standpoint -- so it should be strictly transferable for many, if not all, 7B-class use cases.
Hi tris, as another commenter pointed out, even after ignoring embedding parameters, the model is an 8B model by the usual convention.
"so it should be strictly transferable for many, if not all, 7B-class use cases"
But it's still not a 7B model; rounded to the nearest integer, it has 8B parameters.