Regarding the model size
#1 by Prakh24s - opened
Thank you for the amazing paper and model weights.
The model seems to be about twice the size of a transformer-based model with the same parameter count (~5.9 GB for a 3B transformer vs. 11.1 GB for the Mamba model).
Is this expected?
It's a float32 model, hence the size difference. Transformers are usually float16 or bfloat16.
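A quick back-of-the-envelope check that the numbers line up (a minimal sketch; the ~2.8B parameter count for the Mamba checkpoint is my assumption, not stated in this thread):

```python
# Rough checkpoint size: parameter count times bytes per parameter.
def checkpoint_size_gb(num_params: float, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / 1e9  # decimal GB

print(checkpoint_size_gb(2.8e9, 4))  # float32 Mamba: ~11.2 GB, close to the reported 11.1 GB
print(checkpoint_size_gb(3.0e9, 2))  # float16 3B transformer: ~6.0 GB, close to the reported 5.9 GB
```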
Thank you for the answer!
Very excited for bigger/quantized models!
Prakh24s changed discussion status to closed
One more question: will a float16 model still outperform Transformers, as reported in the paper?
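For anyone who wants to try half precision themselves, here is a minimal sketch. It assumes the checkpoint loads through Hugging Face transformers and that the repo id "state-spaces/mamba-2.8b-hf" is the right one; neither is confirmed in this thread.

```python
# A sketch, assuming a transformers-compatible Mamba checkpoint;
# the repo id below is an assumption, not taken from this thread.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "state-spaces/mamba-2.8b-hf",
    torch_dtype=torch.float16,  # downcast float32 weights, roughly halving memory use
)
```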