Regarding the model size

#1
by Prakh24s - opened

Thank you for the amazing paper and model weights.

The model seems to be about twice the size of a transformer-based model with the same parameter count (~5.9 GB for a 3B transformer model vs. 11.1 GB for the Mamba model).
Is this expected?

State Space Models org

It's a float32 model, hence the size difference: float32 stores 4 bytes per parameter, while Transformer checkpoints are usually distributed in float16 or bfloat16 at 2 bytes per parameter.
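For context, here is a rough sketch of where the factor of two comes from, along with a hypothetical one-off cast of the weights to fp16. The ~2.8B parameter count and the checkpoint file names are assumptions for illustration, not taken from the repo:

```python
import torch

# Back-of-the-envelope checkpoint sizes (decimal GB).
# Assumption: the "3B-class" Mamba checkpoint has ~2.8B parameters.
n_params = 2.8e9
fp32_gb = n_params * 4 / 1e9   # float32: 4 bytes per parameter
fp16_gb = n_params * 2 / 1e9   # float16/bfloat16: 2 bytes per parameter
print(f"float32: ~{fp32_gb:.1f} GB")   # ~11.2 GB, matching the observed 11.1 GB
print(f"float16: ~{fp16_gb:.1f} GB")   # ~5.6 GB, in line with a 3B fp16 transformer

# Hypothetical one-off conversion of the released fp32 weights to fp16.
# "pytorch_model.bin" is a placeholder for the downloaded checkpoint file.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
state_dict = {k: (v.half() if v.is_floating_point() else v)
              for k, v in state_dict.items()}
torch.save(state_dict, "pytorch_model.fp16.bin")
```

Casting after download halves the file on disk and memory at load time; whether it changes quality is a separate question (see below).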

Thank you for the answer!
Very excited for bigger/quantized models!

Prakh24s changed discussion status to closed

One more question: will a float16 model still outperform Transformers, as claimed in the paper?
