Discrepancy in model sizes

#2
by varun4 - opened

Hello team! The size of the unquantized ONNX model is 133 MB, whereas the PyTorch model is only 66.8 MB. This is uncommon: for example, all-MiniLM-L6-v2's unquantized ONNX model is 90 MB, roughly the same size as its PyTorch model.

While this isn't a problem in itself, I wanted to raise it for further investigation.

Edit: I found that Xenova has also uploaded his own version of this model, here, and it has the same issue.

Supabase org

@varun4 I was confused by this at first too. The PyTorch model for gte-small is 16-bit, whereas many other models are 32-bit. The non-quantized ONNX models are always 32-bit, and the quantized ones are 8-bit. That is why the non-quantized ONNX model is double the size of the PyTorch model, and the quantized ONNX model is half the size of the PyTorch model.
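The arithmetic above can be sanity-checked as a back-of-the-envelope calculation: file size is roughly parameter count × bytes per weight. A minimal sketch, assuming gte-small has about 33.4 million parameters (inferred from the sizes reported in this thread, not taken from the model card):

```python
# Rough model-file-size estimate: parameter count * bytes per weight.
# The parameter count is an assumption inferred from the 66.8 MB fp16 file.
num_params = 33_400_000

bytes_per_weight = {
    "fp16 (PyTorch)": 2,          # 16-bit floats
    "fp32 (unquantized ONNX)": 4, # 32-bit floats
    "int8 (quantized ONNX)": 1,   # 8-bit integers
}

for fmt, nbytes in bytes_per_weight.items():
    size_mb = num_params * nbytes / 1e6
    print(f"{fmt}: ~{size_mb:.1f} MB")
```

This reproduces the observed pattern: the fp32 ONNX file lands near 133 MB (double the fp16 PyTorch file), and the int8 file near half of it.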

That makes sense, thank you!

ggrn changed discussion status to closed
