Transformers fix to mixed precision at long context lengths

#16
by nbroad HF staff - opened

Hi there,

A recent fix in Transformers improved the perplexity of models like Mistral at long context lengths. Here is a figure showing the before-and-after.

I'm wondering whether this would affect the figure on your model card. As far as I know, the issue only occurs in fp16.
