Transformers fix to mixed precision at long context lengths
#16, opened by nbroad (HF staff)
Hi there,
A recent fix in Transformers improved the perplexity of models like Mistral at long context lengths. Here is a figure showing the before and after.
I'm wondering whether this would affect the figure on your model card. The problem only shows up in fp16, afaik.