Weights are in FP16 (loaded in FP32) but paper mentions BF16
#17 opened by AdrienC
The paper mentions that training was done in bf16 (as one would expect for a Mistral model), yet the safetensors files store the weights in float16 and the config.json loads them in float32. Saving BF16 weights as FP16 could lead to overflows, since BF16 covers a much wider dynamic range than FP16.
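For context, a minimal illustration of the overflow concern (plain PyTorch, no model weights involved):

```python
import torch

# BF16 can represent magnitudes far beyond the FP16 maximum (~65504),
# so a plain BF16 -> FP16 cast of a large value overflows to inf.
x_bf16 = torch.tensor(1e10, dtype=torch.bfloat16)
x_fp16 = x_bf16.to(torch.float16)
print(x_bf16.item())  # ~1e10, representable in BF16
print(x_fp16.item())  # inf, overflows the FP16 range
```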
Could you give us more details on how to load and potentially fine-tune this model without running into issues? A rough sketch of how I load it at the moment is below.
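This is roughly what I do today (the repo id is a placeholder, not the actual model path); should we override the dtype like this, or keep the float32 default from config.json for fine-tuning?

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder, substitute the real repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # override the float32 default in config.json
)
```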