Add TF weights
Model converted by the transformers' pt_to_tf CLI.
All converted model outputs and hidden layers were validated against their PyTorch counterparts. Maximum crossload output difference=7.744e-04; maximum converted output difference=7.744e-04.
cc @patrickvonplaten [HF maintainer(s) for this repo]
Related PR: https://github.com/huggingface/transformers/pull/17554
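For anyone wanting to reproduce the comparison locally: the weights come from the pt_to_tf CLI (invoked roughly as `transformers-cli pt-to-tf --model-name <repo_id>`), and the reported numbers can be sanity-checked with a minimal sketch like the one below. It is not the CLI's exact internals; it assumes a vision model that takes `pixel_values` and returns `last_hidden_state`, and the repo id and input shape are placeholders to adjust for the actual architecture.

```python
# Minimal sketch (not the pt_to_tf internals) of the comparison behind the
# reported numbers. Assumptions: the model takes `pixel_values` and returns
# `last_hidden_state`; "<repo_id>" and the input shape are placeholders.
import numpy as np
import torch
from transformers import AutoModel, TFAutoModel

repo_id = "<repo_id>"  # placeholder for this repository's id

pt_model = AutoModel.from_pretrained(repo_id)
tf_crossloaded = TFAutoModel.from_pretrained(repo_id, from_pt=True)  # PT weights loaded into the TF class
tf_converted = TFAutoModel.from_pretrained(repo_id)                  # the TF weights added by this PR

pt_model.eval()
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # adjust to the model's expected input shape

with torch.no_grad():
    pt_out = pt_model(pixel_values=torch.from_numpy(dummy)).last_hidden_state.numpy()
crossload_out = tf_crossloaded(pixel_values=dummy).last_hidden_state.numpy()
converted_out = tf_converted(pixel_values=dummy).last_hidden_state.numpy()

print("max crossload output difference:", np.abs(pt_out - crossload_out).max())
print("max converted output difference:", np.abs(pt_out - converted_out).max())
```

The same idea extends to the hidden states (pass output_hidden_states=True), which is where the slightly larger mismatches discussed below show up.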
The error on the internal hidden layers was slightly above the desired level (<1e-5), but the output layers were fine. cc @sayakpaul @nielsr
Those were probably caused by num_batches_tracked, which is used in PyTorch's BatchNorm layers. There's relevant information here: https://github.com/huggingface/transformers/pull/17554.
Cc: @amyeroberts
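For context on why num_batches_tracked can show up in a PT/TF comparison at all: it is a buffer that PyTorch's BatchNorm layers keep alongside the running statistics, and Keras' BatchNormalization has no equivalent variable. A minimal PyTorch sketch (not tied to this model):

```python
# Minimal sketch: PyTorch BatchNorm layers carry a `num_batches_tracked`
# buffer next to running_mean/running_var; Keras BatchNormalization has no
# equivalent variable, so it cannot be cross-loaded one-to-one.
import torch
from torch import nn

bn = nn.BatchNorm2d(8)
print([name for name, _ in bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']

bn(torch.randn(4, 8, 16, 16))   # one forward pass in training mode
print(bn.num_batches_tracked)   # tensor(1) -- the counter is incremented per batch
```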
That's probably not the case -- we have many models where these differences in the internal layers exist, but the output layers have the correct values. We haven't figured out why, but it seems to be no cause for alarm. Models where required weights are not being loaded have very large errors everywhere.
Nevertheless, I reported it above in case we need to revisit the models with this mismatch :)
Thanks for adding!
I'm not sure that accounts for the differences here. As mentioned in the PR https://github.com/huggingface/transformers/pull/17554#issuecomment-1149672281, num_batches_tracked only matters if momentum isn't set, and my understanding is that it was set for all batch norm layers.
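For reference, the behaviour referred to there can be seen directly in PyTorch (a minimal sketch, not tied to this model): with the default momentum the running statistics are an exponential moving average and the counter is effectively unused, whereas with momentum=None PyTorch switches to a cumulative average weighted by 1 / num_batches_tracked.

```python
# Minimal sketch of when `num_batches_tracked` actually matters:
# - momentum=0.1 (default): running stats are an exponential moving average,
#   and the counter does not influence the result.
# - momentum=None: running stats are a cumulative moving average whose update
#   weight is 1 / num_batches_tracked.
import torch
from torch import nn

torch.manual_seed(0)
x1 = torch.randn(32, 4, 8, 8)
x2 = torch.randn(32, 4, 8, 8)

bn_default = nn.BatchNorm2d(4)                    # EMA with momentum=0.1
bn_cumulative = nn.BatchNorm2d(4, momentum=None)  # cumulative average via the counter

for bn in (bn_default, bn_cumulative):
    bn(x1)
    bn(x2)

print(bn_default.running_mean)     # EMA of the two batch means
print(bn_cumulative.running_mean)  # plain mean of the two (equal-sized) batch means
```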