Add TF weights
Model converted by the transformers' pt_to_tf CLI.
All converted model outputs and hidden layers were validated against their PyTorch counterparts. Maximum crossload output difference=7.744e-04; maximum converted output difference=7.744e-04.
cc @patrickvonplaten [HF maintainer(s) for this repo]
Related PR: https://github.com/huggingface/transformers/pull/17554
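For anyone wanting to reproduce the comparison locally: the weights come from the pt_to_tf CLI (invoked roughly as `transformers-cli pt-to-tf --model-name <repo_id>`), and the reported numbers can be sanity-checked with a minimal sketch like the one below. It is not the CLI's exact internals; it assumes a vision model that takes `pixel_values` and returns `last_hidden_state`, and the repo id and input shape are placeholders to adjust for the actual architecture.

```python
# Minimal sketch (not the pt_to_tf internals) of the comparison behind the
# reported numbers. Assumptions: the model takes `pixel_values` and returns
# `last_hidden_state`; "<repo_id>" and the input shape are placeholders.
import numpy as np
import torch
from transformers import AutoModel, TFAutoModel

repo_id = "<repo_id>"  # placeholder for this repository's id

pt_model = AutoModel.from_pretrained(repo_id)
tf_crossloaded = TFAutoModel.from_pretrained(repo_id, from_pt=True)  # PT weights loaded into the TF class
tf_converted = TFAutoModel.from_pretrained(repo_id)                  # the TF weights added by this PR

pt_model.eval()
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # adjust to the model's expected input shape

with torch.no_grad():
    pt_out = pt_model(pixel_values=torch.from_numpy(dummy)).last_hidden_state.numpy()
crossload_out = tf_crossloaded(pixel_values=dummy).last_hidden_state.numpy()
converted_out = tf_converted(pixel_values=dummy).last_hidden_state.numpy()

print("max crossload output difference:", np.abs(pt_out - crossload_out).max())
print("max converted output difference:", np.abs(pt_out - converted_out).max())
```

The same idea extends to the hidden states (pass output_hidden_states=True), which is where the slightly larger mismatches discussed below show up.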
The error on the internal hidden layers was slightly above the desired level (<1e-5), but the output layers were fine. cc @sayakpaul @nielsr
Those were probably caused by num_batches_tracked, which is used in PyTorch's BatchNorm layers. There's relevant information here: https://github.com/huggingface/transformers/pull/17554.
Cc: @amyeroberts
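For context on why num_batches_tracked can show up in a PT/TF comparison at all: it is a buffer that PyTorch's BatchNorm layers keep alongside the running statistics, and Keras' BatchNormalization has no equivalent variable. A minimal PyTorch sketch (not tied to this model):

```python
# Minimal sketch: PyTorch BatchNorm layers carry a `num_batches_tracked`
# buffer next to running_mean/running_var; Keras BatchNormalization has no
# equivalent variable, so it cannot be cross-loaded one-to-one.
import torch
from torch import nn

bn = nn.BatchNorm2d(8)
print([name for name, _ in bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']

bn(torch.randn(4, 8, 16, 16))   # one forward pass in training mode
print(bn.num_batches_tracked)   # tensor(1) -- the counter is incremented per batch
```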
That's probably not the case -- we have many models where these differences in the internal layers exist, but the output layers have the correct values. We haven't figured out why, but it seems to be no cause for alarm. Models where required weights are not being loaded have very large errors everywhere.
Nevertheless, I reported it above in case we need to revisit the models with this mismatch :)
Thanks for adding!
I'm not sure that accounts for the differences here. As mentioned in the PR https://github.com/huggingface/transformers/pull/17554#issuecomment-1149672281, num_batches_tracked only matters if momentum isn't set, and my understanding is that it was set for all batch norm layers.
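For reference, the behaviour referred to there can be seen directly in PyTorch (a minimal sketch, not tied to this model): with the default momentum the running statistics are an exponential moving average and the counter is effectively unused, whereas with momentum=None PyTorch switches to a cumulative average weighted by 1 / num_batches_tracked.

```python
# Minimal sketch of when `num_batches_tracked` actually matters:
# - momentum=0.1 (default): running stats are an exponential moving average,
#   and the counter does not influence the result.
# - momentum=None: running stats are a cumulative moving average whose update
#   weight is 1 / num_batches_tracked.
import torch
from torch import nn

torch.manual_seed(0)
x1 = torch.randn(32, 4, 8, 8)
x2 = torch.randn(32, 4, 8, 8)

bn_default = nn.BatchNorm2d(4)                    # EMA with momentum=0.1
bn_cumulative = nn.BatchNorm2d(4, momentum=None)  # cumulative average via the counter

for bn in (bn_default, bn_cumulative):
    bn(x1)
    bn(x2)

print(bn_default.running_mean)     # EMA of the two batch means
print(bn_cumulative.running_mean)  # plain mean of the two (equal-sized) batch means
```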