Add TF weights
Model converted by the `transformers`' `pt_to_tf` CLI. All converted model outputs and hidden layers were validated against its PyTorch counterpart.
Maximum crossload output difference=1.907e-05; Maximum crossload hidden layer difference=4.541e-02;
Maximum conversion output difference=1.907e-05; Maximum conversion hidden layer difference=4.541e-02;
CAUTION: The maximum admissible error was manually increased to 0.1!
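For anyone who wants to sanity-check numbers like the ones above locally, something along these lines should reproduce the comparison. This is only a sketch: the checkpoint name is an assumed example, and the exact preprocessing for this model may differ.

```python
import numpy as np
import requests
import torch
from PIL import Image
from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
    TFAutoModelForImageClassification,
)

# Assumed checkpoint name, used purely for illustration.
checkpoint = "snap-research/efficientformer-l1-300"

image = Image.open(
    requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
)
processor = AutoImageProcessor.from_pretrained(checkpoint)

# Load the PyTorch model and a TF model crossloaded from the PyTorch weights.
pt_model = AutoModelForImageClassification.from_pretrained(checkpoint)
tf_model = TFAutoModelForImageClassification.from_pretrained(checkpoint, from_pt=True)

pt_inputs = processor(images=image, return_tensors="pt")
tf_inputs = processor(images=image, return_tensors="tf")

with torch.no_grad():
    pt_out = pt_model(**pt_inputs, output_hidden_states=True)
tf_out = tf_model(**tf_inputs, output_hidden_states=True)

# Maximum absolute differences: output logits tend to stay tiny,
# while hidden-state differences accumulate to a larger value.
logits_diff = np.max(np.abs(pt_out.logits.numpy() - tf_out.logits.numpy()))
hidden_diff = max(
    np.max(np.abs(pt_h.numpy() - tf_h.numpy()))
    for pt_h, tf_h in zip(pt_out.hidden_states, tf_out.hidden_states)
)
print(f"max output diff: {logits_diff:.3e}, max hidden-state diff: {hidden_diff:.3e}")

# Despite the hidden-state drift, the predicted class should match.
assert int(np.argmax(pt_out.logits.numpy())) == int(np.argmax(tf_out.logits.numpy()))
```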
@Rocketknight1 This is the smaller l1 variant; for l3 there is somewhat larger variation in the hidden layer values than when running the tests or doing inference on an image. The predictions match, though.
@D-Roberts This is something we often observe with PT->TF conversions. I did a deep dive once or twice, and the cause consistently seems to be the accumulation of small numerical variations: the frameworks use different kernels, and TF can reorder or fuse operations during compilation. In general, the actual outputs/predictions seem reasonably robust to this, so it's not a major concern (although it does sometimes mask *actual* bugs/implementation differences).
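To make the mechanism concrete, here is a tiny self-contained illustration (not tied to any particular model): the same float32 reduction computed in two different orders, as different kernels or fused ops might do, already disagrees slightly, and such small differences can compound layer by layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000, dtype=np.float32)

# Sum the same values in two different orders, mimicking how different
# kernels or fused/reordered ops may accumulate in a different sequence.
forward = np.float32(0)
for v in x:
    forward += v
backward = np.float32(0)
for v in x[::-1]:
    backward += v

# Small but typically nonzero in float32; repeated across many layers,
# such discrepancies can grow into noticeably larger hidden-state gaps,
# while the final argmax usually stays the same.
print(abs(forward - backward))
```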
Thank you @alanspike!