Inconsistent outputs when loading the model multiple times

#3
by isaacm - opened

I've noticed an inconsistency in the outputs when loading the model during different sessions, even when the model is loaded in the same manner:

image.png

With the code above, loading the config and model in a loop and running the model on the same audio segment each time produces the following:

image.png

This behavior may be related to the following warning that prints when loading the models:

Some weights of the model checkpoint at microsoft/wavlm-large were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-large and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

I'm using transformers 4.42.4

Is this reproducible on your end? And do you have any idea what I can do to solve this issue? Thanks in advance, you were already very helpful on my other questions.

Owner

Using 'AutoConfig' followed by 'AutoModelForAudioClassification.from_config' reproduces this issue for me as well.
Loading the model with the code suggested on the model card fixes it for me. I have not looked deeply into the transformers package API, but please use the provided code; hopefully that resolves the issue.

using:
model = AutoModelForAudioClassification.from_pretrained("3loi/SER-Odyssey-Baseline-WavLM-Arousal", trust_remote_code=True)

instead of:
config = AutoConfig.from_pretrained("3loi/SER-Odyssey-Baseline-WavLM-Arousal", trust_remote_code=True)
model = AutoModelForAudioClassification.from_config(config, trust_remote_code=True)
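The difference between the two snippets is consistent with how transformers documents these methods: from_config builds the architecture from the configuration without loading any checkpoint weights, so each call starts from a fresh random initialization, while from_pretrained loads the saved weights. Below is a minimal offline sketch of that behavior. The tiny WavLMConfig values, the random "audio" tensor, and the logits helper are illustrative assumptions, not the configuration or data used in this thread.

```python
import tempfile

import torch
from transformers import AutoModelForAudioClassification, WavLMConfig

# Hypothetical tiny configuration so the sketch runs offline in seconds.
# It is NOT the configuration of the checkpoint discussed in this thread.
config = WavLMConfig(
    hidden_size=16,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=20,
    conv_dim=(32, 32, 32),
    conv_stride=(4, 4, 4),
    conv_kernel=(8, 8, 8),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=2,
)

# Random stand-in for one audio segment (batch of 1, 4000 samples).
audio = torch.randn(1, 4000)


def logits(model):
    """Run the same audio through a model in eval mode, without gradients."""
    model.eval()
    with torch.no_grad():
        return model(audio).logits


# from_config: architecture only, weights are randomly initialized on each call.
a = logits(AutoModelForAudioClassification.from_config(config))
b = logits(AutoModelForAudioClassification.from_config(config))
from_config_matches = torch.allclose(a, b)

# from_pretrained: weights are read from a saved checkpoint, so repeated loads agree.
with tempfile.TemporaryDirectory() as ckpt_dir:
    AutoModelForAudioClassification.from_config(config).save_pretrained(ckpt_dir)
    c = logits(AutoModelForAudioClassification.from_pretrained(ckpt_dir))
    d = logits(AutoModelForAudioClassification.from_pretrained(ckpt_dir))
from_pretrained_matches = torch.allclose(c, d)

print("from_config outputs match:", from_config_matches)
print("from_pretrained outputs match:", from_pretrained_matches)
```

If the two from_config outputs differ while the two from_pretrained outputs agree, the run-to-run inconsistency comes from random initialization rather than from the audio input or the inference code.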

Changing that line of code to your suggestion worked. Thanks again!
