Custom Classifier
Hello @sanchit-gandhi,
Thanks for sharing this audio classification model!
Could you please suggest how I can use my own audio classifier? Is my understanding correct that AutoModelForAudioClassification actually dispatches to WhisperForAudioClassification from https://github.com/huggingface/transformers/blob/v4.29.1/src/transformers/models/whisper/modeling_whisper.py#L1664 for Whisper checkpoints?
If so, can I define my own custom audio classifier the same way WhisperForAudioClassification is defined, with suitable modifications, and use it directly without going through AutoModelForAudioClassification?
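For concreteness, here is roughly what I have in mind, a minimal sketch assuming the WhisperEncoder and WhisperPreTrainedModel classes from modeling_whisper.py (the class name and the mean-pooling head are hypothetical, my own choices):

```python
import torch
import torch.nn as nn
from transformers import WhisperConfig
from transformers.models.whisper.modeling_whisper import (
    WhisperEncoder,
    WhisperPreTrainedModel,
)

class MyWhisperClassifier(WhisperPreTrainedModel):
    """Hypothetical custom classifier: Whisper encoder + my own head."""

    def __init__(self, config: WhisperConfig):
        super().__init__(config)
        self.encoder = WhisperEncoder(config)
        # My own head: mean-pool the encoder states, then project to logits.
        self.classifier = nn.Linear(config.d_model, config.num_labels)
        self.post_init()

    def forward(self, input_features: torch.Tensor) -> torch.Tensor:
        hidden_states = self.encoder(input_features).last_hidden_state
        pooled = hidden_states.mean(dim=1)  # (batch, d_model)
        return self.classifier(pooled)
```

Would instantiating such a class directly (e.g. MyWhisperClassifier.from_pretrained(...)) be the recommended way, or is there a reason to stick with AutoModelForAudioClassification?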
I am very much looking forward to hearing from you soon :)
Regards,
K
Hello @sanchit-gandhi,
Thanks very much for your suggestion! It worked for me :)
I saw that in your code for WhisperForAudioClassification (line #1724) you defined freeze_encoder(), but in your run_audio_classification.py script for this model (line #357) you used model.freeze_feature_encoder() instead. Are these two methods the same? If not, why did you use model.freeze_feature_encoder()?
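For reference, my understanding is that freeze_encoder() simply disables gradients on the whole Whisper encoder, roughly like the following sketch (my own approximation, not the exact library code):

```python
from transformers import WhisperForAudioClassification

def freeze_whisper_encoder(model: WhisperForAudioClassification) -> None:
    # Disable gradient updates for every parameter of the Whisper
    # encoder, so that only the classification head on top is trained.
    for param in model.encoder.parameters():
        param.requires_grad = False
```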
In your blog post "Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers", you wrote:

# compute log-Mel input features from input audio array
batch["input_features"] = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]

That is, you took the first element of the input_features list returned by the feature extractor. In the run_audio_classification.py script for this model (line #303), however, you kept input_features itself, not its first element (i.e. output_batch = {model_input_name: inputs.get(model_input_name)}). What is the reason for this difference?
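To illustrate what I mean, here is a toy example of the shapes I would expect (just a sketch; the whisper-small checkpoint and the dummy audio are my own choices for illustration):

```python
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
one_second = np.zeros(16_000, dtype=np.float32)  # dummy 1 s of silence at 16 kHz

# Single example: the extractor still returns a batch axis, so
# .input_features[0] picks out the lone example's log-Mel array.
single = feature_extractor(one_second, sampling_rate=16_000)
print(np.asarray(single.input_features).shape)  # (1, 80, 3000)

# Batched call (as in the script's preprocessing function): the whole
# batch is kept, one entry per example.
batched = feature_extractor([one_second, one_second], sampling_rate=16_000)
print(np.asarray(batched.input_features).shape)  # (2, 80, 3000)
```

So my guess is that the blog indexes with [0] because it maps over one example at a time, while the script processes a whole batch at once. Is that right?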
Regards,
K