Spaces: somosnlp-hackathon-2022/Audio-Sentiment-Classifier

DrishtiSharma committed on Apr 4, 2022

Commit e8cc443 (1 parent: d770449)

Update info.txt

Files changed (1)

info.txt +1 -1

@@ -2,7 +2,7 @@ Gradio demo for sentiment classification of Spanish audios using Wav2Vec2
 🔊 Audio Sentiment Classifier
 This is a Gradio demo for classifying the  sentiment of the speech/audio using Wav2Vec2-base fine-tuned on Mexican Emotional Speech Database (MESD) dataset. The MESD dataset contains single-word utterances for the emotive prosodies of anger, disgust, fear, happiness, neutrality, and sadness with Mexican culture shaping. In addition, the utterances in MESD dataset have been contributed by both adult and child non-professional actors: 3 female, 2 male, and 6 child voices are available.
 Targeted SGDs: [Goal 3] Good health and well-being & [Goal 16] Peace, Justice and Strong Institutions
-Potential Applications: [Goal 3- SDG] Presenting, recommending and categorizing the audio libraries or other media based on detected mood/preferences via user's speech or user's aural environment. A mood lighting system, in addition to the aforementioned features, can be implemented to make user's environment a bit more user-friendly, and so contribute a little in maintaining the user's mental health and overall well-being.   [Goal 16 -SDG] Additionally, the model can be trained on data with more class labels in order to be useful particularly in detecting brawls, and any other uneventful scenarios. That trained model (audio classifier) can be integrated in a surveillance system to detect brawls and other unsettling events that can be detected using "sound" and subsequently utilized by Peace, Justice and strong Institutions.
+Potential Applications: [Goal 3- SDG] Presenting, recommending and categorizing the audio libraries or other media based on detected mood/preferences via user's speech or user's aural environment. A mood lighting system, in addition to the aforementioned features, can be implemented to make user's environment a bit more user-friendly, and so contribute a little in maintaining the user's mental health and overall well-being.   [Goal 16 -SDG] Additionally, the model can be trained on data with more class labels in order to be useful particularly in detecting brawls, and any other uneventful scenarios. Furthermore, the trained model (audio classifier) can be integrated in a surveillance system to detect brawls and other unsettling events that can be detected using "sound" and subsequently utilized by Peace, Justice and strong Institutions.
 To begin with, we didn't have enough Spanish audio data suitable for sentiment classification task. We had to make do with whatever data we could find in the MESD database because much of the material we came across was not open-source. Furthermore, in the MESD dataset, augmented versions of audios pre-existed, accounting for up to 25% of the total data, thus we decided not to undertake any more data augmentation to avoid overwhelming the original audio samples.
 The open-source MESD dataset was used to fine-tune the Wav2Vec2 base model, which contains ~1200 audio recordings, all of which were recorded in professional studios and were only one second long. Out of ~1200 audio recordings only ~890 of the recordings were utilized for training. Due to these factors, the model and hence this Gradio application may not be able to perform well in noisy environments or audio with background music or noise. It's also worth mentioning that this model performs poorly when it comes to audio recordings from the class "Fear," which the model often misclassifies.
 The aforementioned prototype may not function well in noisy environments or audio with a musical/noisy background due to the model(s) being trained on too little data (due to sparse availability). In order to make our model robust, our future work includes: