metadata
datasets:
  - narad/ravdess
language:
  - en
metrics:
  - f1
  - accuracy
  - recall
  - precision
pipeline_tag: audio-classification

Emotion Recognition in English Using RAVDESS and Wav2Vec 2.0

This model recognises emotions in audio recordings. It was trained on RAVDESS, a dataset of English audio recordings, and classifies each recording as one of six emotions: anger, disgust, fear, happiness, sadness or surprise.
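
RAVDESS encodes each clip's metadata in seven hyphen-separated fields of its file name, and the third field is the emotion code (01 neutral, 02 calm, 03 happy, 04 sad, 05 angry, 06 fearful, 07 disgust, 08 surprised). The card does not describe how labels were extracted. The sketch below assumes the six classes were obtained by mapping these codes and dropping neutral and calm. `EMOTION_CODES` and `ravdess_label` are hypothetical names used for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping from RAVDESS emotion codes (third file-name field) to the
# six classes named in this card; "01" (neutral) and "02" (calm) are dropped.
EMOTION_CODES = {
    "03": "happiness",
    "04": "sadness",
    "05": "anger",
    "06": "fear",
    "07": "disgust",
    "08": "surprise",
}


def ravdess_label(filename: str) -> Optional[str]:
    """Return the emotion label for a RAVDESS file, or None if its class is excluded."""
    fields = Path(filename).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {filename!r}")
    return EMOTION_CODES.get(fields[2])


print(ravdess_label("03-01-05-01-01-01-12.wav"))  # anger
print(ravdess_label("Actor_12/03-01-02-01-01-01-12.wav"))  # None (calm is excluded)
```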

The model replicates the approach of this Greek emotion recognition model, using a pre-trained Wav2Vec2 model to process the audio.
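
The card does not describe the classification head placed on top of Wav2Vec2. A common design, and roughly what `transformers`' `Wav2Vec2ForSequenceClassification` does after its projection layer, is to average the frame-level features over time and feed the resulting utterance vector to a linear layer followed by a softmax. The pure-Python sketch below shows only that pooling-and-classify step. The toy frames and weights are illustrative values, not real Wav2Vec2 features or this model's parameters.

```python
import math
from typing import Dict, List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level feature vectors over time into one utterance vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Compute one logit per class: weights @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_utterance(frames, weights, bias, labels) -> Dict[str, float]:
    """Pool the frames, apply the linear head and return a probability per label."""
    return dict(zip(labels, softmax(linear(mean_pool(frames), weights, bias))))


# Toy example: three 2-dimensional "frames" and a two-class head.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
probs = classify_utterance(frames, weights=[[1.0, 0.0], [0.0, 2.0]], bias=[0.0, 0.0],
                           labels=["anger", "sadness"])
print(max(probs, key=probs.get))  # sadness
```

In a real model, each frame would be a Wav2Vec2 hidden state and the head would have one output per emotion.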

Model Details

Model Description

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
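
The following is a minimal, unofficial sketch of how a fine-tuned Wav2Vec2 audio-classification checkpoint like this one can typically be run with the Hugging Face `transformers` `pipeline`. Two assumptions apply: `MODEL_ID` is a hypothetical placeholder for this model's repository id, and the label strings the model returns are not documented in this card. The `transformers` import is deferred so that the helper can be used without it.

```python
from typing import Dict, List

# Hypothetical placeholder: replace with this model's actual Hub repository id.
MODEL_ID = "<this-model-repo-id>"


def predict_emotions(audio_path: str, model_id: str = MODEL_ID) -> List[Dict]:
    """Run the transformers audio-classification pipeline on one audio file."""
    from transformers import pipeline  # requires `pip install transformers`
    classifier = pipeline("audio-classification", model=model_id)
    # Returns a list of {"label": ..., "score": ...} dicts, highest score first.
    return classifier(audio_path)


def top_emotion(predictions: List[Dict]) -> str:
    """Return the label with the highest score from pipeline-style output."""
    return max(predictions, key=lambda p: p["score"])["label"]
```

Usage: `top_emotion(predict_emotions("clip.wav"))`. Decoding a file path this way needs `ffmpeg` to be installed. When processing many files, create the pipeline once and reuse it rather than rebuilding it on every call.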

Training Details

Training Data

The RAVDESS dataset was split into training, validation and test sets in a 60/20/20 ratio.
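
The card does not say whether the split was purely random, stratified by emotion, or speaker-independent (grouped by actor). The sketch below shows only a plain seeded random 60/20/20 split. Stratifying by label or grouping by actor are common alternatives. The file names in the example are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_60_20_20(items: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle deterministically and split into 60% train, 20% validation, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding surprises
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test


files = [f"clip_{i:03d}.wav" for i in range(100)]  # hypothetical file names
train_set, val_set, test_set = split_60_20_20(files, seed=42)
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```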

Training Procedure

Fine-tuning explored the following values for four hyper-parameters:

  • the batch size (4, 8),
  • the number of gradient accumulation steps (GAS) (2, 4, 6, 8),
  • the number of epochs (10, 20) and
  • the learning rate (1e-3, 1e-4, 1e-5).

Each experiment was repeated 10 times.
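
If every combination of the values above was tried (the card does not say whether the search was exhaustive), the search space contains 2 × 4 × 2 × 3 = 48 configurations, or 480 runs with 10 repetitions each. The sketch below enumerates that grid. The keys are named after the corresponding Hugging Face `TrainingArguments` fields, on the assumption that the batch values (4, 8) refer to the per-device batch size. The card does not say which training framework was used.

```python
from itertools import product

# Search space from the list above (keys follow TrainingArguments naming, an assumption).
SEARCH_SPACE = {
    "per_device_train_batch_size": [4, 8],
    "gradient_accumulation_steps": [2, 4, 6, 8],
    "num_train_epochs": [10, 20],
    "learning_rate": [1e-3, 1e-4, 1e-5],
}
N_REPEATS = 10  # each configuration was run 10 times


def grid(space):
    """Yield every combination of hyper-parameter values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(grid(SEARCH_SPACE))
print(len(configs), len(configs) * N_REPEATS)  # 48 480
```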

Evaluation

The best-performing hyper-parameter set was a batch size of 4, 4 GAS, 10 epochs and a learning rate of 1e-4.
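
With gradient accumulation, gradients from several batches are summed before each optimizer update. The number of examples per update is therefore the batch size times the GAS value (times the number of devices). Assuming the best configuration's batch value of 4 is a per-device batch size on a single device, that gives 4 × 4 = 16 examples per update. The sketch below computes this. The step count is a simplification: real trainers may round partial batches differently, and the training-set size in the example is hypothetical.

```python
import math


def effective_batch_size(per_device_batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device_batch_size * grad_accum_steps * num_devices


def approx_optimizer_steps(n_train: int, per_device_batch_size: int, grad_accum_steps: int,
                           num_epochs: int, num_devices: int = 1) -> int:
    """Rough count of optimizer updates over the whole run."""
    per_update = effective_batch_size(per_device_batch_size, grad_accum_steps, num_devices)
    return math.ceil(n_train / per_update) * num_epochs


print(effective_batch_size(4, 4))  # 16
# Hypothetical training-set size of 1,000 examples, for illustration only:
print(approx_optimizer_steps(1000, 4, 4, 10))  # 630
```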

Testing

The model was then retrained on the combined training and validation sets using the best hyper-parameter set. Over the 10 runs, it achieved an average test-set accuracy and F1 score of 84.84% (SD 2 and 2.08, respectively).
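
The reported figures are a mean and standard deviation over repeated runs, and the single best run is kept for the results below. A minimal sketch of that bookkeeping using Python's standard `statistics` module follows. The scores are made-up values, not this card's data. The card also does not state whether its SD is the sample or the population version, and the sketch uses the sample version.

```python
import statistics
from typing import Sequence, Tuple


def summarise_runs(scores: Sequence[float]) -> Tuple[float, float, int]:
    """Return (mean, standard deviation, index of the best run) for repeated runs."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator); pstdev would give population SD
    best_run = max(range(len(scores)), key=lambda i: scores[i])
    return mean, sd, best_run


# Hypothetical per-run test accuracies (%), for illustration only.
example_scores = [80.0, 82.0, 84.0, 86.0, 88.0]
mean, sd, best = summarise_runs(example_scores)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, best run = {best}")  # mean = 84.00, SD = 3.16, best run = 4
```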

Results

We retained the best-performing model of the 10 runs. Its test-set results (in %) are shown below.

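| Emotion | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Anger | | 96.55 | 87.50 | |
| Disgust | | 90.91 | 93.75 | |
| Fear | | 96.30 | 81.25 | |
| Happiness | | 93.10 | 84.38 | |
| Sadness | | 81.58 | 96.88 | |
| Surprise | | 77.78 | 87.50 | |
| Total | 88.54 | 89.37 | 88.54 | 88.62 |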
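
Read as precision and recall, the two per-emotion figures reproduce the Total row as unweighted (macro) averages. Per-class F1 is the harmonic mean of precision and recall. Accuracy coinciding with macro recall is consistent with each emotion having the same number of test examples. The sketch below performs this cross-check using only the numbers in the table. Per-class F1 recomputed from rounded percentages can differ from count-based values in the last decimal.

```python
# Per-class (precision, recall) in %, copied from the results table.
PER_CLASS = {
    "anger": (96.55, 87.50),
    "disgust": (90.91, 93.75),
    "fear": (96.30, 81.25),
    "happiness": (93.10, 84.38),
    "sadness": (81.58, 96.88),
    "surprise": (77.78, 87.50),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


n = len(PER_CLASS)
macro_precision = sum(p for p, _ in PER_CLASS.values()) / n
macro_recall = sum(r for _, r in PER_CLASS.values()) / n
macro_f1 = sum(f1_score(p, r) for p, r in PER_CLASS.values()) / n

for emotion, (p, r) in PER_CLASS.items():
    print(f"{emotion:<10} F1 = {f1_score(p, r):.2f}")
print(f"macro P = {macro_precision:.2f}, R = {macro_recall:.2f}, F1 = {macro_f1:.2f}")
# last line printed: macro P = 89.37, R = 88.54, F1 = 88.62
```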