firdhokk/speech-emotion-recognition-with-openai-whisper-large-v3


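This model is a fine-tuned version of openai/whisper-large-v3 for speech emotion recognition (audio classification), trained on the RAVDESS, SAVEE, TESS, and URDU datasets. It achieves the following results on the evaluation set, which match the checkpoint at step 3153 (epoch ≈ 8) in the training results below: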

Loss: 0.5008
Accuracy: 0.9199
Precision: 0.9230
Recall: 0.9199
F1: 0.9198
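The card does not include usage code. A minimal inference sketch, assuming the checkpoint loads with the Transformers `audio-classification` pipeline: the emotion label names come from the model's own config, so none are assumed here, and `load_classifier`, `top_emotion` and `classify` are hypothetical helper names. Whisper's feature extractor works on 16 kHz mono audio.

```python
import numpy as np

# Repository ID taken from this model card.
MODEL_ID = "firdhokk/speech-emotion-recognition-with-openai-whisper-large-v3"


def load_classifier():
    """Build a Transformers audio-classification pipeline for this checkpoint.

    transformers is imported lazily so the helpers below work without it.
    """
    from transformers import pipeline

    return pipeline("audio-classification", model=MODEL_ID)


def top_emotion(predictions):
    """Return (label, score) for the highest-scoring entry.

    `predictions` is a list of {"label": str, "score": float} dicts, the
    format the audio-classification pipeline returns for a single input.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify(classifier, waveform, sampling_rate=16_000):
    """Classify one mono waveform given as a 1-D float sequence."""
    audio = {
        "raw": np.asarray(waveform, dtype=np.float32),
        "sampling_rate": sampling_rate,  # lets the pipeline check the rate
    }
    return top_emotion(classifier(audio))
```

Usage: `label, score = classify(load_classifier(), waveform)`, where `waveform` is a 1-D float array sampled at 16 kHz (loading it from a file with a library such as soundfile or librosa is outside this card and an assumption).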

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 5
total_train_batch_size: 10
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 25
mixed_precision_training: Native AMP
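The hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the author's training script: "Native AMP" is taken to mean `fp16=True`, per-epoch evaluation (`eval_strategy="epoch"`, the argument name in recent releases such as the 4.44.2 listed on this card) is inferred from the results table, and the optimizer is left at the Trainer default because the card records only its betas and epsilon. `TRAINING_KWARGS`, `effective_batch_size` and `build_training_args` are hypothetical names.

```python
# Hyperparameters from this card as keyword arguments for
# transformers.TrainingArguments. Entries marked "assumption" are not on the card.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 42,
    "gradient_accumulation_steps": 5,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "num_train_epochs": 25,
    "fp16": True,  # assumption: "Native AMP" on the card; needs a CUDA GPU
    "eval_strategy": "epoch",  # assumption: one evaluation row per epoch
}


def effective_batch_size(kwargs, num_devices=1):
    """Samples per optimizer step: per-device batch x accumulation x devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args(output_dir):
    """Create TrainingArguments (requires transformers, torch and accelerate)."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

On a single device, `effective_batch_size(TRAINING_KWARGS)` is 2 × 5 × 1 = 10, matching total_train_batch_size.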

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
0.4948	0.9995	394	0.4911	0.8286	0.8449	0.8286	0.8302
0.6271	1.9990	788	0.5307	0.8225	0.8559	0.8225	0.8277
0.2364	2.9985	1182	0.5076	0.8692	0.8727	0.8692	0.8684
0.0156	3.9980	1576	0.5669	0.8732	0.8868	0.8732	0.8745
0.2305	5.0	1971	0.4578	0.9108	0.9142	0.9108	0.9114
0.0112	5.9995	2365	0.4701	0.9108	0.9159	0.9108	0.9114
0.0013	6.9990	2759	0.5232	0.9138	0.9204	0.9138	0.9137
0.1894	7.9985	3153	0.5008	0.9199	0.9230	0.9199	0.9198
0.0877	8.9980	3547	0.5517	0.9138	0.9152	0.9138	0.9138
0.1471	10.0	3942	0.5856	0.8895	0.9002	0.8895	0.8915
0.0026	10.9995	4336	0.8334	0.8773	0.8949	0.8773	0.8770
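In every row above, Recall equals Accuracy. That is what support-weighted recall produces: weighted recall = Σ_c (n_c / N) · (TP_c / n_c) = Σ_c TP_c / N = accuracy, where n_c is the number of examples of class c. The metrics were therefore most likely computed with weighted averaging. The card does not include its metric code; the following is a sketch of a Trainer `compute_metrics` callback using scikit-learn under that assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(eval_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    `eval_pred` is a (logits, label_ids) pair, as the Trainer passes it.
    Weighted averaging is an assumption inferred from Recall == Accuracy.
    """
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

The table also stops at epoch ≈ 11 of the configured 25, and the reported evaluation results are those of step 3153 rather than the final step. The card does not say why; the pattern is consistent with early stopping after three evaluations without improvement and reloading the best checkpoint by accuracy or F1 (the lowest validation loss, 0.4578, is at epoch 5, not epoch 8).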

Framework versions

Transformers 4.44.2
Pytorch 2.4.1+cu121
Datasets 3.0.0
Tokenizers 0.19.1


Model weights

Format: Safetensors
Model size: 637M params
Tensor type: F32
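The 637M-parameter figure is consistent with Transformers' WhisperForAudioClassification, which keeps only the Whisper encoder plus a small projection and classification head, rather than the full encoder-decoder. A back-of-the-envelope estimate of the raw weight memory, assuming 4 bytes per parameter for F32 and 2 for a half-precision copy (activations and framework overhead are not included; `weight_memory_gb` is a hypothetical helper):

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 637_000_000  # "637M params" as listed on the card (rounded)

for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, nbytes):.2f} GB")
# float32: ~2.55 GB
# float16: ~1.27 GB
```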
