Emotion Recognition in Turkish Speech using HuBERT
This HuBERT model was trained on TurEV-DB for speech emotion recognition (SER) in Turkish.
How to use
Requirements
# required packages
!pip install git+https://github.com/huggingface/datasets.git
!pip install git+https://github.com/huggingface/transformers.git
!pip install torchaudio
!pip install librosa
!git clone https://github.com/SeaBenSea/HuBERT-SER.git
Prediction
import sys
sys.path.insert(1, './HuBERT-SER/')
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, Wav2Vec2FeatureExtractor
from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification
model_name_or_path = "SeaBenSea/hubert-large-turkish-speech-emotion-recognition"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AutoConfig.from_pretrained(model_name_or_path)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
sampling_rate = feature_extractor.sampling_rate
model = HubertForSpeechClassification.from_pretrained(model_name_or_path).to(device)
def speech_file_to_array_fn(path, sampling_rate):
    # Load the audio file and resample it to the rate the feature extractor expects.
    speech_array, _sampling_rate = torchaudio.load(path)
    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
    speech = resampler(speech_array).squeeze().numpy()
    return speech
def predict(path, sampling_rate):
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}
    with torch.no_grad():
        logits = model(**inputs).logits
    # Convert logits to per-class probabilities and format them as percentages.
    scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{round(score * 100, 3):.1f}%"} for i, score in
               enumerate(scores)]
    return outputs
path = "../dataset/TurEV/Angry/1157_kz_acik.wav"
outputs = predict(path, sampling_rate)
outputs
[
{'Emotion': 'Angry', 'Score': '99.8%'},
{'Emotion': 'Calm', 'Score': '0.0%'},
{'Emotion': 'Happy', 'Score': '0.1%'},
{'Emotion': 'Sad', 'Score': '0.1%'}
]
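The prediction above is a list with one entry per emotion, and each score is a percentage string. If only the most likely emotion is needed, a small helper can parse the scores and pick the largest one. The sketch below is not part of the HuBERT-SER repository: `top_emotion` and `example_outputs` are hypothetical names used for illustration, and the example list simply copies the output shown above.

```python
def top_emotion(outputs):
    """Return (label, score_percent) for the highest-scoring entry.

    Hypothetical helper; assumes each entry looks like
    {"Emotion": "Angry", "Score": "99.8%"}.
    """
    # Strip the trailing "%" and convert each score to a float.
    parsed = [(o["Emotion"], float(o["Score"].rstrip("%"))) for o in outputs]
    # Pick the pair with the largest score.
    return max(parsed, key=lambda pair: pair[1])


# Copied from the sample output above.
example_outputs = [
    {"Emotion": "Angry", "Score": "99.8%"},
    {"Emotion": "Calm", "Score": "0.0%"},
    {"Emotion": "Happy", "Score": "0.1%"},
    {"Emotion": "Sad", "Score": "0.1%"},
]

label, score = top_emotion(example_outputs)
print(label, score)
```

Because the scores are rounded to one decimal place before formatting, two classes can tie after parsing; in that case `max` returns the first one in the list.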
Evaluation
The following table summarizes the scores obtained by the model overall and per class.
Emotion | Precision | Recall | F1-score | Accuracy |
---|---|---|---|---|
Angry | 0.97 | 0.99 | 0.98 | |
Calm | 0.89 | 0.95 | 0.92 | |
Happy | 0.98 | 0.93 | 0.95 | |
Sad | 0.97 | 0.93 | 0.95 | |
Overall | | | | 0.95 |
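As a reading aid, the F1-score in each row is the harmonic mean of that row's precision and recall. The sketch below recomputes it from the reported precision and recall and compares it with the reported F1 values. It assumes the standard definition F1 = 2PR / (P + R); the `f1_score` function and `reported` dictionary are illustrative names, not part of the model's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the evaluation table.
reported = {
    "Angry": (0.97, 0.99, 0.98),
    "Calm": (0.89, 0.95, 0.92),
    "Happy": (0.98, 0.93, 0.95),
    "Sad": (0.97, 0.93, 0.95),
}

for emotion, (p, r, f1) in reported.items():
    # Print the recomputed value next to the value reported in the table.
    print(emotion, round(f1_score(p, r), 2), f1)
```

The recomputed values start from precision and recall that are already rounded to two decimals, so small differences from a report computed on raw counts would not be surprising.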
Questions?
Post a GitHub issue in the HuBERT-SER repository.