speecht5_finetuned_emirhan_tr

This model is a fine-tuned version of microsoft/speecht5_tts, trained on the custom dataset described under "Training and evaluation data". It achieves the following results on the evaluation set:

Loss: 0.4392

Model description

SpeechT5 is an encoder-decoder speech model whose unified speech-and-text design was inspired by T5; microsoft/speecht5_tts is its text-to-speech (TTS) checkpoint. The base model was pretrained on a large corpus of speech and text data, which lets it generate natural-sounding speech from input text. This makes it suitable for speech synthesis applications such as virtual assistants and audiobook production.

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained on a custom dataset of 170 audio samples of commonly asked interview lines. The audio is synthetic and was generated with Amazon Polly, using several of its voice options. The samples were curated to cover a variety of speech styles, accents, and phonetic structures, with the aim of improving the model's ability to generalize.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 40
training_steps: 250
mixed_precision_training: Native AMP
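
To make the relationship between these values concrete, the sketch below recomputes the total train batch size and traces the learning rate over the 250 steps. It assumes a single training device and the usual shape of a linear schedule (linear warmup from zero, then linear decay to zero); the function names are illustrative and are not taken from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 40
TRAINING_STEPS = 250


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accumulation * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("total_train_batch_size:", effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 20, 40, 145, 250):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 2e-05 at step 40, so the first evaluation in the results table coincides with the end of warmup.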

Training results

Training Loss	Epoch	Step	Validation Loss
0.6078	2.2535	40	0.4783
0.5393	4.5070	80	0.4533
0.4864	6.7606	120	0.4480
0.4846	9.0141	160	0.4493
0.4628	11.2676	200	0.4383
0.4731	13.5211	240	0.4392
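
The Epoch and Step columns can be cross-checked against the batch settings. The sketch below is a rough back-of-the-envelope calculation, not part of the original training code: it estimates how many optimizer steps make up one epoch, infers an approximate training-set size from that, and picks out the row with the lowest validation loss. The helper names are hypothetical, and the size estimate assumes a single device and a full final batch.

```python
# Rows copied from the table above: (training_loss, epoch, step, validation_loss).
ROWS = [
    (0.6078, 2.2535, 40, 0.4783),
    (0.5393, 4.5070, 80, 0.4533),
    (0.4864, 6.7606, 120, 0.4480),
    (0.4846, 9.0141, 160, 0.4493),
    (0.4628, 11.2676, 200, 0.4383),
    (0.4731, 13.5211, 240, 0.4392),
]
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4


def steps_per_epoch(rows):
    """Average optimizer steps per epoch implied by the Epoch and Step columns."""
    return sum(step / epoch for _, epoch, step, _ in rows) / len(rows)


def approx_train_samples(rows):
    """Estimated training-set size (assumes one device and a full last batch)."""
    micro_batches = round(steps_per_epoch(rows) * GRADIENT_ACCUMULATION_STEPS)
    return micro_batches * TRAIN_BATCH_SIZE


def best_row(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


if __name__ == "__main__":
    print("optimizer steps per epoch:", round(steps_per_epoch(ROWS), 2))
    print("approximate training samples:", approx_train_samples(ROWS))
    print("lowest validation loss at step:", best_row(ROWS)[2])
```

The estimate comes out at about 142 training samples, which would leave the remainder of the 170 samples for evaluation. The lowest validation loss in the table (0.4383) is at step 200, slightly below the 0.4392 logged at step 240 and reported at the top of this card.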

Framework versions

Transformers 4.45.0.dev0
Pytorch 2.4.1+cu118
Datasets 3.0.0
Tokenizers 0.20.0

omvishesh/SpeechT5_interview