speecht5_finetuned_emirhan_tr

This model is a fine-tuned version of microsoft/speecht5_tts on a custom dataset of synthetic interview-style speech (see Training and evaluation data). It achieves the following results on the evaluation set:

  • Loss: 0.4392

Model description

SpeechT5 is an encoder-decoder model for spoken language processing whose joint pre-training on speech and text is inspired by T5. The base checkpoint, microsoft/speecht5_tts, is the text-to-speech (TTS) variant: it generates speech from input text, which makes it suitable for applications such as virtual assistants and audiobook production.
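As an illustration of how a SpeechT5 TTS checkpoint is typically run with the Transformers library, here is a minimal sketch. It rests on several assumptions not stated in this card: that the fine-tuned weights are published under the repository id `omvishesh/SpeechT5_interview`, that the tokenizer is unchanged from the base model (so the processor is loaded from `microsoft/speecht5_tts`), and that the caller supplies a speaker embedding. The helper names `synthesize` and `check_embedding_shape` are hypothetical.

```python
XVECTOR_DIM = 512  # SpeechT5 conditions on a 512-dimensional speaker x-vector


def check_embedding_shape(shape):
    """Raise ValueError unless the speaker embedding has shape (1, 512)."""
    if tuple(shape) != (1, XVECTOR_DIM):
        raise ValueError(f"expected shape (1, {XVECTOR_DIM}), got {tuple(shape)}")


def synthesize(text, speaker_embeddings,
               model_id="omvishesh/SpeechT5_interview",   # assumed repository id
               processor_id="microsoft/speecht5_tts"):    # assumed: base tokenizer
    """Return a 1-D waveform tensor (16 kHz) for `text`."""
    # Heavy dependencies are imported lazily so the helpers above stay lightweight.
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    check_embedding_shape(speaker_embeddings.shape)

    processor = SpeechT5Processor.from_pretrained(processor_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    # The vocoder turns the predicted mel spectrogram into a waveform.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    return model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )
```

The `speaker_embeddings` argument would be a float tensor of shape (1, 512) holding an x-vector for the target voice; which embedding suits this fine-tuned model is not documented here, so the choice is left to the caller.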

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained on a custom dataset of 170 audio samples of commonly asked interview lines. The audio is synthetic: it was generated with Amazon Polly, which offers a range of voices. The samples were selected to cover a variety of speech styles, accents, and phonetic structures, with the aim of improving the model's ability to generalize.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40
  • training_steps: 250
  • mixed_precision_training: Native AMP
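To make the relationship between these values concrete, the sketch below recomputes the effective batch size and the learning rate at a given step under a linear schedule with warmup. It assumes single-device training (the card does not state the device count) and uses the usual linear warmup then linear decay rule; the function name `linear_warmup_lr` is hypothetical.

```python
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
NUM_DEVICES = 1  # assumption: not stated in the card

# Samples contributing to each optimizer update.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_warmup_lr(step, base_lr=2e-5, warmup_steps=40, total_steps=250):
    """Learning rate at optimizer step `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    for step in (0, 20, 40, 145, 250):
        print(step, linear_warmup_lr(step))
```

Under these assumptions the rate climbs from zero to its peak of 2e-05 at step 40 and then decays linearly to zero at step 250.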

Training results

Training Loss   Epoch     Step   Validation Loss
0.6078          2.2535    40     0.4783
0.5393          4.5070    80     0.4533
0.4864          6.7606    120    0.4480
0.4846          9.0141    160    0.4493
0.4628          11.2676   200    0.4383
0.4731          13.5211   240    0.4392
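The Epoch and Step columns can be cross-checked against the hyperparameters. The sketch below estimates the number of mini-batches per epoch from each logged row, assuming the epoch counter reflects the fraction of mini-batches consumed and that each optimizer step covers 4 mini-batches (gradient_accumulation_steps). The function name `batches_per_epoch` is hypothetical, and the result is an inference from the logged numbers, not a figure reported by the author.

```python
GRADIENT_ACCUMULATION_STEPS = 4
TRAIN_BATCH_SIZE = 2

# (optimizer step, logged epoch) pairs copied from the results table.
LOGGED = [
    (40, 2.2535),
    (80, 4.5070),
    (120, 6.7606),
    (160, 9.0141),
    (200, 11.2676),
    (240, 13.5211),
]


def batches_per_epoch(step, epoch, grad_accum=GRADIENT_ACCUMULATION_STEPS):
    """Estimate mini-batches per epoch from one logged (step, epoch) pair."""
    return step * grad_accum / epoch


estimates = [round(batches_per_epoch(step, epoch)) for step, epoch in LOGGED]

# Upper bound on the training-set size implied by the estimate.
max_train_samples = estimates[0] * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print("mini-batches per epoch:", estimates)
    print("training samples (at most):", max_train_samples)
```

Every row gives the same estimate, which suggests a training split somewhat smaller than the full 170 samples, with the remainder presumably held out for evaluation.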

Framework versions

  • Transformers 4.45.0.dev0
  • Pytorch 2.4.1+cu118
  • Datasets 3.0.0
  • Tokenizers 0.20.0

Model size

  • Parameters: 144M
  • Tensor type: F32
  • Weights format: Safetensors

Model tree

  • Repository: omvishesh/SpeechT5_interview
  • Base model: microsoft/speecht5_tts (this model is a fine-tune)