Text-to-Speech (TTS) with Transformer trained on LJSpeech

This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain using a Transformer pretrained on LJSpeech.

The pretrained model takes text as input and produces a spectrogram as output. The final waveform is obtained by applying a vocoder (e.g., HiFi-GAN) to the generated spectrogram.

Perform Text-to-Speech (TTS)

import torchaudio
from speechbrain.inference.vocoders import HIFIGAN
# NOTE: the original snippet does not import TextToSpeech. It is assumed to be
# this model's custom inference class; import it from wherever it is defined.

texts = ["This is the example text"]

# Initialize the TTS model (source points to the local model directory)
my_tts_model = TextToSpeech.from_hparams(source="/content/")

# Initialize the vocoder (HiFi-GAN) model
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

# Run the TTS model (text-to-spectrogram)
mel_output = my_tts_model.encode_text(texts)

# Run the vocoder (spectrogram-to-waveform)
waveforms = hifi_gan.decode_batch(mel_output)

# Save the waveform (22050 Hz sample rate)
torchaudio.save("example_TTS.wav", waveforms.squeeze(1), 22050)
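
When more than one text is synthesized at once, the vocoder output holds several waveforms, and squeezing it before saving would write them as channels of a single file. The sketch below shows one way to split such a batch into separate clips. It assumes the vocoder returns a tensor of shape [batch, 1, time], as the SpeechBrain HiFi-GAN interface documents; `split_batch` and `duration_seconds` are hypothetical helper names, and a zero tensor stands in for real vocoder output.

```python
import torch

SAMPLE_RATE = 22050  # sample rate used when saving the waveform in this README


def split_batch(waveforms: torch.Tensor) -> list:
    """Split a [batch, 1, time] tensor into a list of [1, time] tensors."""
    if waveforms.dim() != 3 or waveforms.shape[1] != 1:
        raise ValueError("expected a tensor of shape [batch, 1, time]")
    # Iterating over the first dimension yields one [1, time] clip per text.
    return [clip for clip in waveforms]


def duration_seconds(waveform: torch.Tensor, sample_rate: int = SAMPLE_RATE) -> float:
    """Return the clip length in seconds from its number of samples."""
    return waveform.shape[-1] / sample_rate


# Stand-in for vocoder output: two silent clips of 44100 samples each (assumption).
dummy_waveforms = torch.zeros(2, 1, 44100)
clips = split_batch(dummy_waveforms)
for index, clip in enumerate(clips):
    print(index, tuple(clip.shape), duration_seconds(clip))
```

Each clip already has the [channels, time] layout, so it could be written to its own file with, for example, torchaudio.save(f"example_TTS_{index}.wav", clip, 22050).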