---
library_name: speechbrain
pipeline_tag: text-to-speech
language: "en"
tags:
- text-to-speech
- TTS
- speech-synthesis
- speechbrain
license: "apache-2.0"
datasets:
- LJSpeech
---

# Text-to-Speech (TTS) with Transformer trained on LJSpeech

This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain using a [Transformer](https://arxiv.org/pdf/1809.08895.pdf) pretrained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

The pretrained model takes text as input and produces a mel spectrogram as output. The final waveform is obtained by applying a vocoder (e.g., HiFi-GAN) to the generated spectrogram.

### Perform Text-to-Speech (TTS)

```python
import torchaudio
from speechbrain.inference.vocoders import HIFIGAN

# NOTE: TextToSpeech is not imported here. It is assumed to be this
# repository's custom inference class; import it from wherever it is defined.

texts = ["This is the example text"]

# Initialize the TTS model from the local model directory
my_tts_model = TextToSpeech.from_hparams(source="/content/")

# Initialize the HiFi-GAN vocoder
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

# Run the TTS model (text-to-spectrogram)
mel_output = my_tts_model.encode_text(texts)

# Run the vocoder (spectrogram-to-waveform)
waveforms = hifi_gan.decode_batch(mel_output)

# Save the waveform at 22050 Hz, the sample rate of LJSpeech
torchaudio.save("example_TTS.wav", waveforms.squeeze(1), 22050)
```
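
If `torchaudio.save` is not usable in your environment (for example, because no audio backend is installed), one possible fallback is to write the waveform with Python's standard library. The sketch below is not part of SpeechBrain; `save_wav_16bit` is a hypothetical helper name, and it assumes a single mono utterance whose samples are floats in the range [-1.0, 1.0].

```python
import struct
import wave

SAMPLE_RATE = 22050  # same sample rate as in the example above


def save_wav_16bit(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper (not a SpeechBrain or torchaudio API).
    Returns the clip duration in seconds.
    """
    # Clip each sample to [-1.0, 1.0], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        # Pack all samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm) / sample_rate
```

With the example above and a batch containing a single text, a call such as `save_wav_16bit("example_TTS.wav", waveforms.squeeze().tolist())` should write the synthesized utterance; for larger batches, each utterance would need to be saved separately.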