# Text-to-Speech (TTS) with Transformer trained on LJSpeech
This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain using a [Transformer](https://arxiv.org/pdf/1809.08895.pdf) pretrained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

The pre-trained model takes a short text as input and produces a spectrogram as output. The final waveform can be obtained by applying a vocoder (e.g., HiFIGAN) on top of the generated spectrogram.

```python
import torchaudio
from speechbrain.inference.vocoders import HIFIGAN

texts = ["This is a sample text for synthesis."]

# Initialize TTS (Transformer) and Vocoder (HiFIGAN)
# Note: `TTSModel` is assumed to be the custom inference interface
# provided with this model; make sure it is available in your session.
my_tts_model = TTSModel.from_hparams(source="/content/")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

# Running the TTS
mel_output, mel_length = my_tts_model.encode_text(texts)

# Running Vocoder (spectrogram-to-waveform)
waveforms = hifi_gan.decode_batch(mel_output)

# Save the waveform
torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)
```
If you want to generate multiple sentences in one-shot, pass the sentences as items in a list.
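As a minimal sketch of batching (this assumes the `my_tts_model` and `hifi_gan` objects from the snippet above, and that the batched return values mirror the single-sentence call — the exact signature is an assumption):

```python
# Hypothetical batch call: return values are assumed to mirror the
# single-sentence example above.
items = [
    "A quick brown fox jumped over the lazy dog",
    "How much wood would a woodchuck chuck?",
    "Never odd or even",
]
mel_outputs, mel_lengths = my_tts_model.encode_text(items)

# Decode the whole batch of spectrograms with the vocoder
waveforms = hifi_gan.decode_batch(mel_outputs)
```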
### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
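For instance, to load the vocoder directly on the GPU (the same option applies to the TTS model's `from_hparams` call):

```python
from speechbrain.inference.vocoders import HIFIGAN

# Load the vocoder on CUDA; inputs passed to decode_batch should then
# live on the same device.
hifi_gan = HIFIGAN.from_hparams(
    source="speechbrain/tts-hifigan-ljspeech",
    savedir="tmpdir_vocoder",
    run_opts={"device": "cuda"},
)
```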