Commit 9f21d56 by Krisshvamsi
1 Parent(s): 5198208
Update README.md
README.md CHANGED
@@ -28,28 +28,43 @@ The pre-trained model takes in input a short text and produces a spectrogram in
 ```
 pip install speechbrain
 ```
-### Perform Text-to-Speech (TTS)
+### Perform Text-to-Speech (TTS) - Running Inference
+To run model inference pull the interface directory as shown in the cell below
+
+Note: Run on T4-GPU for faster inference
+```
+!pip install --upgrade --no-cache-dir gdown
+!gdown 1oy8Y5zwkLel7diA63GNCD-6cfoBV4tq7
+!unzip inference.zip
+```
+```python
+%%capture
+!pip install speechbrain
+%cd inference
+```

 ```python
 import torchaudio
 from TTSModel import TTSModel
-from
+from IPython.display import Audio
 from speechbrain.inference.vocoders import HIFIGAN

 texts = ["This is a sample text for synthesis."]

+model_source_path = "/content/inference"
 # Initialize TTS (Transformer) and Vocoder (HiFIGAN)
-my_tts_model = TTSModel.from_hparams(source=
+my_tts_model = TTSModel.from_hparams(source=model_source_path)
 hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

 # Running the TTS
-mel_output
+mel_output = my_tts_model.encode_text(texts)

 # Running Vocoder (spectrogram-to-waveform)
 waveforms = hifi_gan.decode_batch(mel_output)

 # Save the waveform
 torchaudio.save('example_TTS.wav',waveforms.squeeze(1), 22050)
+print("Saved the audio file!")
 ```

 If you want to generate multiple sentences in one-shot, pass the sentences as items in a list.
@@ -58,26 +73,7 @@ If you want to generate multiple sentences in one-shot, pass the sentences as it
 ### Inference on GPU
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

-
-### Training
-The model was trained with SpeechBrain.
-To train it from scratch follow these steps:
-1. Clone SpeechBrain:
-```bash
-git clone https://github.com/speechbrain/speechbrain/
-```
-2. Install it:
-```bash
-cd speechbrain
-pip install -r requirements.txt
-pip install -e .
-```
-3. Run Training:
-```bash
-cd recipes/LJSpeech/TTS/tacotron2/
-python train.py --device=cuda:0 --max_grad_norm=1.0 --data_folder=/your_folder/LJSpeech-1.1 hparams/train.yaml
-```
-
+Note: For Training the model please visit this [TTS_Training_Inference](https://colab.research.google.com/drive/1VYu4kXdgpv7f742QGquA1G4ipD2Kg0kT?usp=sharing) notebook

 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.