microsoft/speecht5_tts

@@ -47,44 +47,35 @@ Extensive evaluations show the superiority of the proposed SpeechT5 framework on
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-## Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-You can use this model for speech synthesis. See the [model hub](https://huggingface.co/models?search=speecht5) to look for fine-tuned versions on a task that interests you.
-## Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-## Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-# Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-## Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started With the Model
-Use the code below to convert text into a mono 16 kHz speech waveform.
 ```python
 # Following pip packages need to be installed:
-# !pip install git+https://github.com/huggingface/transformers sentencepiece datasets
 from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
 from datasets import load_dataset
@@ -111,6 +102,37 @@ sf.write("speech.wav", speech.numpy(), samplerate=16000)
 Refer to [this Colab notebook](https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ) for an example of how to fine-tune SpeechT5 for TTS on a different dataset or a new language.
 # Training Details
 ## Training Data

 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+## How to Get Started With the Model
+You can run SpeechT5 through the `text-to-speech` pipeline in a few lines of code:
+```python
+# Following pip packages need to be installed:
+# !pip install transformers sentencepiece datasets soundfile torch
+from transformers import pipeline
+from datasets import load_dataset
+import soundfile as sf
+import torch
+# Load the SpeechT5 checkpoint through the text-to-speech pipeline.
+synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts")
+# Load an x-vector speaker embedding that sets the voice, and add a batch dimension.
+embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
+speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
+# You can replace this embedding with your own as well.
+speech = synthesiser("Hello what is happening", forward_params={"speaker_embeddings": speaker_embeddings})
+# Write the waveform to disk at the sampling rate reported by the pipeline.
+sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])
+```
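As a rough illustration of what the final `sf.write` step produces, the sketch below writes a mono 16-bit PCM WAV file using only the Python standard library. It assumes the pipeline's `audio` output is a one-dimensional sequence of floats in [-1, 1] sampled at `sampling_rate` Hz. The helper name `write_mono_wav`, the output file name, and the synthetic sine tone that stands in for `speech["audio"]` are illustrative only and are not part of the model card.

```python
import math
import os
import struct
import tempfile
import wave


def write_mono_wav(path, samples, sampling_rate=16000):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the duration in seconds.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range before scaling to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)              # mono
        wav_file.setsampwidth(2)              # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sampling_rate)  # e.g. 16000 for a 16 kHz waveform
        wav_file.writeframes(bytes(frames))
    return len(samples) / sampling_rate


# Stand-in for speech["audio"]: half a second of a 440 Hz tone at 16 kHz.
sampling_rate = 16000
audio = [
    0.3 * math.sin(2 * math.pi * 440 * n / sampling_rate)
    for n in range(sampling_rate // 2)
]
out_path = os.path.join(tempfile.gettempdir(), "speech_stdlib.wav")
duration = write_mono_wav(out_path, audio, sampling_rate)
```

With real pipeline output, `audio` would be replaced by `speech["audio"]` and `sampling_rate` by `speech["sampling_rate"]`. `soundfile` remains the more convenient choice, since it accepts NumPy arrays directly.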
+For more fine-grained control, you can call the processor, model, and vocoder classes directly to convert text into a mono 16 kHz speech waveform.
 ```python
 # Following pip packages need to be installed:
+# !pip install transformers sentencepiece datasets soundfile torch
 from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
 from datasets import load_dataset
 Refer to [this Colab notebook](https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ) for an example of how to fine-tune SpeechT5 for TTS on a different dataset or a new language.
+## Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+You can use this model for speech synthesis. See the [model hub](https://huggingface.co/models?search=speecht5) to look for fine-tuned versions on a task that interests you.
+## Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+## Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+# Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+## Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 # Training Details
 ## Training Data