parler-tts/parler-tts-mini-expresso

@@ -17,17 +17,15 @@ datasets:
 <img src="https://huggingface.co/datasets/parler-tts/images/resolve/main/thumbnail.png" alt="Parler Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
-# Parler-TTS Mini: Expresso v0.1
-TODO: update link to space
-<a target="_blank" href="https://huggingface.co/spaces/parler-tts/parler_tts_mini_expresso">
   <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
 </a>
-**Parler-TTS Mini: Expresso v0.1** is a fine-tuned version of [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_mini_v0.1)
 on the [Expresso](https://huggingface.co/datasets/ylacombe/expresso) dataset. It is a lightweight text-to-speech (TTS)
-model that can generate high-quality, natural sounding speech. Compared to the original model, Expresso v0.1 provides
 superior control over **emotions** (happy, confused, laughing, sad) and **consistent voices** (Jerry, Thomas, Elisabeth, Talia).
 It is part of the first release from the [Parler-TTS](https://github.com/huggingface/parler-tts) project, which aims to
@@ -72,7 +70,7 @@ sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
 * The model can generate in a range of emotions, including: "happy", "confused", "default" (meaning no particular emotion conveyed), "laughing", "sad", "whisper", "emphasis"
 * Include the term "high quality audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise
 * Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech
-* Wrap words in asterisk to emphasise them (e.g. `*you*` in the example above)
 ## Training Procedure

 <img src="https://huggingface.co/datasets/parler-tts/images/resolve/main/thumbnail.png" alt="Parler Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+# Parler-TTS Mini: Expresso
+<a target="_blank" href="https://huggingface.co/spaces/parler-tts/parler-tts-expresso">
   <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
 </a>
+**Parler-TTS Mini: Expresso** is a fine-tuned version of [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_mini_v0.1)
 on the [Expresso](https://huggingface.co/datasets/ylacombe/expresso) dataset. It is a lightweight text-to-speech (TTS)
+model that can generate high-quality, natural sounding speech. Compared to the original model, Parler-TTS Expresso provides
 superior control over **emotions** (happy, confused, laughing, sad) and **consistent voices** (Jerry, Thomas, Elisabeth, Talia).
 It is part of the first release from the [Parler-TTS](https://github.com/huggingface/parler-tts) project, which aims to
 * The model can generate in a range of emotions, including: "happy", "confused", "default" (meaning no particular emotion conveyed), "laughing", "sad", "whisper", "emphasis"
 * Include the term "high quality audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise
 * Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech
+* To emphasise particular words, wrap them in asterisks (e.g. `*you*` in the example above) and include "emphasis" in the prompt
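
As a rough illustration of the tips above, the sketch below assembles a text prompt and a voice description in plain Python. The helper names `emphasise` and `build_description`, the exact sentence wording of the description, and the choice to put "emphasis" in the description string are all assumptions made for this example; they are not part of the Parler-TTS API, and the model may respond better to other phrasings.

```python
import re

# Emotion and speaker names as listed in the model card.
EMOTIONS = {"happy", "confused", "default", "laughing", "sad", "whisper", "emphasis"}
SPEAKERS = {"Jerry", "Thomas", "Elisabeth", "Talia"}


def emphasise(text, words):
    """Wrap each whole-word occurrence of `words` in asterisks (hypothetical helper)."""
    unique = sorted(set(words), key=len, reverse=True)
    if not unique:
        return text
    pattern = r"\b(" + "|".join(re.escape(w) for w in unique) + r")\b"
    return re.sub(pattern, lambda m: "*" + m.group(1) + "*", text)


def build_description(speaker, emotion="default", emphasis=False, high_quality=True):
    """Compose a description string (hypothetical helper; the wording is an assumption)."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker}")
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    extras = []
    if emphasis:
        # The tip asks for the term "emphasis" alongside the asterisks.
        extras.append("emphasis")
    extras.append("high quality audio" if high_quality else "very noisy audio")
    return f"{speaker} speaks in a {emotion} tone with " + " and ".join(extras) + "."


# Commas in the text add small breaks in speech; asterisks mark the stressed word.
text = emphasise("Honestly, why do you always ask me?", ["you"])
description = build_description("Thomas", "sad", emphasis=True)
print(text)
print(description)
```

The two resulting strings would then be tokenized and passed to the model in the same way as the prompt and description in the usage example above.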
 ## Training Procedure