SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Paper: arXiv 2110.07205
The SpeechT5 framework consists of a shared sequence-to-sequence encoder-decoder network and six modal-specific (speech/text) pre-nets and post-nets, which together can address a wide variety of spoken language processing tasks, including automatic speech recognition, text-to-speech, and voice conversion.
Note: Text-to-speech version of SpeechT5
Note: Voice-conversion version of SpeechT5
Note: Automatic-speech-recognition version of SpeechT5
Note: Vocoder for SpeechT5; SpeechT5 produces a spectrogram, and this model converts it to a waveform (see the usage sketch below)
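The notes above correspond to task-specific checkpoints. As a rough illustration, the sketch below wires the text-to-speech model to the HiFi-GAN vocoder using the 🤗 Transformers API; the checkpoint names (microsoft/speecht5_tts, microsoft/speecht5_hifigan), the x-vector dataset (Matthijs/cmu-arctic-xvectors), and the particular speaker index are assumptions not stated on this page.

```python
# Minimal sketch of the SpeechT5 text-to-speech + vocoder pipeline,
# assuming the Hub checkpoints "microsoft/speecht5_tts" and
# "microsoft/speecht5_hifigan" and the "Matthijs/cmu-arctic-xvectors"
# speaker-embedding dataset.
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

# Text pre-net + shared encoder-decoder + speech post-net (predicts a log-mel spectrogram)
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")

# HiFi-GAN vocoder: converts the predicted spectrogram into a 16 kHz waveform
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hello, this is a test of SpeechT5.", return_tensors="pt")

# An x-vector speaker embedding selects the target voice (index chosen arbitrarily)
embeddings = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings[7306]["xvector"]).unsqueeze(0)

# Generate the spectrogram and run it through the vocoder in one call
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("speech.wav", speech.numpy(), samplerate=16000)
```

The voice-conversion and ASR variants follow the same pattern with their own model classes and checkpoints; only the TTS path needs the separate vocoder, since ASR outputs text rather than a spectrogram.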