|
--- |
|
title: Emotional TTS Comparison |
|
emoji: π£οΈ |
|
colorFrom: blue |
|
colorTo: pink |
|
sdk: gradio |
|
sdk_version: 4.44.0 |
|
app_file: app.py |
|
pinned: false |
|
--- |
|
|
|
# Emotional TTS Comparison |
|
|
|
This project explores ways to incorporate emotion into Text-to-Speech (TTS) using OpenAI's GPT-4o-mini for text modification and TTS-1 for speech synthesis. |
|
|
|
![Capture](./images/capture.png) |
|
|
|
|
|
## Background |
|
|
|
While some TTS systems like Bark can include descriptive elements in speech (e.g., "(ν° μ리λ‘) μνν΄μ!"), they may have quality issues with noise. This project aims to find a method to convey emotion using OpenAI's TTS while maintaining high audio quality. |
|
|
|
## How It Works |
|
|
|
1. The user inputs a text. |
|
2. The system generates three versions of the text: |
|
- Original: The input text as-is |
|
- Emotional: A slightly more emotional version |
|
- Exaggerated: A highly emotional, exaggerated version |
|
3. Each version is then converted to speech using OpenAI's TTS-1 model. |
|
|
|
## Example |
|
|
|
Original: "μνν΄μ" |
|
Emotional: "μνν΄μ!!" |
|
Exaggerated: "μ κΉλ§μ! μλΌ, μνν΄μ!!" |
|
|
|
## Features |
|
|
|
- Uses GPT-4o-mini for text modification |
|
- Employs OpenAI's TTS-1 for high-quality speech synthesis |
|
- Provides a Gradio interface for easy interaction |
|
- Allows comparison of different emotional intensities in speech |
|
|
|
## Usage |
|
|
|
1. Enter your text in the input box. |
|
2. Click "Generate Versions and Speech". |
|
3. Listen to and compare the three versions of the speech. |
|
|
|
## Deployment |
|
|
|
This project is deployed on Hugging Face Spaces, allowing easy access and usage without local setup. |
|
|
|
## Note |
|
|
|
This approach aims to strike a balance between conveying emotion and maintaining speech quality. It demonstrates how text modification can influence the perceived emotion in TTS output. |