Fredric
commited on
Commit
β’
dab5cce
0
Parent(s):
Initial commit
Browse files
README.md
ADDED
@@ -0,0 +1,44 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Emotional TTS Comparison
|
2 |
+
|
3 |
+
This project explores ways to incorporate emotion into Text-to-Speech (TTS) using OpenAI's GPT-4 for text modification and TTS-1 for speech synthesis.
|
4 |
+
|
5 |
+
## Background
|
6 |
+
|
7 |
+
While some TTS systems like Bark can include descriptive elements in speech (e.g., "(ν° μ리λ‘) μνν΄μ!"), they may have quality issues with noise. This project aims to find a method to convey emotion using OpenAI's TTS while maintaining high audio quality.
|
8 |
+
|
9 |
+
## How It Works
|
10 |
+
|
11 |
+
1. The user inputs a text.
|
12 |
+
2. The system generates three versions of the text:
|
13 |
+
- Original: The input text as-is
|
14 |
+
- Emotional: A slightly more emotional version
|
15 |
+
- Exaggerated: A highly emotional, exaggerated version
|
16 |
+
3. Each version is then converted to speech using OpenAI's TTS-1 model.
|
17 |
+
|
18 |
+
## Example
|
19 |
+
|
20 |
+
Original: "μνν΄μ"
|
21 |
+
Emotional: "μνν΄μ!!"
|
22 |
+
Exaggerated: "μ κΉλ§μ! μλΌ, μνν΄μ!!"
|
23 |
+
|
24 |
+
## Features
|
25 |
+
|
26 |
+
- Uses GPT-4o-mini for text modification
|
27 |
+
- Employs OpenAI's TTS-1 for high-quality speech synthesis
|
28 |
+
- Provides a Gradio interface for easy interaction
|
29 |
+
- Allows comparison of different emotional intensities in speech
|
30 |
+
|
31 |
+
## Usage
|
32 |
+
|
33 |
+
1. Enter your text in the input box.
|
34 |
+
2. Click "Generate Versions and Speech".
|
35 |
+
3. Listen to and compare the three versions of the speech.
|
36 |
+
|
37 |
+
## Deployment
|
38 |
+
|
39 |
+
This project is deployed on Hugging Face Spaces, allowing easy access and usage without local setup.
|
40 |
+
|
41 |
+
## Note
|
42 |
+
|
43 |
+
This approach aims to strike a balance between conveying emotion and maintaining speech quality. It demonstrates how text modification can influence the perceived emotion in TTS output.
|
44 |
+
|