Fredric's picture
Update README.md
b6e8a6b verified
|
raw
history blame
1.77 kB
metadata
title: Emotional TTS Comparison
emoji: πŸ—£οΈ
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false

Emotional TTS Comparison

This project explores ways to incorporate emotion into Text-to-Speech (TTS) using OpenAI's GPT-4o-mini for text modification and TTS-1 for speech synthesis.

Capture

Background

While some TTS systems like Bark can include descriptive elements in speech (e.g., "(큰 μ†Œλ¦¬λ‘œ) μœ„ν—˜ν•΄μš”!"), they may have quality issues with noise. This project aims to find a method to convey emotion using OpenAI's TTS while maintaining high audio quality.

How It Works

  1. The user inputs a text.
  2. The system generates three versions of the text:
    • Original: The input text as-is
    • Emotional: A slightly more emotional version
    • Exaggerated: A highly emotional, exaggerated version
  3. Each version is then converted to speech using OpenAI's TTS-1 model.

Example

Original: "μœ„ν—˜ν•΄μš”" Emotional: "μœ„ν—˜ν•΄μš”!!" Exaggerated: "μž κΉλ§Œμš”! μ•ˆλΌ, μœ„ν—˜ν•΄μš”!!"

Features

  • Uses GPT-4o-mini for text modification
  • Employs OpenAI's TTS-1 for high-quality speech synthesis
  • Provides a Gradio interface for easy interaction
  • Allows comparison of different emotional intensities in speech

Usage

  1. Enter your text in the input box.
  2. Click "Generate Versions and Speech".
  3. Listen to and compare the three versions of the speech.

Deployment

This project is deployed on Hugging Face Spaces, allowing easy access and usage without local setup.

Note

This approach aims to strike a balance between conveying emotion and maintaining speech quality. It demonstrates how text modification can influence the perceived emotion in TTS output.