---
title: ttsdoc
emoji: 🌖
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
---

ttsdoc 🌖

ttsdoc is a Text-to-Speech (TTS) application that reads your documents aloud. It uses the Parler TTS Mini v1 model to generate high-quality audio from typed text or from uploaded PDF, TXT, and DOCX files.

Features

📄 Support for PDF, TXT, and DOCX file uploads
✍️ Direct text input option
🗣️ Customizable voice descriptions
⏱️ Adjustable maximum audio duration
🚀 GPU-accelerated audio generation

How to Use

Upload a PDF, TXT, or DOCX file or enter text directly.
Customize the voice description if desired.
Adjust the maximum audio duration.
Click "Generate Audio" to create the TTS output.
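Before any audio can be generated, the text has to be pulled out of the uploaded file. The sketch below shows one way to do that for the three supported formats. It is not taken from this Space's app.py. It assumes the third-party `pypdf` and `python-docx` packages for PDF and DOCX files, and `extract_text` is a hypothetical helper name.

```python
from pathlib import Path

# The three upload formats listed in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx"}


def extract_text(path: str) -> str:
    """Return the plain text of an uploaded PDF, TXT, or DOCX file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")

    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")

    if suffix == ".pdf":
        from pypdf import PdfReader  # assumption: pypdf is installed

        reader = PdfReader(path)
        # extract_text() may return an empty string for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    # Remaining case: .docx
    import docx  # assumption: python-docx is installed

    document = docx.Document(path)
    return "\n".join(paragraph.text for paragraph in document.paragraphs)
```

The PDF and DOCX libraries are imported lazily inside their branches, so plain-text uploads work even when those packages are absent.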

Tips for Best Results

For longer texts, audio is generated only up to the specified maximum duration, so text beyond that point may not be read.
Experiment with different voice descriptions to achieve the desired output.
Use punctuation to control pacing and intonation in the generated speech.
For optimal quality, try to keep individual sentences or paragraphs concise.
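The last two tips (concise sentences and a duration cap) can be illustrated with a small sketch. This is an assumption about how such a pipeline might work, not a description of ttsdoc's actual code. `split_into_chunks` packs punctuation-delimited sentences into short chunks that could be sent to the TTS model one at a time. `cap_duration` trims an audio buffer to a maximum number of seconds. Both names, and the 200-character default, are hypothetical.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text after sentence-ending punctuation, then pack sentences into
    chunks of at most max_chars characters. A single sentence longer than
    max_chars is kept whole rather than cut mid-sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def cap_duration(samples, sampling_rate: int, max_seconds: float):
    """Keep at most max_seconds of audio; works on lists or 1-D arrays."""
    max_samples = int(max_seconds * sampling_rate)
    return samples[:max_samples]
```

For example, `split_into_chunks("Hello there. How are you? Fine!", max_chars=20)` yields `["Hello there.", "How are you? Fine!"]`.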

Technical Details

This demo uses the Parler TTS Mini v1 model.
Audio generation is GPU-accelerated for faster processing.
Maximum file size for uploads: 5 MB
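A size limit like this is typically enforced before any text extraction or generation starts. The sketch below is a hypothetical check, not the Space's actual code. It assumes "5 MB" means 5 × 1024 × 1024 bytes; the README does not say whether decimal or binary megabytes are intended.

```python
import os

# Assumption: 5 MB interpreted as binary megabytes (5 MiB).
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError if the file at path exceeds limit bytes (hypothetical helper)."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"File is {size} bytes; the limit is {limit} bytes.")
```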

License

This project is licensed under the Apache 2.0 License.