metadata
title: ttsdoc
emoji: π
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
ttsdoc
ttsdoc is a Text-to-Speech (TTS) application that reads your documents aloud. It uses the Parler TTS Mini v1 model to generate audio from text you type in or from uploaded PDF, TXT, and DOCX files.
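As a rough illustration of the model call behind this, the sketch below follows the usage pattern published on the Parler TTS Mini v1 model card. It is not taken from this project's app.py: the function name `synthesize`, the `out_path` parameter, and the lazy imports are assumptions made for the example, and it needs the `parler_tts`, `transformers`, `torch`, and `soundfile` packages installed.

```python
MODEL_ID = "parler-tts/parler-tts-mini-v1"  # Hub ID of Parler TTS Mini v1


def synthesize(text, description, out_path="out.wav"):
    """Generate speech for `text` in the voice described by `description`.

    Hypothetical helper for illustration; not the project's actual code.
    """
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    # Use the GPU when one is available, otherwise fall back to the CPU.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(MODEL_ID).to(device)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # The description conditions the voice; the prompt is the text to speak.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(
        input_ids=input_ids, prompt_input_ids=prompt_input_ids
    )
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

A call might look like `synthesize("Hello from ttsdoc.", "A calm female voice with clear audio.")`. Loading the model on every call is wasteful; a real application would probably load it once at startup.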
Features
- Support for PDF, TXT, and DOCX file uploads
- Direct text input option
- Customizable voice descriptions
- Adjustable maximum audio duration
- GPU-accelerated audio generation
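Supporting three upload formats means choosing a text extractor by file extension. The sketch below shows one way to do that. It is an assumption about how such a step could look, not the project's own code: `detect_kind` and `extract_text` are hypothetical names, and PDF and DOCX handling rely on the third-party `pypdf` and `python-docx` packages, imported only when needed.

```python
from pathlib import Path

# File types named in the feature list.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx"}


def detect_kind(filename):
    """Return 'pdf', 'txt', or 'docx', or raise ValueError if unsupported."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    return ext.lstrip(".")


def extract_text(path):
    """Extract plain text from a supported file (hypothetical helper)."""
    kind = detect_kind(path)
    if kind == "txt":
        return Path(path).read_text(encoding="utf-8")
    if kind == "pdf":
        from pypdf import PdfReader  # third-party, imported lazily

        reader = PdfReader(path)
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    # Remaining case: DOCX.
    import docx  # provided by the python-docx package

    document = docx.Document(path)
    return "\n".join(paragraph.text for paragraph in document.paragraphs)
```

Matching on the lowercased extension keeps names such as `Report.PDF` working, but it does not verify the file's actual contents.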
How to Use
- Upload a PDF, TXT, or DOCX file, or enter text directly.
- Customize the voice description if desired.
- Adjust the maximum audio duration.
- Click "Generate Audio" to create the TTS output.
Tips for Best Results
- For longer texts, the generator will create audio up to the specified maximum duration.
- Experiment with different voice descriptions to achieve the desired output.
- Use punctuation to control pacing and intonation in the generated speech.
- For optimal quality, try to keep individual sentences or paragraphs concise.
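The tips on concise sentences and on the duration cap can be sketched in a few lines. The code below is only an illustration under stated assumptions: the regex sentence splitter is naive (it will mis-split abbreviations such as "e.g."), the `max_chars` default is arbitrary, and clipping by sample count is one plausible way to enforce a maximum duration, not necessarily what this demo does.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (naive)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def chunk_sentences(sentences, max_chars=200):
    """Group sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def clip_to_duration(samples, sampling_rate, max_seconds):
    """Keep only the first max_seconds of audio samples."""
    max_samples = int(max_seconds * sampling_rate)
    return samples[:max_samples]
```

Each chunk could then be synthesized separately and the resulting audio concatenated before clipping.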
Technical Details
- This demo uses the Parler TTS Mini v1 model.
- Audio generation is GPU-accelerated for faster processing.
- Maximum file size for uploads: 5 MB.
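An upload limit like this is typically enforced before any text extraction starts. Here is a minimal sketch, assuming that 5 MB means 5 * 1024 * 1024 bytes and using a hypothetical helper name, `check_upload_size`; the demo's own check may differ.

```python
import os

# Assumption: 1 MB is counted as 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if over the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"File is {size} bytes; the maximum allowed is {limit} bytes."
        )
    return size
```

Rejecting oversized files up front avoids spending time parsing a document that would be refused anyway.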
License
This project is licensed under the Apache 2.0 License.
Powered by Gradio and Hugging Face