---
title: ttsdoc
emoji: π
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
---

# ttsdoc

ttsdoc is a text-to-speech (TTS) application that reads your documents aloud. It uses the Parler TTS Mini v1 model to generate high-quality audio from text you type in or from uploaded PDF, TXT, and DOCX files.

## Features

- Support for PDF, TXT, and DOCX file uploads
- Direct text input option
- Customizable voice descriptions
- Adjustable maximum audio duration
- GPU-accelerated audio generation

## How to Use

1. Upload a PDF, TXT, or DOCX file, or enter text directly.
2. Customize the voice description if desired.
3. Adjust the maximum audio duration.
4. Click "Generate Audio" to create the TTS output.
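
How the app enforces the maximum duration from step 3 is not documented here. As a rough illustration only, one simple way to cap audio length is to trim the waveform to a fixed number of samples. The sketch below assumes the audio is a flat sequence of samples at a known sampling rate; `cap_duration` and the 8 kHz rate are hypothetical names and values, not part of ttsdoc.

```python
def cap_duration(samples, sample_rate, max_seconds):
    """Return at most `max_seconds` worth of audio samples.

    Hypothetical helper: the real app may limit duration differently,
    for example by limiting generation length inside the model.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if max_seconds < 0:
        raise ValueError("max_seconds must be non-negative")
    # Number of samples that fit into the requested duration.
    max_samples = int(max_seconds * sample_rate)
    return samples[:max_samples]


# Example: 3 seconds of silence at an assumed 8 kHz rate, capped to 2 seconds.
audio = [0.0] * (3 * 8000)
capped = cap_duration(audio, sample_rate=8000, max_seconds=2)
print(len(capped) / 8000)  # duration in seconds after capping
```

Audio that is already shorter than the limit is returned unchanged, which matches the idea of a maximum rather than a fixed length.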

## Tips for Best Results

- For longer texts, audio is generated only up to the specified maximum duration.
- Experiment with different voice descriptions to achieve the desired output.
- Use punctuation to control pacing and intonation in the generated speech.
- For best quality, keep individual sentences and paragraphs concise.
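
If you want to follow the last tip programmatically before pasting text into the app, a long passage can be split into short, sentence-aligned chunks. This is a minimal sketch using only the standard library; `split_into_chunks` and the 200-character default are assumptions for illustration, and the sentence rule (split after `.`, `!`, or `?`) is deliberately naive.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


for chunk in split_into_chunks("Hello world. How are you? Fine!", max_chars=15):
    print(chunk)
```

Abbreviations such as "e.g." will be treated as sentence ends by this rule, so review the chunks before relying on them.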

## Technical Details

- This demo uses the Parler TTS Mini v1 model.
- Audio generation is GPU-accelerated for faster processing.
- Maximum upload file size: 5 MB
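
To check a file against the upload constraints above before submitting it, something like the following could be used. It is a sketch under assumptions: `check_upload` is a hypothetical helper, the allowed extensions are taken from the feature list, and "5 MB" is interpreted as 5 × 1024 × 1024 bytes, which may differ from how the app counts.

```python
from pathlib import Path

# Assumption: file types come from the README's feature list.
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".docx"}
# Assumption: "5 MB" is read as 5 MiB; the app may use 5,000,000 bytes instead.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def check_upload(filename, size_bytes):
    """Return None if the file looks acceptable, otherwise a reason string."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported file type: {ext or '(none)'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "file exceeds the 5 MB limit"
    return None


print(check_upload("report.PDF", 1024))
print(check_upload("notes.md", 10))
```

The extension check is case-insensitive, so `report.PDF` is treated the same as `report.pdf`.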

## License

This project is licensed under the Apache 2.0 License.

---

Powered by [Gradio](https://gradio.app) and [Hugging Face](https://huggingface.co)