Sethu Iyer
title: ttsdoc
emoji: 🌖
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0

ttsdoc 🌖

ttsdoc is a Text-to-Speech (TTS) application that reads your documents aloud. It uses the Parler TTS Mini v1 model to generate high-quality audio from text, whether typed in directly or taken from an uploaded PDF, TXT, or DOCX file.

Features

  • 📄 Support for PDF, TXT, and DOCX file uploads
  • ✍️ Direct text input option
  • 🗣️ Customizable voice descriptions
  • ⏱️ Adjustable maximum audio duration
  • 🚀 GPU-accelerated audio generation
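
As a rough illustration of how the three upload types could be turned into plain text before synthesis, here is a minimal sketch. It is not taken from `app.py`: the function name `extract_text` is hypothetical, DOCX parsing uses only the standard library, and the PDF branch assumes the third-party `pypdf` package, which the app may or may not use.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by WordprocessingML (the format inside .docx files).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_text(path):
    """Return the plain text of a .txt, .docx, or .pdf file (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".txt":
        return path.read_text(encoding="utf-8")

    if suffix == ".docx":
        # A .docx file is a zip archive; the body text lives in word/document.xml.
        with zipfile.ZipFile(path) as archive:
            root = ET.fromstring(archive.read("word/document.xml"))
        paragraphs = []
        for para in root.iter(W_NS + "p"):
            # Join all text runs of one paragraph into a single line.
            paragraphs.append("".join(node.text or "" for node in para.iter(W_NS + "t")))
        return "\n".join(paragraphs)

    if suffix == ".pdf":
        # Assumption: pypdf is installed. Imported lazily so the other
        # branches work without it.
        from pypdf import PdfReader

        reader = PdfReader(str(path))
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    raise ValueError(f"Unsupported file type: {suffix}")
```

Scanned PDFs that contain only images would yield empty text with this approach, since no OCR is involved.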

How to Use

  1. Upload a PDF, TXT, or DOCX file or enter text directly.
  2. Customize the voice description if desired.
  3. Adjust the maximum audio duration.
  4. Click "Generate Audio" to create the TTS output.

Tips for Best Results

  • For longer texts, audio is generated only up to the specified maximum duration, so the output may stop before the end of the text.
  • Experiment with different voice descriptions to achieve the desired output.
  • Use punctuation to control pacing and intonation in the generated speech.
  • For optimal quality, try to keep individual sentences or paragraphs concise.
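
One way to follow the last two tips is to pre-split long text at sentence-ending punctuation and group the sentences into short chunks. The sketch below shows that idea under stated assumptions: `split_into_chunks` and the `max_chars` limit of 200 are illustrative choices, not something ttsdoc is known to do internally.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The regular expression is deliberately simple and will also split after abbreviations such as "Dr."; a proper sentence tokenizer would handle those cases better.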

Technical Details

  • This demo uses the Parler TTS Mini v1 model.
  • Audio generation is GPU-accelerated for faster processing.
  • Maximum file size for uploads: 5 MB
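
The details above can be sketched in code roughly as follows. This is an illustration, not the contents of `app.py`: the helper names are hypothetical, the 5 MB limit is assumed to mean 5 × 1024 × 1024 bytes, and the duration cap is applied here by trimming the generated samples, whereas the app may limit generation in a different way. The `synthesize` function follows the usage shown in the Parler-TTS project README and needs `torch`, `transformers`, and `parler_tts` installed; those imports are done lazily so the two small helpers work on their own.

```python
import os

# Assumption: "5MB" is interpreted as binary megabytes.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def within_upload_limit(path, limit_bytes=MAX_UPLOAD_BYTES):
    """Return True if the file at path is no larger than limit_bytes."""
    return os.path.getsize(path) <= limit_bytes


def truncate_to_duration(samples, sampling_rate, max_seconds):
    """Keep only the first max_seconds of audio (works on lists and arrays)."""
    return samples[: int(max_seconds * sampling_rate)]


def synthesize(text, description, max_seconds=30.0,
               model_id="parler-tts/parler-tts-mini-v1"):
    """Generate speech for text in the voice given by description.

    Returns (audio_samples, sampling_rate). Heavy dependencies are imported
    here so the helpers above stay usable without them.
    """
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    # Use the GPU when one is available, otherwise fall back to the CPU.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The voice description conditions the model; the text is the prompt.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    rate = model.config.sampling_rate
    return truncate_to_duration(audio, rate, max_seconds), rate
```

A call such as `synthesize("Hello there.", "A calm female voice with clear audio.")` downloads the model on first use, so in a real app the model and tokenizer would normally be loaded once at startup rather than on every request.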

License

This project is licensed under the Apache 2.0 License.


Powered by Gradio and Hugging Face