CLI
WIP
Dataset
The Dataset.bat web UI (python webui_dataset.py) consists of two steps: slicing audio and transcribing the resulting WAV files.
Slice audio
python slice.py -i <input_dir> -o <output_dir> -m <min_sec> -M <max_sec>
Required:
input_dir: Path to the directory containing the audio files to slice.
output_dir: Path to the directory where the sliced audio files will be saved.
Optional:
min_sec: Minimum duration of the sliced audio files in seconds (default: 2).
max_sec: Maximum duration of the sliced audio files in seconds (default: 12).
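As a minimal sketch of how the slicing step might be scripted, the helper below assembles the slice.py command line from the flags documented above. The function name build_slice_cmd and the directory names are hypothetical, and the sketch assumes slice.py is in the current working directory.

```python
import shlex


def build_slice_cmd(input_dir, output_dir, min_sec=None, max_sec=None):
    """Build the argv list for slice.py using only the documented flags."""
    cmd = ["python", "slice.py", "-i", input_dir, "-o", output_dir]
    # Optional flags are omitted when unset so slice.py falls back to its defaults.
    if min_sec is not None:
        cmd += ["-m", str(min_sec)]
    if max_sec is not None:
        cmd += ["-M", str(max_sec)]
    return cmd


# Hypothetical directory names, chosen only for illustration.
cmd = build_slice_cmd("inputs", "raw", min_sec=2, max_sec=12)
print(shlex.join(cmd))
```

To actually run the step, the list can be passed to subprocess.run(cmd, check=True) from the directory that contains slice.py.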
Transcribe wavs
python transcribe.py -i <input_dir> -o <output_file> --speaker_name <speaker_name>
Required:
input_dir: Path to the directory containing the audio files to transcribe.
output_file: Path to the file where the transcriptions will be saved.
speaker_name: Name of the speaker.
Optional:
--initial_prompt: Initial prompt to use for the transcription (the default value is specific to Japanese).
--device: cuda or cpu (default: cuda).
--language: jp or en (default: jp).
--model: Whisper model (default: large-v3).
--compute_type: Compute type (default: bfloat16).
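The transcription command can be assembled the same way. The sketch below is illustrative only: build_transcribe_cmd, the paths, and the speaker name are hypothetical, and the helper accepts only the option names listed above so that a typo does not silently reach transcribe.py.

```python
import shlex

# Option names taken from the list above; anything else is rejected.
ALLOWED_OPTIONS = {"initial_prompt", "device", "language", "model", "compute_type"}


def build_transcribe_cmd(input_dir, output_file, speaker_name, **options):
    """Build the argv list for transcribe.py from the documented flags."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"undocumented option(s): {sorted(unknown)}")
    cmd = [
        "python", "transcribe.py",
        "-i", input_dir,
        "-o", output_file,
        "--speaker_name", speaker_name,
    ]
    # Each keyword option becomes "--<name> <value>", in the order given.
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Hypothetical paths and speaker name, chosen only for illustration.
cmd = build_transcribe_cmd("raw", "transcript.txt", "speaker1",
                           device="cpu", language="en")
print(shlex.join(cmd))
```

Options left out of the call are not added to the command, so transcribe.py applies its own defaults for them.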
Train
The Train.bat web UI (python webui_train.py) consists of the following steps.
Preprocess audio
python resample.py -i <input_dir> -o <output_dir> [--normalize] [--trim]
Required:
input_dir: Path to the directory containing the audio files to preprocess.
output_dir: Path to the directory where the preprocessed audio files will be saved.
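For the preprocessing step, the two bracketed switches in the usage line take no value. A short sketch of building that command follows; build_resample_cmd and the directory names are hypothetical, and the sketch makes no assumption about what --normalize and --trim do beyond passing them through.

```python
import shlex


def build_resample_cmd(input_dir, output_dir, normalize=False, trim=False):
    """Build the argv list for resample.py; switches are added only when set."""
    cmd = ["python", "resample.py", "-i", input_dir, "-o", output_dir]
    if normalize:
        cmd.append("--normalize")
    if trim:
        cmd.append("--trim")
    return cmd


# Hypothetical directory names, chosen only for illustration.
cmd = build_resample_cmd("raw", "wavs", normalize=True)
print(shlex.join(cmd))
```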
TO BE WRITTEN (WIP)