Spaces:

kkvc-hf
/

Style-Bert-VITS2-IM

Running

TAKESHI0\ogawa

6165a5e 3 months ago

1.6 kB

	# CLI

	WIP

	## Dataset

	`Dataset.bat` webui (`python webui_dataset.py`) consists of slice audio and transcribe wavs.

	### Slice audio

	```bash
	python slice.py -i <input_dir> -o <output_dir> -m <min_sec> -M <max_sec>
	```

	Required:
	- `input_dir`: Path to the directory containing the audio files to slice.
	- `output_dir`: Path to the directory where the sliced audio files will be saved.

	Optional:
	- `min_sec`: Minimum duration of the sliced audio files in seconds (default 2).
	- `max_sec`: Maximum duration of the sliced audio files in seconds (default 12).

	### Transcribe wavs

	```bash
	python transcribe.py -i <input_dir> -o <output_file> --speaker_name <speaker_name>
	```

	Required:
	- `input_dir`: Path to the directory containing the audio files to transcribe.
	- `output_file`: Path to the file where the transcriptions will be saved.
	- `speaker_name`: Name of the speaker.

	Optional
	- `--initial_prompt`: Initial prompt to use for the transcription (default value is specific to Japanese).
	- `--device`: `cuda` or `cpu` (default: `cuda`).
	- `--language`: `jp`, `en`, or `en` (default: `jp`).
	- `--model`: Whisper model, default: `large-v3`
	- `--compute_type`: default: `bfloat16`

	## Train

	`Train.bat` webui (`python webui_train.py`) consists of the following.

	### Preprocess audio
	```bash
	python resample.py -i <input_dir> -o <output_dir> [--normalize] [--trim]
	```

	Required:
	- `input_dir`: Path to the directory containing the audio files to preprocess.
	- `output_dir`: Path to the directory where the preprocessed audio files will be saved.

	TO BE WRITTEN (WIP)

	これいる？