20240121 Update

  1. Added is_share to the config. In environments such as Colab, set it to True to expose the WebUI on a public URL.
  2. Added English translation support to the WebUI for English-language systems.
  3. cmd-asr now checks whether the FunASR model is present; if it is not found in the default directory, it is downloaded from ModelScope.
  4. Attempted to fix the ZeroDivisionError in SoVITS training reported in Issue 79 by filtering out zero-length samples, among other changes.
  5. Cleaned up cached audio and other files in the TEMP folder.
  6. Significantly reduced cases where the synthesized audio includes the tail end of the reference audio.

20240122 Update

  1. Fixed an issue where very short output files returned a repeat of the reference audio.
  2. Tested native support for English and Japanese training (Japanese training requires the root directory path to be free of non-English special characters).
  3. Improved audio path checking: reading from an incorrect input path now reports that the path does not exist instead of raising an ffmpeg error.

20240123 Update

  1. Fixed NaN values produced during Hubert extraction, which led to ZeroDivisionError in SoVITS/GPT training.
  2. Added support for quick model switching in the inference WebUI.
  3. Optimized the model file sorting logic.
  4. Replaced jieba with jieba_fast for Chinese word segmentation.

20240126 Update

  1. Added support for mixed Chinese-English and mixed Japanese-English output text.
  2. Added an optional segmentation mode for output.
  3. Fixed UVR5 exiting automatically when it read a directory.
  4. Fixed inference errors caused by multiple consecutive newlines.
  5. Removed redundant logs in the inference WebUI.
  6. Added support for training and inference on Mac.
  7. Single precision is now forced automatically on GPUs that do not support half precision, and is always used for CPU inference.

20240128 Update

  1. Fixed incorrect pronunciation when numbers are converted to Chinese characters.
  2. Fixed an issue where the first few characters of a sentence were swallowed.
  3. Added restrictions that reject unreasonable reference audio lengths.
  4. Fixed the issue where GPT training did not save checkpoints.
  5. Completed the model download process in the Dockerfile.
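The reference-length restriction in item 3 amounts to a duration check before inference. A minimal sketch, assuming a hypothetical `check_ref_duration` helper and assumed bounds of 3 to 10 seconds (the limits the project actually enforces may differ):

```python
# Assumed bounds for illustration; the project's actual limits may differ.
MIN_REF_SECONDS = 3.0
MAX_REF_SECONDS = 10.0


def check_ref_duration(num_samples: int, sample_rate: int) -> float:
    """Hypothetical helper: return the clip length in seconds, or raise if out of range."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration = num_samples / sample_rate
    if not MIN_REF_SECONDS <= duration <= MAX_REF_SECONDS:
        raise ValueError(
            f"reference audio is {duration:.2f}s; expected "
            f"{MIN_REF_SECONDS:g}-{MAX_REF_SECONDS:g}s"
        )
    return duration
```

Rejecting the clip up front gives the user a clear message instead of letting an overly short or long reference silently degrade the output.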

20240129 Update

  1. Switched training to single precision on GPUs such as the 16 series, which have problems with half-precision training.
  2. Tested and updated a working Colab version.
  3. Fixed interface mismatch errors caused by git-cloning the ModelScope FunASR repository while an older version of FunASR was installed.

20240130 Update

  1. Double quotes are now removed automatically from all path inputs, preventing errors when novice users paste paths that include them.
  2. Fixed issues with splitting on Chinese and English punctuation and with adding punctuation at the start and end of sentences.
  3. Added splitting by punctuation.
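Stripping quotes from pasted paths is a small normalization step. A sketch, assuming a hypothetical `clean_path` helper that trims whitespace, newlines, and surrounding double quotes (such as those added by Windows "Copy as path"); the project's own code may differ:

```python
def clean_path(raw: str) -> str:
    """Hypothetical helper: normalize a path pasted by the user.

    Trims surrounding whitespace/newlines, then surrounding double quotes,
    then any whitespace that was inside the quotes.
    """
    return raw.strip().strip('"').strip()


cleaned = clean_path('"D:\\audio\\my voice"\n')  # -> D:\audio\my voice
```

Only quotes at the ends are removed, so spaces inside the path are preserved.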

20240201 Update

  1. Fixed a UVR5 format-reading error that caused separation to fail.
  2. Added automatic segmentation and language detection for mixed Chinese-Japanese-English text.

20240202 Update

  1. Fixed an error when saving the output filename if the ASR path ended with /.
  2. PR 377 introduced PaddleSpeech's Normalizer to fix readings such as "xx.xx%" (percent signs) and "元/吨" being read as "元吨" instead of "元每吨" (yuan per ton), and fixed underscore errors.
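The trailing-slash bug in item 1 is a common pitfall: `os.path.basename` returns an empty string for a path ending in a separator. A sketch, assuming the ASR output file is named after the input folder; `list_name_for` is a hypothetical helper, not the project's code:

```python
import os


def list_name_for(input_dir: str) -> str:
    """Hypothetical helper: derive '<folder>.list' from an input folder path."""
    # normpath drops trailing separators, so "wavs/" and "wavs" both yield "wavs".
    folder = os.path.basename(os.path.normpath(input_dir))
    return f"{folder}.list"


broken = os.path.basename("output/slicer_opt/")  # "" without normalization
fixed = list_name_for("output/slicer_opt/")      # "slicer_opt.list"
```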

20240207 Update

  1. Fixed mixed-up language parameters that degraded Chinese inference quality, as reported in Issue 391.
  2. PR 403 adapted UVR5 to newer versions of librosa.
  3. Commit 14a2851 fixed the UVR5 "inf everywhere" error: the is_half parameter was not converted to a boolean, so inference always ran in half precision, which produced inf values on 16 series GPUs.
  4. Optimized the English text frontend.
  5. Fixed Gradio dependencies.
  6. If the root directory is left blank during dataset preparation, the full paths in the .list file are now read automatically.
  7. Integrated Faster Whisper ASR for Japanese and English.
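The is_half bug in item 3 reflects a general Python pitfall: any non-empty string, including "False", is truthy. A sketch of explicit parsing, assuming the flag arrives as a string (for example from a command-line argument or environment variable); `parse_bool` is a hypothetical helper:

```python
def parse_bool(value) -> bool:
    """Hypothetical helper: interpret a config flag that may arrive as a string."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


# The pitfall: bool() only checks for emptiness, so this is True.
naive_is_half = bool("False")
# Explicit parsing yields False, so inference would run in single precision.
is_half = parse_bool("False")
```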

20240208 Update

  1. Commit 59f35ad attempted to fix GPT training hanging on Windows 10 1909 and on systems with a Traditional Chinese system language (Issue 232).

20240212 Update

  1. Optimized the Faster Whisper and FunASR logic, and switched Faster Whisper to download from a mirror to avoid Hugging Face connection problems.
  2. PR 457 added an experimental DPO loss training option, which mitigates repeated and missing characters in GPT output by constructing negative samples during training, and exposed several inference parameters in the inference WebUI.

20240214 Update

  1. Chinese experiment names are now supported in training (they previously caused errors).
  2. Made DPO training optional rather than mandatory; when it is enabled, the batch size is automatically halved. Fixed new parameters not being passed through in the inference WebUI.

20240216 Update

  1. Added support for input without reference text.
  2. Fixed bugs in the Chinese frontend reported in Issue 475.

20240221 Update

  1. Added a noise reduction option to data processing (denoised audio is output at a 16 kHz sampling rate only, so use it only when background noise is significant).
  2. PR 559, PR 556, PR 532, PR 507, and PR 509 optimized Chinese and Japanese frontend processing.
  3. On Mac, inference now runs on the CPU instead of MPS, because CPU inference is faster.
  4. Fixed the Colab public URL issue.

20240306 Update

  1. PR 672 accelerated inference by 50% (tested on RTX3090 + PyTorch 2.2.1 + CU11.8 + Win10 + Py39).
  2. Using Faster Whisper for non-Chinese ASR no longer requires downloading the Chinese FunASR model first.
  3. PR 610 fixed the UVR5 de-reverb model, whose setting was reversed.
  4. PR 675 made Faster Whisper fall back to CPU inference automatically when CUDA is unavailable.
  5. PR 573 modified the is_half check to ensure proper CPU inference on Mac.

202403/202404/202405 Update

Minor Fixes:

  1. Fixed issues with the no-reference text mode.
  2. Optimized the Chinese and English text frontend.
  3. Improved API format.
  4. Fixed CMD format issues.
  5. Added error prompts for unsupported languages during training data processing.
  6. Fixed the bug in Hubert extraction.

Major Fixes:

  1. Fixed SoVITS training not freezing VQ (which could cause quality degradation).
  2. Added a quick inference branch.

20240610 Update

Minor Fixes:

  1. PR 1168 and PR 1169 improved the handling of text input that is pure punctuation or contains multiple punctuation marks.
  2. Commit 501a74a fixed the CMD format for MDXNet de-reverb in UVR5 so that paths containing spaces are supported.
  3. PR 1159 fixed the progress bar logic for SoVITS training in s2_train.py.

Major Fixes:

  1. Commit 99f09c8 fixed the WebUI's GPT fine-tuning not reading the BERT features of Chinese input text, which made training inconsistent with inference and could degrade quality. Caution: if you previously fine-tuned on a large amount of data, retuning the model is recommended to improve quality.

20240706 Update

Minor Fixes:

  1. Commit 1250670 fixed the default batch size being a decimal value in CPU inference.
  2. PR 1258, PR 1265, and PR 1267 fixed an issue where an exception during denoising or ASR aborted processing of all remaining audio files.
  3. PR 1253 fixed decimal numbers being split apart when splitting by punctuation.
  4. Commit a208698 fixed the multi-process save logic for multi-GPU training.
  5. PR 1251 removed the redundant my_utils.
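The decimal fix in item 3 comes down to not treating a `.` between two digits as a sentence boundary. A minimal sketch, assuming a hypothetical `split_by_punct` helper and punctuation set (the project's text frontend is more involved):

```python
# Assumed punctuation set for illustration; the real frontend's set may differ.
PUNCT = set("，。！？,.!?；;")


def split_by_punct(text: str) -> list[str]:
    """Hypothetical helper: split after punctuation, but keep decimals like 3.14 intact."""
    segments, buf = [], []
    for i, ch in enumerate(text):
        buf.append(ch)
        if ch in PUNCT:
            # A '.' with a digit on both sides is a decimal point, not a break.
            is_decimal = (
                ch == "."
                and 0 < i < len(text) - 1
                and text[i - 1].isdigit()
                and text[i + 1].isdigit()
            )
            if not is_decimal:
                segments.append("".join(buf))
                buf = []
    if buf:
        segments.append("".join(buf))
    # Drop whitespace-only pieces and trim the rest.
    return [s.strip() for s in segments if s.strip()]
```

For example, "The price is 3.14 dollars. Buy now!" splits into two segments rather than breaking "3.14" apart.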

Major Fixes:

  1. The accelerated inference code from PR 672 has been validated and merged into the main branch, with inference results consistent with the base version. Accelerated inference is also supported in no-reference-text mode.

Future updates will continue to verify the consistency of changes in the fast_inference branch.

20240727 Update

Minor Fixes:

  1. PR 1298 cleaned up redundant i18n code.
  2. PR 1299 fixed issues where trailing slashes in user file paths caused command line errors.
  3. PR 756 fixed the step calculation logic in GPT training.

Major Fixes:

  1. Commit 9588a3c added speech rate adjustment for synthesis, with an option to freeze randomness so that only the speech rate changes.

20240806 Update

  1. PR 1306 and PR 1356 added support for the BS RoFormer vocal/accompaniment separation model. Commit e62e965 enabled FP16 inference.
  2. Improved the Chinese text frontend.
    • PR 488 added support for polyphonic characters (v2 only);
    • PR 987 added quantifier support;
    • PR 1351 added support for arithmetic and basic math formulas;
    • PR 1404 fixed mixed text errors.
  3. PR 1355 made the WebUI fill in paths automatically when processing audio.
  4. Commits bce451a and 4c8b761 optimized GPU detection logic.
  5. Commit 8a10147 added support for Cantonese ASR.
  6. Added support for GPT-SoVITS v2.
  7. PR 1387 optimized timing logic.