metadata
language: ja
tags:
  - speech
license: other

distilhubert-ft-japanese-50k

A DistilHuBERT model fine-tuned (more precisely, trained further from the original checkpoint) for 50k steps on Japanese speech, using the JVS corpus, the Tsukuyomi-Chan corpus, Amitaro's ITA corpus V2.1, and my own recordings of the ITA corpus.

Attention

The JVS corpus was used to train this checkpoint. Please read and accept its terms of use before using the checkpoint.
(Those terms of use also apply to this checkpoint, including when you use it as part of another project.)

References

Original repositories (many thanks!):
S3PRL

  • Used for training, with minor modifications to train on my own datasets.

distilhubert (hf)

Note: As with the original model, this model has no tokenizer because it was pretrained on audio alone. To use it for speech recognition, create a tokenizer and fine-tune the model on labeled text data. See this blog for a more detailed explanation of how to fine-tune the model.
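Since the checkpoint ships without a tokenizer, the most direct use is as a speech feature extractor. The following is a minimal sketch, not a tested recipe for this checkpoint. It assumes the repository id `TylorShine/distilhubert-ft-japanese-50k`, 16 kHz mono input, and the default HuBERT convolutional front end (none of which is stated in this card). The names `MODEL_ID`, `expected_num_frames`, and `extract_features` are hypothetical helpers introduced for illustration.

```python
# Assumed repository id; adjust it to wherever the checkpoint is actually hosted.
MODEL_ID = "TylorShine/distilhubert-ft-japanese-50k"
# Assumed input sampling rate (the usual rate for HuBERT-style models).
SAMPLE_RATE = 16_000


def expected_num_frames(num_samples,
                        kernels=(10, 3, 3, 3, 3, 2, 2),
                        strides=(5, 2, 2, 2, 2, 2, 2)):
    """Number of output frames for a waveform of `num_samples` samples.

    The kernel and stride values are the default HuBERT convolutional
    front end (about one frame per 20 ms at 16 kHz). That this checkpoint
    keeps those defaults is an assumption.
    """
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        length = (length - kernel) // stride + 1
    return length


def extract_features(waveform):
    """Return hidden states of shape (1, frames, hidden_size).

    `waveform` is a 1-D sequence of float samples at SAMPLE_RATE.
    Imports are local so that the helper above works without torch.
    """
    import torch
    from transformers import HubertModel

    model = HubertModel.from_pretrained(MODEL_ID)
    model.eval()
    inputs = torch.as_tensor(waveform, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        outputs = model(inputs)
    return outputs.last_hidden_state
```

Calling `extract_features` downloads the weights on first use. Whether the input should be normalized beforehand depends on how the checkpoint was trained, so check its preprocessor configuration if one is provided.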

Usage

See this blog for more information on how to fine-tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by HubertForCTC.
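As a rough illustration of the two steps mentioned above (creating a tokenizer vocabulary and swapping in `HubertForCTC`), here is a minimal sketch. It assumes a character-level vocabulary in the style of the blog, and the repository id `TylorShine/distilhubert-ft-japanese-50k`. The names `MODEL_ID`, `build_char_vocab`, and `build_ctc_model` are hypothetical; the actual training loop is left to the blog.

```python
# Assumed repository id; adjust it to wherever the checkpoint is actually hosted.
MODEL_ID = "TylorShine/distilhubert-ft-japanese-50k"


def build_char_vocab(transcripts):
    """Build a character-level vocabulary from labeled transcripts.

    Special tokens come first: "[PAD]" doubles as the CTC blank,
    "[UNK]" covers unseen characters, and "|" stands in for spaces.
    """
    chars = set()
    for text in transcripts:
        chars.update(text.replace(" ", ""))
    tokens = ["[PAD]", "[UNK]", "|"] + sorted(chars)
    return {token: index for index, token in enumerate(tokens)}


def build_ctc_model(vocab):
    """Load the checkpoint with a freshly initialized CTC head.

    Imports are local so that build_char_vocab works without transformers.
    """
    from transformers import HubertForCTC

    model = HubertForCTC.from_pretrained(
        MODEL_ID,
        vocab_size=len(vocab),          # size of the new output layer
        pad_token_id=vocab["[PAD]"],    # used as the CTC blank token
        ctc_loss_reduction="mean",
    )
    # Keep the convolutional front end fixed, as the blog does for Wav2Vec2.
    model.freeze_feature_encoder()
    return model
```

The resulting vocabulary can be written to a `vocab.json` file and passed to `Wav2Vec2CTCTokenizer`, as the blog describes. The CTC head starts from random weights, so the model only becomes usable for recognition after fine-tuning on labeled data.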

Note: This is not the best possible checkpoint; I expect it to become more accurate with continued training. I'll try to continue when I have time.

Credits

  ■つくよみちゃんコーパス(CV.夢前黎)

https://tyc.rei-yumesaki.net/material/corpus/

あみたろの声素材工房

https://amitaro.net/

Thanks!