ESPnet: end-to-end speech processing toolkit

ESPnet is an end-to-end speech processing toolkit covering end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, speaker diarization, spoken language understanding, and so on. ESPnet uses pytorch as a deep learning engine and also follows Kaldi style data processing, feature extraction/format, and recipes to provide a complete setup for various speech processing experiments.

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

@inproceedings{hayashi2020espnet,
  title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}

@inproceedings{inaguma-etal-2020-espnet,
    title = "{ESP}net-{ST}: All-in-One Speech Translation Toolkit",
    author = "Inaguma, Hirofumi  and
      Kiyono, Shun  and
      Duh, Kevin  and
      Karita, Shigeki  and
      Yalta, Nelson  and
      Hayashi, Tomoki  and
      Watanabe, Shinji",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.34",
    pages = "302--311",
}

@inproceedings{li2020espnet,
  title={{ESPnet-SE}: End-to-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
  author={Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph Boeddeker and Zhuo Chen and Shinji Watanabe},
  booktitle={Proceedings of IEEE Spoken Language Technology Workshop (SLT)},
  pages={785--792},
  year={2021},
  organization={IEEE},
}

@article{arora2021espnet,
  title={ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet},
  author={Arora, Siddhant and Dalmia, Siddharth and Denisov, Pavel and Chang, Xuankai and Ueda, Yushi and Peng, Yifan and Zhang, Yuekai and Kumar, Sujay and Ganesan, Karthik and Yan, Brian and others},
  journal={arXiv preprint arXiv:2111.14706},
  year={2021}
}

</details>

ESPnet

AI & ML interests

ESPnet: end-to-end speech processing toolkit

spaces 3

Svs

TTS

S2st

models 481

espnet/DCASE23.AudioCaptioning.PreTrained

espnet/DCASE23.AudioCaptioning.FineTuned

espnet/voxcelebs12_ebranchformer_base

espnet/rats_ska_wavlm_frozen

espnet/owsm_ctc_v3.2_ft_1B

espnet/owsm_ctc_v3.1_1B

espnet/mixdata_svs_visinger2_spkembed_lang_pretrained

espnet/libriheavy_small_ebranchformer

espnet/UniverSLU-17-Task-Specifier

espnet/UniverSLU-17-Natural-Phrase

datasets 16

espnet/floras_2

espnet/floras

espnet/DSUChallenge2024

espnet/espnet_speechlm_pretrained_tts

espnet/espnet_speechlm_pretrained_asr

espnet/ace-kising-segments

espnet/ace-opencpop-segments

espnet/wikitongues

espnet/jesus_dramas

espnet/mms_ulab_v2

AI & ML interests

Team members 133

ESPnet: end-to-end speech processing toolkit

spaces 3 Sort: Recently updated

Svs

TTS

S2st

models 481 Sort: Recently updated

datasets 16 Sort: Recently updated

spaces 3

models 481

datasets 16