## ESPnet2 ASR model
### `espnet/DCASE23.AudioCaptioning.PreTrained`
This model was trained by Shikhar Bharadwaj using the clotho_v2 recipe in ESPnet.
### Demo: How to use in ESPnet2
Follow the ESPnet installation instructions if you haven't done that already.
```bash
cd espnet
git checkout 09779acc8f744a3bf0dc20f4c0ac7ba91df4736d
pip install -e .
cd egs2/clotho_v2/asr1
./run.sh --skip_data_prep false --skip_train true --download_model espnet/dcase23.aac.pt
```
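For captioning a single file without running the whole recipe, something like the following Python sketch may work. It relies on ESPnet2's `Speech2Text.from_pretrained` loader (which needs the `espnet_model_zoo` package) and on `soundfile` for reading audio. Treat the rest as assumptions: that this checkpoint loads through the generic ASR inference class under the model tag shown in this card, that the input is a mono waveform already at the sampling rate the model was trained on, and the helper name `caption_audio` itself, which is hypothetical.

```python
# Minimal sketch, not the recipe's official inference path.
MODEL_TAG = "espnet/DCASE23.AudioCaptioning.PreTrained"  # tag as shown in this card


def caption_audio(wav_path: str, model_tag: str = MODEL_TAG) -> str:
    """Return the top-1 caption for one audio file (hypothetical helper)."""
    # Imported lazily so the module can be loaded without ESPnet installed.
    import soundfile as sf
    from espnet2.bin.asr_inference import Speech2Text

    # Downloads and caches the checkpoint on first use (requires espnet_model_zoo).
    speech2text = Speech2Text.from_pretrained(model_tag=model_tag)

    # ASSUMPTION: mono audio at the model's expected sampling rate; no resampling here.
    speech, _sample_rate = sf.read(wav_path)

    # Each n-best entry is a tuple (text, tokens, token_ids, hypothesis).
    nbests = speech2text(speech)
    text, *_ = nbests[0]
    return text
```

Calling `caption_audio("example.wav")`, where `example.wav` is a placeholder path, would then return the highest-scoring caption as a string.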
# RESULTS
## Environments
- date: `Mon Nov 4 13:48:05 CST 2024`
- python version: `3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:33:10) [GCC 12.3.0]`
- espnet version: `espnet 202402`
- pytorch version: `pytorch 2.4.0`
- Git hash: `aa8910e107440c14a9e22e35e252c562e636552e`
  - Commit date: `Sun Oct 13 10:22:11 2024 -0500`

## exp/asr_pt.initfix.bigbatch512.lr2e-4.weighted12layers.20241103.145125/
=====================================================
Split: evaluation Evaluation over 1045 predictions.
=====================================================
cider_d : 0.19368922695190316
spice : 0.0892430642605593
spider : 0.14146614560623122
sbert_sim : 0.47783581224140936
fer : 0.5301435406698565
fense : 0.2520530143112799
meteor : 0.13875538640747337
rouge_l : 0.2737659351281743
fer.add_tail_prob : 0.2866000533103943
fer.repeat_event_prob: 0.08905956894159317
fer.repeat_adv_prob : 0.0013483789516612887
fer.remove_conj_prob: 0.19881780445575714
fer.remove_verb_prob: 0.3627748191356659
fer.error_prob : 0.6818609833717346
spider_fl : 0.07657683145973274
=====================================================
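Some of the scores above are derived from others, which gives a quick way to sanity-check the table. The sketch below uses only the reported numbers. SPIDEr is the arithmetic mean of CIDEr-D and SPICE, so it can be recomputed exactly. FENSE and SPIDEr-FL apply a fluency penalty per caption before averaging, so the corpus-level products computed here are only rough approximations; the penalty weight of 0.9 is an assumption based on the common default, not something stated in this card.

```python
# Corpus-level scores copied from the evaluation table above.
scores = {
    "cider_d": 0.19368922695190316,
    "spice": 0.0892430642605593,
    "spider": 0.14146614560623122,
    "sbert_sim": 0.47783581224140936,
    "fer": 0.5301435406698565,
    "fense": 0.2520530143112799,
    "spider_fl": 0.07657683145973274,
}
NUM_PREDICTIONS = 1045  # "Evaluation over 1045 predictions"

# SPIDEr is the mean of CIDEr-D and SPICE.
spider = (scores["cider_d"] + scores["spice"]) / 2

# fer is the fraction of captions flagged as containing a fluency error.
flagged = round(scores["fer"] * NUM_PREDICTIONS)

# ASSUMPTION: penalty weight 0.9 (common default; not stated in this card).
PENALTY = 0.9
keep = 1 - PENALTY * scores["fer"]

# Rough corpus-level approximations; the real metrics penalize each caption
# individually before averaging, so an exact match is not expected.
fense_approx = scores["sbert_sim"] * keep
spider_fl_approx = spider * keep

print(f"spider recomputed : {spider:.6f} (reported {scores['spider']:.6f})")
print(f"flagged captions  : {flagged} of {NUM_PREDICTIONS}")
print(f"fense approx      : {fense_approx:.4f} (reported {scores['fense']:.4f})")
print(f"spider_fl approx  : {spider_fl_approx:.4f} (reported {scores['spider_fl']:.4f})")
```

The recomputed SPIDEr should agree with the reported value up to floating-point rounding, while the two penalized scores should land near, but not exactly on, the reported figures.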
### Citing ESPnet
```BibTex
@inproceedings{watanabe2018espnet,
author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
title={{ESPnet}: End-to-End Speech Processing Toolkit},
year={2018},
booktitle={Proceedings of Interspeech},
pages={2207--2211},
doi={10.21437/Interspeech.2018-1456},
url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
```
or arXiv:
```BibTex
@misc{watanabe2018espnet,
title={ESPnet: End-to-End Speech Processing Toolkit},
author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
year={2018},
eprint={1804.00015},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```