Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/
- (ACL'24) OWSM-CTC: a CTC-based non-autoregressive speech foundation model for multilingual ASR, ST, and LID.
- OWSM-CTC v3.1 further fine-tuned on v3.2 data to improve long-form robustness.
- (INTERSPEECH'24) OWSM v3.1 medium with 1.02B parameters.
- (INTERSPEECH'24) OWSM v3.1 small with 367M parameters.
- (INTERSPEECH'24) OWSM v3.1 base with 101M parameters.
- (INTERSPEECH'24) OWSM v3.1 small trained on a data subset with low-restriction licenses.
- (INTERSPEECH'24) OWSM small with data cleaning.
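For reference, below is a minimal usage sketch with ESPnet's `Speech2Text` interface for the attention-based OWSM checkpoints. The Hugging Face model tag (`espnet/owsm_v3.1_ebf_small`), keyword arguments, and hypothesis format follow the public OWSM model cards and are assumptions that may differ across ESPnet releases.

```python
# Sketch: English ASR with an OWSM v3.1 checkpoint via ESPnet (assumed API/tag).
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

s2t = Speech2Text.from_pretrained(
    "espnet/owsm_v3.1_ebf_small",  # assumed Hugging Face model tag
    device="cpu",                  # or "cuda"
    beam_size=5,
    ctc_weight=0.0,
    maxlenratio=0.0,
    lang_sym="<eng>",              # target language token
    task_sym="<asr>",              # <asr> for transcription; ST uses a translation task token
)

speech, rate = sf.read("sample_16k.wav")  # 16 kHz mono audio expected
text, *_ = s2t(speech)[0]                 # best hypothesis; tuple layout may vary by version
print(text)
```

The CTC-based OWSM-CTC models use a separate non-autoregressive inference entry point in ESPnet rather than this beam-search interface.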