How can I use pipeline and output the transcript as SRT with timestamps?
Thanks for fine-tuning the Cantonese model. May I ask how I can use pipeline and output the transcript as SRT with timestamps? I can run the audio file and get a paragraph of Cantonese text, but I couldn't find a way to produce SRT-format output using Whisper.
I usually use WhisperX to get word/sentence-level timestamps. I am not sure exactly how to export them to SRT files, but I have now uploaded the CTS version of this model, so you don't need to convert it.
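For the SRT export itself, here is a minimal sketch rather than a tested recipe for this model. It assumes the transcript is a list of chunks shaped like what the transformers ASR pipeline returns with `return_timestamps=True`, i.e. dicts with a `"timestamp"` (start, end) tuple in seconds and a `"text"` string. The helper names `format_srt_time` and `chunks_to_srt`, and the file name `audio.wav` in the comment, are made up for illustration.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks) -> str:
    """Build SRT text from a list of {"timestamp": (start, end), "text": str}."""
    blocks = []
    index = 1
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        # Skip chunks with no start time or no text.
        if start is None or not text:
            continue
        # Assumption: if the end time is missing, reuse the start time.
        if end is None:
            end = start
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
        index += 1
    # Each block ends with a newline; joining with "\n" leaves a blank line between cues.
    return "\n".join(blocks)


# Hypothetical usage with the transformers pipeline (not executed here):
#   from transformers import pipeline
#   asr = pipeline("automatic-speech-recognition",
#                  model="alvanlii/whisper-small-cantonese", chunk_length_s=30)
#   result = asr("audio.wav", return_timestamps=True)
#   with open("audio.srt", "w", encoding="utf-8") as f:
#       f.write(chunks_to_srt(result["chunks"]))
```

If you go the WhisperX route instead, its segments carry separate `"start"` and `"end"` keys, so you would need to map each segment to the `(start, end)` tuple shape before calling `chunks_to_srt`.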
Thanks for your reply. I am a beginner with Whisper and your model. May I ask what "CTS version" means and stands for?
Do you mean I can use whisperx.load_model("alvanlii/whisper-small-cantonese") to load this model directly instead of using the pipeline method?
Thanks.
Hmm, it might be easier for you to use this one