Time-codes from whisper
Is it possible to get the time codes of each word of the generated text from Whisper?
Hey @EranML ! You can get the time stamps of each segment: https://huggingface.co/openai/whisper-large-v2#long-form-transcription
For word-level time stamps, you can check out WhisperX: https://github.com/m-bain/whisperX
@EranML , The latest whisper version (20230314) supports word-level timestamps and word-level posteriors. (See the --word_timestamps option, and set it to True.)
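For anyone who wants to try this from Python rather than the CLI, here is a minimal sketch, assuming the `openai-whisper` package (20230314 or later) is installed. The helper names `extract_words` and `transcribe_with_words` are made up for illustration, and `"base"` is just an example model size.

```python
from typing import Any, Dict, List, Tuple


def extract_words(result: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Flatten a Whisper result dict into (word, start, end) tuples (hypothetical helper)."""
    words = []
    for segment in result.get("segments", []):
        # The "words" key is only present when word_timestamps=True was requested.
        for w in segment.get("words", []):
            # Whisper words usually carry a leading space, so strip it.
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words


def transcribe_with_words(audio_path: str, model_name: str = "base"):
    """Transcribe a file and return word-level timestamps (hypothetical helper)."""
    # Imported lazily so extract_words can be used without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, word_timestamps=True)
    return extract_words(result)
```

Calling `transcribe_with_words("audio.mp3")` should then give one tuple per word, with start and end times in seconds. Each word entry also carries a `probability` field, which this sketch ignores.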
We're looking to add this to transformers too :)
@sanchit-gandhi , any idea when word-level timestamps will be added to transformers?
Hi all, I also noticed that the milliseconds part of the timestamps is rounded off, which leads to premature cut-offs when the timestamps are used to cut audio segments:
00:00:00.000 --> 00:00:05.000: There's so many things here, and in my house, that people are always saying,
00:00:05.000 --> 00:00:07.000: where did you get that? And I'm like, I don't know.
Is there a way to turn off the rounding, so we get the actual milliseconds? Thank you for your help.
~~
Update: for anyone who needs higher-resolution timestamps, I found this library, and it works great at stabilizing the milliseconds portion of the VTT timecodes: https://pypi.org/project/stable-ts/
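If you end up writing the VTT cues yourself from float start/end times in seconds, a small formatter like the sketch below keeps the milliseconds instead of truncating to whole seconds. The function names are hypothetical, and this only preserves whatever precision the input values already have. It cannot recover detail that the model output never contained.

```python
def format_vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an HH:MM:SS.mmm VTT timecode."""
    # Work in integer milliseconds to avoid float drift in the divisions below.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_cue(start: float, end: float, text: str) -> str:
    """Build one VTT cue: a timing line followed by the cue text."""
    timing = f"{format_vtt_timestamp(start)} --> {format_vtt_timestamp(end)}"
    return f"{timing}\n{text.strip()}"
```

For example, `format_cue(5.0, 7.48, " where did you get that?")` produces a cue whose timing line ends at `00:00:07.480` rather than being rounded to a whole second.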