How to fine-tune with timestamps
#122, opened by deepdml
I have my own labeled dataset, and I want to fine-tune Whisper on it in a way that also keeps the timestamps accurate. How can I do that using the Transformers library?
@sanchit-gandhi
For fine-tuning I'm following https://huggingface.co/blog/fine-tune-whisper, but I didn't find anything there related to timestamps.
For distil-whisper I've read that it's possible to use timestamps when pseudo-labelling: https://github.com/huggingface/distil-whisper/tree/main/training#1-pseudo-labelling. How can we adapt this to fine-tuning? Thanks
Timestamps are just tokens. All you need to do is figure out how to inject the correct timestamp token at the correct position in the text.
Take a look at the vocab.json to understand how timestamps are tokenized.
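To make that concrete, here is a rough sketch of building a label string with timestamp tokens around each segment. It assumes the usual Whisper convention of tokens like `<|0.00|>` in 0.02 s steps up to 30 s, and that your segment times are relative to the start of the 30-second audio window. The helper names `format_timestamp_token` and `build_timestamped_label` are made up for this example and are not part of Transformers.

```python
def format_timestamp_token(seconds, precision=0.02, max_time=30.0):
    """Snap a time in seconds to the timestamp grid and render it as a token string.

    Assumption: timestamp tokens look like <|1.24|>, spaced `precision` seconds
    apart, and cover the range 0.00 to `max_time`.
    """
    clipped = min(max(seconds, 0.0), max_time)
    steps = round(clipped / precision)
    return f"<|{steps * precision:.2f}|>"


def build_timestamped_label(segments):
    """Build one label string from (start, end, text) segments.

    Each segment is wrapped in a start token and an end token; times are
    assumed to be relative to the start of the audio window.
    """
    parts = []
    for start, end, text in segments:
        start_token = format_timestamp_token(start)
        end_token = format_timestamp_token(end)
        # Leading space before the text, matching how plain transcripts are usually written.
        parts.append(f"{start_token} {text.strip()}{end_token}")
    return "".join(parts)


segments = [(0.0, 2.4, "Hello world."), (2.4, 5.0, "How are you?")]
label = build_timestamped_label(segments)
print(label)
# <|0.00|> Hello world.<|2.40|><|2.40|> How are you?<|5.00|>
```

You would then use this string as the label text in place of the plain transcript. Before training, check that your tokenizer version maps each `<|t.tt|>` string to a single token id rather than splitting it into pieces.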