Hello, have you made changes to the tokenizer? As you can see, the same strings result in longer token sequences with your tokenizer. The same thing happens if I use WhisperTokenizer. I'm looking for the reason behind this.
Okay, it's the timestamp tokens that broke
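For anyone hitting the same thing, here is a rough sketch of how the mismatch could be narrowed down: run the same strings through both tokenizers and keep only the ones whose token counts differ. Everything named here is hypothetical. `find_length_mismatches`, `with_timestamps` and `without_timestamps` are toy stand-ins written for this example, not the real tokenizers, and the `<|0.00|>` marker format is assumed for illustration.

```python
import re
from typing import Callable, Iterable, List, Tuple

# A tokenizer here is any callable that maps a string to a list of tokens.
Tokenize = Callable[[str], List[str]]


def find_length_mismatches(
    texts: Iterable[str], reference: Tokenize, candidate: Tokenize
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (text, reference_tokens, candidate_tokens) wherever the counts differ."""
    mismatches = []
    for text in texts:
        ref_tokens = reference(text)
        cand_tokens = candidate(text)
        if len(ref_tokens) != len(cand_tokens):
            mismatches.append((text, ref_tokens, cand_tokens))
    return mismatches


# Assumed timestamp marker format, e.g. "<|0.00|>" (illustration only).
TIMESTAMP = re.compile(r"(<\|\d+\.\d+\|>)")


def with_timestamps(text: str) -> List[str]:
    """Toy tokenizer that keeps each timestamp marker as a single token."""
    tokens: List[str] = []
    for piece in TIMESTAMP.split(text):
        if TIMESTAMP.fullmatch(piece):
            tokens.append(piece)
        else:
            tokens.extend(piece.split())
    return tokens


def without_timestamps(text: str) -> List[str]:
    """Toy tokenizer that does not know the markers and splits them into characters."""
    tokens: List[str] = []
    for piece in TIMESTAMP.split(text):
        if TIMESTAMP.fullmatch(piece):
            tokens.extend(list(piece))
        else:
            tokens.extend(piece.split())
    return tokens


samples = ["hello world", "<|0.00|> hello world<|1.20|>"]
mismatches = find_length_mismatches(samples, with_timestamps, without_timestamps)
for text, ref_tokens, cand_tokens in mismatches:
    print(repr(text), len(ref_tokens), len(cand_tokens))
```

With real tokenizers, the two callables could be replaced by something like `lambda s: tokenizer.tokenize(s)` for each tokenizer being compared. If only the strings containing timestamp markers show up in the result, that points at the timestamp tokens rather than the rest of the vocabulary.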