Is this the best punctuator at the moment or are there better ones?

#4
by MonsterMMORPG - opened

I am using bert-restore-punctuation to fix the transcriptions that Whisper generates for my YouTube channel videos.

My YouTube channel (technology, education and programming): https://www.youtube.com/SECourses

In some cases Whisper stops punctuating, and I don't know why. When that happens the punctuation has to be fixed afterwards, otherwise the text is very poor as a subtitle.

So if there is a punctuator that currently works better than felflare/bert-restore-punctuation, could anyone let me know?

And this is my video on how to use Whisper, if anyone is interested: https://youtu.be/msj3wuYf3d8

Have you found an alternative?

I am still using the same one.

@MonsterMMORPG your videos, e.g. https://www.youtube.com/watch?v=dpM02YMj8FY, now seem to contain punctuation.
Was it added by YouTube automatically, or did you rely on tools like ChatGPT or these online HuggingFace punctuators?

Thanks!

I follow 4 steps:

first, transcribe with Whisper
then punctuate with felflare/bert-restore-punctuation
then fix the capital letters
then manually fix the remaining Whisper transcription errors
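
The capitalization step in the list above could be sketched roughly as follows. This is only a minimal illustration, not the script actually used in this thread: the function name `fix_capitalization` is hypothetical, and the rules (capitalize after `.`, `!` or `?`, and the standalone pronoun "i") are an assumption about what "fix capital letters" covers.

```python
import re


def fix_capitalization(text: str) -> str:
    """Capitalize the first letter of the text and of each new sentence.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    Abbreviations such as "e.g. the" will be capitalized incorrectly.
    """
    def upper(match: re.Match) -> str:
        # Keep the punctuation/whitespace prefix, uppercase the letter.
        return match.group(1) + match.group(2).upper()

    # A lowercase letter at the very start, or right after sentence-ending
    # punctuation plus whitespace.
    text = re.sub(r"(^\s*|[.!?]\s+)([a-z])", upper, text)
    # The standalone English pronoun "i" (also covers "i'm", "i'll").
    text = re.sub(r"\bi\b", "I", text)
    return text


if __name__ == "__main__":
    print(fix_capitalization("hello there. i think it works! does it? yes."))
```

A rule-based pass like this will not restore proper nouns (for example "whisper" to "Whisper"), so some manual review is still likely to be needed.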

That's a lot of work.
I'm thinking about converting a good punctuator to ONNX and making a simple web page that lets you enter a text or a YouTube link, and automatically runs the network over the existing raw transcript.
Would that be useful for you?

Also I found felflare/bert-restore-punctuation not to be as good as https://huggingface.co/unikei/distilbert-base-re-punctuate

Plus, I wasn't able to convert felflare/bert-restore-punctuation to ONNX, but unikei/distilbert-base-re-punctuate works in ONNX.

Have you tried unikei/distilbert-base-re-punctuate?

Actually, only the last part is manual.

I haven't tested unikei/distilbert-base-re-punctuate yet.

Nice, thank you.

@MonsterMMORPG
I created a simple web page where you can try punctuating any YouTube video.
Let me know if you have time to try it, and what the results look like for you.
https://www.appblit.com/scribe

Laurent

It is working decently.

I tested it on this video: https://youtu.be/PNA9p94JmtY

Thanks for testing it!
Now it also streams the results as they come (chunk by chunk, because DistilBERT accepts inputs of at most 512 tokens), and it works on text of any length.
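
To illustrate the chunk-by-chunk idea, here is a minimal Python sketch; it is not the code behind the web page. The names `chunk_words` and `punctuate_long_text` are hypothetical, and `punctuate` stands for whatever function calls the model. It assumes that a word count is a safe proxy for the 512-token limit: subword tokenizers usually produce more tokens than words, so the default of 200 words is a conservative guess, not a measured value.

```python
from typing import Callable, Iterator


def chunk_words(text: str, max_words: int = 200) -> Iterator[str]:
    """Yield consecutive chunks of at most `max_words` whitespace-separated words."""
    words = text.split()
    for start in range(0, len(words), max_words):
        yield " ".join(words[start:start + max_words])


def punctuate_long_text(
    text: str,
    punctuate: Callable[[str], str],
    max_words: int = 200,
) -> Iterator[str]:
    """Stream punctuated chunks one at a time.

    `punctuate` is a placeholder for the real model call; each result is
    yielded as soon as its chunk has been processed.
    """
    for chunk in chunk_words(text, max_words):
        yield punctuate(chunk)


if __name__ == "__main__":
    # str.upper stands in for the model so the example runs without downloads.
    for piece in punctuate_long_text("one two three four five", str.upper, max_words=2):
        print(piece)
```

Cutting at a fixed word count can split a sentence across two chunks, which may hurt punctuation near the boundaries; overlapping the chunks slightly is one possible way to reduce that.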

I would like to add paragraph breaks: do you know of a model that performs this task?

Also, would automatic summaries and chapters be useful to you? I saw several videos that have chapters with links to facilitate navigation.
Or is that already provided by YouTube?

I'm @ldenoue on Twitter by the way

Laurent

I add chapters manually

I also divide the text into paragraphs myself with my app, but if YouTube can auto-sync, I much prefer that.

I followed you on Twitter. This is mine: https://twitter.com/GozukaraFurkan

I made a comparison between the punctuator I was using and the new unikei/distilbert-base-re-punctuate.

Here are the results:

https://twitter.com/GozukaraFurkan/status/1715045585324003358

Very good analysis. How did you handle the long texts?

I load the text from a file in a Python app, if that is what you mean.

I used the example code they put in the README. They are amazing :)
