openai/whisper-large-v3 · how to handle input audio files with either white noise or general noise and no speech

unk1911

Dec 14, 2023

It seems this model performs extremely well when there's actual discernible language/conversation, but if I test with an audio clip that contains only indiscernible noise and no speech, it produces a bunch of gibberish. Is there any way to prevent it from generating gibberish?

Here's an example of the gibberish produced from a white noise audio clip:

2023-12-14 01:35:51,945 [INFO] Takk for watching! 1 tbsps of butter 1 tbsps of flour 1 tbsps of baking powder 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1 tbsps of baking soda 1.5 kg of pork belly [0:00:03.359228s]

skypro1111

Dec 14, 2023

Use VAD (voice activity detection) and cut the no-speech chunks before passing the audio to the model.
https://huggingface.co/pyannote/voice-activity-detection
https://github.com/snakers4/silero-vad
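
A rough sketch of the "cut no-speech chunks" step, in case it helps. It assumes the VAD gives you a list of dicts with `start` and `end` offsets in samples (I believe that is the shape silero-vad's `get_speech_timestamps` returns by default, but check the repo's README). The helper names `extract_speech` and `transcribe_if_speech` and the `transcribe` callback are made up for illustration; they are not part of Whisper or either VAD library.

```python
def extract_speech(audio, speech_timestamps, min_samples=0):
    """Return the slices of `audio` that the VAD marked as speech.

    `audio` can be anything sliceable (list, NumPy array, torch tensor).
    `speech_timestamps` is assumed to be a list of {"start": int, "end": int}
    dicts with offsets in samples.
    """
    segments = []
    for ts in speech_timestamps:
        start, end = ts["start"], ts["end"]
        # Drop very short detections, which are often false positives.
        if end - start >= min_samples:
            segments.append(audio[start:end])
    return segments


def transcribe_if_speech(audio, speech_timestamps, transcribe):
    """Call `transcribe` only on speech segments; return "" if there are none.

    `transcribe` is a hypothetical callback that wraps your Whisper call
    and returns a string for one audio segment.
    """
    segments = extract_speech(audio, speech_timestamps)
    if not segments:
        # Nothing but noise or silence: skip the model entirely,
        # so it never gets a chance to produce gibberish.
        return ""
    return " ".join(transcribe(segment) for segment in segments)
```

With a white-noise clip the VAD should return an empty list, so `transcribe_if_speech` returns an empty string without ever calling the model.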

unk1911

Dec 16, 2023

Thanks, this worked perfectly for me!