Xenova posted an update Jun 9
Introducing Whisper WebGPU: blazingly fast ML-powered speech recognition directly in your browser! 🚀 It supports multilingual transcription and translation across 100 languages! 🤯

The model runs locally, meaning no data leaves your device! 😍

Check it out! 👇
- Demo: Xenova/whisper-webgpu
- Source code: https://github.com/xenova/whisper-web/tree/experimental-webgpu

I love this. I think it would be very cool if we could get a WebGPU model running that could distinguish between different speakers in an audio sample (e.g., "Person 1, Person 2").

Keep up the good work!

Amazing progress, thank you!

For anyone trying this in Chrome on Linux: you may need to set some flags. A message appears in the Console when model loading gets stuck, suggesting a startup flag, but you can also toggle the same options at chrome://flags directly and relaunch. For me, I needed to enable both Vulkan and Unsafe WebGPU, and then it works (seriously fast, I should note!).
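For reference, launching Chrome with the equivalent command-line switches is an alternative to chrome://flags (a sketch; exact switch names can vary by Chrome version and channel):

```shell
# Launch Chrome on Linux with Vulkan and Unsafe WebGPU enabled.
# Equivalent to enabling the same entries in chrome://flags and relaunching.
google-chrome --enable-features=Vulkan --enable-unsafe-webgpu
```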

Oh dear God, I love this. ❤️

@Xenova can I ask how I could use this to transcribe text and store it? At the moment, when I speak, it recognizes everything perfectly, but it seems to only keep a limited set of sentences before overwriting them. Is there a way to transcribe and store locally at all?
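One hypothetical workaround, assuming you can hook the point in the app where each new transcription chunk arrives (the function names here are placeholders, not part of the demo's API): accumulate every chunk yourself instead of relying on the display buffer, then offer the result as a plain-text download.

```javascript
// Accumulate transcript chunks so nothing is lost when the display buffer
// overwrites older sentences. onTranscript() would be called wherever the
// app emits newly transcribed text (an assumed integration point).
const chunks = [];

function onTranscript(text) {
  chunks.push(text); // keep every chunk instead of overwriting
}

function fullTranscript() {
  return chunks.join('\n');
}

// In the browser, save the accumulated text locally as a .txt file.
function saveTranscript() {
  const blob = new Blob([fullTranscript()], { type: 'text/plain' });
  const a = document.createElement('a');
  a.href = URL.createObjectURL(blob);
  a.download = 'transcript.txt';
  a.click();
  URL.revokeObjectURL(a.href);
}
```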

Can you share a model on HF that is both ONNX-compiled and supports word-level timestamps?

Hello Xenova, I downloaded your YOLOS ONNX weights and ran inference, but the results are very bad (or my script decodes the model output badly). Could you share your YOLOS inference script?

Any chance of some pointers on how to use the model in plain JavaScript? I had a look at the React/TS code and wasn't sure where to begin (I'm not the strongest JS coder, so I may be missing something obvious!).
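A minimal sketch of what plain-JS usage might look like with Transformers.js, the library behind this demo. The model id and audio URL below are placeholder assumptions, and the full WebGPU demo layers chunking, streaming, and worker threads on top of this:

```javascript
// Minimal browser sketch using Transformers.js as an ES module from a CDN.
// 'Xenova/whisper-tiny.en' and the audio URL are illustrative placeholders.
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';

// Create an automatic-speech-recognition pipeline. The model is downloaded
// once, then cached locally by the browser.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en'
);

// Transcribe an audio file by URL; the result is an object like { text: '...' }.
const output = await transcriber('https://example.com/audio.wav');
console.log(output.text);
```

The React/TS source mostly wraps this same pipeline call in a Web Worker so the UI stays responsive while the model runs.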