Xenova posted an update Jun 9
Introducing Whisper WebGPU: blazingly fast ML-powered speech recognition directly in your browser! 🚀 It supports multilingual transcription and translation across 100 languages! 🤯

The model runs locally, meaning no data leaves your device! 😍

Check it out! 👇
- Demo: Xenova/whisper-webgpu
- Source code: https://github.com/xenova/whisper-web/tree/experimental-webgpu

I love this. I think it would be very cool if we could get a WebGPU model running that could distinguish between different speakers in an audio sample (e.g., "Person 1, Person 2").

Keep up the good work!

Amazing progress, thank you!

For anyone trying this in Chrome on Linux: you may need to set some flags. A message appears in the Console when model loading gets stuck, suggesting a startup flag, but you can also toggle the same options at chrome://flags directly and relaunch. For me, I needed to enable both Vulkan and Unsafe WebGPU, and then it works (seriously fast, I should note!).
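For reference, launching Chrome with the equivalent command-line switches is an alternative to chrome://flags (a sketch; exact switch names can vary by Chrome version and channel):

```shell
# Launch Chrome on Linux with Vulkan and Unsafe WebGPU enabled.
# Equivalent to enabling the same entries in chrome://flags and relaunching.
google-chrome --enable-features=Vulkan --enable-unsafe-webgpu
```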

Oh dear God, I love this. ❤️

@Xenova can I ask how I could use this to transcribe text and store it? At the moment, when I speak, it recognizes everything perfectly, but it seems to only keep a limited set of sentences before overwriting them. Is there a way to transcribe and store locally at all?
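One hypothetical workaround, assuming you can hook the point in the app where each new transcription chunk arrives (the function names here are placeholders, not part of the demo's API): accumulate every chunk yourself instead of relying on the display buffer, then offer the result as a plain-text download.

```javascript
// Accumulate transcript chunks so nothing is lost when the display buffer
// overwrites older sentences. onTranscript() would be called wherever the
// app emits newly transcribed text (an assumed integration point).
const chunks = [];

function onTranscript(text) {
  chunks.push(text); // keep every chunk instead of overwriting
}

function fullTranscript() {
  return chunks.join('\n');
}

// In the browser, save the accumulated text locally as a .txt file.
function saveTranscript() {
  const blob = new Blob([fullTranscript()], { type: 'text/plain' });
  const a = document.createElement('a');
  a.href = URL.createObjectURL(blob);
  a.download = 'transcript.txt';
  a.click();
  URL.revokeObjectURL(a.href);
}
```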

Can you share a model on HF that is both ONNX-compiled and supports word-level timestamps?

Hello Xenova, I downloaded your YOLOS ONNX weights and ran inference, but the results are very bad (or my script decodes the model output badly). Could you share your YOLOS inference script?

Any chance of some pointers on how to use the model in plain JavaScript? I had a look at the React/TS code and wasn't sure where to begin (I'm not the strongest JS coder, so I may be missing something obvious!).
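A minimal sketch of what plain-JS usage might look like with Transformers.js, the library behind this demo. The model id and audio URL below are placeholder assumptions, and the full WebGPU demo layers chunking, streaming, and worker threads on top of this:

```javascript
// Minimal browser sketch using Transformers.js as an ES module from a CDN.
// 'Xenova/whisper-tiny.en' and the audio URL are illustrative placeholders.
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';

// Create an automatic-speech-recognition pipeline. The model is downloaded
// once, then cached locally by the browser.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en'
);

// Transcribe an audio file by URL; the result is an object like { text: '...' }.
const output = await transcriber('https://example.com/audio.wav');
console.log(output.text);
```

The React/TS source mostly wraps this same pipeline call in a Web Worker so the UI stays responsive while the model runs.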