Feature: Add batch sizes 64 & 128

#103 opened by do-me

I am currently exploring the optimal batch size for fast embedding inference. I experimented with different batch sizes and found that 64, or even 128, often gives better throughput than 32; above roughly 200 it drops again.
If you added larger batch sizes, the results would improve further on WebGPU, though the optimum probably depends heavily on the chunk size as well.
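
Roughly, the benchmark just loops over candidate batch sizes and measures chunks embedded per second. A minimal sketch of that idea, assuming transformers.js v3 with its WebGPU backend; the model name, corpus, and batch-size list are placeholders, not anything fixed in this request:

```ts
// Sketch: time embedding throughput at several batch sizes.
// Assumes @huggingface/transformers v3 with WebGPU support in the browser.
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2", // example model; swap in the model under test
  { device: "webgpu" },
);

// Dummy corpus of short chunks; real-world chunk size shifts the optimum.
const texts = Array.from({ length: 1024 }, (_, i) => `chunk number ${i}`);

for (const batchSize of [32, 64, 128, 256]) {
  const t0 = performance.now();
  for (let i = 0; i < texts.length; i += batchSize) {
    // Passing an array embeds the whole slice as one batch.
    await extractor(texts.slice(i, i + batchSize), {
      pooling: "mean",
      normalize: true,
    });
  }
  const seconds = (performance.now() - t0) / 1000;
  console.log(`batch ${batchSize}: ${(texts.length / seconds).toFixed(1)} chunks/s`);
}
```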

Please refer to this GitHub comment for the measurements: https://github.com/do-me/SemanticFinder/issues/11#issuecomment-2343983733
I wrote a simple app to test batch sizes on real-world data: https://geo.rocks/semanticfinder-webgpu/
