---
license: mit
tags:
- code
- audio
- acceleration
- network
---

## Overview

This is a from-scratch implementation of whisper in C++. It is a proof of concept; further modifications and improvements are coming. Feedback is welcome in the corresponding GitHub repository, [precompAId](https://github.com/anycores/precompAId).

Binary contains:
* exe for testing the app quickly
* header and dll for building custom solutions
* main.cpp as an example of how to use the header (the exe is compiled from this)
* weights.xgdf (must be loaded into the graph; no other input is required)
* audios folder, containing examples to try the application
* convert.py for creating the right input for the application from an arbitrary audio file

Versions:
* for Windows there are 4 compiled versions
* each version corresponds to a level of available instruction sets
  * avx512: requires avx512F, avx512BW, avx512VL and FMA
  * avx2: requires avx2 and FMA
  * sse: requires sse4.1
  * default: requires no intrinsic-related CPU features

## Quick start

Example usage of whisper.exe (the dll must be discoverable by the exe):

```
whisper.exe weights.xgdf audios\voice_example1.pb
```

Example compilation (with clang, from the root):

```
clang++ main.cpp win64\whisper.lib -o whisper.exe
```

Example conversion:

```
python convert.py --ipath audios\voice_example_orig1.wav --opath voice_example.pb
```

## Implementation info

Tested on:
* Windows 11 and Ubuntu 20
* Intel i7, 11th gen
* clang 16.0.6 as compiler

Current properties:
* fp32
* [this tool](https://github.com/archspec/archspec) can help list the available CPU features when selecting the right library version

## Further Notes

Improved versions will arrive regularly. Feedback is welcome, especially on the following:
* features to add (input format, expected output format etc.)
* devices (there are plans to extend to mobiles, IPUs etc.)
* models (what other models would be great to accelerate)
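
As a rough aid for choosing among the four builds listed under Versions, the requirements can be written down as a small lookup. The sketch below is not part of the package: `select_build` and `BUILD_REQUIREMENTS` are hypothetical names, and the flag spellings (`avx512f`, `sse4_1`, ...) are assumed to follow the Linux `/proc/cpuinfo` convention, so they may need adjusting for whatever tool reports the CPU features on your machine.

```python
# Hypothetical helper, not shipped with the binaries.
# Maps CPU feature flags to the build names listed under "Versions".
# Flag spellings follow Linux /proc/cpuinfo (e.g. "sse4_1"); this is an assumption.

# Builds ordered from most to least demanding, with the flags each one requires.
BUILD_REQUIREMENTS = [
    ("avx512", {"avx512f", "avx512bw", "avx512vl", "fma"}),
    ("avx2", {"avx2", "fma"}),
    ("sse", {"sse4_1"}),
    ("default", set()),
]


def select_build(cpu_flags):
    """Return the most demanding build whose required flags are all present."""
    available = {flag.lower() for flag in cpu_flags}
    for name, required in BUILD_REQUIREMENTS:
        if required <= available:  # subset test: every required flag is available
            return name
    return "default"


if __name__ == "__main__":
    print(select_build(["sse4_1", "avx2", "fma"]))  # prints "avx2"
```

The list is checked from the most demanding build downwards, so a CPU that satisfies several rows gets the one using the widest instruction set, and a CPU that satisfies none falls back to `default`.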