|
--- |
|
license: mit |
|
tags: |
|
- code |
|
- audio |
|
- acceleration |
|
- network |
|
--- |
|
## Overview |
|
|
|
This is a from-scratch implementation of Whisper in C++. |
|
This is a proof of concept. Further modifications and improvements are coming. |
|
|
|
Feedback is welcome in the corresponding GitHub repository, [precompAId](https://github.com/anycores/precompAId). |
|
|
|
The binary package contains: |
|
* exe for testing the app quickly |
|
* header and dll for building custom solutions |
|
* main.cpp as an example of how to use the header (the exe is compiled from it) |
|
* weights.xdf (must be loaded into the graph; no other input is required) |
|
* audios folder, containing examples to try the application |
|
* convert.py for creating the right input for the application from an arbitrary audio file |
|
|
|
Versions: |
|
* for Windows there are 4 compiled versions |
|
* each version corresponds to a level of available instruction sets |
|
* avx512: requires avx512F, avx512BW, avx512VL and FMA |
|
* avx2: requires avx2 and FMA |
|
* sse: requires sse4.1 |
|
* default: requires no intrinsics-related CPU features |
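The version list above can be turned into a small selection routine. The following is a minimal sketch, not part of the shipped package: the names `CpuFeatures`, `Build`, `pick_build`, `build_name` and `detect_cpu` are hypothetical, and the detection step assumes a GCC- or Clang-compatible compiler targeting x86, where `__builtin_cpu_supports` is available. On other toolchains the sketch reports no features and falls back to the default build.

```cpp
// Hypothetical helper for choosing between the four builds listed above.
// Nothing here belongs to the whisper header; all names are illustrative.

// Feature flags relevant to the four builds.
struct CpuFeatures {
    bool sse41;
    bool avx2;
    bool fma;
    bool avx512f;
    bool avx512bw;
    bool avx512vl;
};

enum class Build { Default, Sse, Avx2, Avx512 };

// Pick the most capable build whose requirements are all satisfied,
// following the requirements stated in the version list.
constexpr Build pick_build(const CpuFeatures& f) {
    return (f.avx512f && f.avx512bw && f.avx512vl && f.fma) ? Build::Avx512
         : (f.avx2 && f.fma)                                ? Build::Avx2
         : f.sse41                                          ? Build::Sse
                                                            : Build::Default;
}

// Map a build to the version name used in the list above.
constexpr const char* build_name(Build b) {
    return b == Build::Avx512 ? "avx512"
         : b == Build::Avx2   ? "avx2"
         : b == Build::Sse    ? "sse"
                              : "default";
}

// Query the running CPU. Assumption: GCC/Clang builtins on an x86 target.
// Elsewhere, every flag is false, so pick_build() returns Build::Default.
inline CpuFeatures detect_cpu() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return CpuFeatures{
        __builtin_cpu_supports("sse4.1") != 0,
        __builtin_cpu_supports("avx2") != 0,
        __builtin_cpu_supports("fma") != 0,
        __builtin_cpu_supports("avx512f") != 0,
        __builtin_cpu_supports("avx512bw") != 0,
        __builtin_cpu_supports("avx512vl") != 0,
    };
#else
    return CpuFeatures{false, false, false, false, false, false};
#endif
}
```

Calling `build_name(pick_build(detect_cpu()))` yields the name of the most capable version the current machine satisfies. Keeping the decision in `pick_build` separate from detection means the rule can be checked without access to a particular CPU.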
|
|
|
|
|
## Quick start |
|
|
|
Example usage of whisper.exe (the dll must be discoverable by the exe): |
|
``` |
|
whisper.exe weights.xgdf audios\voice_example1.pb |
|
``` |
|
|
|
Example compilation (with clang, from the root folder): |
|
``` |
|
clang++ main.cpp win64\whisper.lib -o whisper.exe |
|
``` |
|
|
|
Example of converting an audio file into the input format: |
|
``` |
|
python convert.py --ipath audios\voice_example_orig1.wav --opath voice_example.pb |
|
``` |
|
|
|
## Implementation info |
|
|
|
Tested on: |
|
* Windows 11 and Ubuntu 20 |
|
* Intel i7, 11th gen |
|
* clang 16.0.6 as compiler |
|
|
|
Current properties: |
|
* fp32 |
|
* [this tool](https://github.com/archspec/archspec) can help list the available CPU features when selecting the right library version |
|
|
|
## Further Notes |
|
|
|
Improved versions will arrive regularly. |
|
Feedback is welcome, especially on the following: |
|
* features to be added (input format, expected output format, etc.) |
|
* devices (the plan is to extend to mobile devices, IPUs, etc.) |
|
* models (which other models would be worth accelerating) |