devasheeshG/whisper_medium_fp16_transformers

Versions:

CUDA: 12.1
cuDNN Version: 8.9.2.26_1.0-1_amd64

tensorflow Version: 2.12.0
torch Version: 2.1.0.dev20230606+cu12135
transformers Version: 4.30.2
accelerate Version: 0.20.3

Model Benchmarks:

RAM: 2.8 GB (Original_Model: 5.5 GB)
VRAM: 1812 MB (Original_Model: 6 GB)

test.wav: 23 s (Multilingual Speech i.e. English+Hindi)

Processing time in seconds on each device

Device Name	float32 (Original)	float16	CUDA Cores	Tensor Cores
RTX 3060	1.7	1.1	3,584	112
GTX 1660 Super	OOM	3.3	1,408	N/A
Colab (Tesla T4)	2.8	2.2	2,560	320
Colab (CPU)	35	N/A	N/A	N/A
M1 (CPU)	-	-	-	-
M1 (GPU -> 'mps')	-	-	-	-

NOTE: Tensor Cores are efficient at mixed-precision calculations.
CPU: torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or Colab CPU), so no float16 time is reported for CPU.
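The precision note above can be sketched as a small device check. This is only an illustration, not the code shipped in this repo: `pick_dtype_name` and `load_whisper` are hypothetical helper names, and the sketch loads the original openai/whisper-medium checkpoint through the standard transformers API rather than this repo's `Model` class.

```python
def pick_dtype_name(device: str) -> str:
    """Return the torch dtype name to load the model with on `device`."""
    # Assumption: float16 only on CUDA devices. CPU falls back to float32,
    # and so does 'mps' (conservatively, since the table has no numbers for it).
    return "float16" if device.startswith("cuda") else "float32"


def load_whisper(device: str, model_id: str = "openai/whisper-medium"):
    """Load a Whisper checkpoint in the precision chosen for `device`."""
    # Imported lazily so pick_dtype_name can be used without torch installed.
    import torch
    from transformers import WhisperForConditionalGeneration

    dtype = getattr(torch, pick_dtype_name(device))
    model = WhisperForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=dtype
    )
    return model.to(device)
```

For example, `load_whisper("cuda:0")` would load the weights in float16, while `load_whisper("cpu")` would keep them in float32.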

Punctuation: True

Model Error Benchmarks:

WER: Word Error Rate
MER: Match Error Rate
WIL: Word Information Lost
WIP: Word Information Preserved
CER: Character Error Rate

Hindi to Hindi (test.tsv) Common Voice 14.0

Test done on RTX 3060 on 2557 Samples

	WER	MER	WIL	WIP	CER
Original_Model (54 min)	52.02	47.86	66.82	33.17	23.76
This_Model (38 min)	54.97	47.86	66.83	33.16	30.23

Hindi to English (test.csv) Custom Dataset

Test done on RTX 3060 on 1000 Samples

	WER	MER	WIL	WIP	CER
Original_Model (30 min)	-	-	-	-	-
This_Model (20 min)	-	-	-	-	-

English (LibriSpeech -> test-clean)

Test done on RTX 3060 on __ Samples

	WER	MER	WIL	WIP	CER
Original_Model	-	-	-	-	-
This_Model	-	-	-	-	-

English (LibriSpeech -> test-other)

Test done on RTX 3060 on __ Samples

	WER	MER	WIL	WIP	CER
Original_Model	-	-	-	-	-
This_Model	-	-	-	-	-

The 'jiwer' library is used for these calculations.
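To make the metric names concrete, here is a dependency-free sketch of how WER, MER, WIL and WIP follow from a word-level alignment. It is not the jiwer implementation and not the evaluation script behind the tables; `align_counts` and `error_metrics` are hypothetical names, and the formulas are the standard definitions in terms of hits (H), substitutions (S), deletions (D) and insertions (I). The functions return fractions, whereas the tables report percentages.

```python
def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for two token lists."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the end of both sequences, counting each operation.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def error_metrics(reference: str, hypothesis: str) -> dict:
    """Compute WER, MER, WIL and WIP as fractions (assumes a non-empty reference)."""
    h, s, d, i = align_counts(reference.split(), hypothesis.split())
    n_ref, n_hyp = h + s + d, h + s + i
    errors = s + d + i
    wip = (h / n_ref) * (h / n_hyp) if n_hyp else 0.0
    return {
        "wer": errors / n_ref,
        "mer": errors / (h + errors),
        "wil": 1.0 - wip,
        "wip": wip,
    }


scores = error_metrics("the cat sat on the mat", "the cat sit on mat")
```

A CER-style figure can be obtained from the same counts by passing character lists, for example `align_counts(list(reference), list(hypothesis))`, instead of word lists.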

Code for conversion:

Will be uploaded to GitHub soon.

Usage

This repo contains an __init__.py file with all the code needed to use this model.

First, clone this repo and keep all of its files in one folder.

Make sure you have git-lfs installed (https://git-lfs.com).

git lfs install
git clone https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers

Please try it in a Jupyter notebook.

# Import the Model
from whisper_medium_fp16_transformers import Model, load_audio, pad_or_trim

# Initialise the model
model = Model(
    model_name_or_path='whisper_medium_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)

# Load Audio
audio = load_audio('whisper_medium_fp16_transformers/test.wav')
audio = pad_or_trim(audio)

# Transcribe (First transcription takes time)
model.transcribe(audio)
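Because the first transcription is slower than later ones, timing comparisons are easier to read when a warm-up call is excluded. The helper below is a minimal sketch of that idea; `time_calls` is a hypothetical name, and this README does not state how the benchmark figures were actually measured.

```python
import time


def time_calls(fn, *args, warmup=1, repeats=3):
    """Return the mean wall-clock seconds per call of fn(*args) after warm-up."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are run but not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats
```

With the objects from the usage snippet, `time_calls(model.transcribe, audio)` would give an average per-call time that excludes the slow first run.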

Credits

This is an fp16 version of openai/whisper-medium.
