whisper-small-hi-cv
This model is a fine-tuned version of openai/whisper-small on the Common Voice 15 dataset. It achieves the following results on the evaluation set:
- Wer: 14.0178
- Cer: 05.8824
Evaluation
from datasets import load_dataset,load_metric,Audio
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch
import torchaudio
test_dataset = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="test")
wer = load_metric("wer")
cer = load_metric("cer")
processor = WhisperProcessor.from_pretrained("SakshiRathi77/whisper-hindi-kagglex")
model = WhisperForConditionalGeneration.from_pretrained("SakshiRathi77/whisper-hindi-kagglex").to("cuda")
test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16000))
def map_to_pred(batch):
audio = batch["audio"]
input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
batch["reference"] = processor.tokenizer._normalize(batch['sentence'])
with torch.no_grad():
predicted_ids = model.generate(input_features.to("cuda"))[0]
transcription = processor.decode(predicted_ids)
batch["prediction"] = processor.tokenizer._normalize(transcription)
return batch
result = test_dataset.map(map_to_pred)
print("WER: {:2f}".format(100 * wer.compute(predictions=result["prediction"], references=result["reference"])))
print("CER: {:2f}".format(100 * cer.compute(predictions=result["prediction"], references=result["reference"])))
WER: 23.1361
CER: 10.4366
- Downloads last month
- 6
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for SakshiRathi77/whisper-hindi-kagglex
Base model
openai/whisper-smallDatasets used to train SakshiRathi77/whisper-hindi-kagglex
Evaluation results
- Test WER on Common Voice 15self-reported13.991
- Test CER on Common Voice 15self-reported5.884
- Test WER on Common Voice 13self-reported23.136
- Test CER on Common Voice 13self-reported10.437