	---
	language:
	- ar
	license: apache-2.0
	base_model: openai/whisper-small
	tags:
	- whisper-event
	- generated_from_trainer
	datasets:
	- mozilla-foundation/common_voice_11_0
	metrics:
	- wer
	model-index:
	- name: Whisper Small ar - Mohammed Bakheet
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Common Voice 11.0
	      type: mozilla-foundation/common_voice_11_0
	      config: ar
	      split: test
	      args: ar
	    metrics:
	    - name: Wer
	      type: wer
	      value: 20.32288342406608
	---

	# Whisper Small ar - Mohammed Bakheet

	A small speech recognition model that achieves high accuracy when transcribing Arabic speech.

	This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 11.0 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.2758
	- Wer: 20.3229

	## Model description

	This model adapts openai/whisper-small to Arabic automatic speech recognition by fine-tuning it on the Arabic (`ar`) configuration of the Common Voice 11.0 dataset. It reaches a word error rate (WER) of 20.32 on the test split. Adding data augmentation may further improve performance.

	## Intended uses & limitations

	The following example transcribes the first ten samples of the Common Voice 11.0 Arabic test split:

	```python
	from datasets import Audio, load_dataset
	from transformers import WhisperForConditionalGeneration, WhisperProcessor

	# Load the Arabic test split of Common Voice 11.0 (requires a Hugging Face access token)
	test_dataset = load_dataset("mozilla-foundation/common_voice_11_0", "ar", split="test", use_auth_token=True, trust_remote_code=True)

	# Load the processor and model from mohammed/whisper-small-arabic-cv-11
	processor = WhisperProcessor.from_pretrained("mohammed/whisper-small-arabic-cv-11")
	model = WhisperForConditionalGeneration.from_pretrained("mohammed/whisper-small-arabic-cv-11")
	model.config.forced_decoder_ids = None

	# Resample the audio to 16 kHz, the sampling rate Whisper expects
	test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16000))

	# Transcribe the first 10 examples and print them next to the reference sentences
	for i in range(10):
	    example = test_dataset[i]
	    sample = example["audio"]
	    input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features
	    predicted_ids = model.generate(input_features)
	    # Decode the token IDs to text, dropping special tokens such as the language and task markers
	    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
	    print(f"{i} Reference Sentence: {example['sentence']}")
	    print(f"{i} Predicted Sentence: {transcription[0]}")
	```

	The output is:

	```
	0 Reference Sentence: زارني في أوائل الشهر بدري
	0 Predicted Sentence: زارني في أوائل الشهر بدري
	1 Reference Sentence: إبنك بطل.
	1 Predicted Sentence: ابنك بطل
	2 Reference Sentence: الواعظ الأمرد هذا الذي
	2 Predicted Sentence: الوعز الأمرد هذا الذي
	3 Reference Sentence: سمح له هذا بالتخصص في البرونز الصغير، الذي يتم إنتاجه بشكل رئيسي ومربح للتصدير.
	3 Predicted Sentence: صمح له هازب التخزوس في البرونز الصغير الذي زيت معنى به بشكل رئيسي من غربح للتصدير
	4 Reference Sentence: ألديك قلم ؟
	4 Predicted Sentence: ألديك قلم
	5 Reference Sentence: يا نديمي قسم بي الى الصهباء
	5 Predicted Sentence: يا نديمي قد سنبي إلى الصحباء
	6 Reference Sentence: إنك تكبر المشكلة.
	6 Predicted Sentence: إنك تكبر المشكلة
	7 Reference Sentence: يرغب أن يلتقي بك.
	7 Predicted Sentence: يرغب أن يلتقي بك
	8 Reference Sentence: إنهم لا يعرفون لماذا حتى.
	8 Predicted Sentence: إنهم لا يعرفون لماذا حبت
	9 Reference Sentence: سيسعدني مساعدتك أي وقت تحب.
	9 Predicted Sentence: سيسعد لمساعدتك أي وقت تحب
	```
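
To make the WER metric concrete, the sketch below computes a per-sentence word error rate for two of the reference/prediction pairs listed above. It is a minimal, self-contained illustration: the `word_error_rate` helper is hypothetical (it is not part of this model card or of any library), it splits on whitespace only, and it applies no text normalization, so punctuation attached to a word counts as part of that word. The evaluation behind the reported 20.32 WER may have used different preprocessing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Two pairs copied from the output listing (samples 0 and 8)
pairs = [
    ("زارني في أوائل الشهر بدري", "زارني في أوائل الشهر بدري"),
    ("إنهم لا يعرفون لماذا حتى.", "إنهم لا يعرفون لماذا حبت"),
]

for reference, hypothesis in pairs:
    print(f"WER = {word_error_rate(reference, hypothesis):.2f}")
```

Corpus-level WER is normally computed as total edits divided by total reference words over the whole test set, not as the mean of per-sentence values, so the numbers from this helper are only indicative.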

	## Training and evaluation data

	The model was trained on the Arabic (`ar`) configuration of the Common Voice 11.0 dataset and evaluated on its test split.

	## Training procedure

	The model was trained on a machine with a 64-core CPU, an Nvidia 4070 Ti GPU with 24 GB of VRAM, and 100 GB of disk space. GPU utilization reached 100% during training. The training hyperparameters are listed below.

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 2
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 32
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- training_steps: 5000
	- mixed_precision_training: Native AMP
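
Two of the values above follow from the others, and a short sketch may help show how. It assumes a single training device (so `total_train_batch_size` is the per-device batch size times the gradient accumulation steps) and the usual linear-with-warmup schedule shape: a linear ramp from 0 to the peak learning rate over the warmup steps, then a linear decay to 0 at the final step. The function names are illustrative and are not taken from the training script.

```python
LEARNING_RATE = 1e-05
WARMUP_STEPS = 500
TRAINING_STEPS = 5000
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 16


def effective_batch_size(per_device: int, accumulation: int, devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step (devices=1 is an assumption)."""
    return per_device * accumulation * devices


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
for step in (0, 250, 500, 2750, 5000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 500 and has fallen to half of its peak by step 2750.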

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Wer     |
	|:-------------:|:------:|:----:|:---------------:|:-------:|
	| 0.721         | 0.2079 | 250  | 0.3651          | 29.8761 |
	| 0.3044        | 0.4158 | 500  | 0.3308          | 27.6497 |
	| 0.262         | 0.6237 | 750  | 0.3085          | 25.2769 |
	| 0.2396        | 0.8316 | 1000 | 0.2863          | 24.5298 |
	| 0.1998        | 1.0394 | 1250 | 0.2743          | 23.2776 |
	| 0.134         | 1.2473 | 1500 | 0.2749          | 22.9829 |
	| 0.1328        | 1.4552 | 1750 | 0.2662          | 22.3315 |
	| 0.1314        | 1.6631 | 2000 | 0.2643          | 21.7402 |
	| 0.1262        | 1.8710 | 2250 | 0.2598          | 21.8566 |
	| 0.101         | 2.0789 | 2500 | 0.2608          | 21.4248 |
	| 0.0653        | 2.2868 | 2750 | 0.2682          | 20.9912 |
	| 0.062         | 2.4947 | 3000 | 0.2638          | 21.0137 |
	| 0.0627        | 2.7026 | 3250 | 0.2636          | 20.5369 |
	| 0.0603        | 2.9105 | 3500 | 0.2602          | 20.4580 |
	| 0.0456        | 3.1183 | 3750 | 0.2748          | 20.9555 |
	| 0.0324        | 3.3262 | 4000 | 0.2702          | 20.4918 |
	| 0.0318        | 3.5341 | 4250 | 0.2739          | 20.4355 |
	| 0.0296        | 3.7420 | 4500 | 0.2735          | 20.4374 |
	| 0.0291        | 3.9499 | 4750 | 0.2725          | 20.3717 |
	| 0.022         | 4.1578 | 5000 | 0.2758          | 20.3229 |
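
As a reading aid for the table, the sketch below copies its (step, validation loss, WER) values into a list and picks out the checkpoint with the lowest WER and the one with the lowest validation loss. The variable names are illustrative; the numbers are taken directly from the table.

```python
# (step, validation loss, WER) for each evaluation checkpoint in the table
results = [
    (250, 0.3651, 29.8761), (500, 0.3308, 27.6497), (750, 0.3085, 25.2769),
    (1000, 0.2863, 24.5298), (1250, 0.2743, 23.2776), (1500, 0.2749, 22.9829),
    (1750, 0.2662, 22.3315), (2000, 0.2643, 21.7402), (2250, 0.2598, 21.8566),
    (2500, 0.2608, 21.4248), (2750, 0.2682, 20.9912), (3000, 0.2638, 21.0137),
    (3250, 0.2636, 20.5369), (3500, 0.2602, 20.4580), (3750, 0.2748, 20.9555),
    (4000, 0.2702, 20.4918), (4250, 0.2739, 20.4355), (4500, 0.2735, 20.4374),
    (4750, 0.2725, 20.3717), (5000, 0.2758, 20.3229),
]

best_wer = min(results, key=lambda row: row[2])
best_loss = min(results, key=lambda row: row[1])

print(f"Lowest WER: {best_wer[2]} at step {best_wer[0]}")
print(f"Lowest validation loss: {best_loss[1]} at step {best_loss[0]}")
```

The two criteria select different checkpoints here: validation loss is lowest at step 2250, while WER is lowest at the final step, 5000, which is the checkpoint whose metrics are reported at the top of this card.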


	### Framework versions

	- Transformers 4.42.4
	- Pytorch 2.3.1+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
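
To compare a local environment against the versions listed above, a small check along the following lines can be used. It is a sketch, not part of the original training setup: the `check_versions` helper is hypothetical, and the PyPI distribution names (for example `torch` for PyTorch) are assumptions about how the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name
EXPECTED = {
    "transformers": "4.42.4",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected: dict) -> dict:
    """Map each package to (installed version or None, whether it matches the expected version)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, installed == wanted)
    return report


for package, (installed, matches) in check_versions(EXPECTED).items():
    status = "ok" if matches else f"expected {EXPECTED[package]}"
    print(f"{package}: {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the environment used for training.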