	---
	language:
	- pt
	license: apache-2.0
	tags:
	- whisper-event
	- generated_from_trainer
	datasets:
	- mozilla-foundation/common_voice_11_0
	metrics:
	- wer
	model-index:
	- name: Whisper Large v2 Portuguese
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: mozilla-foundation/common_voice_11_0 pt
	      type: mozilla-foundation/common_voice_11_0
	      config: pt
	      split: test
	      args: pt
	    metrics:
	    - name: Wer
	      type: wer
	      value: 5.590020342630419
	---

	# Whisper Large V2 Portuguese 🇧🇷🇵🇹

	Welcome to Whisper large-v2 for Portuguese transcription 👋🏻

	Transcribe Portuguese audio to text. On the Common Voice 11.0 Portuguese evaluation set, the model reaches:

	- Loss: 0.282
	- WER: 5.590%

	This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the [mozilla-foundation/common_voice_11_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) dataset. If you want a lighter model, you may be interested in [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt). It has about half the parameters (769M vs. 1550M) and runs faster, at the cost of roughly one point of WER (6.579 vs. 5.590).
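To try the model, the following is a minimal sketch built on the `transformers` ASR pipeline. It is not taken from the original card: the helper names `build_transcriber` and `transcribe` are hypothetical, and the 30-second chunking is an assumption chosen to match Whisper's input window. The `transformers` import is deferred so that the model is only downloaded when a transcriber is actually built.

```python
MODEL_ID = "jlondonobo/whisper-large-v2-pt"


def build_transcriber(device: int = -1):
    """Create an ASR pipeline for the fine-tuned checkpoint.

    device=-1 runs on CPU; pass 0 to use the first CUDA GPU.
    """
    # Imported lazily; requires `transformers` and `torch` to be installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # assumption: split long audio into 30 s chunks
        device=device,
    )


def transcribe(audio_path: str, transcriber=None) -> str:
    """Return the transcription of one audio file as plain text."""
    if transcriber is None:
        transcriber = build_transcriber()
    # The ASR pipeline returns a dict whose "text" key holds the transcript.
    return transcriber(audio_path)["text"]
```

Calling `transcribe("sample.mp3")` would download the checkpoint on first use and return the transcript; `sample.mp3` is a placeholder path, and decoding audio files from a path requires `ffmpeg` to be available.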

	### Comparable models
	Reported WER is based on the evaluation subset of Common Voice.

	| Model | WER (%) | # Parameters |
	|--------------------------------------------------|:--------:|:------------:|
	| [jlondonobo/whisper-large-v2-pt](https://huggingface.co/jlondonobo/whisper-large-v2-pt) | 5.590 🤗 | 1550M |
	| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 6.300 | 1550M |
	| [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt) | 6.579 | 769M |
	| [jonatasgrosman/wav2vec2-large-xlsr-53-portuguese](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-portuguese) | 11.310 | 317M |
	| [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
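For readers unfamiliar with the metric, word error rate is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below illustrates that definition in plain Python. It is not the evaluation script behind the figures above: the card does not state which text normalization was applied, and this version applies none beyond whitespace splitting. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

On this scale, a WER of 5.590 means roughly 5.6 word-level errors per 100 reference words.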


	### Training hyperparameters
	We used the following hyperparameters for training:
	- `learning_rate`: 1e-05
	- `train_batch_size`: 16
	- `eval_batch_size`: 8
	- `seed`: 42
	- `gradient_accumulation_steps`: 2
	- `total_train_batch_size`: 32
	- `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- `lr_scheduler_type`: linear
	- `lr_scheduler_warmup_steps`: 500
	- `training_steps`: 5000
	- `mixed_precision_training`: Native AMP
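As an illustration of how these values interact, the sketch below reproduces the effective batch size and a linear schedule with warmup in plain Python. It assumes a single device, which is consistent with 16 × 2 = 32, and it assumes the schedule ramps linearly from 0 to the peak rate over the warmup steps and then decays linearly to 0 at the final step. The function names are hypothetical and this is not the training script itself.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 5000
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1  # assumption: single GPU


def total_train_batch_size() -> int:
    """Number of samples contributing to each optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay: fall linearly from the peak to 0 at the last training step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)
```

Under these assumptions the rate peaks at step 500 and has dropped to half its peak by step 2750, the midpoint of the decay phase.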

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | WER (%) |
	|:-------------:|:-----:|:----:|:---------------:|:------:|
	| 0.0828 | 1.09 | 1000 | 0.1868 | 6.778 |
	| 0.0241 | 3.07 | 2000 | 0.2057 | 6.109 |
	| 0.0084 | 5.06 | 3000 | 0.2367 | 6.029 |
	| 0.0015 | 7.04 | 4000 | 0.2469 | 5.709 |
	| 0.0009 | 9.02 | 5000 | 0.2821 | 5.590 🤗 |


	### Framework versions

	- Transformers 4.26.0.dev0
	- PyTorch 1.13.0+cu117
	- Datasets 2.7.1.dev0
	- Tokenizers 0.13.2