---
base_model: facebook/w2v-bert-2.0
language: eve
tags:
  - generated_from_trainer
datasets:
  - audiofolder
metrics:
  - wer
  - cer
model-index:
  - name: wav2vec-bert-2.0-even-pakendorf
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: audiofolder
          type: audiofolder
          config: default
          split: train
          args: default
        metrics:
          - name: Wer
            type: wer
            value: 0.5968606805108706
---

# wav2vec-bert-2.0-even-pakendorf-0406-1347

This model is a fine-tuned version of [facebook/w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) on the audiofolder dataset. It achieves the following results on the evaluation set:

- Cer: 0.2128
- Loss: inf
- Wer: 0.5969
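For reference, WER and CER scores of this kind are conventionally computed with the `evaluate` library. This is a minimal sketch of that computation, not the evaluation script used for this card; the reference and prediction strings below are placeholders:

```python
import evaluate

# Word- and character-error-rate metrics, as reported above.
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["placeholder gold transcript"]    # ground-truth transcriptions
predictions = ["placeholder model transcript"]  # model outputs

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```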

## Model description

How to use:

```python
import librosa
import torch
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

model = AutoModelForCTC.from_pretrained("tbkazakova/wav2vec-bert-2.0-even-pakendorf")
processor = Wav2Vec2BertProcessor.from_pretrained("tbkazakova/wav2vec-bert-2.0-even-pakendorf")

# Load the audio and resample it to the 16 kHz rate the model expects.
data, sampling_rate = librosa.load("audio.wav")
data = librosa.resample(data, orig_sr=sampling_rate, target_sr=16000)

# Extract input features and run the CTC head.
inputs = processor(data, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_features).logits

# Greedy CTC decoding: take the most likely token at each frame.
pred_ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(pred_ids))
```
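Alternatively, the model should also work with the `transformers` ASR pipeline, which handles loading and resampling the audio file internally. A minimal sketch, assuming the checkpoint is used as published:

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="tbkazakova/wav2vec-bert-2.0-even-pakendorf",
)
print(asr("audio.wav"))  # e.g. {'text': '...'}
```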

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 10
- mixed_precision_training: Native AMP
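These settings correspond roughly to the following `transformers.TrainingArguments`. This is a hedged reconstruction, not the author's actual training script; the output path is an assumption, and the Adam betas/epsilon listed above are the optimizer defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec-bert-2.0-even-pakendorf",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective train batch size: 16
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=10,
    fp16=True,  # mixed precision (native AMP)
)
```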

### Training results

| Training Loss | Epoch  | Step | Cer    | Validation Loss | Wer    |
|:-------------:|:------:|:----:|:------:|:---------------:|:------:|
| 4.5767        | 0.5051 | 200  | 0.4932 | inf             | 0.9973 |
| 1.8775        | 1.0101 | 400  | 0.3211 | inf             | 0.8494 |
| 1.6006        | 1.5152 | 600  | 0.3017 | inf             | 0.8040 |
| 1.4476        | 2.0202 | 800  | 0.2896 | inf             | 0.7534 |
| 1.2213        | 2.5253 | 1000 | 0.2610 | inf             | 0.7080 |
| 1.1485        | 3.0303 | 1200 | 0.2684 | inf             | 0.6800 |
| 0.9554        | 3.5354 | 1400 | 0.2459 | inf             | 0.6732 |
| 0.9379        | 4.0404 | 1600 | 0.2275 | inf             | 0.6251 |
| 0.7644        | 4.5455 | 1800 | 0.2235 | inf             | 0.6224 |
| 0.7891        | 5.0505 | 2000 | 0.2180 | inf             | 0.6053 |
| 0.633         | 5.5556 | 2200 | 0.2130 | inf             | 0.5996 |
| 0.6197        | 6.0606 | 2400 | 0.2126 | inf             | 0.6032 |
| 0.5212        | 6.5657 | 2600 | 0.2196 | inf             | 0.6019 |
| 0.4881        | 7.0707 | 2800 | 0.2125 | inf             | 0.5894 |
| 0.4           | 7.5758 | 3000 | 0.2066 | inf             | 0.5852 |
| 0.4008        | 8.0808 | 3200 | 0.2076 | inf             | 0.5790 |
| 0.3304        | 8.5859 | 3400 | 0.2096 | inf             | 0.5884 |
| 0.3446        | 9.0909 | 3600 | 0.2124 | inf             | 0.5983 |
| 0.3237        | 9.5960 | 3800 | 0.2128 | inf             | 0.5969 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1