
multilingual_speech_to_intent_wav2vec

This model is a fine-tuned version of facebook/wav2vec2-base on an unspecified dataset. It achieves the following results on the evaluation set (a minimal inference sketch follows the metrics):

  • Loss: 1.5542
  • Accuracy: 0.7430
  • Precision: 0.8060
  • Recall: 0.7430
  • F1: 0.7456
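
The card does not include a usage example. The sketch below is one unverified way to load the checkpoint with the Hugging Face `audio-classification` pipeline, assuming the repository ships a standard wav2vec2 sequence-classification config and that the audio is 16 kHz mono; the path `command.wav` is a placeholder.

```python
# Hedged inference sketch: loads the published checkpoint with the generic
# audio-classification pipeline and scores a short spoken utterance.
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="KasuleTrevor/multilingual_speech_to_intent_wav2vec",
)

# "command.wav" is a placeholder path to a 16 kHz mono recording.
predictions = classifier("command.wav", top_k=3)
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```

The predicted labels come from the fine-tuned classification head, i.e. the intent classes used during training, which are not documented in this card.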

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch of these settings follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100
  • mixed_precision_training: Native AMP
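
For convenience, the listed settings map onto `transformers.TrainingArguments` roughly as sketched below. This is an approximation rather than the exact training script: dataset loading, the model head, and the `Trainer` wiring are omitted, and `output_dir` is a placeholder. Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the library defaults, so it is not set explicitly.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="multilingual_speech_to_intent_wav2vec",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 64
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=100,
    fp16=True,                       # Native AMP mixed-precision training
)
```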

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 2.3588        | 1.0   | 219  | 1.4144          | 0.5916   | 0.6385    | 0.5916 | 0.5322 |
| 0.8825        | 2.0   | 438  | 0.7289          | 0.8195   | 0.8635    | 0.8195 | 0.8243 |
| 0.7836        | 3.0   | 657  | 0.6739          | 0.8514   | 0.8648    | 0.8514 | 0.8513 |
| 0.7345        | 4.0   | 876  | 0.4483          | 0.9080   | 0.9189    | 0.9080 | 0.9071 |
| 0.7204        | 5.0   | 1095 | 0.5039          | 0.8882   | 0.9059    | 0.8882 | 0.8915 |
| 0.5355        | 6.0   | 1314 | 0.5051          | 0.8967   | 0.9049    | 0.8967 | 0.8971 |
| 0.5939        | 7.0   | 1533 | 0.3162          | 0.9314   | 0.9387    | 0.9314 | 0.9322 |
| 0.5311        | 8.0   | 1752 | 0.3218          | 0.9292   | 0.9318    | 0.9292 | 0.9292 |
| 0.5098        | 9.0   | 1971 | 0.5819          | 0.8804   | 0.8858    | 0.8804 | 0.8809 |
| 0.508         | 10.0  | 2190 | 0.5930          | 0.8804   | 0.8843    | 0.8804 | 0.8792 |
| 0.4672        | 11.0  | 2409 | 0.3127          | 0.9229   | 0.9251    | 0.9229 | 0.9222 |
| 0.4619        | 12.0  | 2628 | 0.3761          | 0.9193   | 0.9227    | 0.9193 | 0.9193 |
| 0.4668        | 13.0  | 2847 | 0.6386          | 0.8740   | 0.8800    | 0.8740 | 0.8726 |
| 0.444         | 14.0  | 3066 | 0.4134          | 0.9073   | 0.9133    | 0.9073 | 0.9079 |
| 0.4059        | 15.0  | 3285 | 0.3106          | 0.9349   | 0.9370    | 0.9349 | 0.9347 |
| 0.3857        | 16.0  | 3504 | 0.3639          | 0.9222   | 0.9296    | 0.9222 | 0.9217 |
| 0.432         | 17.0  | 3723 | 0.5168          | 0.8896   | 0.8977    | 0.8896 | 0.8885 |
| 0.3909        | 18.0  | 3942 | 1.0967          | 0.8004   | 0.8269    | 0.8004 | 0.8022 |
| 0.4341        | 19.0  | 4161 | 0.7655          | 0.8556   | 0.8624    | 0.8556 | 0.8554 |
| 0.3673        | 20.0  | 4380 | 0.2394          | 0.9505   | 0.9525    | 0.9505 | 0.9505 |
| 0.3784        | 21.0  | 4599 | 0.4200          | 0.9207   | 0.9228    | 0.9207 | 0.9202 |
| 0.4064        | 22.0  | 4818 | 0.5932          | 0.8818   | 0.8876    | 0.8818 | 0.8820 |
| 0.3825        | 23.0  | 5037 | 0.9998          | 0.8493   | 0.8616    | 0.8493 | 0.8484 |
| 0.3485        | 24.0  | 5256 | 1.1882          | 0.7877   | 0.8071    | 0.7877 | 0.7888 |
| 0.3242        | 25.0  | 5475 | 0.5562          | 0.9073   | 0.9118    | 0.9073 | 0.9076 |
| 0.3526        | 26.0  | 5694 | 0.6743          | 0.8832   | 0.8927    | 0.8832 | 0.8825 |
| 0.3573        | 27.0  | 5913 | 0.3483          | 0.9271   | 0.9313    | 0.9271 | 0.9272 |
| 0.3381        | 28.0  | 6132 | 1.1346          | 0.8018   | 0.8152    | 0.8018 | 0.8017 |
| 0.3243        | 29.0  | 6351 | 0.9003          | 0.8316   | 0.8439    | 0.8316 | 0.8315 |
| 0.3045        | 30.0  | 6570 | 0.9181          | 0.8493   | 0.8570    | 0.8493 | 0.8482 |
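
The reported recall matches accuracy at every checkpoint, which is consistent with weighted averaging over the intent classes. The `compute_metrics` function below is a hypothetical sketch of such a setup using scikit-learn; the actual metric code used for this model is not documented in the card.

```python
# Hypothetical compute_metrics consistent with the reported numbers
# (weighted averaging, so recall equals accuracy). Not confirmed by the card.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```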

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.1.0+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1
