asrtbsc_distilbert-freezed-best

Multi-label dialogue act classification (DAC) based on normalized ASR transcripts

Model description

Backbone: DistilBERT uncased
Pooling: Self-attention
Multi-label classification head: 2 dense layers with two dropout layers (p = 0.3) and a Tanh activation in between
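
The pooling and head described above could look roughly like the following PyTorch sketch. It is an illustration under stated assumptions, not the released implementation: the class names `SelfAttentionPooling` and `MultiLabelHead`, the exact order of the dropout layers, the width of the intermediate dense layer, and the label count are all hypothetical. The hidden size of 768 matches DistilBERT base, and random tensors stand in for the backbone output.

```python
import torch
from torch import nn


class SelfAttentionPooling(nn.Module):
    """Collapse token embeddings into one vector using learned attention weights."""

    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # one relevance score per token

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        scores = self.scorer(hidden_states).squeeze(-1)        # (batch, seq_len)
        # Padding positions get -inf so they receive zero weight after softmax.
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)  # (batch, seq_len, 1)
        return (weights * hidden_states).sum(dim=1)            # (batch, hidden)


class MultiLabelHead(nn.Module):
    """Two dense layers with dropout 0.3 and a Tanh in between (layer order assumed)."""

    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.3) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size),  # intermediate width is an assumption
            nn.Tanh(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, num_labels),
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.net(pooled)  # raw logits, one per label


# Stand-in for the backbone output: 2 utterances, 7 tokens, hidden size 768.
hidden_states = torch.randn(2, 7, 768)
attention_mask = torch.ones(2, 7, dtype=torch.long)
attention_mask[1, 4:] = 0  # second utterance has 3 padding tokens

pooling = SelfAttentionPooling(768)
head = MultiLabelHead(768, num_labels=5).eval()  # 5 labels is a placeholder

logits = head(pooling(hidden_states, attention_mask))
# Multi-label decision: independent sigmoid per label; the 0.5 threshold is an assumption.
predictions = torch.sigmoid(logits) > 0.5
print(predictions.shape)
```

Because each label gets its own sigmoid rather than sharing a softmax, an utterance can be assigned several dialogue acts at once, or none.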

Training and evaluation data

Trained on normalized Whisper small transcripts.
Evaluated on ground truth (GT) and normalized Whisper small transcripts (E2E).

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 20
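
As a quick check on how these values relate, the sketch below recomputes the effective batch size and traces a linear learning-rate schedule. It is a minimal illustration, not the training script: the helper names are hypothetical, the card does not report warmup steps or the total number of optimizer steps (so `warmup_steps=0` and `total_steps=1000` are placeholders), and a single device is assumed, which is consistent with 2 × 4 = 8.

```python
def effective_batch_size(per_device_batch_size: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device_batch_size * accumulation_steps * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = 0.0004, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size=2 with gradient_accumulation_steps=4 on one device
print(effective_batch_size(2, 4))  # 8

# Placeholder horizon of 1000 optimizer steps; the real count depends on dataset size.
for step in (0, 250, 500, 750, 1000):
    print(step, linear_lr(step, total_steps=1000))
```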

Framework versions

Transformers 4.41.1
PyTorch 2.3.0+cu121
Datasets 2.19.2
Tokenizers 0.19.1

Masioki/asrtbsc_distilbert-freezed-best
