Update README.md
README.md
CHANGED
@@ -33,10 +33,10 @@ Ground truth text with prosody encoding and ASR encoding residual cross attentio
 
 ## Model description
 
-ASR encoder: [Whisper small](https://huggingface.co/openai/whisper-small) encoder
-Prosody encoder: 2 layer transformer encoder with initial dense projection
+ASR encoder: [Whisper small](https://huggingface.co/openai/whisper-small) encoder
+Prosody encoder: 2 layer transformer encoder with initial dense projection
 Backbone: [DistilBert uncased](https://huggingface.co/distilbert/distilbert-base-uncased)
-Fusion: 2 residual cross attention fusion layers (F_asr x F_text and F_prosody x F_text) with dense layer on top
+Fusion: 2 residual cross attention fusion layers (F_asr x F_text and F_prosody x F_text) with dense layer on top
 Pooling: Self attention
 Multi-label classification head: 2 dense layers with two dropouts 0.3 and Tanh activation inbetween
 
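
The "residual cross attention" fusion named in the model description can be illustrated with a minimal, dependency-free sketch. This is not the model's actual code: it assumes the text features act as queries and the ASR (or prosody) features act as keys and values, uses a single head, and omits the learned Q/K/V projections and the dense layer on top. The function name `residual_cross_attention` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_cross_attention(queries, context):
    """Single-head scaled dot-product cross attention plus a residual connection.

    queries: list of d-dimensional vectors (assumed here: text features)
    context: list of d-dimensional vectors (assumed here: ASR or prosody features)
    Learned projections are omitted, so this shows only the attention core.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        # Similarity of this query to every context vector, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        # Weighted sum of the context vectors (used as values).
        attended = [sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)]
        # Residual connection: add the attended vector back onto the query.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Toy example: 3 text tokens and 2 ASR frames, both of dimension 2.
f_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f_asr = [[0.5, 0.5], [1.0, -1.0]]
fused = residual_cross_attention(f_text, f_asr)
```

The output keeps the shape of the query sequence, so under these assumptions two such layers (one attending to ASR features, one to prosody features) could each produce a text-length sequence for the later pooling step.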