---
language:
- en
datasets:
- mozilla-foundation/common_voice_13_0
- facebook/voxpopuli
- LIUM/tedlium
- librispeech_asr
- fisher_corpus
- WSJ-0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
model-index:
- name: DeCRED-base
results:
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: LibriSpeech (clean)
type: librispeech_asr
config: clean
split: test
args:
language: en
metrics:
- type: wer
value: 3.5
name: Test WER
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: LibriSpeech (other)
type: librispeech_asr
config: other
split: test
args:
language: en
metrics:
- type: wer
value: 8.1
name: Test WER
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: tedlium-v3
type: LIUM/tedlium
config: release1
split: test
args:
language: en
metrics:
- type: wer
value: 5.4
name: Test WER
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: VoxPopuli
type: facebook/voxpopuli
config: en
split: test
args:
language: en
metrics:
- type: wer
value: 8.3
name: Test WER
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: Mozilla Common Voice 13.0
type: mozilla-foundation/common_voice_13_0
config: en
split: test
args:
language: en
metrics:
- type: wer
value: 16.3
name: Test WER
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: FLEURS
type: google/fleurs
split: test
args:
language: en_us
metrics:
- type: wer
value: 9.6
name: Test WER
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: Switchboard
type: unk
split: eval2000
args:
language: en
metrics:
- type: wer
value: 9.2
name: Test WER
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: Wall Street Journal
type: unk
split: eval92
args:
language: en
metrics:
- type: wer
value: 2.6
name: Test WER
---
# DeCRED-base
This is a **40M encoder-decoder E-Branchformer model** trained with a decoder-centric regularization technique on 6,000 hours of open-source, normalised English data.

Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.
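
Until those details are published, the model size can be sanity-checked directly from the released checkpoint. The sketch below is illustrative only and assumes that the `pipeline` loading shown later in this card works in your environment:

```python
from transformers import pipeline

model_id = "BUT-FIT/DeCRED-base"
pipe = pipeline("automatic-speech-recognition", model=model_id,
                feature_extractor=model_id, trust_remote_code=True)

# Total parameter count of the loaded encoder-decoder model (expected to be roughly 40M).
n_params = sum(p.numel() for p in pipe.model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```
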
*Disclaimer: The model currently hallucinates on segments containing only silence, as it was not trained on such data. A fix will be added soon.*

The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
class to transcribe audio files of arbitrary length.
```python
from transformers import pipeline
model_id = "BUT-FIT/DeCRED-small"
pipe = pipeline("automatic-speech-recognition", model=model_id, feature_extractor=model_id, trust_remote_code=True)
# Newer transformers releases (>4.31.0) mis-detect the pipeline inference type for this
# custom model and emit a warning; it can be ignored, and the type is set explicitly here.
pipe.type = "seq2seq"
# Run beam search decoding with joint CTC-attention scorer
result_beam = pipe("audio.wav")
# Run greedy decoding without joint CTC-attention scorer
pipe.model.generation_config.ctc_weight = 0.0
pipe.model.generation_config.num_beams = 1
result_greedy = pipe("audio.wav")
```
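
As a further usage illustration, the sketch below runs the pipeline on a single in-memory utterance streamed from LibriSpeech test-clean and scores it with the `evaluate` WER metric. It assumes the `pipe` object from the snippet above plus the `datasets`, `evaluate`, and `soundfile` packages; a single utterance will of course not reproduce the corpus-level WER figures listed in the metadata.

```python
import evaluate
from datasets import load_dataset

# Stream one utterance from LibriSpeech test-clean instead of downloading the full split.
ds = load_dataset("librispeech_asr", "clean", split="test", streaming=True)
sample = next(iter(ds))

# The ASR pipeline also accepts in-memory audio as a dict with "raw" and "sampling_rate".
prediction = pipe({"raw": sample["audio"]["array"],
                   "sampling_rate": sample["audio"]["sampling_rate"]})["text"]
print(prediction)

# Corpus-level WER in the metadata is computed over full test sets; this single-utterance
# score only illustrates the metric call.
wer = evaluate.load("wer")
print(wer.compute(predictions=[prediction.lower()],
                  references=[sample["text"].lower()]))
```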