---
license: apache-2.0
language:
- de
library_name: transformers
pipeline_tag: automatic-speech-recognition
model-index:
- name: whisper-large-v3-turbo-german by Florian Zimmermeister @primeLine
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: German ASR Data-Mix
      type: flozi00/asr-german-mixed
    metrics:
    - type: wer
      value: 4.77 %
      name: Test WER
datasets:
- flozi00/asr-german-mixed
- flozi00/asr-german-mixed-evals
base_model:
- primeline/whisper-large-v3-german
---
### Summary
This model card describes a model based on Whisper Large v3 that has been fine-tuned for German speech recognition. Whisper is a speech recognition model developed by OpenAI. This fine-tuned version has been specifically optimized for processing and recognizing German speech.
### Applications
This model can be used in various application areas, including:
- Transcription of spoken German language
- Voice commands and voice control
- Automatic subtitling for German videos
- Voice-based search queries in German
- Dictation functions in word processing programs
## Model family
| Model | Parameters | Link |
|----------------------------------|------------|--------------------------------------------------------------|
| Whisper large v3 german | 1.54B | [link](https://huggingface.co/primeline/whisper-large-v3-german) |
| Whisper large v3 turbo german | 809M | [link](https://huggingface.co/primeline/whisper-large-v3-turbo-german) |
| Distil-whisper large v3 german | 756M | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german) |
| tiny whisper | 37.8M | [link](https://huggingface.co/primeline/whisper-tiny-german) |
## Evaluations - Word error rate
| Dataset | openai-whisper-large-v3-turbo | openai-whisper-large-v3 | primeline-whisper-large-v3-german | nyrahealth-CrisperWhisper (large version) | primeline-whisper-large-v3-turbo-german |
|-------------------------------------|-------------------------------|-------------------------|-----------------------------------|---------------------------|-----------------------------------------|
| common_voice_19_0 | 3.929 | 3.559 | 3.215 | **1.925** | 3.202 |
| multilingual librispeech | 3.205 | 2.833 | 2.128 | 2.847 | **2.073** |
| Tuda-De | 8.331 | 7.951 | 8.285 | **5.447** | 6.577 |
| All | 3.676 | 3.305 | 2.761 | 2.697 | **2.637** |
The data and code for the evaluations are available [here](https://huggingface.co/datasets/flozi00/asr-german-mixed-evals). Lower values are better; the best result for each dataset is shown in bold.
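For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration. The linked evaluation code may normalize text differently, so this sketch is not expected to reproduce the numbers in the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate in percent (illustrative sketch only)."""
    # Assumed normalization: lowercase and split on whitespace.
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1       # reference word missing in hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current

    return 100.0 * previous[-1] / len(ref_words)


# One substituted word out of four reference words.
print(word_error_rate("das ist ein test", "das ist kein test"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100.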
### Training data
The training data for this model includes a large amount of spoken German from various sources. The data was carefully selected and processed to optimize recognition performance.
### Training process
The model was trained with the following hyperparameters:
- Batch size: 12288
- Epochs: 3
- Learning rate: 1e-6
- Data augmentation: No
- Optimizer: [Ademamix](https://arxiv.org/abs/2409.03137)
### How to use
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU with half precision when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "primeline/whisper-large-v3-turbo-german"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,  # split long audio into 30-second chunks
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Demo sample only; replace it with your own German audio.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```
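Since the pipeline above is created with `return_timestamps=True`, its result also contains a `chunks` list whose entries carry a `timestamp` pair and a `text` field. For the subtitling use case, these chunks could be converted to SRT roughly as follows. This is a minimal sketch: the helper names `format_srt_time` and `chunks_to_srt` are hypothetical and not part of `transformers`, and the handling of a missing end timestamp is an assumption.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks) -> str:
    """Turn pipeline chunks into SRT text (hypothetical helper)."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: fall back to the start time if no end is given.
            end = start
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)


# Stand-in for result["chunks"] from the pipeline call.
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Hallo Welt."},
    {"timestamp": (2.5, 5.0), "text": " Wie geht es dir?"},
]
print(chunks_to_srt(example_chunks))
```

With a real transcription, `chunks_to_srt(result["chunks"])` would produce the subtitle text, which can then be written to a `.srt` file.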
## [About us](https://primeline-ai.com/en/)
[![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)
Your partner for AI infrastructure in Germany <br>
Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.
Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)