	---
	license: apache-2.0
	language:
	- de
	library_name: transformers
	pipeline_tag: automatic-speech-recognition
	model-index:
	- name: whisper-large-v3-turbo-german by Florian Zimmermeister @primeLine
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Speech Recognition
	    dataset:
	      name: German ASR Data-Mix
	      type: flozi00/asr-german-mixed
	    metrics:
	    - type: wer
	      value: 4.77 %
	      name: Test WER
	datasets:
	- flozi00/asr-german-mixed
	- flozi00/asr-german-mixed-evals
	base_model:
	- primeline/whisper-large-v3-german
	---

	### Summary
	This model card describes a model based on Whisper Large v3 that has been fine-tuned for German speech recognition. Whisper is a speech recognition model developed by OpenAI; this variant has been optimized specifically for processing and recognizing German speech.



	### Applications
	This model can be used in various application areas, including:

	- Transcription of spoken German language
	- Voice commands and voice control
	- Automatic subtitling for German videos
	- Voice-based search queries in German
	- Dictation functions in word processing programs


	## Model family

	| Model | Parameters | Link |
	|----------------------------------|------------|--------------------------------------------------------------|
	| Whisper large v3 german | 1.54B | [link](https://huggingface.co/primeline/whisper-large-v3-german) |
	| Whisper large v3 turbo german | 809M | [link](https://huggingface.co/primeline/whisper-large-v3-turbo-german) |
	| Distil-whisper large v3 german | 756M | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german) |
	| tiny whisper | 37.8M | [link](https://huggingface.co/primeline/whisper-tiny-german) |
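
As a rough guide to what the parameter counts in the table mean in practice, the sketch below estimates the memory needed for the weights alone. It assumes 2 bytes per parameter for float16 and 4 bytes for float32, and it ignores activations, decoding caches, and framework overhead, so real usage will be higher. The dictionary keys and the `weight_memory_gb` helper are illustrative names, not part of any library.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

# Parameter counts copied from the model family table.
MODEL_PARAMS = {
    "whisper-large-v3-german": 1.54e9,
    "whisper-large-v3-turbo-german": 809e6,
    "distil-whisper-large-v3-german": 756e6,
    "whisper-tiny-german": 37.8e6,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Estimate weight storage in GB (10^9 bytes); weights only, no runtime overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in MODEL_PARAMS.items():
    fp16 = weight_memory_gb(params, "float16")
    fp32 = weight_memory_gb(params, "float32")
    print(f"{name}: ~{fp16:.2f} GB (float16), ~{fp32:.2f} GB (float32)")
```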


	## Evaluations - Word error rate

	| Dataset | openai-whisper-large-v3-turbo | openai-whisper-large-v3 | primeline-whisper-large-v3-german | nyrahealth-CrisperWhisper (large version) | primeline-whisper-large-v3-turbo-german |
	|-------------------------------------|-------------------------------|-------------------------|-----------------------------------|---------------------------|-----------------------------------------|
	| common_voice_19_0 | 3.929 | 3.559 | 3.215 | 1.925 | 3.202 |
	| multilingual librispeech | 3.205 | 2.833 | 2.128 | 2.847 | 2.073 |
	| Tuda-De | 8.331 | 7.951 | 8.285 | 5.447 | 6.577 |
	| All | 3.676 | 3.305 | 2.761 | 2.697 | 2.637 |

	The data and code for the evaluations are available [here](https://huggingface.co/datasets/flozi00/asr-german-mixed-evals).
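
For readers unfamiliar with the metric, word error rate is commonly defined as the number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation code linked above; the lowercasing and whitespace tokenization are assumptions, and the actual evaluation may normalize text differently. The example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of five reference words.
reference = "das ist ein kurzer test"
hypothesis = "das ist ein kurze test"
print(f"WER: {100 * word_error_rate(reference, hypothesis):.1f} %")
```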

	### Training data
	The model was trained on a large amount of spoken German from various sources; the metadata of this card lists the `flozi00/asr-german-mixed` dataset. The data was selected and processed to optimize recognition performance.


	### Training process
	The model was trained with the following hyperparameters:

	- Batch size: 12288
	- Epochs: 3
	- Learning rate: 1e-6
	- Data augmentation: No
	- Optimizer: [Ademamix](https://arxiv.org/abs/2409.03137)


	### How to use

	```python
	import torch
	from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
	from datasets import load_dataset

	# Use the GPU with half precision when available, otherwise fall back to CPU/float32.
	device = "cuda:0" if torch.cuda.is_available() else "cpu"
	torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

	model_id = "primeline/whisper-large-v3-turbo-german"

	# Load the model weights and move them to the selected device.
	model = AutoModelForSpeechSeq2Seq.from_pretrained(
	    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
	)
	model.to(device)

	# The processor bundles the tokenizer and the audio feature extractor.
	processor = AutoProcessor.from_pretrained(model_id)

	# Build a chunked ASR pipeline: long audio is split into 30-second windows.
	pipe = pipeline(
	    "automatic-speech-recognition",
	    model=model,
	    tokenizer=processor.tokenizer,
	    feature_extractor=processor.feature_extractor,
	    max_new_tokens=128,
	    chunk_length_s=30,
	    batch_size=16,
	    return_timestamps=True,
	    torch_dtype=torch_dtype,
	    device=device,
	)

	# Transcribe one sample from a long-form test dataset and print the text.
	dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
	sample = dataset[0]["audio"]
	result = pipe(sample)
	print(result["text"])
	```
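
Because the pipeline above is created with `return_timestamps=True`, the result also contains a `chunks` list whose entries carry a `timestamp` tuple `(start, end)` in seconds and the corresponding `text`. For the subtitling use case mentioned under Applications, those chunks could be turned into SRT subtitles roughly as sketched below. The helper names, the `fallback_duration` parameter, and the example chunks are illustrative assumptions; the end timestamp of the final chunk can be `None`, which the sketch handles with a fixed fallback length.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks, fallback_duration: float = 2.0) -> str:
    """Convert timestamped chunks into SRT text, one numbered block per chunk."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: give an open-ended final chunk a fixed display time.
            end = start + fallback_duration
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Made-up chunks in the shape described above; with a real run, pass result["chunks"].
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Guten Morgen."},
    {"timestamp": (2.5, None), "text": " Wie geht es Ihnen?"},
]
print(chunks_to_srt(example_chunks))
```

With a real transcription, `chunks_to_srt(result["chunks"])` would produce the subtitle text, which can then be written to a `.srt` file.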


	## [About us](https://primeline-ai.com/en/)

	[![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)


	Your partner for AI infrastructure in Germany <br>
	Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.



	Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)