---
language:
- en
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
This model was trained in int8 with LoRA adapters.
## Usage

Prepare the pipeline. The default `generate_opts` shown below give (deterministic) greedy decoding with up to 112 generated tokens and no repetition penalty:

```python
# prepare_pipeline is provided by the isatasr repo (see links below)
asr_model = prepare_pipeline(
    model_dir='.',  # wherever you saved the model
    generate_opts={
        'max_new_tokens': 112,
        'num_beams': 1,
        'repetition_penalty': 1,
        'do_sample': False,
    },
)
```
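The keys in `generate_opts` match the standard Hugging Face `generate()` arguments, so other decoding strategies can be configured the same way. As an illustrative sketch (these values are not the card's defaults), a beam-search variant might look like:

```python
# Hypothetical alternative decoding settings, not the defaults above:
beam_opts = {
    'max_new_tokens': 112,      # same cap on generated tokens
    'num_beams': 5,             # beam search with 5 beams instead of greedy
    'repetition_penalty': 1.1,  # mildly discourage repeated tokens
    'do_sample': False,         # keep decoding deterministic
}
```

Beam search trades extra compute for potentially better transcripts; greedy decoding (`num_beams=1`) is the fastest option.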
Run ASR:

```python
asr_model(audio_path)  # audio_path: path to an audio file
```
## See also

- Repo: https://github.com/rosyvs/isatasr
- Model on GitHub: https://github.com/rosyvs/isatasr/tree/main/models/whisat-1.2
- Training script: https://github.com/rosyvs/isatasr/blob/main/train/whisat/tune_hf_whisper.py
- Training hyperparameters: https://github.com/rosyvs/isatasr/blob/main/train/whisat/hparams/redo_for_ICASSP/publicKS_ig_hf_LoRA_int8_largev2.yaml