---
language:
  - en
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# whisat

This model was trained in int8 with LoRA (low-rank adaptation).

## Usage

Prepare the pipeline. The default `generate_opts` shown below give deterministic greedy decoding, with up to 112 new tokens generated and no repetition penalty:

```python
# prepare_pipeline is provided by the isatasr repo (see links below)
asr_model = prepare_pipeline(
    model_dir='.',  # wherever you saved the model
    generate_opts={
        'max_new_tokens': 112,
        'num_beams': 1,
        'repetition_penalty': 1,
        'do_sample': False,
    },
)
```
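With `num_beams=1` and `do_sample=False`, generation reduces to plain greedy decoding: at every step the single highest-scoring token is taken, so the same audio always yields the same transcript. A minimal sketch of that loop, using a hypothetical stand-in for the real decoder (not part of this model's API):

```python
def greedy_decode(next_logits, bos, eos, max_new_tokens=112):
    """Greedy decoding: pick the argmax token at each step (num_beams=1,
    do_sample=False), stopping at EOS or after max_new_tokens."""
    tokens = [bos]
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        best = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(best)
        if best == eos:
            break
    return tokens

# Hypothetical stand-in "decoder": always prefers token (step mod 5).
def fake_logits(tokens):
    scores = [0.0] * 5
    scores[len(tokens) % 5] = 1.0
    return scores

print(greedy_decode(fake_logits, bos=0, eos=4))  # → [0, 1, 2, 3, 4]
```

Because every step is an argmax, the loop is fully deterministic, which is why the README describes the default settings as deterministic.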

Run ASR on an audio file:

```python
asr_model(audio_path)
```

See also:

- Companion repo: https://github.com/rosyvs/isatasr
- Model on GitHub: https://github.com/rosyvs/isatasr/tree/main/models/whisat-1.2
- Training script: https://github.com/rosyvs/isatasr/blob/main/train/whisat/tune_hf_whisper.py
- Training hyperparameters: https://github.com/rosyvs/isatasr/blob/main/train/whisat/hparams/redo_for_ICASSP/publicKS_ig_hf_LoRA_int8_largev2.yaml