Model trained in int8 with LoRA
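
For reference, a checkpoint like this is typically loaded by stacking the LoRA (PEFT) adapter on an 8-bit base Whisper model. The sketch below is illustrative only and is handled for you by prepare_pipeline in the Usage section; the base checkpoint name and the adapter location are assumptions, not read from this repo:

# Illustrative sketch only; prepare_pipeline (below) does the loading for you.
# The base checkpoint name is an assumption; the LoRA adapter is assumed to be saved in this directory.
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",  # assumed base model
    load_in_8bit=True,          # int8 quantization via bitsandbytes
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ".")  # LoRA adapter weights in this model dir
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")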

Usage:

Prepare the pipeline, providing any custom generate_kwargs supported by https://huggingface.co/docs/transformers/v4.40.0/en/main_classes/text_generation#transformers.GenerationConfig:

asr_model = prepare_pipeline(
    model_dir='.',  # wherever you save the model
    generate_kwargs={
        'max_new_tokens': 112,
        'num_beams': 1,
        'repetition_penalty': 1,
        'do_sample': False,
    },
)

Run ASR on a single file:

asr_model(audio_path)

Run ASR on a full directory, audio_dir. If generate_kwargs is not specified, you get deterministic greedy decoding with up to 112 new tokens and no repetition penalty:

ASRdirWhisat(
    audio_dir,
    out_dir='../whisat_results/',
    model_dir='.',
)
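
If you prefer to handle the file loop and output yourself, the single-file pipeline above can be applied to each file in a directory. This is a sketch, not part of this repo's interface; the .wav extension and the output file name are assumptions:

from pathlib import Path
import json

# Manual alternative to ASRdirWhisat: loop over audio files with the single-file pipeline.
results = {}
for wav in sorted(Path(audio_dir).glob("*.wav")):   # assumes .wav inputs
    results[wav.name] = asr_model(str(wav))

out_dir = Path('../whisat_results/')
out_dir.mkdir(parents=True, exist_ok=True)
with open(out_dir / 'transcripts.json', 'w') as f:  # output file name is illustrative
    json.dump(results, f, indent=2, default=str)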

Training information:

  • Training script: tune_hf_whisper.py
  • Training hyperparameters: hparams.yaml
  • Training data manifest: PUBLIC_KIDS_TRAIN_v4_deduped.csv
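
The hyperparameter file and data manifest listed above are plain YAML and CSV and can be inspected directly; a minimal sketch (the manifest's column names are not documented here, so check its header yourself):

import yaml        # pip install pyyaml
import pandas as pd

# Hyperparameters used by tune_hf_whisper.py
with open('hparams.yaml') as f:
    hparams = yaml.safe_load(f)
print(hparams)

# Training data manifest; inspect the header to see the available columns
manifest = pd.read_csv('PUBLIC_KIDS_TRAIN_v4_deduped.csv')
print(manifest.columns.tolist())
print(f'{len(manifest)} rows in the training manifest')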

Note: to recreate this training you will need to acquire the following public datasets:

  • MyST (myst-v0.4.2)
  • CuKids
  • CSLU

and ensure they are stored at paths consistent with those in the data manifest above.
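
A quick way to confirm the data are in the expected locations is to check that every path referenced by the manifest exists. This is a sketch; the audio-path column name below is an assumption, so replace it with the actual column from the manifest header:

from pathlib import Path
import pandas as pd

manifest = pd.read_csv('PUBLIC_KIDS_TRAIN_v4_deduped.csv')
path_column = 'audio_path'  # assumption: substitute the real column name from the manifest
missing = [p for p in manifest[path_column] if not Path(p).exists()]
print(f'{len(missing)} of {len(manifest)} audio files are missing')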

Reference:

@inproceedings{southwell2024,
  title     = {Automatic speech recognition tuned for child speech in the classroom},
  author    = {Southwell, Rosy and Ward, Wayne and Trinh, Viet Anh and Clevenger, Charis and Clevenger, Clay and Watts, Emily and Reitman, Jason and D’Mello, Sidney and Whitehill, Jacob},
  booktitle = {{IEEE} International Conference on Acoustics, Speech and Signal Processing ({ICASSP}) 2024, Seoul, South Korea, April 14-19, 2024},
  year      = {2024},
}