---
language:
- en
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
Model trained in int8 with LoRA.

Usage:

Prepare the pipeline, providing any custom `generate_kwargs` supported by https://huggingface.co/docs/transformers/v4.40.0/en/main_classes/text_generation#transformers.GenerationConfig
```
asr_model = prepare_pipeline(
    model_dir='.',  # wherever you saved the model
    generate_kwargs={
        'max_new_tokens': 112,
        'num_beams': 1,
        'repetition_penalty': 1,
        'do_sample': False,
    },
)
```
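As a rough illustration of how user-supplied `generate_kwargs` could be merged over the model card's stated defaults, here is a minimal, hypothetical sketch (the function name `resolve_generate_kwargs` is an assumption for illustration; the actual defaulting logic lives inside `prepare_pipeline`):

```python
# Defaults stated in this model card: deterministic greedy decoding,
# up to 112 new tokens, no repetition penalty.
DEFAULT_GENERATE_KWARGS = {
    'max_new_tokens': 112,
    'num_beams': 1,
    'repetition_penalty': 1,
    'do_sample': False,
}

def resolve_generate_kwargs(user_kwargs=None):
    """Hypothetical helper: overlay user-supplied kwargs on the defaults."""
    merged = dict(DEFAULT_GENERATE_KWARGS)
    if user_kwargs:
        merged.update(user_kwargs)
    return merged
```

Any key you pass takes precedence, so e.g. `resolve_generate_kwargs({'num_beams': 4})` switches to beam search while keeping the other defaults.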
Run ASR:
```
asr_model(audio_path)
```
Run ASR on a full directory `audio_dir`. If `generate_kwargs` is not specified, you get deterministic greedy decoding with up to 112 new tokens and no repetition penalty:
```
ASRdirWhisat(
    audio_dir,
    out_dir='../whisat_results/',
    model_dir='.',
)
```
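For intuition, a directory runner like `ASRdirWhisat` presumably begins by discovering audio files under `audio_dir`; a minimal, hypothetical sketch of that step (the helper name and the extension set are assumptions, not the actual implementation):

```python
from pathlib import Path

# Assumed set of audio extensions for illustration only.
AUDIO_EXTENSIONS = {'.wav', '.mp3', '.flac'}

def list_audio_files(audio_dir):
    """Hypothetical helper: return audio files in audio_dir,
    sorted for a reproducible processing order."""
    return sorted(
        p for p in Path(audio_dir).iterdir()
        if p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Each discovered file could then be passed to `asr_model(audio_path)` as above, with transcripts written to `out_dir`.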
Training information:
- Training script: tune_hf_whisper.py
- Training hyperparameters: hparams.yaml
- Training data manifest: PUBLIC_KIDS_TRAIN_v4_deduped.csv
Note: to recreate this training you will need to acquire the following public datasets:
- MyST (myst-v0.4.2)
- CuKids
- CSLU
and ensure they are stored at paths consistent with those in the data manifest above.
Reference:
```
@inproceedings{southwell2024,
  title={Automatic speech recognition tuned for child speech in the classroom},
  author={Southwell, Rosy and Ward, Wayne and Trinh, Viet Anh and Clevenger, Charis and Clevenger, Clay and Watts, Emily and Reitman, Jason and D’Mello, Sidney and Whitehill, Jacob},
  booktitle={{IEEE} International Conference on Acoustics, Speech and Signal Processing, {ICASSP} 2024, Seoul, South Korea, April 14-19, 2024},
  year={2024},
}
``` |