gradio
torchaudio
soundfile
librosa
numpy
pandas
transformers
scipy
huggingface_hub
torch
datasets
accelerate>=0.21.0