Fairseq Inference Setup and Usage
This repository provides a streamlined setup and guide for performing inference with Fairseq models, tailored for automatic speech recognition.
Table of Contents
Setup Instructions
Download Required Models
Running Inference
Getting Transcripts
To set up the environment and install the necessary dependencies for Fairseq inference, follow these steps.
1. Create and Activate a Virtual Environment
Choose between Python's venv or Conda for environment management.
Using venv:
python3.8 -m venv lm_env # use python3.8 or adjust for your preferred version
source lm_env/bin/activate
Using Conda:
conda create -n fairseq_inference python==3.8.10
conda activate fairseq_inference
2. Install PyTorch and CUDA
Install the appropriate version of PyTorch and CUDA for your setup:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
If using Python 3.10.15 and CUDA 12.4, install the CUDA 12.4 wheels instead (recent PyTorch releases publish them on a dedicated package index):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
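Either way, a short Python check (no assumptions beyond a working install) confirms that the installed build matches your CUDA setup:

```python
# Verify the installed PyTorch build and GPU visibility.
import torch

print(torch.__version__)          # e.g. 1.12.1+cu113
print(torch.cuda.is_available())  # should print True on a working GPU setup
```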
3. Install Additional Packages
pip install wheel soundfile editdistance pyarrow tensorboard tensorboardX
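A quick import check confirms the extra dependencies installed cleanly:

```python
# Import check for the runtime packages installed above
# (wheel is only a build tool, so it is not exercised here).
import soundfile, editdistance, pyarrow, tensorboard, tensorboardX

print("additional packages OK")
```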
4. Clone the Fairseq Inference Repository
git clone https://github.com/Speech-Lab-IITM/Fairseq-Inference.git
cd Fairseq-Inference/fairseq-0.12.2
pip install --editable ./
python setup.py build develop
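If the editable install succeeded, fairseq should import from anywhere in the environment:

```python
# Confirm the editable fairseq install is on the path.
import fairseq

print(fairseq.__version__)  # expect 0.12.2
```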
Download Required Models
Download the necessary models for your ASR tasks and place them in the appropriate directory (model_path).
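Before running inference, it can help to verify that model_path actually contains a checkpoint. This is a minimal sketch assuming Fairseq's usual .pt checkpoint extension; adjust the pattern to the files you downloaded:

```python
# Sanity-check the model directory before inference.
from pathlib import Path

model_dir = Path("model_path")  # replace with your actual model directory
checkpoints = sorted(model_dir.glob("*.pt"))
if not checkpoints:
    raise FileNotFoundError(f"no .pt checkpoint found in {model_dir}")
print("found:", [c.name for c in checkpoints])
```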
Running Inference
Once setup is complete and models are downloaded, use the following command to run inference:
python3 infer.py model_path audio_path
The script takes the model directory (model_path) and an audio file (audio_path) as positional arguments and generates a transcription.
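For orientation, the sketch below shows how a script like infer.py typically loads a checkpoint and an audio file with the Fairseq API. It is not the repository's actual implementation, and the checkpoint filename is an assumption:

```python
# Illustrative only: typical Fairseq checkpoint + audio loading.
# "checkpoint_best.pt" and the .wav input are assumptions.
import soundfile as sf
from fairseq.checkpoint_utils import load_model_ensemble_and_task

models, cfg, task = load_model_ensemble_and_task(["model_path/checkpoint_best.pt"])
model = models[0].eval()  # put the model in inference mode

audio, sample_rate = sf.read("audio_path.wav")  # float array + sample rate
```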
Getting Transcripts
After the inference script finishes, the transcript for the provided audio file appears in the script's output.
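To collect transcripts for many files, one option is a small driver that shells out to infer.py once per file. This sketch assumes the script prints the transcript to stdout and that your recordings live in an audio/ folder:

```python
# Hypothetical batch driver: transcribe every .wav under audio/.
import subprocess
from pathlib import Path

MODEL_PATH = "model_path"  # replace with your model directory

for wav in sorted(Path("audio").glob("*.wav")):
    result = subprocess.run(
        ["python3", "infer.py", MODEL_PATH, str(wav)],
        capture_output=True, text=True, check=True,
    )
    print(f"{wav.name}: {result.stdout.strip()}")
```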