Fairseq Inference Setup and Usage
This repository provides a streamlined setup and guide for performing inference with Fairseq models, tailored for automatic speech recognition.
Table of Contents
Setup Instructions
Download Required Models
Running Inference
Getting Transcripts
To set up the environment and install the necessary dependencies for Fairseq inference, follow these steps.
1. Create and Activate a Virtual Environment
Choose between Python's venv or Conda for environment management.
Using venv:
python3.8 -m venv lm_env # use python3.8 or adjust for your preferred version
source lm_env/bin/activate
Using Conda:
conda create -n fairseq_inference python==3.8.10
conda activate fairseq_inference
2. Install PyTorch and CUDA
Install the appropriate version of PyTorch and CUDA for your setup:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
If using Python 3.10.15 and CUDA 12.4, install the CUDA 12.4 wheels instead (recent PyTorch releases publish them on a dedicated package index):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
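Either way, a short Python check (no assumptions beyond a working install) confirms that the installed build matches your CUDA setup:

```python
# Verify the installed PyTorch build and GPU visibility.
import torch

print(torch.__version__)          # e.g. 1.12.1+cu113
print(torch.cuda.is_available())  # should print True on a working GPU setup
```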
3. Install Additional Packages
pip install wheel soundfile editdistance pyarrow tensorboard tensorboardX
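A quick import check confirms the extra dependencies installed cleanly:

```python
# Import check for the runtime packages installed above
# (wheel is only a build tool, so it is not exercised here).
import soundfile, editdistance, pyarrow, tensorboard, tensorboardX

print("additional packages OK")
```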
4. Clone the Fairseq Inference Repository
git clone https://github.com/Speech-Lab-IITM/Fairseq-Inference.git
cd Fairseq-Inference/fairseq-0.12.2
pip install --editable ./
python setup.py build develop
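If the editable install succeeded, fairseq should import from anywhere in the environment:

```python
# Confirm the editable fairseq install is on the path.
import fairseq

print(fairseq.__version__)  # expect 0.12.2
```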
Download Required Models
Download the necessary models for your ASR tasks and place them in the appropriate directory (model_path).
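Before running inference, it can help to verify that model_path actually contains a checkpoint. This is a minimal sketch assuming Fairseq's usual .pt checkpoint extension; adjust the pattern to the files you downloaded:

```python
# Sanity-check the model directory before inference.
from pathlib import Path

model_dir = Path("model_path")  # replace with your actual model directory
checkpoints = sorted(model_dir.glob("*.pt"))
if not checkpoints:
    raise FileNotFoundError(f"no .pt checkpoint found in {model_dir}")
print("found:", [c.name for c in checkpoints])
```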
Running Inference
Once setup is complete and models are downloaded, use the following command to run inference:
python3 infer.py model_path audio_path
The script takes the model directory (model_path) and an audio file (audio_path) as positional arguments and generates a transcription.
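For orientation, the sketch below shows how a script like infer.py typically loads a checkpoint and an audio file with the Fairseq API. It is not the repository's actual implementation, and the checkpoint filename is an assumption:

```python
# Illustrative only: typical Fairseq checkpoint + audio loading.
# "checkpoint_best.pt" and the .wav input are assumptions.
import soundfile as sf
from fairseq.checkpoint_utils import load_model_ensemble_and_task

models, cfg, task = load_model_ensemble_and_task(["model_path/checkpoint_best.pt"])
model = models[0].eval()  # put the model in inference mode

audio, sample_rate = sf.read("audio_path.wav")  # float array + sample rate
```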
Getting Transcripts
After the inference script finishes, the transcript for the provided audio file appears in the script's output.
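To collect transcripts for many files, one option is a small driver that shells out to infer.py once per file. This sketch assumes the script prints the transcript to stdout and that your recordings live in an audio/ folder:

```python
# Hypothetical batch driver: transcribe every .wav under audio/.
import subprocess
from pathlib import Path

MODEL_PATH = "model_path"  # replace with your model directory

for wav in sorted(Path("audio").glob("*.wav")):
    result = subprocess.run(
        ["python3", "infer.py", MODEL_PATH, str(wav)],
        capture_output=True, text=True, check=True,
    )
    print(f"{wav.name}: {result.stdout.strip()}")
```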