---
title: Automatic speech recognition
sdk: gradio
app_file: src/app.py
python_version: 3.11
sdk_version: 4.44.0
app_port: 7860
tags: [asr, stt, speech-to-text, whisper, pyannote, diarization]
pinned: true
emoji: 👂
---
# Automatic speech recognition
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
![Python 3.10](badges/python3_10.svg)
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/tools4eu/asr)
![Screenshot](img/screenshot.jpg)
Automatic speech recognition uses [Whisper](https://github.com/openai/whisper) to transcribe audio files and [pyannote-audio](https://github.com/pyannote/pyannote-audio) to add speaker diarization.
Inference is optimized through batching and scaled dot-product attention (SDPA), or flash attention when available.
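The attention backend is typically chosen at model load time. The following is a minimal, hypothetical sketch (not the repo's actual code) of how such a selection could work: `flash_attention_2` and `sdpa` are the values accepted by the `attn_implementation` parameter of Hugging Face transformers' `from_pretrained()`.

```python
# Hypothetical helper: prefer flash attention when the flash-attn package
# is installed, otherwise fall back to PyTorch's scaled dot-product
# attention (SDPA).
import importlib.util


def pick_attn_implementation() -> str:
    """Return the attn_implementation string to pass to from_pretrained()."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


print(pick_attn_implementation())
```

The returned string could then be passed as, e.g., `AutoModelForSpeechSeq2Seq.from_pretrained(model_id, attn_implementation=pick_attn_implementation())`.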
> :warning: **Always review transcriptions.** Transcriptions are done using AI models which are never 100% accurate.
The repo contains (or will contain) code to run the software:
- as a command-line tool
- as a graphical interface
- as an inference API
## Installation
### Prerequisites
The host machine must have an Nvidia graphics card with CUDA 12.x installed natively, preferably [CUDA 12.1](https://developer.nvidia.com/cuda-12-1-0-download-archive), even when using Docker.
The graphics card should have at least 12 GB of VRAM for the largest model.
The host machine must have Docker installed.
For a Linux server, follow [these instructions](https://docs.docker.com/engine/install/).
For a desktop (with a visual UI), follow [these instructions](https://www.docker.com/products/docker-desktop/).
### Docker (recommended)
The Docker image is prebuilt and maintained at [tools4eu/docker-asr](https://github.com/tools4eu/docker-asr) and available on the Docker Hub as [tools4eu/asr](https://hub.docker.com/repository/docker/tools4eu/asr/general)
Run the Docker image, forwarding port 7860 (Gradio) and passing your GPU(s) to the container:
`docker run -p 7860:7860 --gpus all tools4eu/asr`
Or in detached mode (running in the background):
`docker run -d -p 7860:7860 --gpus all tools4eu/asr`
You can check whether it is running with
`docker ps`
If you want to follow terminal output of a detached container, you can use
`docker logs -f <first n digits of the container id>`
The first time a transcription is requested, the app downloads the model.
To avoid re-downloading it every run, stop and restart the same container instead of creating a new one: rather than running
`docker run ...` again,
use `docker start <first n digits of the container id>`
You can list all containers, including stopped ones, with
`docker ps -a`
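Alternatively, model downloads can survive even deleted containers by bind-mounting the Hugging Face cache directory from the host. This is a sketch under the assumption that the container's user keeps its cache at `/home/jovyan/.cache/huggingface` (the dev container below uses `/home/jovyan`; the actual in-container path depends on how the image is built):

```shell
# Assumed cache location inside the container; adjust to match the image.
docker run -d -p 7860:7860 --gpus all \
  -v "$HOME/.cache/huggingface:/home/jovyan/.cache/huggingface" \
  tools4eu/asr
```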
To open the app, open your **browser** and go to `localhost:7860`
### Dev Container
Open the project in Visual Studio Code, press CTRL + SHIFT + P, and type "Rebuild and Reopen in Container".
After the build finishes, open a terminal and activate the virtual environment:
`source /home/jovyan/venv/bin/activate`
Then run the app:
`python src/app.py`
## License
GNU General Public License v3.0 or later
See [COPYING](COPYING) for the full text.