---
license: mit
pipeline_tag: audio-to-audio
library_name: transformers
---
# VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
VoiceRestore is a speech restoration model that improves the quality of degraded voice recordings. It uses flow-matching transformers to address a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.
Demo of audio restorations: [VoiceRestore](https://sparkling-rabanadas-3082be.netlify.app/)
Credits: This repository is based on the [E2-TTS implementation by Lucidrains](https://github.com/lucidrains/e2-tts-pytorch)
## Usage
``` bash
!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt
```
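``` python
from transformers import AutoModel

# Path to the cloned model folder (on Colab it is as follows)
checkpoint_path = "/content/VoiceRestore"
# trust_remote_code=True is needed because the model code ships with the repository
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
# Restore the degraded input file and write the result to the output path
model("test_input.wav", "test_output.wav")
```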
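To restore several files in one go, the single-file call above can be wrapped in a small loop. The following is a minimal sketch and not part of the repository: `restore_folder` is a hypothetical helper, and it assumes only that `restore` is a callable taking an input path and an output path, as `model` does in the snippet above.

```python
from pathlib import Path
from typing import Callable, List


def restore_folder(restore: Callable[[str, str], object],
                   input_dir: str,
                   output_dir: str,
                   pattern: str = "*.wav") -> List[Path]:
    """Call restore(input_path, output_path) for every matching file.

    Hypothetical helper: `restore` is assumed to behave like the `model`
    object above, i.e. it reads one file and writes the restored audio.
    """
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort so files are processed in a stable, predictable order.
    for src in sorted(Path(input_dir).glob(pattern)):
        dst = out_root / src.name  # keep the original file name
        restore(str(src), str(dst))
        written.append(dst)
    return written
```

With the model loaded as above, `restore_folder(model, "degraded", "restored")` would process every `.wav` file in a `degraded` folder. Both folder names are placeholders.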
## Example
### Degraded Input:
![Degraded Input](./imgs/degraded.png "Degraded Input")
Degraded audio (reverberation, distortion, noise, random cut):
**Note**: Adjust your volume before playing the degraded audio sample, as it may contain distortions.
https://github.com/user-attachments/assets/0c030274-60b5-41a4-abe6-59a3f1bc934b
---
### Restored (steps=32, cfg=1.0):
![Restored](./imgs/restored.png "Restored")
Restored audio - 16 steps, strength 0.5:
https://github.com/user-attachments/assets/fdbbb988-9bd2-4750-bddd-32bd5153d254
---
### Ground Truth:
![Ground Truth](./imgs/ground_truth.png "Ground Truth")
---
## Key Features
- **Universal Restoration**: The model handles a wide range of degradation types and levels in voice recordings, including background noise, reverberation, distortion, and signal loss.
- **Easy to Use**: Simple interface for processing degraded audio files.
- **Pretrained Model**: Includes a 301-million-parameter transformer model with pretrained weights. The model is still training, and further checkpoint updates will follow.
---
## Model Details
- **Architecture**: Flow-matching transformer
- **Parameters**: 300M+ (about 301 million)
- **Input**: Degraded speech audio (various formats supported)
- **Output**: Restored speech
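For readers unfamiliar with flow matching, the sketch below illustrates the general sampling idea and what a "steps" setting typically controls. It is a toy illustration under stated assumptions, not the VoiceRestore implementation: the velocity function is an analytic stand-in for a trained transformer, and `euler_sample` and `toy_velocity` are hypothetical names. In a flow-matching model, a network predicts a velocity that moves a noise sample toward the data, and sampling integrates that velocity from t = 0 to t = 1 in a fixed number of steps.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector,
                 steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # last evaluated time is 1 - dt, so t never reaches 1
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Stand-in for a trained network (assumption, for illustration only).

    On the straight path x_t = (1 - t) * x0 + t * target, the velocity that
    reaches `target` at t=1 is (target - x) / (1 - t).
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


noise = [0.5, -1.0, 2.0]   # plays the role of the initial noise sample
clean = [0.0, 1.0, -1.0]   # plays the role of the restored signal
restored = euler_sample(toy_velocity(clean), noise, steps=32)
```

In this toy case the path is a straight line, so any step count lands on the target up to floating-point error. With a learned velocity field, more steps generally trade speed for accuracy.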
## Limitations and Future Work
- The current model is optimized for speech and may not perform well on music or other audio types.
- Research is ongoing to improve performance on extreme degradations.
- Future updates may include real-time processing capabilities.
## Citation
If you use VoiceRestore in your research, please cite our paper:
```
@article{kirdey2024voicerestore,
title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
author={Kirdey, Stanislav},
journal={arXiv},
year={2024}
}
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Based on the [E2-TTS implementation by Lucidrains](https://github.com/lucidrains/e2-tts-pytorch)
- Special thanks to the open-source community for their invaluable contributions.