File size: 3,291 Bytes
e3cbaf9
 
 
 
 
96e64e9
 
 
 
e713156
96e64e9
a65f39e
f34e6bf
 
 
 
 
 
 
 
 
 
 
6b8efdb
f34e6bf
0fed967
b649af4
f34e6bf
 
 
 
 
96e64e9
 
 
e713156
96e64e9
e713156
 
 
 
96e64e9
 
 
 
e713156
 
 
 
96e64e9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e713156
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
---
license: mit
pipeline_tag: audio-to-audio
library_name: transformers
---
# VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.

It is based on this [repo](https://github.com/skirdey/voicerestore) & demo of audio restorations: [VoiceRestore](https://sparkling-rabanadas-3082be.netlify.app/)

## Usage - using Transformers 🤗
``` bash
!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt
```

``` python
from transformers import AutoModel
# path to the model folder (on colab it's as follows)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
model("test_input.wav", "test_output.wav")
#add short=False if audio is > 10 seconds
model("long.mp3", "long_output.mp3", short=False)
```




## Example
### Degraded Input: 

### Degraded Input Audio

<audio controls>
  <source src="https://huggingface.co/jadechoghari/VoiceRestore/resolve/main/test_input.wav" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

---
### Restored (steps=32, cfg=1.0):

<audio controls>
  <source src="https://huggingface.co/jadechoghari/VoiceRestore/resolve/main/test_output.wav" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

Restored audio - 16 steps, strength 0.5:

---
## Key Features

- **Universal Restoration**: The model can handle any level and type of voice recording degradation. Pure magic.  
- **Easy to Use**: Simple interface for processing degraded audio files.
- **Pretrained Model**: Includes a 301 million parameter transformer model with pre-trained weights. (Model is still in the process of training, there will be further checkpoint updates)

---


## Model Details

- **Architecture**: Flow-matching transformer
- **Parameters**: 300M+ parameters
- **Input**: Degraded speech audio (various formats supported)
- **Output**: Restored speech

## Limitations and Future Work

- Current model is optimized for speech; may not perform optimally on music or other audio types.
- Ongoing research to improve performance on extreme degradations.
- Future updates may include real-time processing capabilities.

## Citation

If you use VoiceRestore in your research, please cite our paper:

```
@article{kirdey2024voicerestore,
  title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
  author={Kirdey, Stanislav},
  journal={arXiv},
  year={2024}
}
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Based on the [E2-TTS implementation by Lucidrains](https://github.com/lucidrains/e2-tts-pytorch)
- Special thanks to the open-source community for their invaluable contributions.
- Credits: This repository is based on the [E2-TTS implementation by Lucidrains](https://github.com/lucidrains/e2-tts-pytorch)