Commit e43b797
0 Parent(s)
Duplicate from hubertsiuzdak/snac_32khz
Co-authored-by: Hubert Siuzdak <[email protected]>
- .gitattributes +35 -0
- README.md +71 -0
- config.json +13 -0
- pytorch_model.bin +3 -0
.gitattributes
ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
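
To see which of the four files in this commit these rules would route through Git LFS, here is a minimal sketch. It approximates gitattributes matching with Python's `fnmatch` applied to the file's basename. That is an assumption that only holds for simple patterns such as `*.bin`; Git's real matching (for example for `saved_model/**/*`) follows gitignore-style rules that `fnmatch` does not fully reproduce. `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names, and the pattern list is a hand-picked subset of the file above.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hand-picked subset of the basename patterns from .gitattributes (not the full list).
LFS_PATTERNS = ["*.bin", "*.pt", "*.pth", "*.safetensors", "*.onnx", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the file's basename match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


# The four files added by this commit.
for f in [".gitattributes", "README.md", "config.json", "pytorch_model.bin"]:
    print(f, "->", "LFS" if is_lfs_tracked(f) else "regular Git")
```

Under this approximation only `pytorch_model.bin` is stored via LFS, which agrees with the pointer file shown for it in this commit.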
README.md
ADDED
@@ -0,0 +1,71 @@
---
license: mit
tags:
- audio
---

# SNAC 🍿

Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses audio into discrete codes at a low bitrate.

This model was primarily trained on music data, and its recommended use case is music (and SFX) generation. See below for other pretrained models.

GitHub repository: https://github.com/hubertsiuzdak/snac/

## Overview

SNAC encodes audio into hierarchical tokens, similarly to SoundStream, EnCodec, and DAC. Unlike those codecs, SNAC samples coarse tokens less frequently,
so each coarse token covers a broader time span.

This model compresses 32 kHz audio into discrete codes at a 1.9 kbps bitrate. It uses 4 RVQ levels with token rates of 10, 21, 42, and
83 Hz.
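
The token rates and bitrate quoted above can be reproduced from the values in this commit's `config.json`. The sketch below is a back-of-the-envelope check, not the library's own accounting. It assumes that the finest level emits one token per `prod(encoder_rates)` input samples, that each RVQ level runs at that frame rate divided by its entry in `vq_strides`, and that every token costs `log2(codebook_size)` bits.

```python
import math

# Values copied from the config.json added in this commit.
sampling_rate = 32000
encoder_rates = [2, 3, 8, 8]
vq_strides = [8, 4, 2, 1]
codebook_size = 4096

hop_length = math.prod(encoder_rates)      # input samples per finest-level frame
frame_rate = sampling_rate / hop_length    # finest-level token rate in Hz
bits_per_token = math.log2(codebook_size)  # bits needed to index one codebook entry

# Assumption: each level's token rate is the frame rate divided by its stride.
token_rates = [frame_rate / stride for stride in vq_strides]
bitrate_bps = sum(token_rates) * bits_per_token

print("token rates (Hz):", [round(r, 1) for r in token_rates])
print("bitrate (bps):", round(bitrate_bps))
```

With these numbers the four rates round to 10, 21, 42, and 83 Hz and the total comes to 1875 bps, which matches the stated 1.9 kbps.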

## Pretrained models

Currently, all models support only a single audio channel (mono).

| Model                                                                       | Bitrate   | Sample Rate | Params | Recommended use case     |
|-----------------------------------------------------------------------------|-----------|-------------|--------|--------------------------|
| [hubertsiuzdak/snac_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | 0.98 kbps | 24 kHz      | 19.8 M | 🗣️ Speech                |
| hubertsiuzdak/snac_32khz (this model)                                       | 1.9 kbps  | 32 kHz      | 54.5 M | 🎸 Music / Sound Effects |
| [hubertsiuzdak/snac_44khz](https://huggingface.co/hubertsiuzdak/snac_44khz) | 2.6 kbps  | 44 kHz      | 54.5 M | 🎸 Music / Sound Effects |

## Usage

Install it using:

```bash
pip install snac
```
To encode (and decode) audio with SNAC in Python, use the following code:

```python
import torch
from snac import SNAC

# Load the pretrained 32 kHz model in eval mode on the GPU
model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().cuda()
audio = torch.randn(1, 1, 32000).cuda()  # B, 1, T

with torch.inference_mode():
    codes = model.encode(audio)
    audio_hat = model.decode(codes)
```

You can also encode and reconstruct in a single call:

```python
with torch.inference_mode():
    audio_hat, codes = model(audio)
```

⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal
resolution.

```
>>> [code.shape[1] for code in codes]
[12, 24, 48, 96]
```
```
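
The lengths above are larger than the 83 frames one might expect for one second of 32 kHz audio, which suggests the input is padded before encoding. The sketch below reproduces them from the `config.json` values in this commit. The padding rule is an assumption here: the input is rounded up to a multiple of the hop length times `lcm(vq_strides[0], attn_window_size)`. `expected_code_lengths` is a hypothetical helper, not part of the `snac` package.

```python
import math

# Values copied from the config.json added in this commit.
encoder_rates = [2, 3, 8, 8]
vq_strides = [8, 4, 2, 1]
attn_window_size = 32


def expected_code_lengths(num_samples: int) -> list[int]:
    """Estimate the number of tokens per RVQ level for a mono input of num_samples."""
    hop_length = math.prod(encoder_rates)
    # Assumed padding rule: round up to a multiple of
    # hop_length * lcm(coarsest stride, attention window size).
    pad_to = hop_length * math.lcm(vq_strides[0], attn_window_size)
    padded = math.ceil(num_samples / pad_to) * pad_to
    frames = padded // hop_length
    # Each level keeps one token per `stride` finest-level frames.
    return [frames // stride for stride in vq_strides]


print(expected_code_lengths(32000))
```

For the 32000-sample input used in the usage example, this yields the same `[12, 24, 48, 96]` shown above.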

## Acknowledgements

Module definitions are adapted from the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec).
config.json
ADDED
@@ -0,0 +1,13 @@
{
  "sampling_rate": 32000,
  "encoder_dim": 64,
  "encoder_rates": [2, 3, 8, 8],
  "decoder_dim": 1536,
  "decoder_rates": [8, 8, 3, 2],
  "attn_window_size": 32,
  "codebook_size": 4096,
  "codebook_dim": 8,
  "vq_strides": [8, 4, 2, 1],
  "noise": true,
  "depthwise": true
}
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bfee2f057c1e287443786bedab377b5176b430e911417683977b7af71ea3ba65
size 218308802
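
The three lines above are a Git LFS pointer, not the weights themselves. As a small illustration, the sketch below parses the pointer into its fields and uses the size for a rough sanity check against the parameter count in the README. `parse_lfs_pointer` is a hypothetical helper, and the check assumes the checkpoint stores 4-byte float32 weights with little overhead, which the commit does not state.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bfee2f057c1e287443786bedab377b5176b430e911417683977b7af71ea3ba65
size 218308802
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])

# Assumption: float32 weights, so roughly size / 4 parameters.
approx_params_millions = pointer["size"] / 4 / 1e6
print(f"about {approx_params_millions:.1f} M parameters")
```

The estimate lands close to the 54.5 M parameters listed in the README table, so the pointer's size is plausible for this model.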