# Pretrained Models Dependency

The pretrained model dependencies of Amphion are as follows:

- [Pretrained Models Dependency](#pretrained-models-dependency)
  - [Amphion Singing BigVGAN](#amphion-singing-bigvgan)
  - [Amphion Speech HiFi-GAN](#amphion-speech-hifi-gan)
  - [Amphion DiffWave](#amphion-diffwave)
  - [ContentVec](#contentvec)
  - [WeNet](#wenet)
  - [Whisper](#whisper)
  - [RawNet3](#rawnet3)


Instructions for downloading each of them are given below.

## Amphion Singing BigVGAN

We fine-tuned the official BigVGAN pretrained model on over 120 hours of singing voice data. The fine-tuned checkpoint can be downloaded [here](https://huggingface.co/amphion/BigVGAN_singing_bigdata). You need to download the `400000.pt` and `args.json` files into `Amphion/pretrained/bigvgan`:

```
Amphion
 ┣ pretrained
 ┃ ┣ bigvgan
 ┃ ┃ ┣ 400000.pt
 ┃ ┃ ┣ args.json
```
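
If you prefer the command line, one possible way to fetch just these two files is the `huggingface-cli` tool from `huggingface_hub` (a sketch; it assumes the two files sit at the root of the Hugging Face repository):

```bash
cd Amphion
# Fetch only the two required files (assumes huggingface_hub is installed
# and that 400000.pt / args.json live at the repository root):
huggingface-cli download amphion/BigVGAN_singing_bigdata 400000.pt args.json \
    --local-dir pretrained/bigvgan
```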

## Amphion Speech HiFi-GAN

We trained our HiFi-GAN pretrained model on 685 hours of speech data. It can be downloaded [here](https://huggingface.co/amphion/hifigan_speech_bigdata). You need to download the whole `hifigan_speech` folder into `Amphion/pretrained/hifigan`:

```
Amphion
 ┣ pretrained
 ┃ ┣ hifigan
 ┃ ┃ ┣ hifigan_speech
 ┃ ┃ ┃ ┣ log
 ┃ ┃ ┃ ┣ result
 ┃ ┃ ┃ ┣ checkpoint
 ┃ ┃ ┃ ┣ args.json
```
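
One way to grab the whole folder is to clone the repository with Git LFS (a sketch; it assumes the `hifigan_speech` folder sits at the root of the Hugging Face repository):

```bash
cd Amphion/pretrained
git lfs install
# Clone the repo contents into pretrained/hifigan:
git clone https://huggingface.co/amphion/hifigan_speech_bigdata hifigan
```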

## Amphion DiffWave

We trained our DiffWave pretrained model on 125 hours of speech data and around 80 hours of singing voice data. It can be downloaded [here](https://huggingface.co/amphion/diffwave). You need to download the whole `diffwave` folder into `Amphion/pretrained/diffwave`:

```
Amphion
 ┣ pretrained
 ┃ ┣ diffwave
 ┃ ┃ ┣ diffwave_speech
 ┃ ┃ ┃ ┣ samples
 ┃ ┃ ┃ ┣ checkpoint
 ┃ ┃ ┃ ┣ args.json
```
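
The same Git LFS pattern works here (again, a sketch assuming the `diffwave_speech` folder sits at the repository root):

```bash
cd Amphion/pretrained
git lfs install
git clone https://huggingface.co/amphion/diffwave diffwave
```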

## ContentVec

You can download the pretrained ContentVec model [here](https://github.com/auspicious3000/contentvec). Note that we use the `ContentVec_legacy-500 classes` checkpoint. Download the `checkpoint_best_legacy_500.pt` file into `Amphion/pretrained/contentvec`:

```
Amphion
 ┣ pretrained
 ┃ ┣ contentvec
 ┃ ┃ ┣ checkpoint_best_legacy_500.pt
```
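
The checkpoint is linked from the ContentVec repository page and is typically downloaded manually; afterwards, move it into place (the source path below is just an example):

```bash
mkdir -p Amphion/pretrained/contentvec
# Adjust the source path to wherever your browser saved the file:
mv ~/Downloads/checkpoint_best_legacy_500.pt Amphion/pretrained/contentvec/
```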

## WeNet

You can download the pretrained WeNet model [here](https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.md). Taking the `wenetspeech` pretrained checkpoint as an example, assume you have downloaded `wenetspeech_u2pp_conformer_exp.tar.gz` into `Amphion/pretrained/wenet`. Extract it and modify its configuration file as follows:

```sh
cd Amphion/pretrained/wenet

### Extract the exp dir
tar -xvf wenetspeech_u2pp_conformer_exp.tar.gz

### Specify the updated path in train.yaml
cd 20220506_u2pp_conformer_exp
vim train.yaml
# TODO: Change the value of "cmvn_file" (Line 2) to the absolute path of the `global_cmvn` file. (Eg: [YourPath]/Amphion/pretrained/wenet/20220506_u2pp_conformer_exp/global_cmvn)
```
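
If you prefer a non-interactive edit, a `sed` one-liner can update the path instead of `vim` (a sketch; it assumes GNU `sed` and that `cmvn_file` appears as a plain YAML key in `train.yaml`):

```bash
cd Amphion/pretrained/wenet/20220506_u2pp_conformer_exp
# Point cmvn_file at the absolute path of global_cmvn (tolerates indentation):
sed -i "s|^\(\s*cmvn_file:\).*|\1 $(pwd)/global_cmvn|" train.yaml
```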

The final file structure tree is as follows:

```
Amphion
 ┣ pretrained
 ┃ ┣ wenet
 ┃ ┃ ┣ 20220506_u2pp_conformer_exp
 ┃ ┃ ┃ ┣ final.pt
 ┃ ┃ ┃ ┣ global_cmvn
 ┃ ┃ ┃ ┣ train.yaml
 ┃ ┃ ┃ ┣ units.txt
```

## Whisper

The official pretrained Whisper checkpoints are available [here](https://github.com/openai/whisper/blob/e58f28804528831904c3b6f2c0e473f346223433/whisper/__init__.py#L17). In Amphion, we use the `medium` Whisper model by default. You can download it as follows:

```bash
cd Amphion/pretrained
mkdir whisper
cd whisper

wget https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt
```
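
Optionally, you can verify the download: the long hex segment in the URL is the SHA-256 digest of the file, which the official Whisper loader checks in the same way:

```bash
# Should print "medium.pt: OK":
echo "345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1  medium.pt" | sha256sum -c -
```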

The final file structure tree is as follows:

```
Amphion
 ┣ pretrained
 ┃ ┣ whisper
 ┃ ┃ ┣ medium.pt
```

## RawNet3

The official pretrained RawNet3 checkpoint is available [here](https://huggingface.co/jungjee/RawNet3). You need to download the `model.pt` file and put it into the `Amphion/pretrained/rawnet3` folder.

The final file structure tree is as follows:

```
Amphion
 ┣ pretrained
 ┃ ┣ rawnet3
 ┃ ┃ ┣ model.pt
```
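
As with the other Hugging Face hosted checkpoints, `huggingface-cli` offers one way to fetch the single file (a sketch assuming `model.pt` sits at the repository root):

```bash
huggingface-cli download jungjee/RawNet3 model.pt --local-dir Amphion/pretrained/rawnet3
```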


# (Optional) Model Dependencies for Evaluation
When utilizing Amphion's evaluation pipelines, terminals without access to `huggingface.co` may encounter error messages such as "OSError: Can't load tokenizer for ...". To work around this, the dependent models for evaluation can be prepared in advance and stored here, under `Amphion/pretrained`; then follow [this README](../egs/metrics/README.md#troubleshooting) to configure your environment to load local models.

The dependent models of Amphion's evaluation pipeline are as follows (sorted alphabetically):

- [Evaluation Pipeline Models Dependency](#optional-model-dependencies-for-evaluation)
  - [bert-base-uncased](#bert-base-uncased)
  - [facebook/bart-base](#facebookbart-base)
  - [roberta-base](#roberta-base)
  - [wavlm](#wavlm)

Instructions for downloading each of them are given below.
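
On a machine that can reach `huggingface.co`, one convenient way to fetch all four is Git with Git LFS (a sketch; the target directory names follow the file trees shown in the sections below):

```bash
cd Amphion/pretrained
git lfs install

git clone https://huggingface.co/bert-base-uncased
mkdir -p facebook
git clone https://huggingface.co/facebook/bart-base facebook/bart-base
git clone https://huggingface.co/roberta-base
git clone https://huggingface.co/microsoft/wavlm-base-plus-sv wavlm
```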

## bert-base-uncased

To load `bert-base-uncased` locally, follow [this link](https://huggingface.co/bert-base-uncased) to download all files for `bert-base-uncased` model, and store them under `Amphion/pretrained/bert-base-uncased`, conforming to the following file structure tree:

```
Amphion
 ┣ pretrained
 ┃ ┣ bert-base-uncased
 ┃ ┃ ┣ config.json
 ┃ ┃ ┣ coreml
 ┃ ┃ ┃ ┣ fill-mask
 ┃ ┃ ┃ ┃ ┣ float32_model.mlpackage
 ┃ ┃ ┃ ┃ ┃ ┣ Data
 ┃ ┃ ┃ ┃ ┃ ┃ ┣ com.apple.CoreML
 ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ model.mlmodel
 ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┣ LICENSE
 ┃ ┃ ┣ model.onnx
 ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┣ README.md
 ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┣ tokenizer_config.json
 ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┣ vocab.txt
```

## facebook/bart-base

To load `facebook/bart-base` locally, follow [this link](https://huggingface.co/facebook/bart-base) to download all files for `facebook/bart-base` model, and store them under `Amphion/pretrained/facebook/bart-base`, conforming to the following file structure tree:

```
Amphion
 ┣ pretrained
 ┃ ┣ facebook
 ┃ ┃ ┣ bart-base
 ┃ ┃ ┃ ┣ config.json
 ┃ ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┃ ┣ gitattributes.txt
 ┃ ┃ ┃ ┣ merges.txt
 ┃ ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┃ ┣ README.txt
 ┃ ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┃ ┣ vocab.json
```

## roberta-base

To load `roberta-base` locally, follow [this link](https://huggingface.co/roberta-base) to download all files for `roberta-base` model, and store them under `Amphion/pretrained/roberta-base`, conforming to the following file structure tree:

```
Amphion
 ┣ pretrained
 ┃ ┣ roberta-base
 ┃ ┃ ┣ config.json
 ┃ ┃ ┣ dict.txt
 ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┣ gitattributes.txt
 ┃ ┃ ┣ merges.txt
 ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┣ README.txt
 ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┣ vocab.json
```

## wavlm

The official pretrained wavlm checkpoint is available [here](https://huggingface.co/microsoft/wavlm-base-plus-sv). Store the downloaded files under `Amphion/pretrained/wavlm`; the file structure tree is as follows:

```
Amphion
 ┣ pretrained
 ┃ ┣ wavlm
 ┃ ┃ ┣ config.json
 ┃ ┃ ┣ preprocessor_config.json
 ┃ ┃ ┣ pytorch_model.bin
```