vatsal-metavoice committed
Commit 14aac75 • 1 Parent(s): e007fe7

feat: update README

README.md CHANGED
@@ -2,6 +2,8 @@
 license: apache-2.0
 language:
 - en
+tags:
+- pretrained
 ---
 
 MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities:
@@ -13,38 +15,8 @@ MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for
 
 We’re releasing MetaVoice-1B under the Apache 2.0 license, *it can be used without restrictions*.
 
-## Installation
-```bash
-# install ffmpeg
-wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz
-wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz.md5
-md5sum -c ffmpeg-git-amd64-static.tar.xz.md5
-tar xvf ffmpeg-git-amd64-static.tar.xz
-sudo mv ffmpeg-git-*-static/ffprobe ffmpeg-git-*-static/ffmpeg /usr/local/bin/
-rm -rf ffmpeg-git-*
-
-pip install -r requirements.txt
-pip install -e .
-```
-
-## Download
-```
-wget https://cdn.themetavoice.xyz/metavoice-1B-v0.1.tar
-tar -xvf metavoice-1B-v0.1.tar
-```
-
 ## Usage
-
-```bash
-python fam/llm/sample.py --model_dir=<PATH_TO_MODEL_DIR> --spk_cond_path=<PATH_TO_TARGET_AUDIO>
-```
-
-2. Deploy it on any cloud (AWS/GCP/Azure), using our [inference server](/fam/llm/serving.py)
-```bash
-python fam/llm/serving.py --model_dir=<PATH_TO_MODEL_DIR>
-```
-
-3. Use it on HuggingFace
+See [Github](https://github.com/metavoiceio/metavoice-src) for the latest usage instructions.
 
 ## Soon
 - Long form TTS
@@ -66,6 +38,3 @@ We predict EnCodec tokens from text, and speaker information. This is then diffu
 The model supports:
 1. KV-caching via Flash Decoding
 2. Batching (including texts of different lengths)
-
-## Contribute
-- See all [active issues](https://github.com/themetavoicexyz/issues)!
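
The README context kept by this commit lists "Batching (including texts of different lengths)" as a supported feature. As a rough illustration of what batching unequal-length inputs usually involves, here is a minimal sketch in plain Python that right-pads token sequences and builds a matching mask. It is not taken from the MetaVoice source: `pad_batch` and `PAD_ID` are hypothetical names, and the actual model may handle padding differently.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding token id, not taken from the MetaVoice code


def pad_batch(
    token_ids: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad sequences to a common length and build a matching 0/1 mask."""
    max_len = max((len(seq) for seq in token_ids), default=0)
    padded, mask = [], []
    for seq in token_ids:
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)      # real tokens, then padding
        mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return padded, mask


if __name__ == "__main__":
    batch, mask = pad_batch([[5, 6, 7], [8]])
    print(batch)  # [[5, 6, 7], [8, 0, 0]]
    print(mask)   # [[1, 1, 1], [1, 0, 0]]
```

The mask lets a downstream model ignore padded positions, so texts of different lengths can share one batch.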