Update README.md #5
opened by reach-vb

README.md CHANGED
@@ -11,6 +11,9 @@ We further release a set of stereophonic capable models. Those were fine tuned f
 from the mono models. The training data is otherwise identical and capabilities and limitations are shared with the base models. The stereo models work by getting 2 streams of tokens from the EnCodec model, and interleaving those using
 the delay pattern.
 
+Stereophonic sound, also known as stereo, is a technique used to reproduce sound with depth and direction.
+It uses two separate audio channels played through speakers (or headphones), which creates the impression of sound coming from multiple directions.
+
 MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts.
 It is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
 Unlike existing methods, like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass.
@@ -67,15 +70,15 @@ pip install --upgrade git+https://github.com/huggingface/transformers.git scipy
 2. Run inference via the `Text-to-Audio` (TTA) pipeline. You can infer the MusicGen model via the TTA pipeline in just a few lines of code!
 
 ```python
-import scipy
 import torch
+import soundfile as sf
 from transformers import pipeline
 
-synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-
+synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-small", device="cuda:0", torch_dtype=torch.float16)
 
-music = synthesiser("lo-fi music with a soothing melody", forward_params={"
+music = synthesiser("lo-fi music with a soothing melody", forward_params={"max_new_tokens": 256})
 
-
+sf.write("musicgen_out.wav", music["audio"][0].T, music["sampling_rate"])
 ```
 
 3. Run inference via the Transformers modelling code. You can use the processor + generate code to convert text into a mono 32 kHz audio waveform for more fine-grained control.
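
Note (not part of the diff): the stereo description above leans on the "delay pattern" from the MusicGen paper. As a rough illustration of what interleaving two EnCodec token streams with that pattern means, here is a minimal sketch; `apply_delay_pattern` and `pad_token` are illustrative names, not the actual audiocraft/transformers API.

```python
import torch

def apply_delay_pattern(codes: torch.Tensor, pad_token: int) -> torch.Tensor:
    """Offset codebook k by k steps, so decoding step t predicts codebook k
    for frame t - k. A stereo model stacks the two channels' EnCodec streams
    (2 x 4 = 8 codebooks) and applies the same per-codebook offsets."""
    num_codebooks, seq_len = codes.shape
    delayed = torch.full(
        (num_codebooks, seq_len + num_codebooks - 1), pad_token, dtype=codes.dtype
    )
    for k in range(num_codebooks):
        delayed[k, k : k + seq_len] = codes[k]
    return delayed

# 4 codebooks x 50 frames = 1 second of audio at the 50 Hz EnCodec frame rate.
codes = torch.randint(0, 2048, (4, 50))
print(apply_delay_pattern(codes, pad_token=2048).shape)  # torch.Size([4, 53])
```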
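
Two follow-up notes on the updated snippets. First, in the pipeline example, `music["audio"]` carries a channel axis, so `[0].T` reorders the waveform into the `(frames, channels)` layout that `soundfile` expects, and 256 new tokens at the 50 Hz frame rate comes out to roughly 5 seconds of audio. Second, the excerpt stops at the step 3 description, so the following is only a sketch of the processor + `generate` route it refers to, using the documented `transformers` MusicGen classes; the prompt and generation parameters are illustrative.

```python
import soundfile as sf
from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Load the text processor and the model weights.
processor = AutoProcessor.from_pretrained("facebook/musicgen-stereo-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-stereo-small")

# Tokenize the prompt; padding allows batching prompts of different lengths.
inputs = processor(
    text=["lo-fi music with a soothing melody"],
    padding=True,
    return_tensors="pt",
)

# generate() returns the decoded waveform directly; 256 tokens ~= 5 s at 50 Hz.
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3.0, max_new_tokens=256)

# audio_values is (batch, channels, samples); transpose for soundfile's
# (frames, channels) layout. The EnCodec sampling rate is 32 kHz.
sampling_rate = model.config.audio_encoder.sampling_rate
sf.write("musicgen_out.wav", audio_values[0].numpy().T, sampling_rate)
```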