reach-vb HF staff committed on
Commit
8823033
1 Parent(s): af607a9

Update README.md

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -11,6 +11,9 @@ We further release a set of stereophonic capable models. Those were fine tuned f
  from the mono models. The training data is otherwise identical and capabilities and limitations are shared with the base models. The stereo models work by getting 2 streams of tokens from the EnCodec model, and interleaving those using
  the delay pattern.

  MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts.
  It is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
  Unlike existing methods, like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass.
@@ -67,15 +70,15 @@ pip install --upgrade git+https://github.com/huggingface/transformers.git scipy
  2. Run inference via the `Text-to-Audio` (TTA) pipeline. You can run inference with the MusicGen model via the TTA pipeline in just a few lines of code!

  ```python
- import scipy
  import torch
  from transformers import pipeline

- synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-large", torch_dtype=torch.float16, device="cuda")

- music = synthesiser("lo-fi music with a soothing melody", forward_params={"do_sample": True})

- scipy.io.wavfile.write("musicgen_out.wav", rate=music["sampling_rate"], music=audio["audio"])
  ```

  3. Run inference via the Transformers modelling code. You can use the processor + generate code to convert text into a mono 32 kHz audio waveform for more fine-grained control.
 
  from the mono models. The training data is otherwise identical and capabilities and limitations are shared with the base models. The stereo models work by getting 2 streams of tokens from the EnCodec model, and interleaving those using
  the delay pattern.

+ Stereophonic sound, also known as stereo, is a technique used to reproduce sound with depth and direction.
+ It uses two separate audio channels played through speakers (or headphones), which creates the impression of sound coming from multiple directions.
+
  MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts.
  It is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
  Unlike existing methods, like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass.
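
The README text in this hunk mentions a "delay pattern" for arranging codebook tokens. As a rough illustration only, the sketch below assumes the common formulation in which codebook `k` is shifted right by `k` steps, so that all codebooks can be predicted in one pass. The function names and the `PAD` marker are hypothetical and are not taken from the MusicGen or Transformers code. How the left and right streams are ordered in the stereo checkpoints is not stated in the text and is not modeled here.

```python
PAD = None  # hypothetical marker for "no token at this step" (an assumption)


def apply_delay_pattern(codes, pad=PAD):
    """Shift codebook k right by k steps.

    codes: list of K lists, one per codebook, each holding T token ids.
    Returns K lists of length T + K - 1, padded where no token exists.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    delayed = []
    for k, stream in enumerate(codes):
        trailing = total_steps - k - num_frames
        delayed.append([pad] * k + list(stream) + [pad] * trailing)
    return delayed


def undo_delay_pattern(delayed, num_frames):
    """Recover the aligned (K, T) layout from the delayed layout."""
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


# Toy example: 4 codebooks, 3 frames each.
codes = [[0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]]
delayed = apply_delay_pattern(codes)
for row in delayed:
    print(row)
```

In the delayed layout, each column holds tokens from different frames, which is what lets a single autoregressive step emit one token per codebook.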
 
  2. Run inference via the `Text-to-Audio` (TTA) pipeline. You can run inference with the MusicGen model via the TTA pipeline in just a few lines of code!

  ```python
  import torch
+ import soundfile as sf
  from transformers import pipeline

+ synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-small", device="cuda:0", torch_dtype=torch.float16)

+ music = synthesiser("lo-fi music with a soothing melody", forward_params={"max_new_tokens": 256})

+ sf.write("musicgen_out.wav", music["audio"][0].T, music["sampling_rate"])
  ```

  3. Run inference via the Transformers modelling code. You can use the processor + generate code to convert text into a mono 32 kHz audio waveform for more fine-grained control.
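
The updated pipeline call sets `max_new_tokens` to 256, and the README states that the EnCodec tokenizer runs at 50 Hz over 32 kHz audio. A back-of-the-envelope conversion from token budget to clip length might look like the sketch below. It assumes one generation step per EnCodec frame and ignores the few steps consumed by the delay pattern, so the real output is likely slightly shorter. The helper names are hypothetical.

```python
FRAME_RATE_HZ = 50          # EnCodec frames per second, as stated in the README
SAMPLING_RATE_HZ = 32_000   # audio sampling rate, as stated in the README


def approx_duration_seconds(max_new_tokens, frame_rate_hz=FRAME_RATE_HZ):
    """Approximate clip length, assuming one generation step per frame."""
    return max_new_tokens / frame_rate_hz


def approx_num_samples(max_new_tokens,
                       frame_rate_hz=FRAME_RATE_HZ,
                       sampling_rate_hz=SAMPLING_RATE_HZ):
    """Approximate number of audio samples per channel for that budget."""
    return max_new_tokens * sampling_rate_hz // frame_rate_hz


print(approx_duration_seconds(256))  # about 5.12 seconds
print(approx_num_samples(256))       # 163840 samples per channel (upper bound)
```

Under these assumptions, doubling `max_new_tokens` roughly doubles the clip length, at a proportional cost in generation time.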