{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MusicGen\n",
    "Welcome to MusicGen's demo Jupyter notebook. Here you will find a series of self-contained examples of how to use MusicGen in different settings.\n",
    "\n",
    "First, we start by initializing MusicGen. You can choose a model from the following selection:\n",
    "1. `facebook/musicgen-small` - 300M transformer decoder.\n",
    "2. `facebook/musicgen-medium` - 1.5B transformer decoder.\n",
    "3. `facebook/musicgen-melody` - 1.5B transformer decoder also supporting melody conditioning.\n",
    "4. `facebook/musicgen-large` - 3.3B transformer decoder.\n",
    "\n",
    "We will use the `facebook/musicgen-small` variant for the purpose of this demonstration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from audiocraft.models import MusicGen\n",
    "from audiocraft.models import MultiBandDiffusion\n",
    "\n",
    "# When True, the generated tokens are also decoded with MultiBandDiffusion further below.\n",
    "USE_DIFFUSION_DECODER = False\n",
    "# Using the small model; better results would be obtained with `medium` or `large`.\n",
    "model = MusicGen.get_pretrained('facebook/musicgen-small')\n",
    "if USE_DIFFUSION_DECODER:\n",
    "    mbd = MultiBandDiffusion.get_mbd_musicgen()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let us configure the generation parameters. Specifically, you can control the following:\n",
    "* `use_sampling` (bool, optional): use sampling if True, else do argmax decoding. Defaults to True.\n",
    "* `top_k` (int, optional): top_k used for sampling. Defaults to 250.\n",
    "* `top_p` (float, optional): top_p used for sampling; when set to 0, top_k is used instead. Defaults to 0.0.\n",
    "* `temperature` (float, optional): softmax temperature parameter. Defaults to 1.0.\n",
    "* `duration` (float, optional): duration of the generated waveform. Defaults to 30.0.\n",
    "* `cfg_coef` (float, optional): coefficient used for classifier-free guidance. Defaults to 3.0.\n",
    "\n",
    "If left unchanged, MusicGen uses its default parameters. A reference call spelling out all of these defaults is shown after the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.set_generation_params(\n",
    "    use_sampling=True,\n",
    "    top_k=250,\n",
    "    duration=30\n",
    ")"
   ]
  },
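  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the cell below spells out the same call with every parameter from the list above written at its documented default value. It is only a sketch of the documented options, not an exhaustive signature; adjust the values to taste."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reference call with all documented parameters at their default values.\n",
    "model.set_generation_params(\n",
    "    use_sampling=True,   # sample instead of argmax decoding\n",
    "    top_k=250,           # top-k sampling\n",
    "    top_p=0.0,           # when > 0, top-p is used instead of top-k\n",
    "    temperature=1.0,     # softmax temperature\n",
    "    duration=30.0,       # duration of the generated waveform\n",
    "    cfg_coef=3.0,        # classifier-free guidance coefficient\n",
    ")"
   ]
  },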
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we can go ahead and start generating music using one of the following modes:\n",
    "* Unconditional samples using `model.generate_unconditional` (sketched just below)\n",
    "* Music continuation using `model.generate_continuation`\n",
    "* Text-conditional samples using `model.generate`\n",
    "* Melody-conditional samples using `model.generate_with_chroma`"
   ]
  },
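  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Unconditional Generation\n",
    "\n",
    "Unconditional sampling is the only mode from the list above that is not demonstrated later in this notebook, so here is a minimal sketch. It assumes that `generate_unconditional` takes the number of samples to draw as its first argument and accepts the same `progress` and `return_tokens` flags as the other generation methods; check the `MusicGen` API if your version differs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from audiocraft.utils.notebook import display_audio\n",
    "\n",
    "# Minimal sketch: draw two samples without any text or melody conditioning.\n",
    "# Assumes the first argument of generate_unconditional is the number of samples.\n",
    "unconditional_output = model.generate_unconditional(2, progress=True, return_tokens=True)\n",
    "display_audio(unconditional_output[0], sample_rate=32000)\n",
    "if USE_DIFFUSION_DECODER:\n",
    "    out_diffusion = mbd.tokens_to_wav(unconditional_output[1])\n",
    "    display_audio(out_diffusion, sample_rate=32000)"
   ]
  },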
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Music Continuation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import torchaudio\n",
    "import torch\n",
    "from audiocraft.utils.notebook import display_audio\n",
    "\n",
    "def get_bip_bip(bip_duration=0.125, frequency=440,\n",
    "                duration=0.5, sample_rate=32000, device=\"cuda\"):\n",
    "    \"\"\"Generate a series of bips at the given frequency.\"\"\"\n",
    "    t = torch.arange(\n",
    "        int(duration * sample_rate), device=device, dtype=torch.float) / sample_rate\n",
    "    # Carrier tone at the requested frequency, shaped [1, T].\n",
    "    wav = torch.cos(2 * math.pi * frequency * t)[None]\n",
    "    # Square-wave envelope: the tone is on for half of every 2 * bip_duration period.\n",
    "    tp = (t % (2 * bip_duration)) / (2 * bip_duration)\n",
    "    envelope = (tp >= 0.5).float()\n",
    "    return wav * envelope"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here we use a synthetic signal to prompt both the tonality and the BPM\n",
    "# of the generated audio. The prompt is expanded to a batch of two so that\n",
    "# it matches the two text descriptions; 32000 is the prompt sample rate.\n",
    "res = model.generate_continuation(\n",
    "    get_bip_bip(0.125).expand(2, -1, -1), \n",
    "    32000, ['Jazz jazz and only jazz', \n",
    "            'Heartful EDM with beautiful synths and chords'], \n",
    "    progress=True)\n",
    "display_audio(res, 32000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You can also use any audio from a file. Make sure to trim the file if it is too long!\n",
    "prompt_waveform, prompt_sr = torchaudio.load(\"../assets/bach.mp3\")\n",
    "prompt_duration = 2\n",
    "prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]\n",
    "output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr, progress=True, return_tokens=True)\n",
    "display_audio(output[0], sample_rate=32000)\n",
    "if USE_DIFFUSION_DECODER:\n",
    "    out_diffusion = mbd.tokens_to_wav(output[1])\n",
    "    display_audio(out_diffusion, sample_rate=32000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Text-conditional Generation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from audiocraft.utils.notebook import display_audio\n",
    "\n",
    "output = model.generate(\n",
    "    descriptions=[\n",
    "        #'80s pop track with bassy drums and synth',\n",
    "        #'90s rock song with loud guitars and heavy drums',\n",
    "        #'Progressive rock drum and bass solo',\n",
    "        #'Punk Rock song with loud drum and power guitar',\n",
    "        #'Bluesy guitar instrumental with soulful licks and a driving rhythm section',\n",
    "        #'Jazz Funk song with slap bass and powerful saxophone',\n",
    "        'drum and bass beat with intense percussions'\n",
    "    ],\n",
    "    progress=True, return_tokens=True\n",
    ")\n",
    "display_audio(output[0], sample_rate=32000)\n",
    "if USE_DIFFUSION_DECODER:\n",
    "    out_diffusion = mbd.tokens_to_wav(output[1])\n",
    "    display_audio(out_diffusion, sample_rate=32000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Melody-conditional Generation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torchaudio\n",
    "from audiocraft.utils.notebook import display_audio\n",
    "\n",
    "# Melody conditioning requires the `facebook/musicgen-melody` checkpoint.\n",
    "model = MusicGen.get_pretrained('facebook/musicgen-melody')\n",
    "model.set_generation_params(duration=8)\n",
    "\n",
    "melody_waveform, sr = torchaudio.load(\"../assets/bach.mp3\")\n",
    "# Duplicate the melody so there is one conditioning waveform per text description.\n",
    "melody_waveform = melody_waveform.unsqueeze(0).repeat(2, 1, 1)\n",
    "output = model.generate_with_chroma(\n",
    "    descriptions=[\n",
    "        '80s pop track with bassy drums and synth',\n",
    "        '90s rock song with loud guitars and heavy drums',\n",
    "    ],\n",
    "    melody_wavs=melody_waveform,\n",
    "    melody_sample_rate=sr,\n",
    "    progress=True, return_tokens=True\n",
    ")\n",
    "display_audio(output[0], sample_rate=32000)\n",
    "if USE_DIFFUSION_DECODER:\n",
    "    out_diffusion = mbd.tokens_to_wav(output[1])\n",
    "    display_audio(out_diffusion, sample_rate=32000)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "b02c911f9b3627d505ea4a19966a915ef21f28afb50dbf6b2115072d27c69103"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}