juancopi81/mutopia_guitar_mmm

Music generation can be approached much like language generation. There are many ways to represent music as text and then train a language model on that text to generate music. For encoding MIDI files as text, I am using Dr. Tristan Behrens's excellent implementation of the paper MMM: Exploring Conditional Multi-Track Music Generation with the Transformer.

This model is a fine-tuned version of gpt2 on the Mutopia Guitar Dataset. Use the widget to generate your piece, and then use this notebook to listen to the results (work in progress). I created the notebook as an adaptation of the one created by Dr. Tristan Behrens.

It achieves the following results on the evaluation set:

Train Loss: 0.5365
Validation Loss: 1.5482

Model description

The model is GPT-2, loaded with the GPT2LMHeadModel architecture from Hugging Face. The context size is 256, and the vocabulary size is 588. The model uses a WhitespaceSplit pre-tokenizer. The tokenizer is also available on the Hugging Face Hub.
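As a rough illustration of what whitespace-based tokenization means for MMM-style text, the sketch below splits a piece on whitespace, maps each token to an integer id, and truncates to the 256-token context. It is plain Python rather than the tokenizer published on the Hub, and the token names (PIECE_START, NOTE_ON=64, and so on) and the helper names are assumptions made for the example.

```python
CONTEXT_SIZE = 256  # context size stated in the model description


def whitespace_split(text):
    """Split on runs of whitespace, as a whitespace pre-tokenizer would."""
    return text.split()


def build_vocab(corpus):
    """Assign ids in order of first appearance (hypothetical helper)."""
    vocab = {}
    for text in corpus:
        for token in whitespace_split(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, context_size=CONTEXT_SIZE):
    """Map tokens to ids and truncate to the context size.

    Raises KeyError on a token missing from the vocabulary; a real
    tokenizer would typically map it to an unknown-token id instead.
    """
    ids = [vocab[token] for token in whitespace_split(text)]
    return ids[:context_size]


# Assumed MMM-style token string, for illustration only.
piece = (
    "PIECE_START TRACK_START INST=24 BAR_START "
    "NOTE_ON=64 TIME_DELTA=4 NOTE_OFF=64 BAR_END TRACK_END"
)
vocab = build_vocab([piece])
ids = encode(piece, vocab)
```

Because every musical event is a single whitespace-delimited word, the vocabulary stays small, which is consistent with the 588 entries reported above.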

Intended uses & limitations

I built this model to learn more about how to use Hugging Face. I am implementing some of the parts of the Hugging Face course with a project that I find interesting. The main intention of this model is educational. I am creating a series of notebooks where I show every step of the process:

Collecting the data
Pre-processing the data
Training a tokenizer from scratch
Fine-tuning a GPT-2 model
Building a Gradio app for the model

I trained the model on the free version of Colab with a small dataset, and right now it is overfitting heavily. My plan is to build a larger dataset of guitar music from Latin America and train a new model, similar to the Mutopia Guitar Model, with more GPU resources.

Training and evaluation data

I am training the model with the Mutopia Guitar Dataset. This dataset consists of the solo guitar pieces of the Mutopia Project. It mainly contains guitar music by Western classical composers, such as Sor, Aguado, Carcassi, and Giuliani.

For the first epochs of training, I augmented the data by transposing each piece, raising and lowering the pitches across the twelve semitones of an octave. Later, I trained the model without transposition so that the generated output sounds closer to a real guitar piece.
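The transposition step can be sketched as follows. This is an assumption-laden illustration, not the code used for training: it assumes pitches appear in tokens of the form NOTE_ON=<midi pitch> and NOTE_OFF=<midi pitch>, and it assumes the twelve shifts run from -6 to +5 semitones (which includes the untransposed original). The function names are hypothetical.

```python
def transpose_tokens(tokens, semitones, low=0, high=127):
    """Shift every NOTE_ON / NOTE_OFF pitch by `semitones`.

    Returns None if any shifted pitch leaves the MIDI range [low, high],
    so that out-of-range transpositions can be discarded.
    """
    shifted = []
    for token in tokens:
        name, sep, value = token.partition("=")
        if sep and name in ("NOTE_ON", "NOTE_OFF"):
            pitch = int(value) + semitones
            if not low <= pitch <= high:
                return None
            shifted.append(f"{name}={pitch}")
        else:
            # Non-pitch tokens (bars, time deltas, instrument) stay as-is.
            shifted.append(token)
    return shifted


def augment(tokens, shifts=range(-6, 6)):
    """Return one transposed copy per shift, skipping invalid ones."""
    copies = []
    for semitones in shifts:
        transposed = transpose_tokens(tokens, semitones)
        if transposed is not None:
            copies.append(transposed)
    return copies


# Assumed MMM-style token sequence, for illustration only.
piece = (
    "PIECE_START TRACK_START INST=24 BAR_START "
    "NOTE_ON=64 TIME_DELTA=4 NOTE_OFF=64 BAR_END TRACK_END"
).split()
augmented = augment(piece)
```

Transposition multiplies the amount of training text but also moves pieces into keys and registers that may not be idiomatic for guitar, which is one plausible reason to drop it in later rounds.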

Training hyperparameters

The following hyperparameters were used during training (with transposition):

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 5726, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition - first round):

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition - second round):

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - third round):

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - fourth round):

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - fifth round):

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - sixth round):

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - seventh round):

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0005, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0005, 'decay_steps': 1025, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
training_precision: mixed_float16
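To make the optimizer dictionaries above easier to read, here is a small pure-Python sketch of a linear warm-up followed by polynomial decay, using the parameter names from the configuration. It is an approximation written for illustration, on the assumption that the decay schedule is evaluated at step - warmup_steps once warm-up ends; it is not the library code itself.

```python
def warmup_polynomial_lr(step, initial_lr, warmup_steps, decay_steps,
                         end_lr=0.0, warmup_power=1.0, decay_power=1.0):
    """Learning rate at optimizer step `step` (sketch, see assumptions)."""
    if step < warmup_steps:
        # Warm-up: ramp from 0 towards initial_lr.
        return initial_lr * (step / warmup_steps) ** warmup_power
    # Decay: clamp at decay_steps because 'cycle' is False.
    decay_step = min(step - warmup_steps, decay_steps)
    fraction_left = 1.0 - decay_step / decay_steps
    return (initial_lr - end_lr) * fraction_left ** decay_power + end_lr


# Seventh-round settings: initial_learning_rate 0.0005, decay_steps 1025.
seventh_round = dict(initial_lr=5e-4, warmup_steps=1000, decay_steps=1025)
lr_mid_warmup = warmup_polynomial_lr(500, **seventh_round)
lr_peak = warmup_polynomial_lr(1000, **seventh_round)
lr_end = warmup_polynomial_lr(2025, **seventh_round)

# Earlier rounds: initial_learning_rate 5e-07, decay_steps 350.
early_rounds = dict(initial_lr=5e-7, warmup_steps=1000, decay_steps=350)
lr_step_350 = warmup_polynomial_lr(350, **early_rounds)
```

Under these assumptions, any run shorter than warmup_steps (1000) optimizer steps never leaves the warm-up phase, so its learning rate stays below initial_learning_rate throughout.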

Training results

Using transposition:

Train Loss	Validation Loss	Epoch
1.0705	1.3590	0
0.8889	1.3702	1
0.7588	1.3974	2
0.7294	1.4813	3
0.6263	1.5263	4
0.5841	1.5263	5
0.5844	1.5263	6
0.5837	1.5346	7
0.5798	1.5411	8
0.5773	1.5440	9

Without transposition (first round):

Train Loss	Validation Loss	Epoch
0.5503	1.5436	0
0.5503	1.5425	1
0.5476	1.5425	2
0.5467	1.5425	3
0.5447	1.5431	4
0.5418	1.5447	5
0.5418	1.5451	6
0.5401	1.5472	7
0.5386	1.5479	8
0.5365	1.5482	9

Without transposition (second round):

Train Loss	Validation Loss	Epoch
0.5368	1.5482	0
0.5355	1.5480	1
0.5326	1.5488	2
0.5363	1.5493	3
0.5346	1.5488	4
0.5329	1.5502	5
0.5329	1.5514	6
0.5308	1.5514	7
0.5292	1.5536	8
0.5272	1.5543	9

Without transposition (third round - new tokenizer):

Train Loss	Validation Loss	Epoch
6.1361	6.4569	0
5.6383	5.8249	1
4.9125	4.8956	2
4.2013	4.2778	3
3.8665	4.0330	4
3.7106	3.8956	5
3.6041	3.7995	6
3.5301	3.7485	7
3.4973	3.7323	8
3.4909	3.7323	9

Without transposition (fourth round - new tokenizer):

Train Loss	Validation Loss	Epoch
3.4879	3.7206	0
3.4667	3.6874	1
3.4229	3.6373	2
3.3680	3.5751	3
3.2998	3.5026	4
3.2208	3.4240	5
3.1385	3.3397	6
3.0580	3.2587	7
2.9949	3.2118	8
2.9646	3.1958	9

Without transposition (fifth round - new tokenizer):

Train Loss	Validation Loss	Epoch
2.9562	3.1902	0
2.9457	3.1751	1
2.9266	3.1512	2
2.9039	3.1176	3
2.8705	3.0775	4
2.8291	3.0295	5
2.7872	2.9811	6
2.7394	2.9321	7
2.6996	2.9023	8
2.6819	2.8927	9

Without transposition (sixth round - new tokenizer):

Train Loss	Validation Loss	Epoch
2.6769	2.8894	0
2.6719	2.8791	1
2.6612	2.8638	2
2.6465	2.8439	3
2.6242	2.8174	4
2.6006	2.7877	5
2.5679	2.7554	6
2.5387	2.7223	7
2.5115	2.7029	8
2.5011	2.6970	9

Without transposition (seventh round - new tokenizer):

Train Loss	Validation Loss	Epoch
2.2881	2.2059	0
1.7702	1.8533	1
1.4625	1.6948	2
1.2876	1.6865	3
1.1926	1.6414	4
1.1329	1.6360	5
1.1069	1.6448	6
1.0408	1.6207	7
0.8939	1.5837	8
0.7265	1.5901	9
0.5902	1.6261	10
0.4489	1.7007	11
0.3223	1.7940	12
0.2158	1.9032	13
0.1448	1.9892	14
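The overfitting in the seventh round can be read directly from the table: training loss keeps falling while validation loss turns upward. A minimal sketch, using only the numbers above, finds the epoch with the lowest validation loss and the train/validation gap at the final epoch. The variable names are chosen for this example.

```python
# Seventh-round losses copied from the table above, indexed by epoch.
train_losses = [2.2881, 1.7702, 1.4625, 1.2876, 1.1926, 1.1329, 1.1069,
                1.0408, 0.8939, 0.7265, 0.5902, 0.4489, 0.3223, 0.2158,
                0.1448]
val_losses = [2.2059, 1.8533, 1.6948, 1.6865, 1.6414, 1.6360, 1.6448,
              1.6207, 1.5837, 1.5901, 1.6261, 1.7007, 1.7940, 1.9032,
              1.9892]

# Epoch at which validation loss is lowest: a natural checkpoint to keep.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
best_val_loss = val_losses[best_epoch]

# Gap between validation and training loss at the last epoch.
final_gap = val_losses[-1] - train_losses[-1]

print(f"best epoch: {best_epoch}, validation loss: {best_val_loss}")
print(f"final train/validation gap: {final_gap:.4f}")
```

Keeping the checkpoint from the best validation epoch, or stopping training once validation loss has not improved for a few epochs, is a common way to limit this kind of overfitting.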

Framework versions

Transformers 4.22.1
TensorFlow 2.8.2
Datasets 2.5.1
Tokenizers 0.12.1
