agent404 committed on
Commit
90be659
1 Parent(s): a8b376d

Update README.md

Files changed (1)
  1. README.md +31 -26
README.md CHANGED
@@ -25,29 +25,6 @@ margin. Our work reveals that LLMs can be an excellent compressor for music, but

 <!-- <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/5fd6f670053c8345eddc1b68/8NSONUjIF7KGUCfwzPCd9.mpga"></audio> -->

- ## Training Data
-
- ChatMusician is pretrained on the 🤗 [MusicPile](https://huggingface.co/datasets/m-a-p/MusicPile), the first pretraining corpus for **developing musical abilities** in large language models. Check out the dataset card for more details.
- It is then supervised fine-tuned on 1.1M samples (a 2:1 ratio between music scores
- and music knowledge & music summary data) from MusicPile. See our [paper](http://arxiv.org/abs/2402.16153) for details.
-
- ## Training Procedure
-
- We initialized an fp16-precision ChatMusician-Base from the LLaMA2-7B-Base weights and applied a continual pre-training plus fine-tuning pipeline. LoRA adapters were integrated into the attention and MLP layers, with additional training on embeddings and all linear layers. The maximum sequence length
- was 2048. We utilized 16 80GB-A800 GPUs for one epoch of pre-training and 8 32GB-V100 GPUs for two epochs of fine-tuning. DeepSpeed was employed for memory efficiency, and the AdamW optimizer was used with a 1e-4 learning rate and a 5% warmup cosine scheduler. Gradient clipping was set at 1.0. The LoRA dimension, alpha, and
- dropout were set to 64, 16, and 0.1, with a batch size of 8.
-
- ## Evaluation
-
- 1. Music understanding abilities are evaluated on [MusicTheoryBench](https://huggingface.co/datasets/m-a-p/MusicTheoryBench).
- 2. General language abilities of ChatMusician are evaluated on the [Massive Multitask Language Understanding (MMLU) dataset](https://huggingface.co/datasets/lukaemon/mmlu).
-
-
- ## Usage
-
- You can use the models through Hugging Face's Transformers library. Check our GitHub repo for more advanced usage: [https://github.com/hf-lin/ChatMusician](https://github.com/hf-lin/ChatMusician)
-
-
 ## Prompt Format

 **Our model produces symbolic music (ABC notation) well with the following prompts.** Here are some example musical tasks.
@@ -303,6 +280,28 @@ K:G
 ge d2 G2 cBAG d2 G2 cBAG
 ```

+ ## Training Data
+
+ ChatMusician is pretrained on the 🤗 [MusicPile](https://huggingface.co/datasets/m-a-p/MusicPile), the first pretraining corpus for **developing musical abilities** in large language models. Check out the dataset card for more details.
+ It is then supervised fine-tuned on 1.1M samples (a 2:1 ratio between music scores
+ and music knowledge & music summary data) from MusicPile. See our [paper](http://arxiv.org/abs/2402.16153) for details.
+
+ ## Training Procedure
+
+ We initialized an fp16-precision ChatMusician-Base from the LLaMA2-7B-Base weights and applied a continual pre-training plus fine-tuning pipeline. LoRA adapters were integrated into the attention and MLP layers, with additional training on embeddings and all linear layers. The maximum sequence length
+ was 2048. We utilized 16 80GB-A800 GPUs for one epoch of pre-training and 8 32GB-V100 GPUs for two epochs of fine-tuning. DeepSpeed was employed for memory efficiency, and the AdamW optimizer was used with a 1e-4 learning rate and a 5% warmup cosine scheduler. Gradient clipping was set at 1.0. The LoRA dimension, alpha, and
+ dropout were set to 64, 16, and 0.1, with a batch size of 8.
+
+ ## Evaluation
+
+ 1. Music understanding abilities are evaluated on [MusicTheoryBench](https://huggingface.co/datasets/m-a-p/MusicTheoryBench).
+ 2. General language abilities of ChatMusician are evaluated on the [Massive Multitask Language Understanding (MMLU) dataset](https://huggingface.co/datasets/lukaemon/mmlu).
+
+
+ ## Usage
+
+ You can use the models through Hugging Face's Transformers library. Check our GitHub repo for more advanced usage: [https://github.com/hf-lin/ChatMusician](https://github.com/hf-lin/ChatMusician)
+
 ## CLI demo
 ```
 from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
@@ -353,8 +352,14 @@ We've tried our best to build math generalist models. However, we acknowledge th


 ## Citation
- If you use the models, data, or code from this project, please cite the original paper:
-
+ If you find our work helpful, please consider citing it:
 ```
- coming soon.
+ @misc{yuan2024chatmusician,
+ title={ChatMusician: Understanding and Generating Music Intrinsically with LLM},
+ author={Ruibin Yuan and Hanfeng Lin and Yi Wang and Zeyue Tian and Shangda Wu and Tianhao Shen and Ge Zhang and Yuhang Wu and Cong Liu and Ziya Zhou and Ziyang Ma and Liumeng Xue and Ziyu Wang and Qin Liu and Tianyu Zheng and Yizhi Li and Yinghao Ma and Yiming Liang and Xiaowei Chi and Ruibo Liu and Zili Wang and Pengfei Li and Jingcheng Wu and Chenghua Lin and Qifeng Liu and Tao Jiang and Wenhao Huang and Wenhu Chen and Emmanouil Benetos and Jie Fu and Gus Xia and Roger Dannenberg and Wei Xue and Shiyin Kang and Yike Guo},
+ year={2024},
+ eprint={2402.16153},
+ archivePrefix={arXiv},
+ primaryClass={cs.SD}
+ }
 ```
 
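The Training Procedure section added above pins down the adapter and optimizer hyperparameters: LoRA dimension 64, alpha 16, dropout 0.1, AdamW at a 1e-4 learning rate with a 5% warmup cosine schedule, and gradient clipping at 1.0. As a rough illustration, here is a minimal sketch of that configuration with the Hugging Face `peft` library; the target module names follow the standard LLaMA2 layout and are an assumption, not code from the ChatMusician repo.

```python
# Sketch of the reported LoRA + optimizer setup. Module names assume the
# standard LLaMA2 layout; the actual ChatMusician training code may differ.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

lora_config = LoraConfig(
    r=64,              # LoRA dimension per the model card
    lora_alpha=16,     # alpha per the model card
    lora_dropout=0.1,  # dropout per the model card
    target_modules=[   # attention and MLP projections (assumed names)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # embeddings trained in full
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# AdamW at 1e-4 with a cosine schedule and 5% warmup, as described above.
total_steps = 10_000  # placeholder; depends on dataset and batch size
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.05 * total_steps),
    num_training_steps=total_steps,
)
```

The `modules_to_save` entry reflects the card's note that embeddings were trained alongside the adapters; whether `lm_head` was included is likewise an assumption.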
 
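Both evaluation sets named above are hosted on the Hugging Face Hub, so they can be pulled with the `datasets` library. A quick sketch, assuming default splits; the MMLU subject config shown is just one of its per-subject configs, and field names should be checked against the dataset cards.

```python
# Sketch: fetch the two evaluation sets referenced in the README.
# Config and split names are assumptions; see the dataset cards.
from datasets import load_dataset

music_theory_bench = load_dataset("m-a-p/MusicTheoryBench")
mmlu = load_dataset("lukaemon/mmlu", "abstract_algebra")  # per-subject configs

print(music_theory_bench)
print(mmlu)
```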
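The CLI demo above is truncated after its import line in this diff. For orientation, here is a minimal self-contained generation sketch built on those same imports; the `m-a-p/ChatMusician` checkpoint id, the prompt, and the sampling settings are illustrative assumptions rather than the repo's official demo, which lives at the GitHub link above.

```python
# Minimal generation sketch using the CLI demo's imports. The checkpoint
# id, prompt, and sampling settings are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_id = "m-a-p/ChatMusician"  # assumed Hub id for the released model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Develop a musical piece using the given chord progression.\n'Dm', 'C', 'Dm'"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
output_ids = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```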
 