Update README.md
README.md
CHANGED
@@ -25,29 +25,6 @@ margin. Our work reveals that LLMs can be an excellent compressor for music, but

<!-- <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/5fd6f670053c8345eddc1b68/8NSONUjIF7KGUCfwzPCd9.mpga"></audio> -->

-## Training Data
-
-ChatMusician is pretrained on the 🤗 [MusicPile](https://huggingface.co/datasets/m-a-p/MusicPile), which is the first pretraining corpus for **developing musical abilities** in large language models. Check out the dataset card for more details.
-It was then supervised fine-tuned on 1.1M samples (a 2:1 ratio between music scores and music knowledge & music summary data) from MusicPile. Check our [paper](http://arxiv.org/abs/2402.16153) for more details.
-
-## Training Procedure
-
-We initialized an fp16-precision ChatMusician-Base from the LLaMA2-7B-Base weights and applied a continual pre-training plus fine-tuning pipeline. LoRA adapters were integrated into the attention and MLP layers, with additional training on the embeddings and all linear layers. The maximum sequence length was 2048. We used 16 80GB A800 GPUs for one epoch of pre-training and 8 32GB V100 GPUs for two epochs of fine-tuning. DeepSpeed was employed for memory efficiency, and the AdamW optimizer was used with a 1e-4 learning rate and a cosine scheduler with 5% warmup. Gradient clipping was set to 1.0. The LoRA dimension, alpha, and dropout were set to 64, 16, and 0.1, with a batch size of 8.
-
-## Evaluation
-
-1. Music understanding abilities are evaluated on the [MusicTheoryBench](https://huggingface.co/datasets/m-a-p/MusicTheoryBench).
-2. General language abilities of ChatMusician are evaluated on the [Massive Multitask Language Understanding (MMLU) dataset](https://huggingface.co/datasets/lukaemon/mmlu).
-
-## Usage
-
-You can use the models through Hugging Face's Transformers library. Check our GitHub repo for more advanced usage: [https://github.com/hf-lin/ChatMusician](https://github.com/hf-lin/ChatMusician)
-
## Prompt Format

**Our model produces symbolic music (ABC notation) well given the following prompts.** Here are some musical tasks.

@@ -303,6 +280,28 @@ K:G

ge d2 G2 cBAG d2 G2 cBAG
```

## CLI demo
```
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

@@ -353,8 +352,14 @@ We've tried our best to build math generalist models. However, we acknowledge th

## Citation
-If you
-
```
-
```


## Prompt Format

**Our model produces symbolic music (ABC notation) well given the following prompts.** Here are some musical tasks.

ge d2 G2 cBAG d2 G2 cBAG
```

## Training Data

ChatMusician is pretrained on the 🤗 [MusicPile](https://huggingface.co/datasets/m-a-p/MusicPile), which is the first pretraining corpus for **developing musical abilities** in large language models. Check out the dataset card for more details.
It was then supervised fine-tuned on 1.1M samples (a 2:1 ratio between music scores and music knowledge & music summary data) from MusicPile. Check our [paper](http://arxiv.org/abs/2402.16153) for more details.
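
For reference, MusicPile can be pulled from the Hub with the 🤗 `datasets` library. This is a minimal loading sketch; the split and column names are assumptions, so check the dataset card for the actual schema.

```
from datasets import load_dataset

# Stream the corpus so the full dataset is not downloaded up front.
musicpile = load_dataset("m-a-p/MusicPile", split="train", streaming=True)

# Peek at one record; the exact column names (e.g. "text") are an assumption.
first = next(iter(musicpile))
print(first.keys())
```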

## Training Procedure

We initialized an fp16-precision ChatMusician-Base from the LLaMA2-7B-Base weights and applied a continual pre-training plus fine-tuning pipeline. LoRA adapters were integrated into the attention and MLP layers, with additional training on the embeddings and all linear layers. The maximum sequence length was 2048. We used 16 80GB A800 GPUs for one epoch of pre-training and 8 32GB V100 GPUs for two epochs of fine-tuning. DeepSpeed was employed for memory efficiency, and the AdamW optimizer was used with a 1e-4 learning rate and a cosine scheduler with 5% warmup. Gradient clipping was set to 1.0. The LoRA dimension, alpha, and dropout were set to 64, 16, and 0.1, with a batch size of 8.
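
As a rough illustration, these hyperparameters map onto a Hugging Face PEFT + Trainer setup roughly as follows. This is a sketch under stated assumptions, not the authors' training code: the LLaMA-2 target-module names, the DeepSpeed config path, and the use of PEFT/Trainer are all assumptions here.

```
# Illustrative sketch only; not the authors' training script.
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=64,               # LoRA dimension
    lora_alpha=16,      # LoRA alpha
    lora_dropout=0.1,   # LoRA dropout
    # Attention and MLP projections (assumed LLaMA-2 module names).
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    # Embedding and output layers are trained in full alongside the adapters.
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="chatmusician-lora",
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    max_grad_norm=1.0,
    per_device_train_batch_size=8,
    num_train_epochs=2,
    fp16=True,
    deepspeed="ds_config.json",  # assumed DeepSpeed config path
)
```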

## Evaluation

1. Music understanding abilities are evaluated on the [MusicTheoryBench](https://huggingface.co/datasets/m-a-p/MusicTheoryBench).
2. General language abilities of ChatMusician are evaluated on the [Massive Multitask Language Understanding (MMLU) dataset](https://huggingface.co/datasets/lukaemon/mmlu).
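
Both benchmarks are hosted on the Hugging Face Hub. A minimal loading sketch follows; the config, split, and column names are assumptions, so check each dataset card before running an actual evaluation.

```
from datasets import load_dataset

# Music-theory benchmark (split name is an assumption).
music_theory = load_dataset("m-a-p/MusicTheoryBench", split="test")

# One MMLU subject as an example; "abstract_algebra" is one of its standard configs.
mmlu_subset = load_dataset("lukaemon/mmlu", "abstract_algebra", split="test")

print(len(music_theory), len(mmlu_subset))
```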

## Usage

You can use the models through Hugging Face's Transformers library. Check our GitHub repo for more advanced usage: [https://github.com/hf-lin/ChatMusician](https://github.com/hf-lin/ChatMusician)
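
Below is a minimal generation sketch with Transformers. The model ID, prompt wording, and decoding parameters are illustrative assumptions; the repo's CLI demo shows the exact recommended usage.

```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "m-a-p/ChatMusician"  # assumed Hub model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Develop a short folk melody in ABC notation in the key of G."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Strip the prompt tokens and print only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```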

## CLI demo
```
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

## Citation

If you find our work helpful, feel free to cite us.

```
@misc{yuan2024chatmusician,
      title={ChatMusician: Understanding and Generating Music Intrinsically with LLM},
      author={Ruibin Yuan and Hanfeng Lin and Yi Wang and Zeyue Tian and Shangda Wu and Tianhao Shen and Ge Zhang and Yuhang Wu and Cong Liu and Ziya Zhou and Ziyang Ma and Liumeng Xue and Ziyu Wang and Qin Liu and Tianyu Zheng and Yizhi Li and Yinghao Ma and Yiming Liang and Xiaowei Chi and Ruibo Liu and Zili Wang and Pengfei Li and Jingcheng Wu and Chenghua Lin and Qifeng Liu and Tao Jiang and Wenhao Huang and Wenhu Chen and Emmanouil Benetos and Jie Fu and Gus Xia and Roger Dannenberg and Wei Xue and Shiyin Kang and Yike Guo},
      year={2024},
      eprint={2402.16153},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
```