Update README.md
README.md
CHANGED
@@ -294,8 +294,9 @@ dropout were set to 64, 16, and 0.1, with a batch size of 8.
 
 ## Evaluation
 
-1. Music understanding abilities are evaluated on the [MusicTheoryBench](https://huggingface.co/datasets/m-a-p/MusicTheoryBench).
-
+1. Music understanding abilities are evaluated on the [MusicTheoryBench](https://huggingface.co/datasets/m-a-p/MusicTheoryBench). Zero-shot accuracy on MusicTheoryBench.
+We included GPT-3.5, GPT-4, LLaMA2-7B-Base, ChatMusician-Base, and ChatMusician. The blue bar represents the performance on the music knowledge metric, and the red bar represents the music reasoning metric. The dashed line corresponds to a random baseline, with a score of 25%.![MusicTheoryBench_result](./MusicTheoryBench_result_plt.png)
+3. General language abilities of ChatMusician are evaluated on the [Massive Multitask Language Understanding (MMLU) dataset](https://huggingface.co/datasets/lukaemon/mmlu).
 
 
 ## Usage
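
The added Evaluation text reports zero-shot accuracy against a 25% random baseline. As a rough illustration of how those two numbers relate, here is a minimal Python sketch. It assumes four answer options per question (which is what a 25% baseline implies). The function names and the sample answers are hypothetical and are not taken from the repository or the benchmark.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions where the predicted option matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def random_baseline(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be positive")
    return 1.0 / num_options


if __name__ == "__main__":
    # Hypothetical answer key and model outputs, for illustration only.
    answers = ["A", "C", "B", "D"]
    predictions = ["A", "C", "D", "D"]

    score = accuracy(predictions, answers)
    baseline = random_baseline(4)  # assumption: four options per question
    print(f"accuracy={score:.2%}, random baseline={baseline:.2%}")
```

A model whose accuracy sits near the baseline is doing no better than guessing, which is why the plot described above draws the 25% line for comparison.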