NekoMikoReimu/TohoWorkInProgress

Highly Responsive to Prompts - Touhou MusicGen(Medium) Finetune Notes

Some of the generated samples drift away from a Touhou sound toward something more conventionally upbeat and cheerful, like elevator or corporate filler music. A few potential causes:
a. The Essentia-generated audio tags haven't been checked carefully against the songs they describe. I remember seeing some wackiness in there that we could stand to clear out.
b. I'm not sure, but I think I heard some key changes in the samples. We should check the "key" items in the jsonl as well.
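A quick way to start on both checks is to tally what actually appears in a given field of the jsonl. The sketch below is only an illustration: it assumes one JSON object per line and a field literally named "key", and the function name `tally_field` and the `<missing>` label are made up for this example. The same function can be pointed at whichever field holds the Essentia tags.

```python
import json
from collections import Counter
from typing import Iterable


def tally_field(lines: Iterable[str], field: str = "key") -> Counter:
    """Count how often each value of `field` appears across jsonl lines.

    Assumption: every non-blank line is a JSON object. Records where the
    field is absent or empty are counted under "<missing>" (a label
    invented for this sketch).
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        value = record.get(field)
        if value is None or value == "":
            counts["<missing>"] += 1
        else:
            counts[str(value).strip()] += 1
    return counts


# Tiny in-memory demo; a real run would pass an open file handle instead,
# since iterating over a text file yields its lines.
demo_lines = [
    '{"title": "track 1", "key": "A minor"}',
    '{"title": "track 2", "key": "A minor"}',
    '{"title": "track 3", "key": ""}',
]
for value, count in tally_field(demo_lines).most_common():
    print(f"{value}: {count}")
```

Rare or missing values are the ones worth listening to first, since they are the most likely to be mislabeled.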
For our next run, I'd like to add LP-MusicCaps support: https://huggingface.co/spaces/seungheondoh/LP-Music-Caps-demo
I'm not sure that downsampling to 32 kHz is actually necessary; I've seen other people skip it.