Why Hybrid-tuning works
#6 · opened by toughhou
Hi, I have some questions about the tuning setup:
- What is the size of the instruction data? It makes up about 67% of the mix, much larger than the pre-train portion, which is unusual compared with other LLMs.
- Is there any difference between how the instruction data and the pre-train data are trained, or are all examples treated equally?
- Is there a multi-task fine-tuning stage like the one mentioned in the paper?
- Usually the pre-train data is far larger than the instruction data, but here the instruction data is about 67%. How do you ensure the instruction data is high quality when it is generated with Self-Instruct and Self-QA?
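To make my second question concrete, here is a toy sketch of what I understand hybrid-tuning to mean: both data sources are merged into one training stream and every example is sampled and weighted identically, rather than being handled in separate stages. The data sizes and field names below are hypothetical, just to illustrate the mix ratio I am asking about.

```python
import random

# Hypothetical toy corpora; real data would be tokenized text.
instruct_data = [{"text": f"instruction pair {i}"} for i in range(67)]
pretrain_data = [{"text": f"pretrain doc {i}"} for i in range(33)]

# Hybrid-tuning, as I understand it: merge both sources into one
# pool and shuffle, so every example gets the same loss and the
# same sampling probability -- no per-source weighting or staging.
mixed = instruct_data + pretrain_data
random.shuffle(mixed)

# The instruction share of each batch then simply reflects the mix
# ratio, which here is the ~67% figure I am asking about.
share = len(instruct_data) / len(mixed)
```

If this reading is right, then the 67% figure directly controls how often instruction examples appear in each batch, which is why the ratio choice matters so much.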