training data
#3 by Meital - opened
Amazing!
Can you say more about the high-quality code you used to train the model?
Is it permissively licensed?
For models of different sizes, I tried various data combinations, because I found that some datasets are better suited to training smaller models. The training data consists of multiple public and private datasets.
The highest-quality dataset is probably https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K, and training on it alone can reach about 75% Pass@1.
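For anyone unfamiliar with the metric, here is a minimal sketch of how a Pass@1 score is commonly computed. The thread does not say which benchmark or evaluation harness produced the 75% figure, so this assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), averaged over problems. The helper names and the per-problem results below are hypothetical, for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: 4 problems, 1 greedy sample each, 3 solved.
results = [(1, 1), (1, 1), (1, 1), (1, 0)]
score = mean_pass_at_k(results, k=1)
print(f"Pass@1 = {score:.2%}")
```

With one sample per problem, Pass@1 reduces to the fraction of problems solved on the first attempt, so "about 75% Pass@1" would mean roughly three out of four benchmark problems passing.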
Thanks!
Meital changed discussion status to closed