Dataset
#1
by
titan087
- opened
Hey,
What dataset did you use to fine-tune this model? I was looking for one to fine-tune CodeLlama 34B and haven't found one that looked good.
Thanks!
Same here. So I chose a benchmark dataset.
https://huggingface.co/datasets/codeparrot/xlcost-text-to-code
The JavaScript subsection has about 10K rows. I felt that was good enough for a fine-tune. Let me know your thoughts as well.
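For reference, here is a minimal sketch of how rows from a text-to-code dataset like that one could be turned into training strings for a causal-LM fine-tune. The field names (`text`, `code`) and the instruction/response template are assumptions, not something the dataset guarantees, so check the actual schema first:

```python
# Hypothetical sketch: format text-to-code pairs into single training
# strings for supervised fine-tuning. The "text" and "code" field names
# are assumptions about the dataset schema.

def format_example(example):
    """Turn one {text, code} row into one prompt/response training string."""
    return (
        "### Instruction:\n" + example["text"].strip() + "\n"
        "### Response:\n" + example["code"].strip()
    )

# Toy rows standing in for real dataset records
rows = [
    {"text": "add two numbers", "code": "function add(a, b) { return a + b; }"},
    {"text": "square a number", "code": "function sq(x) { return x * x; }"},
]

samples = [format_example(r) for r in rows]
for s in samples:
    print(s)
    print("---")
```

With ~10K rows this kind of simple template is usually enough for a first pass; the template itself matters less than using the same one consistently at train and inference time.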
It's worth a shot. For a basic test I can try training either Gemma or Llama 3, or potentially Phi-3, at least to start with. If it works well enough, then scale it up to one of the coding-based 34B models.