Fine-tuning Gemma for a Foreign Language
I am attempting to fine-tune Gemma for one of the languages on which it has been pretrained. Could you provide any suggestions regarding the optimal size of the dataset to ensure a noticeable improvement in performance? The best format for the training files? Any other recommendations? Thank you.
@user1357925
Hello, friend. I got good results from the Gemma 2B model using the format below to build the dataset. I did the fine-tuning for Brazilian Portuguese. Here it is:
I have two datasets: one with 36k rows ("mental", used with Gemma 2B) and another with 100k rows for instruct (used with Gemma 7B).
def formatting_func(example):
    # Pull the question/answer pair from one dataset row.
    instruction = example['question']
    output = example['answer']
    # Gemma chat template: a user turn followed by a model turn,
    # separated by a newline.
    text = f"<start_of_turn>user\n{instruction}<end_of_turn>\n<start_of_turn>model\n{output}<end_of_turn>"
    return text
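For readers following along, here is a minimal usage sketch of how that formatting function might be applied to a Hugging Face dataset. The column names match the function above, but the sample row, the variable name my_dataset, and the "text" column are illustrative assumptions, not taken from the thread.

from datasets import Dataset

# Tiny illustrative dataset with the 'question'/'answer' columns the
# formatting function above expects (the row is made up for this example).
my_dataset = Dataset.from_dict({
    "question": ["Qual é a capital do Brasil?"],
    "answer": ["A capital do Brasil é Brasília."],
})

# Materialize the chat-formatted string into a 'text' column, which a
# text-field trainer (e.g. trl's SFTTrainer) can then consume.
my_dataset = my_dataset.map(lambda ex: {"text": formatting_func(ex)})
print(my_dataset[0]["text"])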
@user1357925
Yeah, sure. Could you send me an email? [email protected], or message me on LinkedIn, and I will share the notebook with you.
@Wielebnyd Sure, could you send me an email so I can share the full notebook with you?
@rhaymison Hello sir, I'm interested in your work. Could you share some information about the prompt and the LoRA rank you used, please?
I want to fine-tune Gemma 2B on 40k rows of English and Darija (Moroccan Arabic).
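Since the thread does not record the LoRA settings, here is a minimal, hypothetical peft sketch of what a LoRA configuration for Gemma 2B could look like. The rank r=16, alpha, dropout, and target modules are all illustrative assumptions, not values confirmed by @rhaymison.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

lora_config = LoraConfig(
    r=16,                # LoRA rank: size of the low-rank update matrices (assumed value)
    lora_alpha=32,       # scaling factor applied to the LoRA update (assumed value)
    lora_dropout=0.05,   # dropout on the LoRA layers (assumed value)
    # Attention projection layers in Gemma; targeting more modules
    # increases trainable parameters and memory use.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters LoRA actually trains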