How to download and use the model.

#1
by meronmuche - opened

Hello, does anyone have a snippet of Python code showing how to download and use the model, or anything that walks through the procedure for using it?

Owner

Hello, you can download the model with git LFS and then run it using the inference script in the github repo.

  1. Accept Llama2 license and download Llama2 weights
  2. Download the Amharic finetune from this repository (see the sketch after this list), as shown here: https://huggingface.co/docs/hub/models-downloading
  3. Clone the github repo and put your path to llama2 and the peft model into the inference script here: https://github.com/iocuydi/amharic-llama-llava/blob/main/inference/run_inf.py
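
For step 2, a minimal download sketch using the huggingface_hub library (the repo id below is an assumption; substitute this repository's actual id):

from huggingface_hub import snapshot_download

# Downloads this repo's files (the finetuned PEFT weights) into a local directory.
peft_dir = snapshot_download(repo_id="iocuydi/llama-2-amharic-3784m")  # assumed repo id
print(peft_dir)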

What is the peft model?

This import inside the run_inf.py file doesn't resolve:

from model_utils import load_model, load_peft_model

I can't find the model_utils file anywhere in the github repo.

Owner

Added that file to the github repo.

PEFT stands for "Parameter-Efficient Fine-Tuning." It allows large models to be finetuned more easily; you can read more about it here: https://huggingface.co/blog/peft
With this and most Llama finetunes, you load the original Llama weights and then a smaller set of PEFT weights from the finetune.
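
A minimal sketch of what that looks like with the transformers and peft libraries (paths are placeholders; the repo's load_model/load_peft_model helpers wrap something similar):

from transformers import LlamaForCausalLM
from peft import PeftModel

# Original Llama-2 weights, converted to Hugging Face format.
base = LlamaForCausalLM.from_pretrained("/path/to/Llama-2-7b-hf")
# The smaller set of PEFT weights from the Amharic finetune, applied on top.
model = PeftModel.from_pretrained(base, "/path/to/amharic-peft")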

Thank you for doing that. So I did the following as you described:

  1. Downloaded the llama-2-7b model using the download.sh script
  2. Downloaded this Amharic model using Git LFS from Hugging Face
  3. Cloned the github repository and put the path to the llama model in the run_inf.py file

Questions:

  1. Where do I use the Amharic model I downloaded from here (step 2 above)?
  2. What exactly is the path below?
    peft_model = '/path/to/checkpoint'
  3. How do I replace the Llama-2 tokenizer with the Llama-2-Amharic tokenizer?

Thank you.

Owner

Forgot to mention: you need to convert the Llama-2 weights to Hugging Face format with this script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
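
Roughly, that's a one-time command-line conversion, after which the output directory loads directly with transformers (paths below are placeholders; check the script's own usage notes for the exact flags):

# One-time conversion (run in a shell), per the script's usage docs:
#   python convert_llama_weights_to_hf.py --input_dir /path/to/raw/llama-2-7b --model_size 7B --output_dir /model/Llama-2-7b-hf
# Afterwards, the converted directory can be loaded directly:
from transformers import LlamaForCausalLM
model = LlamaForCausalLM.from_pretrained("/model/Llama-2-7b-hf")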

  1. The "main_path" param should point at the directory with the llama weights after they are converted to huggingface format.
  2. The peft model path is the path to the finetuned checkpoint. Without it loading a checkpoint, you're just using the original Llama2. This path should point to a directory containing the files downloaded from this hf repository (the fine tuned weights).
  3. Replace the tokenizer files that come with Llama2 with the tokenizer files from this repository.
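
For point 3, a quick sanity check that the Amharic tokenizer is the one being picked up (a sketch; the path is a placeholder for wherever this repo's tokenizer files live):

from transformers import LlamaTokenizer

# Point this at the directory holding this repo's tokenizer files
# (tokenizer.model etc.) rather than the stock Llama-2 ones.
tokenizer = LlamaTokenizer.from_pretrained("/model/llama-2-amharic-3784m")
print(tokenizer.tokenize("ሰላም"))  # Amharic text should split into reasonable subwords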

Thank you!! Regarding the tokenizer files, would replacing only the tokenizer.model file work? I tried that and the model does respond in Amharic, though I'm not sure whether replacing the remaining files would improve its output.

Owner

You should replace all the applicable tokenizer files with ours. A couple of other tips for prompting:
-Try different system prompts (the initial instruction about being an Amharic assistant), but keep the system prompt in English
-Experiment with different hyperparameters depending on the task; a higher top-k/temperature can give more varied and creative answers, but also a higher chance of hallucinations and wrong answers.
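
For reference, a sketch of where those knobs go with transformers' generate, continuing from the loading sketches above (the values are illustrative, not recommendations):

# System prompt stays in English; the question itself can be in Amharic.
prompt = "You are a helpful Amharic assistant.\n\n<your question here>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_k=50,          # higher top-k/temperature: more varied, more hallucination risk
    temperature=0.8,
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))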

Thanks for the tips.
I was thinking of continuing the pretraining with more Amharic data. Unfortunately, I wasn't really able to find good resources on how to do that. Can you recommend some helpful resources?

Owner

The scripts in the github repo can be used for pretraining and finetuning. Unless you have a massive amount of Amharic data (billions of tokens), additional pretraining likely won't help much; finetuning would be a more effective strategy. You can also check out the Chinese-LLaMA-Alpaca paper/repo for more details; much of this work was based on it.

Alright, thanks a lot for your support!!

One more thing. I tried to finetune the model on top of the garri model loaded with PEFT. But when I run inference by loading both the garri PEFT and my newer finetuned PEFT one after the other, the model no longer answers questions it previously answered correctly. For example, if I ask "what medicine should I take if I have the flu", it answers well with the garri PEFT alone, but outputs gibberish when both the garri PEFT and the newer finetuned PEFT are loaded.

MAIN_PATH = '/model/Llama-2-7b-hf'
peft_model = '/model/llama-2-amharic-3784m'
# newer finetuned version on top of the garri model
peft_model2 = '/home/user/model/output'

model = load_model(MAIN_PATH, quantization)
model = load_peft_model(model, peft_model)
model = load_peft_model(model, peft_model2)

Is the way I'm loading both peft models correct?

Owner

Only load one PEFT model. If you load another, you're replacing the weights of the first one; they aren't meant to be mixed. In general you load a single base Llama model and, optionally, a single PEFT model.

For your case, it sounds like you should follow these steps:

  1. Load Llama-2 with my PEFT model, then finetune
  2. After training, load Llama-2 with your PEFT model and perform inference, additional finetuning, etc.
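
For step 2, adapted from your snippet, that's just the base model plus your own adapter:

model = load_model(MAIN_PATH, quantization)
# Your adapter was trained starting from the garri weights, so load only it;
# don't stack the garri adapter underneath.
model = load_peft_model(model, peft_model2)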

If your model isn't performing as expected, there may be an issue with your dataset or training process. One way to debug is to first try a very simple dataset of a couple thousand identical items (all the same training example) and see if you can get the model to overfit, reach ~0 loss, and run inference properly on it, before moving on to the actual dataset.
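
A minimal sketch of building that debugging dataset, assuming the training script accepts a JSON list of instruction/response pairs (adjust to the format the repo's scripts actually expect):

import json

# A couple thousand copies of one training example; if the model can't
# overfit to ~0 loss on this and reproduce it at inference time, the
# problem is in the training/inference pipeline rather than the data.
item = {"instruction": "Say hello.", "response": "Hello!"}  # illustrative example
with open("debug_dataset.json", "w", encoding="utf-8") as f:
    json.dump([item] * 2000, f, ensure_ascii=False, indent=2)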
