Continued pre-training of mistralai/Mistral-Nemo-Instruct-2407 on a Kurdish Wikipedia dataset, using unsloth.
This model should be fine-tuned further before use, since the continued pre-training was only intended to improve Kurdish language understanding.
The model is quantized with bitsandbytes to reduce memory usage; see the bitsandbytes documentation for details.
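As a minimal sketch, the model can be loaded in 4-bit with a bitsandbytes configuration like the one below. The specific quantization settings (nf4, bfloat16 compute) are assumptions for illustration, not necessarily the settings used to produce this model.

```python
# Minimal sketch: load the model in 4-bit with bitsandbytes to reduce memory.
# The quantization settings below are assumptions, not the card's exact config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "nazimali/Mistral-Nemo-Kurdish",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("nazimali/Mistral-Nemo-Kurdish")
```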
I could not find a standard, or even a good, Kurdish benchmark to evaluate the model. My next project will be to create an evaluation so that there is a reproducible baseline for Kurdish.
I will also look into a multi-GPU training setup so I don't have to wait all day for results, and I would like to train on both Kurmanji and Sorani.
Use
This model should be fine-tuned further for a specific task. For an instruction fine-tuned version, see nazimali/Mistral-Nemo-Kurdish-Instruct.
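A minimal sketch of setting up further fine-tuning with unsloth is shown below; all hyperparameters (sequence length, LoRA rank, target modules) are illustrative assumptions, not the values used for this model.

```python
# Minimal sketch: attach LoRA adapters with unsloth for further task-specific
# fine-tuning. All hyperparameters here are illustrative assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nazimali/Mistral-Nemo-Kurdish",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Pass `model`, `tokenizer`, and a task-specific dataset to a trainer
# (for example trl's SFTTrainer) to continue fine-tuning.
```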
Training
- Transformers 4.44.2
- 1x NVIDIA A100 80GB PCIe
- Duration: 6h 31m 4s
```json
{
  "total_flos": 4121524790259794000,
  "train/epoch": 1,
  "train/global_step": 1960,
  "train/grad_norm": 3.1958093643188477,
  "train/learning_rate": 0,
  "train/loss": 1.2108,
  "train_loss": 1.256846008738693,
  "train_runtime": 23227.1752,
  "train_samples_per_second": 2.7,
  "train_steps_per_second": 0.084
}
```
Pre-training data:
nazimali/kurdish-wikipedia-articles
- Dataset number of rows: 63,076
- Filtered on the title and text columns: each must contain at least 1 character (a filtering and formatting sketch follows the prompt format below)
- Number of rows used for training: 62,720
Training prompt format:
```python
training_prompt = """Gotara Wikipedia
### Sernav: {}
### Gotar:
{}"""
```