sft

This model is a fine-tuned version of NousResearch/Meta-Llama-3-8B-Instruct on the identity and eightwords-20241120-alapaca datasets. It achieves the following result on the evaluation set:

  • Loss: 2.9089
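
The card does not yet include a usage snippet, so here is a minimal inference sketch. It assumes the checkpoint id clinno/eightwords-241120 (the repository this card belongs to) and the standard transformers chat-template API for Llama-3-Instruct models; the prompt and generation settings are illustrative only. For reference, an evaluation loss of 2.9089 corresponds to a perplexity of exp(2.9089) ≈ 18.3.

```python
# Minimal inference sketch. The model id comes from this card's repository;
# everything else (prompt, generation settings) is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "clinno/eightwords-241120"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```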

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 64.0
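
The values above map onto a transformers TrainingArguments configuration roughly as sketched below. This is a hedged reconstruction, not the actual training script (the card does not include one); output_dir and anything not listed in the card are assumptions.

```python
# Hedged reconstruction of the hyperparameters listed above.
# output_dir and any unlisted settings are assumptions, not taken from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft",                  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=8,     # 2 x 8 = 16 total train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=64.0,
    bf16=True,                         # assumption, inferred from the BF16 checkpoint
)
```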

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 1.4604        | 5.6417  | 1000  | 1.5410          |
| 1.0287        | 11.2835 | 2000  | 1.3768          |
| 0.7163        | 16.9252 | 3000  | 1.4101          |
| 0.4191        | 22.5670 | 4000  | 1.6336          |
| 0.1893        | 28.2087 | 5000  | 1.9742          |
| 0.0809        | 33.8505 | 6000  | 2.2380          |
| 0.0312        | 39.4922 | 7000  | 2.4977          |
| 0.0116        | 45.1340 | 8000  | 2.7681          |
| 0.0073        | 50.7757 | 9000  | 2.8551          |
| 0.0067        | 56.4175 | 10000 | 2.9038          |
| 0.0063        | 62.0592 | 11000 | 2.9077          |
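
The trajectory is worth noting: validation loss bottoms out around 1.38 near epoch 11 and then rises steadily while training loss approaches zero, the usual signature of overfitting over a long 64-epoch run; an earlier checkpoint may generalize better than the final one.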

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
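
To reproduce this environment, pin the versions above. A quick sanity check (a sketch, assuming all four packages are installed):

```python
# Print installed versions to compare against the pins listed above.
import transformers, torch, datasets, tokenizers

print("transformers", transformers.__version__)  # expected: 4.46.1
print("torch", torch.__version__)                # expected: 2.5.1+cu121
print("datasets", datasets.__version__)          # expected: 3.1.0
print("tokenizers", tokenizers.__version__)      # expected: 0.20.3
```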

Model weights

Safetensors format, 8.03B parameters, BF16 tensors.
