sft

This model is a fine-tuned version of NousResearch/Meta-Llama-3-8B-Instruct on the identity and eightwords-20241120-alapaca datasets. It achieves the following result on the evaluation set:

  • Loss: 2.9089
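
The card does not yet include a usage snippet, so here is a minimal inference sketch. It assumes the checkpoint id clinno/eightwords-241120 (the repository this card belongs to) and the standard transformers chat-template API for Llama-3-Instruct models; the prompt and generation settings are illustrative only. For reference, an evaluation loss of 2.9089 corresponds to a perplexity of exp(2.9089) ≈ 18.3.

```python
# Minimal inference sketch. The model id comes from this card's repository;
# everything else (prompt, generation settings) is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "clinno/eightwords-241120"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```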

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 64.0
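
The values above map onto a transformers TrainingArguments configuration roughly as sketched below. This is a hedged reconstruction, not the actual training script (the card does not include one); output_dir and anything not listed in the card are assumptions.

```python
# Hedged reconstruction of the hyperparameters listed above.
# output_dir and any unlisted settings are assumptions, not taken from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft",                  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=8,     # 2 x 8 = 16 total train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=64.0,
    bf16=True,                         # assumption, inferred from the BF16 checkpoint
)
```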

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 1.4604        | 5.6417  | 1000  | 1.5410          |
| 1.0287        | 11.2835 | 2000  | 1.3768          |
| 0.7163        | 16.9252 | 3000  | 1.4101          |
| 0.4191        | 22.5670 | 4000  | 1.6336          |
| 0.1893        | 28.2087 | 5000  | 1.9742          |
| 0.0809        | 33.8505 | 6000  | 2.2380          |
| 0.0312        | 39.4922 | 7000  | 2.4977          |
| 0.0116        | 45.1340 | 8000  | 2.7681          |
| 0.0073        | 50.7757 | 9000  | 2.8551          |
| 0.0067        | 56.4175 | 10000 | 2.9038          |
| 0.0063        | 62.0592 | 11000 | 2.9077          |
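
The trajectory is worth noting: validation loss bottoms out around 1.38 near epoch 11 and then rises steadily while training loss approaches zero, the usual signature of overfitting over a long 64-epoch run; an earlier checkpoint may generalize better than the final one.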

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
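
To reproduce this environment, pin the versions above. A quick sanity check (a sketch, assuming all four packages are installed):

```python
# Print installed versions to compare against the pins listed above.
import transformers, torch, datasets, tokenizers

print("transformers", transformers.__version__)  # expected: 4.46.1
print("torch", torch.__version__)                # expected: 2.5.1+cu121
print("datasets", datasets.__version__)          # expected: 3.1.0
print("tokenizers", tokenizers.__version__)      # expected: 0.20.3
```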

Model weights

Safetensors format, 8.03B parameters, BF16 tensors.
