tanshi-models-224-ep10

This model is a fine-tuned version of YxBxRyXJx/tanshi-models-224 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8686
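
Since the card does not document the task head, a hedged loading sketch follows; the repository id YxBxRyXJx/tanshi-models-224-ep10 is inferred from the card title and the base model's namespace, and the generic AutoModel class is an assumption.

```python
# Hypothetical loading sketch. The repo id is inferred from the card title;
# the task is undocumented, so the generic AutoModel class is used rather
# than a task-specific Auto class.
from transformers import AutoModel

model = AutoModel.from_pretrained("YxBxRyXJx/tanshi-models-224-ep10")
print(sum(p.numel() for p in model.parameters()))  # roughly 303M parameters
```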

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 5
  • mixed_precision_training: Native AMP
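
These values map directly onto Transformers' TrainingArguments; a minimal sketch is below. The output directory, model class, and datasets are assumptions, since the card does not state the task or data.

```python
# A minimal sketch of the training setup implied by the hyperparameters above.
# The model class, datasets, and output_dir are assumptions -- the card does
# not document the task or the data.
from transformers import AutoModel, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="tanshi-models-224-ep10",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    fp16=True,  # "Native AMP" mixed-precision training
)

model = AutoModel.from_pretrained("YxBxRyXJx/tanshi-models-224")

# With train/eval datasets in hand, training would proceed as usual:
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```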

Training results

Training Loss   Epoch    Step   Validation Loss
No log          0.1351     20   1.0816
No log          0.2703     40   1.1103
1.2252          0.4054     60   1.0766
1.2252          0.5405     80   1.1650
1.2016          0.6757    100   1.0430
1.2016          0.8108    120   1.0198
1.2016          0.9459    140   1.0058
1.1913          1.0811    160   1.0173
1.1913          1.2162    180   1.0246
1.1996          1.3514    200   1.0407
1.1996          1.4865    220   1.0185
1.1996          1.6216    240   0.9941
1.1931          1.7568    260   0.9983
1.1931          1.8919    280   1.0098
1.1227          2.0270    300   1.0127
1.1227          2.1622    320   0.9726
1.1227          2.2973    340   0.9944
1.1479          2.4324    360   1.0146
1.1479          2.5676    380   0.9614
1.0578          2.7027    400   0.9794
1.0578          2.8378    420   0.9699
1.0578          2.9730    440   0.9782
1.1325          3.1081    460   0.9551
1.1325          3.2432    480   0.9714
1.0768          3.3784    500   0.9524
1.0768          3.5135    520   0.9540
1.0768          3.6486    540   0.9115
1.0670          3.7838    560   0.8934
1.0670          3.9189    580   0.9231
1.0786          4.0541    600   0.9242
1.0786          4.1892    620   0.8910
1.0786          4.3243    640   0.8810
1.0959          4.4595    660   0.8875
1.0959          4.5946    680   0.8753
1.0206          4.7297    700   0.8724
1.0206          4.8649    720   0.8699
1.0206          5.0000    740   0.8686
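
Evaluation ran every 20 steps. The 740 total steps over 5 epochs imply about 148 optimizer steps per epoch, i.e. roughly 2,300-2,400 training examples at batch size 16, assuming a single device and no gradient accumulation.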

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.2
  • Tokenizers 0.20.1
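
To reproduce the environment, pinning these versions is the safest route; the snippet below is a small sanity-check sketch using the exact versions listed above.

```python
# Sanity check that installed versions match the card, e.g. after
# `pip install transformers==4.46.1 torch==2.5.1 datasets==3.0.2 tokenizers==0.20.1`.
import datasets, tokenizers, torch, transformers

for lib, expected in [
    (transformers, "4.46.1"),
    (torch, "2.5.1+cu124"),
    (datasets, "3.0.2"),
    (tokenizers, "0.20.1"),
]:
    status = "OK" if lib.__version__ == expected else "MISMATCH"
    print(f"{lib.__name__}: {lib.__version__} (expected {expected}) {status}")
```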