tFINE-base-300m-samsum

An example fine-tune of pszemraj/tFINE-base-300m for dialogue summarization on the samsum dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9820
  • ROUGE-1: 42.3629
  • ROUGE-2: 18.4285
  • ROUGE-L: 34.6339
  • ROUGE-Lsum: 38.7792
  • Gen Len: 27.8033

The base model was pre-trained with a context length of 1024 tokens, and fine-tuning on samsum likewise used 1024-token inputs.
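For context, here is a minimal inference sketch. It assumes the model loads with the standard transformers seq2seq auto classes; the dialogue is an illustrative example, not drawn from the evaluation set.

```python
# Minimal inference sketch (assumes standard transformers seq2seq loading).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/tFINE-base-300m-samsum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dialogue = (
    "Amanda: I baked cookies. Do you want some?\n"
    "Jerry: Sure!\n"
    "Amanda: I'll bring you some tomorrow :-)"
)

# Truncate to the 1024-token context the model was trained with.
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```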

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 17868
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 4.0
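As a reproducibility aid, the sketch below maps these hyperparameters onto transformers Seq2SeqTrainingArguments. It is reconstructed from the list above rather than taken from the original training script, and the output_dir is a placeholder; the Adam betas and epsilon are left at the library defaults, which match the values listed.

```python
# Sketch of the run configuration, reconstructed from the hyperparameter list
# above (not the original training script).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tFINE-base-300m-samsum",  # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128
    seed=17868,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=4.0,
    predict_with_generate=True,  # needed to compute ROUGE during evaluation
)
```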

Training results

The checkpoint from epoch 3 was kept as the final model: validation ROUGE peaks there, and validation loss rises again by epoch 4.

| Training Loss | Epoch  | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|--------------:|-------:|-----:|----------------:|--------:|--------:|--------:|-----------:|--------:|
| 1.9528        | 0.9989 | 115  | 1.9189          | 40.093  | 18.2018 | 33.9749 | 36.9071    | 29.3333 |
| 1.5346        | 1.9978 | 230  | 1.8827          | 41.4676 | 18.3467 | 34.1909 | 38.2131    | 27.6633 |
| 1.1696        | 2.9967 | 345  | 1.9820          | 42.3629 | 18.4285 | 34.6339 | 38.7792    | 27.8033 |
| 0.9359        | 3.9957 | 460  | 2.1588          | 41.2237 | 17.8161 | 33.7101 | 37.9569    | 30.18   |
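The ROUGE values above are on the usual 0-100 scale. A sketch of how such scores can be computed with the evaluate library (an assumed workflow, not necessarily the exact evaluation code) follows:

```python
# Sketch of ROUGE scoring with the `evaluate` library (requires the
# `rouge_score` package); an assumed workflow, not the exact eval code.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["Amanda baked cookies and will bring Jerry some tomorrow."],
    references=["Amanda baked cookies and will bring some to Jerry tomorrow."],
)
# `evaluate` returns fractions in [0, 1]; scale to 0-100 as reported above.
print({k: round(v * 100, 4) for k, v in scores.items()})
```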