tFINE-base-300m-samsum
An example fine-tune of pszemraj/tFINE-base-300m for summarization using the samsum dataset. It achieves the following results on the evaluation set:
- Loss: 1.9820
- Rouge1: 42.3629
- Rouge2: 18.4285
- Rougel: 34.6339
- Rougelsum: 38.7792
- Gen Len: 27.8033
The base model was pre-trained with a context length of 1024 tokens, and this fine-tune was trained on samsum with the same 1024-token input context.
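A minimal inference sketch follows. It assumes the checkpoint loads through the generic `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes in `transformers`, and that no task prefix is required; neither is confirmed by this card. The helper `format_dialogue`, the example turns, and `max_new_tokens=64` are illustrative choices, not settings used by the author.

```python
from typing import List, Tuple

MODEL_ID = "pszemraj/tFINE-base-300m-samsum"  # repository name from this card
MAX_INPUT_TOKENS = 1024  # context length stated in this card


def format_dialogue(turns: List[Tuple[str, str]]) -> str:
    """Join (speaker, utterance) pairs into samsum-style 'Speaker: text' lines."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def summarize(dialogue: str, max_new_tokens: int = 64) -> str:
    """Generate a summary. Downloads the checkpoint on first call."""
    # Imported lazily so format_dialogue works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    # Truncate to the context length the model was trained with.
    inputs = tokenizer(
        dialogue,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Hypothetical two-turn dialogue, used only to show the input format.
example = format_dialogue(
    [
        ("Anna", "Are we still on for lunch?"),
        ("Ben", "Yes, 12:30 at the usual place."),
    ]
)
# print(summarize(example))  # uncomment to run; requires transformers and torch
```

The reported Gen Len of roughly 28 tokens suggests that a generation budget of 64 new tokens leaves comfortable headroom, but the right value depends on your inputs.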
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 16
- seed: 17868
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4.0
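As a rough illustration of how these values fit together, the sketch below recomputes the total train batch size and traces a cosine schedule with linear warmup. It assumes a single device (not stated here), takes the total of 460 optimizer steps from the last row of the training results table, and assumes the schedule follows the usual warmup-then-half-cosine shape. The function `cosine_lr` is a hypothetical stand-in written for this example, not the training code itself.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05

# Assumptions: one device, and 460 optimizer steps in total
# (the final step in the training results table).
NUM_DEVICES = 1
TOTAL_STEPS = 460

# 8 samples per step * 16 accumulated steps * 1 device = 128.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    for step in (0, warmup_steps, 115, 230, 345, 460):
        print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 1e-4 once warmup ends and decays to zero by the last step, so later epochs train with a progressively smaller step size.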
Training results
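The epoch 3 checkpoint (step 345) is kept as the final model; the evaluation results reported at the top of this card correspond to that row.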
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 1.9528 | 0.9989 | 115 | 1.9189 | 40.093 | 18.2018 | 33.9749 | 36.9071 | 29.3333 |
| 1.5346 | 1.9978 | 230 | 1.8827 | 41.4676 | 18.3467 | 34.1909 | 38.2131 | 27.6633 |
| 1.1696 | 2.9967 | 345 | 1.9820 | 42.3629 | 18.4285 | 34.6339 | 38.7792 | 27.8033 |
| 0.9359 | 3.9957 | 460 | 2.1588 | 41.2237 | 17.8161 | 33.7101 | 37.9569 | 30.18 |
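The choice of checkpoint can be checked directly against the table. The sketch below copies the four rows and selects the best step under two criteria: lowest validation loss and highest ROUGE. The `Row` type and the `best_step` helper are hypothetical names introduced for this example; the card does not state which criterion the author actually used.

```python
from typing import Callable, List, NamedTuple


class Row(NamedTuple):
    epoch: float
    step: int
    val_loss: float
    rouge1: float
    rouge2: float
    rougel: float
    rougelsum: float


# Values copied from the training results table.
ROWS: List[Row] = [
    Row(0.9989, 115, 1.9189, 40.093, 18.2018, 33.9749, 36.9071),
    Row(1.9978, 230, 1.8827, 41.4676, 18.3467, 34.1909, 38.2131),
    Row(2.9967, 345, 1.9820, 42.3629, 18.4285, 34.6339, 38.7792),
    Row(3.9957, 460, 2.1588, 41.2237, 17.8161, 33.7101, 37.9569),
]


def best_step(key: Callable[[Row], float], higher_is_better: bool) -> int:
    """Return the step whose row is best under the given metric."""
    pick = max if higher_is_better else min
    return pick(ROWS, key=key).step


by_loss = best_step(lambda r: r.val_loss, higher_is_better=False)
by_rouge = {
    name: best_step(lambda r, n=name: getattr(r, n), higher_is_better=True)
    for name in ("rouge1", "rouge2", "rougel", "rougelsum")
}

if __name__ == "__main__":
    print("best by validation loss:", by_loss)
    print("best by ROUGE:", by_rouge)
```

Validation loss is lowest at step 230, while all four ROUGE scores peak at step 345, which is consistent with keeping the epoch 3 checkpoint if ROUGE is the selection metric.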