license: apache-2.0
base_model: google/flan-t5-small
tags:
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: t5-summarization-one-shot-better-prompt
    results: []

t5-summarization-one-shot-better-prompt

This model is a fine-tuned version of google/flan-t5-small on an unknown dataset. It achieves the following results on the evaluation set after the final epoch (epoch 20, step 3720):

  • Loss: 2.2414
  • Rouge: rouge1 38.3588, rouge2 17.983, rougeL 20.1917, rougeLsum 20.1917
  • Bert Score: 0.8806
  • Bleurt 20: -0.7794
  • Gen Len: 13.44

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 7
  • eval_batch_size: 7
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
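
As a rough illustration of what `lr_scheduler_type: linear` implies for these settings, the sketch below computes the learning rate at a few points in training. It assumes zero warmup steps and decay to zero, since the card lists no warmup setting; the function name `linear_lr` is hypothetical and not part of any library. The step count per epoch (186) is taken from the training results table.

```python
# Sketch of a linear learning-rate decay, ASSUMING zero warmup steps
# (the card does not list a warmup setting).

BASE_LR = 1e-4          # learning_rate from the card
STEPS_PER_EPOCH = 186   # step count logged at epoch 1.0 in the training results
NUM_EPOCHS = 20         # num_epochs from the card
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 3720, the final logged step


def linear_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer updates under linear decay to zero."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Print the learning rate at the start of training and after every fifth epoch.
for epoch in (0, 5, 10, 15, 20):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:>2} (step {step:>4}): lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate is halved by the end of epoch 10 and reaches zero at step 3720, so the later epochs are trained with progressively smaller updates.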

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bert Score | Bleurt 20 | Gen Len |
|---|---|---|---|---|---|---|---|---|---|---|
| 2.7663 | 1.0 | 186 | 2.4069 | 43.4548 | 17.3297 | 18.9728 | 18.9728 | 0.874 | -0.8387 | 14.275 |
| 2.4668 | 2.0 | 372 | 2.3255 | 42.9892 | 18.518 | 19.7631 | 19.7631 | 0.8763 | -0.8091 | 13.965 |
| 2.2692 | 3.0 | 558 | 2.2633 | 36.8257 | 16.1751 | 17.9916 | 17.9916 | 0.8744 | -0.8312 | 12.955 |
| 2.2018 | 4.0 | 744 | 2.2481 | 40.4112 | 18.1938 | 20.0606 | 20.0606 | 0.877 | -0.7846 | 14.04 |
| 2.1736 | 5.0 | 930 | 2.2243 | 39.2656 | 18.4718 | 19.5926 | 19.5926 | 0.8786 | -0.7865 | 13.31 |
| 2.0189 | 6.0 | 1116 | 2.2220 | 38.1992 | 18.0936 | 18.6278 | 18.6278 | 0.877 | -0.8295 | 13.3 |
| 1.9425 | 7.0 | 1302 | 2.2103 | 38.9165 | 18.0013 | 19.2571 | 19.2571 | 0.8779 | -0.7923 | 13.445 |
| 1.9192 | 8.0 | 1488 | 2.2060 | 37.6615 | 18.1423 | 19.3882 | 19.3882 | 0.8773 | -0.814 | 13.135 |
| 1.8502 | 9.0 | 1674 | 2.1948 | 37.595 | 17.5944 | 19.4897 | 19.4897 | 0.8809 | -0.7914 | 13.15 |
| 1.8201 | 10.0 | 1860 | 2.1995 | 38.7935 | 19.2667 | 20.5059 | 20.5059 | 0.8809 | -0.7765 | 13.36 |
| 1.7472 | 11.0 | 2046 | 2.2036 | 37.4728 | 17.5974 | 19.5534 | 19.5534 | 0.8797 | -0.7943 | 13.245 |
| 1.772 | 12.0 | 2232 | 2.2050 | 37.6136 | 17.442 | 20.122 | 20.122 | 0.881 | -0.7765 | 13.35 |
| 1.7273 | 13.0 | 2418 | 2.2153 | 37.2238 | 16.6237 | 19.4117 | 19.4117 | 0.8789 | -0.7929 | 13.325 |
| 1.6854 | 14.0 | 2604 | 2.2243 | 38.1249 | 18.0241 | 20.485 | 20.485 | 0.8822 | -0.778 | 13.315 |
| 1.6598 | 15.0 | 2790 | 2.2299 | 37.3743 | 17.3192 | 19.9239 | 19.9239 | 0.8795 | -0.7805 | 13.275 |
| 1.63 | 16.0 | 2976 | 2.2286 | 38.6731 | 18.2088 | 20.2535 | 20.2535 | 0.8801 | -0.7882 | 13.415 |
| 1.6654 | 17.0 | 3162 | 2.2355 | 38.0295 | 17.6256 | 19.9215 | 19.9215 | 0.8799 | -0.7894 | 13.34 |
| 1.6443 | 18.0 | 3348 | 2.2404 | 38.3122 | 17.5836 | 19.8706 | 19.8706 | 0.8801 | -0.7799 | 13.45 |
| 1.6083 | 19.0 | 3534 | 2.2399 | 38.1749 | 17.4993 | 20.0054 | 20.0054 | 0.8801 | -0.7772 | 13.435 |
| 1.5953 | 20.0 | 3720 | 2.2414 | 38.3588 | 17.983 | 20.1917 | 20.1917 | 0.8806 | -0.7794 | 13.44 |
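
One way to read the training results above is to compare where validation loss is lowest with where training stopped. The sketch below does this with the loss values copied from the table; the helper name `best_epoch` is hypothetical. It is only an illustration of how the table might be inspected, not a description of how the checkpoint in this repository was chosen.

```python
# Losses copied from the training results table, one entry per epoch (1..20).
train_loss = [
    2.7663, 2.4668, 2.2692, 2.2018, 2.1736, 2.0189, 1.9425, 1.9192, 1.8502, 1.8201,
    1.7472, 1.772, 1.7273, 1.6854, 1.6598, 1.63, 1.6654, 1.6443, 1.6083, 1.5953,
]
val_loss = [
    2.4069, 2.3255, 2.2633, 2.2481, 2.2243, 2.2220, 2.2103, 2.2060, 2.1948, 2.1995,
    2.2036, 2.2050, 2.2153, 2.2243, 2.2299, 2.2286, 2.2355, 2.2404, 2.2399, 2.2414,
]


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest value in `losses`."""
    index = min(range(len(losses)), key=losses.__getitem__)
    return index + 1, losses[index]


epoch, loss = best_epoch(val_loss)
# Difference between validation and training loss at the last epoch.
final_gap = val_loss[-1] - train_loss[-1]

print(f"lowest validation loss: {loss:.4f} at epoch {epoch}")
print(f"final validation loss:  {val_loss[-1]:.4f} at epoch {len(val_loss)}")
print(f"final train/validation gap: {final_gap:.4f}")
```

With these numbers the lowest validation loss (2.1948) falls at epoch 9, while training loss keeps decreasing through epoch 20. Whether an earlier checkpoint would summarize better is not something the table alone settles, since the ROUGE, Bert Score, and Bleurt 20 columns do not all peak at the same epoch.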

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0