|
--- |
|
license: apache-2.0 |
|
base_model: google/flan-t5-small |
|
tags: |
|
- generated_from_trainer |
|
metrics: |
|
- rouge |
|
model-index: |
|
- name: t5-summarization-one-shot-better-prompt |
|
results: [] |
|
--- |
|
|
|
<!-- This model card has been generated automatically according to the information the Trainer had access to. You |
|
should probably proofread and complete it, then remove this comment. --> |
|
|
|
# t5-summarization-one-shot-better-prompt |
|
|
|
This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unknown dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 2.2414 |
|
- Rouge: rouge1 = 38.3588, rouge2 = 17.983, rougeL = 20.1917, rougeLsum = 20.1917
|
- Bert Score: 0.8806 |
|
- Bleurt 20: -0.7794 |
|
- Gen Len: 13.44 |
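
Below is a minimal inference sketch for this checkpoint. It assumes the model is published under the repo name shown in this card and uses a generic summarization prompt; the exact prompt wording used during fine-tuning is not documented here, so both the repo id and the prompt are assumptions.

```python
# Minimal inference sketch (repo id and prompt wording are assumptions,
# not documented in this card).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "t5-summarization-one-shot-better-prompt"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Summarize the following dialogue: ..."  # replace with your input text
inputs = tokenizer(text, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```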
|
|
|
## Model description |
|
|
|
More information needed |
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
|
- learning_rate: 0.0001 |
|
- train_batch_size: 7 |
|
- eval_batch_size: 7 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 20 |
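
The sketch below maps the listed hyperparameters onto `Seq2SeqTrainingArguments`. The output directory, evaluation cadence, and `predict_with_generate` flag are assumptions inferred from the per-epoch results table, not values stated in this card.

```python
# Hedged reconstruction of the training configuration from the list above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-summarization-one-shot-better-prompt",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=7,
    per_device_eval_batch_size=7,
    seed=42,
    num_train_epochs=20,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",   # assumed from the per-epoch validation results
    predict_with_generate=True,    # assumed, since ROUGE is computed on generations
)
```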
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Bert Score | Bleurt 20 | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:----------:|:---------:|:-------:|
| 2.7663        | 1.0   | 186  | 2.4069          | 43.4548 | 17.3297 | 18.9728 | 18.9728   | 0.874      | -0.8387   | 14.275  |
| 2.4668        | 2.0   | 372  | 2.3255          | 42.9892 | 18.518  | 19.7631 | 19.7631   | 0.8763     | -0.8091   | 13.965  |
| 2.2692        | 3.0   | 558  | 2.2633          | 36.8257 | 16.1751 | 17.9916 | 17.9916   | 0.8744     | -0.8312   | 12.955  |
| 2.2018        | 4.0   | 744  | 2.2481          | 40.4112 | 18.1938 | 20.0606 | 20.0606   | 0.877      | -0.7846   | 14.04   |
| 2.1736        | 5.0   | 930  | 2.2243          | 39.2656 | 18.4718 | 19.5926 | 19.5926   | 0.8786     | -0.7865   | 13.31   |
| 2.0189        | 6.0   | 1116 | 2.2220          | 38.1992 | 18.0936 | 18.6278 | 18.6278   | 0.877      | -0.8295   | 13.3    |
| 1.9425        | 7.0   | 1302 | 2.2103          | 38.9165 | 18.0013 | 19.2571 | 19.2571   | 0.8779     | -0.7923   | 13.445  |
| 1.9192        | 8.0   | 1488 | 2.2060          | 37.6615 | 18.1423 | 19.3882 | 19.3882   | 0.8773     | -0.814    | 13.135  |
| 1.8502        | 9.0   | 1674 | 2.1948          | 37.595  | 17.5944 | 19.4897 | 19.4897   | 0.8809     | -0.7914   | 13.15   |
| 1.8201        | 10.0  | 1860 | 2.1995          | 38.7935 | 19.2667 | 20.5059 | 20.5059   | 0.8809     | -0.7765   | 13.36   |
| 1.7472        | 11.0  | 2046 | 2.2036          | 37.4728 | 17.5974 | 19.5534 | 19.5534   | 0.8797     | -0.7943   | 13.245  |
| 1.772         | 12.0  | 2232 | 2.2050          | 37.6136 | 17.442  | 20.122  | 20.122    | 0.881      | -0.7765   | 13.35   |
| 1.7273        | 13.0  | 2418 | 2.2153          | 37.2238 | 16.6237 | 19.4117 | 19.4117   | 0.8789     | -0.7929   | 13.325  |
| 1.6854        | 14.0  | 2604 | 2.2243          | 38.1249 | 18.0241 | 20.485  | 20.485    | 0.8822     | -0.778    | 13.315  |
| 1.6598        | 15.0  | 2790 | 2.2299          | 37.3743 | 17.3192 | 19.9239 | 19.9239   | 0.8795     | -0.7805   | 13.275  |
| 1.63          | 16.0  | 2976 | 2.2286          | 38.6731 | 18.2088 | 20.2535 | 20.2535   | 0.8801     | -0.7882   | 13.415  |
| 1.6654        | 17.0  | 3162 | 2.2355          | 38.0295 | 17.6256 | 19.9215 | 19.9215   | 0.8799     | -0.7894   | 13.34   |
| 1.6443        | 18.0  | 3348 | 2.2404          | 38.3122 | 17.5836 | 19.8706 | 19.8706   | 0.8801     | -0.7799   | 13.45   |
| 1.6083        | 19.0  | 3534 | 2.2399          | 38.1749 | 17.4993 | 20.0054 | 20.0054   | 0.8801     | -0.7772   | 13.435  |
| 1.5953        | 20.0  | 3720 | 2.2414          | 38.3588 | 17.983  | 20.1917 | 20.1917   | 0.8806     | -0.7794   | 13.44   |
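
The sketch below shows how the ROUGE and Bert Score columns could be reproduced with the `evaluate` library; the exact metric configuration used for this card is an assumption. BLEURT-20 requires the separate `bleurt` package and is omitted here.

```python
# Hedged metric-computation sketch; predictions/references are placeholders.
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

predictions = ["The model generates a short summary."]        # placeholder generations
references = ["A short summary is generated by the model."]   # placeholder targets

rouge_scores = rouge.compute(predictions=predictions, references=references)
bert_scores = bertscore.compute(predictions=predictions, references=references, lang="en")

print({k: round(v * 100, 4) for k, v in rouge_scores.items()})   # rouge1/2/L/Lsum (×100)
print(round(sum(bert_scores["f1"]) / len(bert_scores["f1"]), 4)) # mean BERTScore F1
```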
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.35.2 |
|
- Pytorch 2.1.0+cu121 |
|
- Datasets 2.16.1 |
|
- Tokenizers 0.15.0 |
|
|