---
license: apache-2.0
base_model: google/flan-t5-small
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: t5-summarization-one-shot-better-prompt
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# t5-summarization-one-shot-better-prompt
This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.2414
- Rouge: {'rouge1': 38.3588, 'rouge2': 17.983, 'rougeL': 20.1917, 'rougeLsum': 20.1917}
- Bert Score: 0.8806
- Bleurt 20: -0.7794
- Gen Len: 13.44
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 7
- eval_batch_size: 7
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
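
The `linear` scheduler and the step counts in the results table imply a simple learning-rate curve. The sketch below reproduces it in plain Python. It assumes no warmup steps (the card lists none), a single device, and no gradient accumulation; the function name `linear_lr` and the constant names are illustrative and not part of Transformers.

```python
# Values taken from this card; the no-warmup assumption is ours.
LEARNING_RATE = 1e-4
NUM_EPOCHS = 20
STEPS_PER_EPOCH = 186  # results table: step 186 is reached at epoch 1.0
TRAIN_BATCH_SIZE = 7

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 3720, matching the last table row


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps under linear decay to zero."""
    if warmup_steps and step < warmup_steps:
        # Optional linear warmup from 0 up to LEARNING_RATE.
        return LEARNING_RATE * step / warmup_steps
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)


# Upper bound on training examples seen per epoch (the last batch may be partial).
MAX_TRAIN_EXAMPLES = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    for epoch in (0, 5, 10, 15, 20):
        print(f"epoch {epoch:2d}: lr = {linear_lr(epoch * STEPS_PER_EPOCH):.2e}")
```

Under these assumptions the learning rate is half its initial value at step 1860 (epoch 10) and reaches zero at step 3720, and the training split holds at most 1,302 examples.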
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge | Bert Score | Bleurt 20 | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------------------------------------------------------------------------------:|:----------:|:---------:|:-------:|
| 2.7663 | 1.0 | 186 | 2.4069 | {'rouge1': 43.4548, 'rouge2': 17.3297, 'rougeL': 18.9728, 'rougeLsum': 18.9728} | 0.874 | -0.8387 | 14.275 |
| 2.4668 | 2.0 | 372 | 2.3255 | {'rouge1': 42.9892, 'rouge2': 18.518, 'rougeL': 19.7631, 'rougeLsum': 19.7631} | 0.8763 | -0.8091 | 13.965 |
| 2.2692 | 3.0 | 558 | 2.2633 | {'rouge1': 36.8257, 'rouge2': 16.1751, 'rougeL': 17.9916, 'rougeLsum': 17.9916} | 0.8744 | -0.8312 | 12.955 |
| 2.2018 | 4.0 | 744 | 2.2481 | {'rouge1': 40.4112, 'rouge2': 18.1938, 'rougeL': 20.0606, 'rougeLsum': 20.0606} | 0.877 | -0.7846 | 14.04 |
| 2.1736 | 5.0 | 930 | 2.2243 | {'rouge1': 39.2656, 'rouge2': 18.4718, 'rougeL': 19.5926, 'rougeLsum': 19.5926} | 0.8786 | -0.7865 | 13.31 |
| 2.0189 | 6.0 | 1116 | 2.2220 | {'rouge1': 38.1992, 'rouge2': 18.0936, 'rougeL': 18.6278, 'rougeLsum': 18.6278} | 0.877 | -0.8295 | 13.3 |
| 1.9425 | 7.0 | 1302 | 2.2103 | {'rouge1': 38.9165, 'rouge2': 18.0013, 'rougeL': 19.2571, 'rougeLsum': 19.2571} | 0.8779 | -0.7923 | 13.445 |
| 1.9192 | 8.0 | 1488 | 2.2060 | {'rouge1': 37.6615, 'rouge2': 18.1423, 'rougeL': 19.3882, 'rougeLsum': 19.3882} | 0.8773 | -0.814 | 13.135 |
| 1.8502 | 9.0 | 1674 | 2.1948 | {'rouge1': 37.595, 'rouge2': 17.5944, 'rougeL': 19.4897, 'rougeLsum': 19.4897} | 0.8809 | -0.7914 | 13.15 |
| 1.8201 | 10.0 | 1860 | 2.1995 | {'rouge1': 38.7935, 'rouge2': 19.2667, 'rougeL': 20.5059, 'rougeLsum': 20.5059} | 0.8809 | -0.7765 | 13.36 |
| 1.7472 | 11.0 | 2046 | 2.2036 | {'rouge1': 37.4728, 'rouge2': 17.5974, 'rougeL': 19.5534, 'rougeLsum': 19.5534} | 0.8797 | -0.7943 | 13.245 |
| 1.772 | 12.0 | 2232 | 2.2050 | {'rouge1': 37.6136, 'rouge2': 17.442, 'rougeL': 20.122, 'rougeLsum': 20.122} | 0.881 | -0.7765 | 13.35 |
| 1.7273 | 13.0 | 2418 | 2.2153 | {'rouge1': 37.2238, 'rouge2': 16.6237, 'rougeL': 19.4117, 'rougeLsum': 19.4117} | 0.8789 | -0.7929 | 13.325 |
| 1.6854 | 14.0 | 2604 | 2.2243 | {'rouge1': 38.1249, 'rouge2': 18.0241, 'rougeL': 20.485, 'rougeLsum': 20.485} | 0.8822 | -0.778 | 13.315 |
| 1.6598 | 15.0 | 2790 | 2.2299 | {'rouge1': 37.3743, 'rouge2': 17.3192, 'rougeL': 19.9239, 'rougeLsum': 19.9239} | 0.8795 | -0.7805 | 13.275 |
| 1.63 | 16.0 | 2976 | 2.2286 | {'rouge1': 38.6731, 'rouge2': 18.2088, 'rougeL': 20.2535, 'rougeLsum': 20.2535} | 0.8801 | -0.7882 | 13.415 |
| 1.6654 | 17.0 | 3162 | 2.2355 | {'rouge1': 38.0295, 'rouge2': 17.6256, 'rougeL': 19.9215, 'rougeLsum': 19.9215} | 0.8799 | -0.7894 | 13.34 |
| 1.6443 | 18.0 | 3348 | 2.2404 | {'rouge1': 38.3122, 'rouge2': 17.5836, 'rougeL': 19.8706, 'rougeLsum': 19.8706} | 0.8801 | -0.7799 | 13.45 |
| 1.6083 | 19.0 | 3534 | 2.2399 | {'rouge1': 38.1749, 'rouge2': 17.4993, 'rougeL': 20.0054, 'rougeLsum': 20.0054} | 0.8801 | -0.7772 | 13.435 |
| 1.5953 | 20.0 | 3720 | 2.2414 | {'rouge1': 38.3588, 'rouge2': 17.983, 'rougeL': 20.1917, 'rougeLsum': 20.1917} | 0.8806 | -0.7794 | 13.44 |
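
Validation loss in the table reaches its minimum well before the final epoch, while the headline results at the top of this card come from epoch 20. A minimal sketch of locating the best epoch by validation loss, using values copied from the table (the variable names are illustrative):

```python
# Validation loss per epoch (epochs 1..20), copied from the training results table.
VAL_LOSS = [
    2.4069, 2.3255, 2.2633, 2.2481, 2.2243,
    2.2220, 2.2103, 2.2060, 2.1948, 2.1995,
    2.2036, 2.2050, 2.2153, 2.2243, 2.2299,
    2.2286, 2.2355, 2.2404, 2.2399, 2.2414,
]

# Index of the smallest loss; epochs are 1-based.
best_index = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__)
best_epoch = best_index + 1
best_loss = VAL_LOSS[best_index]

# How much worse the final checkpoint is than the best one.
final_gap = VAL_LOSS[-1] - best_loss

if __name__ == "__main__":
    print(f"best epoch: {best_epoch} (validation loss {best_loss:.4f})")
    print(f"final epoch is worse by {final_gap:.4f}")
```

Training loss keeps falling after that point while validation loss drifts upward, which is consistent with mild overfitting. With the Trainer, `load_best_model_at_end=True` together with `metric_for_best_model` can retain such a checkpoint; whether either was set for this run is not recorded in the card.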
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0