---
license: apache-2.0
base_model: google/flan-t5-small
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: t5-summarization-one-shot-better-prompt
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# t5-summarization-one-shot-better-prompt

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.2414
- Rouge1: 38.3588
- Rouge2: 17.983
- RougeL: 20.1917
- RougeLsum: 20.1917
- Bert Score: 0.8806
- Bleurt 20: -0.7794
- Gen Len: 13.44
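
The ROUGE and BERTScore figures above can be reproduced in spirit with the `evaluate` library. The snippet below is a minimal sketch, not the exact evaluation code used for this card: it assumes lists of generated and reference summaries are already in hand, and it omits BLEURT-20, which requires a separate checkpoint download.

```python
# Hedged sketch: how ROUGE and BERTScore of the kind reported above can be computed.
# `predictions` and `references` are hypothetical placeholders, not data from this card.
import evaluate

predictions = ["a short generated summary"]
references = ["the corresponding reference summary"]

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

rouge_scores = rouge.compute(predictions=predictions, references=references)
bert_scores = bertscore.compute(predictions=predictions, references=references, lang="en")

# ROUGE values are reported on a 0-100 scale in this card, so scale by 100.
print({k: round(v * 100, 4) for k, v in rouge_scores.items()})
# The single BERTScore number in this card is assumed to be the mean F1 over the eval set.
print(round(sum(bert_scores["f1"]) / len(bert_scores["f1"]), 4))
```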

## Model description

More information needed

## Intended uses & limitations

More information needed
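
Judging from the model name and the ROUGE-based evaluation, the model appears intended for abstractive text summarization. As a hedged illustration only, it can be loaded like any seq2seq checkpoint; the repository id and the `summarize:` prompt prefix below are placeholders and assumptions, since the exact fine-tuning prompt is not documented here.

```python
# Hedged inference sketch; the repository id is a placeholder, not a confirmed path.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "your-username/t5-summarization-one-shot-better-prompt"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# The "summarize:" prefix is an assumption based on common T5 usage.
text = "summarize: " + "Long article text to be summarized goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```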

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 7
- eval_batch_size: 7
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
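
As a sketch only, the hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows; the output directory and the evaluation/generation settings are assumptions, not values recorded in this card.

```python
# Hedged sketch of the listed hyperparameters as Seq2SeqTrainingArguments
# (transformers 4.35.x). Adam betas/epsilon match the library defaults listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-summarization-one-shot-better-prompt",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=7,
    per_device_eval_batch_size=7,
    seed=42,
    num_train_epochs=20,
    lr_scheduler_type="linear",
    evaluation_strategy="epoch",  # assumption: per-epoch eval matches the results table
    predict_with_generate=True,   # assumption: needed to compute ROUGE on generated text
)
```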

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge                                                                           | Bert Score | Bleurt 20 | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------------------------------------------------------------------------------:|:----------:|:---------:|:-------:|
| 2.7663        | 1.0   | 186  | 2.4069          | {'rouge1': 43.4548, 'rouge2': 17.3297, 'rougeL': 18.9728, 'rougeLsum': 18.9728} | 0.874      | -0.8387   | 14.275  |
| 2.4668        | 2.0   | 372  | 2.3255          | {'rouge1': 42.9892, 'rouge2': 18.518, 'rougeL': 19.7631, 'rougeLsum': 19.7631}  | 0.8763     | -0.8091   | 13.965  |
| 2.2692        | 3.0   | 558  | 2.2633          | {'rouge1': 36.8257, 'rouge2': 16.1751, 'rougeL': 17.9916, 'rougeLsum': 17.9916} | 0.8744     | -0.8312   | 12.955  |
| 2.2018        | 4.0   | 744  | 2.2481          | {'rouge1': 40.4112, 'rouge2': 18.1938, 'rougeL': 20.0606, 'rougeLsum': 20.0606} | 0.877      | -0.7846   | 14.04   |
| 2.1736        | 5.0   | 930  | 2.2243          | {'rouge1': 39.2656, 'rouge2': 18.4718, 'rougeL': 19.5926, 'rougeLsum': 19.5926} | 0.8786     | -0.7865   | 13.31   |
| 2.0189        | 6.0   | 1116 | 2.2220          | {'rouge1': 38.1992, 'rouge2': 18.0936, 'rougeL': 18.6278, 'rougeLsum': 18.6278} | 0.877      | -0.8295   | 13.3    |
| 1.9425        | 7.0   | 1302 | 2.2103          | {'rouge1': 38.9165, 'rouge2': 18.0013, 'rougeL': 19.2571, 'rougeLsum': 19.2571} | 0.8779     | -0.7923   | 13.445  |
| 1.9192        | 8.0   | 1488 | 2.2060          | {'rouge1': 37.6615, 'rouge2': 18.1423, 'rougeL': 19.3882, 'rougeLsum': 19.3882} | 0.8773     | -0.814    | 13.135  |
| 1.8502        | 9.0   | 1674 | 2.1948          | {'rouge1': 37.595, 'rouge2': 17.5944, 'rougeL': 19.4897, 'rougeLsum': 19.4897}  | 0.8809     | -0.7914   | 13.15   |
| 1.8201        | 10.0  | 1860 | 2.1995          | {'rouge1': 38.7935, 'rouge2': 19.2667, 'rougeL': 20.5059, 'rougeLsum': 20.5059} | 0.8809     | -0.7765   | 13.36   |
| 1.7472        | 11.0  | 2046 | 2.2036          | {'rouge1': 37.4728, 'rouge2': 17.5974, 'rougeL': 19.5534, 'rougeLsum': 19.5534} | 0.8797     | -0.7943   | 13.245  |
| 1.772         | 12.0  | 2232 | 2.2050          | {'rouge1': 37.6136, 'rouge2': 17.442, 'rougeL': 20.122, 'rougeLsum': 20.122}    | 0.881      | -0.7765   | 13.35   |
| 1.7273        | 13.0  | 2418 | 2.2153          | {'rouge1': 37.2238, 'rouge2': 16.6237, 'rougeL': 19.4117, 'rougeLsum': 19.4117} | 0.8789     | -0.7929   | 13.325  |
| 1.6854        | 14.0  | 2604 | 2.2243          | {'rouge1': 38.1249, 'rouge2': 18.0241, 'rougeL': 20.485, 'rougeLsum': 20.485}   | 0.8822     | -0.778    | 13.315  |
| 1.6598        | 15.0  | 2790 | 2.2299          | {'rouge1': 37.3743, 'rouge2': 17.3192, 'rougeL': 19.9239, 'rougeLsum': 19.9239} | 0.8795     | -0.7805   | 13.275  |
| 1.63          | 16.0  | 2976 | 2.2286          | {'rouge1': 38.6731, 'rouge2': 18.2088, 'rougeL': 20.2535, 'rougeLsum': 20.2535} | 0.8801     | -0.7882   | 13.415  |
| 1.6654        | 17.0  | 3162 | 2.2355          | {'rouge1': 38.0295, 'rouge2': 17.6256, 'rougeL': 19.9215, 'rougeLsum': 19.9215} | 0.8799     | -0.7894   | 13.34   |
| 1.6443        | 18.0  | 3348 | 2.2404          | {'rouge1': 38.3122, 'rouge2': 17.5836, 'rougeL': 19.8706, 'rougeLsum': 19.8706} | 0.8801     | -0.7799   | 13.45   |
| 1.6083        | 19.0  | 3534 | 2.2399          | {'rouge1': 38.1749, 'rouge2': 17.4993, 'rougeL': 20.0054, 'rougeLsum': 20.0054} | 0.8801     | -0.7772   | 13.435  |
| 1.5953        | 20.0  | 3720 | 2.2414          | {'rouge1': 38.3588, 'rouge2': 17.983, 'rougeL': 20.1917, 'rougeLsum': 20.1917}  | 0.8806     | -0.7794   | 13.44   |


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0