---
license: apache-2.0
base_model: google/flan-t5-small
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: t5-summarization-zero-shot-headers-and-better-prompt
  results: []
---


# t5-summarization-zero-shot-headers-and-better-prompt

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unknown dataset.
After the final epoch (epoch 20, step 3720), it achieves the following results on the evaluation set:
- Loss: 2.2226
- Rouge:
  - rouge1: 0.4351
  - rouge2: 0.2124
  - rougeL: 0.215
  - rougeLsum: 0.215
- Bert Score: 0.8806
- Bleurt 20: -0.7502
- Gen Len: 14.645

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 7
- eval_batch_size: 7
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
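
To make the schedule settings concrete, the following is a minimal sketch of how a linear schedule with warmup would play out under these hyperparameters. It assumes 186 optimizer steps per epoch (taken from the Step column of the training results), that the warmup length is the total step count times `lr_scheduler_warmup_ratio`, and that the rate decays linearly to zero at the last step. The names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are illustrative and not part of the training code.

```python
# Values taken from the hyperparameter list and the results table.
BASE_LR = 1e-4          # learning_rate
STEPS_PER_EPOCH = 186   # step count logged at the end of epoch 1
NUM_EPOCHS = 20         # num_epochs
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS        # 3720
# Assumption: warmup length is the total step count scaled by the ratio.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # 372


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step for linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to BASE_LR over the warmup steps.
        return BASE_LR * step / WARMUP_STEPS
    # Decay from BASE_LR to 0 over the remaining steps.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 186, 372, 2046, 3720):
        print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the warmup covers the first 372 steps, which matches the end of epoch 2 in the results table, and the learning rate falls linearly from there until step 3720.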

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bert Score | Bleurt 20 | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:----------:|:---------:|:-------:|
| 3.0683        | 1.0   | 186  | 2.5857          | 0.4573 | 0.1803 | 0.1858 | 0.1858    | 0.8683     | -0.8521   | 15.445  |
| 2.7283        | 2.0   | 372  | 2.4092          | 0.446  | 0.1853 | 0.1969 | 0.1969    | 0.8709     | -0.828    | 15.115  |
| 2.4766        | 3.0   | 558  | 2.3190          | 0.4183 | 0.1834 | 0.1947 | 0.1947    | 0.869      | -0.8673   | 14.425  |
| 2.351         | 4.0   | 744  | 2.2736          | 0.4264 | 0.1843 | 0.1919 | 0.1919    | 0.8693     | -0.8411   | 15.205  |
| 2.287         | 5.0   | 930  | 2.2440          | 0.42   | 0.1924 | 0.1991 | 0.1991    | 0.875      | -0.8358   | 14.305  |
| 2.1426        | 6.0   | 1116 | 2.2100          | 0.4196 | 0.1903 | 0.2027 | 0.2027    | 0.8779     | -0.8189   | 14.38   |
| 2.0381        | 7.0   | 1302 | 2.2171          | 0.459  | 0.2143 | 0.2142 | 0.2142    | 0.8772     | -0.7757   | 14.825  |
| 1.9927        | 8.0   | 1488 | 2.2106          | 0.44   | 0.2073 | 0.2132 | 0.2132    | 0.8795     | -0.7798   | 14.53   |
| 1.9347        | 9.0   | 1674 | 2.1976          | 0.4289 | 0.2062 | 0.2122 | 0.2122    | 0.88       | -0.7774   | 14.14   |
| 1.8733        | 10.0  | 1860 | 2.1987          | 0.4472 | 0.215  | 0.2124 | 0.2124    | 0.8791     | -0.7688   | 14.49   |
| 1.7883        | 11.0  | 2046 | 2.1963          | 0.4375 | 0.2114 | 0.2064 | 0.2064    | 0.8786     | -0.785    | 14.66   |
| 1.8253        | 12.0  | 2232 | 2.2055          | 0.4351 | 0.2073 | 0.2106 | 0.2106    | 0.8803     | -0.7759   | 14.59   |
| 1.7751        | 13.0  | 2418 | 2.2029          | 0.4371 | 0.2125 | 0.2119 | 0.2119    | 0.8796     | -0.7711   | 14.7    |
| 1.7087        | 14.0  | 2604 | 2.2073          | 0.448  | 0.2211 | 0.2176 | 0.2176    | 0.8806     | -0.7492   | 14.695  |
| 1.7034        | 15.0  | 2790 | 2.2150          | 0.4381 | 0.214  | 0.2158 | 0.2158    | 0.8809     | -0.7611   | 14.555  |
| 1.6671        | 16.0  | 2976 | 2.2211          | 0.4388 | 0.2162 | 0.2169 | 0.2169    | 0.8797     | -0.7532   | 14.73   |
| 1.6964        | 17.0  | 3162 | 2.2207          | 0.4316 | 0.2117 | 0.2137 | 0.2137    | 0.8799     | -0.7729   | 14.54   |
| 1.6556        | 18.0  | 3348 | 2.2183          | 0.4379 | 0.2122 | 0.2163 | 0.2163    | 0.8804     | -0.7475   | 14.735  |
| 1.6391        | 19.0  | 3534 | 2.2200          | 0.4332 | 0.2105 | 0.2149 | 0.2149    | 0.8805     | -0.7521   | 14.635  |
| 1.6309        | 20.0  | 3720 | 2.2226          | 0.4351 | 0.2124 | 0.215  | 0.215     | 0.8806     | -0.7502   | 14.645  |
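
The training loss falls throughout the run while the validation loss flattens out, so it can help to check which epoch scored best on validation loss. The sketch below only reads the Validation Loss column of the table. The card does not say which checkpoint was kept or how it was selected, so `best_epoch` is a hypothetical helper and not part of the original training setup.

```python
# Validation loss per epoch, copied from the "Validation Loss" column of the table.
val_loss = [
    2.5857, 2.4092, 2.3190, 2.2736, 2.2440,
    2.2100, 2.2171, 2.2106, 2.1976, 2.1987,
    2.1963, 2.2055, 2.2029, 2.2073, 2.2150,
    2.2211, 2.2207, 2.2183, 2.2200, 2.2226,
]


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest loss; ties go to the earliest epoch."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


epoch, loss = best_epoch(val_loss)
gap = val_loss[-1] - loss  # how much higher the final-epoch loss is

if __name__ == "__main__":
    print(f"lowest validation loss: {loss:.4f} at epoch {epoch}")
    print(f"final-epoch loss is {gap:.4f} higher")
```

Validation loss is only one possible criterion. The other metrics in the table peak at different epochs, so a selection based on Rouge, Bert Score, or Bleurt 20 could pick a different checkpoint.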


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0