---
license: apache-2.0
base_model: google/flan-t5-small
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: t5-summarization-one-shot-better-prompt
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# t5-summarization-one-shot-better-prompt

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unknown dataset.
It achieves the following results on the evaluation set after the final epoch (epoch 20, the last row of the training-results table):
- Validation loss: 2.2431
- ROUGE-1: 39.1164
- ROUGE-2: 19.0784
- ROUGE-L: 20.2856
- ROUGE-Lsum: 20.2856
- BERTScore: 0.8802
- BLEURT-20: -0.7688
- Gen Len (mean length of the generated summaries): 13.545
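
The card does not show how the ROUGE numbers were produced. As a rough illustration, the sketch below computes ROUGE-1, ROUGE-2 and ROUGE-L F1 for a single prediction/reference pair in plain Python. It assumes the reported values are F-measures scaled by 100 (the usual convention in Hugging Face summarization scripts) and uses a simplified lowercase alphanumeric tokenizer without stemming, so it approximates rather than reproduces the `rouge_score` package. ROUGE-Lsum, which computes a summary-level LCS over newline-separated sentences, is omitted; it equals ROUGE-L when a summary contains no line breaks, which is consistent with (but does not prove) the identical ROUGE-L and ROUGE-Lsum values reported here.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase and keep runs of [a-z0-9],
    # similar to rouge_score's default tokenizer but without stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of the n-grams in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, pred_total: int, ref_total: int) -> float:
    # Harmonic mean of precision and recall; 0 when nothing overlaps.
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    pred = ngrams(tokenize(prediction), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Longest common subsequence length, O(len(a) * len(b)) dynamic programme.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    pred, ref = tokenize(prediction), tokenize(reference)
    return f1(lcs_length(pred, ref), len(pred), len(ref))


def rouge_scores(prediction: str, reference: str) -> dict[str, float]:
    # Scaled by 100 (assumption) to match the magnitude of the numbers in this card.
    return {
        "rouge1": 100 * rouge_n(prediction, reference, 1),
        "rouge2": 100 * rouge_n(prediction, reference, 2),
        "rougeL": 100 * rouge_l(prediction, reference),
    }


if __name__ == "__main__":
    print(rouge_scores("the cat sat on the mat", "the cat lay on the mat"))
```

For that example pair, ROUGE-1 and ROUGE-L are about 83.33 (5 of 6 tokens matched in order) and ROUGE-2 is 60.0 (3 of 5 bigrams shared). Corpus-level scores such as those above are typically averages of these per-example values.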

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 7
- eval_batch_size: 7
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
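
With `lr_scheduler_type: linear`, the learning rate decays from 0.0001 toward zero over the run. The sketch below writes that schedule and a single scalar Adam update in plain Python using the listed values. Two inputs are assumptions not stated in the hyperparameter list: zero warmup steps (none is listed), and 3440 total optimizer steps, read off the Step column of the training-results table (20 epochs × 172 steps).

```python
import math


def linear_lr(step: int, base_lr: float = 1e-4, total_steps: int = 3440,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under a linear schedule.

    Assumes zero warmup (none is listed on the card) and total_steps taken
    from the training-results table (20 epochs x 172 steps per epoch).
    """
    if step < warmup_steps:
        # Linear ramp-up; unused when warmup_steps == 0.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay to zero, clamped so it never goes negative.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def adam_step(param: float, grad: float, m: float, v: float, t: int, lr: float,
              beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
    """One scalar Adam update with the betas and epsilon listed on the card."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    for step in (0, 1720, 3440):
        print(step, linear_lr(step))
    print(adam_step(1.0, 0.5, 0.0, 0.0, t=1, lr=linear_lr(0)))
```

Under these assumptions the rate is 0.0001 at step 0, 0.00005 at step 1720 (end of epoch 10) and 0 at step 3440. On the first Adam step the bias-corrected ratio `m_hat / sqrt(v_hat)` reduces to roughly `g / |g|`, so the update size is close to the learning rate regardless of gradient scale.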

### Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | BERTScore | BLEURT-20 | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:---------:|:---------:|:-------:|
| 2.6921 | 1.0 | 172 | 2.4379 | 43.7984 | 17.2952 | 18.5604 | 18.5604 | 0.869 | -0.8739 | 14.84 |
| 2.5119 | 2.0 | 344 | 2.3282 | 41.5219 | 17.4612 | 19.5103 | 19.5103 | 0.8749 | -0.8329 | 13.7 |
| 2.3033 | 3.0 | 516 | 2.2821 | 41.0636 | 18.2347 | 19.8704 | 19.8704 | 0.878 | -0.8268 | 13.75 |
| 2.2139 | 4.0 | 688 | 2.2404 | 39.9679 | 18.8795 | 19.7032 | 19.7032 | 0.8796 | -0.8035 | 13.305 |
| 2.0835 | 5.0 | 860 | 2.2446 | 41.8958 | 18.439 | 19.2982 | 19.2982 | 0.877 | -0.7963 | 14.34 |
| 2.0379 | 6.0 | 1032 | 2.2233 | 40.9703 | 19.7574 | 19.9387 | 19.9387 | 0.8793 | -0.7805 | 13.625 |
| 1.959 | 7.0 | 1204 | 2.2073 | 39.2194 | 18.9553 | 19.7847 | 19.7847 | 0.8787 | -0.8045 | 13.365 |
| 1.9177 | 8.0 | 1376 | 2.2146 | 40.8391 | 19.5219 | 20.2602 | 20.2602 | 0.8781 | -0.7974 | 13.865 |
| 1.8749 | 9.0 | 1548 | 2.2071 | 40.9497 | 19.9867 | 20.5682 | 20.5682 | 0.8808 | -0.7812 | 13.68 |
| 1.8112 | 10.0 | 1720 | 2.2045 | 36.465 | 16.4287 | 19.1978 | 19.1978 | 0.8772 | -0.8384 | 13.295 |
| 1.7475 | 11.0 | 1892 | 2.2210 | 39.4889 | 19.1309 | 19.879 | 19.879 | 0.8785 | -0.8074 | 13.585 |
| 1.7384 | 12.0 | 2064 | 2.2269 | 38.2904 | 18.2873 | 19.4418 | 19.4418 | 0.8789 | -0.7984 | 13.42 |
| 1.6849 | 13.0 | 2236 | 2.2261 | 37.6283 | 17.6979 | 19.584 | 19.584 | 0.878 | -0.7885 | 13.445 |
| 1.6531 | 14.0 | 2408 | 2.2186 | 38.7975 | 19.0939 | 20.7873 | 20.7873 | 0.8806 | -0.783 | 13.445 |
| 1.663 | 15.0 | 2580 | 2.2245 | 38.9159 | 19.153 | 20.5232 | 20.5232 | 0.8811 | -0.7514 | 13.59 |
| 1.6036 | 16.0 | 2752 | 2.2430 | 37.6184 | 17.6773 | 19.2693 | 19.2693 | 0.8771 | -0.7992 | 13.6 |
| 1.6333 | 17.0 | 2924 | 2.2418 | 38.1301 | 18.4061 | 20.1355 | 20.1355 | 0.879 | -0.7845 | 13.49 |
| 1.6322 | 18.0 | 3096 | 2.2421 | 38.0746 | 18.2039 | 19.7404 | 19.7404 | 0.8789 | -0.7892 | 13.41 |
| 1.5982 | 19.0 | 3268 | 2.2411 | 39.1375 | 19.1696 | 20.2695 | 20.2695 | 0.8802 | -0.7713 | 13.465 |
| 1.593 | 20.0 | 3440 | 2.2431 | 39.1164 | 19.0784 | 20.2856 | 20.2856 | 0.8802 | -0.7688 | 13.545 |
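
The Step column grows by 172 per epoch and reaches 3440 after 20 epochs. Together with `train_batch_size: 7`, that bounds the size of the otherwise undocumented training set, and the Validation Loss column shows which epoch scored best on that metric. The sketch below does both calculations, assuming a single device, no gradient accumulation, and the Trainer's default of keeping the last partial batch (`dataloader_drop_last=False`); the validation losses are copied from the table.

```python
# Validation loss per epoch (epochs 1..20), copied from the training-results table.
VAL_LOSS = [2.4379, 2.3282, 2.2821, 2.2404, 2.2446, 2.2233, 2.2073, 2.2146,
            2.2071, 2.2045, 2.2210, 2.2269, 2.2261, 2.2186, 2.2245, 2.2430,
            2.2418, 2.2421, 2.2411, 2.2431]

STEPS_PER_EPOCH = 172   # Step value at epoch 1.0
TRAIN_BATCH_SIZE = 7    # train_batch_size from the hyperparameters


def train_size_range(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Training-set sizes n satisfying ceil(n / batch_size) == steps_per_epoch.

    Assumes one device, no gradient accumulation, and that the last partial
    batch is kept rather than dropped.
    """
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


def best_epoch(losses: list[float]) -> int:
    """1-based epoch with the lowest validation loss."""
    return min(range(len(losses)), key=losses.__getitem__) + 1


if __name__ == "__main__":
    low, high = train_size_range(STEPS_PER_EPOCH, TRAIN_BATCH_SIZE)
    print(f"training examples: {low}..{high}")
    print(f"total steps: {STEPS_PER_EPOCH * len(VAL_LOSS)}")
    print(f"lowest validation loss at epoch {best_epoch(VAL_LOSS)}")
```

Under those assumptions the training set has between 1198 and 1204 examples. The lowest validation loss (2.2045) occurs at epoch 10, whereas the results reported at the top of the card correspond to epoch 20.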


### Framework versions

- Transformers 4.35.2
- PyTorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0