MikaSie/LongT5_no_extraction_V1


	---
	license: apache-2.0
	base_model: google/long-t5-tglobal-base
	tags:
	- generated_from_trainer
	datasets:
	- eur-lex-sum
	model-index:
	- name: LongT5_no_extraction_V1
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# LongT5_no_extraction_V1

	This model is a fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the eur-lex-sum dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.3639
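
The card does not yet document usage, so the following is only a minimal sketch of how a checkpoint like this is typically loaded for summarization with the Transformers Auto classes. It assumes the repository ships tokenizer files alongside the weights; the `summarize` helper, the input and output length limits, and the beam count are illustrative assumptions, not settings taken from this card.

```python
MODEL_ID = "MikaSie/LongT5_no_extraction_V1"


def summarize(text, max_input_tokens=4096, max_new_tokens=512):
    """Summarize one document with the fine-tuned checkpoint.

    max_input_tokens and max_new_tokens are illustrative assumptions,
    not values documented in this card.
    """
    # Imported lazily so the heavy dependencies are only loaded
    # when a summary is actually requested.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate overly long inputs instead of failing on them.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search with 4 beams is an assumed decoding choice.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(document_text)` on a long legal text should return a single summary string. The model and tokenizer are reloaded on every call here, so for batch use it would make sense to load them once and reuse them.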

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 16
	- total_eval_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 40
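
The totals in the list above follow from the per-device values, and the warmup ratio translates into a concrete number of warmup steps. The sketch below reproduces that arithmetic. It takes 2720 as the total number of optimizer steps (the final step in the training results table), and `linear_lr` is a hypothetical helper that mirrors the usual linear-warmup, linear-decay shape rather than code copied from the Trainer.

```python
import math

# Values from the hyperparameter list above.
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
num_devices = 4
gradient_accumulation_steps = 4

# Effective batch sizes: one optimizer step sees 1 * 4 * 4 = 16 examples.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def linear_lr(step, total_steps, peak_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from the peak down to 0 at the last step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(total_train_batch_size, total_eval_batch_size)
print(linear_lr(272, 2720), linear_lr(2720, 2720))
```

Under these assumptions the learning rate peaks at step 272 (10% of 2720) and reaches zero at the final step.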

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-------:|:----:|:---------------:|
	| 3.2571 | 0.9963 | 68 | 1.8571 |
	| 2.6516 | 1.9927 | 136 | 1.7238 |
	| 2.2687 | 2.9890 | 204 | 1.6153 |
	| 2.0466 | 4.0 | 273 | 1.5414 |
	| 1.9659 | 4.9963 | 341 | 1.4955 |
	| 1.8813 | 5.9927 | 409 | 1.4752 |
	| 1.8277 | 6.9890 | 477 | 1.4571 |
	| 1.7626 | 8.0 | 546 | 1.4437 |
	| 1.7528 | 8.9963 | 614 | 1.4315 |
	| 1.7249 | 9.9927 | 682 | 1.4229 |
	| 1.6981 | 10.9890 | 750 | 1.4126 |
	| 1.6559 | 12.0 | 819 | 1.4061 |
	| 1.6599 | 12.9963 | 887 | 1.3983 |
	| 1.6465 | 13.9927 | 955 | 1.3994 |
	| 1.6282 | 14.9890 | 1023 | 1.3923 |
	| 1.5906 | 16.0 | 1092 | 1.3873 |
	| 1.6035 | 16.9963 | 1160 | 1.3878 |
	| 1.5909 | 17.9927 | 1228 | 1.3851 |
	| 1.5802 | 18.9890 | 1296 | 1.3799 |
	| 1.5481 | 20.0 | 1365 | 1.3860 |
	| 1.5607 | 20.9963 | 1433 | 1.3745 |
	| 1.5517 | 21.9927 | 1501 | 1.3736 |
	| 1.5436 | 22.9890 | 1569 | 1.3735 |
	| 1.5126 | 24.0 | 1638 | 1.3728 |
	| 1.5289 | 24.9963 | 1706 | 1.3739 |
	| 1.5234 | 25.9927 | 1774 | 1.3706 |
	| 1.5179 | 26.9890 | 1842 | 1.3671 |
	| 1.4908 | 28.0 | 1911 | 1.3680 |
	| 1.5057 | 28.9963 | 1979 | 1.3688 |
	| 1.5026 | 29.9927 | 2047 | 1.3649 |
	| 1.498 | 30.9890 | 2115 | 1.3662 |
	| 1.4866 | 32.0 | 2184 | 1.3655 |
	| 1.493 | 32.9963 | 2252 | 1.3644 |
	| 1.4877 | 33.9927 | 2320 | 1.3669 |
	| 1.4858 | 34.9890 | 2388 | 1.3650 |
	| 1.465 | 36.0 | 2457 | 1.3649 |
	| 1.4822 | 36.9963 | 2525 | 1.3647 |
	| 1.4797 | 37.9927 | 2593 | 1.3644 |
	| 1.4803 | 38.9890 | 2661 | 1.3640 |
	| 1.4548 | 39.8535 | 2720 | 1.3639 |
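
To make the late-training plateau in the table above easier to see, the short sketch below takes the (step, validation loss) pairs from roughly epoch 30 onward, copied from the table, and reports the best checkpoint among them and the total improvement over that span. The variable names are illustrative.

```python
# (step, validation_loss) pairs copied from the last rows of the table above.
late_checkpoints = [
    (2047, 1.3649),
    (2115, 1.3662),
    (2184, 1.3655),
    (2252, 1.3644),
    (2320, 1.3669),
    (2388, 1.3650),
    (2457, 1.3649),
    (2525, 1.3647),
    (2593, 1.3644),
    (2661, 1.3640),
    (2720, 1.3639),
]

# Checkpoint with the lowest validation loss in this window.
best_step, best_loss = min(late_checkpoints, key=lambda row: row[1])

# Improvement between the first and last checkpoint in the window.
gain = round(late_checkpoints[0][1] - late_checkpoints[-1][1], 4)

print(best_step, best_loss, gain)
```

Among these rows the final checkpoint (step 2720) has the lowest validation loss, 1.3639, which matches the evaluation loss reported at the top of the card. The last ten or so epochs lower the validation loss by only about 0.001.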


	### Framework versions

	- Transformers 4.40.1
	- PyTorch 2.2.1+cu121
	- Datasets 2.17.1
	- Tokenizers 0.19.1
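
When trying to reproduce the run, it can help to compare the local environment with the versions listed above. The sketch below does this with the standard library only. `compare_with_card` is a hypothetical helper, and `torch` is used as the distribution name for PyTorch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the card; "torch" is the distribution name for PyTorch.
CARD_VERSIONS = {
    "transformers": "4.40.1",
    "torch": "2.2.1+cu121",
    "datasets": "2.17.1",
    "tokenizers": "0.19.1",
}


def compare_with_card():
    """Map each package to (installed_version_or_None, version_in_card)."""
    report = {}
    for package, expected in CARD_VERSIONS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            installed = None
        report[package] = (installed, expected)
    return report


for name, (installed, expected) in compare_with_card().items():
    print(f"{name}: installed={installed}, card={expected}")
```

A mismatch does not necessarily prevent the model from loading; the listing only records the environment in which the checkpoint was trained.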