---
base_model: google/flan-t5-base
library_name: peft
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: results
  results: []
pipeline_tag: text-generation
---

# results

This model is a PEFT fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on an unspecified dataset (no dataset name was recorded during training).
It achieves the following result on the evaluation set:
- Loss: 1.9615
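Assuming this loss is the mean per-token cross-entropy in nats (the default for Transformers sequence-to-sequence models such as T5), it can be converted to perplexity as `exp(loss)`. A minimal sketch:

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity."""
    return math.exp(mean_cross_entropy)


# Final evaluation loss reported above.
eval_loss = 1.9615
print(f"perplexity: {perplexity(eval_loss):.2f}")  # perplexity: 7.11
```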

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 6
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 23
- training_steps: 2373
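The scheduler settings above describe a linear warmup over 23 steps followed by a linear decay to zero at step 2373. The sketch below re-implements that shape for illustration only; it follows the same formula as `transformers.get_linear_schedule_with_warmup` but is not the Trainer's own code. It also shows how `total_train_batch_size` is derived (per-device batch size times gradient accumulation steps times number of devices). Since 6 × 2 = 12, it assumes a single device, which the card does not state explicitly.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 6             # per device
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 23
TRAINING_STEPS = 2373
NUM_DEVICES = 1                  # assumption: not recorded in the card


def effective_batch_size(per_device: int, accumulation: int, devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * accumulation * devices


def linear_warmup_lr(step: int,
                     base_lr: float = LEARNING_RATE,
                     warmup: int = WARMUP_STEPS,
                     total: int = TRAINING_STEPS) -> float:
    """Learning rate at `step`: ramps linearly from 0 to base_lr, then decays linearly to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))  # 12
for step in (0, 11, 23, 1198, 2373):
    print(step, linear_warmup_lr(step))
```

Under this formula the learning rate peaks at 0.001 at step 23 and has fallen to half that value by step 1198, midway through the decay phase.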

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.0035 | 0.42 | 50 | 2.5115 |
| 2.7023 | 0.84 | 100 | 2.3978 |
| 2.6198 | 1.26 | 150 | 2.3258 |
| 2.5523 | 1.68 | 200 | 2.2768 |
| 2.4817 | 2.1 | 250 | 2.2360 |
| 2.4591 | 2.52 | 300 | 2.2041 |
| 2.4 | 2.94 | 350 | 2.1844 |
| 2.3709 | 3.36 | 400 | 2.1547 |
| 2.3591 | 3.78 | 450 | 2.1366 |
| 2.3232 | 4.2 | 500 | 2.1210 |
| 2.3016 | 4.62 | 550 | 2.1119 |
| 2.3041 | 5.04 | 600 | 2.0993 |
| 2.2646 | 5.46 | 650 | 2.0908 |
| 2.247 | 5.88 | 700 | 2.0794 |
| 2.1935 | 6.3 | 750 | 2.0612 |
| 2.2334 | 6.72 | 800 | 2.0573 |
| 2.2054 | 7.14 | 850 | 2.0498 |
| 2.212 | 7.56 | 900 | 2.0460 |
| 2.1687 | 7.98 | 950 | 2.0388 |
| 2.1454 | 8.4 | 1000 | 2.0347 |
| 2.1344 | 8.82 | 1050 | 2.0243 |
| 2.1522 | 9.24 | 1100 | 2.0155 |
| 2.1051 | 9.66 | 1150 | 2.0144 |
| 2.1435 | 10.08 | 1200 | 2.0152 |
| 2.1251 | 10.5 | 1250 | 2.0133 |
| 2.0664 | 10.92 | 1300 | 2.0000 |
| 2.0656 | 11.34 | 1350 | 2.0002 |
| 2.1186 | 11.76 | 1400 | 1.9933 |
| 2.0719 | 12.18 | 1450 | 1.9906 |
| 2.0389 | 12.61 | 1500 | 1.9913 |
| 2.0655 | 13.03 | 1550 | 1.9874 |
| 2.0371 | 13.45 | 1600 | 1.9824 |
| 2.0581 | 13.87 | 1650 | 1.9789 |
| 2.0068 | 14.29 | 1700 | 1.9801 |
| 2.0536 | 14.71 | 1750 | 1.9750 |
| 2.0311 | 15.13 | 1800 | 1.9729 |
| 2.0292 | 15.55 | 1850 | 1.9716 |
| 1.9955 | 15.97 | 1900 | 1.9714 |
| 2.0056 | 16.39 | 1950 | 1.9671 |
| 2.0391 | 16.81 | 2000 | 1.9642 |
| 2.0059 | 17.23 | 2050 | 1.9687 |
| 2.0155 | 17.65 | 2100 | 1.9644 |
| 1.9745 | 18.07 | 2150 | 1.9617 |
| 1.9929 | 18.49 | 2200 | 1.9621 |
| 1.9978 | 18.91 | 2250 | 1.9639 |
| 2.023 | 19.33 | 2300 | 1.9617 |
| 1.992 | 19.75 | 2350 | 1.9615 |
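Two things can be read off the training results. The lowest validation loss, 1.9615, is at the last logged step (2350) and matches the evaluation loss reported at the top of this card. The Epoch column is also consistent with about 119 optimizer steps per epoch (for example, 1500 / 119 ≈ 12.61), which would place the 2373 training steps at just under 20 epochs; that figure is inferred from the table, not stated in the card. The sketch below checks both points against an excerpt of the rows; `STEPS_PER_EPOCH` is that inferred assumption.

```python
# Excerpt of the training results: (training_loss, epoch, step, validation_loss).
ROWS = [
    (2.0391, 16.81, 2000, 1.9642),
    (2.0059, 17.23, 2050, 1.9687),
    (2.0155, 17.65, 2100, 1.9644),
    (1.9745, 18.07, 2150, 1.9617),
    (1.9929, 18.49, 2200, 1.9621),
    (1.9978, 18.91, 2250, 1.9639),
    (2.023, 19.33, 2300, 1.9617),
    (1.992, 19.75, 2350, 1.9615),
]

# Assumption inferred from the Epoch column; not stated in the card.
STEPS_PER_EPOCH = 119


def epoch_at(step: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
    """Fractional epoch for an optimizer step, rounded to two decimals like the table."""
    return round(step / steps_per_epoch, 2)


def best_row(rows):
    """Row with the lowest validation loss (the earliest such row on ties)."""
    return min(rows, key=lambda row: row[3])


# Every logged epoch in the excerpt matches step / 119.
assert all(epoch_at(step) == epoch for _, epoch, step, _ in ROWS)

print(best_row(ROWS))   # (1.992, 19.75, 2350, 1.9615)
print(epoch_at(2373))   # 19.94
```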


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.3.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2
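To compare a local environment with the versions listed above, a small standard-library sketch using `importlib.metadata` could look like this. It assumes the PyPI distribution names `peft`, `transformers`, `torch` (for PyTorch), `datasets`, and `tokenizers`. Because it compares version strings exactly, a PyTorch build without the `+cu121` suffix is also reported as a mismatch.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.8.2",
    "transformers": "4.38.1",
    "torch": "2.3.0+cu121",
    "datasets": "2.17.0",
    "tokenizers": "0.15.2",
}


def version_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package whose version differs.

    `installed` is None when the package is not installed.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{package}: card lists {wanted}, found {installed}")
```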
