shirzady1934

	---
	license: apache-2.0
	library_name: peft
	tags:
	- generated_from_trainer
	base_model: distilgpt2
	model-index:
	- name: distilgpt-monolinugal
	results: []
	---


	# distilgpt-monolinugal

	This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 3.4876
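
For language models, the evaluation loss is often easier to interpret as perplexity. The snippet below is a minimal sketch of that conversion. It assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 3.4876

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.1f}")
```

Under that assumption the model has a perplexity of roughly 32.7 on the evaluation set.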

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0005
	- train_batch_size: 12
	- eval_batch_size: 12
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 96
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 8
	- mixed_precision_training: Native AMP
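
The hyperparameters above imply two pieces of arithmetic: how the total batch size of 96 arises, and how the learning rate evolves under a cosine schedule with 100 warmup steps. The sketch below illustrates both. The function name `lr_at_step`, the single-device setting, and the value of `total_steps` are assumptions made for illustration; the schedule shape (linear warmup, then half a cosine cycle down to zero) follows the common convention and is not confirmed by this card.

```python
import math

# Values taken from the hyperparameter list.
per_device_batch = 12
grad_accum_steps = 8
base_lr = 5e-4
warmup_steps = 100

# Assumption: a single device, which is consistent with 12 * 8 = 96.
num_devices = 1
effective_batch = per_device_batch * grad_accum_steps * num_devices


def lr_at_step(step, total_steps, base_lr=base_lr, warmup_steps=warmup_steps):
    """Linear warmup to base_lr, then cosine decay to zero (hypothetical helper)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: the exact number of optimizer steps is not stated in the card;
# 10_000 is a round placeholder close to the last logged step.
total_steps = 10_000

print("effective batch size:", effective_batch)
for step in (0, 50, 100, 5_000, 10_000):
    print(step, lr_at_step(step, total_steps))
```

The learning rate peaks at 0.0005 once warmup ends at step 100 and decays smoothly afterwards, which matches the flattening validation loss late in training.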

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 3.3098 | 0.16 | 200 | 3.5905 |
	| 3.2847 | 0.32 | 400 | 3.5644 |
	| 3.2612 | 0.48 | 600 | 3.5504 |
	| 3.2636 | 0.64 | 800 | 3.5384 |
	| 3.2481 | 0.8 | 1000 | 3.5301 |
	| 3.2393 | 0.96 | 1200 | 3.5233 |
	| 3.2381 | 1.12 | 1400 | 3.5184 |
	| 3.2317 | 1.28 | 1600 | 3.5168 |
	| 3.2244 | 1.44 | 1800 | 3.5123 |
	| 3.2258 | 1.6 | 2000 | 3.5117 |
	| 3.2238 | 1.76 | 2200 | 3.5058 |
	| 3.2376 | 1.92 | 2400 | 3.5058 |
	| 3.212 | 2.08 | 2600 | 3.5044 |
	| 3.231 | 2.24 | 2800 | 3.5019 |
	| 3.2044 | 2.4 | 3000 | 3.5003 |
	| 3.2107 | 2.57 | 3200 | 3.5002 |
	| 3.2096 | 2.73 | 3400 | 3.4996 |
	| 3.215 | 2.89 | 3600 | 3.4963 |
	| 3.2092 | 3.05 | 3800 | 3.4979 |
	| 3.2034 | 3.21 | 4000 | 3.4964 |
	| 3.1992 | 3.37 | 4200 | 3.4971 |
	| 3.1975 | 3.53 | 4400 | 3.4941 |
	| 3.222 | 3.69 | 4600 | 3.4932 |
	| 3.2104 | 3.85 | 4800 | 3.4927 |
	| 3.199 | 4.01 | 5000 | 3.4918 |
	| 3.2033 | 4.17 | 5200 | 3.4927 |
	| 3.201 | 4.33 | 5400 | 3.4924 |
	| 3.1947 | 4.49 | 5600 | 3.4931 |
	| 3.2172 | 4.65 | 5800 | 3.4907 |
	| 3.201 | 4.81 | 6000 | 3.4908 |
	| 3.2089 | 4.97 | 6200 | 3.4892 |
	| 3.206 | 5.13 | 6400 | 3.4896 |
	| 3.2074 | 5.29 | 6600 | 3.4884 |
	| 3.2046 | 5.45 | 6800 | 3.4891 |
	| 3.1899 | 5.61 | 7000 | 3.4888 |
	| 3.196 | 5.77 | 7200 | 3.4891 |
	| 3.1946 | 5.93 | 7400 | 3.4880 |
	| 3.1951 | 6.09 | 7600 | 3.4887 |
	| 3.1998 | 6.25 | 7800 | 3.4878 |
	| 3.1775 | 6.41 | 8000 | 3.4880 |
	| 3.1947 | 6.57 | 8200 | 3.4880 |
	| 3.1876 | 6.73 | 8400 | 3.4876 |
	| 3.1984 | 6.89 | 8600 | 3.4878 |
	| 3.1927 | 7.05 | 8800 | 3.4875 |
	| 3.2006 | 7.21 | 9000 | 3.4875 |
	| 3.2042 | 7.37 | 9200 | 3.4875 |
	| 3.1856 | 7.54 | 9400 | 3.4877 |
	| 3.1952 | 7.7 | 9600 | 3.4877 |
	| 3.1981 | 7.86 | 9800 | 3.4876 |
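
The Epoch and Step columns together give a rough idea of the training set size, which the card does not otherwise report. The sketch below is a back-of-the-envelope estimate only. It assumes each logged step is one optimizer step over 96 sequences (the total train batch size) and that the epoch values are rounded to two decimals, so the result is approximate.

```python
# A few (step, epoch) pairs copied from the table above.
rows = [(200, 0.16), (5000, 4.01), (9800, 7.86)]

# Total train batch size from the hyperparameter list.
effective_batch = 96

# Steps per epoch implied by each row; later rows suffer less from
# the two-decimal rounding of the epoch column.
steps_per_epoch = [step / epoch for step, epoch in rows]
best_estimate = steps_per_epoch[-1]

# Assumption: one step consumes exactly `effective_batch` sequences.
approx_train_sequences = best_estimate * effective_batch

print("steps per epoch (per row):", [round(s, 1) for s in steps_per_epoch])
print("approx. training sequences:", round(approx_train_sequences))
```

All three rows agree on roughly 1,250 steps per epoch, which corresponds to about 120,000 training sequences under the stated assumptions.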


	### Framework versions

	- PEFT 0.7.1
	- Transformers 4.36.2
	- PyTorch 1.13.0+cu116
	- Datasets 2.16.0
	- Tokenizers 0.15.0