
gpt2-wikitext2

This model is a fine-tuned version of gpt2, most likely on the WikiText-2 dataset (the auto-generated card did not record the dataset, but the model name suggests it). It achieves the following results on the evaluation set (the corresponding perplexity is worked out below):

  • Loss: 1.4411
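Since this is a causal language model evaluated with a cross-entropy objective, the reported loss corresponds to a perplexity of exp(1.4411) ≈ 4.23. A quick check in Python:

```python
import math

eval_loss = 1.4411  # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # perplexity ≈ 4.23
```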

Model description

More information needed

Intended uses & limitations

More information needed
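Until the intended-use notes are filled in, here is a minimal usage sketch with the Transformers library; the repo id soikit/gpt2-wikitext2 is taken from this card, while the prompt and generation settings are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("soikit/gpt2-wikitext2")
model = AutoModelForCausalLM.from_pretrained("soikit/gpt2-wikitext2")

# Illustrative prompt; sampling settings are arbitrary choices.
inputs = tokenizer("The history of Wikipedia began", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```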

Training and evaluation data

More information needed
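The model name suggests WikiText-2, though the card does not record the dataset. Assuming the standard WikiText-2 corpus on the Hub, it could be loaded as follows (the dataset choice is an assumption, not confirmed by the card):

```python
from datasets import load_dataset

# Assumption: "wikitext2" in the model name refers to the public
# WikiText-2 corpus; the card itself does not confirm this.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)  # train/validation/test splits of raw text lines
```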

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 30
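For reference, a minimal TrainingArguments sketch matching the hyperparameters above; the output_dir is a placeholder, and eval_strategy="epoch" is inferred from the per-epoch validation losses in the results table, not stated on the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-wikitext2",  # placeholder, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",          # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=30,
    eval_strategy="epoch",        # inferred from the results table
)
```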

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 1.0   | 136  | 2.6125          |
| No log        | 2.0   | 272  | 2.3027          |
| No log        | 3.0   | 408  | 2.1283          |
| 2.7351        | 4.0   | 544  | 1.9970          |
| 2.7351        | 5.0   | 680  | 1.8977          |
| 2.7351        | 6.0   | 816  | 1.8174          |
| 2.7351        | 7.0   | 952  | 1.7532          |
| 1.9296        | 8.0   | 1088 | 1.6992          |
| 1.9296        | 9.0   | 1224 | 1.6594          |
| 1.9296        | 10.0  | 1360 | 1.6301          |
| 1.9296        | 11.0  | 1496 | 1.5985          |
| 1.6853        | 12.0  | 1632 | 1.5784          |
| 1.6853        | 13.0  | 1768 | 1.5560          |
| 1.6853        | 14.0  | 1904 | 1.5433          |
| 1.5639        | 15.0  | 2040 | 1.5262          |
| 1.5639        | 16.0  | 2176 | 1.5109          |
| 1.5639        | 17.0  | 2312 | 1.5018          |
| 1.5639        | 18.0  | 2448 | 1.4957          |
| 1.4932        | 19.0  | 2584 | 1.4823          |
| 1.4932        | 20.0  | 2720 | 1.4741          |
| 1.4932        | 21.0  | 2856 | 1.4681          |
| 1.4932        | 22.0  | 2992 | 1.4618          |
| 1.448         | 23.0  | 3128 | 1.4559          |
| 1.448         | 24.0  | 3264 | 1.4543          |
| 1.448         | 25.0  | 3400 | 1.4487          |
| 1.4177        | 26.0  | 3536 | 1.4463          |
| 1.4177        | 27.0  | 3672 | 1.4435          |
| 1.4177        | 28.0  | 3808 | 1.4429          |
| 1.4177        | 29.0  | 3944 | 1.4424          |
| 1.4044        | 30.0  | 4080 | 1.4411          |

Framework versions

  • Transformers 4.46.1
  • PyTorch 2.3.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.1

Model size

  • 124M parameters (F32, stored as safetensors)
