This model is a fine-tuned version of Toflamus/GPT-2_para3M on an unknown dataset. It achieves the following results on the training set:
- global_step: 4060
- train_loss: 6.1231
- train_runtime: 1435.05 s
- train_samples_per_second: 181.185
- train_steps_per_second: 2.829
- total_flos: 96669633527808.0
- epoch: 5.0
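The card does not state the repository id of this fine-tuned checkpoint, so the snippet below is only an illustrative sketch of loading a GPT-2-style causal LM with transformers; the model id used here is a placeholder, not the actual checkpoint name.

```python
# Illustrative only: loading a GPT-2-style causal LM and generating text.
# NOTE: "your-username/gpt2-para3m-finetuned" is a placeholder; replace it with
# the actual repository id of this fine-tuned checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/gpt2-para3m-finetuned"  # placeholder, not the real id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```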
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 5
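
The block below is a minimal sketch of how the values above map onto transformers TrainingArguments; the output directory is a placeholder, and the model and dataset passed to the Trainer are not specified in this card.

```python
# Minimal sketch mapping the hyperparameters above onto TrainingArguments.
# Only the listed values come from this card; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-para3m-finetuned",  # placeholder output directory
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,       # effective train batch size: 8 * 8 = 64
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=5,
    adam_beta1=0.9,                      # Adam settings listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```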
Training results
| Step | Training Loss |
|------|---------------|
| 100  | 7.737900      |
| 200  | 7.066700      |
| 300  | 6.840200      |
| 400  | 6.686600      |
| 500  | 6.607700      |
| 600  | 6.516500      |
| 700  | 6.449800      |
| 800  | 6.360400      |
| 900  | 6.321700      |
| 1000 | 6.252700      |
| 1100 | 6.223500      |
| 1200 | 6.194700      |
| 1300 | 6.131500      |
| 1400 | 6.113400      |
| 1500 | 6.106500      |
| 1600 | 6.044100      |
| 1700 | 6.024400      |
| 1800 | 6.008500      |
| 1900 | 6.006600      |
| 2000 | 5.959900      |
| 2100 | 5.931100      |
| 2200 | 5.925300      |
| 2300 | 5.933500      |
| 2400 | 5.921900      |
| 2500 | 5.913400      |
| 2600 | 5.898100      |
| 2700 | 5.874700      |
| 2800 | 5.869100      |
| 2900 | 5.851200      |
| 3000 | 5.853900      |
| 3100 | 5.870100      |
| 3200 | 5.868100      |
| 3300 | 5.837000      |
| 3400 | 5.845300      |
| 3500 | 5.828800      |
| 3600 | 5.847400      |
| 3700 | 5.858600      |
| 3800 | 5.853200      |
| 3900 | 5.836600      |
| 4000 | 5.849100      |
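
To visualize the convergence of the run, the short matplotlib sketch below plots the step/loss pairs copied from the table above.

```python
# Plot the training-loss curve using the values from the table above.
import matplotlib.pyplot as plt

steps = list(range(100, 4001, 100))
losses = [7.7379, 7.0667, 6.8402, 6.6866, 6.6077, 6.5165, 6.4498, 6.3604,
          6.3217, 6.2527, 6.2235, 6.1947, 6.1315, 6.1134, 6.1065, 6.0441,
          6.0244, 6.0085, 6.0066, 5.9599, 5.9311, 5.9253, 5.9335, 5.9219,
          5.9134, 5.8981, 5.8747, 5.8691, 5.8512, 5.8539, 5.8701, 5.8681,
          5.8370, 5.8453, 5.8288, 5.8474, 5.8586, 5.8532, 5.8366, 5.8491]

plt.plot(steps, losses, marker="o", markersize=3)
plt.xlabel("Step")
plt.ylabel("Training loss")
plt.title("Fine-tuning loss for Toflamus/GPT-2_para3M")
plt.show()
```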
Framework versions
- Transformers 4.32.0
- Pytorch 2.0.1+cu117
- Datasets 2.14.4
- Tokenizers 0.13.2