---
license: mit
base_model: Toflamus/GPT-2_para3M
tags:
  - generated_from_trainer
model-index:
  - name: Output
    results: []
---

# Output

This model is a fine-tuned version of [Toflamus/GPT-2_para3M](https://huggingface.co/Toflamus/GPT-2_para3M) on an unknown dataset. It achieves the following final training results:

- Training loss: 6.1231
- Global steps: 4060 (5.0 epochs)
- Train runtime: 1435.05 s (181.19 samples/s, 2.83 steps/s)
- Total FLOPs: 96,669,633,527,808
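
A minimal usage sketch: the repo id `Toflamus/Finetuned3` is an assumption inferred from this card's path, not confirmed by the card itself, so substitute the actual checkpoint id if it differs. The model loads with the standard `transformers` text-generation pipeline:

```python
# Hedged example: "Toflamus/Finetuned3" is an assumed repo id taken from
# this card's path; replace it with the actual checkpoint if it differs.
from transformers import pipeline

generator = pipeline("text-generation", model="Toflamus/Finetuned3")

# Sample a short continuation from the fine-tuned model.
result = generator("Once upon a time", max_new_tokens=50, do_sample=True)
print(result[0]["generated_text"])
```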

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):

- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 5
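
As a rough reconstruction, not the author's actual training script, the settings above map onto `transformers.TrainingArguments` roughly as follows; the `output_dir` is a placeholder taken from this card's model-index name:

```python
# A hedged sketch mapping the card's hyperparameters onto TrainingArguments.
# output_dir is a hypothetical placeholder; the dataset and Trainer setup
# are not described by the card and are omitted here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Output",
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=42,
    gradient_accumulation_steps=8,   # effective total batch size: 8 * 8 = 64
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=5,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 match the Trainer's
    # default optimizer settings, so no explicit arguments are needed.
)
```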

### Training results

| Step | Training Loss |
|:----:|:-------------:|
| 100  | 7.737900      |
| 200  | 7.066700      |
| 300  | 6.840200      |
| 400  | 6.686600      |
| 500  | 6.607700      |
| 600  | 6.516500      |
| 700  | 6.449800      |
| 800  | 6.360400      |
| 900  | 6.321700      |
| 1000 | 6.252700      |
| 1100 | 6.223500      |
| 1200 | 6.194700      |
| 1300 | 6.131500      |
| 1400 | 6.113400      |
| 1500 | 6.106500      |
| 1600 | 6.044100      |
| 1700 | 6.024400      |
| 1800 | 6.008500      |
| 1900 | 6.006600      |
| 2000 | 5.959900      |
| 2100 | 5.931100      |
| 2200 | 5.925300      |
| 2300 | 5.933500      |
| 2400 | 5.921900      |
| 2500 | 5.913400      |
| 2600 | 5.898100      |
| 2700 | 5.874700      |
| 2800 | 5.869100      |
| 2900 | 5.851200      |
| 3000 | 5.853900      |
| 3100 | 5.870100      |
| 3200 | 5.868100      |
| 3300 | 5.837000      |
| 3400 | 5.845300      |
| 3500 | 5.828800      |
| 3600 | 5.847400      |
| 3700 | 5.858600      |
| 3800 | 5.853200      |
| 3900 | 5.836600      |
| 4000 | 5.849100      |

### Framework versions

- Transformers 4.32.0
- Pytorch 2.0.1+cu117
- Datasets 2.14.4
- Tokenizers 0.13.2