---
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
  - trl
  - reward-trainer
  - generated_from_trainer
datasets:
  - hdfs_rlhf_log_summary_dataset
metrics:
  - accuracy
model-index:
  - name: log_sage_reward_model
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: hdfs_rlhf_log_summary_dataset
          type: hdfs_rlhf_log_summary_dataset
          config: default
          split: None
          args: default
        metrics:
          - name: Accuracy
            type: accuracy
            value: 1
---

# log_sage_reward_model

This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the hdfs_rlhf_log_summary_dataset dataset. It achieves the following results on the evaluation set:

- Loss: 0.4242
- Accuracy: 1.0

## Model description

log_sage_reward_model is a reward model: a distilbert/distilbert-base-uncased sequence classifier with a single-logit head, fine-tuned with TRL's `RewardTrainer` (per the tags above) so that preferred HDFS log summaries receive higher scores than rejected ones.
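A minimal sketch of querying it, assuming the model lives at the hub id `IrwinD/log_sage_reward_model` (inferred from this card; substitute the actual path) and carries the standard single-logit classification head:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "IrwinD/log_sage_reward_model"  # assumption: hub id inferred from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

summary = "DataNode heartbeat restored; replication queue drained."  # hypothetical summary
inputs = tokenizer(summary, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.item()  # single logit -> scalar reward
print(f"reward: {score:.4f}")
```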

## Intended uses & limitations

The model is intended to score candidate HDFS log summaries, either as the reward signal in RLHF-style training or for best-of-n selection at inference time; it is not a summarizer itself. Note that the reported 1.0 accuracy comes from a very small evaluation set (the step counts in the results table imply only a few batches per epoch), so the score is best treated as a relative ranking signal rather than evidence of broad generalization.
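For instance, best-of-n selection reduces to an argmax of the reward over candidates. A sketch under the same hub-id assumption as above, with made-up candidate summaries:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "IrwinD/log_sage_reward_model"  # assumption, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

def reward(text: str) -> float:
    # Higher score = summary preferred by the reward model.
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**batch).logits.item()

candidates = [  # hypothetical outputs from a summarization policy
    "12 blocks under-replicated after datanode-3 went offline at 14:07.",
    "Logs look mostly fine.",
]
print(max(candidates, key=reward))  # keep the highest-scoring summary
```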

## Training and evaluation data

Training used the hdfs_rlhf_log_summary_dataset dataset; the evaluation split is not recorded in the card metadata (`split: None`), and the dataset's schema is not documented here.
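Reward training in TRL consumes preference pairs. The field names and contents below are assumptions, modeled on the chosen/rejected layout that `RewardTrainer` versions from this era expect after tokenization:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

# Hypothetical record: two candidate summaries of the same HDFS log window,
# one labeled as preferred.
record = {
    "chosen": "NameNode restarted at 10:02; all DataNodes re-registered within 30s.",
    "rejected": "Something happened in the logs.",
}

def tokenize_pair(row):
    # RewardTrainer expects these four pre-tokenized columns.
    chosen = tokenizer(row["chosen"], truncation=True)
    rejected = tokenizer(row["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

print(tokenize_pair(record)["input_ids_chosen"][:5])
```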

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `RewardTrainer` reconstruction follows the list):

- learning_rate: 1.41e-05
- train_batch_size: 4
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
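A sketch of how these values map onto TRL's `RewardTrainer` (named in the card's tags). Only the hyperparameters are taken from this card; the dataset path, split names, and the `tokenize_pair` preprocessing (as sketched above) are assumptions:

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

base = "distilbert/distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=1)  # scalar reward head

args = RewardConfig(
    output_dir="log_sage_reward_model",
    learning_rate=1.41e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 4 x 16 = effective batch size of 64
    num_train_epochs=100,
    seed=42,
    lr_scheduler_type="linear",  # Adam(betas=(0.9, 0.999), eps=1e-8) is the Trainer default optimizer
)

def tokenize_pair(row):  # same assumed preprocessing as sketched above
    c = tokenizer(row["chosen"], truncation=True)
    r = tokenizer(row["rejected"], truncation=True)
    return {
        "input_ids_chosen": c["input_ids"],
        "attention_mask_chosen": c["attention_mask"],
        "input_ids_rejected": r["input_ids"],
        "attention_mask_rejected": r["attention_mask"],
    }

# Assumption: the dataset resolves via load_dataset under this name.
dataset = load_dataset("hdfs_rlhf_log_summary_dataset").map(tokenize_pair)

trainer = RewardTrainer(
    model=model,
    args=args,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],  # assumption: eval split name
)
trainer.train()
```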

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 1    | 0.6936          | 0.8      |
| No log        | 2.0   | 3    | 0.6931          | 0.8      |
| No log        | 3.0   | 5    | 0.6928          | 1.0      |
| No log        | 4.0   | 6    | 0.6927          | 1.0      |
| No log        | 5.0   | 8    | 0.6923          | 1.0      |
| 0.2849        | 6.0   | 10   | 0.6915          | 1.0      |
| 0.2849        | 7.0   | 11   | 0.6908          | 1.0      |
| 0.2849        | 8.0   | 13   | 0.6889          | 1.0      |
| 0.2849        | 9.0   | 15   | 0.6838          | 1.0      |
| 0.2849        | 10.0  | 16   | 0.6788          | 1.0      |
| 0.2849        | 11.0  | 18   | 0.6633          | 1.0      |
| 0.2669        | 12.0  | 20   | 0.6464          | 1.0      |
| 0.2669        | 13.0  | 21   | 0.6422          | 1.0      |
| 0.2669        | 14.0  | 23   | 0.6312          | 1.0      |
| 0.2669        | 15.0  | 25   | 0.5991          | 1.0      |
| 0.2669        | 16.0  | 26   | 0.5796          | 1.0      |
| 0.2669        | 17.0  | 27   | 0.5571          | 1.0      |
| 0.2669        | 18.0  | 29   | 0.5255          | 1.0      |
| 0.2252        | 19.0  | 31   | 0.5055          | 1.0      |
| 0.2252        | 20.0  | 32   | 0.4967          | 1.0      |
| 0.2252        | 21.0  | 34   | 0.4841          | 1.0      |
| 0.2252        | 22.0  | 36   | 0.4742          | 1.0      |
| 0.2252        | 23.0  | 37   | 0.4700          | 1.0      |
| 0.2252        | 24.0  | 39   | 0.4633          | 1.0      |
| 0.1245        | 25.0  | 41   | 0.4573          | 1.0      |
| 0.1245        | 26.0  | 42   | 0.4547          | 1.0      |
| 0.1245        | 27.0  | 44   | 0.4501          | 1.0      |
| 0.1245        | 28.0  | 46   | 0.4462          | 1.0      |
| 0.1245        | 29.0  | 47   | 0.4444          | 1.0      |
| 0.1245        | 30.0  | 49   | 0.4415          | 1.0      |
| 0.0996        | 31.0  | 51   | 0.4390          | 1.0      |
| 0.0996        | 32.0  | 52   | 0.4378          | 1.0      |
| 0.0996        | 33.0  | 53   | 0.4368          | 1.0      |
| 0.0996        | 34.0  | 55   | 0.4349          | 1.0      |
| 0.0996        | 35.0  | 57   | 0.4333          | 1.0      |
| 0.0996        | 36.0  | 58   | 0.4326          | 1.0      |
| 0.0862        | 37.0  | 60   | 0.4315          | 1.0      |
| 0.0862        | 38.0  | 62   | 0.4306          | 1.0      |
| 0.0862        | 39.0  | 63   | 0.4301          | 1.0      |
| 0.0862        | 40.0  | 65   | 0.4294          | 1.0      |
| 0.0862        | 41.0  | 67   | 0.4288          | 1.0      |
| 0.0862        | 42.0  | 68   | 0.4285          | 1.0      |
| 0.0765        | 43.0  | 70   | 0.4281          | 1.0      |
| 0.0765        | 44.0  | 72   | 0.4276          | 1.0      |
| 0.0765        | 45.0  | 73   | 0.4272          | 1.0      |
| 0.0765        | 46.0  | 75   | 0.4265          | 1.0      |
| 0.0765        | 47.0  | 77   | 0.4261          | 1.0      |
| 0.0765        | 48.0  | 78   | 0.4259          | 1.0      |
| 0.0765        | 49.0  | 79   | 0.4257          | 1.0      |
| 0.0783        | 50.0  | 81   | 0.4253          | 1.0      |
| 0.0783        | 51.0  | 83   | 0.4250          | 1.0      |
| 0.0783        | 52.0  | 84   | 0.4249          | 1.0      |
| 0.0783        | 53.0  | 86   | 0.4247          | 1.0      |
| 0.0783        | 54.0  | 88   | 0.4246          | 1.0      |
| 0.0783        | 55.0  | 89   | 0.4245          | 1.0      |
| 0.0652        | 56.0  | 91   | 0.4244          | 1.0      |
| 0.0652        | 57.0  | 93   | 0.4243          | 1.0      |
| 0.0652        | 58.0  | 94   | 0.4243          | 1.0      |
| 0.0652        | 59.0  | 96   | 0.4242          | 1.0      |
| 0.0652        | 60.0  | 98   | 0.4242          | 1.0      |
| 0.0652        | 61.0  | 99   | 0.4242          | 1.0      |
| 0.0655        | 61.09 | 100  | 0.4242          | 1.0      |

### Framework versions

- Transformers 4.39.0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2