---
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- trl
- reward-trainer
- generated_from_trainer
datasets:
- hdfs_rlhf_log_summary_dataset
metrics:
- accuracy
model-index:
- name: log_sage_reward_model
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: hdfs_rlhf_log_summary_dataset
      type: hdfs_rlhf_log_summary_dataset
      config: default
      split: None
      args: default
    metrics:
    - name: Accuracy
      type: accuracy
      value: 1.0
---


# log_sage_reward_model

This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased), trained as a reward model on `hdfs_rlhf_log_summary_dataset`.
At the final logged checkpoint (step 100), it achieves the following results on the evaluation set:
- Loss: 0.4242
- Accuracy: 1.0

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.41e-05
- train_batch_size: 4
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
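
The relationship between the batch-size entries and the shape of the linear schedule can be sketched in a few lines of Python. This is an illustration, not the training script: it assumes a single device (which is consistent with 4 × 16 = 64), no warmup, and a total of 100 optimizer steps, a figure taken from the last row of the training results table. The function names `effective_batch_size` and `linear_lr` are hypothetical helpers.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Learning rate under a linear schedule: optional linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the hyperparameter list; num_devices=1 and total_steps=100 are assumptions.
print(effective_batch_size(per_device=4, accumulation_steps=16))  # 64

for step in (0, 50, 100):
    print(step, linear_lr(step, base_lr=1.41e-05, total_steps=100))
```

Under these assumptions the learning rate starts at 1.41e-05 and reaches zero at step 100, which would fit the validation loss flattening out in the last rows of the results table.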

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 1    | 0.6936          | 0.8      |
| No log        | 2.0   | 3    | 0.6931          | 0.8      |
| No log        | 3.0   | 5    | 0.6928          | 1.0      |
| No log        | 4.0   | 6    | 0.6927          | 1.0      |
| No log        | 5.0   | 8    | 0.6923          | 1.0      |
| 0.2849        | 6.0   | 10   | 0.6915          | 1.0      |
| 0.2849        | 7.0   | 11   | 0.6908          | 1.0      |
| 0.2849        | 8.0   | 13   | 0.6889          | 1.0      |
| 0.2849        | 9.0   | 15   | 0.6838          | 1.0      |
| 0.2849        | 10.0  | 16   | 0.6788          | 1.0      |
| 0.2849        | 11.0  | 18   | 0.6633          | 1.0      |
| 0.2669        | 12.0  | 20   | 0.6464          | 1.0      |
| 0.2669        | 13.0  | 21   | 0.6422          | 1.0      |
| 0.2669        | 14.0  | 23   | 0.6312          | 1.0      |
| 0.2669        | 15.0  | 25   | 0.5991          | 1.0      |
| 0.2669        | 16.0  | 26   | 0.5796          | 1.0      |
| 0.2669        | 17.0  | 27   | 0.5571          | 1.0      |
| 0.2669        | 18.0  | 29   | 0.5255          | 1.0      |
| 0.2252        | 19.0  | 31   | 0.5055          | 1.0      |
| 0.2252        | 20.0  | 32   | 0.4967          | 1.0      |
| 0.2252        | 21.0  | 34   | 0.4841          | 1.0      |
| 0.2252        | 22.0  | 36   | 0.4742          | 1.0      |
| 0.2252        | 23.0  | 37   | 0.4700          | 1.0      |
| 0.2252        | 24.0  | 39   | 0.4633          | 1.0      |
| 0.1245        | 25.0  | 41   | 0.4573          | 1.0      |
| 0.1245        | 26.0  | 42   | 0.4547          | 1.0      |
| 0.1245        | 27.0  | 44   | 0.4501          | 1.0      |
| 0.1245        | 28.0  | 46   | 0.4462          | 1.0      |
| 0.1245        | 29.0  | 47   | 0.4444          | 1.0      |
| 0.1245        | 30.0  | 49   | 0.4415          | 1.0      |
| 0.0996        | 31.0  | 51   | 0.4390          | 1.0      |
| 0.0996        | 32.0  | 52   | 0.4378          | 1.0      |
| 0.0996        | 33.0  | 53   | 0.4368          | 1.0      |
| 0.0996        | 34.0  | 55   | 0.4349          | 1.0      |
| 0.0996        | 35.0  | 57   | 0.4333          | 1.0      |
| 0.0996        | 36.0  | 58   | 0.4326          | 1.0      |
| 0.0862        | 37.0  | 60   | 0.4315          | 1.0      |
| 0.0862        | 38.0  | 62   | 0.4306          | 1.0      |
| 0.0862        | 39.0  | 63   | 0.4301          | 1.0      |
| 0.0862        | 40.0  | 65   | 0.4294          | 1.0      |
| 0.0862        | 41.0  | 67   | 0.4288          | 1.0      |
| 0.0862        | 42.0  | 68   | 0.4285          | 1.0      |
| 0.0765        | 43.0  | 70   | 0.4281          | 1.0      |
| 0.0765        | 44.0  | 72   | 0.4276          | 1.0      |
| 0.0765        | 45.0  | 73   | 0.4272          | 1.0      |
| 0.0765        | 46.0  | 75   | 0.4265          | 1.0      |
| 0.0765        | 47.0  | 77   | 0.4261          | 1.0      |
| 0.0765        | 48.0  | 78   | 0.4259          | 1.0      |
| 0.0765        | 49.0  | 79   | 0.4257          | 1.0      |
| 0.0783        | 50.0  | 81   | 0.4253          | 1.0      |
| 0.0783        | 51.0  | 83   | 0.4250          | 1.0      |
| 0.0783        | 52.0  | 84   | 0.4249          | 1.0      |
| 0.0783        | 53.0  | 86   | 0.4247          | 1.0      |
| 0.0783        | 54.0  | 88   | 0.4246          | 1.0      |
| 0.0783        | 55.0  | 89   | 0.4245          | 1.0      |
| 0.0652        | 56.0  | 91   | 0.4244          | 1.0      |
| 0.0652        | 57.0  | 93   | 0.4243          | 1.0      |
| 0.0652        | 58.0  | 94   | 0.4243          | 1.0      |
| 0.0652        | 59.0  | 96   | 0.4242          | 1.0      |
| 0.0652        | 60.0  | 98   | 0.4242          | 1.0      |
| 0.0652        | 61.0  | 99   | 0.4242          | 1.0      |
| 0.0655        | 61.09 | 100  | 0.4242          | 1.0      |
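
To help read the table, here is a minimal sketch of how a pairwise reward-model loss and accuracy relate. It assumes the objective is the usual pairwise loss used for reward modelling, `-log(sigmoid(r_chosen - r_rejected))`, and that accuracy is the fraction of pairs in which the chosen response scores higher. The reward values below are made up for illustration and are not from this model.

```python
import math


def pairwise_loss(chosen_reward: float, rejected_reward: float) -> float:
    """-log(sigmoid(margin)), computed as a numerically stable softplus(-margin)."""
    margin = chosen_reward - rejected_reward
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def evaluate(pairs):
    """Return (mean loss, accuracy) over (chosen_reward, rejected_reward) pairs."""
    losses = [pairwise_loss(c, r) for c, r in pairs]
    correct = [c > r for c, r in pairs]
    return sum(losses) / len(losses), sum(correct) / len(correct)


# A zero margin gives a loss of ln(2), independent of the reward values themselves.
print(f"{pairwise_loss(0.0, 0.0):.4f}")  # 0.6931

# Hypothetical rewards: tiny positive margins are all "correct" but barely lower the loss.
loss, accuracy = evaluate([(0.01, 0.0), (0.02, 0.0), (0.005, 0.0)])
print(f"loss={loss:.4f} accuracy={accuracy}")
```

Under this assumed objective, the early validation losses near 0.693 correspond to reward margins close to zero. Accuracy depends only on the sign of the margin, which is one way it can reach 1.0 while the loss is still near ln 2.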


### Framework versions

- Transformers 4.39.0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
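
To reproduce this environment, it can help to compare the installed packages against the versions listed above. The following is a small sketch using only the standard library. The PyPI distribution names (notably `torch` for PyTorch) and the helper `check_versions` are assumptions for illustration.

```python
from importlib import metadata

# Versions from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.39.0",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def check_versions(pinned):
    """Map each package to 'ok', 'missing', or a mismatch message."""
    report = {}
    for package, expected in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        if installed == expected:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (installed {installed}, expected {expected})"
    return report


for package, status in check_versions(PINNED).items():
    print(f"{package}: {status}")
```

A mismatch does not necessarily mean the model will fail to load. The check only flags differences from the versions used during training.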