
log_sage_reward_model

This model is a fine-tuned version of distilbert/distilbert-base-uncased, trained on the hdfs_rlhf_log_summary_dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4242
  • Accuracy: 1.0

Model description

More information needed

Intended uses & limitations

More information needed
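
Pending fuller documentation, the sketch below shows one plausible way to use the checkpoint: scoring a single log summary. It assumes the model was trained as a single-logit sequence-classification reward head (the usual setup for DistilBERT-based reward models); the example summary and the input format are assumptions, not documented behavior.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the checkpoint is a single-logit sequence-classification
# head on DistilBERT, so the logit can be read as a scalar reward.
model_id = "IrwinD/log_sage_reward_model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical HDFS log summary to score.
summary = "Block replication completed; 2 DataNodes reported checksum errors."

inputs = tokenizer(summary, truncation=True, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(f"reward score: {reward:.4f}")
```

To compare two candidate summaries, score each one and prefer the higher reward.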

Training and evaluation data

More information needed
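
For orientation only: reward-model training data is typically a set of preference pairs (a chosen and a rejected summary per log). A minimal loading sketch, assuming the dataset is published on the Hub under the name above with the usual chosen/rejected columns; the exact repo id, namespace, and schema are assumptions:

```python
from datasets import load_dataset

# Assumption: hub id and schema; adjust to the actual dataset location.
dataset = load_dataset("hdfs_rlhf_log_summary_dataset")
print(dataset)              # splits and column names
print(dataset["train"][0])  # expected (assumed) keys: "chosen", "rejected"
```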

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 1.41e-05
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
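
A minimal sketch of these settings as Transformers TrainingArguments (argument names as of Transformers 4.39.0). Whether training used the plain Trainer or TRL's RewardTrainer is not documented; output_dir is a placeholder, and the Adam betas/epsilon and linear scheduler listed above are the library defaults, so they are not set explicitly:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="log_sage_reward_model",  # placeholder
    learning_rate=1.41e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=16,  # 4 x 16 = effective batch size of 64
    lr_scheduler_type="linear",
    num_train_epochs=100,
)
```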

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 1    | 0.6936          | 0.8      |
| No log        | 2.0   | 3    | 0.6931          | 0.8      |
| No log        | 3.0   | 5    | 0.6928          | 1.0      |
| No log        | 4.0   | 6    | 0.6927          | 1.0      |
| No log        | 5.0   | 8    | 0.6923          | 1.0      |
| 0.2849        | 6.0   | 10   | 0.6915          | 1.0      |
| 0.2849        | 7.0   | 11   | 0.6908          | 1.0      |
| 0.2849        | 8.0   | 13   | 0.6889          | 1.0      |
| 0.2849        | 9.0   | 15   | 0.6838          | 1.0      |
| 0.2849        | 10.0  | 16   | 0.6788          | 1.0      |
| 0.2849        | 11.0  | 18   | 0.6633          | 1.0      |
| 0.2669        | 12.0  | 20   | 0.6464          | 1.0      |
| 0.2669        | 13.0  | 21   | 0.6422          | 1.0      |
| 0.2669        | 14.0  | 23   | 0.6312          | 1.0      |
| 0.2669        | 15.0  | 25   | 0.5991          | 1.0      |
| 0.2669        | 16.0  | 26   | 0.5796          | 1.0      |
| 0.2669        | 17.0  | 27   | 0.5571          | 1.0      |
| 0.2669        | 18.0  | 29   | 0.5255          | 1.0      |
| 0.2252        | 19.0  | 31   | 0.5055          | 1.0      |
| 0.2252        | 20.0  | 32   | 0.4967          | 1.0      |
| 0.2252        | 21.0  | 34   | 0.4841          | 1.0      |
| 0.2252        | 22.0  | 36   | 0.4742          | 1.0      |
| 0.2252        | 23.0  | 37   | 0.4700          | 1.0      |
| 0.2252        | 24.0  | 39   | 0.4633          | 1.0      |
| 0.1245        | 25.0  | 41   | 0.4573          | 1.0      |
| 0.1245        | 26.0  | 42   | 0.4547          | 1.0      |
| 0.1245        | 27.0  | 44   | 0.4501          | 1.0      |
| 0.1245        | 28.0  | 46   | 0.4462          | 1.0      |
| 0.1245        | 29.0  | 47   | 0.4444          | 1.0      |
| 0.1245        | 30.0  | 49   | 0.4415          | 1.0      |
| 0.0996        | 31.0  | 51   | 0.4390          | 1.0      |
| 0.0996        | 32.0  | 52   | 0.4378          | 1.0      |
| 0.0996        | 33.0  | 53   | 0.4368          | 1.0      |
| 0.0996        | 34.0  | 55   | 0.4349          | 1.0      |
| 0.0996        | 35.0  | 57   | 0.4333          | 1.0      |
| 0.0996        | 36.0  | 58   | 0.4326          | 1.0      |
| 0.0862        | 37.0  | 60   | 0.4315          | 1.0      |
| 0.0862        | 38.0  | 62   | 0.4306          | 1.0      |
| 0.0862        | 39.0  | 63   | 0.4301          | 1.0      |
| 0.0862        | 40.0  | 65   | 0.4294          | 1.0      |
| 0.0862        | 41.0  | 67   | 0.4288          | 1.0      |
| 0.0862        | 42.0  | 68   | 0.4285          | 1.0      |
| 0.0765        | 43.0  | 70   | 0.4281          | 1.0      |
| 0.0765        | 44.0  | 72   | 0.4276          | 1.0      |
| 0.0765        | 45.0  | 73   | 0.4272          | 1.0      |
| 0.0765        | 46.0  | 75   | 0.4265          | 1.0      |
| 0.0765        | 47.0  | 77   | 0.4261          | 1.0      |
| 0.0765        | 48.0  | 78   | 0.4259          | 1.0      |
| 0.0765        | 49.0  | 79   | 0.4257          | 1.0      |
| 0.0783        | 50.0  | 81   | 0.4253          | 1.0      |
| 0.0783        | 51.0  | 83   | 0.4250          | 1.0      |
| 0.0783        | 52.0  | 84   | 0.4249          | 1.0      |
| 0.0783        | 53.0  | 86   | 0.4247          | 1.0      |
| 0.0783        | 54.0  | 88   | 0.4246          | 1.0      |
| 0.0783        | 55.0  | 89   | 0.4245          | 1.0      |
| 0.0652        | 56.0  | 91   | 0.4244          | 1.0      |
| 0.0652        | 57.0  | 93   | 0.4243          | 1.0      |
| 0.0652        | 58.0  | 94   | 0.4243          | 1.0      |
| 0.0652        | 59.0  | 96   | 0.4242          | 1.0      |
| 0.0652        | 60.0  | 98   | 0.4242          | 1.0      |
| 0.0652        | 61.0  | 99   | 0.4242          | 1.0      |
| 0.0655        | 61.09 | 100  | 0.4242          | 1.0      |
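
Two notes on reading the table: "No log" in the training-loss column means no training loss had been logged by that evaluation step, and accuracy for a reward model is conventionally the fraction of preference pairs where the chosen summary outscores the rejected one (as in TRL's RewardTrainer). A minimal sketch of that metric, with hypothetical scores:

```python
import torch

def reward_accuracy(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> float:
    """Fraction of pairs where the chosen summary outscores the rejected one."""
    return (chosen_scores > rejected_scores).float().mean().item()

# Hypothetical reward scores for five preference pairs.
chosen = torch.tensor([1.2, 0.7, 2.1, 0.3, 1.5])
rejected = torch.tensor([0.4, 0.9, 1.0, -0.2, 0.8])
print(reward_accuracy(chosen, rejected))  # 0.8
```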

Framework versions

  • Transformers 4.39.0
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2