mnoukhov/pythia410m-dpo-tldr-lr1e-5

Generated from Trainer




This model is a fine-tuned version of mnoukhov/pythia410m-sft-tldr on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.5595
Rewards/chosen: -0.9059
Rewards/rejected: -1.3735
Rewards/accuracies: 0.7113
Rewards/margins: 0.4677
Logps/rejected: -88.3830
Logps/chosen: -88.3830
Logps/ref Rejected: -63.5119
Logps/ref Chosen: -70.2656
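
The Rewards/* values above are DPO's implicit rewards. The following is a minimal sketch of how such per-batch metrics are typically computed from summed sequence log-probabilities, assuming the standard sigmoid DPO loss. The helper names are hypothetical. The value β = 0.05 is also an assumption: the card does not list β, but 0.05 × (-88.3830 - (-70.2656)) ≈ -0.9059 reproduces the reported Rewards/chosen.

```python
import math
from statistics import mean

# Assumption: beta is not listed in this card; 0.05 reproduces Rewards/chosen from the Logps columns.
BETA = 0.05


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Batch DPO metrics from summed log-probs, one list entry per preference pair."""
    # Implicit reward: beta * (policy log-prob - reference log-prob)
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        # Sigmoid DPO loss, averaged over pairs
        "loss": mean(-log_sigmoid(m) for m in margins),
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Fraction of pairs where the chosen reward strictly exceeds the rejected reward
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": mean(margins),
    }
```

Because Loss is the mean of per-pair losses, it is generally not equal to -log σ of the mean margin.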

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 4
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
num_epochs: 1.0
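
The following sketch shows how the batch-size and scheduler entries above fit together, assuming Hugging Face Trainer conventions. Under those conventions, total_train_batch_size is train_batch_size × gradient_accumulation_steps × the number of data-parallel processes. The cosine schedule follows the half-cycle shape of transformers' get_cosine_schedule_with_warmup. Because no warmup is listed, warmup_steps defaults to 0 here as an assumption. Both function names are hypothetical.

```python
import math


def total_train_batch_size(train_batch_size: int, gradient_accumulation_steps: int, world_size: int = 1) -> int:
    """Examples contributing to one optimizer update (world_size = data-parallel processes)."""
    return train_batch_size * gradient_accumulation_steps * world_size


def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-5, warmup_steps: int = 0) -> float:
    """Optional linear warmup, then half-cosine decay from base_lr to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

The training results table implies roughly 291 / 0.2 ≈ 1455 optimizer steps for the single epoch. This is only an estimate, because the Epoch column is rounded. On that estimate, cosine_lr(728, 1455) is close to half of 1e-05.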

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref Rejected | Logps/ref Chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6295 | 0.2 | 291 | 0.5864 | -0.5101 | -0.8319 | 0.7039 | 0.3218 | -80.4685 | -80.4685 | -63.5119 | -70.2656 |
| 0.5926 | 0.4 | 582 | 0.5600 | -0.9009 | -1.3738 | 0.7120 | 0.4728 | -88.2839 | -88.2839 | -63.5119 | -70.2656 |
| 0.5761 | 0.6 | 873 | 0.5585 | -0.9509 | -1.4326 | 0.7110 | 0.4817 | -89.2846 | -89.2846 | -63.5119 | -70.2656 |
| 0.5678 | 0.8 | 1164 | 0.5595 | -0.9059 | -1.3735 | 0.7113 | 0.4677 | -88.3830 | -88.3830 | -63.5119 | -70.2656 |
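
Under the standard DPO definitions, the table's reward columns are linked. Rewards/margins equals Rewards/chosen minus Rewards/rejected. Rewards/chosen equals β × (Logps/chosen - Logps/ref Chosen). The check below runs over the four evaluation rows, with values copied from the table. The β it recovers (about 0.05) is inferred from the numbers and is not a documented setting. The helper names are hypothetical.

```python
# Rows copied from the evaluation table:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins, logps/chosen, logps/ref chosen)
ROWS = [
    (291, 0.5864, -0.5101, -0.8319, 0.3218, -80.4685, -70.2656),
    (582, 0.5600, -0.9009, -1.3738, 0.4728, -88.2839, -70.2656),
    (873, 0.5585, -0.9509, -1.4326, 0.4817, -89.2846, -70.2656),
    (1164, 0.5595, -0.9059, -1.3735, 0.4677, -88.3830, -70.2656),
]


def margin_gap(row) -> float:
    """Absolute difference between (chosen - rejected) and the reported margin."""
    _, _, chosen, rejected, margin, _, _ = row
    return abs((chosen - rejected) - margin)


def implied_beta(row) -> float:
    """Solve rewards/chosen = beta * (logps/chosen - logps/ref chosen) for beta."""
    _, _, chosen, _, _, logp_chosen, ref_chosen = row
    return chosen / (logp_chosen - ref_chosen)


# Step with the lowest validation loss
best_step = min(ROWS, key=lambda r: r[1])[0]
```

The margin gaps stay within 1e-4, which is consistent with four-decimal rounding. The lowest validation loss (0.5585) occurs at step 873, while the headline evaluation results match the final row, step 1164.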

Framework versions

PEFT 0.10.0
Transformers 4.38.2
Pytorch 2.1.2+cu121
Datasets 2.17.0
Tokenizers 0.15.2
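
To reproduce the run, it can help to confirm that the installed packages match the versions listed above. The following is a stdlib-only sketch using importlib.metadata. The mapping from the listed names to pip distribution names (for example Pytorch to torch) is an assumption, and version_mismatches is a hypothetical helper.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions from "Framework versions"; keys are assumed pip distribution names.
EXPECTED = {
    "peft": "0.10.0",
    "transformers": "4.38.2",
    "torch": "2.1.2+cu121",
    "datasets": "2.17.0",
    "tokenizers": "0.15.2",
}


def version_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to that version (None if not installed)."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            have = None
        # Exact string comparison, so a local build tag such as +cu121 must match too.
        if have != want:
            mismatches[name] = have
    return mismatches
```

An empty dict means every listed package matches exactly. None marks a package that is not installed.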


Model tree for mnoukhov/pythia410m-dpo-tldr-lr1e-5

Base model: EleutherAI/pythia-410m-deduped
Fine-tuned: mnoukhov/pythia410m-sft-tldr
Adapter: this model (one of 4 adapters of mnoukhov/pythia410m-sft-tldr)
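
The model tree indicates that this repository is a PEFT adapter on top of mnoukhov/pythia410m-sft-tldr. The following sketch loads the adapter and generates from it with peft's PeftModel.from_pretrained. It assumes the adapter is meant to be applied to that SFT checkpoint, and that the SFT repository ships tokenizer files. The prompt layout in build_tldr_prompt is hypothetical, since the card does not document the training format. The value max_new_tokens=64 is an arbitrary illustrative choice, and all helper names are hypothetical.

```python
# Repo ids come from this card; helper names and the prompt layout are assumptions.
ADAPTER_ID = "mnoukhov/pythia410m-dpo-tldr-lr1e-5"
SFT_BASE_ID = "mnoukhov/pythia410m-sft-tldr"


def build_tldr_prompt(post: str) -> str:
    # Hypothetical prompt layout: the card does not document the format used in training.
    return f"{post.strip()}\nTL;DR:"


def load_policy(adapter_id: str = ADAPTER_ID, base_id: str = SFT_BASE_ID):
    """Load the SFT model and apply this DPO adapter on top of it (downloads weights)."""
    # Imported lazily so these helpers can be defined without peft/transformers installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


def summarize(post: str, model, tokenizer, max_new_tokens: int = 64) -> str:
    """Greedy-decode a summary and return only the newly generated text."""
    import torch

    inputs = tokenizer(build_tldr_prompt(post), return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding
            pad_token_id=tokenizer.eos_token_id,
        )
    # Drop the prompt tokens and decode only the continuation.
    generated = output[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()
```

A typical call is `model, tokenizer = load_policy()` followed by `summarize(post_text, model, tokenizer)`. Loading downloads both checkpoints.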
