More info

#1
by Sigmally - opened

Hi! Could you share more information about this model? How long was it trained, what graphics card was it trained on, and how was it trained (did you use your own fine-tuning code or something else)?

https://huggingface.co/datasets/maywell/hh-rlhf-harmyes

Trained for 1 epoch of SFT and 1 epoch of DPO using the trl library with the dataset linked above.

Used 1 x A100 for around 3 to 4 hours.
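
For readers unfamiliar with the DPO stage mentioned in this reply, here is a minimal sketch of the standard per-example DPO objective in plain Python. It is not the author's training code, which was not shared. The function name `dpo_loss`, its argument names, and `beta=0.1` are assumptions chosen for illustration. The inputs are summed log-probabilities of the chosen and rejected responses under the model being trained and under a frozen reference model (typically the SFT checkpoint).

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    All names and the default beta are illustrative assumptions.
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Positive margin means the policy prefers the chosen response more than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability toward the chosen response relative to the reference.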

maywell changed discussion status to closed
