More info
#1, opened by Sigmally
Hi! Could you share more information about this model? How long was it trained, what graphics card was it trained on, and how was it trained (do you have your own fine-tuning code, or did you use something else)?
https://huggingface.co/datasets/maywell/hh-rlhf-harmyes
Trained for 1 epoch of SFT and 1 epoch of DPO using the trl library with this dataset.
Used 1 x A100 for around 3~4 hours.
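For readers unfamiliar with the DPO stage mentioned above, here is a minimal sketch of the standard per-example DPO objective in plain Python. It is not the code used to train this model (that was done with trl); the function names and the `beta=0.1` value are assumptions chosen for illustration, and the inputs are assumed to be summed log-probabilities of each response under the policy and the frozen reference (SFT) model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value for illustration, not taken from the thread
) -> float:
    """Per-example DPO loss for one (chosen, rejected) response pair."""
    # How much more the policy prefers the chosen response than the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # The same preference margin under the frozen reference model.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    # The loss falls as the policy's margin grows beyond the reference's margin.
    return -log_sigmoid(beta * (policy_margin - ref_margin))


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
    # Policy prefers the chosen response more than the reference does: lower loss.
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

When the policy and reference agree, the loss is log(2); it drops below that once the policy favors the chosen response more strongly than the reference does.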
thanks!
maywell changed discussion status to closed