ODIN-RM & RLHF models
The ODIN reward model and the policies trained with ODIN
This is an official implementation of the ODIN-ppo-L230-7B model, a chat assistant trained by fine-tuning LLaMA on the Open-Assistant dataset via PPO. "L230" means that the model's output length on the LIMA test set is ~230. ODIN is the reward model used for the training.
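To make the "L230" naming concrete, here is a minimal sketch of how an average output length over a test set might be computed. It is an illustration only: the helper `mean_output_length`, the whitespace tokenization, and the toy responses are assumptions, and the card does not state which unit or tokenizer the ~230 figure was measured with.

```python
from statistics import mean


def mean_output_length(responses, tokenize=str.split):
    """Return the average number of tokens per response.

    `tokenize` defaults to whitespace splitting, which is an assumption;
    swap in a real tokenizer's encode function to count model tokens.
    """
    if not responses:
        raise ValueError("responses must be non-empty")
    return mean(len(tokenize(r)) for r in responses)


# Toy strings standing in for model outputs on a test set (hypothetical data).
toy_responses = [
    "PPO updates the policy using a clipped objective.",
    "A reward model scores each response.",
]

print(mean_output_length(toy_responses))
```

Passing a different `tokenize` callable changes the unit of measurement, so lengths are only comparable across models when the same tokenizer is used.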