RLHFlow
/

LLaMA3-iterative-DPO-final


weqweasdas committed on Jun 3

Commit 360547e • 1 Parent(s): c20c9f0

Update README.md

Files changed (1)

README.md +1 -0

README.md CHANGED

@@ -15,6 +15,7 @@ See the [collection](https://huggingface.co/collections/RLHFlow/online-rlhf-663a
 - [SFT model](https://huggingface.co/RLHFlow/LLaMA3-SFT)
 - [Reward model](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)
+- This model is closer to the concise version described in the report. We are still working on the model release because of a license issue.
 ## Dataset
 - [Preference data mix](https://huggingface.co/datasets/hendrydong/preference_700K)