Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ tags:
14 |
15 |   <!-- Provide a quick summary of what the model is/does. -->
16 |
17 | - - **Developed by
18 |   - **Model type:** Language Model finetuned with RLHF / RLAIF
19 |   - **License:** Apache-2.0 license under the condition that the model is not used to compete with OpenAI
20 |   - **Finetuned from model:** [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) (based on [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1))
@@ -22,14 +22,7 @@ tags:
22 |
23 |
24 |   We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) with our new reward model [Nexusflow/Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B) and policy optimization method [Fine-Tuning Language Models from Human Preferences (PPO)](https://arxiv.org/abs/1909.08593).
25 | - Harnessing the power of
26 | -
27 | -
28 | -
29 | - For more detailed discussions, please check out our original [blog post](https://starling.cs.berkeley.edu)!
30 | - <!-- Provide the basic links for the model. -->
31 | -
32 | - - **Blog:** https://starling.cs.berkeley.edu/
33 |
34 |
35 |   ## Uses
14 |
15 |   <!-- Provide a quick summary of what the model is/does. -->
16 |
17 | + - **Developed by:** The Nexusflow Team (Banghua Zhu*, Evan Frick*, Tianhao Wu*, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao).
18 |   - **Model type:** Language Model finetuned with RLHF / RLAIF
19 |   - **License:** Apache-2.0 license under the condition that the model is not used to compete with OpenAI
20 |   - **Finetuned from model:** [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) (based on [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1))
22 |
23 |
24 |   We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) with our new reward model [Nexusflow/Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B) and policy optimization method [Fine-Tuning Language Models from Human Preferences (PPO)](https://arxiv.org/abs/1909.08593).
25 | + Harnessing the power of the ranking dataset, [berkeley-nest/Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar), the upgraded reward model, [Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B), and the new reward training and policy tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 in MT Bench with GPT-4 as a judge.
26 |
27 |
28 |   ## Uses