gx-ai-architect committed
Commit 8f64d68
Parent(s): fa88994
Update README.md
README.md
CHANGED
@@ -18,9 +18,11 @@ base_model: mistralai/Mistral-7B-v0.1
 # Model Card for Merlinite-7B-pt 🔥
 
 ### Overview
-We introduce
+We introduce **Merlinite-7B-pt**, a strong 7b open-source chat model, aligned using AI feedback **without using any human annotation or proprietary models**.
 
-
+**Merlinite-7B-pt** is first supervised-finetuned (SFT) via LAB using Mistral-7B-v0.1 as the base model, and then preference-tuned via AI feedback. Our preference-tuning recipe uses the DPO reward from Mixtral-8x7B-Instruct-v0.1 as a proxy for human preferences, and applies iterative rejection sampling to finetune the SFT policy. We show that DPO log-ratios can serve as a reliable reward signal, with a clear correlation between reward improvements and MT-Bench improvements.
+
+The official **Merlinite-7B-pt** achieves **7.96** on MT-Bench, surpassing Mistral-7B-Instruct-v0.1 and Llama2-70b-chat, and is comparable to small proprietary models such as GPT3.5-Turbo-0314 and Claude-v1. It also exhibits superior instruction-following and human preference compared to the SFT Merlinite-7B model.
 
 ### Performance
 
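The preference-tuning recipe described in the new overview (a DPO log-ratio used as the reward, plus best-of-N rejection sampling) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation; all function names and the toy log-probabilities are hypothetical.

```python
import math

# Sketch of the idea: the implicit DPO reward is beta times the log-ratio
# between the tuned policy and a reference model, summed over response
# tokens. Candidates with the highest reward are kept for further SFT
# (rejection sampling). Numbers below are made-up illustrative values.

def dpo_log_ratio_reward(policy_logprobs, ref_logprobs, beta=0.1):
    """Implicit DPO reward: beta * sum of per-token log-prob differences."""
    return beta * sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))

def select_best_candidate(candidates, beta=0.1):
    """Rejection-sampling step: keep the highest-reward candidate."""
    return max(
        candidates,
        key=lambda c: dpo_log_ratio_reward(
            c["policy_logprobs"], c["ref_logprobs"], beta
        ),
    )

# Toy candidates with per-token log-probs under the policy and reference.
candidates = [
    {"text": "response A",
     "policy_logprobs": [math.log(0.5), math.log(0.4)],
     "ref_logprobs": [math.log(0.3), math.log(0.2)]},
    {"text": "response B",
     "policy_logprobs": [math.log(0.2), math.log(0.1)],
     "ref_logprobs": [math.log(0.3), math.log(0.2)]},
]

# Response A is more likely under the policy than under the reference,
# so its log-ratio reward is positive and it is the one retained.
best = select_best_candidate(candidates)
```

In the recipe above this selection would be run over many sampled responses per prompt, with the retained responses used to finetune the SFT policy in each iteration.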