gx-ai-architect
committed on
Commit
•
8d4f3d5
1
Parent(s):
d2d12c3
Update README.md
README.md
CHANGED
@@ -71,25 +71,6 @@ We also observe a clear correlation between the Mixtral DPO reward scores and MT
|
|
71 |
|
72 |
The final Merlinite-7B-pt is the peak checkpoint measured by both Batch-Reward and MT-Bench.
|
73 |
|
74 |
-
### Acknowledgements
|
75 |
-
|
76 |
-
Guangxuan Xu,
|
77 |
-
Project lead.
|
78 |
-
|
79 |
-
Akash Srivastava,
|
80 |
-
Primary advisor
|
81 |
-
|
82 |
-
Kai Xu,
|
83 |
-
Advised on evaluation and model training.
|
84 |
-
|
85 |
-
Tahira Naseem,
|
86 |
-
Advised on DPO rewards.
|
87 |
-
|
88 |
-
Abhishek Bhandwaldar,
|
89 |
-
Advised on distributed sampling and reward annotation implementation.
|
90 |
-
|
91 |
-
Thanks to Luis Lastras, David D. Cox, Ruchir Puri, and Sriram Raghavan for enabling this project and for provisioning the resources.
|
92 |
-
|
93 |
|
94 |
## Model description
|
95 |
|
@@ -117,3 +98,23 @@ The model has been tuned via AI preference. However, this is not a targeted RLHF
|
|
117 |
The model undergoes training on synthetic data, leading to the potential inheritance of both advantages and limitations from the underlying teacher models and data generation methods. The incorporation of safety measures during Merlinite-7b-pt's training process is considered beneficial. However, a nuanced understanding of the associated risks requires detailed studies for more accurate quantification.
|
118 |
|
119 |
In the absence of adequate safeguards, there exists a risk of malicious utilization of these models for generating disinformation or harmful content. Caution is urged against complete reliance on a specific language model for crucial decisions or impactful information, as preventing these models from fabricating content is not straightforward. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in ungrounded generation scenarios due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain.
|
|
71 |
|
72 |
The final Merlinite-7B-pt is the peak checkpoint measured by both Batch-Reward and MT-Bench.
|
73 |
|
74 |
|
75 |
## Model description
|
76 |
|
|
|
98 |
The model undergoes training on synthetic data, leading to the potential inheritance of both advantages and limitations from the underlying teacher models and data generation methods. The incorporation of safety measures during Merlinite-7b-pt's training process is considered beneficial. However, a nuanced understanding of the associated risks requires detailed studies for more accurate quantification.
|
99 |
|
100 |
In the absence of adequate safeguards, there exists a risk of malicious utilization of these models for generating disinformation or harmful content. Caution is urged against complete reliance on a specific language model for crucial decisions or impactful information, as preventing these models from fabricating content is not straightforward. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in ungrounded generation scenarios due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain.
|
101 |
+
|
102 |
+
### Acknowledgements
|
103 |
+
|
104 |
+
Guangxuan Xu,
|
105 |
+
Project lead.
|
106 |
+
|
107 |
+
Akash Srivastava,
|
108 |
+
Primary advisor
|
109 |
+
|
110 |
+
Kai Xu,
|
111 |
+
Advised on evaluation and model training.
|
112 |
+
|
113 |
+
Tahira Naseem,
|
114 |
+
Advised on DPO rewards.
|
115 |
+
|
116 |
+
Abhishek Bhandwaldar,
|
117 |
+
Advised on distributed sampling and reward annotation implementation.
|
118 |
+
|
119 |
+
Thanks to Luis Lastras, David D. Cox, Ruchir Puri, and Sriram Raghavan for enabling this project and for provisioning the resources.
|
120 |
+
|