Update README.md
README.md
CHANGED
@@ -257,6 +257,19 @@ To understand the capabilities, we compare Phi-3.5-vision with a set of models o
 | Document Intelligence | TextVQA (val) | 72.0 | 66.2 | 68.8 | 67.4 | 70.9 | 70.5 | 64.5 | 75.6 |
 | Object visual presence verification | POPE (test) | 86.1 | 83.3 | 84.2 | 86.1 | 83.6 | 76.6 | 89.3 | 87.0 |
 
+## Safety Evaluation and Red-Teaming
+
+**Approach**
+The Phi-3 family of models has adopted a robust safety post-training approach that draws on both open-source and in-house generated datasets.
+Safety alignment combines SFT (Supervised Fine-Tuning) and RLHF (Reinforcement Learning from Human Feedback),
+using human-labeled and synthetic English-language datasets. These include publicly available datasets focused on helpfulness and harmlessness,
+as well as questions and answers targeted at multiple safety categories.
+
+**Safety Evaluation**
+We used several evaluation techniques, including red teaming, adversarial conversation simulations, and safety evaluation benchmark datasets, to assess the propensity of
+Phi-3.5 models to produce undesirable outputs across multiple risk categories. Combining approaches compensates for the limitations of any single one.
+Please refer to the [technical report](https://arxiv.org/pdf/2404.14219) for more details on our safety alignment.
+
 
 ## Software
 * [PyTorch](https://github.com/pytorch/pytorch)