nguyenbh committed on
Commit
0eb7016
1 Parent(s): bea9545

Update README.md

  1. README.md +13 -0
README.md CHANGED
@@ -257,6 +257,19 @@ To understand the capabilities, we compare Phi-3.5-vision with a set of models o
  | Document Intelligence | TextVQA (val) | 72.0 | 66.2 | 68.8 | 67.4 | 70.9 | 70.5 | 64.5 | 75.6 |
  | Object visual presence verification | POPE (test) | 86.1 | 83.3 | 84.2 | 86.1 | 83.6 | 76.6 | 89.3 | 87.0 |
 
+ ## Safety Evaluation and Red-Teaming
+
+ **Approach**
+ The Phi-3 family of models has adopted a robust safety post-training approach that draws on a variety of open-source and in-house generated datasets.
+ Safety alignment combines SFT (Supervised Fine-Tuning) and RLHF (Reinforcement Learning from Human Feedback),
+ using human-labeled and synthetic English-language datasets. These include publicly available datasets focused on helpfulness and harmlessness,
+ as well as questions and answers targeting multiple safety categories.
+
+ **Safety Evaluation**
+ We used several evaluation techniques, including red teaming, adversarial conversation simulations, and safety evaluation benchmark datasets, to evaluate the Phi-3.5
+ models' propensity to produce undesirable outputs across multiple risk categories. Combining several approaches compensates for the limitations of any single one.
+ Please refer to the [technical report](https://arxiv.org/pdf/2404.14219) for more details on our safety alignment.
+
 
  ## Software
  * [PyTorch](https://github.com/pytorch/pytorch)