a-F1 committed on
Commit 8d00a01
1 Parent(s): 457765e

Update README.md

Files changed (1):
  1. README.md +42 -14
README.md CHANGED
@@ -1,16 +1,40 @@
---
license: mit
---
- # Zephyr-7B-beta unlearned using SimNPO on WMDP

## Model Details

- - **Base Model**: Zephyr-7B-beta
- - **Unlearning**: SimNPO on WMDP-Bio and WMDP-Cyber

## Unlearning Algorithm

- This model uses the `SimNPO` unlearning algorithm with the following parameters:
- Learning Rate: `4e-6`
- beta: `5.5`
- lambda: `5.0`
@@ -24,21 +48,25 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("OPTML-Group/SimNPO-WMDP-zephyr-7b-beta", use_flash_attention_2=True, torch_dtype=torch.bfloat16, trust_remote_code=True)
```

## Citation

If you use this model in your research, please cite:
```
- @misc{fan2024simplicityprevailsrethinkingnegative,
-   title={Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning},
-   author={Chongyu Fan and Jiancheng Liu and Licong Lin and Jinghan Jia and Ruiqi Zhang and Song Mei and Sijia Liu},
-   year={2024},
-   eprint={2410.07163},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL},
-   url={https://arxiv.org/abs/2410.07163},
}
```

- ## Contact

- For questions or issues regarding this model, please contact chongyu.fan93@gmail.com.
 
---
license: mit
+ datasets:
+ - cais/wmdp
+ language:
+ - en
+ base_model:
+ - HuggingFaceH4/zephyr-7b-beta
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - unlearn
+ - machine-unlearning
+ - llm-unlearning
+ - data-privacy
+ - large-language-models
+ - trustworthy-ai
+ - trustworthy-machine-learning
+ - language-model
---
+
+ # SimNPO-Unlearned Model on Task "WMDP"

## Model Details

+ - **Unlearning**:
+   - **Task**: [🤗datasets/cais/wmdp](https://huggingface.co/datasets/cais/wmdp)
+   - **Method**: [SimNPO](https://arxiv.org/abs/2410.07163)
+   - **Origin Model**: [🤗HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
+   - **Code Base**: [github.com/OPTML-Group/Unlearn-Simple](https://github.com/OPTML-Group/Unlearn-Simple)
+   - **Research Paper**: ["Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"](https://arxiv.org/abs/2410.07163)

## Unlearning Algorithm

+ This model uses the `SimNPO` unlearning algorithm with the following optimization objective:
+ $$\ell_{\text{SimNPO}}(\boldsymbol{\theta}) = \mathbb{E}_{(x, y) \in \mathcal{D}_f}\left[-\frac{2}{\beta}\log\sigma\left(-\frac{\beta}{|y|}\log\pi_{\boldsymbol{\theta}}(y|x) - \gamma\right)\right] + \lambda\, \mathbb{E}_{(x, y) \in \mathcal{D}_r}\left[-\log\pi_{\boldsymbol{\theta}}(y|x)\right]$$
+ Unlearning hyper-parameters:
- Learning Rate: `4e-6`
- beta: `5.5`
- lambda: `5.0`
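To make the objective above concrete, here is a minimal PyTorch sketch of the forget/retain trade-off it encodes. The `gamma` margin (set to `0` here), the function name, and the per-sample log-probability inputs are illustrative assumptions; the reference implementation is in the code base linked above.

```python
import torch
import torch.nn.functional as F

def simnpo_loss(forget_logp: torch.Tensor, forget_len: torch.Tensor,
                retain_logp: torch.Tensor, beta: float = 5.5,
                lam: float = 5.0, gamma: float = 0.0) -> torch.Tensor:
    # forget_logp / retain_logp: per-sample sums of log pi_theta(y|x) over answer tokens.
    # forget_len: answer length |y| per forget sample; gamma = 0 is an assumed default.
    forget_term = -(2.0 / beta) * F.logsigmoid(-(beta / forget_len) * forget_logp - gamma)
    retain_term = -retain_logp.mean()  # plain NLL on the retain set D_r ("lam" = lambda)
    return forget_term.mean() + lam * retain_term
```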
 
model = AutoModelForCausalLM.from_pretrained("OPTML-Group/SimNPO-WMDP-zephyr-7b-beta", use_flash_attention_2=True, torch_dtype=torch.bfloat16, trust_remote_code=True)
```
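For a quick end-to-end check, the loaded model pairs with its tokenizer as usual; a minimal generation sketch (the prompt and decoding settings are illustrative only):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "OPTML-Group/SimNPO-WMDP-zephyr-7b-beta"
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(name)

# The unlearned model should keep general ability while WMDP-related knowledge is degraded.
inputs = tokenizer("What is machine unlearning?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```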

+ ## Evaluation Results
+ | Model | 1 - Acc. (WMDP-Bio) | 1 - Acc. (WMDP-Cyber) | MMLU |
+ |---|---|---|---|
+ | Origin | 0.352 | 0.608 | 0.585 |
+ | NPO | 0.581 | 0.616 | 0.476 |
+ | **SimNPO** | 0.584 | 0.678 | 0.471 |
+
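These numbers can in principle be reproduced with EleutherAI's lm-evaluation-harness; a hedged sketch follows. The `simple_evaluate` API, the task names, and the `acc,none` metric key are assumptions about the installed harness version, so verify them locally before relying on the output.

```python
# Hedged sketch: assumes lm-evaluation-harness (pip install lm-eval) ships
# wmdp_bio / wmdp_cyber / mmlu tasks; metric keys may differ across versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPTML-Group/SimNPO-WMDP-zephyr-7b-beta,dtype=bfloat16",
    tasks=["wmdp_bio", "wmdp_cyber", "mmlu"],
    batch_size=8,
)
for task in ["wmdp_bio", "wmdp_cyber"]:
    acc = results["results"][task]["acc,none"]
    print(f"{task}: 1 - acc = {1.0 - acc:.3f}")  # higher means more forgetting
```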
## Citation

If you use this model in your research, please cite:
```
+ @article{fan2024simplicity,
+   title={Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning},
+   author={Fan, Chongyu and Liu, Jiancheng and Lin, Licong and Jia, Jinghan and Zhang, Ruiqi and Mei, Song and Liu, Sijia},
+   journal={arXiv preprint arXiv:2410.07163},
+   year={2024}
}
```

+ ## Reporting Issues

+ Please report issues with the model at [github.com/OPTML-Group/Unlearn-Simple](https://github.com/OPTML-Group/Unlearn-Simple).