HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

This repository contains code for reproducing HarmAug, introduced in:

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee*, Haebin Seong*, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang (*: Equal contribution)

[arXiv link] [Model link]
[Dataset link]

Reproduction Steps

First, we recommend creating a conda environment with Python 3.10.

conda create -n harmaug python=3.10
conda activate harmaug

After that, install the requirements.

pip install -r requirements.txt

Then, download the necessary files from Google Drive and move them into the appropriate folders.

mv [email protected] ./data

Finally, you can start the knowledge distillation process.

bash script/kd.sh
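
For readers new to knowledge distillation, the sketch below shows one common form of the objective for a binary safe/harmful classifier: a weighted sum of the cross-entropy against the hard label and the cross-entropy against the teacher's predicted probability. It is a generic illustration only and is not taken from script/kd.sh. The function names and the `alpha` weighting are assumptions made for this example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(target: float, pred: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between a target probability and a prediction."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def kd_loss(student_logit: float, teacher_prob: float, label: int,
            alpha: float = 0.5) -> float:
    """Hypothetical distillation loss for one example.

    student_logit: raw score from the small student model
    teacher_prob:  teacher's probability that the input is harmful
    label:         ground-truth label (1 = harmful, 0 = safe)
    alpha:         weight on the soft (teacher) term; an assumed hyperparameter
    """
    student_prob = sigmoid(student_logit)
    hard_term = bce(float(label), student_prob)   # fit the true label
    soft_term = bce(teacher_prob, student_prob)   # match the teacher
    return (1.0 - alpha) * hard_term + alpha * soft_term


if __name__ == "__main__":
    # A student that agrees with a confident teacher incurs a lower loss
    # than one that disagrees.
    print(kd_loss(5.0, 0.99, 1))
    print(kd_loss(-5.0, 0.99, 1))
```

The soft term differs from the KL divergence between teacher and student only by the teacher's entropy, which is constant with respect to the student, so minimizing either gives the same gradients.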

Reference

To cite our paper, please use the following BibTeX entry:

@article{lee2024harmaug,
  title={{HarmAug}: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models},
  author={Lee, Seanie and Seong, Haebin and Lee, Dong Bok and Kang, Minki and Chen, Xiaoyin and Wagner, Dominik and Bengio, Yoshua and Lee, Juho and Hwang, Sung Ju},
  journal={arXiv preprint arXiv:2410.01524},
  year={2024}
}