hbseong
/

HarmAug-Guard

Text Classification

Inference Endpoints

Model card Files Files and versions Community

HarmAug-Guard / README.md

Seanie-lee's picture

Upload README.md

eba8083 verified 19 days ago

|

1.82 kB

	# HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
	This repository contains code for reproducing HarmAug introduced in

	HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

	Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang (*: Equal contribution)

	[[arXiv link]](https://arxiv.org/abs/2410.01524)
	[[Model link]](https://huggingface.co/AnonHB/HarmAug_Guard_Model_deberta_v3_large_finetuned)
	[[Dataset link]](https://huggingface.co/datasets/AnonHB/HarmAug_generated_dataset)


	![concept_figure](https://github.com/user-attachments/assets/3e61f7c6-e0c2-4107-bb4e-9b4d2c7ba961)

	![overall_comparison_broken](https://github.com/user-attachments/assets/03cc0fa5-e9dc-4d78-a5b8-a2c122672fea)



	## Reproduction Steps

	First, we recommend to create a conda environment with python 3.10.
	```
	conda create -n harmaug python=3.10
	conda activate harmaug
	```


	After that, install the requirements.
	```
	pip install -r requirements.txt
	```


	Then, download necessary files from [Google Drive](https://drive.google.com/drive/folders/1oLUMPauXYtEBP7rvbULXL4hHp9Ck_yqg?usp=drive_link) and put them into their appropriate folders.
	```
	mv [email protected] ./data
	```


	Finally, you can start the knowledge distillation process.
	```
	bash script/kd.sh
	```

	## Reference
	To cite our paper, please use this BibTex
	```bibtex
	@article{lee2024harmaug,
	title={{HarmAug}: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models},
	author={Lee, Seanie and Seong, Haebin and Lee, Dong Bok and Kang, Minki and Chen, Xiaoyin and Wagner, Dominik and Bengio, Yoshua and Lee, Juho and Hwang, Sung Ju},
	journal={arXiv preprint arXiv:2410.01524},
	year={2024}
	}
	```