hbseong/HarmAug-Guard


	---
	tags:
	- deberta-v3
	- deberta
	- deberta-v2
	license: mit
	base_model:
	- microsoft/deberta-v3-large
	pipeline_tag: text-classification
	---

	# HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

	[arXiv Link](https://arxiv.org/abs/2410.01524)

	HarmAug-Guard is a safety guard model: it classifies whether a conversation with an LLM is safe, and can be used to defend against LLM jailbreak attacks.
	It is fine-tuned from DeBERTa-v3-large using the method described in [HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models](https://arxiv.org/abs/2410.01524).
	Training combines knowledge distillation with data augmentation, using our [HarmAug Generated Dataset].
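
To make the idea of knowledge distillation concrete, here is a minimal sketch of a generic distillation objective for a binary safe/unsafe classifier. It is not necessarily the objective used in the paper: the function names, the mixing weight `alpha`, and the choice of binary cross-entropy for both terms are assumptions made for illustration.

```python
import math


def binary_cross_entropy(target: float, predicted: float, eps: float = 1e-12) -> float:
    """Cross-entropy between a target probability and a predicted probability."""
    # Clamp the prediction so log() never receives 0.
    p = min(max(predicted, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(
    student_prob: float,
    teacher_prob: float,
    hard_label: int,
    alpha: float = 0.5,  # ASSUMPTION: hypothetical mixing weight, not taken from the paper
) -> float:
    """Mix a hard-label term with a soft-label term from the teacher.

    student_prob: the student's predicted probability that the input is unsafe.
    teacher_prob: the teacher's predicted probability that the input is unsafe.
    hard_label:   ground-truth label (1 = unsafe, 0 = safe).
    """
    hard_term = binary_cross_entropy(float(hard_label), student_prob)
    soft_term = binary_cross_entropy(teacher_prob, student_prob)
    return alpha * hard_term + (1.0 - alpha) * soft_term
```

In this sketch, a student that agrees with both the teacher and the label gets a lower loss than one that disagrees; for example, `distillation_loss(0.9, 0.9, 1)` is smaller than `distillation_loss(0.1, 0.9, 1)`. Under the same assumptions, augmented examples would pass through the same loss, with the teacher supplying `teacher_prob`.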


	For more information, please refer to our [GitHub repository](https://github.com/imnotkind/HarmAug).
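
The following is a minimal sketch of how the classifier might be called through the Hugging Face `transformers` library, assuming the checkpoint loads with `AutoModelForSequenceClassification`. The repository id is taken from this page. The helper names, the assumption that class index 1 means "unsafe", and the choice to encode prompt and response as a sentence pair are illustrative guesses, so check the model's `config.json` and the GitHub repository before relying on them.

```python
import math
from typing import Optional, Sequence

MODEL_ID = "hbseong/HarmAug-Guard"  # repository id as shown on this page
UNSAFE_INDEX = 1  # ASSUMPTION: class 1 = unsafe; verify against the model config


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unsafe_probability(logits: Sequence[float], unsafe_index: int = UNSAFE_INDEX) -> float:
    """Probability assigned to the (assumed) unsafe class."""
    return softmax(logits)[unsafe_index]


def score(prompt: str, response: Optional[str] = None) -> float:
    """Download the model and return the unsafe probability for one input.

    Heavy dependencies are imported here so the helpers above stay usable
    without torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # ASSUMPTION: a prompt/response pair is encoded as a sentence pair.
    if response is None:
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    else:
        inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)

    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return unsafe_probability(logits)
```

Calling `score("some user prompt")` would rate a prompt on its own, and `score("some user prompt", "the model's reply")` would rate a full exchange. Any decision threshold on the returned probability is left to the caller.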



	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66f7bee63c7ffa79319b053b/bCNW62CvDpqbXUK4eZ4-b.png)

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66f7bee63c7ffa79319b053b/REbNDOhT31bv_XRa6-VzE.png)