---
tags:
- deberta-v3
- deberta
- deberta-v2
license: mit
base_model:
- microsoft/deberta-v3-large
pipeline_tag: text-classification
---

# HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

[arXiv Link](https://arxiv.org/abs/2410.01524)

Our model is a safety guard model: it classifies whether a conversation with an LLM is safe and helps protect against LLM jailbreak attacks.

It is fine-tuned from DeBERTa-v3-large using the method described in **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.

Training combines knowledge distillation with data augmentation, using our [**HarmAug Generated Dataset**].
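
To make the idea of knowledge distillation concrete, here is a minimal sketch of a generic distillation objective for a binary safe/unsafe classifier. It is not taken from this card: the function names, the mixing weight `lam`, and the choice to combine a hard-label term with a teacher soft-label term are assumptions made for illustration. The paper and the GitHub repository define the objective actually used.

```python
import math


def bce(target: float, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a target in [0, 1] and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(
    student_prob: float,
    teacher_prob: float,
    hard_label: float,
    lam: float = 0.5,  # assumed mixing weight, not a value from the paper
) -> float:
    """Generic distillation loss for one example (illustrative only).

    student_prob: student's predicted probability of "unsafe"
    teacher_prob: teacher's predicted probability of "unsafe" (soft label)
    hard_label:   1.0 for unsafe, 0.0 for safe
    """
    hard_term = bce(hard_label, student_prob)    # fit the ground-truth label
    soft_term = bce(teacher_prob, student_prob)  # match the teacher's soft label
    return (1.0 - lam) * hard_term + lam * soft_term
```

With `lam` close to 1 the student mostly imitates the teacher's soft predictions; with `lam` close to 0 it mostly fits the hard labels.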

For more information, please refer to our [GitHub repository](https://github.com/imnotkind/HarmAug).
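
As a rough illustration of how a guard classifier like this one might be queried, the sketch below uses the `transformers` sequence-classification API. Several things here are assumptions rather than details confirmed by this card: `MODEL_ID` is a hypothetical placeholder for this model's Hub repository id, the classification head is assumed to emit two logits with index 1 meaning "unsafe", and a prompt/response pair is assumed to be encoded as a sentence pair. Check the repository above for the authors' actual inference code.

```python
import math

# Hypothetical placeholder: replace with this model's Hub repository id.
MODEL_ID = "your-namespace/your-guard-model"


def unsafe_probability(logits):
    """Softmax over two class logits; returns the probability of index 1.

    Assumes index 1 is the "unsafe" class, which this card does not confirm.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def score(prompt, response=None, model_id=MODEL_ID):
    """Return the assumed "unsafe" probability for a prompt or a prompt/response pair."""
    # Imported lazily so the helper above works without the ML dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    if response is None:
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    else:
        # Assumption: the pair is encoded as (prompt, response) sentence pair.
        inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)

    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return unsafe_probability(logits)


# Example call (downloads the model weights on first use):
# print(score("How do I bake sourdough bread?"))
```

A caller would typically compare the returned probability against a threshold of its own choosing to decide whether to block or allow the conversation.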

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66f7bee63c7ffa79319b053b/bCNW62CvDpqbXUK4eZ4-b.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66f7bee63c7ffa79319b053b/REbNDOhT31bv_XRa6-VzE.png)