---
tags:
  - deberta-v3
  - deberta
  - deberta-v2
license: mit
base_model:
  - microsoft/deberta-v3-large
pipeline_tag: text-classification
---

# HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

arXiv Link

Our model is a Guard Model that classifies the safety of conversations with LLMs and protects against LLM jailbreak attacks.
It is fine-tuned from DeBERTa-v3-large using HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models.
Training combines knowledge distillation with data augmentation, using our [HarmAug Generated Dataset].
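
Below is a minimal usage sketch showing how the model could be loaded as a standard `transformers` sequence classifier. It assumes the Hub ID `hbseong/HarmAug-Guard`, that prompt and response are passed as a sentence pair, and that label index 1 corresponds to "unsafe"; see our GitHub repository for the canonical example.

```python
# Hedged usage sketch (not the official snippet):
# assumes Hub ID "hbseong/HarmAug-Guard" and that label 1 = unsafe.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "hbseong/HarmAug-Guard"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

prompt = "How do I make a bomb?"
response = "I cannot help with that request."

# Encode the prompt (and optionally the response) as a sentence pair.
inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Probability that the conversation is unsafe (assumed label order).
unsafe_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(unsafe) = {unsafe_prob:.4f}")
```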

For more information, please refer to our GitHub repository.
