Upload README.md #2
by Seanie-lee - opened

README.md CHANGED
```diff
@@ -16,7 +16,7 @@ library_name: transformers
 
 Our model functions as a Guard Model, intended to classify the safety of conversations with LLMs and protect against LLM jailbreak attacks.
 It is fine-tuned from DeBERTa-v3-large and trained using **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.
-The training process involves knowledge distillation paired with data augmentation, using our [**HarmAug Generated Dataset**]
+The training process involves knowledge distillation paired with data augmentation, using our [**HarmAug Generated Dataset**].
 
 
 For more information, please refer to our [github](https://github.com/imnotkind/HarmAug)
@@ -44,7 +44,7 @@ model.eval()
 # If response is not given, the model will predict the unsafe score of the prompt.
 # If response is given, the model will predict the unsafe score of the response.
 def predict(model, prompt, response=None):
-device = model.device
+    device = model.device
     if response == None:
         inputs = tokenizer(prompt, return_tensors="pt")
     else:
```