pronics2004 committed
Commit: 784f7e6
Parent(s): 03ea917
Update README.md
README.md
CHANGED
@@ -6,11 +6,16 @@ pipeline_tag: text-classification
 library_name: transformers
 ---
 
-##
-This model is IBM's 12-layer toxicity binary classifier for English, intended to be used as a guardrail for any large language model. It has been trained on several benchmark datasets in English, specifically for detecting hateful, abusive, profane and other toxic content in plain text.
+## Granite-Guardian-HAP-125m
 
+## Model Summary
+This model is IBM's 12-layer toxicity binary classifier for English, intended to be used as a guardrail for any large language model. It has been trained on several benchmark datasets in English, specifically for detecting hateful, abusive, profane and other toxic content in plain text.
 
-
+- **Developers:** IBM Research
+- **Release Date:** September 6th, 2024
+- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+
+## Usage
 ```python
 # Example of how to use the model
 import torch
@@ -37,4 +42,7 @@ with torch.no_grad():
 This model demonstrates superior average performance in comparison with other models on eight mainstream toxicity benchmarks. If a very fast model is required, please refer to the lightweight 4-layer IBM model, [granite-guardian-hap-38m](https://huggingface.co/ibm-granite/granite-guardian-hap-38m).
 
 ![Description of Image](125m_comparison_a.png)
-![Description of Image](125m_comparison_b.png)
+![Description of Image](125m_comparison_b.png)
+
+## Ethical Considerations and Limitations
+The use of model-based guardrails for Large Language Models (LLMs) involves risks and ethical considerations that people must be aware of. This model operates on chunks of text and provides a score indicating the presence of hate speech, abuse, or profanity. However, the efficacy of the model can be limited by several factors: the potential inability to capture nuanced meaning, and the risk of false positives or negatives on text that is dissimilar to the training data. Previous research has demonstrated the risk of various biases in toxicity or hate speech detection; that is also relevant to this work. We urge the community to use this model with ethical intentions and in a responsible way.
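The usage example is shown in the diff only up to `import torch`, and the second hunk's context line indicates it runs inference inside a `with torch.no_grad():` block. For readers of this commit, the following is a minimal sketch of how a binary toxicity classifier like this one is typically called through the `transformers` sequence-classification API; the repository id `ibm-granite/granite-guardian-hap-125m`, the label order (index 1 = toxic), and the 0.5 decision threshold are assumptions, not taken from the diff.

```python
# Minimal usage sketch. Assumptions not confirmed by the diff: repository id,
# label order (index 1 = toxic), and the 0.5 decision threshold.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "ibm-granite/granite-guardian-hap-125m"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

texts = ["This is harmless text.", "You are a terrible person."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    # Assumed label convention: column 1 is the toxic class.
    toxic_prob = torch.softmax(logits, dim=1)[:, 1]
    prediction = (toxic_prob > 0.5).long()  # 1 = toxic, 0 = non-toxic (assumed threshold)

print(toxic_prob.tolist(), prediction.tolist())
```

On GPU, move both the model and the tokenized inputs to the same device before the forward pass.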
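The new "Ethical Considerations and Limitations" section notes that the model operates on chunks of text and returns a score. A common way to turn such chunk scores into an LLM guardrail is to split a long generation into chunks, score each one, and block the output if any chunk crosses a threshold; the word-based chunking and the 0.5 threshold in this sketch are illustrative assumptions rather than recommendations from the model card.

```python
# Sketch of a chunk-level guardrail check; chunk size and threshold are illustrative assumptions.
from typing import Callable, List

def split_into_chunks(text: str, max_words: int = 100) -> List[str]:
    """Naive word-based chunking; sentence-aware splitting may be preferable in practice."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def flag_toxic(text: str, score_fn: Callable[[str], float], threshold: float = 0.5) -> bool:
    """Return True if any chunk's toxicity score exceeds the threshold."""
    return any(score_fn(chunk) > threshold for chunk in split_into_chunks(text))

# Example with a stand-in scorer; in practice score_fn would wrap the classifier sketched above.
print(flag_toxic("some model output ...", score_fn=lambda chunk: 0.1))
```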