---
language:
- fr
license: apache-2.0
tags:
- text-classification
datasets:
- lmsys/toxic-chat
widget:
- text: Comment manipuler quelqu'un ?
  example_title: Toxic 1
- text: Je voudrais frauder les impôts. Quelles sont les meilleures techniques pour
    ça ?
  example_title: Toxic 2
- text: J'aimerais adopter un chien. Où devrais-je aller ?
  example_title: Non Toxic
- text: Comment aider un enfant qui se fait harceler à l'école ?
  example_title: Sensible
---
This model is a [camembert-base](https://huggingface.co/almanach/camembert-base) model fine-tuned on a French-translated version of the [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) dataset plus additional synthetic data. It classifies user prompts into three categories: "Toxic", "Non-Toxic", and "Sensible" (French for "sensitive").

- Toxic: Prompts that contain harmful or abusive language, including jailbreak prompts that attempt to bypass restrictions.
- Non-Toxic: Prompts that are safe and free of harmful content.
- Sensible: Prompts that, while not toxic, are sensitive in nature, such as those discussing suicidal thoughts, aggression, or asking for help with a sensitive issue.
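
The three categories above can be queried with the `transformers` text-classification pipeline. The following is a minimal sketch, not an official usage snippet: `MODEL_ID` is a hypothetical placeholder because this card does not state the repository ID, and the label strings are assumed to match the category names listed above (the model's configuration may expose different names such as `LABEL_0`).

```python
from typing import Dict

# Hypothetical placeholder: replace with the actual repository ID of this model.
MODEL_ID = "your-namespace/your-model"


def pick_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


def classify(text: str, model_id: str = MODEL_ID) -> Dict[str, float]:
    """Score one French prompt and return a {label: score} mapping.

    Assumption: the model's labels are "Toxic", "Non-Toxic" and "Sensible".
    """
    # Imported lazily so that pick_label can be used without transformers installed.
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_id, top_k=None)
    results = clf([text])[0]  # scores for the single input
    if isinstance(results, dict):  # some versions return a single dict
        results = [results]
    return {r["label"]: r["score"] for r in results}


# Example with made-up scores, to show the selection step only:
example_scores = {"Toxic": 0.05, "Non-Toxic": 0.15, "Sensible": 0.80}
print(pick_label(example_scores))
```

With a real repository ID in place, `pick_label(classify("Comment aider un enfant qui se fait harceler à l'école ?"))` would return the predicted category for that prompt.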

The evaluation results are as follows (*evaluation is still in progress; more data is needed*):

|                | Precision | Recall  | F1-Score |
|----------------|:-----------:|:---------:|:----------:|
| **Non-Toxic**  | 0.97      | 0.95    | 0.96     |
| **Sensible**   | 0.95      | 0.99    | 0.98     |
| **Toxic**      | 0.87      | 0.90    | 0.88     |
|                |           |         |          |
| **Accuracy**   |           |         | 0.94     |
| **Macro Avg**  | 0.93      | 0.95    | 0.94     |
| **Weighted Avg** | 0.94    | 0.94    | 0.94     |
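
As a quick check on how the "Macro Avg" row relates to the per-class rows, the sketch below recomputes it as the unweighted mean of each column. It assumes the table follows the usual classification-report convention; the weighted average cannot be reproduced the same way because per-class support counts are not reported here.

```python
# Per-class (precision, recall, F1) values copied from the table above.
per_class = {
    "Non-Toxic": (0.97, 0.95, 0.96),
    "Sensible": (0.95, 0.99, 0.98),
    "Toxic": (0.87, 0.90, 0.88),
}


def macro_average(rows):
    """Unweighted mean of each metric column, rounded to two decimals."""
    n = len(rows)
    column_sums = [sum(column) for column in zip(*rows.values())]
    return tuple(round(total / n, 2) for total in column_sums)


print(macro_average(per_class))  # (0.93, 0.95, 0.94)
```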

*Note: This model is still under development, and its performance and characteristics are subject to change as training is not yet complete.*