AgentPublic/camembert-base-toxic-fr-user-prompts

	---
	language:
	- fr
	license: apache-2.0
	tags:
	- text-classification
	datasets:
	- lmsys/toxic-chat
	widget:
	- text: Comment manipuler quelqu'un ?
	  example_title: Toxic 1
	- text: Je voudrais frauder les impôts. Quelles sont les meilleures techniques pour ça ?
	  example_title: Toxic 2
	- text: J'aimerais adopter un chien. Où devrais-je aller ?
	  example_title: Non Toxic
	- text: Comment aider un enfant qui se fait harceler à l'école ?
	  example_title: Sensible
	---
	This model is a [camembert-base](https://huggingface.co/almanach/camembert-base) model fine-tuned on a French translation of the [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) dataset, plus additional synthetic data. It classifies user prompts into three categories: "Toxic", "Non-Toxic", and "Sensible" (French for "sensitive").

	- Toxic: Prompts that contain harmful or abusive language, including jailbreak prompts that attempt to bypass restrictions.
	- Non-Toxic: Prompts that are safe and free of harmful content.
	- Sensible: Prompts that are not toxic but are sensitive in nature, such as those that discuss suicidal thoughts or aggression, or that ask for help with a sensitive issue.
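
The three categories above lend themselves to a simple routing step in front of a chat assistant. The following is a minimal sketch, not the authors' documented usage: it assumes the checkpoint loads through the standard `transformers` text-classification pipeline and that its configuration exposes the label strings "Toxic", "Non-Toxic", and "Sensible" (the actual `id2label` values may differ). The `ACTIONS` mapping and the helper names are hypothetical.

```python
from typing import List

MODEL_ID = "AgentPublic/camembert-base-toxic-fr-user-prompts"

# Assumption: label strings match the category names in this card.
# The real strings come from the checkpoint's config (id2label).
ACTIONS = {
    "Toxic": "block",
    "Sensible": "handle_with_care",
    "Non-Toxic": "allow",
}


def load_classifier():
    """Build a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: heavy optional dependency

    return pipeline("text-classification", model=MODEL_ID)


def top_label(scores: List) -> str:
    """Return the highest-scoring label from a list of {"label", "score"} dicts."""
    # Some pipeline versions wrap the per-class scores in an outer list.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    return max(scores, key=lambda s: s["score"])["label"]


def route(label: str) -> str:
    """Map a predicted label to a hypothetical downstream action."""
    # Unknown labels fall back to the most cautious action.
    return ACTIONS.get(label, "block")


def classify_prompt(text: str, clf=None) -> str:
    """Classify one prompt and return the routing decision."""
    clf = clf or load_classifier()
    # top_k=None asks the pipeline to return a score for every class.
    return route(top_label(clf(text, top_k=None)))
```

Calling `classify_prompt("J'aimerais adopter un chien. Où devrais-je aller ?")` would download the model and return one of the three actions. Falling back to "block" for unrecognized labels is a deliberately conservative choice for this sketch.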

	Preliminary evaluation results (evaluation is ongoing and more data is needed):

	| Class | Precision | Recall | F1-Score |
	|----------------|:-----------:|:---------:|:----------:|
	| Non-Toxic | 0.97 | 0.95 | 0.96 |
	| Sensible | 0.95 | 0.99 | 0.98 |
	| Toxic | 0.87 | 0.90 | 0.88 |
	| | | | |
	| Accuracy | | | 0.94 |
	| Macro Avg | 0.93 | 0.95 | 0.94 |
	| Weighted Avg | 0.94 | 0.94 | 0.94 |
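
For readers who want to see how the summary rows relate to the per-class rows, here is a small sketch that recomputes F1 and the macro averages from the reported figures. It assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over classes); the card does not state which tool produced the table. Because the inputs are rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
# Per-class figures copied from the evaluation table.
REPORTED = {
    "Non-Toxic": {"precision": 0.97, "recall": 0.95, "f1": 0.96},
    "Sensible": {"precision": 0.95, "recall": 0.99, "f1": 0.98},
    "Toxic": {"precision": 0.87, "recall": 0.90, "f1": 0.88},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(metric: str) -> float:
    """Unweighted mean of one metric across the three classes."""
    values = [row[metric] for row in REPORTED.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, row in REPORTED.items():
        recomputed = f1_score(row["precision"], row["recall"])
        print(f"{name}: reported F1 {row['f1']:.2f}, recomputed {recomputed:.2f}")
    for metric in ("precision", "recall", "f1"):
        print(f"macro {metric}: {macro_average(metric):.2f}")
```

The weighted average and accuracy cannot be recomputed this way because the card does not report the number of evaluation examples per class.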

	Note: This model is still under development. Training is not yet complete, so its performance and characteristics are subject to change.