metadata
license: mit
language:
- en
- zh
Introduction
This is the ShieldLM model (paper link), initialized from internlm2-chat-7b. ShieldLM is a bilingual (Chinese and English) safety detector whose main purpose is to help detect safety issues in LLM generations. It aligns with general human safety standards, supports fine-grained customizable detection rules, and provides explanations for its decisions. Refer to our GitHub repository for more detailed information.
Usage
Please refer to our GitHub repository for detailed usage instructions.
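As a rough orientation only, the following is a minimal sketch of loading a causal language model checkpoint such as this one with Hugging Face `transformers` and generating a response. It is not the official usage script. The function names `load_shieldlm` and `generate_analysis` are hypothetical, the model path must be supplied by the caller, and the sketch assumes that the checkpoint needs `trust_remote_code=True` because it is initialized from internlm2-chat-7b. The exact prompt template, including how custom detection rules are passed, is defined in the GitHub repository and is not reproduced here.

```python
def load_shieldlm(model_path: str):
    """Load a tokenizer and model from a local path or hub ID given by the caller.

    Hypothetical helper: the official loading code lives in the GitHub repository.
    """
    # Imported lazily so that the heavy dependency is only needed when loading.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: InternLM2-based checkpoints ship custom modeling code,
    # so remote code must be trusted explicitly.
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, trust_remote_code=True, torch_dtype="auto"
    )
    model.eval()
    return tokenizer, model


def generate_analysis(tokenizer, model, prompt: str, max_new_tokens: int = 512) -> str:
    """Run greedy decoding on an already formatted prompt and return only the new text.

    Assumption: `prompt` was built with the template from the GitHub repository.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so that only the generated continuation is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call would look like `tokenizer, model = load_shieldlm(path)` followed by `generate_analysis(tokenizer, model, prompt)`, where `path` and `prompt` come from the repository's instructions.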
Performance
ShieldLM demonstrates impressive detection performance across 4 in-distribution (ID) and out-of-distribution (OOD) test sets, compared with strong baselines such as GPT-4, Llama Guard, and Perspective API. Refer to our paper for more detailed evaluation results.