Model Details
Original LlamaGuard 7b model can be found here
Llama-Guard is a 7B parameter Llama 2-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe/unsafe, and if unsafe based on a policy, it also lists the violating subcategories. Here is an example:
These are exl2 4.65bpw quantized weights. Original 7B model performs on binary classification of 2k toxic chat test examples Precision: 0.9, Recall: 0.277, F1 Score: 0.424
4.0bpw performs Precision: 0.92, Recall: 0.246, F1 Score: 0.389
4.65bpw performs Precision: 0.903, Recall: 0.256, F1 Score: 0.400