Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems Paper • 2410.13334 • Published 15 days ago • 12
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models Paper • 2410.01524 • Published 30 days ago • 3