ShieldGemma Release • Collection • A series of safety classifiers, trained on top of Gemma 2, for developers to filter inputs and outputs of their applications. • 3 items • Updated Jul 31 • 11
Gemma Scope Release • Collection • A comprehensive, open suite of sparse autoencoders for Gemma 2 2B and 9B. • 10 items • Updated Aug 11 • 13
In-Context Editing: Learning Knowledge from Self-Induced Distributions • Paper • 2406.11194 • Published Jun 17 • 15
Panacea: Pareto Alignment via Preference Adaptation for LLMs • Paper • 2402.02030 • Published Feb 3 • 10
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents • Paper • 2401.10568 • Published Jan 19 • 15
Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation • Paper • 2310.01320 • Published Oct 2, 2023 • 9