rbgo's Collections
LLM-Alignment Papers
Concrete Problems in AI Safety
Paper • arXiv:1606.06565
Paper • arXiv:1611.08219
Learning to summarize from human feedback
Paper • arXiv:2009.01325
Truthful AI: Developing and governing AI that does not lie
Paper • arXiv:2110.06674
Scaling Laws for Neural Language Models
Paper • arXiv:2001.08361
Training language models to follow instructions with human feedback
Paper • arXiv:2203.02155
Constitutional AI: Harmlessness from AI Feedback
Paper • arXiv:2212.08073
Discovering Language Model Behaviors with Model-Written Evaluations
Paper • arXiv:2212.09251
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
Paper • arXiv:2406.09264
Scalable AI Safety via Doubly-Efficient Debate
Paper • arXiv:2311.14125