rbgo's Collections
LLM-Alignment Papers
Concrete Problems in AI Safety
Paper • arXiv:1606.06565
Paper • arXiv:1611.08219
Learning to summarize from human feedback
Paper • arXiv:2009.01325
Truthful AI: Developing and governing AI that does not lie
Paper • arXiv:2110.06674
Scaling Laws for Neural Language Models
Paper • arXiv:2001.08361
Training language models to follow instructions with human feedback
Paper • arXiv:2203.02155
Constitutional AI: Harmlessness from AI Feedback
Paper • arXiv:2212.08073
Discovering Language Model Behaviors with Model-Written Evaluations
Paper • arXiv:2212.09251
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
Paper • arXiv:2406.09264
Scalable AI Safety via Doubly-Efficient Debate
Paper • arXiv:2311.14125