-
Large Language Model Alignment: A Survey
Paper • 2309.15025 • Published • 2 -
Aligning Large Language Models with Human: A Survey
Paper • 2307.12966 • Published • 1 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 45 -
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Paper • 2310.05344 • Published • 1
Collections
Discover the best community collections!
Collections including paper arxiv:2203.11171
-
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Paper • 2310.01352 • Published • 7 -
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Paper • 2203.11171 • Published • 1 -
MemGPT: Towards LLMs as Operating Systems
Paper • 2310.08560 • Published • 6 -
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
Paper • 2310.06117 • Published • 3
-
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Paper • 2310.08740 • Published • 14 -
ExpeL: LLM Agents Are Experiential Learners
Paper • 2308.10144 • Published • 2 -
Demystifying GPT Self-Repair for Code Generation
Paper • 2306.09896 • Published • 19 -
Large Language Models are Better Reasoners with Self-Verification
Paper • 2212.09561 • Published • 1
-
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
Paper • 2310.04484 • Published • 5 -
Diversity of Thought Improves Reasoning Abilities of Large Language Models
Paper • 2310.07088 • Published • 5 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 75 -
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
Paper • 2310.13332 • Published • 14
-
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Paper • 2310.15123 • Published • 7 -
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 18 -
Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection
Paper • 2310.05035 • Published • 1 -
Chain-of-Thought Reasoning is a Policy Improvement Operator
Paper • 2309.08589 • Published • 1