stereoplegic's Collections
RL/Alignment
Moral Foundations of Large Language Models
Paper • 2310.15337 • Published • 1
Specific versus General Principles for Constitutional AI
Paper • 2310.13798 • Published • 2
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 24
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper • 2309.00267 • Published • 47
Self-Alignment with Instruction Backtranslation
Paper • 2308.06259 • Published • 40
Deep Reinforcement Learning from Hierarchical Weak Preference Feedback
Paper • 2309.02632 • Published • 1
A General Theoretical Paradigm to Understand Learning from Human Preferences
Paper • 2310.12036 • Published • 14
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Paper • 2310.00212 • Published • 2
Learning Optimal Advantage from Preferences and Mistaking it for Reward
Paper • 2310.02456 • Published • 1
Teaching Language Models to Self-Improve through Interactive Demonstrations
Paper • 2310.13522 • Published • 11
Chain-of-Thought Reasoning is a Policy Improvement Operator
Paper • 2309.08589 • Published • 1
MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models
Paper • 2310.12426 • Published • 1
Enable Language Models to Implicitly Learn Self-Improvement From Data
Paper • 2310.00898 • Published • 23
Tuna: Instruction Tuning using Feedback from Large Language Models
Paper • 2310.13385 • Published • 10
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Paper • 2310.11716 • Published • 5
CITING: Large Language Models Create Curriculum for Instruction Tuning
Paper • 2310.02527 • Published • 2
Towards Understanding Sycophancy in Language Models
Paper • 2310.13548 • Published • 4
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Paper • 2308.15812 • Published • 1
SALMON: Self-Alignment with Principle-Following Reward Models
Paper • 2310.05910 • Published • 2
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 5
Verbosity Bias in Preference Labeling by Large Language Models
Paper • 2310.10076 • Published • 2
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Paper • 2310.12773 • Published • 28
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper • 2309.11235 • Published • 16
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
Paper • 2310.05074 • Published • 1
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Paper • 2309.10691 • Published • 4
A Long Way to Go: Investigating Length Correlations in RLHF
Paper • 2310.03716 • Published • 9
Efficient RLHF: Reducing the Memory Usage of PPO
Paper • 2309.00754 • Published • 13
Aligning Language Models with Offline Reinforcement Learning from Human Feedback
Paper • 2308.12050 • Published • 1
Reward Model Ensembles Help Mitigate Overoptimization
Paper • 2310.02743 • Published • 1
SCREWS: A Modular Framework for Reasoning with Revisions
Paper • 2309.13075 • Published • 15
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Paper • 2310.03734 • Published • 14
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper • 2310.03714 • Published • 30
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Paper • 2310.03739 • Published • 21
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper • 2309.14525 • Published • 29
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 53
The Consensus Game: Language Model Generation via Equilibrium Search
Paper • 2310.09139 • Published • 12
Quality-Diversity through AI Feedback
Paper • 2310.13032 • Published • 1
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Paper • 2310.09520 • Published • 10
Controllable Text Generation with Residual Memory Transformer
Paper • 2309.16231 • Published • 1
Qwen Technical Report
Paper • 2309.16609 • Published • 34
Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 14
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 32
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Paper • 2308.05374 • Published • 27
SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge
Paper • 2308.12682 • Published • 2
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Paper • 2302.02662 • Published • 1
Natural Logic-guided Autoregressive Multi-hop Document Retrieval for Fact Verification
Paper • 2212.05276 • Published • 1
Aligning Large Language Models with Human: A Survey
Paper • 2307.12966 • Published • 1
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 121
Statistical Rejection Sampling Improves Preference Optimization
Paper • 2309.06657 • Published • 13
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Paper • 2305.03047 • Published • 1
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Paper • 2309.02301 • Published • 1
TouchStone: Evaluating Vision-Language Models by Language Models
Paper • 2308.16890 • Published • 1
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Paper • 2309.07462 • Published • 3
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper • 2309.10202 • Published • 9
VIGC: Visual Instruction Generation and Correction
Paper • 2308.12714 • Published • 1
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Paper • 2310.11971 • Published • 1
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 75
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
Paper • 2308.04275 • Published • 1
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 47
Beyond Reward: Offline Preference-guided Policy Optimization
Paper • 2305.16217 • Published • 1
Decentralized Policy Optimization
Paper • 2211.03032 • Published • 1
Large Language Models are not Fair Evaluators
Paper • 2305.17926 • Published • 1
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Paper • 2305.10425 • Published • 5
Don't throw away your value model! Making PPO even better via Value-Guided Monte-Carlo Tree Search decoding
Paper • 2309.15028 • Published • 1
Improving Language Models with Advantage-based Offline Policy Gradients
Paper • 2305.14718 • Published • 2
Large Language Models Cannot Self-Correct Reasoning Yet
Paper • 2310.01798 • Published • 33
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper • 2309.10150 • Published • 24
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Paper • 2207.01780 • Published • 1
RLTF: Reinforcement Learning from Unit Test Feedback
Paper • 2307.04349 • Published • 4
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Paper • 2306.01693 • Published • 3
Fine-tuning Language Models with Generative Adversarial Feedback
Paper • 2305.06176 • Published • 1
Aligning Large Language Models through Synthetic Feedback
Paper • 2305.13735 • Published • 1
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Paper • 2306.02231 • Published • 2
Reinforced Self-Training (ReST) for Language Modeling
Paper • 2308.08998 • Published • 2
SuperHF: Supervised Iterative Learning from Human Feedback
Paper • 2310.16763 • Published • 1
Split and Merge: Aligning Position Biases in Large Language Model based Evaluators
Paper • 2310.01432 • Published • 1
Generative Judge for Evaluating Alignment
Paper • 2310.05470 • Published • 1
Personas as a Way to Model Truthfulness in Language Models
Paper • 2310.18168 • Published • 5
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications
Paper • 2310.17750 • Published • 9
RRAML: Reinforced Retrieval Augmented Machine Learning
Paper • 2307.12798 • Published • 1
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach
Paper • 2306.03604 • Published • 1
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
Paper • 2209.07676 • Published • 2
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Paper • 2310.03693 • Published • 1
Evaluating the Moral Beliefs Encoded in LLMs
Paper • 2307.14324 • Published • 1
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Paper • 2209.12106 • Published • 1
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Paper • 2308.09662 • Published • 3
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Paper • 2305.17701 • Published • 1
A Survey on Fairness in Large Language Models
Paper • 2308.10149 • Published • 1
Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction
Paper • 2305.06474 • Published • 1
Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models
Paper • 2307.11137 • Published • 1
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Paper • 2310.20624 • Published • 12
ExpeL: LLM Agents Are Experiential Learners
Paper • 2308.10144 • Published • 2
Sociotechnical Safety Evaluation of Generative AI Systems
Paper • 2310.11986 • Published
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Paper • 2307.15217 • Published • 36
Secrets of RLHF in Large Language Models Part I: PPO
Paper • 2307.04964 • Published • 28
Demystifying GPT Self-Repair for Code Generation
Paper • 2306.09896 • Published • 19
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 15
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Paper • 2307.16039 • Published • 4
A Mixture-of-Expert Approach to RL-based Dialogue Management
Paper • 2206.00059 • Published • 1
"Pick-and-Pass" as a Hat-Trick Class for First-Principle Memory, Generalizability, and Interpretability Benchmarks
Paper • 2310.20654 • Published • 1
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Paper • 2212.08061 • Published • 1
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Paper • 2309.14320 • Published • 1
Lifelong Inverse Reinforcement Learning
Paper • 2207.00461 • Published • 1
Improving Code Generation by Training with Natural Language Feedback
Paper • 2303.16749 • Published • 1
Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper • 2311.02805 • Published • 3
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 16
B-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
Paper • 2310.03173 • Published • 1
Towards Anytime Fine-tuning: Continually Pre-trained Language Models with Hypernetwork Prompt
Paper • 2310.13024 • Published • 1
Multi-Task Recommendations with Reinforcement Learning
Paper • 2302.03328 • Published • 1
Curriculum-based Asymmetric Multi-task Reinforcement Learning
Paper • 2211.03352 • Published • 1
Efficient Training of Multi-task Combinarotial Neural Solver with Multi-armed Bandits
Paper • 2305.06361 • Published • 1
Rethinking Decision Transformer via Hierarchical Reinforcement Learning
Paper • 2311.00267 • Published • 1
Pre-training with Synthetic Data Helps Offline Reinforcement Learning
Paper • 2310.00771 • Published • 2
Guiding Pretraining in Reinforcement Learning with Large Language Models
Paper • 2302.06692 • Published • 1
Large Language Model Alignment: A Survey
Paper • 2309.15025 • Published • 2
Making Large Language Models Better Reasoners with Alignment
Paper • 2309.02144 • Published • 2
Pretraining in Deep Reinforcement Learning: A Survey
Paper • 2211.03959 • Published • 1
Reinforcement Learning for Generative AI: A Survey
Paper • 2308.14328 • Published • 1
d3rlpy: An Offline Deep Reinforcement Learning Library
Paper • 2111.03788 • Published • 1
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
Paper • 2306.13085 • Published • 1
Efficient Online Reinforcement Learning with Offline Data
Paper • 2302.02948 • Published • 2
Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
Paper • 2306.06871 • Published • 1
A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning
Paper • 2306.07541 • Published • 1
A Dataset Perspective on Offline Reinforcement Learning
Paper • 2111.04714 • Published • 1
Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
Paper • 2307.03406 • Published • 1
Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
Paper • 2210.06518 • Published • 1
Mildly Constrained Evaluation Policy for Offline Reinforcement Learning
Paper • 2306.03680 • Published • 1
Conservative State Value Estimation for Offline Reinforcement Learning
Paper • 2302.06884 • Published • 1
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Paper • 2305.09836 • Published • 3
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Paper • 2211.11092 • Published • 1
A learning gap between neuroscience and reinforcement learning
Paper • 2104.10995 • Published • 1
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
Paper • 2211.11363 • Published • 1
RLocator: Reinforcement Learning for Bug Localization
Paper • 2305.05586 • Published • 1
Beyond Words: A Mathematical Framework for Interpreting Large Language Models
Paper • 2311.03033 • Published • 1
LoopTune: Optimizing Tensor Computations with Reinforcement Learning
Paper • 2309.01825 • Published • 1
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Paper • 2308.13387 • Published • 1
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
Paper • 2211.15457 • Published • 1
Offline Experience Replay for Continual Offline Reinforcement Learning
Paper • 2305.13804 • Published • 1
Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning
Paper • 2301.11063 • Published • 1
Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published • 28
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Paper • 2308.12014 • Published • 1
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Paper • 2308.03188 • Published • 2
When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming
Paper • 2306.04930 • Published • 3
Appropriateness is all you need!
Paper • 2304.14553 • Published • 1
LLM Cognitive Judgements Differ From Human
Paper • 2307.11787 • Published • 1
Fake Alignment: Are LLMs Really Aligned Well?
Paper • 2311.05915 • Published • 2
LLM Augmented Hierarchical Agents
Paper • 2311.05596 • Published • 1
Self-driven Grounding: Large Language Model Agents with Automatical Language-aligned Skill Learning
Paper • 2309.01352 • Published • 1
Introspective Tips: Large Language Model for In-Context Decision Making
Paper • 2305.11598 • Published • 1
SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning
Paper • 2305.15486 • Published • 1
Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective
Paper • 2311.08487 • Published • 1
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
Paper • 2311.09528 • Published • 2
Learning to Prune Deep Neural Networks via Reinforcement Learning
Paper • 2007.04756 • Published • 1
Training Language Models with Language Feedback at Scale
Paper • 2303.16755 • Published • 1
Recomposing the Reinforcement Learning Building Blocks with Hypernetworks
Paper • 2106.06842 • Published • 1
Continual Model-Based Reinforcement Learning with Hypernetworks
Paper • 2009.11997 • Published • 1
Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators
Paper • 2306.01242 • Published • 2
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
Paper • 2312.14878 • Published • 13
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
Paper • 2401.02072 • Published • 9
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Paper • 2312.00849 • Published • 8
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 47
Paper • 2312.07000 • Published • 11
Pearl: A Production-ready Reinforcement Learning Agent
Paper • 2312.03814 • Published • 14
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Paper • 2311.08692 • Published • 12
Trusted Source Alignment in Large Language Models
Paper • 2311.06697 • Published • 10
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 64
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 143
Aligning Large Language Models with Human Preferences through Representation Engineering
Paper • 2312.15997 • Published • 1
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
Paper • 2401.07382 • Published • 2
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • Published • 25
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Paper • 2311.14543 • Published • 1
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 21
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper • 2402.01391 • Published • 41
Efficient Exploration for LLMs
Paper • 2402.00396 • Published • 21
West-of-N: Synthetic Preference Generation for Improved Reward Modeling
Paper • 2401.12086 • Published • 1
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
Paper • 2401.16635 • Published • 1
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Paper • 2401.00243 • Published • 1
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Paper • 2401.16335 • Published • 1
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 11
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
Paper • 2312.09244 • Published • 7
Can LLMs Follow Simple Rules?
Paper • 2311.04235 • Published • 10
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Paper • 2311.05584 • Published • 1
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Paper • 2402.08609 • Published • 34
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 69
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 27
In deep reinforcement learning, a pruned network is a good network
Paper • 2402.12479 • Published • 17
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Paper • 2402.14688 • Published
WildChat: 1M ChatGPT Interaction Logs in the Wild
Paper • 2405.01470 • Published • 59
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 24
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Paper • 2405.01481 • Published • 25
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 24
Small Language Model Can Self-correct
Paper • 2401.07301 • Published
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper • 2406.00888 • Published • 30
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
Paper • 2406.00605 • Published • 2
WPO: Enhancing RLHF with Weighted Preference Optimization
Paper • 2406.11827 • Published • 14
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Paper • 2406.08973 • Published • 85