Internal Consistency and Self-Feedback in Large Language Models: A Survey • arXiv:2407.14507 • Published Jul 19, 2024
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? • arXiv:2407.04842 • Published Jul 5, 2024
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning • arXiv:2407.00782 • Published Jun 30, 2024
Direct Preference Knowledge Distillation for Large Language Models • arXiv:2406.19774 • Published Jun 28, 2024
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning • arXiv:2407.00617 • Published Jun 30, 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs • arXiv:2406.18629 • Published Jun 26, 2024
Aligning Teacher with Student Preferences for Tailored Training Data Generation • arXiv:2406.19227 • Published Jun 27, 2024
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data • arXiv:2406.18790 • Published Jun 26, 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt • arXiv:2406.16377 • Published Jun 24, 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation • arXiv:2406.16855 • Published Jun 24, 2024
OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far? • arXiv:2406.16772 • Published Jun 24, 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges • arXiv:2406.12624 • Published Jun 18, 2024
Understanding Alignment in Multimodal LLMs: A Comprehensive Study • arXiv:2407.02477 • Published Jul 2, 2024
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning • arXiv:2407.18248 • Published Jul 25, 2024