Shengyi Costa Huang

vwxyzjn

http://costa.sh

AI & ML interests

None yet

Articles

How NuminaMath Won the 1st AIMO Progress Prize

Preference Optimization for Vision Language Models

Putting RL back in RLHF

Constitutional AI with Open LLMs

The N Implementation Details of RLHF with PPO

vwxyzjn's activity

upvoted a collection 5 months ago

RLOO / PPOv2 TL;DR summarize checkpoints

4 items • Updated Jun 11 • 1

upvoted a paper 6 months ago

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20 • 34

upvoted a paper 11 months ago

Exploiting Novel GPT-4 APIs

Paper • 2312.14302 • Published Dec 21, 2023 • 12

upvoted 3 papers about 1 year ago

Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

Paper • 2311.03736 • Published Nov 7, 2023 • 9

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 122

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Paper • 2310.00036 • Published Sep 29, 2023 • 2

upvoted 4 papers over 1 year ago

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 242

Learning to Model the World with Language

Paper • 2308.01399 • Published Jul 31, 2023 • 34

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

Paper • 2111.08819 • Published Nov 16, 2021 • 2

Secrets of RLHF in Large Language Models Part I: PPO

Paper • 2307.04964 • Published Jul 11, 2023 • 28