RLHFlow's Collections
SFT Models
Updated 2 days ago
We train a series of SFT models on RLHFlow's high-quality SFT dataset for research purposes.
RLHFlow/LLaMA3-SFT • Text Generation • Updated 1 day ago • 8.44k • 7
RLHFlow/RLHFlow-SFT-Dataset-ver2 • Viewer • Updated 3 days ago • 2.32M • 6 • 1
RLHFlow/LLaMA3-SFT-v2 • Text Generation • Updated 1 day ago • 1.45k
RLHFlow/Llama3-SFT-v2.0-epoch1 • Text Generation • Updated 1 day ago
RLHFlow/Llama3-SFT-v2.0-epoch2 • Text Generation • Updated 1 day ago • 1
RLHFlow/Llama3-SFT-v2.0-epoch3 • Text Generation • Updated 1 day ago