We collect the open-source datasets and process them into the standard format.
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
models
9
RLHFlow/LLaMA3-SFT-v2
Text Generation
•
Updated
•
84
RLHFlow/LLaMA3.2-3B-SFT
Text Generation
•
Updated
•
37
RLHFlow/LLaMA3.2-1B-SFT
Text Generation
•
Updated
•
12
RLHFlow/ArmoRM-Llama3-8B-v0.1
Text Classification
•
Updated
•
38.9k
•
133
RLHFlow/LLaMA3-iterative-DPO-final
Text Generation
•
Updated
•
4.58k
•
41
RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation
•
Updated
•
309
•
33
RLHFlow/LLaMA3-SFT
Text Generation
•
Updated
•
5.01k
•
7
RLHFlow/DPA-v1-Mistral-7B
Text Generation
•
Updated
•
21
•
1
RLHFlow/RewardModel-Mistral-7B-for-DPA-v1
Text Classification
•
Updated
•
619
•
1
datasets
43
RLHFlow/Llama3-SFT-RAFT-Ultrafeedback-iter1
Viewer
•
Updated
•
20k
•
10
RLHFlow/ultrafeedback_iter3
Viewer
•
Updated
•
19.6k
•
8
RLHFlow/ultrafeedback_iter2
Viewer
•
Updated
•
20k
•
8
RLHFlow/ultrafeedback_iter1
Viewer
•
Updated
•
20k
•
17
RLHFlow/pair-preference-Skywork-80K-v0.1
Viewer
•
Updated
•
82k
•
296
RLHFlow/ArmoRM-Multi-Objective-Data-v0.2
Viewer
•
Updated
•
555k
•
22
RLHFlow/ArmoRM-Multi-Objective-Data-v0.1
Viewer
•
Updated
•
569k
•
55
RLHFlow/pair_data_v2_80K_wsafety_short
Viewer
•
Updated
•
790k
•
18
RLHFlow/pair_data_v2_78_wo_safety
Viewer
•
Updated
•
777k
•
2
RLHFlow/pair_data_v2_80K_wsafety
Viewer
•
Updated
•
803k
•
80
•
1