arxiv:2405.07863
Wei Xiong
weqweasdas
AI & ML interests: Machine learning, RLHF
Models (23)
weqweasdas/zephyr-7b-dpo-full · Text Generation · 6
weqweasdas/zephyr-7b-gemma-dpo
weqweasdas/zephyr-7b-sft-full
weqweasdas/zephyr-7b-dpo-qlora
weqweasdas/gpt2-cpt-dutch · Text Generation · 16
weqweasdas/zephyr-7b-gemma-sft
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6_weight085 · Text Generation · 5
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6 · Text Generation · 6
weqweasdas/raft_baseline_zephyr_packing_model6 · Text Generation · 3
weqweasdas/raft_baseline_openchat_llama13b_model1 · Text Generation · 5
Datasets (60)
weqweasdas/prm_processed · 445k
weqweasdas/henry700k_rm_no_coding_and_math · 265k · 8
weqweasdas/prm_conversation · 445k · 16
weqweasdas/uf_rm_no_coding_and_math · 141k · 8
weqweasdas/ultra_train_no_math · 39.7k
weqweasdas/dart_math · 591k · 128
weqweasdas/prm_math_prompt · 709k
weqweasdas/prm_gsm8k_prompt · 394k
weqweasdas/ultra_train · 59.6k · 1
weqweasdas/gemma2_9b_math_iter2_prompt · 627k