A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO.
Collections: 1
Models: 62
trl-lib/qwen1.5-1.8b-dpo-cli
trl-lib/qwen1.5-0.5b-sft • Text Generation • 11
trl-lib/qwen1.5-1.8b-sft • Text Generation • 195 • 4
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.9-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.8-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.7-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.6-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.5-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.4-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.3-steps-800