trl-lib (TRL)

spaces 2

⚒️ TextEnvironments • Sleeping • 10

🦙 StackLLaMa • Runtime error • 213

datasets 15

trl-lib/prm800k • Updated about 18 hours ago • 41.2k • 20

trl-lib/rlaif-v • Updated Sep 27 • 83.1k • 91 • 1

trl-lib/Capybara-Preferences • Updated Sep 19 • 15.4k • 50

trl-lib/Capybara • Updated Sep 19 • 16k • 1.05k

trl-lib/ultrafeedback-prompt • Updated Sep 16 • 39.8k • 1.07k • 2

trl-lib/tldr • Updated Sep 12 • 130k • 1.61k

trl-lib/ultrafeedback_binarized • Updated Sep 12 • 63.1k • 4.06k • 3

trl-lib/lm-human-preferences-sentiment • Updated Sep 10 • 6.26k • 52

trl-lib/lm-human-preferences-descriptiveness • Updated Sep 10 • 6.26k • 37

trl-lib/tldr-preference • Updated Sep 10 • 179k • 86

Collections 2

teknium/OpenHermes-2.5-Mistral-7B

Intel/orca_dpo_pairs

trl-lib/OpenHermes-2-Mistral-7B-ipo-beta-0.1-steps-200

trl-lib/OpenHermes-2-Mistral-7B-ipo-beta-0.2-steps-200

trl-lib/pythia-1b-deduped-tldr-online-dpo

trl-lib/pythia-1b-deduped-tldr-sft

trl-lib/pythia-6.9b-deduped-tldr-online-dpo

trl-lib/pythia-2.8b-deduped-tldr-sft

models 80

trl-lib/Qwen2-0.5B-XPO

trl-lib/Qwen2-0.5B-OnlineDPO

trl-lib/Qwen2-0.5B-KTO

trl-lib/Qwen2-0.5B-ORPO

trl-lib/Qwen2-0.5B-DPO

trl-lib/Qwen2-0.5B-Reward

trl-lib/pythia-1b-deduped-tldr-rm

trl-lib/pythia-2.8b-deduped-tldr-online-dpo

trl-lib/pythia-6.9b-deduped-tldr-offline-dpo

trl-lib/pythia-2.8b-deduped-tldr-offline-dpo

Team members 8
