Pythia-hh-all-sft-dpo - a lomahony Collection

Pythia-hh-all-sft-dpo

updated Mar 12

Pythia models fine-tuned with supervised fine-tuning (SFT) and with Direct Preference Optimization (DPO) on the full Anthropic hh-rlhf dataset for 1 epoch.
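As a rough illustration of how one of these checkpoints might be used, the sketch below builds a repo ID from a model size and training stage, then loads it with the Hugging Face `transformers` library. Several things here are assumptions rather than facts from this page: the helper names `repo_id`, `format_prompt`, and `generate` are hypothetical, each repo is assumed to ship its own tokenizer files, and the `Human:`/`Assistant:` prompt layout is borrowed from the hh-rlhf dataset format and may not match exactly what was used in training.

```python
# Minimal sketch, not the collection author's code.
# Sizes and stages are taken from the repo names listed in this collection.
SIZES = ("70m", "160m", "410m", "2.8b", "6.9b", "12b")
STAGES = ("sft", "dpo")


def repo_id(size: str, stage: str) -> str:
    """Build a repo ID following the naming pattern seen in this collection."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}, expected one of {SIZES}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {STAGES}")
    return f"lomahony/eleuther-pythia{size}-hh-{stage}"


def format_prompt(user_message: str) -> str:
    """Wrap a message in the hh-rlhf dialogue layout (assumed, see lead-in)."""
    return f"\n\nHuman: {user_message}\n\nAssistant:"


def generate(size: str, stage: str, user_message: str, max_new_tokens: int = 64) -> str:
    """Download a checkpoint and generate a reply (needs network access)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size, stage)
    # Assumption: the repo includes tokenizer files alongside the weights.
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer(format_prompt(user_message), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Example call (downloads the smallest DPO checkpoint):
# print(generate("70m", "dpo", "How do I bake bread?"))
```

Calling `generate` with the same prompt for the `sft` and `dpo` variants of one size is a simple way to compare the two training stages side by side. The smaller checkpoints are the practical choice for a quick check on a CPU.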

lomahony/eleuther-pythia160m-hh-sft

Text Generation • Updated Aug 12, 2023 • 692
lomahony/eleuther-pythia2.8b-hh-sft

Text Generation • Updated Aug 12, 2023 • 21
lomahony/eleuther-pythia410m-hh-sft

Text Generation • Updated Aug 12, 2023 • 24
lomahony/eleuther-pythia6.9b-hh-dpo

Text Generation • Updated Aug 12, 2023 • 16
lomahony/eleuther-pythia70m-hh-sft

Text Generation • Updated Aug 12, 2023 • 29
lomahony/eleuther-pythia12b-hh-sft0

Text Generation • Updated Aug 12, 2023 • 20
lomahony/eleuther-pythia12b-hh-sft

Text Generation • Updated Aug 31, 2023 • 12
lomahony/eleuther-pythia70m-hh-dpo

Text Generation • Updated Aug 12, 2023 • 627
lomahony/eleuther-pythia160m-hh-dpo

Text Generation • Updated Aug 12, 2023 • 292
lomahony/eleuther-pythia410m-hh-dpo

Text Generation • Updated Aug 12, 2023 • 770
lomahony/eleuther-pythia2.8b-hh-dpo

Text Generation • Updated Aug 12, 2023 • 18 • 1
lomahony/eleuther-pythia12b-hh-dpo

Text Generation • Updated Aug 31, 2023 • 13
lomahony/eleuther-pythia6.9b-hh-sft

Text Generation • Updated Aug 12, 2023 • 176