RLAIF/sft-llama-3.1-8b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64 • Text Generation • Updated 10 days ago • 528
RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64 • Text Generation • Updated 10 days ago • 407
RLAIF/sft-gemma-2-9b-base-prm800k-correct-only-sft-format-lr-1e-06-bs-32 • Text Generation • Updated 13 days ago • 134
RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking • Updated 29 days ago • 41
RLAIF/15-w-error-masking-temp-0-verifier-in-context-train-in-context-inference-8-model • Updated Sep 30