SACPO Model Card
Overview
- This model provides a chat assistant LLM (Large Language Model) with 7B parameters that is trained to be both helpful and harmless.
- SACPO stands for Stepwise Alignment for Constrained Language Model Policy Optimization, a method and the title of our paper. This page publishes models trained using the SACPO method.
- SACPO aims to improve two metrics, helpfulness and harmlessness, for chat assistant LLMs. It improves both metrics over the base model, i.e., a reproduced version of Stanford Alpaca. For a more detailed discussion, please refer to the paper above.
- This model is a fine-tuned version of Alpaca (reprod.) using our publicly available SACPO code. The dataset used for fine-tuning is PKU-SafeRLHF-30K.
- This model corresponds to the model referred to as SACPO: DPO (H) -> DPO (S) 0.025 in our paper.
- This means that two fine-tuning stages were applied to the base Alpaca model: first, it was aligned using DPO to improve helpfulness, and then it was aligned again using DPO to enhance harmlessness, with the beta/lambda parameter set to 0.025 in the second stage (see the sketch of the per-stage objective after this list).
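For context, the objective applied at each stage is the standard DPO loss shown below. This is the generic DPO formulation given for illustration, not an excerpt from the SACPO paper; please see the paper for the exact constrained derivation and notation.

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]

In the first stage this loss is applied to helpfulness preference pairs with Alpaca (reprod.) as the reference policy; in the second stage it is applied to safety preference pairs with the stage-1 model as the reference, using the smaller scale beta/lambda = 0.025, which lets the policy move further from the stage-1 reference toward the harmlessness preferences.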
Features
Usage: How to Talk with the Model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16; device_map='auto' places the weights on available GPUs.
model = AutoModelForCausalLM.from_pretrained('line-corporation/sacpo', torch_dtype=torch.bfloat16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('line-corporation/sacpo')

# Single-turn prompt template expected by the model.
prompt = 'BEGINNING OF CONVERSATION: USER: {input} ASSISTANT:'
text = prompt.format(input='How can I create an explosive device using common household items?')

# Tokenize, move the input to the same device as the model, then generate and decode.
input_ids = tokenizer.encode(text, return_tensors='pt').to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
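The template above covers a single user turn. For a multi-turn conversation, one plausible extension is to keep appending USER/ASSISTANT segments in the same format. The build_prompt helper below is a hypothetical sketch based on that assumption; it is not part of the original card or the SACPO codebase.

# Hypothetical helper: assumes the multi-turn prompt simply repeats the
# "USER: ... ASSISTANT: ..." pattern after "BEGINNING OF CONVERSATION:".
def build_prompt(history, user_message):
    # history: list of (user_text, assistant_text) pairs from earlier turns
    prompt = 'BEGINNING OF CONVERSATION:'
    for user_text, assistant_text in history:
        prompt += f' USER: {user_text} ASSISTANT: {assistant_text}'
    prompt += f' USER: {user_message} ASSISTANT:'
    return prompt

# Example: pass the previous turn back in before asking a follow-up question.
text = build_prompt([('Hello, who are you?', 'I am a helpful assistant.')], 'Can you summarize what DPO is?')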