
Model Introduction

  • Goal: DPO training on the model
  • Base model: Mistral-7B
  • Training data: Intel/orca_dpo_pairs (one epoch over the full dataset)
  • GPU: a single RTX 4090 (24 GB)
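
For readers unfamiliar with DPO, the objective can be sketched for a single preference pair in plain Python. This is a minimal illustration of the standard DPO loss, not the training code used for this model: the function name `dpo_loss` and the default `beta=0.1` are assumptions made for the example, and the inputs are summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    beta=0.1 is an illustrative default, not a value taken from this model card.
    """
    # How much more the policy favors each response than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow for large |x|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-3.0, -3.0, -3.0, -3.0))
# When the policy favors the chosen response more than the reference does, the loss drops.
print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

The loss only depends on how the policy's preference differs from the reference model's, which is why DPO needs a frozen copy of the base model (or cached reference log-probabilities) during training.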

Usage

import torch
import transformers
from transformers import AutoTokenizer

model = "snowfly/Mistral-7B-orca_dpo_pairs"

# Format the prompt with the tokenizer's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create the text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,  # the published weights are FP16
    device_map="auto",          # places the model on a GPU if available; requires `accelerate`
)

# Generate text (max_length counts the prompt tokens as well as the new tokens)
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

To be continued

Issues Encountered During the Experiment

The experiment used the following settings:

  • per_device_train_batch_size=2
  • gradient_accumulation_steps=2
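
To make the consequence of these two settings concrete, the number of preference pairs contributing to each optimizer update can be computed as below. This is a small sketch: the helper name `effective_batch_size` and the `num_devices` parameter are introduced here for illustration, and the 4-GPU case is hypothetical.

```python
def effective_batch_size(per_device_train_batch_size,
                         gradient_accumulation_steps,
                         num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# Settings from this experiment on a single GPU.
print(effective_batch_size(2, 2, num_devices=1))  # 4

# Hypothetical: the same settings on a 4-GPU node.
print(effective_batch_size(2, 2, num_devices=4))  # 16
```

With a single GPU, each update therefore averages gradients over only 4 preference pairs.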

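Because each gradient update uses only a small amount of data, the loss oscillated sharply early in training and settled after step 170. From then until the end of the single epoch, the loss stayed roughly flat with no clear further decrease.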
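
One common way to read a noisy loss curve like the one described above is to smooth it with an exponential moving average before judging whether it has plateaued. The sketch below is an illustration only: `ema_smooth`, the smoothing factor `alpha=0.1`, and the sample loss values are all made up for the example and are not logs from this run.

```python
def ema_smooth(values, alpha=0.1):
    """Exponential moving average; smaller alpha gives heavier smoothing."""
    smoothed = []
    running = None
    for v in values:
        # Seed the average with the first value, then blend in each new one.
        running = v if running is None else alpha * v + (1 - alpha) * running
        smoothed.append(running)
    return smoothed


# Hypothetical per-step losses that bounce around early in training.
raw_losses = [0.69, 0.95, 0.41, 0.88, 0.37, 0.72, 0.45, 0.61, 0.50, 0.55]
for step, (raw, smooth) in enumerate(zip(raw_losses, ema_smooth(raw_losses)), start=1):
    print(f"step {step:2d}  raw={raw:.2f}  smoothed={smooth:.3f}")
```

Comparing the smoothed curve over a window of steps gives a steadier signal than individual step losses when each update sees only a few examples.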

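Future Work

  • Adjust training with more GPU memory (single-node multi-GPU, multi-node multi-GPU), more epochs, and other parameters
  • Evaluate performance after training across different models (training dataset quality, model performance, etc.)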

Safetensors: model size 7.24B params, tensor type FP16
