metadata
base_model: meta-llama/Llama-3.2-1B
library_name: peft

Model Card for dpo_model

This model is fine-tuned from meta-llama/Llama-3.2-1B using Direct Preference Optimization (DPO). It was trained on preference data to better align its responses with user preferences and to improve interaction quality across a range of conversational scenarios.

Model Details

Model Description

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, adapted with Direct Preference Optimization (DPO). DPO trains directly on pairs of preferred and rejected responses to the same prompt, without fitting a separate reward model. The goal of this fine-tuning was to make the model respond to user queries more effectively and empathetically. It was optimized on user preference data using conversational modeling techniques, with the aim of improving natural language understanding in dialogue.
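The DPO objective described above can be illustrated with a minimal, dependency-free sketch of the standard sigmoid DPO loss for a single preference pair. The inputs are summed log-probabilities of the chosen and rejected responses under the trainable policy and under the frozen reference model (here, the base meta-llama/Llama-3.2-1B). The function names and the default `beta = 0.1` are assumptions for illustration; this card does not report which β was used or which training library produced the model.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # ASSUMPTION: beta is not reported in this card
) -> float:
    """Sigmoid DPO loss for a single (chosen, rejected) pair.

    Each *_logp argument is the summed log-probability of a full response,
    under either the trainable policy or the frozen reference model.
    """
    # Implicit rewards: scaled log-ratio of policy vs. reference for each response
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2)
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

When the policy matches the reference, the margin is zero and the loss equals log 2 (about 0.693); raising the chosen response's log-probability relative to the reference lowers the loss, while raising the rejected one increases it.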

  • Developed by: Haocheng Fan
  • Model type: Causal Language Model
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: meta-llama/Llama-3.2-1B

Training Hyperparameters

  • Training regime: Mixed precision (fp16)
  • Learning rate: 2e-4
  • Batch size: 8
  • Number of epochs: 3
  • Optimizer: 8-bit AdamW
  • Max sequence length: 512 tokens
  • Warmup steps: 100
  • Weight decay: 0.01
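The hyperparameters listed above can be collected into a small config object. This is a minimal sketch, assuming a linear warmup from 0 to the peak learning rate over the 100 warmup steps; the card does not state the schedule after warmup, so the sketch holds the rate constant (a hypothetical choice). The class name, field names, and the `total_steps` helper are illustrative and not taken from the author's training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DPOHyperparams:
    # Values from the "Training Hyperparameters" list; field names are hypothetical
    learning_rate: float = 2e-4
    batch_size: int = 8
    num_epochs: int = 3
    max_seq_length: int = 512  # tokens
    warmup_steps: int = 100
    weight_decay: float = 0.01
    fp16: bool = True  # mixed-precision training
    optimizer: str = "8-bit AdamW"  # descriptive label only

    def lr_at_step(self, step: int) -> float:
        """Learning rate after `step` optimizer updates.

        Linear warmup from 0 to `learning_rate` over `warmup_steps`.
        ASSUMPTION: the card does not say what happens after warmup,
        so this sketch simply holds the rate constant.
        """
        if step < self.warmup_steps:
            return self.learning_rate * step / self.warmup_steps
        return self.learning_rate

    def total_steps(self, num_pairs: int) -> int:
        # Hypothetical helper: assumes one optimizer step per batch
        # (no gradient accumulation) and keeps a partial final batch.
        steps_per_epoch = -(-num_pairs // self.batch_size)  # ceiling division
        return steps_per_epoch * self.num_epochs


if __name__ == "__main__":
    cfg = DPOHyperparams()
    for step in (0, 50, 100, 1000):
        print(step, cfg.lr_at_step(step))
    # Training-set size here is made up, purely to show the arithmetic
    print(cfg.total_steps(num_pairs=1000))
```

With a hypothetical 1,000 preference pairs, this gives 125 steps per epoch and 375 steps in total, so the 100 warmup steps would cover roughly the first quarter of training.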

Model Weights

The model weights are available at the following link: model weights

Uses

Direct Use

This model can be used for general conversational tasks and natural language understanding. It is optimized for dialogue and Q&A scenarios in which responses should reflect user preferences.

Out-of-Scope Use

This model is not intended for high-risk applications, such as healthcare diagnostics, legal advice, or any other scenarios that require certified professional expertise. Misuse, such as generating harmful or biased content, should be avoided.

Bias, Risks, and Limitations

As with any language model, there are risks of biased outputs, especially if the fine-tuning data contains bias. The model can generate unintended or harmful responses based on the input it receives. Users should be cautious when deploying this model in sensitive applications.

Recommendations

Users should conduct thorough testing and validation to identify any biases or risks in the model's responses, especially in critical environments where high accuracy or fairness is required.

How to Get Started with the Model

To use the model, follow the code example below:

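from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model from the Hugging Face Hub.
# "your-username/my-dpo-model" is a placeholder; replace it with the actual repository ID.
# The metadata lists library_name: peft, so the repository likely holds a PEFT adapter;
# install the `peft` package so transformers can attach it to meta-llama/Llama-3.2-1B.
model_id = "your-username/my-dpo-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize the prompt; passing the full encoding also supplies the attention mask
input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate up to 50 new tokens; Llama tokenizers define no pad token, so reuse EOS
output = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)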