---
base_model: meta-llama/Llama-3.2-1B
library_name: peft
---

# Model Card for Model ID

This model is a fine-tuned version of `meta-llama/Llama-3.2-1B`, trained with Direct Preference Optimization (DPO). It was trained to align its responses with user preferences and to improve interaction quality in conversational scenarios.

## Model Details

### Model Description

This model adapts `meta-llama/Llama-3.2-1B` using Direct Preference Optimization (DPO). The goal of the fine-tuning was to make the model respond more effectively and empathetically to user queries. It was optimized on user preference data for conversational use.

- **Developed by:** Haocheng Fan
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** meta-llama/Llama-3.2-1B

### Model Sources

- **Repository:** [Link to the Hugging Face model repo]
- **Paper [optional]:** [If applicable, link to related research or documentation]
- **Demo [optional]:** [If a demo exists, provide a link]

## Uses

### Direct Use

This model can be used for general conversational tasks and natural language understanding. It is optimized for dialogue and Q&A scenarios, where user preferences matter for generating responses.

### Out-of-Scope Use

This model is not intended for high-risk applications such as healthcare diagnostics, legal advice, or other scenarios that require certified professional expertise. It should not be used to generate harmful or biased content.

## Bias, Risks, and Limitations

As with any language model, outputs may be biased, especially if the fine-tuning data contains bias. Depending on the input it receives, the model can generate unintended or harmful responses. Users should be cautious when deploying this model in sensitive applications.

### Recommendations

Users should test and validate the model thoroughly to identify biases or risks in its responses, especially in critical environments where high accuracy or fairness is required.

## How to Get Started with the Model

To use the model, follow the code example below:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model from the Hugging Face Hub.
# "your-username/my-dpo-model" is a placeholder; replace it with the actual repo ID.
model_id = "your-username/my-dpo-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize the prompt; passing the full encoding also supplies the attention mask.
input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate up to 50 new tokens and decode the result back to text.
output = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
```
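
Because the card metadata lists `library_name: peft`, the published weights are presumably a PEFT adapter, not a full model. The sketch below shows one way to load such an adapter explicitly on top of the base model with `PeftModel.from_pretrained`. It assumes that `transformers` and `peft` are installed and that the adapter repo ID is supplied by the user. The helper names `load_dpo_adapter` and `generate_reply` are hypothetical and introduced only for illustration.

```python
def load_dpo_adapter(adapter_id: str, base_id: str = "meta-llama/Llama-3.2-1B"):
    """Load the base model, then attach the adapter weights on top of it.

    Assumption: `adapter_id` points to a PEFT adapter repo (one that contains
    an adapter_config.json). This card does not confirm the repo layout.
    """
    # Imported lazily so the helpers can be defined without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def generate_reply(tokenizer, model, prompt: str, max_new_tokens: int = 50) -> str:
    """Tokenize a prompt, generate a continuation, and decode it to text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Example usage (placeholder repo ID; downloads the base model and the adapter):
# tokenizer, model = load_dpo_adapter("your-username/my-dpo-model")
# print(generate_reply(tokenizer, model, "What is the capital of France?"))
```

Loading the base model and adapter separately makes the dependency on `meta-llama/Llama-3.2-1B` explicit, which can help when the adapter repo does not ship its own tokenizer files.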