---
license: llama2
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: llama-2-nl/Llama-2-7b-hf-lora-tokentrans-sft
datasets:
- BramVanroy/ultra_feedback_dutch
model-index:
- name: Llama-2-7b-hf-lora-tokentrans-it
  results: []
---

# ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct

This model is a fine-tuned version of [ChocoLlama/ChocoLlama-2-7B-tokentrans-base](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-base) on the BramVanroy/ultra_feedback_dutch dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3913
- Rewards/chosen: 0.1776
- Rewards/rejected: -0.6740
- Rewards/accuracies: 0.9418
- Rewards/margins: 0.8516
- Logps/rejected: -556.9005
- Logps/chosen: -600.6971
- Logits/rejected: 1.1696
- Logits/chosen: 1.5756

# Use the model

```
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct')
model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct', device_map="auto")

# The prompts are in Dutch, the model's target language.
# System: "You are an artificial intelligence assistant and give helpful, detailed and
#          polite answers to the user's questions."
# User:   "Jacques Brel, Willem Elsschot and Jan Jambon are sitting in a café.
#          What would they be chatting about?"
messages = [
    {"role": "system", "content": "Je bent een artificiële intelligentie-assistent en geeft behulpzame, gedetailleerde en beleefde antwoorden op de vragen van de gebruiker."},
    {"role": "user", "content": "Jacques Brel, Willem Elsschot en Jan Jambon zitten op café. Waar zouden ze over babbelen?"},
]

# Format the conversation with the model's chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generating at either the EOS token or the <|eot_id|> token.
new_terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=new_terminators,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)

# Drop the prompt tokens and decode only the newly generated response.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.609         | 0.1327 | 100  | 0.6007          | 0.0611         | -0.1426          | 0.9060             | 0.2037          | -551.5856      | -601.8618    | 1.1882          | 1.6120        |
| 0.4911        | 0.2653 | 200  | 0.4847          | 0.1405         | -0.3755          | 0.9328             | 0.5160          | -553.9150      | -601.0678    | 1.1788          | 1.5940        |
| 0.4222        | 0.3980 | 300  | 0.4298          | 0.1687         | -0.5353          | 0.9373             | 0.7040          | -555.5129      | -600.7857    | 1.1738          | 1.5840        |
| 0.3917        | 0.5307 | 400  | 0.4034          | 0.1729         | -0.6302          | 0.9418             | 0.8032          | -556.4622      | -600.7433    | 1.1682          | 1.5761        |
| 0.3924        | 0.6633 | 500  | 0.3936          | 0.1799         | -0.6645          | 0.9425             | 0.8444          | -556.8052      | -600.6739    | 1.1689          | 1.5753        |
| 0.3874        | 0.7960 | 600  | 0.3912          | 0.1796         | -0.6760          | 0.9433             | 0.8556          | -556.9198      | -600.6769    | 1.1684          | 1.5742        |
| 0.3922        | 0.9287 | 700  | 0.3909          | 0.1789         | -0.6788          | 0.9396             | 0.8577          | -556.9485      | -600.6838    | 1.1685          | 1.5742        |

### Framework versions

- Transformers 4.40.1
- PyTorch 2.1.2
- Datasets 2.19.0
- Tokenizers 0.19.1
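
### Cross-checking the reported values

Two of the values reported in this card are derived from others, and the arithmetic can be reproduced in a few lines. The sketch below assumes the usual conventions: the total batch size is the per-device batch size times the number of devices times the gradient accumulation steps, and the DPO reward margin is the chosen reward minus the rejected reward. The helper names `effective_batch_size` and `reward_margin` are hypothetical and are not part of the training code.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Number of examples contributing to one optimizer (or evaluation) step."""
    return per_device * num_devices * grad_accum_steps


def reward_margin(chosen, rejected):
    """Margin between the chosen and rejected rewards (assumed: chosen - rejected)."""
    return chosen - rejected


# Hyperparameters copied from the "Training hyperparameters" list.
train_total = effective_batch_size(per_device=4, num_devices=4, grad_accum_steps=4)
eval_total = effective_batch_size(per_device=4, num_devices=4)  # no accumulation at eval time

# Final evaluation rewards copied from the results list at the top of the card.
margin = reward_margin(chosen=0.1776, rejected=-0.6740)

print("total_train_batch_size:", train_total)
print("total_eval_batch_size:", eval_total)
print("Rewards/margins:", round(margin, 4))

# The reported metrics are rounded to four decimals, so compare with a tolerance.
assert math.isclose(margin, 0.8516, abs_tol=2e-4)
```

Under these assumptions the computed values agree with the reported `total_train_batch_size`, `total_eval_batch_size`, and `Rewards/margins`.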