---
license: llama2
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: llama-2-nl/Llama-2-7b-hf-lora-tokentrans-sft
datasets:
- BramVanroy/ultra_feedback_dutch
model-index:
- name: Llama-2-7b-hf-lora-tokentrans-it
  results: []
---


# ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct

This model is a fine-tuned version of [ChocoLlama/ChocoLlama-2-7B-tokentrans-base](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-base) on the BramVanroy/ultra_feedback_dutch dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3913
- Rewards/chosen: 0.1776
- Rewards/rejected: -0.6740
- Rewards/accuracies: 0.9418
- Rewards/margins: 0.8516
- Logps/rejected: -556.9005
- Logps/chosen: -600.6971
- Logits/rejected: 1.1696
- Logits/chosen: 1.5756
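
As a quick consistency check on the numbers above: in DPO training the reward margin is the chosen reward minus the rejected reward, and the reported values agree with that. A minimal sketch, with illustrative variable names:

```python
# Values copied from the evaluation results listed above.
rewards_chosen = 0.1776
rewards_rejected = -0.6740
reported_margin = 0.8516

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"computed margin: {margin:.4f}")

# The computed value should match the reported one up to rounding.
assert abs(margin - reported_margin) < 1e-4
```

A positive margin means the model assigns a higher implicit reward to the chosen response than to the rejected one, on average over the evaluation set.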

## Use the model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct')
model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct', device_map="auto")

messages = [
    {"role": "system", "content": "Je bent een artificiële intelligentie-assistent en geeft behulpzame, gedetailleerde en beleefde antwoorden op de vragen van de gebruiker."},
    {"role": "user", "content": "Jacques Brel, Willem Elsschot en Jan Jambon zitten op café. Waar zouden ze over babbelen?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

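# Stop on the tokenizer's EOS token, and on "<|eot_id|>" only if the tokenizer defines it.
# For a token missing from the vocabulary, convert_tokens_to_ids typically returns the
# unknown-token id (or None), which should not be used as a stop token.
new_terminators = [tokenizer.eos_token_id]
eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
if eot_id is not None and eot_id != tokenizer.unk_token_id:
    new_terminators.append(eot_id)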

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=new_terminators,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

```

## Model description

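This is the instruction-tuned variant of ChocoLlama-2-7B-tokentrans-base, a Llama-2-based model. According to the model tags, it was aligned with DPO using TRL and the alignment-handbook.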

## Intended uses & limitations

More information needed

## Training and evaluation data

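The model was trained and evaluated on the BramVanroy/ultra_feedback_dutch dataset, a Dutch preference dataset with chosen and rejected responses.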

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
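
The total batch sizes listed above follow from the per-device batch size, the number of devices, and the gradient accumulation steps. A small sketch of that arithmetic (the helper name `effective_batch_size` is illustrative, not part of any library):

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of examples that contribute to one optimizer step (or one eval step)."""
    return per_device * num_devices * grad_accum_steps


# Training: 4 per device x 4 devices x 4 accumulation steps.
total_train = effective_batch_size(per_device=4, num_devices=4, grad_accum_steps=4)

# Evaluation: 4 per device x 4 devices, with no gradient accumulation.
total_eval = effective_batch_size(per_device=4, num_devices=4)

print(total_train, total_eval)  # 64 16
```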

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.609         | 0.1327 | 100  | 0.6007          | 0.0611         | -0.1426          | 0.9060             | 0.2037          | -551.5856      | -601.8618    | 1.1882          | 1.6120        |
| 0.4911        | 0.2653 | 200  | 0.4847          | 0.1405         | -0.3755          | 0.9328             | 0.5160          | -553.9150      | -601.0678    | 1.1788          | 1.5940        |
| 0.4222        | 0.3980 | 300  | 0.4298          | 0.1687         | -0.5353          | 0.9373             | 0.7040          | -555.5129      | -600.7857    | 1.1738          | 1.5840        |
| 0.3917        | 0.5307 | 400  | 0.4034          | 0.1729         | -0.6302          | 0.9418             | 0.8032          | -556.4622      | -600.7433    | 1.1682          | 1.5761        |
| 0.3924        | 0.6633 | 500  | 0.3936          | 0.1799         | -0.6645          | 0.9425             | 0.8444          | -556.8052      | -600.6739    | 1.1689          | 1.5753        |
| 0.3874        | 0.7960 | 600  | 0.3912          | 0.1796         | -0.6760          | 0.9433             | 0.8556          | -556.9198      | -600.6769    | 1.1684          | 1.5742        |
| 0.3922        | 0.9287 | 700  | 0.3909          | 0.1789         | -0.6788          | 0.9396             | 0.8577          | -556.9485      | -600.6838    | 1.1685          | 1.5742        |
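
The Epoch and Step columns can be used to estimate the length of one epoch. The sketch below is a rough estimate only: the epoch values are rounded to four decimals, and the example count assumes every step used the full total train batch size of 64.

```python
# (epoch, step) pairs copied from the table above.
checkpoints = [
    (0.1327, 100),
    (0.2653, 200),
    (0.3980, 300),
    (0.5307, 400),
    (0.6633, 500),
    (0.7960, 600),
    (0.9287, 700),
]

# Each row gives an independent estimate of steps per epoch; average them.
estimates = [step / epoch for epoch, step in checkpoints]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Assumption: every optimizer step consumed a full batch of 64 examples.
total_train_batch_size = 64
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size

print(f"~{steps_per_epoch} optimizer steps per epoch")
print(f"~{approx_examples_per_epoch} training examples per epoch")
```

Under these assumptions, the final evaluation at step 700 was taken shortly before the end of the single training epoch.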


### Framework versions

- Transformers 4.40.1
- Pytorch 2.1.2
- Datasets 2.19.0
- Tokenizers 0.19.1
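
To compare a local environment against the versions listed above, something like the following can help. It is a sketch that assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`; other versions may work equally well, so a mismatch is informational rather than an error.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.40.1",
    "torch": "2.1.2",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            # Drop local build suffixes such as "+cu121" before comparing.
            installed = version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "matches" if installed == wanted else "differs"
    print(f"{package}: card lists {wanted}, installed {installed} ({status})")
```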