---
license: apache-2.0
base_model: meta-llama/Llama-2-13b-hf
tags:
  - generated_from_trainer
  - llama
  - lora
  - adapters
datasets:
  - yhavinga/mc4_nl_cleaned
language:
  - nl
model-index:
  - name: llama2-13b-ft-mc4_nl_cleaned_tiny
    results: []
---
    

# llama2-13b-ft-mc4_nl_cleaned_tiny

This model is a fine-tuned version of [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)
on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) with a context length of 4,096 tokens.
See the original [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) model card for more information on intended uses and biases.

If you use this model or refer to it, please use the following citation:

Vanroy, B. (2023). *Language Resources for Dutch Large Language Modelling*. [https://arxiv.org/abs/2312.12852](https://arxiv.org/abs/2312.12852)

```bibtex
@article{vanroy2023language,
  title={Language Resources for {Dutch} Large Language Modelling},
  author={Vanroy, Bram},
  journal={arXiv preprint arXiv:2312.12852},
  year={2023}
}
```

## Intended uses & limitations

While Llama 2 already has some proficiency in Dutch, this finetune is intended to improve its fluency in Dutch, not to increase its knowledge. It is therefore
intended as a generative model for the Dutch language. The biases, shortcomings and intended uses are otherwise the same as those of
the [original model](https://huggingface.co/meta-llama/Llama-2-13b-hf). The model can be used for generative tasks as-is or finetuned further on other tasks
such as summarization, adaptation, instruction tuning or chat finetuning.
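
As an illustration, here is a minimal generation sketch with the `transformers` library. The repository ID is inferred from this card, and the precision and sampling settings are illustrative assumptions rather than recommended values:

```python
# A minimal sketch, not the author's exact inference setup.
# Assumptions: the repo ID below, bfloat16 weights, and a GPU with enough memory for a 13B model.
# `device_map="auto"` requires the `accelerate` package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Dutch prompt ("Yesterday I walked through the park and"); the model continues the text.
prompt = "Gisteren wandelde ik door het park en"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```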

## Training and evaluation data

The model was trained on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) for one epoch. The canonical
validation split was not used; instead, 5% of the `train` split was held out for validation.
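
A minimal sketch of that split with the `datasets` library; the `tiny` configuration name follows from the link above, but the split seed is an assumption:

```python
# Sketch of the 95/5 train/validation split described above.
# Assumption: the seed; only the 5% hold-out fraction is stated on this card.
from datasets import load_dataset

dataset = load_dataset("yhavinga/mc4_nl_cleaned", "tiny", split="train")
splits = dataset.train_test_split(test_size=0.05, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(len(train_ds), len(eval_ds))
```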

## Training procedure

The model was trained with LoRA (targeting `["q_proj", "v_proj"]`) in 4-bit precision, and the adapters were merged into the base model before upload. Training used Flash Attention, borrowed from
[this patch](https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/utils/llama_patch.py).
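
For illustration, a rough sketch of such a LoRA-in-4-bit setup with `peft` and `bitsandbytes`; only the target modules come from this card, while the rank, alpha, dropout and quantization details are assumptions, and the Flash Attention patch linked above is not shown:

```python
# Rough sketch of a 4-bit LoRA setup; hyperparameters marked below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumption
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,                                 # assumption: rank not stated on the card
    lora_alpha=32,                        # assumption
    lora_dropout=0.05,                    # assumption
    target_modules=["q_proj", "v_proj"],  # from the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```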

The adapters are in the `adapters` branch.
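
If you prefer the unmerged adapters, a hedged sketch of loading them onto the base model with `peft`, assuming the adapter weights live on the `adapters` revision of this repository:

```python
# Assumption: the adapters are hosted on the `adapters` branch of this same repo.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf", device_map="auto")
model = PeftModel.from_pretrained(
    base,
    "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny",
    revision="adapters",
)
```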

The initial training investigation was run on the Tier-1 HPC of the [Vlaams Supercomputer Centrum (VSC)](https://www.vscentrum.be/); the final training was done on our own server with 4x 3090 GPUs.


### Training hyperparameters

The following hyperparameters were used during training in the HPC investigation:
- learning_rate: 0.0003
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 6
- total_train_batch_size: 1152
- total_eval_batch_size: 192
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 1
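
For illustration, these settings map roughly onto `transformers.TrainingArguments` as follows; the output directory and any unlisted options (precision, logging, saving) are assumptions:

```python
# Rough mapping of the listed hyperparameters; unlisted options are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-13b-ft-mc4_nl_cleaned_tiny",  # assumption
    learning_rate=3e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    gradient_accumulation_steps=6,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
)
# Across 16 GPUs this gives an effective train batch size of 12 * 6 * 16 = 1152,
# matching the total_train_batch_size listed above.
```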

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.8784        | 0.09  | 90   | 1.8820          |
| 1.8344        | 0.19  | 180  | 1.8542          |
| 1.8351        | 0.28  | 270  | 1.8355          |
| 1.8206        | 0.37  | 360  | 1.8212          |
| 1.8021        | 0.47  | 450  | 1.8088          |
| 1.8102        | 0.56  | 540  | 1.7982          |
| 1.7991        | 0.65  | 630  | 1.7890          |
| 1.7788        | 0.74  | 720  | 1.7811          |
| 1.7915        | 0.84  | 810  | 1.7742          |
| 1.7715        | 0.93  | 900  | 1.7676          |


### Framework versions

- Transformers 4.31.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BramVanroy__llama2-13b-ft-mc4_nl_cleaned_tiny).

| Metric                | Value |
|-----------------------|-------|
| Avg.                  | 46.81 |
| ARC (25-shot)         | 59.3  |
| HellaSwag (10-shot)   | 82.04 |
| MMLU (5-shot)         | 54.67 |
| TruthfulQA (0-shot)   | 38.03 |
| Winogrande (5-shot)   | 77.27 |
| GSM8K (5-shot)        | 10.31 |
| DROP (3-shot)         | 6.08  |