# gemma-7b-sft-qlora-no-robots
This model is a fine-tuned version of google/gemma-7b on the chansung/no_robots_only_coding dataset. It achieves the following results on the evaluation set:
- Loss: 4.2389
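The "qlora" in the name and the PEFT dependency listed below suggest this repository hosts a LoRA adapter for google/gemma-7b rather than full model weights. Below is a minimal, hedged inference sketch under that assumption; the prompt, dtype, and generation settings are illustrative only.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "chansung/gemma-7b-sft-qlora-no-robots"

# AutoPeftModelForCausalLM reads the adapter config, loads the base model
# (google/gemma-7b), and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# The tokenizer is typically saved alongside the adapter; fall back to the
# base model's tokenizer if this repository does not include one.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```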
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 50
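As a hedged sketch, the settings above could be expressed with `transformers`, `peft`, and `bitsandbytes` as shown below. The quantization and LoRA values (nf4, rank, alpha) and the use of bf16 are assumptions for illustration, not taken from this card, and the actual training script is not reproduced here.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization for QLoRA (these values are common defaults, not from the card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter config (rank, alpha, and target modules are assumptions, not from the card).
peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

# These arguments mirror the hyperparameters listed above; with 4 GPUs and
# gradient_accumulation_steps=2 the effective train batch size is 4 * 4 * 2 = 32.
training_args = TrainingArguments(
    output_dir="gemma-7b-sft-qlora-no-robots",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=50,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption: mixed precision is not stated in the card
)
```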
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
17.168 | 1.0 | 3 | 15.0876 |
14.9207 | 2.0 | 6 | 8.7644 |
14.9207 | 3.0 | 9 | 4.8425 |
7.3214 | 4.0 | 12 | 3.0239 |
2.9627 | 5.0 | 15 | 2.2565 |
2.9627 | 6.0 | 18 | 1.8792 |
1.7971 | 7.0 | 21 | 1.7648 |
1.7971 | 8.0 | 24 | 1.7012 |
1.4939 | 9.0 | 27 | 1.5479 |
1.2756 | 10.0 | 30 | 1.5051 |
1.2756 | 11.0 | 33 | 1.3975 |
1.0884 | 12.0 | 36 | 1.4440 |
1.0884 | 13.0 | 39 | 1.4135 |
0.9429 | 14.0 | 42 | 1.4587 |
0.7653 | 15.0 | 45 | 1.4874 |
0.7653 | 16.0 | 48 | 1.5958 |
0.6424 | 17.0 | 51 | 1.5928 |
0.6424 | 18.0 | 54 | 1.6838 |
0.5346 | 19.0 | 57 | 1.8264 |
0.4249 | 20.0 | 60 | 1.9655 |
0.4249 | 21.0 | 63 | 2.1370 |
0.3347 | 22.0 | 66 | 2.6981 |
0.3347 | 23.0 | 69 | 2.7131 |
0.2655 | 24.0 | 72 | 2.7668 |
0.2026 | 25.0 | 75 | 2.8615 |
0.2026 | 26.0 | 78 | 3.1596 |
0.1588 | 27.0 | 81 | 3.3286 |
0.1588 | 28.0 | 84 | 3.5463 |
0.1319 | 29.0 | 87 | 3.3686 |
0.1111 | 30.0 | 90 | 3.6859 |
0.1111 | 31.0 | 93 | 3.7810 |
0.0939 | 32.0 | 96 | 3.7559 |
0.0939 | 33.0 | 99 | 3.9164 |
0.082 | 34.0 | 102 | 3.9693 |
0.0709 | 35.0 | 105 | 4.0430 |
0.0709 | 36.0 | 108 | 4.1017 |
0.0638 | 37.0 | 111 | 4.1449 |
0.0638 | 38.0 | 114 | 4.1639 |
0.0597 | 39.0 | 117 | 4.1880 |
0.0556 | 40.0 | 120 | 4.2123 |
0.0556 | 41.0 | 123 | 4.2196 |
0.0535 | 42.0 | 126 | 4.2262 |
0.0535 | 43.0 | 129 | 4.2301 |
0.0521 | 44.0 | 132 | 4.2314 |
0.0521 | 45.0 | 135 | 4.2365 |
0.0521 | 46.0 | 138 | 4.2350 |
0.0525 | 47.0 | 141 | 4.2364 |
0.0525 | 48.0 | 144 | 4.2320 |
0.0509 | 49.0 | 147 | 4.2361 |
0.0505 | 50.0 | 150 | 4.2389 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
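To check a local environment against the pins above, a simple sketch (newer versions may also work):

```python
# Print installed versions of the libraries listed in "Framework versions".
import datasets, peft, tokenizers, torch, transformers

for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```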