gemma-7b-sft-qlora-1
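This model is a fine-tuned version of google/gemma-7b on the chansung/no_robots_only_coding dataset. It achieves the following results on the evaluation set at the final checkpoint (step 125):
- Loss: 2.1615 (the lowest validation loss in the training log is 1.2696, reached at step 49)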
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 25
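The relationship between the per-device and total batch sizes, and the shape of the cosine schedule with warmup, can be sketched in a few lines. This is a minimal illustration, not the training code: it assumes that the last logged step in the results table (125) is the total number of optimizer steps, that warmup steps are rounded up from `total_steps * warmup_ratio`, and that the schedule decays over a single half cosine to zero. The function name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 2e-4
train_batch_size = 2               # per device
num_devices = 4
gradient_accumulation_steps = 2
warmup_ratio = 0.1

# Assumption: the last logged step in the results table is the total step count.
total_steps = 125

# Samples consumed per optimizer step across all devices.
effective_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Assumption: warmup steps are rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return learning_rate * step / max(1, warmup_steps)
    # Half-cosine decay from the peak learning rate down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size)
print("warmup steps:", warmup_steps)
for step in (0, warmup_steps, 69, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the product of the three batch factors matches the reported total_train_batch_size, and the learning rate peaks at the end of warmup before decaying for the rest of the run.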
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
23.7344 | 0.91 | 5 | 7.9584 |
14.6026 | 2.0 | 11 | 6.8289 |
10.8118 | 2.91 | 16 | 6.4185 |
10.8598 | 4.0 | 22 | 5.1061 |
7.9354 | 4.91 | 27 | 1.7011 |
2.0354 | 6.0 | 33 | 1.4461 |
1.4855 | 6.91 | 38 | 1.3565 |
1.326 | 8.0 | 44 | 1.2935 |
1.1375 | 8.91 | 49 | 1.2696 |
0.9091 | 10.0 | 55 | 1.2716 |
0.8111 | 10.91 | 60 | 1.2861 |
0.689 | 12.0 | 66 | 1.3148 |
0.6341 | 12.91 | 71 | 1.3391 |
0.5359 | 14.0 | 77 | 1.4232 |
0.4664 | 14.91 | 82 | 1.5107 |
0.3951 | 16.0 | 88 | 1.6597 |
0.3593 | 16.91 | 93 | 1.9377 |
0.2802 | 18.0 | 99 | 1.9024 |
0.2613 | 18.91 | 104 | 2.0981 |
0.2262 | 20.0 | 110 | 2.1472 |
0.2169 | 20.91 | 115 | 2.1633 |
0.2232 | 22.0 | 121 | 2.1595 |
0.2096 | 22.73 | 125 | 2.1615 |
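The table shows training loss falling steadily while validation loss bottoms out and then climbs. A short sketch that locates the checkpoint with the lowest validation loss, using the rows above copied verbatim; the variable names are illustrative only.

```python
# Rows copied from the table above: (training_loss, epoch, step, validation_loss).
rows = [
    (23.7344, 0.91, 5, 7.9584),
    (14.6026, 2.0, 11, 6.8289),
    (10.8118, 2.91, 16, 6.4185),
    (10.8598, 4.0, 22, 5.1061),
    (7.9354, 4.91, 27, 1.7011),
    (2.0354, 6.0, 33, 1.4461),
    (1.4855, 6.91, 38, 1.3565),
    (1.326, 8.0, 44, 1.2935),
    (1.1375, 8.91, 49, 1.2696),
    (0.9091, 10.0, 55, 1.2716),
    (0.8111, 10.91, 60, 1.2861),
    (0.689, 12.0, 66, 1.3148),
    (0.6341, 12.91, 71, 1.3391),
    (0.5359, 14.0, 77, 1.4232),
    (0.4664, 14.91, 82, 1.5107),
    (0.3951, 16.0, 88, 1.6597),
    (0.3593, 16.91, 93, 1.9377),
    (0.2802, 18.0, 99, 1.9024),
    (0.2613, 18.91, 104, 2.0981),
    (0.2262, 20.0, 110, 2.1472),
    (0.2169, 20.91, 115, 2.1633),
    (0.2232, 22.0, 121, 2.1595),
    (0.2096, 22.73, 125, 2.1615),
]

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
final = rows[-1]

print(f"best:  step {best[2]} (epoch {best[1]}), validation loss {best[3]}")
print(f"final: step {final[2]} (epoch {final[1]}), validation loss {final[3]}")
print(f"validation loss increase after best: {final[3] - best[3]:.4f}")
```

The minimum falls at step 49 (epoch 8.91). After that point validation loss rises while training loss keeps dropping, which is the usual signature of overfitting, so an earlier checkpoint may generalize better than the final one.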
Framework versions
- PEFT 0.7.1
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2