---
base_model: google/gemma-7b
library_name: peft
license: gemma
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: gemma7bit-lora-sql
  results: []
---
# gemma7bit-lora-sql
This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 40.8546
## Model description
More information needed
## Intended uses & limitations
More information needed
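The adapter is meant to be loaded with `peft` on top of the base `google/gemma-7b` model. The snippet below is a minimal sketch of that loading path: the adapter path and the prompt format are placeholders, since neither the published repository id nor the training prompt template is documented in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-7b"
adapter_path = "path/to/gemma7bit-lora-sql"  # placeholder: local dir or Hub repo id of the adapter

# Load the base model and tokenizer, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_path)

# Prompt format is illustrative only; the actual training template is not documented here.
prompt = "Question: List all customers from the city of Berlin.\nSQL:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```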
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 8
- seed: 1399
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 2
- training_steps: 500
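For reference, the sketch below shows one way these values could map onto `transformers.TrainingArguments` (the run used TRL's SFT trainer, per the tags). The `output_dir` is an assumption, and any LoRA or dataset configuration is omitted because it is not reported in this card.

```python
from transformers import TrainingArguments

# Hyperparameters as reported above; output_dir is a placeholder, not from the card.
args = TrainingArguments(
    output_dir="gemma7bit-lora-sql",   # hypothetical
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=1399,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999), epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=2,
    max_steps=500,
)
```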
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
47.4397 | 0.0000 | 2 | 112.0961 |
54.9563 | 0.0001 | 4 | 113.0320 |
43.0701 | 0.0001 | 6 | 105.0883 |
29.3374 | 0.0001 | 8 | 93.7564 |
24.013 | 0.0001 | 10 | 70.5026 |
5.7244 | 0.0002 | 12 | 70.3644 |
6.7112 | 0.0002 | 14 | 69.0918 |
5.139 | 0.0002 | 16 | 67.7594 |
5.658 | 0.0002 | 18 | 64.8925 |
3.348 | 0.0003 | 20 | 62.9086 |
3.0009 | 0.0003 | 22 | 54.9081 |
3.1078 | 0.0003 | 24 | 47.0123 |
2.9829 | 0.0003 | 26 | 44.8515 |
2.4287 | 0.0004 | 28 | 42.1563 |
2.1561 | 0.0004 | 30 | 39.5831 |
2.3805 | 0.0004 | 32 | 37.8210 |
4.199 | 0.0004 | 34 | 36.5321 |
4.2891 | 0.0005 | 36 | 35.5581 |
2.8376 | 0.0005 | 38 | 35.1185 |
2.4216 | 0.0005 | 40 | 35.1674 |
2.2408 | 0.0005 | 42 | 34.9562 |
3.4941 | 0.0006 | 44 | 35.2440 |
3.4866 | 0.0006 | 46 | 34.5079 |
2.2815 | 0.0006 | 48 | 34.1046 |
2.2584 | 0.0006 | 50 | 34.0249 |
2.7932 | 0.0007 | 52 | 34.8069 |
2.8995 | 0.0007 | 54 | 35.0606 |
3.3107 | 0.0007 | 56 | 35.8230 |
3.0793 | 0.0007 | 58 | 36.0362 |
4.5829 | 0.0008 | 60 | 34.8489 |
2.6841 | 0.0008 | 62 | 33.6494 |
3.5738 | 0.0008 | 64 | 32.4676 |
2.955 | 0.0008 | 66 | 31.9876 |
2.1847 | 0.0009 | 68 | 31.4324 |
3.5749 | 0.0009 | 70 | 31.4434 |
2.0652 | 0.0009 | 72 | 31.6449 |
1.9506 | 0.0009 | 74 | 31.8311 |
2.6852 | 0.0010 | 76 | 32.0123 |
1.8463 | 0.0010 | 78 | 32.2012 |
2.4999 | 0.0010 | 80 | 32.4074 |
1.7525 | 0.0010 | 82 | 32.5013 |
1.865 | 0.0011 | 84 | 32.7458 |
2.5512 | 0.0011 | 86 | 32.9542 |
2.041 | 0.0011 | 88 | 33.7792 |
3.4588 | 0.0011 | 90 | 33.5860 |
2.2258 | 0.0012 | 92 | 33.9242 |
2.1416 | 0.0012 | 94 | 34.2110 |
1.9904 | 0.0012 | 96 | 34.1852 |
1.9793 | 0.0012 | 98 | 34.1257 |
3.3329 | 0.0013 | 100 | 34.2512 |
2.6011 | 0.0013 | 102 | 34.4635 |
2.4212 | 0.0013 | 104 | 34.5869 |
1.941 | 0.0014 | 106 | 34.7022 |
2.4623 | 0.0014 | 108 | 34.9359 |
2.4267 | 0.0014 | 110 | 35.1085 |
1.7913 | 0.0014 | 112 | 35.1962 |
1.6845 | 0.0015 | 114 | 35.5859 |
3.0888 | 0.0015 | 116 | 35.8237 |
3.4959 | 0.0015 | 118 | 35.4403 |
2.5661 | 0.0015 | 120 | 35.3171 |
2.4044 | 0.0016 | 122 | 35.1409 |
3.1554 | 0.0016 | 124 | 35.0385 |
2.0637 | 0.0016 | 126 | 35.4118 |
5.6131 | 0.0016 | 128 | 35.2343 |
3.0214 | 0.0017 | 130 | 35.9148 |
1.771 | 0.0017 | 132 | 36.5919 |
2.4126 | 0.0017 | 134 | 36.8129 |
2.5102 | 0.0017 | 136 | 36.6166 |
6.5612 | 0.0018 | 138 | 36.9545 |
2.1154 | 0.0018 | 140 | 36.8204 |
2.533 | 0.0018 | 142 | 36.5374 |
1.7012 | 0.0018 | 144 | 36.6904 |
2.2287 | 0.0019 | 146 | 36.1521 |
4.2646 | 0.0019 | 148 | 36.1889 |
1.8624 | 0.0019 | 150 | 36.5876 |
1.9946 | 0.0019 | 152 | 36.6302 |
2.124 | 0.0020 | 154 | 36.6274 |
3.01 | 0.0020 | 156 | 36.6652 |
1.928 | 0.0020 | 158 | 37.0886 |
2.6035 | 0.0020 | 160 | 37.2648 |
2.2572 | 0.0021 | 162 | 37.4929 |
1.5284 | 0.0021 | 164 | 37.7779 |
1.1103 | 0.0021 | 166 | 37.9401 |
2.4597 | 0.0021 | 168 | 37.7270 |
2.4846 | 0.0022 | 170 | 37.4224 |
2.6234 | 0.0022 | 172 | 36.6518 |
2.4765 | 0.0022 | 174 | 36.2149 |
2.0448 | 0.0022 | 176 | 35.9293 |
2.2736 | 0.0023 | 178 | 35.5881 |
2.7181 | 0.0023 | 180 | 35.3821 |
1.9195 | 0.0023 | 182 | 35.2214 |
2.9274 | 0.0023 | 184 | 35.0837 |
3.191 | 0.0024 | 186 | 35.1131 |
2.6804 | 0.0024 | 188 | 35.1649 |
1.5547 | 0.0024 | 190 | 35.3133 |
2.2601 | 0.0024 | 192 | 35.6737 |
2.5229 | 0.0025 | 194 | 36.1338 |
2.6806 | 0.0025 | 196 | 36.2942 |
2.2258 | 0.0025 | 198 | 36.4748 |
1.2856 | 0.0025 | 200 | 36.9566 |
2.1439 | 0.0026 | 202 | 37.1834 |
4.0704 | 0.0026 | 204 | 37.5976 |
2.5138 | 0.0026 | 206 | 38.2877 |
2.9025 | 0.0027 | 208 | 38.5739 |
1.8761 | 0.0027 | 210 | 38.3348 |
1.9228 | 0.0027 | 212 | 38.3183 |
1.7924 | 0.0027 | 214 | 38.2928 |
2.7619 | 0.0028 | 216 | 38.1185 |
2.1031 | 0.0028 | 218 | 37.7249 |
2.6893 | 0.0028 | 220 | 37.7826 |
2.255 | 0.0028 | 222 | 37.7949 |
2.754 | 0.0029 | 224 | 37.8576 |
1.6294 | 0.0029 | 226 | 38.2263 |
1.8586 | 0.0029 | 228 | 38.4837 |
2.4252 | 0.0029 | 230 | 38.7646 |
2.36 | 0.0030 | 232 | 38.9834 |
1.4407 | 0.0030 | 234 | 39.1561 |
1.6109 | 0.0030 | 236 | 39.3041 |
2.2582 | 0.0030 | 238 | 39.3389 |
2.8185 | 0.0031 | 240 | 39.5245 |
1.6233 | 0.0031 | 242 | 39.3154 |
2.4039 | 0.0031 | 244 | 39.0988 |
1.7734 | 0.0031 | 246 | 39.0567 |
1.4779 | 0.0032 | 248 | 39.0881 |
2.7848 | 0.0032 | 250 | 38.9895 |
2.2963 | 0.0032 | 252 | 39.2507 |
2.0605 | 0.0032 | 254 | 39.3339 |
3.3667 | 0.0033 | 256 | 39.5060 |
2.9702 | 0.0033 | 258 | 39.5491 |
2.6734 | 0.0033 | 260 | 39.7907 |
2.4727 | 0.0033 | 262 | 40.1472 |
2.7539 | 0.0034 | 264 | 40.4749 |
1.601 | 0.0034 | 266 | 40.3649 |
2.1531 | 0.0034 | 268 | 40.2932 |
1.8656 | 0.0034 | 270 | 40.2728 |
1.9617 | 0.0035 | 272 | 40.3498 |
1.8911 | 0.0035 | 274 | 40.3157 |
2.3878 | 0.0035 | 276 | 40.2882 |
2.677 | 0.0035 | 278 | 40.4437 |
2.8035 | 0.0036 | 280 | 40.2423 |
1.7537 | 0.0036 | 282 | 40.0182 |
1.5873 | 0.0036 | 284 | 39.8449 |
1.7802 | 0.0036 | 286 | 39.7251 |
2.1861 | 0.0037 | 288 | 39.3972 |
1.9197 | 0.0037 | 290 | 39.4064 |
2.6752 | 0.0037 | 292 | 39.4320 |
1.7225 | 0.0037 | 294 | 39.4498 |
1.7274 | 0.0038 | 296 | 39.4309 |
3.9891 | 0.0038 | 298 | 40.1752 |
2.5153 | 0.0038 | 300 | 40.9025 |
2.0587 | 0.0038 | 302 | 41.4380 |
2.3115 | 0.0039 | 304 | 41.9152 |
1.8684 | 0.0039 | 306 | 42.4118 |
2.0388 | 0.0039 | 308 | 42.8904 |
2.9396 | 0.0040 | 310 | 43.0102 |
1.5832 | 0.0040 | 312 | 43.0678 |
1.897 | 0.0040 | 314 | 43.0292 |
2.2008 | 0.0040 | 316 | 43.0302 |
2.4185 | 0.0041 | 318 | 42.8252 |
1.9265 | 0.0041 | 320 | 42.5088 |
2.5759 | 0.0041 | 322 | 42.2636 |
2.9898 | 0.0041 | 324 | 42.1571 |
1.7106 | 0.0042 | 326 | 41.7366 |
2.3907 | 0.0042 | 328 | 41.3667 |
2.4861 | 0.0042 | 330 | 41.3056 |
1.6998 | 0.0042 | 332 | 41.2167 |
2.6034 | 0.0043 | 334 | 41.2615 |
1.6455 | 0.0043 | 336 | 41.2327 |
1.8484 | 0.0043 | 338 | 41.2317 |
2.2123 | 0.0043 | 340 | 41.2374 |
1.8939 | 0.0044 | 342 | 41.1753 |
1.881 | 0.0044 | 344 | 41.1000 |
1.5313 | 0.0044 | 346 | 40.9959 |
2.3099 | 0.0044 | 348 | 40.9817 |
2.2593 | 0.0045 | 350 | 40.9572 |
2.2597 | 0.0045 | 352 | 40.9278 |
2.1038 | 0.0045 | 354 | 40.8672 |
1.6107 | 0.0045 | 356 | 40.6815 |
2.0831 | 0.0046 | 358 | 40.5641 |
2.2921 | 0.0046 | 360 | 40.5117 |
2.3178 | 0.0046 | 362 | 40.5802 |
1.6295 | 0.0046 | 364 | 40.4780 |
2.038 | 0.0047 | 366 | 40.5544 |
1.7012 | 0.0047 | 368 | 40.7328 |
2.5292 | 0.0047 | 370 | 40.8337 |
1.8677 | 0.0047 | 372 | 40.9356 |
1.5897 | 0.0048 | 374 | 41.0250 |
1.5096 | 0.0048 | 376 | 41.0558 |
1.6413 | 0.0048 | 378 | 41.2060 |
1.6334 | 0.0048 | 380 | 41.2175 |
2.0367 | 0.0049 | 382 | 41.3215 |
1.9155 | 0.0049 | 384 | 41.4322 |
1.9553 | 0.0049 | 386 | 41.4096 |
2.3982 | 0.0049 | 388 | 41.3870 |
2.1094 | 0.0050 | 390 | 41.2572 |
1.9943 | 0.0050 | 392 | 41.1927 |
2.1017 | 0.0050 | 394 | 41.1805 |
1.8297 | 0.0050 | 396 | 41.0817 |
2.2271 | 0.0051 | 398 | 41.0460 |
2.022 | 0.0051 | 400 | 41.0754 |
1.8099 | 0.0051 | 402 | 41.0777 |
2.0973 | 0.0051 | 404 | 41.1348 |
2.03 | 0.0052 | 406 | 41.1109 |
1.7342 | 0.0052 | 408 | 41.1719 |
2.0422 | 0.0052 | 410 | 41.1616 |
2.6192 | 0.0052 | 412 | 41.0411 |
1.7107 | 0.0053 | 414 | 41.0704 |
2.8018 | 0.0053 | 416 | 41.0641 |
1.3767 | 0.0053 | 418 | 41.0719 |
1.9952 | 0.0054 | 420 | 41.0151 |
1.7584 | 0.0054 | 422 | 40.9978 |
2.1318 | 0.0054 | 424 | 40.9933 |
2.3412 | 0.0054 | 426 | 40.9837 |
1.6604 | 0.0055 | 428 | 41.0310 |
1.6301 | 0.0055 | 430 | 40.9782 |
2.0232 | 0.0055 | 432 | 40.9377 |
1.7096 | 0.0055 | 434 | 40.9645 |
2.1696 | 0.0056 | 436 | 40.9631 |
1.5297 | 0.0056 | 438 | 40.9690 |
1.4017 | 0.0056 | 440 | 41.0132 |
1.7817 | 0.0056 | 442 | 40.9486 |
1.7264 | 0.0057 | 444 | 40.9499 |
1.8601 | 0.0057 | 446 | 41.0064 |
1.9614 | 0.0057 | 448 | 41.0266 |
2.3045 | 0.0057 | 450 | 41.0035 |
2.67 | 0.0058 | 452 | 41.0159 |
1.5752 | 0.0058 | 454 | 40.9748 |
1.7464 | 0.0058 | 456 | 40.9395 |
1.9167 | 0.0058 | 458 | 40.9119 |
1.8777 | 0.0059 | 460 | 40.9021 |
1.5879 | 0.0059 | 462 | 40.9164 |
1.942 | 0.0059 | 464 | 40.8847 |
1.6303 | 0.0059 | 466 | 40.9104 |
2.1252 | 0.0060 | 468 | 40.9000 |
2.2879 | 0.0060 | 470 | 40.9209 |
1.7646 | 0.0060 | 472 | 40.8601 |
2.3169 | 0.0060 | 474 | 40.8726 |
1.7797 | 0.0061 | 476 | 40.8563 |
2.0428 | 0.0061 | 478 | 40.8609 |
2.4124 | 0.0061 | 480 | 40.8663 |
2.2955 | 0.0061 | 482 | 40.8601 |
1.3035 | 0.0062 | 484 | 40.8517 |
2.611 | 0.0062 | 486 | 40.8781 |
2.0677 | 0.0062 | 488 | 40.8694 |
2.1645 | 0.0062 | 490 | 40.8864 |
2.0708 | 0.0063 | 492 | 40.8633 |
1.663 | 0.0063 | 494 | 40.8689 |
1.9784 | 0.0063 | 496 | 40.8672 |
1.7215 | 0.0063 | 498 | 40.8439 |
2.2366 | 0.0064 | 500 | 40.8546 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.2.2+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1