# scenario-NON-KD-PR-COPY-D2_data-AmazonScience_massive_all_1_155

This model is a fine-tuned version of microsoft/mdeberta-v3-base on the MASSIVE dataset. It achieves the following results on the evaluation set:
- Loss: 1.5373
- Accuracy: 0.8549
- F1: 0.8312
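The card does not state how the F1 score is averaged over the MASSIVE intent classes (macro, micro, or weighted). As an illustration only, and not the evaluation code used for this model, here is a minimal macro-F1 (unweighted mean of per-class F1) in pure Python:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = set(y_true) | set(y_pred)
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy example with three intent classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred)
```

Macro averaging gives every class equal weight regardless of support, which matters for intent datasets with imbalanced label frequencies.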
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 55
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
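With `lr_scheduler_type: linear` and no warmup steps listed, the learning rate decays linearly from 5e-05 toward zero over the full run. A pure-Python sketch of that schedule (the total step count of ~187,000 is an estimate read off the training table below, not a reported hyperparameter; zero warmup is assumed since none is listed):

```python
def linear_schedule_lr(step, base_lr=5e-5, warmup_steps=0, total_steps=187_090):
    """LR under a linear warmup-then-decay schedule (Transformers-style)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 over the remaining steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

start_lr = linear_schedule_lr(0)               # full learning rate at step 0
mid_lr = linear_schedule_lr(187_090 // 2)      # about half the base LR mid-run
end_lr = linear_schedule_lr(187_090)           # decayed to zero at the end
```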
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|---------------|-------|------|-----------------|----------|----|
0.7969 | 0.2672 | 5000 | 0.8078 | 0.7872 | 0.7220 |
0.5961 | 0.5344 | 10000 | 0.7024 | 0.8198 | 0.7755 |
0.4989 | 0.8017 | 15000 | 0.6733 | 0.8326 | 0.7971 |
0.3251 | 1.0689 | 20000 | 0.6783 | 0.8420 | 0.8062 |
0.3181 | 1.3361 | 25000 | 0.7127 | 0.8426 | 0.8072 |
0.2735 | 1.6033 | 30000 | 0.6944 | 0.8467 | 0.8194 |
0.2605 | 1.8706 | 35000 | 0.7549 | 0.8422 | 0.8142 |
0.1911 | 2.1378 | 40000 | 0.8079 | 0.8433 | 0.8150 |
0.1873 | 2.4050 | 45000 | 0.7712 | 0.8488 | 0.8241 |
0.1839 | 2.6722 | 50000 | 0.8284 | 0.8489 | 0.8234 |
0.1709 | 2.9394 | 55000 | 0.7833 | 0.8535 | 0.8283 |
0.1274 | 3.2067 | 60000 | 0.9298 | 0.8471 | 0.8237 |
0.1352 | 3.4739 | 65000 | 0.9268 | 0.8468 | 0.8250 |
0.1257 | 3.7411 | 70000 | 0.9509 | 0.8480 | 0.8232 |
0.1132 | 4.0083 | 75000 | 1.0047 | 0.8465 | 0.8239 |
0.0917 | 4.2756 | 80000 | 1.0471 | 0.8505 | 0.8259 |
0.0991 | 4.5428 | 85000 | 1.0301 | 0.8496 | 0.8291 |
0.093 | 4.8100 | 90000 | 1.0625 | 0.8481 | 0.8232 |
0.069 | 5.0772 | 95000 | 1.1380 | 0.8463 | 0.8200 |
0.0733 | 5.3444 | 100000 | 1.1618 | 0.8477 | 0.8250 |
0.0741 | 5.6117 | 105000 | 1.1398 | 0.8481 | 0.8276 |
0.0654 | 5.8789 | 110000 | 1.1903 | 0.8515 | 0.8299 |
0.0526 | 6.1461 | 115000 | 1.2244 | 0.8518 | 0.8277 |
0.0499 | 6.4133 | 120000 | 1.3166 | 0.8485 | 0.8245 |
0.0524 | 6.6806 | 125000 | 1.3335 | 0.8513 | 0.8286 |
0.0484 | 6.9478 | 130000 | 1.2970 | 0.8527 | 0.8286 |
0.0414 | 7.2150 | 135000 | 1.3790 | 0.8485 | 0.8266 |
0.0434 | 7.4822 | 140000 | 1.4027 | 0.8505 | 0.8251 |
0.0412 | 7.7495 | 145000 | 1.3861 | 0.8524 | 0.8307 |
0.0264 | 8.0167 | 150000 | 1.4604 | 0.8518 | 0.8284 |
0.0256 | 8.2839 | 155000 | 1.4584 | 0.8536 | 0.8315 |
0.0274 | 8.5511 | 160000 | 1.4963 | 0.8540 | 0.8322 |
0.0303 | 8.8183 | 165000 | 1.4727 | 0.8536 | 0.8317 |
0.0202 | 9.0856 | 170000 | 1.5227 | 0.8543 | 0.8316 |
0.0185 | 9.3528 | 175000 | 1.5326 | 0.8547 | 0.8312 |
0.0213 | 9.6200 | 180000 | 1.5333 | 0.8546 | 0.8305 |
0.0197 | 9.8872 | 185000 | 1.5373 | 0.8549 | 0.8312 |
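The fractional epoch values in the table imply the number of optimizer steps per epoch, and together with the batch size above, the approximate size of the training set. A back-of-the-envelope check (the derived figures are estimates, not numbers reported by the card):

```python
# From the first logged row: step 5000 corresponds to epoch 0.2672,
# so one epoch is roughly 5000 / 0.2672 optimizer steps.
steps_per_epoch = 5000 / 0.2672                              # ~18,713 steps
train_batch_size = 32
approx_train_examples = steps_per_epoch * train_batch_size   # ~599k examples
```

This is consistent with fine-tuning on the multilingual `all` configuration of MASSIVE (all locales combined) rather than a single locale.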
### Framework versions
- Transformers 4.44.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.19.1