---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: MedMobile
  results: []
---

# MedMobile

This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the UltraMedical dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7358

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.8656 | 0.0225 | 200 | 0.7711 |
| 0.7615 | 0.0451 | 400 | 0.7521 |
| 0.748 | 0.0676 | 600 | 0.7457 |
| 0.7465 | 0.0902 | 800 | 0.7428 |
| 0.7468 | 0.1127 | 1000 | 0.7419 |
| 0.7434 | 0.1352 | 1200 | 0.7429 |
| 0.7467 | 0.1578 | 1400 | 0.7451 |
| 0.7508 | 0.1803 | 1600 | 0.7469 |
| 0.7505 | 0.2029 | 1800 | 0.7503 |
| 0.7541 | 0.2254 | 2000 | 0.7531 |
| 0.7559 | 0.2479 | 2200 | 0.7576 |
| 0.7592 | 0.2705 | 2400 | 0.7599 |
| 0.7729 | 0.2930 | 2600 | 0.7635 |
| 0.772 | 0.3156 | 2800 | 0.7645 |
| 0.7707 | 0.3381 | 3000 | 0.7628 |
| 0.7616 | 0.3606 | 3200 | 0.7614 |
| 0.7632 | 0.3832 | 3400 | 0.7590 |
| 0.7613 | 0.4057 | 3600 | 0.7574 |
| 0.7581 | 0.4283 | 3800 | 0.7558 |
| 0.7583 | 0.4508 | 4000 | 0.7539 |
| 0.7509 | 0.4733 | 4200 | 0.7518 |
| 0.7559 | 0.4959 | 4400 | 0.7506 |
| 0.7523 | 0.5184 | 4600 | 0.7491 |
| 0.7461 | 0.5410 | 4800 | 0.7469 |
| 0.7504 | 0.5635 | 5000 | 0.7464 |
| 0.7486 | 0.5860 | 5200 | 0.7449 |
| 0.7454 | 0.6086 | 5400 | 0.7436 |
| 0.7451 | 0.6311 | 5600 | 0.7427 |
| 0.7431 | 0.6537 | 5800 | 0.7412 |
| 0.7438 | 0.6762 | 6000 | 0.7402 |
| 0.7471 | 0.6987 | 6200 | 0.7390 |
| 0.7416 | 0.7213 | 6400 | 0.7378 |
| 0.7345 | 0.7438 | 6600 | 0.7364 |
| 0.7437 | 0.7663 | 6800 | 0.7349 |
| 0.7431 | 0.7889 | 7000 | 0.7349 |
| 0.737 | 0.8114 | 7200 | 0.7339 |
| 0.7358 | 0.8340 | 7400 | 0.7333 |
| 0.7336 | 0.8565 | 7600 | 0.7320 |
| 0.7327 | 0.8790 | 7800 | 0.7310 |
| 0.7288 | 0.9016 | 8000 | 0.7303 |
| 0.7326 | 0.9241 | 8200 | 0.7295 |
| 0.7354 | 0.9467 | 8400 | 0.7287 |
| 0.731 | 0.9692 | 8600 | 0.7278 |
| 0.7317 | 0.9917 | 8800 | 0.7272 |
| 0.6809 | 1.0143 | 9000 | 0.7359 |
| 0.6548 | 1.0368 | 9200 | 0.7341 |
| 0.6463 | 1.0594 | 9400 | 0.7353 |
| 0.6516 | 1.0819 | 9600 | 0.7357 |
| 0.6544 | 1.1044 | 9800 | 0.7345 |
| 0.6558 | 1.1270 | 10000 | 0.7342 |
| 0.6532 | 1.1495 | 10200 | 0.7331 |
| 0.653 | 1.1721 | 10400 | 0.7328 |
| 0.6583 | 1.1946 | 10600 | 0.7323 |
| 0.6537 | 1.2171 | 10800 | 0.7326 |
| 0.6622 | 1.2397 | 11000 | 0.7318 |
| 0.6596 | 1.2622 | 11200 | 0.7315 |
| 0.6522 | 1.2848 | 11400 | 0.7304 |
| 0.6517 | 1.3073 | 11600 | 0.7300 |
| 0.657 | 1.3298 | 11800 | 0.7296 |
| 0.6554 | 1.3524 | 12000 | 0.7286 |
| 0.6545 | 1.3749 | 12200 | 0.7287 |
| 0.6556 | 1.3975 | 12400 | 0.7283 |
| 0.655 | 1.4200 | 12600 | 0.7294 |
| 0.6489 | 1.4425 | 12800 | 0.7285 |
| 0.6539 | 1.4651 | 13000 | 0.7269 |
| 0.654 | 1.4876 | 13200 | 0.7273 |
| 0.6556 | 1.5102 | 13400 | 0.7273 |
| 0.6529 | 1.5327 | 13600 | 0.7271 |
| 0.6504 | 1.5552 | 13800 | 0.7264 |
| 0.6498 | 1.5778 | 14000 | 0.7256 |
| 0.6517 | 1.6003 | 14200 | 0.7255 |
| 0.656 | 1.6229 | 14400 | 0.7252 |
| 0.6471 | 1.6454 | 14600 | 0.7242 |
| 0.6485 | 1.6679 | 14800 | 0.7243 |
| 0.6545 | 1.6905 | 15000 | 0.7242 |
| 0.6527 | 1.7130 | 15200 | 0.7238 |
| 0.6504 | 1.7356 | 15400 | 0.7236 |
| 0.6492 | 1.7581 | 15600 | 0.7229 |
| 0.6529 | 1.7806 | 15800 | 0.7232 |
| 0.6507 | 1.8032 | 16000 | 0.7226 |
| 0.653 | 1.8257 | 16200 | 0.7229 |
| 0.6461 | 1.8483 | 16400 | 0.7223 |
| 0.6453 | 1.8708 | 16600 | 0.7221 |
| 0.6534 | 1.8933 | 16800 | 0.7219 |
| 0.6455 | 1.9159 | 17000 | 0.7220 |
| 0.6485 | 1.9384 | 17200 | 0.7212 |
| 0.6536 | 1.9610 | 17400 | 0.7214 |
| 0.6444 | 1.9835 | 17600 | 0.7211 |
| 0.6346 | 2.0060 | 17800 | 0.7356 |
| 0.5929 | 2.0286 | 18000 | 0.7368 |
| 0.5951 | 2.0511 | 18200 | 0.7371 |
| 0.6013 | 2.0736 | 18400 | 0.7374 |
| 0.6004 | 2.0962 | 18600 | 0.7375 |
| 0.5991 | 2.1187 | 18800 | 0.7375 |
| 0.5971 | 2.1413 | 19000 | 0.7369 |
| 0.597 | 2.1638 | 19200 | 0.7380 |
| 0.5951 | 2.1863 | 19400 | 0.7370 |
| 0.5916 | 2.2089 | 19600 | 0.7370 |
| 0.5992 | 2.2314 | 19800 | 0.7372 |
| 0.6011 | 2.2540 | 20000 | 0.7364 |
| 0.6003 | 2.2765 | 20200 | 0.7370 |
| 0.6003 | 2.2990 | 20400 | 0.7370 |
| 0.5985 | 2.3216 | 20600 | 0.7370 |
| 0.5988 | 2.3441 | 20800 | 0.7367 |
| 0.5959 | 2.3667 | 21000 | 0.7370 |
| 0.6019 | 2.3892 | 21200 | 0.7370 |
| 0.5977 | 2.4117 | 21400 | 0.7367 |
| 0.602 | 2.4343 | 21600 | 0.7368 |
| 0.5958 | 2.4568 | 21800 | 0.7368 |
| 0.5969 | 2.4794 | 22000 | 0.7360 |
| 0.6025 | 2.5019 | 22200 | 0.7362 |
| 0.5942 | 2.5244 | 22400 | 0.7361 |
| 0.6006 | 2.5470 | 22600 | 0.7361 |
| 0.5952 | 2.5695 | 22800 | 0.7366 |
| 0.6007 | 2.5921 | 23000 | 0.7363 |
| 0.6003 | 2.6146 | 23200 | 0.7363 |
| 0.6006 | 2.6371 | 23400 | 0.7359 |
| 0.6014 | 2.6597 | 23600 | 0.7360 |
| 0.6008 | 2.6822 | 23800 | 0.7356 |
| 0.6005 | 2.7048 | 24000 | 0.7357 |
| 0.5958 | 2.7273 | 24200 | 0.7356 |
| 0.5977 | 2.7498 | 24400 | 0.7358 |
| 0.6 | 2.7724 | 24600 | 0.7358 |
| 0.5978 | 2.7949 | 24800 | 0.7362 |
| 0.6018 | 2.8175 | 25000 | 0.7359 |
| 0.6079 | 2.8400 | 25200 | 0.7359 |
| 0.6036 | 2.8625 | 25400 | 0.7359 |
| 0.5985 | 2.8851 | 25600 | 0.7359 |
| 0.6019 | 2.9076 | 25800 | 0.7359 |
| 0.5994 | 2.9302 | 26000 | 0.7358 |
| 0.6027 | 2.9527 | 26200 | 0.7358 |
| 0.6014 | 2.9752 | 26400 | 0.7358 |
| 0.5957 | 2.9978 | 26600 | 0.7358 |

### Framework versions

- Transformers 4.43.3
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
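
The training hyperparameters listed in this card can be read as a small piece of arithmetic. The sketch below is illustrative only and is not the training script used for this model: it shows how `train_batch_size: 8` and `gradient_accumulation_steps: 4` would give `total_train_batch_size: 32` (assuming a single device, which the card does not state), and how a linear schedule with `lr_scheduler_warmup_ratio: 0.1` typically shapes the learning rate. The function names and the `total_steps` value are hypothetical; the step count is an approximation taken from the last logged step in the training results.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update.

    num_devices=1 is an assumption; the card does not state the device count.
    """
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int, peak_lr: float = 1e-4,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0.

    A hand-written sketch of a linear-with-warmup schedule, not a call into
    any library.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Values from the card: train_batch_size=8, gradient_accumulation_steps=4.
    print("effective batch size:", effective_batch_size(8, 4))

    # Assumption: roughly 26,600 optimizer steps, the last step logged in the table.
    total_steps = 26_600
    for step in (0, 1_330, 2_660, 13_300, 26_600):
        print(step, linear_lr(step, total_steps))
```

Under these assumptions the learning rate would peak at 0.0001 around one tenth of the way through training and decay linearly to zero by the final step.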