---
license: mit
base_model: microsoft/phi-2
tags:
- generated_from_trainer
model-index:
- name: V0316MP1
  results: []
---

# V0316MP1

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
It achieves the following result on the evaluation set after the final logged step (step 350 in the training results table):
- Loss: 1.5025

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 128 (4 per device × 32 accumulation steps, implying a single device)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 20
- num_epochs: 3
- mixed_precision_training: Native AMP
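For reference, part of the list above can be expressed in plain Python. The sketch below checks the effective batch size and reproduces the learning-rate curve that `cosine_with_restarts` with 20 warmup steps would produce. It rests on three assumptions that are not stated in the card: a single device (which the reported total of 128 implies), Transformers' default of one cosine cycle for this scheduler (under which no restart actually occurs), and roughly 350 optimizer steps, taken from the last logged step in the results table. `lr_at_step` is a hypothetical helper that mirrors the formula used by Transformers' `get_cosine_with_hard_restarts_schedule_with_warmup`; it is not imported from the library.

```python
import math

# Values copied from the hyperparameter list above.
HPARAMS = {
    "learning_rate": 3e-4,
    "train_batch_size": 4,
    "gradient_accumulation_steps": 32,
    "lr_scheduler_warmup_steps": 20,
}

NUM_DEVICES = 1   # assumption: inferred from 4 * 32 == 128
TOTAL_STEPS = 350  # assumption: last logged step in the results table
NUM_CYCLES = 1     # assumption: default cycle count for cosine_with_restarts


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of examples that contribute to each optimizer step."""
    return per_device * accum_steps * devices


def lr_at_step(step: int, peak_lr: float, warmup: int, total: int, cycles: int = 1) -> float:
    """Linear warmup, then cosine decay with hard restarts (hypothetical helper)."""
    if step < warmup:
        # Warmup: ramp linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup)
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    if progress >= 1.0:
        return 0.0
    # Each cycle restarts the cosine at its peak; with cycles=1 there is one decay.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((cycles * progress) % 1.0))))


if __name__ == "__main__":
    bs = effective_batch_size(
        HPARAMS["train_batch_size"], HPARAMS["gradient_accumulation_steps"], NUM_DEVICES
    )
    print("effective batch size:", bs)
    for step in (0, 10, 20, 185, 350):
        lr = lr_at_step(
            step,
            HPARAMS["learning_rate"],
            HPARAMS["lr_scheduler_warmup_steps"],
            TOTAL_STEPS,
            NUM_CYCLES,
        )
        print(f"step {step:>3}: lr = {lr:.2e}")
```

Passing a larger `cycles` value to `lr_at_step` shows how the curve would look if the scheduler had been configured with several restarts.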

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.4218        | 0.09  | 10   | 2.3701          |
| 2.3588        | 0.17  | 20   | 2.3216          |
| 2.2547        | 0.26  | 30   | 2.2504          |
| 2.0897        | 0.34  | 40   | 2.1789          |
| 1.9766        | 0.43  | 50   | 2.1106          |
| 1.8207        | 0.51  | 60   | 2.0495          |
| 1.7309        | 0.6   | 70   | 2.0001          |
| 1.666         | 0.68  | 80   | 1.9488          |
| 1.5586        | 0.77  | 90   | 1.9120          |
| 1.4977        | 0.85  | 100  | 1.8712          |
| 1.422         | 0.94  | 110  | 1.8324          |
| 1.3569        | 1.02  | 120  | 1.7940          |
| 1.2811        | 1.11  | 130  | 1.7640          |
| 1.2312        | 1.19  | 140  | 1.7329          |
| 1.1463        | 1.28  | 150  | 1.7065          |
| 1.1087        | 1.37  | 160  | 1.6802          |
| 1.0139        | 1.45  | 170  | 1.6581          |
| 0.968         | 1.54  | 180  | 1.6377          |
| 0.9078        | 1.62  | 190  | 1.6183          |
| 0.871         | 1.71  | 200  | 1.6013          |
| 0.8252        | 1.79  | 210  | 1.5863          |
| 0.7983        | 1.88  | 220  | 1.5675          |
| 0.7561        | 1.96  | 230  | 1.5566          |
| 0.7413        | 2.05  | 240  | 1.5443          |
| 0.7156        | 2.13  | 250  | 1.5348          |
| 0.701         | 2.22  | 260  | 1.5243          |
| 0.673         | 2.3   | 270  | 1.5174          |
| 0.6627        | 2.39  | 280  | 1.5126          |
| 0.648         | 2.47  | 290  | 1.5119          |
| 0.6553        | 2.56  | 300  | 1.5088          |
| 0.6447        | 2.65  | 310  | 1.5051          |
| 0.6227        | 2.73  | 320  | 1.5045          |
| 0.6338        | 2.82  | 330  | 1.5023          |
| 0.6224        | 2.9   | 340  | 1.5017          |
| 0.6115        | 2.99  | 350  | 1.5025          |
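To choose a checkpoint from a log like this, it can help to parse the rows and locate the minimum validation loss. The sketch below is a minimal parser that uses only the standard library. `parse_results_table` and `best_row` are hypothetical helper names, and the sample string holds only the last six rows of the table above. In this table the lowest validation loss (1.5017) occurs at step 340, slightly below the final value of 1.5025 reported at the top of the card.

```python
from dataclasses import dataclass
from typing import List

# The last six rows of the "Training results" table, copied verbatim.
SAMPLE = """\
| 0.6553        | 2.56  | 300  | 1.5088          |
| 0.6447        | 2.65  | 310  | 1.5051          |
| 0.6227        | 2.73  | 320  | 1.5045          |
| 0.6338        | 2.82  | 330  | 1.5023          |
| 0.6224        | 2.9   | 340  | 1.5017          |
| 0.6115        | 2.99  | 350  | 1.5025          |
"""


@dataclass
class LogRow:
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_results_table(text: str) -> List[LogRow]:
    """Parse Markdown table rows, skipping the header and alignment lines."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # blank or malformed line
        try:
            rows.append(LogRow(float(cells[0]), float(cells[1]), int(cells[2]), float(cells[3])))
        except ValueError:
            continue  # header ("Training Loss", ...) or alignment row (":---:")
    return rows


def best_row(rows: List[LogRow]) -> LogRow:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


if __name__ == "__main__":
    best = best_row(parse_results_table(SAMPLE))
    print(f"best step {best.step}: val loss {best.val_loss}, train loss {best.train_loss}")
```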


### Framework versions

- Transformers 4.36.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
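When you try to reproduce the run, a quick check of the installed packages against the list above can save debugging time. The sketch below uses the standard library's `importlib.metadata.version`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`. `version_mismatches` is a hypothetical helper, and its `lookup` parameter exists only so the function can be tested without the packages installed. Note that `4.36.0.dev0` is a development version string, which typically indicates that Transformers was installed from source, so an exact match from a regular release is unlikely.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions listed under "Framework versions"; keys are PyPI distribution names
# (PyTorch is distributed as "torch").
EXPECTED = {
    "transformers": "4.36.0.dev0",
    "torch": "2.1.2+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def version_mismatches(
    expected: Dict[str, str], lookup: Callable[[str], str] = version
) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for each mismatch."""
    out: Dict[str, Optional[str]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = lookup(name)
        except PackageNotFoundError:
            have = None
        if have != want:
            out[name] = have
    return out


if __name__ == "__main__":
    for name, have in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {have or 'not installed'}")
```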