mistrial_nemo_output
This model is a fine-tuned version of unsloth/Mistral-Nemo-Base-2407-bnb-4bit on the generator dataset. It achieves the following results on the evaluation set:
- Loss: 1.5118
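Since this is a PEFT (LoRA-style) adapter trained on top of a 4-bit base model, a minimal loading sketch might look like the following. This is an illustration only, assuming the adapter weights are hosted at xxxxxccc/mistrial_nemo_output and that the base model's tokenizer is used unchanged:

```python
# Minimal sketch: load the 4-bit base model and attach this PEFT adapter.
# Assumes the adapter is published at xxxxxccc/mistrial_nemo_output and
# that no custom tokenizer was added during fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/Mistral-Nemo-Base-2407-bnb-4bit"
adapter_id = "xxxxxccc/mistrial_nemo_output"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```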
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 100
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
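The effective train batch size of 64 follows from a per-device batch size of 4 across 2 GPUs with 8 gradient-accumulation steps (4 × 2 × 8 = 64). The training script itself is not included in this repository; the sketch below only illustrates how these settings would map onto transformers.TrainingArguments:

```python
# Illustrative only: the hyperparameters above expressed as TrainingArguments.
# The actual training script and data pipeline are not published here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistrial_nemo_output",
    learning_rate=2e-4,
    per_device_train_batch_size=4,   # x 2 GPUs x 8 accumulation steps = 64 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    seed=100,
    lr_scheduler_type="cosine",
    num_train_epochs=1.0,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-8 are the defaults
)
```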
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.7287 | 0.0516 | 20 | 1.7033 |
1.6508 | 0.1033 | 40 | 1.6490 |
1.6242 | 0.1549 | 60 | 1.6253 |
1.6216 | 0.2066 | 80 | 1.6089 |
1.6190 | 0.2582 | 100 | 1.5958 |
1.5579 | 0.3099 | 120 | 1.5842 |
1.5578 | 0.3615 | 140 | 1.5739 |
1.5515 | 0.4132 | 160 | 1.5641 |
1.5739 | 0.4648 | 180 | 1.5550 |
1.5669 | 0.5165 | 200 | 1.5460 |
1.5601 | 0.5681 | 220 | 1.5380 |
1.5392 | 0.6198 | 240 | 1.5310 |
1.5321 | 0.6714 | 260 | 1.5251 |
1.5326 | 0.7230 | 280 | 1.5201 |
1.5197 | 0.7747 | 300 | 1.5165 |
1.5229 | 0.8263 | 320 | 1.5142 |
1.4988 | 0.8780 | 340 | 1.5127 |
1.5044 | 0.9296 | 360 | 1.5119 |
1.5105 | 0.9813 | 380 | 1.5118 |
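If the reported loss is the mean per-token cross-entropy, the final validation loss of 1.5118 corresponds to a perplexity of roughly exp(1.5118) ≈ 4.5.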
Framework versions
- PEFT 0.12.1.dev0
- Transformers 4.45.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1