---
base_model: mistralai/Mistral-7B-Instruct-v0.2
library_name: peft
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: finetune/outputs/climate
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: mistralai/Mistral-7B-Instruct-v0.2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

chat_template: chatml
datasets:
  - path: Howard881010/climate
    type: alpaca
    train_on_split: train

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./finetune/outputs/climate

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: finetune
wandb_entity:
wandb_watch:
wandb_name: climate
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 10
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
eval_sample_packing: False

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

# For finetune
seed: 42
```

</details><br>
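For reference, the QLoRA-related settings in the config above map roughly onto the following PEFT/bitsandbytes objects. This is an illustrative sketch only; Axolotl constructs its own quantization and adapter configuration internally, so the exact objects it builds may differ.

```python
# Illustrative sketch only: an approximate PEFT/bitsandbytes rendering of the
# QLoRA settings from the Axolotl config above (not Axolotl's actual code).
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16: auto (assumes a bf16-capable GPU)
)

lora_config = LoraConfig(
    r=32,                         # lora_r: 32
    lora_alpha=16,                # lora_alpha: 16
    lora_dropout=0.05,            # lora_dropout: 0.05
    target_modules="all-linear",  # lora_target_linear: true
    task_type="CAUSAL_LM",
)
```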

[Visualize in Weights & Biases](https://rosewandb.ucsd.edu/cht028/finetune/runs/8a5o02qn)

# finetune/outputs/climate

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the [Howard881010/climate](https://huggingface.co/datasets/Howard881010/climate) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0009

## Model description

A QLoRA adapter (4-bit base, LoRA r=32, alpha=16, dropout 0.05 on all linear layers) trained on top of Mistral-7B-Instruct-v0.2; see the Axolotl config above for the full setup.

## Intended uses & limitations

More information needed

## Training and evaluation data

Per the Axolotl config above, training used the `train` split of the Howard881010/climate dataset in Alpaca format, with 5% of the data held out as the evaluation set (`val_set_size: 0.05`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: paged AdamW (32-bit) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 10

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.7628        | 0.0056 | 1    | 1.9544          |
| 1.1905        | 0.2542 | 45   | 1.2650          |
| 1.0583        | 0.5085 | 90   | 1.1289          |
| 0.9094        | 0.7627 | 135  | 0.9717          |
| 0.6033        | 1.0169 | 180  | 0.7865          |
| 0.6043        | 1.2712 | 225  | 0.6347          |
| 0.3525        | 1.5254 | 270  | 0.4456          |
| 0.1879        | 1.7797 | 315  | 0.2918          |
| 0.1367        | 2.0339 | 360  | 0.1608          |
| 0.1627        | 2.2881 | 405  | 0.1098          |
| 0.1465        | 2.5424 | 450  | 0.0722          |
| 0.1019        | 2.7966 | 495  | 0.0458          |
| 0.161         | 3.0508 | 540  | 0.0354          |
| 0.0597        | 3.3051 | 585  | 0.0189          |
| 0.1038        | 3.5593 | 630  | 0.0130          |
| 0.0754        | 3.8136 | 675  | 0.0078          |
| 0.0632        | 4.0678 | 720  | 0.0051          |
| 0.0364        | 4.3220 | 765  | 0.0032          |
| 0.1342        | 4.5763 | 810  | 0.0019          |
| 0.0776        | 4.8305 | 855  | 0.0014          |
| 0.0337        | 5.0847 | 900  | 0.0012          |
| 0.0591        | 5.3390 | 945  | 0.0011          |
| 0.0171        | 5.5932 | 990  | 0.0010          |
| 0.0732        | 5.8475 | 1035 | 0.0010          |
| 0.0538        | 6.1017 | 1080 | 0.0010          |
| 0.0234        | 6.3559 | 1125 | 0.0010          |
| 0.1259        | 6.6102 | 1170 | 0.0009          |
| 0.1216        | 6.8644 | 1215 | 0.0009          |
| 0.0687        | 7.1186 | 1260 | 0.0009          |
| 0.1172        | 7.3729 | 1305 | 0.0009          |
| 0.1007        | 7.6271 | 1350 | 0.0009          |
| 0.1372        | 7.8814 | 1395 | 0.0009          |
| 0.0925        | 8.1356 | 1440 | 0.0009          |
| 0.0342        | 8.3898 | 1485 | 0.0009          |
| 0.0688        | 8.6441 | 1530 | 0.0009          |
| 0.0576        | 8.8983 | 1575 | 0.0009          |
| 0.0575        | 9.1525 | 1620 | 0.0009          |
| 0.0707        | 9.4068 | 1665 | 0.0009          |
| 0.1519        | 9.6610 | 1710 | 0.0009          |
| 0.0666        | 9.9153 | 1755 | 0.0009          |

### Framework versions

- PEFT 0.11.1
- Transformers 4.43.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
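### Usage

The card does not include a usage example; below is a minimal inference sketch. It assumes the trained adapter is available locally under the config's `output_dir` (`./finetune/outputs/climate`); adjust the path if the adapter was pushed to the Hub under a different repo id.

```python
# Minimal inference sketch (not from the original card). Assumes the QLoRA
# adapter directory produced by the training run above; adjust adapter_path
# to wherever the adapter actually lives.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_path = "./finetune/outputs/climate"  # assumption: local output_dir

# Loads the Mistral-7B-Instruct-v0.2 base recorded in the adapter config,
# then applies the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Prompt format is an assumption; the run trained on Alpaca-style examples,
# so a plain instruction string is used here for illustration.
inputs = tokenizer(
    "What are the main drivers of sea-level rise?", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```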