---
base_model: mistralai/Mistral-7B-Instruct-v0.2
library_name: peft
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: finetune/outputs/climate
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

See axolotl config

axolotl version: `0.4.1`

```yaml
base_model: mistralai/Mistral-7B-Instruct-v0.2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false
chat_template: chatml
datasets:
  - path: Howard881010/climate
    type: alpaca
    train_on_split: train
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./finetune/outputs/climate

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: finetune
wandb_entity:
wandb_watch:
wandb_name: climate
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 10
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
eval_sample_packing: False

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

# For finetune
seed: 42
```
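To make the `lora_r`, `lora_alpha`, `lr_scheduler`, `learning_rate`, and `warmup_steps` settings above more concrete, here is a minimal sketch in plain Python. It assumes the standard LoRA scaling factor `lora_alpha / lora_r` and a common linear-warmup, cosine-decay-to-zero formulation of the schedule; it is not taken from the axolotl source. The helper name `lr_at_step` and the `total_steps=1770` default are illustrative assumptions (the latter is inferred from the results table, where step 45 falls at epoch 0.2542 of a 10-epoch run).

```python
import math

# Values copied from the config above.
LORA_R = 32
LORA_ALPHA = 16
BASE_LR = 0.0002
WARMUP_STEPS = 10

# Standard LoRA scaling: adapter updates are multiplied by alpha / r.
lora_scale = LORA_ALPHA / LORA_R


def lr_at_step(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS, total_steps=1770):
    """Hypothetical helper: linear warmup, then cosine decay to zero.

    total_steps is an assumption inferred from the results table,
    not a value stated in the config.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("LoRA scale:", lora_scale)
    for step in (0, 5, 10, 890, 1770):
        print(step, lr_at_step(step))
```

Under these assumptions the learning rate peaks at 0.0002 once warmup ends at step 10, passes through roughly half that value near the midpoint of training, and reaches zero at the final step.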

[Weights & Biases run](https://rosewandb.ucsd.edu/cht028/finetune/runs/xdu6khql)

# finetune/outputs/climate

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the Howard881010/climate dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0009

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 10

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.7472        | 0.0056 | 1    | 2.0532          |
| 1.1662        | 0.2542 | 45   | 1.2719          |
| 0.8512        | 0.5085 | 90   | 1.1146          |
| 1.141         | 0.7627 | 135  | 0.9757          |
| 0.5009        | 1.0169 | 180  | 0.7862          |
| 0.4804        | 1.2712 | 225  | 0.6073          |
| 0.3472        | 1.5254 | 270  | 0.4267          |
| 0.2733        | 1.7797 | 315  | 0.2808          |
| 0.1484        | 2.0339 | 360  | 0.1742          |
| 0.2064        | 2.2881 | 405  | 0.1261          |
| 0.1144        | 2.5424 | 450  | 0.0700          |
| 0.0787        | 2.7966 | 495  | 0.0390          |
| 0.0523        | 3.0508 | 540  | 0.0269          |
| 0.0606        | 3.3051 | 585  | 0.0193          |
| 0.0568        | 3.5593 | 630  | 0.0132          |
| 0.063         | 3.8136 | 675  | 0.0064          |
| 0.081         | 4.0678 | 720  | 0.0039          |
| 0.0748        | 4.3220 | 765  | 0.0022          |
| 0.0812        | 4.5763 | 810  | 0.0017          |
| 0.0313        | 4.8305 | 855  | 0.0015          |
| 0.0229        | 5.0847 | 900  | 0.0012          |
| 0.0518        | 5.3390 | 945  | 0.0011          |
| 0.019         | 5.5932 | 990  | 0.0011          |
| 0.09          | 5.8475 | 1035 | 0.0010          |
| 0.0907        | 6.1017 | 1080 | 0.0010          |
| 0.0876        | 6.3559 | 1125 | 0.0010          |
| 0.0716        | 6.6102 | 1170 | 0.0010          |
| 0.0728        | 6.8644 | 1215 | 0.0009          |
| 0.0338        | 7.1186 | 1260 | 0.0009          |
| 0.032         | 7.3729 | 1305 | 0.0009          |
| 0.0304        | 7.6271 | 1350 | 0.0009          |
| 0.0508        | 7.8814 | 1395 | 0.0009          |
| 0.0196        | 8.1356 | 1440 | 0.0009          |
| 0.0709        | 8.3898 | 1485 | 0.0009          |
| 0.0852        | 8.6441 | 1530 | 0.0009          |
| 0.0803        | 8.8983 | 1575 | 0.0009          |
| 0.1225        | 9.1525 | 1620 | 0.0009          |
| 0.0533        | 9.4068 | 1665 | 0.0009          |
| 0.0374        | 9.6610 | 1710 | 0.0009          |
| 0.0857        | 9.9153 | 1755 | 0.0009          |

### Framework versions

- PEFT 0.11.1
- Transformers 4.43.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
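
The batch-size figures listed under the training hyperparameters and the Epoch/Step columns of the results table can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable names are ours, and the steps-per-epoch figure is inferred from a single table row (step 45 at epoch 0.2542) rather than reported by the trainer.

```python
# Values copied from the training hyperparameters.
train_batch_size = 1             # per-device micro batch
eval_batch_size = 1              # per-device eval batch
num_devices = 8
gradient_accumulation_steps = 2
num_epochs = 10

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Inferred from the results table: step 45 is logged at epoch 0.2542.
steps_per_epoch = round(45 / 0.2542)

# Assumed total optimizer steps over the whole run.
total_steps = steps_per_epoch * num_epochs

# Evaluations are logged every 45 steps, consistent with evals_per_epoch: 4.
eval_interval = steps_per_epoch // 4 + 1

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
print("steps_per_epoch:", steps_per_epoch)
print("total_steps:", total_steps)
print("eval_interval:", eval_interval)
```

The computed totals match the reported `total_train_batch_size: 16` and `total_eval_batch_size: 8`, and the inferred 177 steps per epoch agrees with the other rows (for example, 180 / 177 ≈ 1.0169). The `eval_interval` expression is only a guess at how the 45-step spacing arises, not a statement about the trainer's rounding logic.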