
Built with Axolotl

See axolotl config

axolotl version: 0.4.1

```yaml
base_model: mistralai/Mistral-7B-Instruct-v0.2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

chat_template: chatml
datasets:
  - path: Howard881010/gas-west
    type: alpaca
    train_on_split: train
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./finetune/outputs/gas-west

adapter: qlora
lora_model_dir:

sequence_len: 1200
sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: finetune
wandb_entity:
wandb_watch:
wandb_name: gas-west
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 10
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: 
flash_attention: true
eval_sample_packing: False

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
# For finetune
seed: 42
```
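
For orientation, the quantization and LoRA settings above map roughly onto the following PEFT-level configuration. This is an illustrative sketch only; Axolotl constructs the actual objects internally, and the exact arguments may differ:

```python
# Illustrative sketch: approximate PEFT/transformers equivalents of the
# QLoRA settings in the config above (not the exact objects Axolotl builds).
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)  # load_in_4bit: true

lora_config = LoraConfig(
    r=32,                         # lora_r: 32
    lora_alpha=16,                # lora_alpha: 16
    lora_dropout=0.05,            # lora_dropout: 0.05
    target_modules="all-linear",  # lora_target_linear: true
    task_type="CAUSAL_LM",
)
```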


finetune/outputs/gas-west

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on the Howard881010/gas-west dataset (see the Axolotl config above). It achieves the following results on the evaluation set:

  • Loss: 0.0003

Model description

This repository contains a QLoRA adapter (rank 32, alpha 16) for mistralai/Mistral-7B-Instruct-v0.2, trained with Axolotl 0.4.1 using the configuration shown above. No further description has been provided.

Intended uses & limitations

More information needed
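
In the absence of author-provided usage notes, the following is a minimal inference sketch. It assumes the adapter is hosted at Rose-STL-Lab/gas-west (the repository ID shown for this card) and that a CUDA GPU is available:

```python
# Minimal inference sketch (assumption: the adapter repo is
# "Rose-STL-Lab/gas-west", as shown on this page).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-Instruct-v0.2"
ADAPTER = "Rose-STL-Lab/gas-west"

# If the adapter repo does not ship tokenizer files, load from BASE instead.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# The training config specifies a ChatML chat template; prompts should follow
# the same chat format the model was fine-tuned on.
messages = [{"role": "user", "content": "Your prompt here"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```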

Training and evaluation data

Per the config above, the model was fine-tuned on the Howard881010/gas-west dataset (alpaca-style formatting), with 5% of the data held out for validation (val_set_size: 0.05).

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: paged AdamW, 32-bit (paged_adamw_32bit) with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 10
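
For reference, the effective batch size follows directly from the per-device settings listed above: total_train_batch_size = micro_batch_size × gradient_accumulation_steps × num_devices = 1 × 2 × 8 = 16.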

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.4517        | 0.0022 | 1    | 1.3369          |
| 0.6431        | 0.2508 | 114  | 0.6256          |
| 0.3998        | 0.5017 | 228  | 0.4131          |
| 0.1741        | 0.7525 | 342  | 0.2322          |
| 0.0913        | 1.0033 | 456  | 0.1268          |
| 0.0679        | 1.2541 | 570  | 0.0809          |
| 0.0503        | 1.5050 | 684  | 0.0605          |
| 0.0476        | 1.7558 | 798  | 0.0484          |
| 0.0084        | 2.0066 | 912  | 0.0417          |
| 0.0273        | 2.2574 | 1026 | 0.0410          |
| 0.0296        | 2.5083 | 1140 | 0.0384          |
| 0.0317        | 2.7591 | 1254 | 0.0344          |
| 0.0086        | 3.0099 | 1368 | 0.0268          |
| 0.0076        | 3.2607 | 1482 | 0.0224          |
| 0.0043        | 3.5116 | 1596 | 0.0206          |
| 0.0085        | 3.7624 | 1710 | 0.0127          |
| 0.0071        | 4.0132 | 1824 | 0.0081          |
| 0.002         | 4.2640 | 1938 | 0.0053          |
| 0.0028        | 4.5149 | 2052 | 0.0034          |
| 0.0007        | 4.7657 | 2166 | 0.0016          |
| 0.0003        | 5.0165 | 2280 | 0.0008          |
| 0.0002        | 5.2673 | 2394 | 0.0005          |
| 0.0002        | 5.5182 | 2508 | 0.0004          |
| 0.0001        | 5.7690 | 2622 | 0.0004          |
| 0.0001        | 6.0198 | 2736 | 0.0004          |
| 0.0001        | 6.2706 | 2850 | 0.0004          |
| 0.0001        | 6.5215 | 2964 | 0.0004          |
| 0.0001        | 6.7723 | 3078 | 0.0004          |
| 0.0001        | 7.0231 | 3192 | 0.0004          |
| 0.0001        | 7.2739 | 3306 | 0.0004          |
| 0.0001        | 7.5248 | 3420 | 0.0004          |
| 0.0001        | 7.7756 | 3534 | 0.0004          |
| 0.0002        | 8.0264 | 3648 | 0.0004          |
| 0.0002        | 8.2772 | 3762 | 0.0003          |
| 0.0001        | 8.5281 | 3876 | 0.0004          |
| 0.0001        | 8.7789 | 3990 | 0.0003          |
| 0.0002        | 9.0297 | 4104 | 0.0003          |
| 0.0001        | 9.2805 | 4218 | 0.0003          |
| 0.0001        | 9.5314 | 4332 | 0.0004          |
| 0.0001        | 9.7822 | 4446 | 0.0003          |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.43.1
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1