Training in progress, step 5
- README.md +8 -18
- adapter_model.safetensors +1 -1
README.md
CHANGED
@@ -2,7 +2,6 @@
 license: apache-2.0
 library_name: peft
 tags:
-- axolotl
 - generated_from_trainer
 base_model: mistralai/Mistral-7B-Instruct-v0.2
 model-index:
@@ -56,8 +55,8 @@ wandb_project: nohto
 wandb_name: nohto-v0
 wandb_log_model: end
 
-gradient_accumulation_steps:
-micro_batch_size:
+gradient_accumulation_steps: 2
+micro_batch_size: 1
 num_epochs: 1
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
@@ -93,9 +92,7 @@ fsdp_config:
 
 # nohto-v0-1e
 
-This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on
-It achieves the following results on the evaluation set:
-- Loss: 2.9476
+This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
 
 ## Model description
 
@@ -115,26 +112,19 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 1
+- eval_batch_size: 1
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 2
-- gradient_accumulation_steps:
-- total_train_batch_size:
-- total_eval_batch_size:
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 4
+- total_eval_batch_size: 2
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
 - num_epochs: 1
 
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 2.0421        | 0.8   | 1    | 2.9476          |
-
-
 ### Framework versions
 
 - PEFT 0.7.0
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:861ce3b2a2552f19f21c782fbf5435f8136b9e9148c867a4755278047864f6de
 size 102820600