2024-05-15 13:34:20,264 - INFO: Calling run..
2024-05-15 13:34:20,264 - INFO: Problem Type: text_causal_language_modeling
2024-05-15 13:34:20,264 - INFO: Global random seed: 916310
2024-05-15 13:34:20,264 - INFO: Preparing the data...
2024-05-15 13:34:20,265 - INFO: Setting up automatic validation split...
2024-05-15 13:34:20,271 - INFO: Preparing train and validation data
2024-05-15 13:34:20,271 - INFO: Loading train dataset...
2024-05-15 13:34:21,196 - INFO: Stop token ids: [tensor([   27,    91, 41681,    91,    29]), tensor([  27,   91, 9399,   91,   29]), tensor([  27,   91, 9125,   91,   29])]
2024-05-15 13:34:21,210 - INFO: Loading validation dataset...
2024-05-15 13:34:21,608 - INFO: Stop token ids: [tensor([   27,    91, 41681,    91,    29]), tensor([  27,   91, 9399,   91,   29]), tensor([  27,   91, 9125,   91,   29])]
2024-05-15 13:34:21,625 - INFO: Number of observations in train dataset: 15
2024-05-15 13:34:21,625 - INFO: Number of observations in validation dataset: 1
2024-05-15 13:34:22,161 - INFO: Stop token ids: [tensor([   27,    91, 41681,    91,    29], device='cuda:0'), tensor([  27,   91, 9399,   91,   29], device='cuda:0'), tensor([  27,   91, 9125,   91,   29], device='cuda:0')]
2024-05-15 13:34:22,173 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
2024-05-15 13:34:22,173 - INFO: Setting pretraining_tp of model config to 1.
2024-05-15 13:34:22,188 - INFO: Using bfloat16 for backbone
2024-05-15 13:34:22,188 - INFO: Loading meta-llama/Meta-Llama-3-8B. This may take a while.
2024-05-15 13:35:50,131 - INFO: Loaded meta-llama/Meta-Llama-3-8B.
2024-05-15 13:35:50,134 - WARNING: PAD token id not matching between generation config and tokenizer. Overwriting with tokenizer id.
2024-05-15 13:35:50,135 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
2024-05-15 13:35:50,379 - INFO: Enough space available for saving model weights.Required space: 15817.20MB, Available space: 983440.06MB.
2024-05-15 13:35:50,387 - INFO: Optimizer AdamW has been provided with parameters {'eps': 1e-08, 'weight_decay': 0.0, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
2024-05-15 13:35:51,788 - INFO: started process: 0, can_track: True, tracking_mode: TrackingMode.AFTER_EPOCH
2024-05-15 13:35:51,788 - INFO: Training Epoch: 1 / 2
2024-05-15 13:35:51,789 - INFO: train loss:   0%|          | 0/3 [00:00<?, ?it/s]
2024-05-15 13:35:51,922 - INFO: Evaluation step: 3
2024-05-15 13:35:53,635 - INFO: train loss: 6.76:  33%|###3      | 1/3 [00:01<00:03,  1.85s/it]
2024-05-15 13:35:54,389 - INFO: train loss: 6.14:  67%|######6   | 2/3 [00:02<00:01,  1.20s/it]
2024-05-15 13:35:55,354 - INFO: train loss: 4.83: 100%|##########| 3/3 [00:03<00:00,  1.09s/it]
2024-05-15 13:35:55,354 - INFO: train loss: 4.83: 100%|##########| 3/3 [00:03<00:00,  1.19s/it]
2024-05-15 13:35:55,354 - INFO: Saving last model checkpoint to /app/output
2024-05-15 13:35:55,354 - INFO: Saving checkpoint..
2024-05-15 13:36:45,100 - INFO: Starting validation inference
2024-05-15 13:36:45,100 - INFO: validation progress:   0%|          | 0/1 [00:00<?, ?it/s]
2024-05-15 13:36:45,330 - INFO: validation progress: 100%|##########| 1/1 [00:00<00:00,  4.35it/s]
2024-05-15 13:36:45,334 - INFO: validation progress: 100%|##########| 1/1 [00:00<00:00,  4.29it/s]
2024-05-15 13:36:45,371 - INFO: Validation Perplexity: 1.52548
2024-05-15 13:36:45,371 - INFO: Mean validation loss: 0.42231
2024-05-15 13:36:45,908 - INFO: Training Epoch: 2 / 2
2024-05-15 13:36:45,908 - INFO: train loss:   0%|          | 0/3 [00:00<?, ?it/s]
2024-05-15 13:36:45,995 - INFO: Evaluation step: 3
2024-05-15 13:36:46,947 - INFO: train loss: 0.44:  33%|###3      | 1/3 [00:01<00:02,  1.04s/it]
2024-05-15 13:36:47,767 - INFO: train loss: 0.29:  67%|######6   | 2/3 [00:01<00:00,  1.10it/s]
2024-05-15 13:36:49,228 - INFO: train loss: 0.23: 100%|##########| 3/3 [00:03<00:00,  1.16s/it]
2024-05-15 13:36:49,228 - INFO: train loss: 0.23: 100%|##########| 3/3 [00:03<00:00,  1.11s/it]
2024-05-15 13:36:49,228 - INFO: Saving last model checkpoint to /app/output
2024-05-15 13:36:49,229 - INFO: Saving checkpoint..
2024-05-15 13:37:33,992 - INFO: Starting validation inference
2024-05-15 13:37:33,993 - INFO: validation progress:   0%|          | 0/1 [00:00<?, ?it/s]
2024-05-15 13:37:34,208 - INFO: validation progress: 100%|##########| 1/1 [00:00<00:00,  4.65it/s]
2024-05-15 13:37:34,210 - INFO: validation progress: 100%|##########| 1/1 [00:00<00:00,  4.59it/s]
2024-05-15 13:37:34,242 - INFO: Validation Perplexity: 1.07148
2024-05-15 13:37:34,242 - INFO: Mean validation loss: 0.06904