2024-05-15 13:34:20,264 - INFO: Calling run..
2024-05-15 13:34:20,264 - INFO: Problem Type: text_causal_language_modeling
2024-05-15 13:34:20,264 - INFO: Global random seed: 916310
2024-05-15 13:34:20,264 - INFO: Preparing the data...
2024-05-15 13:34:20,265 - INFO: Setting up automatic validation split...
2024-05-15 13:34:20,271 - INFO: Preparing train and validation data
2024-05-15 13:34:20,271 - INFO: Loading train dataset...
2024-05-15 13:34:21,196 - INFO: Stop token ids: [tensor([ 27, 91, 41681, 91, 29]), tensor([ 27, 91, 9399, 91, 29]), tensor([ 27, 91, 9125, 91, 29])]
2024-05-15 13:34:21,210 - INFO: Loading validation dataset...
2024-05-15 13:34:21,608 - INFO: Stop token ids: [tensor([ 27, 91, 41681, 91, 29]), tensor([ 27, 91, 9399, 91, 29]), tensor([ 27, 91, 9125, 91, 29])]
2024-05-15 13:34:21,625 - INFO: Number of observations in train dataset: 15
2024-05-15 13:34:21,625 - INFO: Number of observations in validation dataset: 1
2024-05-15 13:34:22,161 - INFO: Stop token ids: [tensor([ 27, 91, 41681, 91, 29], device='cuda:0'), tensor([ 27, 91, 9399, 91, 29], device='cuda:0'), tensor([ 27, 91, 9125, 91, 29], device='cuda:0')]
2024-05-15 13:34:22,173 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
2024-05-15 13:34:22,173 - INFO: Setting pretraining_tp of model config to 1.
2024-05-15 13:34:22,188 - INFO: Using bfloat16 for backbone
2024-05-15 13:34:22,188 - INFO: Loading meta-llama/Meta-Llama-3-8B. This may take a while.
2024-05-15 13:35:50,131 - INFO: Loaded meta-llama/Meta-Llama-3-8B.
2024-05-15 13:35:50,134 - WARNING: PAD token id not matching between generation config and tokenizer. Overwriting with tokenizer id.
2024-05-15 13:35:50,135 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
2024-05-15 13:35:50,379 - INFO: Enough space available for saving model weights. Required space: 15817.20MB, Available space: 983440.06MB.
2024-05-15 13:35:50,387 - INFO: Optimizer AdamW has been provided with parameters {'eps': 1e-08, 'weight_decay': 0.0, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
2024-05-15 13:35:51,788 - INFO: started process: 0, can_track: True, tracking_mode: TrackingMode.AFTER_EPOCH
2024-05-15 13:35:51,788 - INFO: Training Epoch: 1 / 2
2024-05-15 13:35:51,789 - INFO: train loss: 0%| | 0/3 [00:00<?, ?it/s]
2024-05-15 13:35:51,922 - INFO: Evaluation step: 3
2024-05-15 13:35:53,635 - INFO: train loss: 6.76: 33%|###3 | 1/3 [00:01<00:03, 1.85s/it]
2024-05-15 13:35:54,389 - INFO: train loss: 6.14: 67%|######6 | 2/3 [00:02<00:01, 1.20s/it]
2024-05-15 13:35:55,354 - INFO: train loss: 4.83: 100%|##########| 3/3 [00:03<00:00, 1.09s/it]
2024-05-15 13:35:55,354 - INFO: train loss: 4.83: 100%|##########| 3/3 [00:03<00:00, 1.19s/it]
2024-05-15 13:35:55,354 - INFO: Saving last model checkpoint to /app/output
2024-05-15 13:35:55,354 - INFO: Saving checkpoint..
2024-05-15 13:36:45,100 - INFO: Starting validation inference
2024-05-15 13:36:45,100 - INFO: validation progress: 0%| | 0/1 [00:00<?, ?it/s]
2024-05-15 13:36:45,330 - INFO: validation progress: 100%|##########| 1/1 [00:00<00:00, 4.35it/s]
2024-05-15 13:36:45,334 - INFO: validation progress: 100%|##########| 1/1 [00:00<00:00, 4.29it/s]
2024-05-15 13:36:45,371 - INFO: Validation Perplexity: 1.52548
2024-05-15 13:36:45,371 - INFO: Mean validation loss: 0.42231
2024-05-15 13:36:45,908 - INFO: Training Epoch: 2 / 2
2024-05-15 13:36:45,908 - INFO: train loss: 0%| | 0/3 [00:00<?, ?it/s]
2024-05-15 13:36:45,995 - INFO: Evaluation step: 3
2024-05-15 13:36:46,947 - INFO: train loss: 0.44: 33%|###3 | 1/3 [00:01<00:02, 1.04s/it]
2024-05-15 13:36:47,767 - INFO: train loss: 0.29: 67%|######6 | 2/3 [00:01<00:00, 1.10it/s]
2024-05-15 13:36:49,228 - INFO: train loss: 0.23: 100%|##########| 3/3 [00:03<00:00, 1.16s/it]
2024-05-15 13:36:49,228 - INFO: train loss: 0.23: 100%|##########| 3/3 [00:03<00:00, 1.11s/it]
2024-05-15 13:36:49,228 - INFO: Saving last model checkpoint to /app/output
2024-05-15 13:36:49,229 - INFO: Saving checkpoint..
2024-05-15 13:37:33,992 - INFO: Starting validation inference
2024-05-15 13:37:33,993 - INFO: validation progress: 0%| | 0/1 [00:00<?, ?it/s]
2024-05-15 13:37:34,208 - INFO: validation progress: 100%|##########| 1/1 [00:00<00:00, 4.65it/s]
2024-05-15 13:37:34,210 - INFO: validation progress: 100%|##########| 1/1 [00:00<00:00, 4.59it/s]
2024-05-15 13:37:34,242 - INFO: Validation Perplexity: 1.07148
2024-05-15 13:37:34,242 - INFO: Mean validation loss: 0.06904