08/16/2024 12:33:49 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/16/2024 12:33:49 - INFO - llamafactory.hparams.parser - Process rank: 6, device: xpu:6, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/16/2024 12:33:49 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/16/2024 12:33:49 - INFO - llamafactory.hparams.parser - Process rank: 1, device: xpu:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/16/2024 12:33:49 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/16/2024 12:33:49 - INFO - llamafactory.hparams.parser - Process rank: 2, device: xpu:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/16/2024 12:33:50 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/16/2024 12:33:50 - INFO - llamafactory.hparams.parser - Process rank: 7, device: xpu:7, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/16/2024 12:33:50 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
08/16/2024 12:33:50 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
[WARNING|parser.py:296] 2024-08-16 12:33:50,245 >> `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
[INFO|parser.py:348] 2024-08-16 12:33:50,246 >> Process rank: 0, device: xpu:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/16/2024 12:33:50 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/16/2024 12:33:50 - INFO - llamafactory.hparams.parser - Process rank: 4, device: xpu:4, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/16/2024 12:33:50 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
08/16/2024 12:33:50 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
[INFO|tokenization_utils_base.py:2289] 2024-08-16 12:33:50,368 >> loading file vocab.json from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/vocab.json
08/16/2024 12:33:50 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
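The repeated `ddp_find_unused_parameters` warning is emitted once per rank (eight XPU processes in this run). At the PyTorch level it corresponds roughly to the sketch below; this is not LLaMA-Factory's actual code, and it assumes the process group has already been initialized by the launcher.

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal sketch of what the warning refers to. With LoRA only the small adapter
# matrices receive gradients, and the framework insists that DDP's unused-parameter
# detection stays off. `device` would be one of the "xpu:0" ... "xpu:7" devices
# reported above; the distributed process group is assumed to be initialized already.
def wrap_model_for_ddp(model: torch.nn.Module, device: str) -> DDP:
    model = model.to(device)
    return DDP(model, find_unused_parameters=False)
```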
08/16/2024 12:33:50 - INFO - llamafactory.hparams.parser - Process rank: 5, device: xpu:5, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/16/2024 12:33:50 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
[INFO|tokenization_utils_base.py:2289] 2024-08-16 12:33:50,368 >> loading file merges.txt from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/merges.txt
[INFO|tokenization_utils_base.py:2289] 2024-08-16 12:33:50,368 >> loading file tokenizer.json from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/tokenizer.json
[INFO|tokenization_utils_base.py:2289] 2024-08-16 12:33:50,368 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2289] 2024-08-16 12:33:50,368 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2289] 2024-08-16 12:33:50,368 >> loading file tokenizer_config.json from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-16 12:33:50,597 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|template.py:270] 2024-08-16 12:33:50,597 >> Replace eos token: <|im_end|>
[INFO|loader.py:52] 2024-08-16 12:33:50,598 >> Loading dataset cangjie.json...
08/16/2024 12:33:50 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/16/2024 12:33:50 - INFO - llamafactory.hparams.parser - Process rank: 3, device: xpu:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/16/2024 12:33:50 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
08/16/2024 12:33:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
08/16/2024 12:33:52 - INFO - llamafactory.data.loader - Loading dataset cangjie.json...
08/16/2024 12:33:52 - INFO - llamafactory.data.loader - Loading dataset cangjie.json...
08/16/2024 12:33:52 - INFO - llamafactory.data.loader - Loading dataset cangjie.json...
08/16/2024 12:33:52 - INFO - llamafactory.data.loader - Loading dataset cangjie.json...
08/16/2024 12:33:52 - INFO - llamafactory.data.loader - Loading dataset cangjie.json...
08/16/2024 12:33:52 - INFO - llamafactory.data.loader - Loading dataset cangjie.json...
08/16/2024 12:33:52 - INFO - llamafactory.data.loader - Loading dataset cangjie.json...
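Every rank loads the Qwen2 tokenizer from the local Hugging Face cache and then reads `cangjie.json`. The log does not show the dataset's schema, so the sketch below assumes a common instruction-tuning JSON layout purely for illustration; the real field names may differ.

```python
import json

# Hedged sketch: peek at the dataset file the loader reports. The field names shown
# in the comment are an assumption (Alpaca-style instruction/input/output records).
def inspect_dataset(path: str = "cangjie.json") -> None:
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    print(len(records))        # the trainer later reports 32,962 training examples
    print(records[0].keys())   # e.g. dict_keys(['instruction', 'input', 'output']) if Alpaca-style
```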
[INFO|configuration_utils.py:733] 2024-08-16 12:33:53,975 >> loading configuration file config.json from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
[INFO|configuration_utils.py:800] 2024-08-16 12:33:53,979 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|modeling_utils.py:3644] 2024-08-16 12:33:54,043 >> loading weights file model.safetensors from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-16 12:33:54,047 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-16 12:33:54,052 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
08/16/2024 12:34:40 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/16/2024 12:34:40 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/16/2024 12:34:40 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/16/2024 12:34:40 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/16/2024 12:34:40 - WARNING - bitsandbytes.cextension - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
08/16/2024 12:34:40 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
08/16/2024 12:34:40 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
08/16/2024 12:34:57 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/16/2024 12:34:57 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/16/2024 12:34:57 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/16/2024 12:34:57 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/16/2024 12:34:57 - WARNING - bitsandbytes.cextension - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
08/16/2024 12:34:57 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
08/16/2024 12:34:58 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
08/16/2024 12:34:58 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/16/2024 12:34:58 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
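The block above is the base-model load: the Qwen2-7B-Instruct config and safetensors weights come from the local Hugging Face cache and the model is instantiated in bfloat16 with gradient checkpointing. In plain Transformers terms this corresponds roughly to the following (a sketch, not the framework's own loader):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of the equivalent Transformers calls for the load reported in the log.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.bfloat16,        # matches "compute dtype: torch.bfloat16"
)
model.gradient_checkpointing_enable()  # "Gradient checkpointing enabled."
```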
08/16/2024 12:34:58 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/16/2024 12:34:58 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/16/2024 12:34:58 - WARNING - bitsandbytes.cextension - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
08/16/2024 12:34:58 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
08/16/2024 12:34:58 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/16/2024 12:34:58 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/16/2024 12:34:58 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/16/2024 12:34:58 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/16/2024 12:34:58 - WARNING - bitsandbytes.cextension - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
08/16/2024 12:34:58 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
08/16/2024 12:34:58 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/16/2024 12:34:58 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/16/2024 12:34:58 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/16/2024 12:34:58 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/16/2024 12:34:58 - WARNING - bitsandbytes.cextension - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
08/16/2024 12:34:58 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
[INFO|modeling_utils.py:4473] 2024-08-16 12:34:59,011 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|modeling_utils.py:4481] 2024-08-16 12:34:59,011 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-7B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:993] 2024-08-16 12:34:59,106 >> loading configuration file generation_config.json from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-16 12:34:59,106 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
08/16/2024 12:34:59 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
[INFO|checkpointing.py:103] 2024-08-16 12:34:59,217 >> Gradient checkpointing enabled.
[INFO|attention.py:86] 2024-08-16 12:34:59,217 >> Using vanilla attention implementation.
[INFO|adapter.py:302] 2024-08-16 12:34:59,217 >> Upcasting trainable params to float32.
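The repeated `trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643` lines pin down the adapter size. The LoRA rank is not stated in the log, but the count is exactly what rank-8 adapters on all seven linear projections of each Qwen2-7B decoder layer would give (an inference from the numbers, not something the log states). The `Using LoRA+ optimizer with loraplus lr ratio 16.00` lines refer to the LoRA+ recipe, in which the adapters' B matrices are trained with a learning rate scaled up by that ratio. A back-of-the-envelope check:

```python
# Derived from the logged numbers; rank 8 is inferred, not stated in the log.
# Qwen2-7B: hidden 3584, intermediate 18944, 28 layers, 4 KV heads of dim 128
# => k/v projection width 512.
r = 8
per_layer = r * sum((
    3584 + 3584,    # q_proj
    3584 + 512,     # k_proj
    3584 + 512,     # v_proj
    3584 + 3584,    # o_proj
    3584 + 18944,   # gate_proj
    3584 + 18944,   # up_proj
    18944 + 3584,   # down_proj
))
trainable = 28 * per_layer
print(f"{trainable:,}")                           # 20,185,088 -> matches the log
print(round(100 * trainable / 7_635_801_600, 4))  # 0.2643 -> matches "trainable%"
```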
[INFO|adapter.py:158] 2024-08-16 12:34:59,218 >> Fine-tuning method: LoRA
[WARNING|cextension.py:101] 2024-08-16 12:34:59,228 >> The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
08/16/2024 12:34:59 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
08/16/2024 12:34:59 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/16/2024 12:34:59 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/16/2024 12:34:59 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/16/2024 12:34:59 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/16/2024 12:34:59 - WARNING - bitsandbytes.cextension - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
08/16/2024 12:34:59 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
[INFO|loader.py:196] 2024-08-16 12:34:59,540 >> trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
[INFO|trainer.py:648] 2024-08-16 12:34:59,550 >> Using auto half precision backend
[INFO|trainer_utils.py:305] 2024-08-16 12:34:59,691 >> Using LoRA+ optimizer with loraplus lr ratio 16.00.
08/16/2024 12:34:59 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
08/16/2024 12:34:59 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
08/16/2024 12:35:00 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/16/2024 12:35:00 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/16/2024 12:35:00 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/16/2024 12:35:00 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/16/2024 12:35:00 - WARNING - bitsandbytes.cextension - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
08/16/2024 12:35:00 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
08/16/2024 12:35:00 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
[INFO|trainer.py:2134] 2024-08-16 12:35:19,204 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-16 12:35:19,206 >> Num examples = 32,962
[INFO|trainer.py:2136] 2024-08-16 12:35:19,206 >> Num Epochs = 3
[INFO|trainer.py:2137] 2024-08-16 12:35:19,206 >> Instantaneous batch size per device = 4
[INFO|trainer.py:2140] 2024-08-16 12:35:19,206 >> Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:2141] 2024-08-16 12:35:19,206 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-08-16 12:35:19,206 >> Total optimization steps = 384
[INFO|trainer.py:2143] 2024-08-16 12:35:19,210 >> Number of trainable parameters = 20,185,088
[INFO|callbacks.py:312] 2024-08-16 12:36:02,659 >> {'loss': 0.8310, 'learning_rate': 1.0000e-04, 'epoch': 0.02, 'throughput': 12068.40}
[INFO|callbacks.py:312] 2024-08-16 12:36:44,933 >> {'loss': 0.7398, 'learning_rate': 9.9993e-05, 'epoch': 0.03, 'throughput': 12232.95}
[INFO|callbacks.py:312] 2024-08-16 12:37:27,241 >> {'loss': 0.6438, 'learning_rate': 9.9973e-05, 'epoch': 0.05, 'throughput': 12285.60}
[INFO|callbacks.py:312] 2024-08-16 12:38:09,612 >> {'loss': 0.6032, 'learning_rate': 9.9939e-05, 'epoch': 0.06, 'throughput': 12307.51}
[INFO|callbacks.py:312] 2024-08-16 12:38:51,927 >> {'loss': 0.5433, 'learning_rate': 9.9892e-05, 'epoch': 0.08, 'throughput': 12323.95}
[INFO|callbacks.py:312] 2024-08-16 12:39:34,282 >> {'loss': 0.5228, 'learning_rate': 9.9831e-05, 'epoch': 0.09, 'throughput': 12332.99}
[INFO|callbacks.py:312] 2024-08-16 12:40:16,580 >> {'loss': 0.4702, 'learning_rate': 9.9757e-05, 'epoch': 0.11, 'throughput': 12341.83}
[INFO|callbacks.py:312] 2024-08-16 12:40:58,913 >> {'loss': 0.4574, 'learning_rate': 9.9669e-05, 'epoch': 0.12, 'throughput': 12347.21}
[INFO|callbacks.py:312] 2024-08-16 12:41:41,280 >> {'loss': 0.5093, 'learning_rate': 9.9568e-05, 'epoch': 0.14, 'throughput': 12350.26}
[INFO|callbacks.py:312] 2024-08-16 12:42:23,640 >> {'loss': 0.4771, 'learning_rate': 9.9453e-05, 'epoch': 0.16, 'throughput': 12352.92}
[INFO|callbacks.py:312] 2024-08-16 12:43:05,996 >> {'loss': 0.4343, 'learning_rate': 9.9325e-05, 'epoch': 0.17, 'throughput': 12355.21}
[INFO|callbacks.py:312] 2024-08-16 12:43:48,388 >> {'loss': 0.4455, 'learning_rate': 9.9184e-05, 'epoch': 0.19, 'throughput': 12356.26}
[INFO|callbacks.py:312] 2024-08-16 12:44:30,830 >> {'loss': 0.4669, 'learning_rate': 9.9029e-05, 'epoch': 0.20, 'throughput': 12356.01}
[INFO|callbacks.py:312] 2024-08-16 12:45:13,185 >> {'loss': 0.4723, 'learning_rate': 9.8861e-05, 'epoch': 0.22, 'throughput': 12357.59}
[INFO|callbacks.py:312] 2024-08-16 12:45:55,574 >> {'loss': 0.4364, 'learning_rate': 9.8680e-05, 'epoch': 0.23, 'throughput': 12358.33}
[INFO|callbacks.py:312] 2024-08-16 12:46:37,892 >> {'loss': 0.4013, 'learning_rate': 9.8486e-05, 'epoch': 0.25, 'throughput': 12360.25}
[INFO|callbacks.py:312] 2024-08-16 12:47:20,267 >> {'loss': 0.3875, 'learning_rate': 9.8279e-05, 'epoch': 0.26, 'throughput': 12360.97}
[INFO|callbacks.py:312] 2024-08-16 12:48:02,620 >> {'loss': 0.4187, 'learning_rate': 9.8058e-05, 'epoch': 0.28, 'throughput': 12361.99}
[INFO|callbacks.py:312] 2024-08-16 12:48:44,965 >> {'loss': 0.3981, 'learning_rate': 9.7825e-05, 'epoch': 0.29, 'throughput': 12363.00}
[INFO|callbacks.py:312] 2024-08-16 12:49:27,368 >> {'loss': 0.4121, 'learning_rate': 9.7578e-05, 'epoch': 0.31, 'throughput': 12363.06}
[INFO|callbacks.py:312] 2024-08-16 12:50:09,733 >> {'loss': 0.3920, 'learning_rate': 9.7319e-05, 'epoch': 0.33, 'throughput': 12363.66}
[INFO|callbacks.py:312] 2024-08-16 12:50:52,018 >> {'loss': 0.3845, 'learning_rate': 9.7047e-05, 'epoch': 0.34, 'throughput': 12365.26}
[INFO|callbacks.py:312] 2024-08-16 12:51:34,403 >> {'loss': 0.3970, 'learning_rate': 9.6762e-05, 'epoch': 0.36, 'throughput': 12365.45}
[INFO|callbacks.py:312] 2024-08-16 12:52:16,774 >> {'loss': 0.3753, 'learning_rate': 9.6465e-05, 'epoch': 0.37, 'throughput':
12365.80} [INFO|callbacks.py:312] 2024-08-16 12:52:59,105 >> {'loss': 0.3870, 'learning_rate': 9.6155e-05, 'epoch': 0.39, 'throughput': 12366.58} [INFO|callbacks.py:312] 2024-08-16 12:53:41,453 >> {'loss': 0.3724, 'learning_rate': 9.5832e-05, 'epoch': 0.40, 'throughput': 12367.11} [INFO|callbacks.py:312] 2024-08-16 12:54:23,808 >> {'loss': 0.4394, 'learning_rate': 9.5497e-05, 'epoch': 0.42, 'throughput': 12367.54} [INFO|callbacks.py:312] 2024-08-16 12:55:06,153 >> {'loss': 0.4177, 'learning_rate': 9.5150e-05, 'epoch': 0.43, 'throughput': 12368.03} [INFO|callbacks.py:312] 2024-08-16 12:55:48,525 >> {'loss': 0.3939, 'learning_rate': 9.4790e-05, 'epoch': 0.45, 'throughput': 12368.21} [INFO|callbacks.py:312] 2024-08-16 12:56:30,845 >> {'loss': 0.4207, 'learning_rate': 9.4419e-05, 'epoch': 0.47, 'throughput': 12368.89} [INFO|callbacks.py:312] 2024-08-16 12:57:13,203 >> {'loss': 0.3653, 'learning_rate': 9.4035e-05, 'epoch': 0.48, 'throughput': 12369.17} [INFO|callbacks.py:312] 2024-08-16 12:57:55,540 >> {'loss': 0.3925, 'learning_rate': 9.3640e-05, 'epoch': 0.50, 'throughput': 12369.62} [INFO|callbacks.py:312] 2024-08-16 12:58:37,884 >> {'loss': 0.3982, 'learning_rate': 9.3233e-05, 'epoch': 0.51, 'throughput': 12369.99} [INFO|callbacks.py:312] 2024-08-16 12:59:20,230 >> {'loss': 0.3709, 'learning_rate': 9.2814e-05, 'epoch': 0.53, 'throughput': 12370.31} [INFO|callbacks.py:312] 2024-08-16 13:00:02,597 >> {'loss': 0.3744, 'learning_rate': 9.2383e-05, 'epoch': 0.54, 'throughput': 12370.44} [INFO|callbacks.py:312] 2024-08-16 13:00:44,955 >> {'loss': 0.3929, 'learning_rate': 9.1941e-05, 'epoch': 0.56, 'throughput': 12370.64} [INFO|callbacks.py:312] 2024-08-16 13:01:27,325 >> {'loss': 0.3716, 'learning_rate': 9.1488e-05, 'epoch': 0.57, 'throughput': 12370.73} [INFO|callbacks.py:312] 2024-08-16 13:02:09,657 >> {'loss': 0.3959, 'learning_rate': 9.1023e-05, 'epoch': 0.59, 'throughput': 12371.11} [INFO|callbacks.py:312] 2024-08-16 13:02:52,031 >> {'loss': 0.3514, 'learning_rate': 9.0547e-05, 'epoch': 0.61, 'throughput': 12371.15} [INFO|callbacks.py:312] 2024-08-16 13:03:34,381 >> {'loss': 0.3767, 'learning_rate': 9.0061e-05, 'epoch': 0.62, 'throughput': 12371.38} [INFO|callbacks.py:312] 2024-08-16 13:04:16,765 >> {'loss': 0.3710, 'learning_rate': 8.9563e-05, 'epoch': 0.64, 'throughput': 12371.34} [INFO|callbacks.py:312] 2024-08-16 13:04:59,089 >> {'loss': 0.3529, 'learning_rate': 8.9055e-05, 'epoch': 0.65, 'throughput': 12371.72} [INFO|callbacks.py:312] 2024-08-16 13:05:41,445 >> {'loss': 0.3044, 'learning_rate': 8.8536e-05, 'epoch': 0.67, 'throughput': 12371.87} [INFO|callbacks.py:312] 2024-08-16 13:06:23,764 >> {'loss': 0.3532, 'learning_rate': 8.8007e-05, 'epoch': 0.68, 'throughput': 12372.26} [INFO|callbacks.py:312] 2024-08-16 13:07:06,185 >> {'loss': 0.3461, 'learning_rate': 8.7467e-05, 'epoch': 0.70, 'throughput': 12371.97} [INFO|callbacks.py:312] 2024-08-16 13:07:48,606 >> {'loss': 0.3513, 'learning_rate': 8.6918e-05, 'epoch': 0.71, 'throughput': 12371.69} [INFO|callbacks.py:312] 2024-08-16 13:08:30,947 >> {'loss': 0.3842, 'learning_rate': 8.6358e-05, 'epoch': 0.73, 'throughput': 12371.92} [INFO|callbacks.py:312] 2024-08-16 13:09:13,288 >> {'loss': 0.3630, 'learning_rate': 8.5789e-05, 'epoch': 0.74, 'throughput': 12372.14} [INFO|callbacks.py:312] 2024-08-16 13:09:55,576 >> {'loss': 0.3079, 'learning_rate': 8.5210e-05, 'epoch': 0.76, 'throughput': 12372.67} [INFO|callbacks.py:312] 2024-08-16 13:10:37,964 >> {'loss': 0.3769, 'learning_rate': 8.4621e-05, 'epoch': 0.78, 'throughput': 12372.59} 
[INFO|callbacks.py:312] 2024-08-16 13:11:20,331 >> {'loss': 0.3907, 'learning_rate': 8.4023e-05, 'epoch': 0.79, 'throughput': 12372.64} [INFO|callbacks.py:312] 2024-08-16 13:12:02,695 >> {'loss': 0.3457, 'learning_rate': 8.3416e-05, 'epoch': 0.81, 'throughput': 12372.69} [INFO|callbacks.py:312] 2024-08-16 13:12:45,011 >> {'loss': 0.3889, 'learning_rate': 8.2800e-05, 'epoch': 0.82, 'throughput': 12373.02} [INFO|callbacks.py:312] 2024-08-16 13:13:27,369 >> {'loss': 0.3142, 'learning_rate': 8.2174e-05, 'epoch': 0.84, 'throughput': 12373.10} [INFO|callbacks.py:312] 2024-08-16 13:14:09,715 >> {'loss': 0.3299, 'learning_rate': 8.1541e-05, 'epoch': 0.85, 'throughput': 12373.25} [INFO|callbacks.py:312] 2024-08-16 13:14:52,085 >> {'loss': 0.3425, 'learning_rate': 8.0898e-05, 'epoch': 0.87, 'throughput': 12373.26} [INFO|callbacks.py:312] 2024-08-16 13:15:34,456 >> {'loss': 0.3363, 'learning_rate': 8.0247e-05, 'epoch': 0.88, 'throughput': 12373.27} [INFO|callbacks.py:312] 2024-08-16 13:16:16,801 >> {'loss': 0.3725, 'learning_rate': 7.9589e-05, 'epoch': 0.90, 'throughput': 12373.41} [INFO|callbacks.py:312] 2024-08-16 13:16:59,117 >> {'loss': 0.3397, 'learning_rate': 7.8922e-05, 'epoch': 0.92, 'throughput': 12373.69} [INFO|callbacks.py:312] 2024-08-16 13:17:41,455 >> {'loss': 0.2812, 'learning_rate': 7.8247e-05, 'epoch': 0.93, 'throughput': 12373.85} [INFO|callbacks.py:312] 2024-08-16 13:18:23,836 >> {'loss': 0.3555, 'learning_rate': 7.7564e-05, 'epoch': 0.95, 'throughput': 12373.80} [INFO|callbacks.py:312] 2024-08-16 13:19:06,200 >> {'loss': 0.3362, 'learning_rate': 7.6875e-05, 'epoch': 0.96, 'throughput': 12373.83} [INFO|callbacks.py:312] 2024-08-16 13:19:48,535 >> {'loss': 0.3133, 'learning_rate': 7.6177e-05, 'epoch': 0.98, 'throughput': 12374.00} [INFO|callbacks.py:312] 2024-08-16 13:20:30,848 >> {'loss': 0.3119, 'learning_rate': 7.5473e-05, 'epoch': 0.99, 'throughput': 12374.26} [INFO|callbacks.py:312] 2024-08-16 13:21:13,219 >> {'loss': 0.3117, 'learning_rate': 7.4762e-05, 'epoch': 1.01, 'throughput': 12374.25} [INFO|callbacks.py:312] 2024-08-16 13:21:55,569 >> {'loss': 0.3290, 'learning_rate': 7.4044e-05, 'epoch': 1.02, 'throughput': 12374.33} [INFO|callbacks.py:312] 2024-08-16 13:22:37,938 >> {'loss': 0.2790, 'learning_rate': 7.3320e-05, 'epoch': 1.04, 'throughput': 12374.33} [INFO|callbacks.py:312] 2024-08-16 13:23:20,316 >> {'loss': 0.2682, 'learning_rate': 7.2590e-05, 'epoch': 1.06, 'throughput': 12374.30} [INFO|callbacks.py:312] 2024-08-16 13:24:02,700 >> {'loss': 0.2930, 'learning_rate': 7.1853e-05, 'epoch': 1.07, 'throughput': 12374.23} [INFO|callbacks.py:312] 2024-08-16 13:24:45,029 >> {'loss': 0.3150, 'learning_rate': 7.1110e-05, 'epoch': 1.09, 'throughput': 12374.40} [INFO|callbacks.py:312] 2024-08-16 13:25:27,456 >> {'loss': 0.2890, 'learning_rate': 7.0362e-05, 'epoch': 1.10, 'throughput': 12374.16} [INFO|callbacks.py:312] 2024-08-16 13:26:09,792 >> {'loss': 0.2808, 'learning_rate': 6.9608e-05, 'epoch': 1.12, 'throughput': 12374.30} [INFO|callbacks.py:312] 2024-08-16 13:26:52,153 >> {'loss': 0.3180, 'learning_rate': 6.8849e-05, 'epoch': 1.13, 'throughput': 12374.33} [INFO|callbacks.py:312] 2024-08-16 13:27:34,497 >> {'loss': 0.2685, 'learning_rate': 6.8085e-05, 'epoch': 1.15, 'throughput': 12374.43} [INFO|callbacks.py:312] 2024-08-16 13:28:16,828 >> {'loss': 0.3121, 'learning_rate': 6.7315e-05, 'epoch': 1.16, 'throughput': 12374.58} [INFO|callbacks.py:312] 2024-08-16 13:28:59,183 >> {'loss': 0.2835, 'learning_rate': 6.6542e-05, 'epoch': 1.18, 'throughput': 12374.63} 
[INFO|callbacks.py:312] 2024-08-16 13:29:41,447 >> {'loss': 0.2905, 'learning_rate': 6.5763e-05, 'epoch': 1.19, 'throughput': 12375.02} [INFO|callbacks.py:312] 2024-08-16 13:30:23,793 >> {'loss': 0.3277, 'learning_rate': 6.4981e-05, 'epoch': 1.21, 'throughput': 12375.10} [INFO|callbacks.py:312] 2024-08-16 13:31:06,159 >> {'loss': 0.2788, 'learning_rate': 6.4194e-05, 'epoch': 1.23, 'throughput': 12375.10} [INFO|callbacks.py:312] 2024-08-16 13:31:48,520 >> {'loss': 0.2971, 'learning_rate': 6.3404e-05, 'epoch': 1.24, 'throughput': 12375.12} [INFO|callbacks.py:312] 2024-08-16 13:32:30,924 >> {'loss': 0.2870, 'learning_rate': 6.2610e-05, 'epoch': 1.26, 'throughput': 12374.98} [INFO|callbacks.py:312] 2024-08-16 13:33:13,286 >> {'loss': 0.3196, 'learning_rate': 6.1812e-05, 'epoch': 1.27, 'throughput': 12375.00} [INFO|callbacks.py:312] 2024-08-16 13:33:55,649 >> {'loss': 0.3021, 'learning_rate': 6.1011e-05, 'epoch': 1.29, 'throughput': 12375.01} [INFO|callbacks.py:312] 2024-08-16 13:34:38,007 >> {'loss': 0.2329, 'learning_rate': 6.0208e-05, 'epoch': 1.30, 'throughput': 12375.04} [INFO|callbacks.py:312] 2024-08-16 13:35:20,352 >> {'loss': 0.2803, 'learning_rate': 5.9401e-05, 'epoch': 1.32, 'throughput': 12375.12} [INFO|callbacks.py:312] 2024-08-16 13:36:02,691 >> {'loss': 0.2744, 'learning_rate': 5.8592e-05, 'epoch': 1.33, 'throughput': 12375.21} [INFO|callbacks.py:312] 2024-08-16 13:36:44,975 >> {'loss': 0.3332, 'learning_rate': 5.7781e-05, 'epoch': 1.35, 'throughput': 12375.49} [INFO|callbacks.py:312] 2024-08-16 13:37:27,361 >> {'loss': 0.3223, 'learning_rate': 5.6968e-05, 'epoch': 1.37, 'throughput': 12375.42} [INFO|callbacks.py:312] 2024-08-16 13:38:09,707 >> {'loss': 0.3127, 'learning_rate': 5.6152e-05, 'epoch': 1.38, 'throughput': 12375.48} [INFO|callbacks.py:312] 2024-08-16 13:38:52,014 >> {'loss': 0.2908, 'learning_rate': 5.5335e-05, 'epoch': 1.40, 'throughput': 12375.67} [INFO|callbacks.py:312] 2024-08-16 13:39:34,381 >> {'loss': 0.2918, 'learning_rate': 5.4517e-05, 'epoch': 1.41, 'throughput': 12375.66} [INFO|callbacks.py:312] 2024-08-16 13:40:16,717 >> {'loss': 0.2680, 'learning_rate': 5.3697e-05, 'epoch': 1.43, 'throughput': 12375.75} [INFO|callbacks.py:312] 2024-08-16 13:40:59,093 >> {'loss': 0.2862, 'learning_rate': 5.2877e-05, 'epoch': 1.44, 'throughput': 12375.71} [INFO|callbacks.py:312] 2024-08-16 13:41:41,470 >> {'loss': 0.3054, 'learning_rate': 5.2055e-05, 'epoch': 1.46, 'throughput': 12375.67} [INFO|callbacks.py:312] 2024-08-16 13:42:23,818 >> {'loss': 0.2814, 'learning_rate': 5.1233e-05, 'epoch': 1.47, 'throughput': 12375.72} [INFO|callbacks.py:312] 2024-08-16 13:43:06,212 >> {'loss': 0.2703, 'learning_rate': 5.0411e-05, 'epoch': 1.49, 'throughput': 12375.63} [INFO|callbacks.py:312] 2024-08-16 13:43:48,537 >> {'loss': 0.2689, 'learning_rate': 4.9589e-05, 'epoch': 1.51, 'throughput': 12375.75} [INFO|callbacks.py:312] 2024-08-16 13:44:30,897 >> {'loss': 0.3013, 'learning_rate': 4.8767e-05, 'epoch': 1.52, 'throughput': 12375.76} [INFO|callbacks.py:312] 2024-08-16 13:45:13,277 >> {'loss': 0.2751, 'learning_rate': 4.7945e-05, 'epoch': 1.54, 'throughput': 12375.72} [INFO|callbacks.py:312] 2024-08-16 13:45:55,623 >> {'loss': 0.3178, 'learning_rate': 4.7123e-05, 'epoch': 1.55, 'throughput': 12375.77} [INFO|callbacks.py:312] 2024-08-16 13:46:37,977 >> {'loss': 0.2742, 'learning_rate': 4.6303e-05, 'epoch': 1.57, 'throughput': 12375.80} [INFO|callbacks.py:312] 2024-08-16 13:47:20,326 >> {'loss': 0.2751, 'learning_rate': 4.5483e-05, 'epoch': 1.58, 'throughput': 12375.84} 
[INFO|callbacks.py:312] 2024-08-16 13:48:02,717 >> {'loss': 0.3256, 'learning_rate': 4.4665e-05, 'epoch': 1.60, 'throughput': 12375.76} [INFO|callbacks.py:312] 2024-08-16 13:48:45,048 >> {'loss': 0.2603, 'learning_rate': 4.3848e-05, 'epoch': 1.61, 'throughput': 12375.86} [INFO|callbacks.py:312] 2024-08-16 13:49:27,372 >> {'loss': 0.2598, 'learning_rate': 4.3032e-05, 'epoch': 1.63, 'throughput': 12375.97} [INFO|callbacks.py:312] 2024-08-16 13:50:09,716 >> {'loss': 0.2811, 'learning_rate': 4.2219e-05, 'epoch': 1.65, 'throughput': 12376.02} [INFO|callbacks.py:312] 2024-08-16 13:50:52,092 >> {'loss': 0.2386, 'learning_rate': 4.1408e-05, 'epoch': 1.66, 'throughput': 12375.99} [INFO|callbacks.py:312] 2024-08-16 13:51:34,476 >> {'loss': 0.2999, 'learning_rate': 4.0599e-05, 'epoch': 1.68, 'throughput': 12375.93} [INFO|callbacks.py:312] 2024-08-16 13:52:16,880 >> {'loss': 0.2857, 'learning_rate': 3.9792e-05, 'epoch': 1.69, 'throughput': 12375.82} [INFO|callbacks.py:312] 2024-08-16 13:52:59,255 >> {'loss': 0.2533, 'learning_rate': 3.8989e-05, 'epoch': 1.71, 'throughput': 12375.79} [INFO|callbacks.py:312] 2024-08-16 13:53:41,557 >> {'loss': 0.3060, 'learning_rate': 3.8188e-05, 'epoch': 1.72, 'throughput': 12375.96} [INFO|callbacks.py:312] 2024-08-16 13:54:23,889 >> {'loss': 0.2594, 'learning_rate': 3.7390e-05, 'epoch': 1.74, 'throughput': 12376.04} [INFO|callbacks.py:312] 2024-08-16 13:55:06,247 >> {'loss': 0.2840, 'learning_rate': 3.6596e-05, 'epoch': 1.75, 'throughput': 12376.05} [INFO|callbacks.py:312] 2024-08-16 13:55:48,620 >> {'loss': 0.3300, 'learning_rate': 3.5806e-05, 'epoch': 1.77, 'throughput': 12376.03} [INFO|callbacks.py:312] 2024-08-16 13:56:31,000 >> {'loss': 0.2968, 'learning_rate': 3.5019e-05, 'epoch': 1.78, 'throughput': 12375.98} [INFO|callbacks.py:312] 2024-08-16 13:57:13,317 >> {'loss': 0.2836, 'learning_rate': 3.4237e-05, 'epoch': 1.80, 'throughput': 12376.10} [INFO|callbacks.py:312] 2024-08-16 13:57:55,663 >> {'loss': 0.2452, 'learning_rate': 3.3458e-05, 'epoch': 1.82, 'throughput': 12376.14} [INFO|callbacks.py:312] 2024-08-16 13:58:38,049 >> {'loss': 0.2526, 'learning_rate': 3.2685e-05, 'epoch': 1.83, 'throughput': 12376.09} [INFO|callbacks.py:312] 2024-08-16 13:59:20,385 >> {'loss': 0.2578, 'learning_rate': 3.1915e-05, 'epoch': 1.85, 'throughput': 12376.15} [INFO|callbacks.py:312] 2024-08-16 14:00:02,721 >> {'loss': 0.2895, 'learning_rate': 3.1151e-05, 'epoch': 1.86, 'throughput': 12376.22} [INFO|callbacks.py:312] 2024-08-16 14:00:45,089 >> {'loss': 0.3047, 'learning_rate': 3.0392e-05, 'epoch': 1.88, 'throughput': 12376.20} [INFO|callbacks.py:312] 2024-08-16 14:01:27,480 >> {'loss': 0.2958, 'learning_rate': 2.9638e-05, 'epoch': 1.89, 'throughput': 12376.13} [INFO|callbacks.py:312] 2024-08-16 14:02:09,856 >> {'loss': 0.2598, 'learning_rate': 2.8890e-05, 'epoch': 1.91, 'throughput': 12376.10} [INFO|callbacks.py:312] 2024-08-16 14:02:52,224 >> {'loss': 0.2699, 'learning_rate': 2.8147e-05, 'epoch': 1.92, 'throughput': 12376.09} [INFO|callbacks.py:312] 2024-08-16 14:03:34,541 >> {'loss': 0.2827, 'learning_rate': 2.7410e-05, 'epoch': 1.94, 'throughput': 12376.20} [INFO|callbacks.py:312] 2024-08-16 14:04:16,887 >> {'loss': 0.3119, 'learning_rate': 2.6680e-05, 'epoch': 1.96, 'throughput': 12376.24} [INFO|callbacks.py:312] 2024-08-16 14:04:59,273 >> {'loss': 0.2837, 'learning_rate': 2.5956e-05, 'epoch': 1.97, 'throughput': 12376.19} [INFO|callbacks.py:312] 2024-08-16 14:05:41,665 >> {'loss': 0.2511, 'learning_rate': 2.5238e-05, 'epoch': 1.99, 'throughput': 12376.12} 
[INFO|callbacks.py:312] 2024-08-16 14:06:24,053 >> {'loss': 0.2501, 'learning_rate': 2.4527e-05, 'epoch': 2.00, 'throughput': 12376.06} [INFO|callbacks.py:312] 2024-08-16 14:07:06,393 >> {'loss': 0.2296, 'learning_rate': 2.3823e-05, 'epoch': 2.02, 'throughput': 12376.11} [INFO|callbacks.py:312] 2024-08-16 14:07:48,745 >> {'loss': 0.2333, 'learning_rate': 2.3125e-05, 'epoch': 2.03, 'throughput': 12376.14} [INFO|callbacks.py:312] 2024-08-16 14:08:31,135 >> {'loss': 0.2119, 'learning_rate': 2.2436e-05, 'epoch': 2.05, 'throughput': 12376.08} [INFO|callbacks.py:312] 2024-08-16 14:09:13,523 >> {'loss': 0.2676, 'learning_rate': 2.1753e-05, 'epoch': 2.06, 'throughput': 12376.02} [INFO|callbacks.py:312] 2024-08-16 14:09:55,858 >> {'loss': 0.2285, 'learning_rate': 2.1078e-05, 'epoch': 2.08, 'throughput': 12376.08} [INFO|callbacks.py:312] 2024-08-16 14:10:38,203 >> {'loss': 0.2281, 'learning_rate': 2.0411e-05, 'epoch': 2.10, 'throughput': 12376.12} [INFO|callbacks.py:312] 2024-08-16 14:11:20,589 >> {'loss': 0.2701, 'learning_rate': 1.9753e-05, 'epoch': 2.11, 'throughput': 12376.07} [INFO|callbacks.py:312] 2024-08-16 14:12:03,007 >> {'loss': 0.2033, 'learning_rate': 1.9102e-05, 'epoch': 2.13, 'throughput': 12375.95} [INFO|callbacks.py:312] 2024-08-16 14:12:45,356 >> {'loss': 0.2489, 'learning_rate': 1.8459e-05, 'epoch': 2.14, 'throughput': 12375.99} [INFO|callbacks.py:312] 2024-08-16 14:13:27,726 >> {'loss': 0.2294, 'learning_rate': 1.7826e-05, 'epoch': 2.16, 'throughput': 12375.97} [INFO|callbacks.py:312] 2024-08-16 14:14:10,114 >> {'loss': 0.2452, 'learning_rate': 1.7200e-05, 'epoch': 2.17, 'throughput': 12375.92} [INFO|callbacks.py:312] 2024-08-16 14:14:52,477 >> {'loss': 0.2420, 'learning_rate': 1.6584e-05, 'epoch': 2.19, 'throughput': 12375.92} [INFO|callbacks.py:312] 2024-08-16 14:15:34,799 >> {'loss': 0.2894, 'learning_rate': 1.5977e-05, 'epoch': 2.20, 'throughput': 12376.01} [INFO|callbacks.py:312] 2024-08-16 14:16:17,116 >> {'loss': 0.2310, 'learning_rate': 1.5379e-05, 'epoch': 2.22, 'throughput': 12376.10} [INFO|callbacks.py:312] 2024-08-16 14:16:59,419 >> {'loss': 0.2412, 'learning_rate': 1.4790e-05, 'epoch': 2.23, 'throughput': 12376.22} [INFO|callbacks.py:312] 2024-08-16 14:17:41,780 >> {'loss': 0.2464, 'learning_rate': 1.4211e-05, 'epoch': 2.25, 'throughput': 12376.23} [INFO|callbacks.py:312] 2024-08-16 14:18:24,121 >> {'loss': 0.2231, 'learning_rate': 1.3642e-05, 'epoch': 2.27, 'throughput': 12376.27} [INFO|callbacks.py:312] 2024-08-16 14:19:06,463 >> {'loss': 0.2432, 'learning_rate': 1.3082e-05, 'epoch': 2.28, 'throughput': 12376.31} [INFO|callbacks.py:312] 2024-08-16 14:19:48,827 >> {'loss': 0.2147, 'learning_rate': 1.2533e-05, 'epoch': 2.30, 'throughput': 12376.31} [INFO|callbacks.py:312] 2024-08-16 14:20:31,125 >> {'loss': 0.2490, 'learning_rate': 1.1993e-05, 'epoch': 2.31, 'throughput': 12376.43} [INFO|callbacks.py:312] 2024-08-16 14:21:13,484 >> {'loss': 0.2362, 'learning_rate': 1.1464e-05, 'epoch': 2.33, 'throughput': 12376.44} [INFO|callbacks.py:312] 2024-08-16 14:21:55,830 >> {'loss': 0.2232, 'learning_rate': 1.0945e-05, 'epoch': 2.34, 'throughput': 12376.47} [INFO|callbacks.py:312] 2024-08-16 14:22:38,180 >> {'loss': 0.2569, 'learning_rate': 1.0437e-05, 'epoch': 2.36, 'throughput': 12376.49} [INFO|callbacks.py:312] 2024-08-16 14:23:20,529 >> {'loss': 0.2140, 'learning_rate': 9.9394e-06, 'epoch': 2.37, 'throughput': 12376.52} [INFO|callbacks.py:312] 2024-08-16 14:24:02,885 >> {'loss': 0.2451, 'learning_rate': 9.4527e-06, 'epoch': 2.39, 'throughput': 12376.53} 
[INFO|callbacks.py:312] 2024-08-16 14:24:45,189 >> {'loss': 0.2288, 'learning_rate': 8.9770e-06, 'epoch': 2.41, 'throughput': 12376.63} [INFO|callbacks.py:312] 2024-08-16 14:25:27,580 >> {'loss': 0.2332, 'learning_rate': 8.5124e-06, 'epoch': 2.42, 'throughput': 12376.58} [INFO|callbacks.py:312] 2024-08-16 14:26:09,900 >> {'loss': 0.2020, 'learning_rate': 8.0590e-06, 'epoch': 2.44, 'throughput': 12376.66} [INFO|callbacks.py:312] 2024-08-16 14:26:52,220 >> {'loss': 0.2270, 'learning_rate': 7.6170e-06, 'epoch': 2.45, 'throughput': 12376.73} [INFO|callbacks.py:312] 2024-08-16 14:27:34,558 >> {'loss': 0.2429, 'learning_rate': 7.1864e-06, 'epoch': 2.47, 'throughput': 12376.77} [INFO|callbacks.py:312] 2024-08-16 14:28:16,947 >> {'loss': 0.2147, 'learning_rate': 6.7674e-06, 'epoch': 2.48, 'throughput': 12376.72} [INFO|callbacks.py:312] 2024-08-16 14:28:59,276 >> {'loss': 0.2423, 'learning_rate': 6.3601e-06, 'epoch': 2.50, 'throughput': 12376.78} [INFO|callbacks.py:312] 2024-08-16 14:29:41,633 >> {'loss': 0.2828, 'learning_rate': 5.9647e-06, 'epoch': 2.51, 'throughput': 12376.78} [INFO|callbacks.py:312] 2024-08-16 14:30:23,993 >> {'loss': 0.2461, 'learning_rate': 5.5811e-06, 'epoch': 2.53, 'throughput': 12376.78} [INFO|callbacks.py:312] 2024-08-16 14:31:06,397 >> {'loss': 0.2960, 'learning_rate': 5.2095e-06, 'epoch': 2.55, 'throughput': 12376.71} [INFO|callbacks.py:312] 2024-08-16 14:31:48,695 >> {'loss': 0.2061, 'learning_rate': 4.8501e-06, 'epoch': 2.56, 'throughput': 12376.82} [INFO|callbacks.py:312] 2024-08-16 14:32:31,012 >> {'loss': 0.2323, 'learning_rate': 4.5029e-06, 'epoch': 2.58, 'throughput': 12376.90} [INFO|callbacks.py:312] 2024-08-16 14:33:13,360 >> {'loss': 0.2156, 'learning_rate': 4.1680e-06, 'epoch': 2.59, 'throughput': 12376.92} [INFO|callbacks.py:312] 2024-08-16 14:33:55,692 >> {'loss': 0.2393, 'learning_rate': 3.8455e-06, 'epoch': 2.61, 'throughput': 12376.97} [INFO|callbacks.py:312] 2024-08-16 14:34:38,031 >> {'loss': 0.2165, 'learning_rate': 3.5354e-06, 'epoch': 2.62, 'throughput': 12377.00} [INFO|callbacks.py:312] 2024-08-16 14:35:20,371 >> {'loss': 0.2313, 'learning_rate': 3.2380e-06, 'epoch': 2.64, 'throughput': 12377.04} [INFO|callbacks.py:312] 2024-08-16 14:36:02,747 >> {'loss': 0.2336, 'learning_rate': 2.9532e-06, 'epoch': 2.65, 'throughput': 12377.01} [INFO|callbacks.py:312] 2024-08-16 14:36:45,116 >> {'loss': 0.2311, 'learning_rate': 2.6811e-06, 'epoch': 2.67, 'throughput': 12376.99} [INFO|callbacks.py:312] 2024-08-16 14:37:27,476 >> {'loss': 0.2130, 'learning_rate': 2.4218e-06, 'epoch': 2.68, 'throughput': 12376.99} [INFO|callbacks.py:312] 2024-08-16 14:38:09,820 >> {'loss': 0.2364, 'learning_rate': 2.1754e-06, 'epoch': 2.70, 'throughput': 12377.02} [INFO|callbacks.py:312] 2024-08-16 14:38:52,212 >> {'loss': 0.2147, 'learning_rate': 1.9420e-06, 'epoch': 2.72, 'throughput': 12376.97} [INFO|callbacks.py:312] 2024-08-16 14:39:34,551 >> {'loss': 0.2316, 'learning_rate': 1.7215e-06, 'epoch': 2.73, 'throughput': 12377.00} [INFO|callbacks.py:312] 2024-08-16 14:40:16,929 >> {'loss': 0.2263, 'learning_rate': 1.5141e-06, 'epoch': 2.75, 'throughput': 12376.97} [INFO|callbacks.py:312] 2024-08-16 14:40:59,276 >> {'loss': 0.2323, 'learning_rate': 1.3198e-06, 'epoch': 2.76, 'throughput': 12376.99} [INFO|callbacks.py:312] 2024-08-16 14:41:41,677 >> {'loss': 0.2246, 'learning_rate': 1.1387e-06, 'epoch': 2.78, 'throughput': 12376.92} [INFO|callbacks.py:312] 2024-08-16 14:42:24,023 >> {'loss': 0.2287, 'learning_rate': 9.7079e-07, 'epoch': 2.79, 'throughput': 12376.95} 
[INFO|callbacks.py:312] 2024-08-16 14:43:06,390 >> {'loss': 0.2081, 'learning_rate': 8.1616e-07, 'epoch': 2.81, 'throughput': 12376.94}
[INFO|callbacks.py:312] 2024-08-16 14:43:48,708 >> {'loss': 0.2260, 'learning_rate': 6.7483e-07, 'epoch': 2.82, 'throughput': 12377.00}
[INFO|callbacks.py:312] 2024-08-16 14:44:31,032 >> {'loss': 0.2730, 'learning_rate': 5.4685e-07, 'epoch': 2.84, 'throughput': 12377.06}
[INFO|callbacks.py:312] 2024-08-16 14:45:13,357 >> {'loss': 0.2634, 'learning_rate': 4.3224e-07, 'epoch': 2.86, 'throughput': 12377.12}
[INFO|callbacks.py:312] 2024-08-16 14:45:55,708 >> {'loss': 0.2592, 'learning_rate': 3.3105e-07, 'epoch': 2.87, 'throughput': 12377.13}
[INFO|callbacks.py:312] 2024-08-16 14:46:38,100 >> {'loss': 0.2566, 'learning_rate': 2.4329e-07, 'epoch': 2.89, 'throughput': 12377.08}
[INFO|callbacks.py:312] 2024-08-16 14:47:20,422 >> {'loss': 0.2575, 'learning_rate': 1.6899e-07, 'epoch': 2.90, 'throughput': 12377.14}
[INFO|callbacks.py:312] 2024-08-16 14:48:02,805 >> {'loss': 0.2482, 'learning_rate': 1.0818e-07, 'epoch': 2.92, 'throughput': 12377.10}
[INFO|callbacks.py:312] 2024-08-16 14:48:45,154 >> {'loss': 0.2483, 'learning_rate': 6.0859e-08, 'epoch': 2.93, 'throughput': 12377.12}
[INFO|callbacks.py:312] 2024-08-16 14:49:27,463 >> {'loss': 0.2359, 'learning_rate': 2.7052e-08, 'epoch': 2.95, 'throughput': 12377.19}
[INFO|callbacks.py:312] 2024-08-16 14:50:09,825 >> {'loss': 0.2434, 'learning_rate': 6.7634e-09, 'epoch': 2.96, 'throughput': 12377.19}
[INFO|callbacks.py:312] 2024-08-16 14:50:52,222 >> {'loss': 0.2042, 'learning_rate': 0.0000e+00, 'epoch': 2.98, 'throughput': 12377.13}
[INFO|trainer.py:3503] 2024-08-16 14:50:52,226 >> Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_08-16-2/checkpoint-384
[INFO|configuration_utils.py:733] 2024-08-16 14:50:52,618 >> loading configuration file config.json from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
[INFO|configuration_utils.py:800] 2024-08-16 14:50:52,622 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|tokenization_utils_base.py:2702] 2024-08-16 14:50:52,748 >> tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_08-16-2/checkpoint-384/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-16 14:50:52,750 >> Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_08-16-2/checkpoint-384/special_tokens_map.json
[INFO|trainer.py:2394] 2024-08-16 14:50:53,095 >> Training completed.
Do not forget to share your model on huggingface.co/models =)
[INFO|trainer.py:3503] 2024-08-16 14:50:53,100 >> Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_08-16-2
[INFO|configuration_utils.py:733] 2024-08-16 14:50:53,350 >> loading configuration file config.json from cache at /home/u16abc30f4f31eb21df44af89e63a742/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
[INFO|configuration_utils.py:800] 2024-08-16 14:50:53,352 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|tokenization_utils_base.py:2702] 2024-08-16 14:50:53,458 >> tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_08-16-2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-16 14:50:53,460 >> Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_08-16-2/special_tokens_map.json
[WARNING|ploting.py:89] 2024-08-16 14:50:53,677 >> No metric eval_loss to plot.
[INFO|trainer.py:3819] 2024-08-16 14:50:53,683 >> ***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-08-16 14:50:53,683 >> Num examples = 1735
[INFO|trainer.py:3824] 2024-08-16 14:50:53,683 >> Batch size = 4
[INFO|modelcard.py:449] 2024-08-16 14:51:38,535 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
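As a closing sanity check on the trainer header printed at 12:35:19 (pure arithmetic over the logged values, nothing new): 4 samples per device across 8 XPU ranks with 8 gradient-accumulation steps gives the effective batch of 256, and 32,962 examples then yield 128 optimizer updates per epoch (the trailing partial batch is evidently dropped), i.e. 384 steps over 3 epochs, which is exactly where the run ends and checkpoint-384 is saved before the final evaluation on 1,735 examples.

```python
# Recomputing the trainer header from the logged values.
per_device_batch, num_ranks, grad_accum = 4, 8, 8
total_batch = per_device_batch * num_ranks * grad_accum
assert total_batch == 256                      # "Total train batch size ... = 256"

num_examples, num_epochs = 32_962, 3
steps_per_epoch = num_examples // total_batch  # 128; the last partial batch is dropped
assert steps_per_epoch * num_epochs == 384     # "Total optimization steps = 384"
```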