|
06/17/2024 20:24:18 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/17/2024 20:24:18 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
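The "Replace eos token" message reflects the Qwen (ChatML-style) chat template, which closes every turn with <|im_end|>; the tokenizer warning above it is expected because these control tokens are already in the vocabulary. As a minimal sketch (using only the public Qwen/Qwen2-7B-Instruct tokenizer, nothing specific to this run), the rendered prompt looks like this:

```python
# Minimal sketch: render a ChatML-style prompt with the Qwen2 tokenizer.
# It only illustrates why <|im_end|> acts as the effective EOS token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
messages = [{"role": "user", "content": "What is the weather like in Berlin?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # ...<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n
```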
06/17/2024 20:24:22 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
06/17/2024 20:24:27 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_zh...
06/17/2024 20:24:32 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
06/17/2024 20:24:36 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:24:36 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

06/17/2024 20:24:36 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
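"Quantizing model to 4 bit" means the frozen base weights are loaded through bitsandbytes (QLoRA-style) while only the LoRA adapters are trained. A rough equivalent in plain transformers might look like the sketch below; the exact quantization settings LLaMA-Factory uses are not shown in this log, so the NF4 and double-quantization choices here are assumptions.

```python
# Sketch of QLoRA-style 4-bit loading; quant_type and double-quant are assumed
# defaults, not values taken from this log.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # matches "default dtype torch.float16" below
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```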
06/17/2024 20:24:36 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/model.safetensors.index.json
06/17/2024 20:24:36 - INFO - transformers.modeling_utils - Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
06/17/2024 20:24:36 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}

06/17/2024 20:24:37 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 20:25:04 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
06/17/2024 20:25:04 - INFO - transformers.modeling_utils - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,k_proj,up_proj,gate_proj,v_proj,q_proj,down_proj
06/17/2024 20:25:04 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/generation_config.json
06/17/2024 20:25:04 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
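The GenerationConfig above comes from the base model's generation_config.json and only matters at inference time; training is unaffected. For reference, an explicit generate() call with the same sampling settings would look like this sketch (it reuses the `model`, `tokenizer`, and `prompt` objects from the earlier sketches; max_new_tokens is an arbitrary choice):

```python
# Sketch: sampling with the decoding defaults shown above.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05,
    max_new_tokens=256,  # arbitrary, not taken from the log
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```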
06/17/2024 20:25:04 - INFO - llamafactory.model.loader - trainable params: 20185088 || all params: 7635801600 || trainable%: 0.2643
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,up_proj,down_proj,q_proj,v_proj,k_proj,gate_proj
06/17/2024 20:25:05 - INFO - llamafactory.model.loader - trainable params: 20185088 || all params: 7635801600 || trainable%: 0.2643
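The "Found linear modules" lines show that LoRA adapters are attached to all seven linear projections in every decoder layer. The reported 20,185,088 trainable parameters are consistent with a LoRA rank of 8: each adapter contributes r x (d_in + d_out) parameters, and summing over the seven projections of all 28 layers reproduces the logged count exactly. A small check (the rank is inferred from the count, not printed in the log):

```python
# Consistency check: trainable-parameter count vs. LoRA rank (rank inferred, not logged).
hidden, inter = 3584, 18944          # from the Qwen2Config above
kv = 4 * (hidden // 28)              # 4 KV heads x head_dim 128 = 512
per_layer = (
    (hidden + hidden)    # q_proj
    + (hidden + kv)      # k_proj
    + (hidden + kv)      # v_proj
    + (hidden + hidden)  # o_proj
    + (hidden + inter)   # gate_proj
    + (hidden + inter)   # up_proj
    + (inter + hidden)   # down_proj
)
r = 8
trainable = r * per_layer * 28
print(trainable)                               # 20185088
print(round(100 * trainable / 7635801600, 4))  # 0.2643
```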
06/17/2024 20:25:05 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
06/17/2024 20:25:05 - INFO - transformers.trainer - Using auto half precision backend
06/17/2024 20:25:05 - INFO - transformers.trainer - ***** Running training *****
06/17/2024 20:25:05 - INFO - transformers.trainer - Num examples = 2,000
06/17/2024 20:25:05 - INFO - transformers.trainer - Num Epochs = 3
06/17/2024 20:25:05 - INFO - transformers.trainer - Instantaneous batch size per device = 1
06/17/2024 20:25:05 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 16
06/17/2024 20:25:05 - INFO - transformers.trainer - Gradient Accumulation steps = 8
06/17/2024 20:25:05 - INFO - transformers.trainer - Total optimization steps = 375
06/17/2024 20:25:05 - INFO - transformers.trainer - Number of trainable parameters = 20,185,088
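The header numbers fit together as follows: a per-device batch of 1 with 8 gradient-accumulation steps gives an effective batch of 16, which implies the run used two GPUs (the device count itself is not logged), and 2,000 examples x 3 epochs / 16 = 375 optimization steps.

```python
# Sketch of the arithmetic behind the training header (device count is inferred).
num_examples, epochs = 2000, 3
per_device_batch, grad_accum, total_batch = 1, 8, 16
devices = total_batch // (per_device_batch * grad_accum)  # -> 2
steps = num_examples * epochs // total_batch              # -> 375
print(devices, steps)
```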
06/17/2024 20:26:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.6880, 'learning_rate': 4.9978e-05, 'epoch': 0.04, 'throughput': 807.59}
06/17/2024 20:26:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.7630, 'learning_rate': 4.9912e-05, 'epoch': 0.08, 'throughput': 788.63}
06/17/2024 20:27:47 - INFO - llamafactory.extras.callbacks - {'loss': 0.6882, 'learning_rate': 4.9803e-05, 'epoch': 0.12, 'throughput': 782.46}
06/17/2024 20:28:41 - INFO - llamafactory.extras.callbacks - {'loss': 0.6951, 'learning_rate': 4.9650e-05, 'epoch': 0.16, 'throughput': 778.58}
06/17/2024 20:29:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.5008, 'learning_rate': 4.9454e-05, 'epoch': 0.20, 'throughput': 776.76}
06/17/2024 20:30:18 - INFO - llamafactory.extras.callbacks - {'loss': 0.5420, 'learning_rate': 4.9215e-05, 'epoch': 0.24, 'throughput': 780.46}
06/17/2024 20:31:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.5369, 'learning_rate': 4.8933e-05, 'epoch': 0.28, 'throughput': 781.27}
06/17/2024 20:32:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.4948, 'learning_rate': 4.8609e-05, 'epoch': 0.32, 'throughput': 780.39}
06/17/2024 20:32:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.5244, 'learning_rate': 4.8244e-05, 'epoch': 0.36, 'throughput': 778.70}
06/17/2024 20:33:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.4210, 'learning_rate': 4.7839e-05, 'epoch': 0.40, 'throughput': 780.25}
06/17/2024 20:34:25 - INFO - llamafactory.extras.callbacks - {'loss': 0.4517, 'learning_rate': 4.7393e-05, 'epoch': 0.44, 'throughput': 779.83}
06/17/2024 20:35:19 - INFO - llamafactory.extras.callbacks - {'loss': 0.4661, 'learning_rate': 4.6908e-05, 'epoch': 0.48, 'throughput': 775.58}
06/17/2024 20:36:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.4928, 'learning_rate': 4.6384e-05, 'epoch': 0.52, 'throughput': 775.62}
06/17/2024 20:37:00 - INFO - llamafactory.extras.callbacks - {'loss': 0.5424, 'learning_rate': 4.5823e-05, 'epoch': 0.56, 'throughput': 775.79}
06/17/2024 20:37:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.5419, 'learning_rate': 4.5225e-05, 'epoch': 0.60, 'throughput': 774.15}
06/17/2024 20:38:39 - INFO - llamafactory.extras.callbacks - {'loss': 0.4558, 'learning_rate': 4.4592e-05, 'epoch': 0.64, 'throughput': 774.75}
06/17/2024 20:39:27 - INFO - llamafactory.extras.callbacks - {'loss': 0.5656, 'learning_rate': 4.3925e-05, 'epoch': 0.68, 'throughput': 776.75}
06/17/2024 20:40:18 - INFO - llamafactory.extras.callbacks - {'loss': 0.4832, 'learning_rate': 4.3224e-05, 'epoch': 0.72, 'throughput': 780.75}
06/17/2024 20:41:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.4626, 'learning_rate': 4.2492e-05, 'epoch': 0.76, 'throughput': 781.15}
06/17/2024 20:41:56 - INFO - llamafactory.extras.callbacks - {'loss': 0.4837, 'learning_rate': 4.1728e-05, 'epoch': 0.80, 'throughput': 780.33}
06/17/2024 20:41:56 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100
06/17/2024 20:41:57 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:41:57 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

06/17/2024 20:41:57 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/tokenizer_config.json
06/17/2024 20:41:57 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/special_tokens_map.json
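Each checkpoint-* directory contains only the LoRA adapter weights plus tokenizer files, not a full copy of the model. To try an intermediate checkpoint, the adapter can be loaded on top of the base model with PEFT, roughly like this sketch (the 4-bit quantization step is omitted here for brevity):

```python
# Sketch: load an intermediate LoRA checkpoint for a quick qualitative check.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100"
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct", device_map="auto")
model = PeftModel.from_pretrained(base, ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
```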
06/17/2024 20:42:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.5144, 'learning_rate': 4.0936e-05, 'epoch': 0.84, 'throughput': 779.83}
06/17/2024 20:43:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.4930, 'learning_rate': 4.0115e-05, 'epoch': 0.88, 'throughput': 780.58}
06/17/2024 20:44:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4083, 'learning_rate': 3.9268e-05, 'epoch': 0.92, 'throughput': 781.80}
06/17/2024 20:45:14 - INFO - llamafactory.extras.callbacks - {'loss': 0.5172, 'learning_rate': 3.8396e-05, 'epoch': 0.96, 'throughput': 782.01}
06/17/2024 20:46:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.5843, 'learning_rate': 3.7500e-05, 'epoch': 1.00, 'throughput': 782.22}
06/17/2024 20:46:58 - INFO - llamafactory.extras.callbacks - {'loss': 0.4567, 'learning_rate': 3.6582e-05, 'epoch': 1.04, 'throughput': 783.58}
06/17/2024 20:47:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.4180, 'learning_rate': 3.5644e-05, 'epoch': 1.08, 'throughput': 784.96}
06/17/2024 20:48:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.3785, 'learning_rate': 3.4688e-05, 'epoch': 1.12, 'throughput': 783.96}
06/17/2024 20:49:22 - INFO - llamafactory.extras.callbacks - {'loss': 0.4097, 'learning_rate': 3.3714e-05, 'epoch': 1.16, 'throughput': 783.11}
06/17/2024 20:50:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.4507, 'learning_rate': 3.2725e-05, 'epoch': 1.20, 'throughput': 783.25}
06/17/2024 20:51:00 - INFO - llamafactory.extras.callbacks - {'loss': 0.3680, 'learning_rate': 3.1723e-05, 'epoch': 1.24, 'throughput': 782.23}
06/17/2024 20:51:53 - INFO - llamafactory.extras.callbacks - {'loss': 0.4301, 'learning_rate': 3.0709e-05, 'epoch': 1.28, 'throughput': 782.26}
06/17/2024 20:52:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.4488, 'learning_rate': 2.9685e-05, 'epoch': 1.32, 'throughput': 781.89}
06/17/2024 20:53:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.4075, 'learning_rate': 2.8652e-05, 'epoch': 1.36, 'throughput': 781.74}
06/17/2024 20:54:30 - INFO - llamafactory.extras.callbacks - {'loss': 0.4991, 'learning_rate': 2.7613e-05, 'epoch': 1.40, 'throughput': 781.86}
06/17/2024 20:55:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4894, 'learning_rate': 2.6570e-05, 'epoch': 1.44, 'throughput': 782.49}
06/17/2024 20:56:12 - INFO - llamafactory.extras.callbacks - {'loss': 0.4967, 'learning_rate': 2.5524e-05, 'epoch': 1.48, 'throughput': 782.06}
06/17/2024 20:57:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.5297, 'learning_rate': 2.4476e-05, 'epoch': 1.52, 'throughput': 783.03}
06/17/2024 20:57:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.3939, 'learning_rate': 2.3430e-05, 'epoch': 1.56, 'throughput': 781.82}
06/17/2024 20:58:49 - INFO - llamafactory.extras.callbacks - {'loss': 0.4610, 'learning_rate': 2.2387e-05, 'epoch': 1.60, 'throughput': 781.19}
06/17/2024 20:58:49 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200
06/17/2024 20:58:50 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:58:50 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

06/17/2024 20:58:50 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200/tokenizer_config.json
06/17/2024 20:58:50 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200/special_tokens_map.json
06/17/2024 20:59:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.4622, 'learning_rate': 2.1348e-05, 'epoch': 1.64, 'throughput': 779.87}
06/17/2024 21:00:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.4043, 'learning_rate': 2.0315e-05, 'epoch': 1.68, 'throughput': 779.61}
06/17/2024 21:01:25 - INFO - llamafactory.extras.callbacks - {'loss': 0.4280, 'learning_rate': 1.9291e-05, 'epoch': 1.72, 'throughput': 779.45}
06/17/2024 21:02:14 - INFO - llamafactory.extras.callbacks - {'loss': 0.3779, 'learning_rate': 1.8277e-05, 'epoch': 1.76, 'throughput': 778.48}
06/17/2024 21:03:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.4526, 'learning_rate': 1.7275e-05, 'epoch': 1.80, 'throughput': 779.26}
06/17/2024 21:03:56 - INFO - llamafactory.extras.callbacks - {'loss': 0.4627, 'learning_rate': 1.6286e-05, 'epoch': 1.84, 'throughput': 779.12}
06/17/2024 21:04:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.4873, 'learning_rate': 1.5312e-05, 'epoch': 1.88, 'throughput': 779.09}
06/17/2024 21:05:40 - INFO - llamafactory.extras.callbacks - {'loss': 0.3234, 'learning_rate': 1.4356e-05, 'epoch': 1.92, 'throughput': 780.05}
06/17/2024 21:06:28 - INFO - llamafactory.extras.callbacks - {'loss': 0.4438, 'learning_rate': 1.3418e-05, 'epoch': 1.96, 'throughput': 780.37}
06/17/2024 21:07:21 - INFO - llamafactory.extras.callbacks - {'loss': 0.4407, 'learning_rate': 1.2500e-05, 'epoch': 2.00, 'throughput': 779.97}
06/17/2024 21:08:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4401, 'learning_rate': 1.1604e-05, 'epoch': 2.04, 'throughput': 779.51}
06/17/2024 21:09:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.3771, 'learning_rate': 1.0732e-05, 'epoch': 2.08, 'throughput': 780.06}
06/17/2024 21:09:57 - INFO - llamafactory.extras.callbacks - {'loss': 0.4043, 'learning_rate': 9.8850e-06, 'epoch': 2.12, 'throughput': 781.03}
06/17/2024 21:10:42 - INFO - llamafactory.extras.callbacks - {'loss': 0.4018, 'learning_rate': 9.0644e-06, 'epoch': 2.16, 'throughput': 781.14}
06/17/2024 21:11:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.4258, 'learning_rate': 8.2717e-06, 'epoch': 2.20, 'throughput': 781.03}
06/17/2024 21:12:19 - INFO - llamafactory.extras.callbacks - {'loss': 0.3912, 'learning_rate': 7.5084e-06, 'epoch': 2.24, 'throughput': 780.49}
06/17/2024 21:13:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.3458, 'learning_rate': 6.7758e-06, 'epoch': 2.28, 'throughput': 780.17}
06/17/2024 21:13:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.4255, 'learning_rate': 6.0751e-06, 'epoch': 2.32, 'throughput': 780.22}
06/17/2024 21:14:45 - INFO - llamafactory.extras.callbacks - {'loss': 0.4222, 'learning_rate': 5.4077e-06, 'epoch': 2.36, 'throughput': 780.80}
06/17/2024 21:15:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.3990, 'learning_rate': 4.7746e-06, 'epoch': 2.40, 'throughput': 780.45}
06/17/2024 21:15:33 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300
06/17/2024 21:15:34 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 21:15:34 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

06/17/2024 21:15:34 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300/tokenizer_config.json
06/17/2024 21:15:34 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300/special_tokens_map.json
06/17/2024 21:16:26 - INFO - llamafactory.extras.callbacks - {'loss': 0.3382, 'learning_rate': 4.1770e-06, 'epoch': 2.44, 'throughput': 780.00}
06/17/2024 21:17:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4465, 'learning_rate': 3.6159e-06, 'epoch': 2.48, 'throughput': 780.28}
06/17/2024 21:18:13 - INFO - llamafactory.extras.callbacks - {'loss': 0.3250, 'learning_rate': 3.0923e-06, 'epoch': 2.52, 'throughput': 779.93}
06/17/2024 21:19:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.3920, 'learning_rate': 2.6072e-06, 'epoch': 2.56, 'throughput': 779.59}
06/17/2024 21:19:53 - INFO - llamafactory.extras.callbacks - {'loss': 0.3672, 'learning_rate': 2.1614e-06, 'epoch': 2.60, 'throughput': 779.34}
06/17/2024 21:20:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.3554, 'learning_rate': 1.7556e-06, 'epoch': 2.64, 'throughput': 779.07}
06/17/2024 21:21:27 - INFO - llamafactory.extras.callbacks - {'loss': 0.3801, 'learning_rate': 1.3906e-06, 'epoch': 2.68, 'throughput': 778.76}
06/17/2024 21:22:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4350, 'learning_rate': 1.0670e-06, 'epoch': 2.72, 'throughput': 779.56}
06/17/2024 21:23:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4063, 'learning_rate': 7.8542e-07, 'epoch': 2.76, 'throughput': 779.43}
06/17/2024 21:24:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4894, 'learning_rate': 5.4631e-07, 'epoch': 2.80, 'throughput': 779.76}
06/17/2024 21:25:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.3822, 'learning_rate': 3.5010e-07, 'epoch': 2.84, 'throughput': 779.50}
06/17/2024 21:25:57 - INFO - llamafactory.extras.callbacks - {'loss': 0.4028, 'learning_rate': 1.9713e-07, 'epoch': 2.88, 'throughput': 779.46}
06/17/2024 21:26:50 - INFO - llamafactory.extras.callbacks - {'loss': 0.4293, 'learning_rate': 8.7679e-08, 'epoch': 2.92, 'throughput': 779.71}
06/17/2024 21:27:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.4280, 'learning_rate': 2.1929e-08, 'epoch': 2.96, 'throughput': 779.90}
06/17/2024 21:28:29 - INFO - llamafactory.extras.callbacks - {'loss': 0.4766, 'learning_rate': 0.0000e+00, 'epoch': 3.00, 'throughput': 779.83}
06/17/2024 21:28:29 - INFO - transformers.trainer -

Training completed. Do not forget to share your model on huggingface.co/models =)

06/17/2024 21:28:29 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05
06/17/2024 21:28:30 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 21:28:30 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

06/17/2024 21:28:30 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/tokenizer_config.json
06/17/2024 21:28:30 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/special_tokens_map.json
06/17/2024 21:28:30 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
06/17/2024 21:28:30 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
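The "No metric eval_loss to plot" warning only means that no validation split was configured; the training-loss curve can still be recovered from the callback lines above (or from the trainer state files in the save directory). A minimal sketch, assuming the log above was saved to a file named train.log:

```python
# Sketch: extract (epoch, loss) pairs from the callback lines of this log.
import ast
import re

curve = []
with open("train.log") as f:          # assumption: this log saved as train.log
    for line in f:
        m = re.search(r"\{'loss'.*?\}", line)
        if m:
            rec = ast.literal_eval(m.group(0))
            curve.append((rec["epoch"], rec["loss"]))

for epoch, loss in curve:
    print(f"epoch {epoch:.2f}  loss {loss:.4f}")
```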