The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.

0it [00:00, ?it/s]
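If the migration is interrupted, it can be resumed manually as the notice suggests; a minimal sketch:

```python
# Resume the one-time cache migration by hand (transformers >= 4.22).
from transformers.utils import move_cache

move_cache()
```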
|
/opt/conda/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations

warnings.warn(
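The fix the warning asks for is only an import-path change; a minimal sketch using one helper the old module exposed (assumes a transformers version that re-exports it from `transformers.integrations`):

```python
# Deprecated path:
# from transformers.deepspeed import is_deepspeed_zero3_enabled

# Path suggested by the warning:
from transformers.integrations import is_deepspeed_zero3_enabled

print(is_deepspeed_zero3_enabled())
```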
|
/opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead

warnings.warn(
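A minimal sketch of the suggested rename (the `output_dir` and strategy values are placeholders, not taken from this run):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs/example-run",  # placeholder
    eval_strategy="epoch",             # replaces the deprecated `evaluation_strategy`
)
```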
|
/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will then be set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884

warnings.warn(
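Setting the flag explicitly silences the warning and pins the behaviour across the v4.45 default change; a minimal sketch (the checkpoint name is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "facebook/mbart-large-50",          # placeholder checkpoint
    clean_up_tokenization_spaces=True,  # pin the current default explicitly
)
```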
|
Generating train split: 2264 examples [00:00, 12772.38 examples/s]

Generating validation split: 30 examples [00:00, 9002.58 examples/s]

Running tokenizer on train dataset: 100%|██████████| 2264/2264 [00:01<00:00, 1810.46 examples/s]

Saving cached train data ...

Saving the dataset (1/1 shards): 100%|██████████| 2264/2264 [00:00<00:00, 372140.31 examples/s]

Running tokenizer on validation dataset: 100%|██████████| 30/30 [00:00<00:00, 1441.99 examples/s]

Saving cached validation data ...

Saving the dataset (1/1 shards): 100%|██████████| 30/30 [00:00<00:00, 6237.81 examples/s]
|
WandbCallback activated.

[WARNING|trainer_callback.py:423] 2024-09-27 08:07:19,230 >> You are adding a <class 'transformers.integrations.integration_utils.WandbCallback'> to the callbacks of this Trainer, but there is already one. The current list of callbacks is:

DefaultFlowCallback
WandbCallback
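The duplicate most likely comes from W&B reporting being requested twice, once via `report_to` and once by adding the callback manually; a minimal sketch of relying on `report_to` alone (the `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs/example-run",  # placeholder
    report_to=["wandb"],               # Trainer registers WandbCallback from this alone
)
# Also calling trainer.add_callback(WandbCallback) on top of this is what
# produces the "there is already one" warning above.
```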
|
/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning

warnings.warn(
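A minimal sketch of the replacement the warning recommends, wired up for the schedule reported below (the model and learning rate are placeholders):

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # placeholder; the real script fine-tunes a seq2seq model

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # PyTorch AdamW, not the deprecated one
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=200,    # matches "warm_up steps: 200" below
    num_training_steps=270,  # matches "All 270 steps" below
)
# Passing optimizers=(optimizer, scheduler) to Trainer makes it use these
# instead of constructing the deprecated transformers AdamW internally.
```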
|
All 270 steps, warm_up steps: 200
|
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
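A minimal sketch of the suggested fix (the `output_dir` is taken from the sync path below; the run name itself is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="../outputs/mbart-en-id-smaller-fted-fted",
    run_name="amrbart-id-fine-tune",  # placeholder; just has to differ from output_dir
)
```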
|
wandb: Currently logged in as: abdiharyadi. Use `wandb login --relogin` to force relogin

wandb: wandb version 0.18.1 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade

wandb: Tracking run with wandb version 0.17.5

wandb: Run data is saved locally in /kaggle/working/amr-tst-indo/AMRBART-id/fine-tune/wandb/run-20240927_080721-nmb2wrh4

wandb: Run `wandb offline` to turn off syncing.

wandb: Syncing run /kaggle/working/amr-tst-indo/AMRBART-id/fine-tune/../outputs/mbart-en-id-smaller-fted-fted

wandb: ⭐️ View project at https://wandb.ai/abdiharyadi/amr-tst

wandb: 🚀 View run at https://wandb.ai/abdiharyadi/amr-tst/runs/nmb2wrh4
|
0%|          | 0/270 [00:00<?, ?it/s]

/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.

self.pid = os.fork()
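One commonly suggested mitigation, not necessarily what this script does, is to switch multiprocessing to the spawn start method before any worker processes are created; a minimal sketch:

```python
import multiprocessing as mp

if __name__ == "__main__":
    # "spawn" starts fresh interpreter processes instead of fork(),
    # avoiding the fork-while-multithreaded hazard the warning points at.
    mp.set_start_method("spawn", force=True)
```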
|
0%|          | 1/270 [00:01<05:07, 1.14s/it]
{'loss': 5.1609, 'learning_rate': 5e-09, 'epoch': 0.01}
7%|▋         | 20/270 [00:14<02:50, 1.46it/s]
{'loss': 4.6464, 'learning_rate': 1e-07, 'epoch': 0.22}
15%|█▌        | 40/270 [00:27<02:29, 1.54it/s]
{'loss': 4.4527, 'learning_rate': 2e-07, 'epoch': 0.44}
22%|██▏       | 60/270 [00:40<02:13, 1.57it/s]
{'loss': 3.8262, 'learning_rate': 3e-07, 'epoch': 0.66}
30%|███       | 80/270 [00:53<02:05, 1.52it/s]
{'loss': 3.2887, 'learning_rate': 4e-07, 'epoch': 0.88}
33%|███▎      | 90/270 [00:59<01:53, 1.59it/s]

Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
|
|
|
0%|          | 0/6 [00:00<?, ?it/s]
100%|██████████| 6/6 [01:47<00:00, 20.53s/it]

Empty AMR failure! (message repeated 27 times)

{'eval_loss': 3.400739908218384, 'eval_smatch': 0.3046, 'eval_gen_len': 210.7333, 'eval_runtime': 149.7975, 'eval_samples_per_second': 0.2, 'eval_steps_per_second': 0.04, 'epoch': 0.99}

33%|███▎      | 90/270 [03:30<01:53, 1.59it/s]
|
[WARNING|configuration_utils.py:448] 2024-09-27 08:11:08,058 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.

Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
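The warning links to the fix: move these values out of the model config and into a standalone GenerationConfig saved with the model; a minimal sketch (the output directory is a placeholder):

```python
from transformers import GenerationConfig

generation_config = GenerationConfig(
    max_length=200,          # exactly the values reported above
    early_stopping=True,
    num_beams=5,
    forced_eos_token_id=2,
)
generation_config.save_pretrained("outputs/mbart-en-id-smaller-fted-fted")  # placeholder dir
```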
|
37%|███▋      | 100/270 [03:47<07:28, 2.64s/it]
{'loss': 2.9042, 'learning_rate': 5e-07, 'epoch': 1.1}
44%|████▍     | 120/270 [04:01<01:44, 1.43it/s]
{'loss': 1.8419, 'learning_rate': 6e-07, 'epoch': 1.32}
52%|█████▏    | 140/270 [04:14<01:26, 1.51it/s]
{'loss': 1.6323, 'learning_rate': 7e-07, 'epoch': 1.55}
59%|█████▉    | 160/270 [04:28<01:15, 1.46it/s]
{'loss': 1.4964, 'learning_rate': 8e-07, 'epoch': 1.77}
67%|██████▋   | 180/270 [04:41<00:58, 1.54it/s]
{'loss': 1.5144, 'learning_rate': 9e-07, 'epoch': 1.99}
67%|██████▋   | 181/270 [04:42<00:58, 1.52it/s]

Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
|
|
|
0%|          | 0/6 [00:00<?, ?it/s]
100%|██████████| 6/6 [00:05<00:00, 1.13s/it]

{'eval_loss': 1.8273289203643799, 'eval_smatch': 0.4078, 'eval_gen_len': 29.0, 'eval_runtime': 6.2458, 'eval_samples_per_second': 4.803, 'eval_steps_per_second': 0.961, 'epoch': 2.0}

67%|██████▋   | 181/270 [04:48<00:58, 1.52it/s]
|
74%|███████▍  | 200/270 [05:10<00:49, 1.42it/s]
{'loss': 1.4392, 'learning_rate': 1e-06, 'epoch': 2.21}
81%|████████▏ | 220/270 [05:24<00:31, 1.58it/s]
{'loss': 1.4428, 'learning_rate': 7.428571428571427e-07, 'epoch': 2.43}
89%|████████▉ | 240/270 [05:37<00:20, 1.47it/s]
{'loss': 1.377, 'learning_rate': 4.857142857142857e-07, 'epoch': 2.65}
96%|█████████▌| 260/270 [05:50<00:06, 1.47it/s]
{'loss': 1.3575, 'learning_rate': 2.285714285714286e-07, 'epoch': 2.87}
100%|██████████| 270/270 [05:57<00:00, 1.52it/s]

Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
|
0%|          | 0/6 [00:00<?, ?it/s]
100%|██████████| 6/6 [00:08<00:00, 2.14s/it]

{'eval_loss': 1.7345261573791504, 'eval_smatch': 0.4022, 'eval_gen_len': 33.6, 'eval_runtime': 15.6305, 'eval_samples_per_second': 1.919, 'eval_steps_per_second': 0.384, 'epoch': 2.98}

100%|██████████| 270/270 [06:21<00:00, 1.52it/s]
|
[WARNING|trainer.py:2764] 2024-09-27 08:14:14,430 >> There were missing keys in the checkpoint model loaded: ['model.encoder.embed_tokens.weight', 'model.decoder.embed_tokens.weight', 'lm_head.weight'].

{'train_runtime': 413.6641, 'train_samples_per_second': 16.419, 'train_steps_per_second': 0.653, 'train_loss': 2.365596493968257, 'epoch': 2.98}

100%|██████████| 270/270 [06:36<00:00, 1.47s/it]
|
model.safetensors: 100%|██████████| 1.58G/1.58G [01:03<00:00, 24.9MB/s]
|
|