The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
/opt/conda/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
Generating train split: 2264 examples [00:00, 12772.38 examples/s]
Generating validation split: 30 examples [00:00, 9002.58 examples/s]
Running tokenizer on train dataset: 100%|██████████| 2264/2264 [00:01<00:00, 1810.46 examples/s]
Saving cached train data ...
Saving the dataset (1/1 shards): 100%|██████████| 2264/2264 [00:00<00:00, 372140.31 examples/s]
Running tokenizer on validation dataset: 100%|██████████| 30/30 [00:00<00:00, 1441.99 examples/s]
Saving cached validation data ...
Saving the dataset (1/1 shards): 100%|██████████| 30/30 [00:00<00:00, 6237.81 examples/s]
WandbCallback activated.
[WARNING|trainer_callback.py:423] 2024-09-27 08:07:19,230 >> You are adding a <class 'transformers.integrations.integration_utils.WandbCallback'> to the callbacks of this Trainer, but there is already one. The currentlist of callbacks is
:DefaultFlowCallback
WandbCallback
/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
All 270 steps, warm_up steps: 200
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Currently logged in as: abdiharyadi. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.18.1 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /kaggle/working/amr-tst-indo/AMRBART-id/fine-tune/wandb/run-20240927_080721-nmb2wrh4
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run /kaggle/working/amr-tst-indo/AMRBART-id/fine-tune/../outputs/mbart-en-id-smaller-fted-fted
wandb: ⭐️ View project at https://wandb.ai/abdiharyadi/amr-tst
wandb: 🚀 View run at https://wandb.ai/abdiharyadi/amr-tst/runs/nmb2wrh4
0%| | 0/270 [00:00<?, ?it/s]
/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
0%| | 1/270 [00:01<05:07, 1.14s/it] {'loss': 5.1609, 'learning_rate': 5e-09, 'epoch': 0.01}
7%|▋ | 20/270 [00:14<02:50, 1.46it/s] {'loss': 4.6464, 'learning_rate': 1e-07, 'epoch': 0.22}
15%|█▍ | 40/270 [00:27<02:29, 1.54it/s] {'loss': 4.4527, 'learning_rate': 2e-07, 'epoch': 0.44}
22%|██▏ | 60/270 [00:40<02:13, 1.57it/s] {'loss': 3.8262, 'learning_rate': 3e-07, 'epoch': 0.66}
30%|██▉ | 80/270 [00:53<02:05, 1.52it/s] {'loss': 3.2887, 'learning_rate': 4e-07, 'epoch': 0.88}
33%|███▎ | 90/270 [00:59<01:53, 1.59it/s]
Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
0%| | 0/6 [00:00<?, ?it/s]
33%|███▎ | 2/6 [00:33<01:07, 16.96s/it]
50%|█████ | 3/6 [00:36<00:32, 10.84s/it]
67%|██████▋ | 4/6 [01:10<00:39, 19.70s/it]
83%|████████▎ | 5/6 [01:13<00:13, 13.77s/it]
100%|██████████| 6/6 [01:47<00:00, 20.53s/it]
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
Empty AMR failure!
{'eval_loss': 3.400739908218384, 'eval_smatch': 0.3046, 'eval_gen_len': 210.7333, 'eval_runtime': 149.7975, 'eval_samples_per_second': 0.2, 'eval_steps_per_second': 0.04, 'epoch': 0.99}
33%|███▎ | 90/270 [03:30<01:53, 1.59it/s]
100%|██████████| 6/6 [01:49<00:00, 20.53s/it]
[WARNING|configuration_utils.py:448] 2024-09-27 08:11:08,058 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
37%|███▋ | 100/270 [03:47<07:28, 2.64s/it] {'loss': 2.9042, 'learning_rate': 5e-07, 'epoch': 1.1}
44%|████▍ | 120/270 [04:01<01:44, 1.43it/s] {'loss': 1.8419, 'learning_rate': 6e-07, 'epoch': 1.32}
52%|█████▏ | 140/270 [04:14<01:26, 1.51it/s] {'loss': 1.6323, 'learning_rate': 7e-07, 'epoch': 1.55}
59%|█████▉ | 160/270 [04:28<01:15, 1.46it/s] {'loss': 1.4964, 'learning_rate': 8e-07, 'epoch': 1.77}
67%|██████▋ | 180/270 [04:41<00:58, 1.54it/s] {'loss': 1.5144, 'learning_rate': 9e-07, 'epoch': 1.99}
67%|██████▋ | 181/270 [04:42<00:58, 1.52it/s]
Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
0%| | 0/6 [00:00<?, ?it/s]
33%|███▎ | 2/6 [00:00<00:01, 3.16it/s]
50%|█████ | 3/6 [00:01<00:01, 1.95it/s]
67%|██████▋ | 4/6 [00:02<00:01, 1.84it/s]
83%|████████▎ | 5/6 [00:03<00:00, 1.35it/s]
100%|██████████| 6/6 [00:05<00:00, 1.13s/it]
{'eval_loss': 1.8273289203643799, 'eval_smatch': 0.4078, 'eval_gen_len': 29.0, 'eval_runtime': 6.2458, 'eval_samples_per_second': 4.803, 'eval_steps_per_second': 0.961, 'epoch': 2.0}
67%|██████▋ | 181/270 [04:48<00:58, 1.52it/s]
100%|██████████| 6/6 [00:05<00:00, 1.13s/it]
[WARNING|configuration_utils.py:448] 2024-09-27 08:12:26,894 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
74%|███████▍ | 200/270 [05:10<00:49, 1.42it/s] {'loss': 1.4392, 'learning_rate': 1e-06, 'epoch': 2.21}
81%|████████▏ | 220/270 [05:24<00:31, 1.58it/s] {'loss': 1.4428, 'learning_rate': 7.428571428571427e-07, 'epoch': 2.43}
89%|████████▉ | 240/270 [05:37<00:20, 1.47it/s] {'loss': 1.377, 'learning_rate': 4.857142857142857e-07, 'epoch': 2.65}
96%|█████████▋| 260/270 [05:50<00:06, 1.47it/s] {'loss': 1.3575, 'learning_rate': 2.285714285714286e-07, 'epoch': 2.87}
100%|██████████| 270/270 [05:57<00:00, 1.52it/s]
[WARNING|configuration_utils.py:448] 2024-09-27 08:13:35,534 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
0%| | 0/6 [00:00<?, ?it/s]
33%|███▎ | 2/6 [00:00<00:01, 2.23it/s]
50%|█████ | 3/6 [00:01<00:01, 1.54it/s]
67%|██████▋ | 4/6 [00:02<00:01, 1.50it/s]
83%|████████▎ | 5/6 [00:04<00:00, 1.05it/s]
100%|██████████| 6/6 [00:08<00:00, 2.14s/it]
{'eval_loss': 1.7345261573791504, 'eval_smatch': 0.4022, 'eval_gen_len': 33.6, 'eval_runtime': 15.6305, 'eval_samples_per_second': 1.919, 'eval_steps_per_second': 0.384, 'epoch': 2.98}
100%|██████████| 270/270 [06:21<00:00, 1.52it/s]
100%|██████████| 6/6 [00:14<00:00, 2.14s/it]
[WARNING|configuration_utils.py:448] 2024-09-27 08:13:59,667 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
[WARNING|trainer.py:2764] 2024-09-27 08:14:14,430 >> There were missing keys in the checkpoint model loaded: ['model.encoder.embed_tokens.weight', 'model.decoder.embed_tokens.weight', 'lm_head.weight'].
{'train_runtime': 413.6641, 'train_samples_per_second': 16.419, 'train_steps_per_second': 0.653, 'train_loss': 2.365596493968257, 'epoch': 2.98}
100%|██████████| 270/270 [06:36<00:00, 1.47s/it]
[WARNING|configuration_utils.py:448] 2024-09-27 08:15:07,015 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
[WARNING|configuration_utils.py:448] 2024-09-27 08:15:11,181 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
model.safetensors: 100%|██████████| 1.58G/1.58G [01:03<00:00, 24.9MB/s]