The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.

0it [00:00, ?it/s]
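If the migration is interrupted, it can be resumed manually as the notice suggests; a minimal sketch:

```python
# Resume the one-time cache migration by hand (transformers >= 4.22).
from transformers.utils import move_cache

move_cache()
```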
|
/opt/conda/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations

warnings.warn(
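The fix the warning asks for is only an import-path change; a minimal sketch using one helper the old module exposed (assumes a transformers version that re-exports it from `transformers.integrations`):

```python
# Deprecated path:
# from transformers.deepspeed import is_deepspeed_zero3_enabled

# Path suggested by the warning:
from transformers.integrations import is_deepspeed_zero3_enabled

print(is_deepspeed_zero3_enabled())
```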
|
/opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead

warnings.warn(
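A minimal sketch of the suggested rename (the `output_dir` and strategy values are placeholders, not taken from this run):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs/example-run",  # placeholder
    eval_strategy="epoch",             # replaces the deprecated `evaluation_strategy`
)
```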
|
/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will then be set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884

warnings.warn(
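Setting the flag explicitly silences the warning and pins the behaviour across the v4.45 default change; a minimal sketch (the checkpoint name is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "facebook/mbart-large-50",          # placeholder checkpoint
    clean_up_tokenization_spaces=True,  # pin the current default explicitly
)
```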
|
Generating train split: 2264 examples [00:00, 12772.38 examples/s]

Generating validation split: 30 examples [00:00, 9002.58 examples/s]

Running tokenizer on train dataset: 100%|██████████| 2264/2264 [00:01<00:00, 1810.46 examples/s]

Saving cached train data ...

Saving the dataset (1/1 shards): 100%|██████████| 2264/2264 [00:00<00:00, 372140.31 examples/s]

Running tokenizer on validation dataset: 100%|██████████| 30/30 [00:00<00:00, 1441.99 examples/s]

Saving cached validation data ...

Saving the dataset (1/1 shards): 100%|██████████| 30/30 [00:00<00:00, 6237.81 examples/s]
|
WandbCallback activated.

[WARNING|trainer_callback.py:423] 2024-09-27 08:07:19,230 >> You are adding a <class 'transformers.integrations.integration_utils.WandbCallback'> to the callbacks of this Trainer, but there is already one. The current list of callbacks is:

DefaultFlowCallback
WandbCallback
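The duplicate most likely comes from W&B reporting being requested twice, once via `report_to` and once by adding the callback manually; a minimal sketch of relying on `report_to` alone (the `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs/example-run",  # placeholder
    report_to=["wandb"],               # Trainer registers WandbCallback from this alone
)
# Also calling trainer.add_callback(WandbCallback) on top of this is what
# produces the "there is already one" warning above.
```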
|
/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning

warnings.warn(
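A minimal sketch of the replacement the warning recommends, wired up for the schedule reported below (the model and learning rate are placeholders):

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # placeholder; the real script fine-tunes a seq2seq model

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # PyTorch AdamW, not the deprecated one
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=200,    # matches "warm_up steps: 200" below
    num_training_steps=270,  # matches "All 270 steps" below
)
# Passing optimizers=(optimizer, scheduler) to Trainer makes it use these
# instead of constructing the deprecated transformers AdamW internally.
```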
|
All 270 steps, warm_up steps: 200
|
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
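A minimal sketch of the suggested fix (the `output_dir` is taken from the sync path below; the run name itself is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="../outputs/mbart-en-id-smaller-fted-fted",
    run_name="amrbart-id-fine-tune",  # placeholder; just has to differ from output_dir
)
```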
|
wandb: Currently logged in as: abdiharyadi. Use `wandb login --relogin` to force relogin

wandb: wandb version 0.18.1 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade

wandb: Tracking run with wandb version 0.17.5

wandb: Run data is saved locally in /kaggle/working/amr-tst-indo/AMRBART-id/fine-tune/wandb/run-20240927_080721-nmb2wrh4

wandb: Run `wandb offline` to turn off syncing.

wandb: Syncing run /kaggle/working/amr-tst-indo/AMRBART-id/fine-tune/../outputs/mbart-en-id-smaller-fted-fted

wandb: ⭐️ View project at https://wandb.ai/abdiharyadi/amr-tst

wandb: 🚀 View run at https://wandb.ai/abdiharyadi/amr-tst/runs/nmb2wrh4
|
0%|          | 0/270 [00:00<?, ?it/s]

/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.

self.pid = os.fork()
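One commonly suggested mitigation, not necessarily what this script does, is to switch multiprocessing to the spawn start method before any worker processes are created; a minimal sketch:

```python
import multiprocessing as mp

if __name__ == "__main__":
    # "spawn" starts fresh interpreter processes instead of fork(),
    # avoiding the fork-while-multithreaded hazard the warning points at.
    mp.set_start_method("spawn", force=True)
```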
|
0%|          | 1/270 [00:01<05:07, 1.14s/it]
{'loss': 5.1609, 'learning_rate': 5e-09, 'epoch': 0.01}
7%|▋         | 20/270 [00:14<02:50, 1.46it/s]
{'loss': 4.6464, 'learning_rate': 1e-07, 'epoch': 0.22}
15%|█▌        | 40/270 [00:27<02:29, 1.54it/s]
{'loss': 4.4527, 'learning_rate': 2e-07, 'epoch': 0.44}
22%|██▏       | 60/270 [00:40<02:13, 1.57it/s]
{'loss': 3.8262, 'learning_rate': 3e-07, 'epoch': 0.66}
30%|███       | 80/270 [00:53<02:05, 1.52it/s]
{'loss': 3.2887, 'learning_rate': 4e-07, 'epoch': 0.88}
33%|███▎      | 90/270 [00:59<01:53, 1.59it/s]

Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
|
|
|
0%|          | 0/6 [00:00<?, ?it/s]
100%|██████████| 6/6 [01:47<00:00, 20.53s/it]

Empty AMR failure! (message repeated 27 times)

{'eval_loss': 3.400739908218384, 'eval_smatch': 0.3046, 'eval_gen_len': 210.7333, 'eval_runtime': 149.7975, 'eval_samples_per_second': 0.2, 'eval_steps_per_second': 0.04, 'epoch': 0.99}

33%|███▎      | 90/270 [03:30<01:53, 1.59it/s]
|
[WARNING|configuration_utils.py:448] 2024-09-27 08:11:08,058 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.

Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
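The warning links to the fix: move these values out of the model config and into a standalone GenerationConfig saved with the model; a minimal sketch (the output directory is a placeholder):

```python
from transformers import GenerationConfig

generation_config = GenerationConfig(
    max_length=200,          # exactly the values reported above
    early_stopping=True,
    num_beams=5,
    forced_eos_token_id=2,
)
generation_config.save_pretrained("outputs/mbart-en-id-smaller-fted-fted")  # placeholder dir
```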
|
37%|███▋      | 100/270 [03:47<07:28, 2.64s/it]
{'loss': 2.9042, 'learning_rate': 5e-07, 'epoch': 1.1}
44%|████▍     | 120/270 [04:01<01:44, 1.43it/s]
{'loss': 1.8419, 'learning_rate': 6e-07, 'epoch': 1.32}
52%|█████▏    | 140/270 [04:14<01:26, 1.51it/s]
{'loss': 1.6323, 'learning_rate': 7e-07, 'epoch': 1.55}
59%|█████▉    | 160/270 [04:28<01:15, 1.46it/s]
{'loss': 1.4964, 'learning_rate': 8e-07, 'epoch': 1.77}
67%|██████▋   | 180/270 [04:41<00:58, 1.54it/s]
{'loss': 1.5144, 'learning_rate': 9e-07, 'epoch': 1.99}
67%|██████▋   | 181/270 [04:42<00:58, 1.52it/s]

Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
|
|
|
0%|          | 0/6 [00:00<?, ?it/s]
100%|██████████| 6/6 [00:05<00:00, 1.13s/it]

{'eval_loss': 1.8273289203643799, 'eval_smatch': 0.4078, 'eval_gen_len': 29.0, 'eval_runtime': 6.2458, 'eval_samples_per_second': 4.803, 'eval_steps_per_second': 0.961, 'epoch': 2.0}

67%|██████▋   | 181/270 [04:48<00:58, 1.52it/s]
|
74%|███████▍  | 200/270 [05:10<00:49, 1.42it/s]
{'loss': 1.4392, 'learning_rate': 1e-06, 'epoch': 2.21}
81%|████████▏ | 220/270 [05:24<00:31, 1.58it/s]
{'loss': 1.4428, 'learning_rate': 7.428571428571427e-07, 'epoch': 2.43}
89%|████████▉ | 240/270 [05:37<00:20, 1.47it/s]
{'loss': 1.377, 'learning_rate': 4.857142857142857e-07, 'epoch': 2.65}
96%|█████████▌| 260/270 [05:50<00:06, 1.47it/s]
{'loss': 1.3575, 'learning_rate': 2.285714285714286e-07, 'epoch': 2.87}
100%|██████████| 270/270 [05:57<00:00, 1.52it/s]

Generation Kwargs:
{'max_length': 1024, 'max_gen_length': 1024, 'num_beams': 5}
|
0%|          | 0/6 [00:00<?, ?it/s]
100%|██████████| 6/6 [00:08<00:00, 2.14s/it]

{'eval_loss': 1.7345261573791504, 'eval_smatch': 0.4022, 'eval_gen_len': 33.6, 'eval_runtime': 15.6305, 'eval_samples_per_second': 1.919, 'eval_steps_per_second': 0.384, 'epoch': 2.98}

100%|██████████| 270/270 [06:21<00:00, 1.52it/s]
|
[WARNING|trainer.py:2764] 2024-09-27 08:14:14,430 >> There were missing keys in the checkpoint model loaded: ['model.encoder.embed_tokens.weight', 'model.decoder.embed_tokens.weight', 'lm_head.weight'].

{'train_runtime': 413.6641, 'train_samples_per_second': 16.419, 'train_steps_per_second': 0.653, 'train_loss': 2.365596493968257, 'epoch': 2.98}

100%|██████████| 270/270 [06:36<00:00, 1.47s/it]
|
model.safetensors: 100%|██████████| 1.58G/1.58G [01:03<00:00, 24.9MB/s]
|
|