2021-07-14 18:15:07.946587: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
[18:15:09] - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_steps=500,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0002,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=./runs/Jul14_18-15-09_t1v-n-b95d739e-w-0,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=50.0,
output_dir=./,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=128,
per_device_train_batch_size=128,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=,
push_to_hub_organization=None,
push_to_hub_token=None,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./,
save_on_each_node=False,
save_steps=500,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=1000,
weight_decay=0.0,
)
[18:15:09] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com:443
[18:15:09] - DEBUG - urllib3.connectionpool - https://s3.amazonaws.com:443 "HEAD /datasets.huggingface.co/datasets/datasets/oscar/oscar.py HTTP/1.1" 404 0
[18:15:09] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): raw.githubusercontent.com:443
[18:15:09] - DEBUG - urllib3.connectionpool - https://raw.githubusercontent.com:443 "HEAD /huggingface/datasets/master/datasets/oscar/oscar.py HTTP/1.1" 200 0
[18:15:09] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): raw.githubusercontent.com:443
[18:15:09] - DEBUG - urllib3.connectionpool - https://raw.githubusercontent.com:443 "HEAD /huggingface/datasets/master/datasets/oscar/dataset_infos.json HTTP/1.1" 200 0
[18:15:09] - WARNING - datasets.builder - Reusing dataset oscar (/home/wilso/.cache/huggingface/datasets/oscar/unshuffled_deduplicated_su/1.0.0/84838bd49d2295f62008383b05620571535451d84545037bb94d6f3501651df2)
[18:15:09] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com:443
[18:15:10] - DEBUG - urllib3.connectionpool - https://s3.amazonaws.com:443 "HEAD /datasets.huggingface.co/datasets/datasets/cc100/cc100.py HTTP/1.1" 404 0
[18:15:10] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): raw.githubusercontent.com:443
[18:15:10] - DEBUG - urllib3.connectionpool - https://raw.githubusercontent.com:443 "HEAD /huggingface/datasets/master/datasets/cc100/cc100.py HTTP/1.1" 200 0
[18:15:10] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): raw.githubusercontent.com:443
[18:15:10] - DEBUG - urllib3.connectionpool - https://raw.githubusercontent.com:443 "HEAD /huggingface/datasets/master/datasets/cc100/dataset_infos.json HTTP/1.1" 200 0
[18:15:10] - WARNING - datasets.builder - Using custom data configuration su-lang=su
[18:15:10] - WARNING - datasets.builder - Reusing dataset cc100 (/home/wilso/.cache/huggingface/datasets/cc100/su-lang=su/0.0.0/b583dd47b0dd43a3c3773075abd993be12d0eee93dbd2cfe15a0e4e94d481e80)
[18:15:10] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com:443
[18:15:11] - DEBUG - urllib3.connectionpool - https://s3.amazonaws.com:443 "HEAD /datasets.huggingface.co/datasets/datasets/mc4/mc4.py HTTP/1.1" 404 0
[18:15:11] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): raw.githubusercontent.com:443
[18:15:11] - DEBUG - urllib3.connectionpool - https://raw.githubusercontent.com:443 "HEAD /huggingface/datasets/master/datasets/mc4/mc4.py HTTP/1.1" 200 0
[18:15:11] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): raw.githubusercontent.com:443
[18:15:11] - DEBUG - urllib3.connectionpool - https://raw.githubusercontent.com:443 "HEAD /huggingface/datasets/master/datasets/mc4/dataset_infos.json HTTP/1.1" 404 0
[18:15:11] - WARNING - datasets.builder - Reusing dataset mc4 (/home/wilso/.cache/huggingface/datasets/mc4/su/0.0.0/a2bc8f2c4d913b8b16fac4d1a63d673fa6cb22859520dcac7f193feec1f00cae)
[18:15:11] - DEBUG - urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com:443
[18:15:12] - DEBUG - urllib3.connectionpool - https://s3.amazonaws.com:443 "HEAD /datasets.huggingface.co/datasets/datasets/text/text.py HTTP/1.1" 200 0
[18:15:15] - WARNING - datasets.builder - Using custom data configuration default-4435ed1b47132023
[18:15:15] - WARNING - datasets.builder - Reusing dataset text (/home/wilso/.cache/huggingface/datasets/text/default-4435ed1b47132023/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5)
[18:15:17] - WARNING - datasets.arrow_dataset - Loading cached split indices for dataset at /home/wilso/.cache/huggingface/datasets/oscar/unshuffled_deduplicated_su/1.0.0/84838bd49d2295f62008383b05620571535451d84545037bb94d6f3501651df2/cache-72fb5b88b494cae1.arrow and /home/wilso/.cache/huggingface/datasets/oscar/unshuffled_deduplicated_su/1.0.0/84838bd49d2295f62008383b05620571535451d84545037bb94d6f3501651df2/cache-33ca85d513d1dc28.arrow
#2: 0%| | 0/16 [00:00::_M_dispose()
@ 0x7f9cc9b95800 976 (unknown)
@ 0x7f9ef258a210 1363223696 (unknown)
E0714 18:15:42.718146 590836 process_state.cc:771] RAW: Raising signal 15 with default behavior
@ 0x7f9cd4f5cd3d 48 std::vector<>::~vector()
PC: @ 0x7f9cd4f5aaf8 (unknown) std::vector<>::~vector()
@ 0x7f9cc9b95800 976 (unknown)
@ 0x7f9ef258a210 1363223568 (unknown)
E0714 18:15:42.727687 590817 process_state.cc:771] RAW: Raising signal 15 with default behavior
@ 0x7f9cd3cd7a71 80 arrow::SimpleTable::~SimpleTable()
@ 0x5d1b18 (unknown) (unknown)
@ 0x90bf00 (unknown) (unknown)
https://symbolize.stripped_domain/r/?trace=7f9cd4f59b61,7f9cc9b957ff,7f9ef258a20f,7f9cd4f5cd3c,7f9cd3cd7a70,5d1b17,90beff&map=2a762cd764e70bc90ae4c7f9747c08d7:7f9cbcc53000-7f9cc9ed4280
E0714 18:15:42.735483 590781 coredump_hook.cc:250] RAW: Remote crash gathering disabled for SIGTERM.
@ 0x7f9cd378967a (unknown) std::_Sp_counted_ptr_inplace<>::_M_dispose()
https://symbolize.stripped_domain/r/?trace=7f9cd4f5aaf8,7f9cc9b957ff,7f9ef258a20f,7f9cd3789679&map=2a762cd764e70bc90ae4c7f9747c08d7:7f9cbcc53000-7f9cc9ed4280
E0714 18:15:42.748057 590829 coredump_hook.cc:250] RAW: Remote crash gathering disabled for SIGTERM.
E0714 18:15:42.748956 590781 process_state.cc:771] RAW: Raising signal 15 with default behavior
E0714 18:15:42.753471 590829 process_state.cc:771] RAW: Raising signal 15 with default behavior
#5: 0%| | 0/2 [00:00