yuekai committed on
Commit
c96c265
1 Parent(s): bd56f7c

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. libritts-r/log/log-train-2024-08-06-08-02-16-0 +14 -0
  2. libritts-r/log/log-train-2024-08-06-08-02-16-1 +14 -0
  3. libritts-r/log/log-train-2024-08-06-08-02-16-2 +14 -0
  4. libritts-r/log/log-train-2024-08-06-08-02-16-3 +14 -0
  5. libritts-r/log/log-train-2024-08-06-08-02-16-4 +15 -0
  6. libritts-r/log/log-train-2024-08-06-08-02-16-5 +14 -0
  7. libritts-r/log/log-train-2024-08-06-08-02-16-6 +14 -0
  8. libritts-r/log/log-train-2024-08-06-08-02-16-7 +14 -0
  9. libritts-r/log/log-train-2024-08-06-08-03-57-0 +7 -0
  10. libritts-r/log/log-train-2024-08-06-08-03-57-1 +7 -0
  11. libritts-r/log/log-train-2024-08-06-08-03-57-2 +7 -0
  12. libritts-r/log/log-train-2024-08-06-08-03-57-3 +7 -0
  13. libritts-r/log/log-train-2024-08-06-08-03-57-4 +7 -0
  14. libritts-r/log/log-train-2024-08-06-08-03-57-5 +7 -0
  15. libritts-r/log/log-train-2024-08-06-08-03-57-6 +7 -0
  16. libritts-r/log/log-train-2024-08-06-08-03-57-7 +7 -0
  17. libritts-r/log/log-train-2024-08-06-08-06-14-0 +357 -0
  18. libritts-r/log/log-train-2024-08-06-08-06-14-1 +336 -0
  19. libritts-r/log/log-train-2024-08-06-08-06-14-2 +336 -0
  20. libritts-r/log/log-train-2024-08-06-08-06-14-3 +336 -0
  21. libritts-r/log/log-train-2024-08-06-08-06-14-4 +336 -0
  22. libritts-r/log/log-train-2024-08-06-08-06-14-5 +336 -0
  23. libritts-r/log/log-train-2024-08-06-08-06-14-6 +336 -0
  24. libritts-r/log/log-train-2024-08-06-08-06-14-7 +336 -0
  25. libritts-r/log/log-train-2024-08-06-14-23-41-0 +0 -0
  26. libritts-r/log/log-train-2024-08-06-14-23-41-1 +0 -0
  27. libritts-r/log/log-train-2024-08-06-14-23-41-2 +0 -0
  28. libritts-r/log/log-train-2024-08-06-14-23-41-3 +0 -0
  29. libritts-r/log/log-train-2024-08-06-14-23-41-4 +0 -0
  30. libritts-r/log/log-train-2024-08-06-14-23-41-5 +0 -0
  31. libritts-r/log/log-train-2024-08-06-14-23-41-6 +0 -0
  32. libritts-r/log/log-train-2024-08-06-14-23-41-7 +0 -0
  33. libritts-r/tensorboard_stage1/events.out.tfevents.1722931336.6867463.3160.0 +3 -0
  34. libritts-r/tensorboard_stage1/events.out.tfevents.1722931437.6867463.17896.0 +3 -0
  35. libritts-r/tensorboard_stage1/events.out.tfevents.1722931574.6867463.20306.0 +3 -0
  36. libritts-r/tensorboard_stage2/events.out.tfevents.1722954221.6867463.1063288.0 +3 -0
  37. libritts/log/log-train-2024-08-06-03-01-46-0 +15 -0
  38. libritts/log/log-train-2024-08-06-03-01-46-1 +15 -0
  39. libritts/log/log-train-2024-08-06-03-01-46-2 +15 -0
  40. libritts/log/log-train-2024-08-06-03-01-46-3 +15 -0
  41. libritts/log/log-train-2024-08-06-03-01-46-4 +15 -0
  42. libritts/log/log-train-2024-08-06-03-01-46-5 +15 -0
  43. libritts/log/log-train-2024-08-06-03-01-46-6 +15 -0
  44. libritts/log/log-train-2024-08-06-03-01-46-7 +15 -0
  45. libritts/log/log-train-2024-08-06-03-26-50-0 +14 -0
  46. libritts/log/log-train-2024-08-06-03-26-50-1 +14 -0
  47. libritts/log/log-train-2024-08-06-03-26-50-2 +14 -0
  48. libritts/log/log-train-2024-08-06-03-26-50-3 +14 -0
  49. libritts/log/log-train-2024-08-06-03-26-50-4 +14 -0
  50. libritts/log/log-train-2024-08-06-03-26-50-5 +14 -0
libritts-r/log/log-train-2024-08-06-08-02-16-0 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 08:02:16,439 INFO [trainer.py:870] (0/8) Training started
+ 2024-08-06 08:02:16,443 INFO [trainer.py:889] (0/8) Device: cuda:0
+ 2024-08-06 08:02:16,444 INFO [trainer.py:890] (0/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:02:16,444 INFO [trainer.py:892] (0/8) About to create model
+ 2024-08-06 08:02:17,494 INFO [trainer.py:899] (0/8) Number of model parameters: 367386628
+ 2024-08-06 08:02:18,318 INFO [trainer.py:914] (0/8) Using DDP
+ 2024-08-06 08:02:20,442 INFO [datamodule.py:427] (0/8) About to get train cuts
+ 2024-08-06 08:02:20,455 INFO [datamodule.py:434] (0/8) About to get dev cuts
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:292] (0/8) Disable SpecAugment
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:294] (0/8) About to create train dataset
+ 2024-08-06 08:02:20,464 INFO [datamodule.py:323] (0/8) Using DynamicBucketingSampler
+ 2024-08-06 08:02:21,071 INFO [datamodule.py:344] (0/8) About to create train dataloader
+ 2024-08-06 08:02:21,071 INFO [datamodule.py:367] (0/8) About to create dev dataset
+ 2024-08-06 08:02:21,396 INFO [datamodule.py:388] (0/8) About to create dev dataloader
libritts-r/log/log-train-2024-08-06-08-02-16-1 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 08:02:16,470 INFO [trainer.py:870] (1/8) Training started
+ 2024-08-06 08:02:16,471 INFO [trainer.py:889] (1/8) Device: cuda:1
+ 2024-08-06 08:02:16,471 INFO [trainer.py:890] (1/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:02:16,471 INFO [trainer.py:892] (1/8) About to create model
+ 2024-08-06 08:02:17,214 INFO [trainer.py:899] (1/8) Number of model parameters: 367386628
+ 2024-08-06 08:02:17,933 INFO [trainer.py:914] (1/8) Using DDP
+ 2024-08-06 08:02:20,444 INFO [datamodule.py:427] (1/8) About to get train cuts
+ 2024-08-06 08:02:20,455 INFO [datamodule.py:434] (1/8) About to get dev cuts
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:292] (1/8) Disable SpecAugment
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:294] (1/8) About to create train dataset
+ 2024-08-06 08:02:20,464 INFO [datamodule.py:323] (1/8) Using DynamicBucketingSampler
+ 2024-08-06 08:02:21,071 INFO [datamodule.py:344] (1/8) About to create train dataloader
+ 2024-08-06 08:02:21,071 INFO [datamodule.py:367] (1/8) About to create dev dataset
+ 2024-08-06 08:02:21,393 INFO [datamodule.py:388] (1/8) About to create dev dataloader
libritts-r/log/log-train-2024-08-06-08-02-16-2 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 08:02:16,456 INFO [trainer.py:870] (2/8) Training started
+ 2024-08-06 08:02:16,457 INFO [trainer.py:889] (2/8) Device: cuda:2
+ 2024-08-06 08:02:16,457 INFO [trainer.py:890] (2/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:02:16,458 INFO [trainer.py:892] (2/8) About to create model
+ 2024-08-06 08:02:17,414 INFO [trainer.py:899] (2/8) Number of model parameters: 367386628
+ 2024-08-06 08:02:18,019 INFO [trainer.py:914] (2/8) Using DDP
+ 2024-08-06 08:02:20,445 INFO [datamodule.py:427] (2/8) About to get train cuts
+ 2024-08-06 08:02:20,455 INFO [datamodule.py:434] (2/8) About to get dev cuts
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:292] (2/8) Disable SpecAugment
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:294] (2/8) About to create train dataset
+ 2024-08-06 08:02:20,464 INFO [datamodule.py:323] (2/8) Using DynamicBucketingSampler
+ 2024-08-06 08:02:21,076 INFO [datamodule.py:344] (2/8) About to create train dataloader
+ 2024-08-06 08:02:21,076 INFO [datamodule.py:367] (2/8) About to create dev dataset
+ 2024-08-06 08:02:21,400 INFO [datamodule.py:388] (2/8) About to create dev dataloader
libritts-r/log/log-train-2024-08-06-08-02-16-3 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 08:02:16,435 INFO [trainer.py:870] (3/8) Training started
+ 2024-08-06 08:02:16,436 INFO [trainer.py:889] (3/8) Device: cuda:3
+ 2024-08-06 08:02:16,436 INFO [trainer.py:890] (3/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:02:16,436 INFO [trainer.py:892] (3/8) About to create model
+ 2024-08-06 08:02:17,492 INFO [trainer.py:899] (3/8) Number of model parameters: 367386628
+ 2024-08-06 08:02:18,433 INFO [trainer.py:914] (3/8) Using DDP
+ 2024-08-06 08:02:20,445 INFO [datamodule.py:427] (3/8) About to get train cuts
+ 2024-08-06 08:02:20,455 INFO [datamodule.py:434] (3/8) About to get dev cuts
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:292] (3/8) Disable SpecAugment
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:294] (3/8) About to create train dataset
+ 2024-08-06 08:02:20,464 INFO [datamodule.py:323] (3/8) Using DynamicBucketingSampler
+ 2024-08-06 08:02:21,078 INFO [datamodule.py:344] (3/8) About to create train dataloader
+ 2024-08-06 08:02:21,078 INFO [datamodule.py:367] (3/8) About to create dev dataset
+ 2024-08-06 08:02:21,410 INFO [datamodule.py:388] (3/8) About to create dev dataloader
libritts-r/log/log-train-2024-08-06-08-02-16-4 ADDED
@@ -0,0 +1,15 @@
+ 2024-08-06 08:02:16,446 INFO [trainer.py:870] (4/8) Training started
+ 2024-08-06 08:02:16,447 INFO [trainer.py:889] (4/8) Device: cuda:4
+ 2024-08-06 08:02:16,447 INFO [trainer.py:890] (4/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:02:16,447 INFO [trainer.py:892] (4/8) About to create model
+ 2024-08-06 08:02:17,470 INFO [trainer.py:899] (4/8) Number of model parameters: 367386628
+ 2024-08-06 08:02:18,312 INFO [trainer.py:914] (4/8) Using DDP
+ 2024-08-06 08:02:20,442 INFO [datamodule.py:427] (4/8) About to get train cuts
+ 2024-08-06 08:02:20,456 INFO [datamodule.py:434] (4/8) About to get dev cuts
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:292] (4/8) Disable SpecAugment
+ 2024-08-06 08:02:20,463 INFO [datamodule.py:294] (4/8) About to create train dataset
+ 2024-08-06 08:02:20,464 INFO [datamodule.py:323] (4/8) Using DynamicBucketingSampler
+ 2024-08-06 08:02:21,079 INFO [datamodule.py:344] (4/8) About to create train dataloader
+ 2024-08-06 08:02:21,080 INFO [datamodule.py:367] (4/8) About to create dev dataset
+ 2024-08-06 08:02:21,408 INFO [datamodule.py:388] (4/8) About to create dev dataloader
+ 2024-08-06 08:02:39,869 INFO [trainer.py:1092] (4/8) Saving batch to exp/valle/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
libritts-r/log/log-train-2024-08-06-08-02-16-5 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 08:02:16,470 INFO [trainer.py:870] (5/8) Training started
+ 2024-08-06 08:02:16,471 INFO [trainer.py:889] (5/8) Device: cuda:5
+ 2024-08-06 08:02:16,471 INFO [trainer.py:890] (5/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:02:16,471 INFO [trainer.py:892] (5/8) About to create model
+ 2024-08-06 08:02:17,212 INFO [trainer.py:899] (5/8) Number of model parameters: 367386628
+ 2024-08-06 08:02:17,934 INFO [trainer.py:914] (5/8) Using DDP
+ 2024-08-06 08:02:20,445 INFO [datamodule.py:427] (5/8) About to get train cuts
+ 2024-08-06 08:02:20,455 INFO [datamodule.py:434] (5/8) About to get dev cuts
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:292] (5/8) Disable SpecAugment
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:294] (5/8) About to create train dataset
+ 2024-08-06 08:02:20,464 INFO [datamodule.py:323] (5/8) Using DynamicBucketingSampler
+ 2024-08-06 08:02:21,086 INFO [datamodule.py:344] (5/8) About to create train dataloader
+ 2024-08-06 08:02:21,086 INFO [datamodule.py:367] (5/8) About to create dev dataset
+ 2024-08-06 08:02:21,424 INFO [datamodule.py:388] (5/8) About to create dev dataloader
libritts-r/log/log-train-2024-08-06-08-02-16-6 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 08:02:16,468 INFO [trainer.py:870] (6/8) Training started
+ 2024-08-06 08:02:16,469 INFO [trainer.py:889] (6/8) Device: cuda:6
+ 2024-08-06 08:02:16,469 INFO [trainer.py:890] (6/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:02:16,469 INFO [trainer.py:892] (6/8) About to create model
+ 2024-08-06 08:02:17,495 INFO [trainer.py:899] (6/8) Number of model parameters: 367386628
+ 2024-08-06 08:02:18,427 INFO [trainer.py:914] (6/8) Using DDP
+ 2024-08-06 08:02:20,443 INFO [datamodule.py:427] (6/8) About to get train cuts
+ 2024-08-06 08:02:20,455 INFO [datamodule.py:434] (6/8) About to get dev cuts
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:292] (6/8) Disable SpecAugment
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:294] (6/8) About to create train dataset
+ 2024-08-06 08:02:20,464 INFO [datamodule.py:323] (6/8) Using DynamicBucketingSampler
+ 2024-08-06 08:02:21,075 INFO [datamodule.py:344] (6/8) About to create train dataloader
+ 2024-08-06 08:02:21,076 INFO [datamodule.py:367] (6/8) About to create dev dataset
+ 2024-08-06 08:02:21,402 INFO [datamodule.py:388] (6/8) About to create dev dataloader
libritts-r/log/log-train-2024-08-06-08-02-16-7 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 08:02:16,467 INFO [trainer.py:870] (7/8) Training started
+ 2024-08-06 08:02:16,468 INFO [trainer.py:889] (7/8) Device: cuda:7
+ 2024-08-06 08:02:16,469 INFO [trainer.py:890] (7/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:02:16,469 INFO [trainer.py:892] (7/8) About to create model
+ 2024-08-06 08:02:17,493 INFO [trainer.py:899] (7/8) Number of model parameters: 367386628
+ 2024-08-06 08:02:18,432 INFO [trainer.py:914] (7/8) Using DDP
+ 2024-08-06 08:02:20,442 INFO [datamodule.py:427] (7/8) About to get train cuts
+ 2024-08-06 08:02:20,455 INFO [datamodule.py:434] (7/8) About to get dev cuts
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:292] (7/8) Disable SpecAugment
+ 2024-08-06 08:02:20,462 INFO [datamodule.py:294] (7/8) About to create train dataset
+ 2024-08-06 08:02:20,464 INFO [datamodule.py:323] (7/8) Using DynamicBucketingSampler
+ 2024-08-06 08:02:21,073 INFO [datamodule.py:344] (7/8) About to create train dataloader
+ 2024-08-06 08:02:21,074 INFO [datamodule.py:367] (7/8) About to create dev dataset
+ 2024-08-06 08:02:21,402 INFO [datamodule.py:388] (7/8) About to create dev dataloader
libritts-r/log/log-train-2024-08-06-08-03-57-0 ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ 2024-08-06 08:03:57,516 INFO [trainer.py:870] (0/8) Training started
2
+ 2024-08-06 08:03:57,521 INFO [trainer.py:889] (0/8) Device: cuda:0
3
+ 2024-08-06 08:03:57,521 INFO [trainer.py:890] (0/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:03:57,521 INFO [trainer.py:892] (0/8) About to create model
+ 2024-08-06 08:03:58,244 INFO [trainer.py:899] (0/8) Number of model parameters: 367386628
+ 2024-08-06 08:03:59,498 INFO [trainer.py:914] (0/8) Using DDP
+ 2024-08-06 08:04:02,291 INFO [datamodule.py:427] (0/8) About to get train cuts
libritts-r/log/log-train-2024-08-06-08-03-57-1 ADDED
@@ -0,0 +1,7 @@
+ 2024-08-06 08:03:57,570 INFO [trainer.py:870] (1/8) Training started
+ 2024-08-06 08:03:57,571 INFO [trainer.py:889] (1/8) Device: cuda:1
+ 2024-08-06 08:03:57,571 INFO [trainer.py:890] (1/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:03:57,571 INFO [trainer.py:892] (1/8) About to create model
+ 2024-08-06 08:03:58,276 INFO [trainer.py:899] (1/8) Number of model parameters: 367386628
+ 2024-08-06 08:03:59,916 INFO [trainer.py:914] (1/8) Using DDP
+ 2024-08-06 08:04:02,295 INFO [datamodule.py:427] (1/8) About to get train cuts
libritts-r/log/log-train-2024-08-06-08-03-57-2 ADDED
@@ -0,0 +1,7 @@
+ 2024-08-06 08:03:57,568 INFO [trainer.py:870] (2/8) Training started
+ 2024-08-06 08:03:57,569 INFO [trainer.py:889] (2/8) Device: cuda:2
+ 2024-08-06 08:03:57,569 INFO [trainer.py:890] (2/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:03:57,569 INFO [trainer.py:892] (2/8) About to create model
+ 2024-08-06 08:03:58,294 INFO [trainer.py:899] (2/8) Number of model parameters: 367386628
+ 2024-08-06 08:03:59,503 INFO [trainer.py:914] (2/8) Using DDP
+ 2024-08-06 08:04:02,295 INFO [datamodule.py:427] (2/8) About to get train cuts
libritts-r/log/log-train-2024-08-06-08-03-57-3 ADDED
@@ -0,0 +1,7 @@
+ 2024-08-06 08:03:57,566 INFO [trainer.py:870] (3/8) Training started
+ 2024-08-06 08:03:57,567 INFO [trainer.py:889] (3/8) Device: cuda:3
+ 2024-08-06 08:03:57,567 INFO [trainer.py:890] (3/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:03:57,567 INFO [trainer.py:892] (3/8) About to create model
+ 2024-08-06 08:03:58,258 INFO [trainer.py:899] (3/8) Number of model parameters: 367386628
+ 2024-08-06 08:03:59,929 INFO [trainer.py:914] (3/8) Using DDP
+ 2024-08-06 08:04:02,295 INFO [datamodule.py:427] (3/8) About to get train cuts
libritts-r/log/log-train-2024-08-06-08-03-57-4 ADDED
@@ -0,0 +1,7 @@
+ 2024-08-06 08:03:57,571 INFO [trainer.py:870] (4/8) Training started
+ 2024-08-06 08:03:57,572 INFO [trainer.py:889] (4/8) Device: cuda:4
+ 2024-08-06 08:03:57,572 INFO [trainer.py:890] (4/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:03:57,573 INFO [trainer.py:892] (4/8) About to create model
+ 2024-08-06 08:03:58,346 INFO [trainer.py:899] (4/8) Number of model parameters: 367386628
+ 2024-08-06 08:03:59,592 INFO [trainer.py:914] (4/8) Using DDP
+ 2024-08-06 08:04:02,295 INFO [datamodule.py:427] (4/8) About to get train cuts
libritts-r/log/log-train-2024-08-06-08-03-57-5 ADDED
@@ -0,0 +1,7 @@
+ 2024-08-06 08:03:57,572 INFO [trainer.py:870] (5/8) Training started
+ 2024-08-06 08:03:57,573 INFO [trainer.py:889] (5/8) Device: cuda:5
+ 2024-08-06 08:03:57,573 INFO [trainer.py:890] (5/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:03:57,573 INFO [trainer.py:892] (5/8) About to create model
+ 2024-08-06 08:03:58,346 INFO [trainer.py:899] (5/8) Number of model parameters: 367386628
+ 2024-08-06 08:03:59,588 INFO [trainer.py:914] (5/8) Using DDP
+ 2024-08-06 08:04:02,295 INFO [datamodule.py:427] (5/8) About to get train cuts
libritts-r/log/log-train-2024-08-06-08-03-57-6 ADDED
@@ -0,0 +1,7 @@
+ 2024-08-06 08:03:57,572 INFO [trainer.py:870] (6/8) Training started
+ 2024-08-06 08:03:57,573 INFO [trainer.py:889] (6/8) Device: cuda:6
+ 2024-08-06 08:03:57,573 INFO [trainer.py:890] (6/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:03:57,573 INFO [trainer.py:892] (6/8) About to create model
+ 2024-08-06 08:03:58,337 INFO [trainer.py:899] (6/8) Number of model parameters: 367386628
+ 2024-08-06 08:03:59,524 INFO [trainer.py:914] (6/8) Using DDP
+ 2024-08-06 08:04:02,293 INFO [datamodule.py:427] (6/8) About to get train cuts
libritts-r/log/log-train-2024-08-06-08-03-57-7 ADDED
@@ -0,0 +1,7 @@
+ 2024-08-06 08:03:57,571 INFO [trainer.py:870] (7/8) Training started
+ 2024-08-06 08:03:57,572 INFO [trainer.py:889] (7/8) Device: cuda:7
+ 2024-08-06 08:03:57,572 INFO [trainer.py:890] (7/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:03:57,572 INFO [trainer.py:892] (7/8) About to create model
+ 2024-08-06 08:03:58,300 INFO [trainer.py:899] (7/8) Number of model parameters: 367386628
+ 2024-08-06 08:03:59,957 INFO [trainer.py:914] (7/8) Using DDP
+ 2024-08-06 08:04:02,295 INFO [datamodule.py:427] (7/8) About to get train cuts
libritts-r/log/log-train-2024-08-06-08-06-14-0 ADDED
@@ -0,0 +1,357 @@
+ 2024-08-06 08:06:14,316 INFO [trainer.py:870] (0/8) Training started
+ 2024-08-06 08:06:14,320 INFO [trainer.py:889] (0/8) Device: cuda:0
+ 2024-08-06 08:06:14,320 INFO [trainer.py:890] (0/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:06:14,320 INFO [trainer.py:892] (0/8) About to create model
+ 2024-08-06 08:06:15,058 INFO [trainer.py:899] (0/8) Number of model parameters: 367386628
+ 2024-08-06 08:06:16,197 INFO [trainer.py:914] (0/8) Using DDP
+ 2024-08-06 08:06:19,148 INFO [datamodule.py:427] (0/8) About to get train cuts
+ 2024-08-06 08:06:19,149 INFO [datamodule.py:434] (0/8) About to get dev cuts
+ 2024-08-06 08:06:19,151 INFO [datamodule.py:292] (0/8) Disable SpecAugment
+ 2024-08-06 08:06:19,151 INFO [datamodule.py:294] (0/8) About to create train dataset
+ 2024-08-06 08:06:19,152 INFO [datamodule.py:323] (0/8) Using DynamicBucketingSampler
+ 2024-08-06 08:06:19,772 INFO [datamodule.py:344] (0/8) About to create train dataloader
+ 2024-08-06 08:06:19,772 INFO [datamodule.py:367] (0/8) About to create dev dataset
+ 2024-08-06 08:06:20,101 INFO [datamodule.py:388] (0/8) About to create dev dataloader
+ 2024-08-06 08:08:02,122 INFO [trainer.py:765] (0/8) Epoch 1, batch 100, train_loss[loss=4.278, ArTop10Accuracy=0.5092, over 14148.00 frames. ], tot_loss[loss=5.044, ArTop10Accuracy=0.3756, over 4763.57 frames. ], batch size: 62, lr: 2.25e-02
+ 2024-08-06 08:09:28,829 INFO [trainer.py:765] (0/8) Epoch 1, batch 200, train_loss[loss=4.012, ArTop10Accuracy=0.5509, over 13728.00 frames. ], tot_loss[loss=4.485, ArTop10Accuracy=0.4688, over 7742.94 frames. ], batch size: 34, lr: 3.00e-02
+ 2024-08-06 08:10:52,430 INFO [trainer.py:765] (0/8) Epoch 1, batch 300, train_loss[loss=3.906, ArTop10Accuracy=0.5625, over 14160.00 frames. ], tot_loss[loss=4.212, ArTop10Accuracy=0.5139, over 9378.07 frames. ], batch size: 44, lr: 3.00e-02
+ 2024-08-06 08:12:12,698 INFO [trainer.py:765] (0/8) Epoch 1, batch 400, train_loss[loss=3.687, ArTop10Accuracy=0.6066, over 10701.00 frames. ], tot_loss[loss=4.03, ArTop10Accuracy=0.5447, over 10279.72 frames. ], batch size: 15, lr: 3.00e-02
+ 2024-08-06 08:13:40,049 INFO [trainer.py:765] (0/8) Epoch 1, batch 500, train_loss[loss=3.62, ArTop10Accuracy=0.6184, over 12078.00 frames. ], tot_loss[loss=3.882, ArTop10Accuracy=0.5706, over 10862.69 frames. ], batch size: 22, lr: 2.99e-02
+ 2024-08-06 08:15:00,242 INFO [trainer.py:765] (0/8) Epoch 1, batch 600, train_loss[loss=3.472, ArTop10Accuracy=0.6423, over 11520.00 frames. ], tot_loss[loss=3.767, ArTop10Accuracy=0.5908, over 11393.62 frames. ], batch size: 18, lr: 2.99e-02
+ 2024-08-06 08:16:26,423 INFO [trainer.py:765] (0/8) Epoch 1, batch 700, train_loss[loss=3.358, ArTop10Accuracy=0.672, over 10137.00 frames. ], tot_loss[loss=3.689, ArTop10Accuracy=0.6048, over 11532.89 frames. ], batch size: 12, lr: 2.99e-02
+ 2024-08-06 08:17:43,016 INFO [trainer.py:765] (0/8) Epoch 1, batch 800, train_loss[loss=3.327, ArTop10Accuracy=0.6665, over 10185.00 frames. ], tot_loss[loss=3.625, ArTop10Accuracy=0.6163, over 11642.41 frames. ], batch size: 12, lr: 2.98e-02
+ 2024-08-06 08:18:56,150 INFO [trainer.py:765] (0/8) Epoch 1, batch 900, train_loss[loss=3.441, ArTop10Accuracy=0.6465, over 12888.00 frames. ], tot_loss[loss=3.567, ArTop10Accuracy=0.6273, over 11696.36 frames. ], batch size: 27, lr: 2.98e-02
+ 2024-08-06 08:20:12,861 INFO [trainer.py:765] (0/8) Epoch 1, batch 1000, train_loss[loss=3.339, ArTop10Accuracy=0.6726, over 13041.00 frames. ], tot_loss[loss=3.523, ArTop10Accuracy=0.6352, over 11871.89 frames. ], batch size: 27, lr: 2.97e-02
+ 2024-08-06 08:20:13,538 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 9.300e+01 1.871e+02 2.675e+02 4.030e+02 9.119e+03, threshold=5.351e+02, percent-clipped=0.0
+ 2024-08-06 08:21:29,154 INFO [trainer.py:765] (0/8) Epoch 1, batch 1100, train_loss[loss=3.447, ArTop10Accuracy=0.6526, over 13650.00 frames. ], tot_loss[loss=3.488, ArTop10Accuracy=0.6416, over 11935.79 frames. ], batch size: 34, lr: 2.96e-02
+ 2024-08-06 08:22:45,413 INFO [trainer.py:765] (0/8) Epoch 1, batch 1200, train_loss[loss=3.399, ArTop10Accuracy=0.6582, over 12078.00 frames. ], tot_loss[loss=3.456, ArTop10Accuracy=0.6476, over 11841.59 frames. ], batch size: 101, lr: 2.96e-02
+ 2024-08-06 08:23:45,268 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 08:23:45,272 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-1.pt
+ 2024-08-06 08:25:36,238 INFO [trainer.py:765] (0/8) Epoch 2, batch 100, train_loss[loss=3.464, ArTop10Accuracy=0.6425, over 14586.00 frames. ], tot_loss[loss=3.422, ArTop10Accuracy=0.6529, over 4763.48 frames. ], batch size: 62, lr: 2.90e-02
+ 2024-08-06 08:26:58,957 INFO [trainer.py:765] (0/8) Epoch 2, batch 200, train_loss[loss=3.368, ArTop10Accuracy=0.6675, over 13398.00 frames. ], tot_loss[loss=3.39, ArTop10Accuracy=0.6587, over 7742.95 frames. ], batch size: 34, lr: 2.89e-02
+ 2024-08-06 08:28:25,534 INFO [trainer.py:765] (0/8) Epoch 2, batch 300, train_loss[loss=3.316, ArTop10Accuracy=0.6735, over 14268.00 frames. ], tot_loss[loss=3.369, ArTop10Accuracy=0.6629, over 9373.85 frames. ], batch size: 44, lr: 2.89e-02
+ 2024-08-06 08:29:48,638 INFO [trainer.py:765] (0/8) Epoch 2, batch 400, train_loss[loss=3.324, ArTop10Accuracy=0.6711, over 10158.00 frames. ], tot_loss[loss=3.357, ArTop10Accuracy=0.6656, over 10297.01 frames. ], batch size: 14, lr: 2.88e-02
+ 2024-08-06 08:31:22,899 INFO [trainer.py:765] (0/8) Epoch 2, batch 500, train_loss[loss=3.281, ArTop10Accuracy=0.6779, over 12660.00 frames. ], tot_loss[loss=3.346, ArTop10Accuracy=0.6677, over 10864.71 frames. ], batch size: 23, lr: 2.87e-02
+ 2024-08-06 08:32:45,689 INFO [trainer.py:765] (0/8) Epoch 2, batch 600, train_loss[loss=3.325, ArTop10Accuracy=0.6746, over 11358.00 frames. ], tot_loss[loss=3.334, ArTop10Accuracy=0.6699, over 11387.56 frames. ], batch size: 18, lr: 2.86e-02
+ 2024-08-06 08:34:13,583 INFO [trainer.py:765] (0/8) Epoch 2, batch 700, train_loss[loss=3.228, ArTop10Accuracy=0.6954, over 10290.00 frames. ], tot_loss[loss=3.325, ArTop10Accuracy=0.6716, over 11532.16 frames. ], batch size: 12, lr: 2.85e-02
+ 2024-08-06 08:34:31,174 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 08:34:40,887 INFO [trainer.py:811] (0/8) Epoch 2, validation: loss=3.277, ArTop10Accuracy=0.6803, over 1827537.00 frames.
+ 2024-08-06 08:34:40,888 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 28892MB
+ 2024-08-06 08:34:41,700 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 1.592e+02 2.200e+02 3.344e+02 2.949e+03, threshold=4.400e+02, percent-clipped=8.6
+ 2024-08-06 08:35:39,879 INFO [trainer.py:765] (0/8) Epoch 2, batch 800, train_loss[loss=3.349, ArTop10Accuracy=0.6662, over 10188.00 frames. ], tot_loss[loss=3.321, ArTop10Accuracy=0.6725, over 11635.67 frames. ], batch size: 12, lr: 2.84e-02
+ 2024-08-06 08:36:56,371 INFO [trainer.py:765] (0/8) Epoch 2, batch 900, train_loss[loss=3.325, ArTop10Accuracy=0.6663, over 12789.00 frames. ], tot_loss[loss=3.308, ArTop10Accuracy=0.6752, over 11673.27 frames. ], batch size: 27, lr: 2.83e-02
+ 2024-08-06 08:38:10,512 INFO [trainer.py:765] (0/8) Epoch 2, batch 1000, train_loss[loss=3.315, ArTop10Accuracy=0.6742, over 12846.00 frames. ], tot_loss[loss=3.298, ArTop10Accuracy=0.677, over 11874.71 frames. ], batch size: 27, lr: 2.82e-02
+ 2024-08-06 08:39:25,060 INFO [trainer.py:765] (0/8) Epoch 2, batch 1100, train_loss[loss=3.25, ArTop10Accuracy=0.6829, over 13677.00 frames. ], tot_loss[loss=3.29, ArTop10Accuracy=0.6784, over 11958.14 frames. ], batch size: 34, lr: 2.81e-02
+ 2024-08-06 08:40:38,220 INFO [trainer.py:765] (0/8) Epoch 2, batch 1200, train_loss[loss=3.32, ArTop10Accuracy=0.6759, over 12444.00 frames. ], tot_loss[loss=3.281, ArTop10Accuracy=0.6801, over 11872.51 frames. ], batch size: 103, lr: 2.80e-02
+ 2024-08-06 08:41:38,460 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 08:41:38,463 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-2.pt
+ 2024-08-06 08:43:36,651 INFO [trainer.py:765] (0/8) Epoch 3, batch 100, train_loss[loss=3.221, ArTop10Accuracy=0.6956, over 14391.00 frames. ], tot_loss[loss=3.251, ArTop10Accuracy=0.6852, over 4763.04 frames. ], batch size: 63, lr: 2.67e-02
+ 2024-08-06 08:45:10,502 INFO [trainer.py:765] (0/8) Epoch 3, batch 200, train_loss[loss=3.149, ArTop10Accuracy=0.7102, over 13716.00 frames. ], tot_loss[loss=3.222, ArTop10Accuracy=0.6905, over 7746.96 frames. ], batch size: 34, lr: 2.66e-02
+ 2024-08-06 08:46:29,258 INFO [trainer.py:765] (0/8) Epoch 3, batch 300, train_loss[loss=3.237, ArTop10Accuracy=0.6852, over 14136.00 frames. ], tot_loss[loss=3.205, ArTop10Accuracy=0.6942, over 9365.48 frames. ], batch size: 44, lr: 2.64e-02
+ 2024-08-06 08:48:04,219 INFO [trainer.py:765] (0/8) Epoch 3, batch 400, train_loss[loss=3.123, ArTop10Accuracy=0.7122, over 10929.00 frames. ], tot_loss[loss=3.19, ArTop10Accuracy=0.6973, over 10272.52 frames. ], batch size: 15, lr: 2.63e-02
+ 2024-08-06 08:48:40,881 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 9.282e+01 1.561e+02 1.981e+02 2.686e+02 1.768e+03, threshold=3.962e+02, percent-clipped=7.6
+ 2024-08-06 08:49:25,542 INFO [trainer.py:765] (0/8) Epoch 3, batch 500, train_loss[loss=3.102, ArTop10Accuracy=0.715, over 12600.00 frames. ], tot_loss[loss=3.169, ArTop10Accuracy=0.7016, over 10831.61 frames. ], batch size: 23, lr: 2.62e-02
+ 2024-08-06 08:51:00,477 INFO [trainer.py:765] (0/8) Epoch 3, batch 600, train_loss[loss=3.068, ArTop10Accuracy=0.7223, over 11331.00 frames. ], tot_loss[loss=3.154, ArTop10Accuracy=0.7042, over 11373.76 frames. ], batch size: 18, lr: 2.61e-02
+ 2024-08-06 08:52:31,618 INFO [trainer.py:765] (0/8) Epoch 3, batch 700, train_loss[loss=3.132, ArTop10Accuracy=0.7058, over 9480.00 frames. ], tot_loss[loss=3.145, ArTop10Accuracy=0.7061, over 11509.58 frames. ], batch size: 11, lr: 2.60e-02
+ 2024-08-06 08:53:57,388 INFO [trainer.py:765] (0/8) Epoch 3, batch 800, train_loss[loss=3.117, ArTop10Accuracy=0.7134, over 9261.00 frames. ], tot_loss[loss=3.138, ArTop10Accuracy=0.7073, over 11647.27 frames. ], batch size: 11, lr: 2.59e-02
+ 2024-08-06 08:55:15,118 INFO [trainer.py:765] (0/8) Epoch 3, batch 900, train_loss[loss=3.036, ArTop10Accuracy=0.7285, over 12813.00 frames. ], tot_loss[loss=3.12, ArTop10Accuracy=0.7107, over 11687.90 frames. ], batch size: 27, lr: 2.57e-02
+ 2024-08-06 08:56:31,558 INFO [trainer.py:765] (0/8) Epoch 3, batch 1000, train_loss[loss=3.046, ArTop10Accuracy=0.725, over 13272.00 frames. ], tot_loss[loss=3.111, ArTop10Accuracy=0.7124, over 11871.56 frames. ], batch size: 28, lr: 2.56e-02
+ 2024-08-06 08:57:46,506 INFO [trainer.py:765] (0/8) Epoch 3, batch 1100, train_loss[loss=2.998, ArTop10Accuracy=0.7314, over 13731.00 frames. ], tot_loss[loss=3.104, ArTop10Accuracy=0.7135, over 11943.01 frames. ], batch size: 34, lr: 2.55e-02
+ 2024-08-06 08:59:01,400 INFO [trainer.py:765] (0/8) Epoch 3, batch 1200, train_loss[loss=3.119, ArTop10Accuracy=0.7097, over 12366.00 frames. ], tot_loss[loss=3.097, ArTop10Accuracy=0.7148, over 11874.89 frames. ], batch size: 101, lr: 2.54e-02
+ 2024-08-06 09:00:02,053 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 09:00:02,056 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-3.pt
+ 2024-08-06 09:01:50,742 INFO [trainer.py:765] (0/8) Epoch 4, batch 100, train_loss[loss=3.096, ArTop10Accuracy=0.7157, over 14289.00 frames. ], tot_loss[loss=3.065, ArTop10Accuracy=0.72, over 4767.49 frames. ], batch size: 63, lr: 2.38e-02
+ 2024-08-06 09:02:52,858 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 09:03:02,384 INFO [trainer.py:811] (0/8) Epoch 4, validation: loss=2.997, ArTop10Accuracy=0.7338, over 1827537.00 frames.
+ 2024-08-06 09:03:02,384 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29208MB
+ 2024-08-06 09:03:03,364 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.499e+02 1.782e+02 2.273e+02 1.100e+03, threshold=3.565e+02, percent-clipped=4.7
+ 2024-08-06 09:03:29,274 INFO [trainer.py:765] (0/8) Epoch 4, batch 200, train_loss[loss=3.021, ArTop10Accuracy=0.7286, over 13689.00 frames. ], tot_loss[loss=3.042, ArTop10Accuracy=0.7243, over 7737.91 frames. ], batch size: 34, lr: 2.37e-02
+ 2024-08-06 09:05:01,733 INFO [trainer.py:765] (0/8) Epoch 4, batch 300, train_loss[loss=3.066, ArTop10Accuracy=0.7201, over 14166.00 frames. ], tot_loss[loss=3.037, ArTop10Accuracy=0.7256, over 9341.46 frames. ], batch size: 45, lr: 2.36e-02
+ 2024-08-06 09:06:28,148 INFO [trainer.py:765] (0/8) Epoch 4, batch 400, train_loss[loss=2.954, ArTop10Accuracy=0.7482, over 11040.00 frames. ], tot_loss[loss=3.034, ArTop10Accuracy=0.7265, over 10259.58 frames. ], batch size: 15, lr: 2.34e-02
+ 2024-08-06 09:08:01,927 INFO [trainer.py:765] (0/8) Epoch 4, batch 500, train_loss[loss=2.944, ArTop10Accuracy=0.742, over 12204.00 frames. ], tot_loss[loss=3.023, ArTop10Accuracy=0.7286, over 10800.93 frames. ], batch size: 22, lr: 2.33e-02
+ 2024-08-06 09:09:28,540 INFO [trainer.py:765] (0/8) Epoch 4, batch 600, train_loss[loss=2.976, ArTop10Accuracy=0.7362, over 12120.00 frames. ], tot_loss[loss=3.02, ArTop10Accuracy=0.7291, over 11356.71 frames. ], batch size: 19, lr: 2.32e-02
+ 2024-08-06 09:10:59,865 INFO [trainer.py:765] (0/8) Epoch 4, batch 700, train_loss[loss=2.918, ArTop10Accuracy=0.7454, over 10062.00 frames. ], tot_loss[loss=3.021, ArTop10Accuracy=0.7289, over 11509.51 frames. ], batch size: 12, lr: 2.31e-02
+ 2024-08-06 09:12:17,513 INFO [trainer.py:765] (0/8) Epoch 4, batch 800, train_loss[loss=2.966, ArTop10Accuracy=0.7337, over 9609.00 frames. ], tot_loss[loss=3.023, ArTop10Accuracy=0.7287, over 11636.06 frames. ], batch size: 11, lr: 2.30e-02
+ 2024-08-06 09:13:33,212 INFO [trainer.py:765] (0/8) Epoch 4, batch 900, train_loss[loss=3.013, ArTop10Accuracy=0.7299, over 12831.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7305, over 11687.30 frames. ], batch size: 27, lr: 2.29e-02
+ 2024-08-06 09:14:47,520 INFO [trainer.py:765] (0/8) Epoch 4, batch 1000, train_loss[loss=3.014, ArTop10Accuracy=0.7309, over 13050.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7305, over 11894.91 frames. ], batch size: 27, lr: 2.28e-02
+ 2024-08-06 09:16:02,982 INFO [trainer.py:765] (0/8) Epoch 4, batch 1100, train_loss[loss=3.08, ArTop10Accuracy=0.7138, over 13473.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7305, over 11959.77 frames. ], batch size: 34, lr: 2.26e-02
+ 2024-08-06 09:16:53,291 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.440e+02 1.636e+02 1.968e+02 7.702e+02, threshold=3.273e+02, percent-clipped=1.3
+ 2024-08-06 09:17:18,344 INFO [trainer.py:765] (0/8) Epoch 4, batch 1200, train_loss[loss=3.102, ArTop10Accuracy=0.7123, over 12105.00 frames. ], tot_loss[loss=3.012, ArTop10Accuracy=0.7306, over 11874.25 frames. ], batch size: 101, lr: 2.25e-02
+ 2024-08-06 09:18:17,203 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 09:18:17,206 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-4.pt
+ 2024-08-06 09:20:17,173 INFO [trainer.py:765] (0/8) Epoch 5, batch 100, train_loss[loss=2.964, ArTop10Accuracy=0.7386, over 14166.00 frames. ], tot_loss[loss=2.993, ArTop10Accuracy=0.7338, over 4763.83 frames. ], batch size: 62, lr: 2.10e-02
+ 2024-08-06 09:21:52,291 INFO [trainer.py:765] (0/8) Epoch 5, batch 200, train_loss[loss=2.948, ArTop10Accuracy=0.742, over 13764.00 frames. ], tot_loss[loss=2.985, ArTop10Accuracy=0.7353, over 7747.26 frames. ], batch size: 34, lr: 2.09e-02
+ 2024-08-06 09:23:19,241 INFO [trainer.py:765] (0/8) Epoch 5, batch 300, train_loss[loss=2.968, ArTop10Accuracy=0.7409, over 14202.00 frames. ], tot_loss[loss=2.975, ArTop10Accuracy=0.7374, over 9374.58 frames. ], batch size: 44, lr: 2.08e-02
+ 2024-08-06 09:24:53,537 INFO [trainer.py:765] (0/8) Epoch 5, batch 400, train_loss[loss=2.865, ArTop10Accuracy=0.759, over 10353.00 frames. ], tot_loss[loss=2.971, ArTop10Accuracy=0.7383, over 10278.00 frames. ], batch size: 14, lr: 2.07e-02
+ 2024-08-06 09:26:19,418 INFO [trainer.py:765] (0/8) Epoch 5, batch 500, train_loss[loss=2.904, ArTop10Accuracy=0.7532, over 12828.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7399, over 10863.61 frames. ], batch size: 23, lr: 2.06e-02
+ 2024-08-06 09:27:49,537 INFO [trainer.py:765] (0/8) Epoch 5, batch 600, train_loss[loss=2.948, ArTop10Accuracy=0.7446, over 11325.00 frames. ], tot_loss[loss=2.967, ArTop10Accuracy=0.7389, over 11367.62 frames. ], batch size: 18, lr: 2.05e-02
+ 2024-08-06 09:29:21,670 INFO [trainer.py:765] (0/8) Epoch 5, batch 700, train_loss[loss=2.924, ArTop10Accuracy=0.7441, over 10278.00 frames. ], tot_loss[loss=2.968, ArTop10Accuracy=0.7386, over 11525.90 frames. ], batch size: 12, lr: 2.04e-02
+ 2024-08-06 09:30:44,693 INFO [trainer.py:765] (0/8) Epoch 5, batch 800, train_loss[loss=2.783, ArTop10Accuracy=0.78, over 10116.00 frames. ], tot_loss[loss=2.969, ArTop10Accuracy=0.7385, over 11608.55 frames. ], batch size: 12, lr: 2.03e-02
+ 2024-08-06 09:31:51,239 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 09:32:00,760 INFO [trainer.py:811] (0/8) Epoch 5, validation: loss=2.926, ArTop10Accuracy=0.7466, over 1827537.00 frames.
+ 2024-08-06 09:32:00,761 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29301MB
+ 2024-08-06 09:32:01,710 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.349e+02 1.525e+02 1.806e+02 1.007e+03, threshold=3.049e+02, percent-clipped=2.3
+ 2024-08-06 09:32:10,554 INFO [trainer.py:765] (0/8) Epoch 5, batch 900, train_loss[loss=2.973, ArTop10Accuracy=0.7398, over 12834.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.74, over 11677.29 frames. ], batch size: 27, lr: 2.02e-02
+ 2024-08-06 09:33:27,322 INFO [trainer.py:765] (0/8) Epoch 5, batch 1000, train_loss[loss=2.965, ArTop10Accuracy=0.7397, over 12891.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7404, over 11884.96 frames. ], batch size: 27, lr: 2.01e-02
+ 2024-08-06 09:34:42,300 INFO [trainer.py:765] (0/8) Epoch 5, batch 1100, train_loss[loss=2.896, ArTop10Accuracy=0.7578, over 13686.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7399, over 11934.83 frames. ], batch size: 34, lr: 2.00e-02
+ 2024-08-06 09:35:56,332 INFO [trainer.py:765] (0/8) Epoch 5, batch 1200, train_loss[loss=3.075, ArTop10Accuracy=0.7164, over 12558.00 frames. ], tot_loss[loss=2.96, ArTop10Accuracy=0.7403, over 11853.83 frames. ], batch size: 101, lr: 1.99e-02
+ 2024-08-06 09:36:54,969 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 09:36:54,973 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-5.pt
+ 2024-08-06 09:38:52,662 INFO [trainer.py:765] (0/8) Epoch 6, batch 100, train_loss[loss=2.91, ArTop10Accuracy=0.7488, over 14733.00 frames. ], tot_loss[loss=2.956, ArTop10Accuracy=0.7406, over 4774.76 frames. ], batch size: 62, lr: 1.85e-02
+ 2024-08-06 09:40:19,834 INFO [trainer.py:765] (0/8) Epoch 6, batch 200, train_loss[loss=2.939, ArTop10Accuracy=0.7433, over 13842.00 frames. ], tot_loss[loss=2.937, ArTop10Accuracy=0.7444, over 7770.99 frames. ], batch size: 34, lr: 1.84e-02
+ 2024-08-06 09:41:52,967 INFO [trainer.py:765] (0/8) Epoch 6, batch 300, train_loss[loss=2.942, ArTop10Accuracy=0.7465, over 14082.00 frames. ], tot_loss[loss=2.928, ArTop10Accuracy=0.7464, over 9388.00 frames. ], batch size: 44, lr: 1.83e-02
+ 2024-08-06 09:43:17,829 INFO [trainer.py:765] (0/8) Epoch 6, batch 400, train_loss[loss=2.955, ArTop10Accuracy=0.7367, over 10458.00 frames. ], tot_loss[loss=2.924, ArTop10Accuracy=0.7473, over 10284.90 frames. ], batch size: 14, lr: 1.83e-02
+ 2024-08-06 09:44:54,130 INFO [trainer.py:765] (0/8) Epoch 6, batch 500, train_loss[loss=2.873, ArTop10Accuracy=0.7609, over 12111.00 frames. ], tot_loss[loss=2.92, ArTop10Accuracy=0.7479, over 10842.93 frames. ], batch size: 22, lr: 1.82e-02
+ 2024-08-06 09:46:22,873 INFO [trainer.py:765] (0/8) Epoch 6, batch 600, train_loss[loss=2.85, ArTop10Accuracy=0.7588, over 11493.00 frames. ], tot_loss[loss=2.919, ArTop10Accuracy=0.7482, over 11346.39 frames. ], batch size: 18, lr: 1.81e-02
+ 2024-08-06 09:46:37,217 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.339e+02 1.480e+02 1.701e+02 7.506e+02, threshold=2.959e+02, percent-clipped=1.1
+ 2024-08-06 09:47:57,871 INFO [trainer.py:765] (0/8) Epoch 6, batch 700, train_loss[loss=2.809, ArTop10Accuracy=0.7689, over 9942.00 frames. ], tot_loss[loss=2.924, ArTop10Accuracy=0.7472, over 11515.67 frames. ], batch size: 12, lr: 1.80e-02
+ 2024-08-06 09:49:15,955 INFO [trainer.py:765] (0/8) Epoch 6, batch 800, train_loss[loss=2.98, ArTop10Accuracy=0.7345, over 10128.00 frames. ], tot_loss[loss=2.93, ArTop10Accuracy=0.7461, over 11608.70 frames. ], batch size: 12, lr: 1.79e-02
+ 2024-08-06 09:50:32,135 INFO [trainer.py:765] (0/8) Epoch 6, batch 900, train_loss[loss=2.967, ArTop10Accuracy=0.7375, over 12921.00 frames. ], tot_loss[loss=2.921, ArTop10Accuracy=0.7477, over 11657.07 frames. ], batch size: 27, lr: 1.78e-02
+ 2024-08-06 09:51:47,297 INFO [trainer.py:765] (0/8) Epoch 6, batch 1000, train_loss[loss=2.928, ArTop10Accuracy=0.7449, over 12840.00 frames. ], tot_loss[loss=2.926, ArTop10Accuracy=0.747, over 11871.01 frames. ], batch size: 27, lr: 1.77e-02
+ 2024-08-06 09:53:00,921 INFO [trainer.py:765] (0/8) Epoch 6, batch 1100, train_loss[loss=2.869, ArTop10Accuracy=0.7588, over 13548.00 frames. ], tot_loss[loss=2.93, ArTop10Accuracy=0.7462, over 11945.67 frames. ], batch size: 34, lr: 1.77e-02
+ 2024-08-06 09:54:14,336 INFO [trainer.py:765] (0/8) Epoch 6, batch 1200, train_loss[loss=3.067, ArTop10Accuracy=0.7191, over 11925.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7467, over 11849.90 frames. ], batch size: 101, lr: 1.76e-02
+ 2024-08-06 09:55:13,161 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 09:55:13,166 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-6.pt
+ 2024-08-06 09:57:06,699 INFO [trainer.py:765] (0/8) Epoch 7, batch 100, train_loss[loss=3.021, ArTop10Accuracy=0.7334, over 14799.00 frames. ], tot_loss[loss=2.908, ArTop10Accuracy=0.7499, over 4751.91 frames. ], batch size: 62, lr: 1.64e-02
+ 2024-08-06 09:58:39,426 INFO [trainer.py:765] (0/8) Epoch 7, batch 200, train_loss[loss=2.914, ArTop10Accuracy=0.747, over 13647.00 frames. ], tot_loss[loss=2.895, ArTop10Accuracy=0.7525, over 7761.94 frames. ], batch size: 34, lr: 1.64e-02
+ 2024-08-06 10:00:06,082 INFO [trainer.py:765] (0/8) Epoch 7, batch 300, train_loss[loss=2.962, ArTop10Accuracy=0.7351, over 14187.00 frames. ], tot_loss[loss=2.891, ArTop10Accuracy=0.7533, over 9377.67 frames. ], batch size: 44, lr: 1.63e-02
+ 2024-08-06 10:00:40,508 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 10:00:50,245 INFO [trainer.py:811] (0/8) Epoch 7, validation: loss=2.88, ArTop10Accuracy=0.7554, over 1827537.00 frames.
+ 2024-08-06 10:00:50,246 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29301MB
+ 2024-08-06 10:00:50,977 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.286e+02 1.429e+02 1.605e+02 1.020e+03, threshold=2.857e+02, percent-clipped=1.5
+ 2024-08-06 10:01:49,117 INFO [trainer.py:765] (0/8) Epoch 7, batch 400, train_loss[loss=2.834, ArTop10Accuracy=0.7614, over 10248.00 frames. ], tot_loss[loss=2.894, ArTop10Accuracy=0.7525, over 10301.33 frames. ], batch size: 14, lr: 1.62e-02
+ 2024-08-06 10:03:21,458 INFO [trainer.py:765] (0/8) Epoch 7, batch 500, train_loss[loss=2.807, ArTop10Accuracy=0.7693, over 12213.00 frames. ], tot_loss[loss=2.891, ArTop10Accuracy=0.7533, over 10851.85 frames. ], batch size: 22, lr: 1.61e-02
+ 2024-08-06 10:04:51,882 INFO [trainer.py:765] (0/8) Epoch 7, batch 600, train_loss[loss=2.822, ArTop10Accuracy=0.7734, over 11343.00 frames. ], tot_loss[loss=2.892, ArTop10Accuracy=0.7531, over 11367.18 frames. ], batch size: 18, lr: 1.61e-02
+ 2024-08-06 10:06:25,111 INFO [trainer.py:765] (0/8) Epoch 7, batch 700, train_loss[loss=2.937, ArTop10Accuracy=0.7484, over 10293.00 frames. ], tot_loss[loss=2.898, ArTop10Accuracy=0.7521, over 11508.82 frames. ], batch size: 12, lr: 1.60e-02
+ 2024-08-06 10:07:46,950 INFO [trainer.py:765] (0/8) Epoch 7, batch 800, train_loss[loss=2.873, ArTop10Accuracy=0.7592, over 10146.00 frames. ], tot_loss[loss=2.898, ArTop10Accuracy=0.7523, over 11629.16 frames. ], batch size: 12, lr: 1.59e-02
+ 2024-08-06 10:09:02,824 INFO [trainer.py:765] (0/8) Epoch 7, batch 900, train_loss[loss=2.971, ArTop10Accuracy=0.7367, over 12774.00 frames. ], tot_loss[loss=2.89, ArTop10Accuracy=0.7537, over 11679.75 frames. ], batch size: 27, lr: 1.59e-02
+ 2024-08-06 10:10:19,635 INFO [trainer.py:765] (0/8) Epoch 7, batch 1000, train_loss[loss=2.904, ArTop10Accuracy=0.7487, over 12768.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7523, over 11877.49 frames. ], batch size: 27, lr: 1.58e-02
+ 2024-08-06 10:11:35,207 INFO [trainer.py:765] (0/8) Epoch 7, batch 1100, train_loss[loss=2.97, ArTop10Accuracy=0.7417, over 13638.00 frames. ], tot_loss[loss=2.902, ArTop10Accuracy=0.7512, over 11956.00 frames. ], batch size: 34, lr: 1.57e-02
+ 2024-08-06 10:12:48,204 INFO [trainer.py:765] (0/8) Epoch 7, batch 1200, train_loss[loss=3.043, ArTop10Accuracy=0.7302, over 12201.00 frames. ], tot_loss[loss=2.901, ArTop10Accuracy=0.7514, over 11879.95 frames. ], batch size: 101, lr: 1.57e-02
+ 2024-08-06 10:13:46,785 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 10:13:46,788 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-7.pt
+ 2024-08-06 10:15:03,600 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.283e+02 1.410e+02 1.601e+02 1.017e+03, threshold=2.820e+02, percent-clipped=0.9
+ 2024-08-06 10:15:40,820 INFO [trainer.py:765] (0/8) Epoch 8, batch 100, train_loss[loss=2.903, ArTop10Accuracy=0.7486, over 14670.00 frames. ], tot_loss[loss=2.889, ArTop10Accuracy=0.7534, over 4768.92 frames. ], batch size: 62, lr: 1.47e-02
+ 2024-08-06 10:17:12,862 INFO [trainer.py:765] (0/8) Epoch 8, batch 200, train_loss[loss=2.86, ArTop10Accuracy=0.7606, over 13779.00 frames. ], tot_loss[loss=2.879, ArTop10Accuracy=0.7555, over 7751.17 frames. ], batch size: 34, lr: 1.46e-02
+ 2024-08-06 10:18:37,898 INFO [trainer.py:765] (0/8) Epoch 8, batch 300, train_loss[loss=2.875, ArTop10Accuracy=0.7556, over 14097.00 frames. ], tot_loss[loss=2.872, ArTop10Accuracy=0.7568, over 9353.66 frames. ], batch size: 44, lr: 1.46e-02
+ 2024-08-06 10:20:06,341 INFO [trainer.py:765] (0/8) Epoch 8, batch 400, train_loss[loss=2.701, ArTop10Accuracy=0.7881, over 10956.00 frames. ], tot_loss[loss=2.869, ArTop10Accuracy=0.7574, over 10265.29 frames. ], batch size: 15, lr: 1.45e-02
+ 2024-08-06 10:21:32,411 INFO [trainer.py:765] (0/8) Epoch 8, batch 500, train_loss[loss=2.816, ArTop10Accuracy=0.7739, over 12225.00 frames. ], tot_loss[loss=2.862, ArTop10Accuracy=0.7587, over 10847.20 frames. ], batch size: 22, lr: 1.45e-02
+ 2024-08-06 10:23:00,974 INFO [trainer.py:765] (0/8) Epoch 8, batch 600, train_loss[loss=2.888, ArTop10Accuracy=0.7598, over 11340.00 frames. ], tot_loss[loss=2.865, ArTop10Accuracy=0.7583, over 11377.64 frames. ], batch size: 18, lr: 1.44e-02
+ 2024-08-06 10:24:37,787 INFO [trainer.py:765] (0/8) Epoch 8, batch 700, train_loss[loss=2.898, ArTop10Accuracy=0.7523, over 9450.00 frames. ], tot_loss[loss=2.869, ArTop10Accuracy=0.7574, over 11524.35 frames. ], batch size: 11, lr: 1.43e-02
+ 2024-08-06 10:25:56,088 INFO [trainer.py:765] (0/8) Epoch 8, batch 800, train_loss[loss=2.921, ArTop10Accuracy=0.7523, over 10218.00 frames. ], tot_loss[loss=2.874, ArTop10Accuracy=0.7566, over 11645.69 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:27:12,246 INFO [trainer.py:765] (0/8) Epoch 8, batch 900, train_loss[loss=2.885, ArTop10Accuracy=0.7531, over 12756.00 frames. ], tot_loss[loss=2.865, ArTop10Accuracy=0.7585, over 11705.50 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:28:25,263 INFO [trainer.py:765] (0/8) Epoch 8, batch 1000, train_loss[loss=2.886, ArTop10Accuracy=0.7564, over 12960.00 frames. ], tot_loss[loss=2.872, ArTop10Accuracy=0.7571, over 11890.41 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:29:07,155 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 10:29:16,830 INFO [trainer.py:811] (0/8) Epoch 8, validation: loss=2.858, ArTop10Accuracy=0.7594, over 1827537.00 frames.
+ 2024-08-06 10:29:16,831 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29519MB
+ 2024-08-06 10:29:17,490 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.275e+02 1.390e+02 1.547e+02 3.717e+02, threshold=2.781e+02, percent-clipped=0.7
+ 2024-08-06 10:29:51,730 INFO [trainer.py:765] (0/8) Epoch 8, batch 1100, train_loss[loss=2.87, ArTop10Accuracy=0.7579, over 13614.00 frames. ], tot_loss[loss=2.875, ArTop10Accuracy=0.7563, over 11953.41 frames. ], batch size: 34, lr: 1.41e-02
+ 2024-08-06 10:31:05,948 INFO [trainer.py:765] (0/8) Epoch 8, batch 1200, train_loss[loss=2.996, ArTop10Accuracy=0.7309, over 12624.00 frames. ], tot_loss[loss=2.878, ArTop10Accuracy=0.7558, over 11901.69 frames. ], batch size: 103, lr: 1.40e-02
+ 2024-08-06 10:32:05,402 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 10:32:05,407 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-8.pt
+ 2024-08-06 10:34:01,256 INFO [trainer.py:765] (0/8) Epoch 9, batch 100, train_loss[loss=2.882, ArTop10Accuracy=0.7575, over 14805.00 frames. ], tot_loss[loss=2.858, ArTop10Accuracy=0.7592, over 4758.64 frames. ], batch size: 63, lr: 1.32e-02
+ 2024-08-06 10:35:31,772 INFO [trainer.py:765] (0/8) Epoch 9, batch 200, train_loss[loss=2.787, ArTop10Accuracy=0.7743, over 13638.00 frames. ], tot_loss[loss=2.848, ArTop10Accuracy=0.7607, over 7747.34 frames. ], batch size: 34, lr: 1.32e-02
+ 2024-08-06 10:36:57,926 INFO [trainer.py:765] (0/8) Epoch 9, batch 300, train_loss[loss=2.883, ArTop10Accuracy=0.7545, over 14226.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7606, over 9353.90 frames. ], batch size: 45, lr: 1.31e-02
+ 2024-08-06 10:38:32,698 INFO [trainer.py:765] (0/8) Epoch 9, batch 400, train_loss[loss=2.729, ArTop10Accuracy=0.7845, over 10419.00 frames. ], tot_loss[loss=2.844, ArTop10Accuracy=0.762, over 10282.65 frames. ], batch size: 14, lr: 1.31e-02
+ 2024-08-06 10:39:59,256 INFO [trainer.py:765] (0/8) Epoch 9, batch 500, train_loss[loss=2.871, ArTop10Accuracy=0.7572, over 11979.00 frames. ], tot_loss[loss=2.838, ArTop10Accuracy=0.7632, over 10858.06 frames. ], batch size: 22, lr: 1.30e-02
+ 2024-08-06 10:41:29,690 INFO [trainer.py:765] (0/8) Epoch 9, batch 600, train_loss[loss=2.819, ArTop10Accuracy=0.7655, over 11466.00 frames. ], tot_loss[loss=2.842, ArTop10Accuracy=0.7628, over 11377.95 frames. ], batch size: 18, lr: 1.30e-02
+ 2024-08-06 10:42:58,440 INFO [trainer.py:765] (0/8) Epoch 9, batch 700, train_loss[loss=2.632, ArTop10Accuracy=0.7985, over 10293.00 frames. ], tot_loss[loss=2.841, ArTop10Accuracy=0.7629, over 11523.81 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:44:02,952 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.253e+02 1.352e+02 1.493e+02 7.010e+02, threshold=2.704e+02, percent-clipped=0.6
+ 2024-08-06 10:44:19,669 INFO [trainer.py:765] (0/8) Epoch 9, batch 800, train_loss[loss=2.764, ArTop10Accuracy=0.7822, over 9246.00 frames. ], tot_loss[loss=2.845, ArTop10Accuracy=0.762, over 11613.97 frames. ], batch size: 11, lr: 1.29e-02
+ 2024-08-06 10:45:35,718 INFO [trainer.py:765] (0/8) Epoch 9, batch 900, train_loss[loss=2.91, ArTop10Accuracy=0.7467, over 13092.00 frames. ], tot_loss[loss=2.84, ArTop10Accuracy=0.7629, over 11694.74 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:46:51,271 INFO [trainer.py:765] (0/8) Epoch 9, batch 1000, train_loss[loss=2.834, ArTop10Accuracy=0.7649, over 12852.00 frames. ], tot_loss[loss=2.846, ArTop10Accuracy=0.762, over 11888.03 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:48:06,247 INFO [trainer.py:765] (0/8) Epoch 9, batch 1100, train_loss[loss=2.846, ArTop10Accuracy=0.7591, over 13386.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7603, over 11952.87 frames. ], batch size: 34, lr: 1.28e-02
+ 2024-08-06 10:49:21,054 INFO [trainer.py:765] (0/8) Epoch 9, batch 1200, train_loss[loss=2.929, ArTop10Accuracy=0.7448, over 12195.00 frames. ], tot_loss[loss=2.855, ArTop10Accuracy=0.7599, over 11873.16 frames. ], batch size: 101, lr: 1.27e-02
+ 2024-08-06 10:50:22,708 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 10:50:22,712 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-9.pt
+ 2024-08-06 10:52:12,325 INFO [trainer.py:765] (0/8) Epoch 10, batch 100, train_loss[loss=2.889, ArTop10Accuracy=0.7574, over 14289.00 frames. ], tot_loss[loss=2.85, ArTop10Accuracy=0.7603, over 4750.13 frames. ], batch size: 62, lr: 1.20e-02
+ 2024-08-06 10:53:44,584 INFO [trainer.py:765] (0/8) Epoch 10, batch 200, train_loss[loss=2.83, ArTop10Accuracy=0.7625, over 13602.00 frames. ], tot_loss[loss=2.834, ArTop10Accuracy=0.7638, over 7744.10 frames. ], batch size: 34, lr: 1.20e-02
+ 2024-08-06 10:55:08,089 INFO [trainer.py:765] (0/8) Epoch 10, batch 300, train_loss[loss=2.874, ArTop10Accuracy=0.7542, over 13893.00 frames. ], tot_loss[loss=2.828, ArTop10Accuracy=0.7651, over 9372.24 frames. ], batch size: 44, lr: 1.19e-02
+ 2024-08-06 10:56:41,177 INFO [trainer.py:765] (0/8) Epoch 10, batch 400, train_loss[loss=2.776, ArTop10Accuracy=0.7748, over 10425.00 frames. ], tot_loss[loss=2.825, ArTop10Accuracy=0.7658, over 10286.81 frames. ], batch size: 14, lr: 1.19e-02
+ 2024-08-06 10:58:04,937 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 10:58:14,555 INFO [trainer.py:811] (0/8) Epoch 10, validation: loss=2.842, ArTop10Accuracy=0.7624, over 1827537.00 frames.
+ 2024-08-06 10:58:14,556 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29519MB
+ 2024-08-06 10:58:15,574 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.228e+02 1.320e+02 1.458e+02 6.096e+02, threshold=2.641e+02, percent-clipped=0.6
+ 2024-08-06 10:58:15,579 INFO [trainer.py:765] (0/8) Epoch 10, batch 500, train_loss[loss=2.791, ArTop10Accuracy=0.7754, over 12168.00 frames. ], tot_loss[loss=2.824, ArTop10Accuracy=0.766, over 10844.50 frames. ], batch size: 22, lr: 1.19e-02
+ 2024-08-06 10:59:42,817 INFO [trainer.py:765] (0/8) Epoch 10, batch 600, train_loss[loss=2.772, ArTop10Accuracy=0.7802, over 11367.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.7664, over 11362.90 frames. ], batch size: 18, lr: 1.18e-02
+ 2024-08-06 11:01:18,109 INFO [trainer.py:765] (0/8) Epoch 10, batch 700, train_loss[loss=2.798, ArTop10Accuracy=0.769, over 10206.00 frames. ], tot_loss[loss=2.827, ArTop10Accuracy=0.7656, over 11497.81 frames. ], batch size: 12, lr: 1.18e-02
+ 2024-08-06 11:02:36,918 INFO [trainer.py:765] (0/8) Epoch 10, batch 800, train_loss[loss=2.694, ArTop10Accuracy=0.7909, over 9990.00 frames. ], tot_loss[loss=2.833, ArTop10Accuracy=0.764, over 11622.09 frames. ], batch size: 12, lr: 1.17e-02
+ 2024-08-06 11:03:51,212 INFO [trainer.py:765] (0/8) Epoch 10, batch 900, train_loss[loss=2.837, ArTop10Accuracy=0.7643, over 13029.00 frames. ], tot_loss[loss=2.829, ArTop10Accuracy=0.765, over 11669.36 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:05:06,352 INFO [trainer.py:765] (0/8) Epoch 10, batch 1000, train_loss[loss=2.861, ArTop10Accuracy=0.7555, over 13305.00 frames. ], tot_loss[loss=2.829, ArTop10Accuracy=0.7652, over 11865.57 frames. ], batch size: 28, lr: 1.17e-02
+ 2024-08-06 11:06:21,724 INFO [trainer.py:765] (0/8) Epoch 10, batch 1100, train_loss[loss=2.84, ArTop10Accuracy=0.7659, over 13539.00 frames. ], tot_loss[loss=2.834, ArTop10Accuracy=0.7642, over 11938.55 frames. ], batch size: 34, lr: 1.16e-02
+ 2024-08-06 11:07:34,772 INFO [trainer.py:765] (0/8) Epoch 10, batch 1200, train_loss[loss=2.951, ArTop10Accuracy=0.7414, over 12210.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7635, over 11860.87 frames. ], batch size: 101, lr: 1.16e-02
+ 2024-08-06 11:08:33,817 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 11:08:33,820 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-10.pt
+ 2024-08-06 11:10:29,953 INFO [trainer.py:765] (0/8) Epoch 11, batch 100, train_loss[loss=2.853, ArTop10Accuracy=0.7606, over 14718.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.766, over 4759.47 frames. ], batch size: 62, lr: 1.10e-02
+ 2024-08-06 11:12:04,673 INFO [trainer.py:765] (0/8) Epoch 11, batch 200, train_loss[loss=2.845, ArTop10Accuracy=0.7641, over 14085.00 frames. ], tot_loss[loss=2.814, ArTop10Accuracy=0.7672, over 7737.29 frames. ], batch size: 35, lr: 1.10e-02
+ 2024-08-06 11:12:22,823 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 9.884e+01 1.240e+02 1.333e+02 1.457e+02 6.939e+02, threshold=2.667e+02, percent-clipped=0.1
+ 2024-08-06 11:13:31,549 INFO [trainer.py:765] (0/8) Epoch 11, batch 300, train_loss[loss=2.914, ArTop10Accuracy=0.749, over 14385.00 frames. ], tot_loss[loss=2.81, ArTop10Accuracy=0.7683, over 9360.87 frames. ], batch size: 44, lr: 1.09e-02
+ 2024-08-06 11:15:03,268 INFO [trainer.py:765] (0/8) Epoch 11, batch 400, train_loss[loss=2.742, ArTop10Accuracy=0.7831, over 10377.00 frames. ], tot_loss[loss=2.807, ArTop10Accuracy=0.769, over 10296.01 frames. ], batch size: 14, lr: 1.09e-02
+ 2024-08-06 11:16:29,637 INFO [trainer.py:765] (0/8) Epoch 11, batch 500, train_loss[loss=2.801, ArTop10Accuracy=0.7716, over 12243.00 frames. ], tot_loss[loss=2.802, ArTop10Accuracy=0.7699, over 10869.51 frames. ], batch size: 22, lr: 1.09e-02
+ 2024-08-06 11:18:00,516 INFO [trainer.py:765] (0/8) Epoch 11, batch 600, train_loss[loss=2.718, ArTop10Accuracy=0.7889, over 11457.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7698, over 11377.43 frames. ], batch size: 18, lr: 1.08e-02
+ 2024-08-06 11:19:34,512 INFO [trainer.py:765] (0/8) Epoch 11, batch 700, train_loss[loss=2.702, ArTop10Accuracy=0.7909, over 10167.00 frames. ], tot_loss[loss=2.81, ArTop10Accuracy=0.7684, over 11520.97 frames. ], batch size: 12, lr: 1.08e-02
+ 2024-08-06 11:20:55,482 INFO [trainer.py:765] (0/8) Epoch 11, batch 800, train_loss[loss=2.676, ArTop10Accuracy=0.795, over 10086.00 frames. ], tot_loss[loss=2.813, ArTop10Accuracy=0.7681, over 11629.35 frames. ], batch size: 12, lr: 1.07e-02
+ 2024-08-06 11:22:13,704 INFO [trainer.py:765] (0/8) Epoch 11, batch 900, train_loss[loss=2.802, ArTop10Accuracy=0.771, over 12939.00 frames. ], tot_loss[loss=2.81, ArTop10Accuracy=0.7685, over 11682.50 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:23:31,797 INFO [trainer.py:765] (0/8) Epoch 11, batch 1000, train_loss[loss=2.76, ArTop10Accuracy=0.7754, over 12987.00 frames. ], tot_loss[loss=2.815, ArTop10Accuracy=0.7675, over 11877.90 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:24:46,901 INFO [trainer.py:765] (0/8) Epoch 11, batch 1100, train_loss[loss=2.779, ArTop10Accuracy=0.7775, over 13578.00 frames. ], tot_loss[loss=2.821, ArTop10Accuracy=0.7666, over 11962.65 frames. ], batch size: 34, lr: 1.06e-02
+ 2024-08-06 11:26:00,733 INFO [trainer.py:765] (0/8) Epoch 11, batch 1200, train_loss[loss=2.934, ArTop10Accuracy=0.7458, over 12288.00 frames. ], tot_loss[loss=2.821, ArTop10Accuracy=0.7664, over 11874.75 frames. ], batch size: 103, lr: 1.06e-02
+ 2024-08-06 11:26:15,845 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 11:26:25,556 INFO [trainer.py:811] (0/8) Epoch 11, validation: loss=2.831, ArTop10Accuracy=0.7643, over 1827537.00 frames.
+ 2024-08-06 11:26:25,557 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29519MB
+ 2024-08-06 11:26:26,185 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.251e+02 1.335e+02 1.441e+02 2.942e+02, threshold=2.669e+02, percent-clipped=0.1
+ 2024-08-06 11:27:09,747 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 11:27:09,754 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-11.pt
+ 2024-08-06 11:29:03,450 INFO [trainer.py:765] (0/8) Epoch 12, batch 100, train_loss[loss=2.881, ArTop10Accuracy=0.7544, over 14667.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7704, over 4747.48 frames. ], batch size: 62, lr: 1.01e-02
+ 2024-08-06 11:30:30,673 INFO [trainer.py:765] (0/8) Epoch 12, batch 200, train_loss[loss=2.785, ArTop10Accuracy=0.7731, over 13518.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7705, over 7756.10 frames. ], batch size: 34, lr: 1.01e-02
+ 2024-08-06 11:31:57,654 INFO [trainer.py:765] (0/8) Epoch 12, batch 300, train_loss[loss=2.832, ArTop10Accuracy=0.7641, over 14247.00 frames. ], tot_loss[loss=2.792, ArTop10Accuracy=0.7719, over 9392.76 frames. ], batch size: 44, lr: 1.01e-02
+ 2024-08-06 11:33:30,737 INFO [trainer.py:765] (0/8) Epoch 12, batch 400, train_loss[loss=2.739, ArTop10Accuracy=0.7777, over 10179.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7714, over 10292.52 frames. ], batch size: 14, lr: 1.00e-02
+ 2024-08-06 11:34:55,734 INFO [trainer.py:765] (0/8) Epoch 12, batch 500, train_loss[loss=2.801, ArTop10Accuracy=0.7709, over 12150.00 frames. ], tot_loss[loss=2.788, ArTop10Accuracy=0.7728, over 10850.93 frames. ], batch size: 22, lr: 1.00e-02
+ 2024-08-06 11:36:29,361 INFO [trainer.py:765] (0/8) Epoch 12, batch 600, train_loss[loss=2.741, ArTop10Accuracy=0.7815, over 11487.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7724, over 11362.82 frames. ], batch size: 18, lr: 9.97e-03
+ 2024-08-06 11:38:00,343 INFO [trainer.py:765] (0/8) Epoch 12, batch 700, train_loss[loss=2.741, ArTop10Accuracy=0.7811, over 10062.00 frames. ], tot_loss[loss=2.796, ArTop10Accuracy=0.7713, over 11517.08 frames. ], batch size: 12, lr: 9.93e-03
+ 2024-08-06 11:39:23,610 INFO [trainer.py:765] (0/8) Epoch 12, batch 800, train_loss[loss=2.759, ArTop10Accuracy=0.7732, over 10080.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7705, over 11636.73 frames. ], batch size: 12, lr: 9.90e-03
+ 2024-08-06 11:40:39,888 INFO [trainer.py:765] (0/8) Epoch 12, batch 900, train_loss[loss=2.823, ArTop10Accuracy=0.7693, over 12876.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7724, over 11678.76 frames. ], batch size: 27, lr: 9.87e-03
+ 2024-08-06 11:41:13,995 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.248e+02 1.348e+02 1.459e+02 5.540e+02, threshold=2.695e+02, percent-clipped=0.3
+ 2024-08-06 11:41:56,188 INFO [trainer.py:765] (0/8) Epoch 12, batch 1000, train_loss[loss=2.811, ArTop10Accuracy=0.7647, over 12993.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7713, over 11886.62 frames. ], batch size: 27, lr: 9.85e-03
+ 2024-08-06 11:43:14,319 INFO [trainer.py:765] (0/8) Epoch 12, batch 1100, train_loss[loss=2.781, ArTop10Accuracy=0.7739, over 13596.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7707, over 11956.72 frames. ], batch size: 34, lr: 9.82e-03
+ 2024-08-06 11:44:26,155 INFO [trainer.py:765] (0/8) Epoch 12, batch 1200, train_loss[loss=2.9, ArTop10Accuracy=0.7534, over 12429.00 frames. ], tot_loss[loss=2.803, ArTop10Accuracy=0.7699, over 11861.85 frames. ], batch size: 103, lr: 9.79e-03
+ 2024-08-06 11:45:26,924 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 11:45:26,927 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-12.pt
+ 2024-08-06 11:47:26,603 INFO [trainer.py:765] (0/8) Epoch 13, batch 100, train_loss[loss=2.873, ArTop10Accuracy=0.7586, over 14676.00 frames. ], tot_loss[loss=2.786, ArTop10Accuracy=0.7726, over 4752.95 frames. ], batch size: 63, lr: 9.37e-03
+ 2024-08-06 11:48:54,779 INFO [trainer.py:765] (0/8) Epoch 13, batch 200, train_loss[loss=2.687, ArTop10Accuracy=0.7937, over 13491.00 frames. ], tot_loss[loss=2.78, ArTop10Accuracy=0.7742, over 7726.95 frames. ], batch size: 34, lr: 9.34e-03
+ 2024-08-06 11:50:20,515 INFO [trainer.py:765] (0/8) Epoch 13, batch 300, train_loss[loss=2.866, ArTop10Accuracy=0.7566, over 14184.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7748, over 9347.37 frames. ], batch size: 44, lr: 9.31e-03
+ 2024-08-06 11:51:48,764 INFO [trainer.py:765] (0/8) Epoch 13, batch 400, train_loss[loss=2.725, ArTop10Accuracy=0.7881, over 10398.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7757, over 10274.62 frames. ], batch size: 14, lr: 9.28e-03
+ 2024-08-06 11:53:13,408 INFO [trainer.py:765] (0/8) Epoch 13, batch 500, train_loss[loss=2.662, ArTop10Accuracy=0.7992, over 12588.00 frames. ], tot_loss[loss=2.77, ArTop10Accuracy=0.7759, over 10848.23 frames. ], batch size: 23, lr: 9.26e-03
+ 2024-08-06 11:54:52,223 INFO [trainer.py:765] (0/8) Epoch 13, batch 600, train_loss[loss=2.762, ArTop10Accuracy=0.7758, over 11412.00 frames. ], tot_loss[loss=2.773, ArTop10Accuracy=0.7756, over 11379.70 frames. ], batch size: 18, lr: 9.23e-03
+ 2024-08-06 11:55:47,082 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 11:55:56,835 INFO [trainer.py:811] (0/8) Epoch 13, validation: loss=2.824, ArTop10Accuracy=0.7662, over 1827537.00 frames.
+ 2024-08-06 11:55:56,835 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29519MB
+ 2024-08-06 11:55:57,712 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.343e+02 1.452e+02 4.888e+02, threshold=2.687e+02, percent-clipped=0.1
+ 2024-08-06 11:56:28,465 INFO [trainer.py:765] (0/8) Epoch 13, batch 700, train_loss[loss=2.707, ArTop10Accuracy=0.7891, over 10317.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7749, over 11514.39 frames. ], batch size: 12, lr: 9.20e-03
+ 2024-08-06 11:57:46,684 INFO [trainer.py:765] (0/8) Epoch 13, batch 800, train_loss[loss=2.67, ArTop10Accuracy=0.7936, over 9534.00 frames. ], tot_loss[loss=2.78, ArTop10Accuracy=0.7743, over 11614.14 frames. ], batch size: 11, lr: 9.18e-03
+ 2024-08-06 11:59:03,289 INFO [trainer.py:765] (0/8) Epoch 13, batch 900, train_loss[loss=2.81, ArTop10Accuracy=0.7719, over 13218.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.775, over 11658.17 frames. ], batch size: 28, lr: 9.15e-03
+ 2024-08-06 12:00:19,174 INFO [trainer.py:765] (0/8) Epoch 13, batch 1000, train_loss[loss=2.737, ArTop10Accuracy=0.7798, over 13002.00 frames. ], tot_loss[loss=2.781, ArTop10Accuracy=0.7743, over 11880.17 frames. ], batch size: 27, lr: 9.13e-03
+ 2024-08-06 12:01:34,883 INFO [trainer.py:765] (0/8) Epoch 13, batch 1100, train_loss[loss=2.79, ArTop10Accuracy=0.7723, over 13695.00 frames. ], tot_loss[loss=2.789, ArTop10Accuracy=0.7728, over 11953.89 frames. ], batch size: 34, lr: 9.10e-03
+ 2024-08-06 12:02:48,662 INFO [trainer.py:765] (0/8) Epoch 13, batch 1200, train_loss[loss=2.97, ArTop10Accuracy=0.7374, over 12612.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7723, over 11864.76 frames. ], batch size: 101, lr: 9.08e-03
+ 2024-08-06 12:03:47,909 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 12:03:47,912 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-13.pt
+ 2024-08-06 12:05:45,336 INFO [trainer.py:765] (0/8) Epoch 14, batch 100, train_loss[loss=2.841, ArTop10Accuracy=0.7652, over 14853.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7759, over 4768.63 frames. ], batch size: 64, lr: 8.71e-03
+ 2024-08-06 12:07:16,605 INFO [trainer.py:765] (0/8) Epoch 14, batch 200, train_loss[loss=2.748, ArTop10Accuracy=0.7779, over 13905.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.7762, over 7750.37 frames. ], batch size: 35, lr: 8.69e-03
+ 2024-08-06 12:08:44,312 INFO [trainer.py:765] (0/8) Epoch 14, batch 300, train_loss[loss=2.778, ArTop10Accuracy=0.772, over 14355.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.7769, over 9379.42 frames. ], batch size: 44, lr: 8.66e-03
+ 2024-08-06 12:10:01,132 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.266e+02 1.374e+02 1.483e+02 6.480e+02, threshold=2.748e+02, percent-clipped=0.2
+ 2024-08-06 12:10:10,226 INFO [trainer.py:765] (0/8) Epoch 14, batch 400, train_loss[loss=2.741, ArTop10Accuracy=0.7836, over 10218.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7765, over 10280.49 frames. ], batch size: 14, lr: 8.64e-03
+ 2024-08-06 12:11:36,150 INFO [trainer.py:765] (0/8) Epoch 14, batch 500, train_loss[loss=2.676, ArTop10Accuracy=0.7933, over 12192.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7781, over 10852.25 frames. ], batch size: 22, lr: 8.62e-03
+ 2024-08-06 12:13:05,993 INFO [trainer.py:765] (0/8) Epoch 14, batch 600, train_loss[loss=2.752, ArTop10Accuracy=0.7771, over 11457.00 frames. ], tot_loss[loss=2.763, ArTop10Accuracy=0.7771, over 11369.19 frames. ], batch size: 18, lr: 8.59e-03
+ 2024-08-06 12:14:38,553 INFO [trainer.py:765] (0/8) Epoch 14, batch 700, train_loss[loss=2.705, ArTop10Accuracy=0.7961, over 10176.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7761, over 11536.24 frames. ], batch size: 12, lr: 8.57e-03
+ 2024-08-06 12:15:58,070 INFO [trainer.py:765] (0/8) Epoch 14, batch 800, train_loss[loss=2.689, ArTop10Accuracy=0.7909, over 10209.00 frames. ], tot_loss[loss=2.772, ArTop10Accuracy=0.7753, over 11651.71 frames. ], batch size: 12, lr: 8.55e-03
+ 2024-08-06 12:17:12,865 INFO [trainer.py:765] (0/8) Epoch 14, batch 900, train_loss[loss=2.732, ArTop10Accuracy=0.7826, over 12996.00 frames. ], tot_loss[loss=2.768, ArTop10Accuracy=0.7763, over 11690.20 frames. ], batch size: 27, lr: 8.52e-03
+ 2024-08-06 12:18:29,613 INFO [trainer.py:765] (0/8) Epoch 14, batch 1000, train_loss[loss=2.803, ArTop10Accuracy=0.7661, over 12777.00 frames. ], tot_loss[loss=2.775, ArTop10Accuracy=0.7751, over 11882.08 frames. ], batch size: 27, lr: 8.50e-03
+ 2024-08-06 12:19:45,376 INFO [trainer.py:765] (0/8) Epoch 14, batch 1100, train_loss[loss=2.777, ArTop10Accuracy=0.7773, over 13431.00 frames. ], tot_loss[loss=2.781, ArTop10Accuracy=0.7741, over 11942.94 frames. ], batch size: 34, lr: 8.48e-03
+ 2024-08-06 12:20:59,278 INFO [trainer.py:765] (0/8) Epoch 14, batch 1200, train_loss[loss=2.906, ArTop10Accuracy=0.75, over 12942.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7745, over 11861.52 frames. ], batch size: 102, lr: 8.46e-03
+ 2024-08-06 12:21:58,346 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 12:21:58,348 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-14.pt
+ 2024-08-06 12:23:51,960 INFO [trainer.py:765] (0/8) Epoch 15, batch 100, train_loss[loss=2.836, ArTop10Accuracy=0.7586, over 14583.00 frames. ], tot_loss[loss=2.768, ArTop10Accuracy=0.7759, over 4768.34 frames. ], batch size: 62, lr: 8.14e-03
+ 2024-08-06 12:24:00,597 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 12:24:10,290 INFO [trainer.py:811] (0/8) Epoch 15, validation: loss=2.819, ArTop10Accuracy=0.7675, over 1827537.00 frames.
+ 2024-08-06 12:24:10,291 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29519MB
+ 2024-08-06 12:24:11,094 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.284e+02 1.371e+02 1.488e+02 4.667e+02, threshold=2.743e+02, percent-clipped=0.2
+ 2024-08-06 12:25:29,990 INFO [trainer.py:765] (0/8) Epoch 15, batch 200, train_loss[loss=2.826, ArTop10Accuracy=0.7627, over 13767.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7783, over 7773.81 frames. ], batch size: 34, lr: 8.12e-03
+ 2024-08-06 12:26:58,695 INFO [trainer.py:765] (0/8) Epoch 15, batch 300, train_loss[loss=2.819, ArTop10Accuracy=0.7639, over 13935.00 frames. ], tot_loss[loss=2.753, ArTop10Accuracy=0.7791, over 9378.32 frames. ], batch size: 44, lr: 8.09e-03
+ 2024-08-06 12:28:28,536 INFO [trainer.py:765] (0/8) Epoch 15, batch 400, train_loss[loss=2.684, ArTop10Accuracy=0.7938, over 10281.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7796, over 10275.42 frames. ], batch size: 14, lr: 8.07e-03
+ 2024-08-06 12:29:54,033 INFO [trainer.py:765] (0/8) Epoch 15, batch 500, train_loss[loss=2.691, ArTop10Accuracy=0.7942, over 12264.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7804, over 10831.07 frames. ], batch size: 22, lr: 8.05e-03
+ 2024-08-06 12:31:23,293 INFO [trainer.py:765] (0/8) Epoch 15, batch 600, train_loss[loss=2.735, ArTop10Accuracy=0.7826, over 11409.00 frames. ], tot_loss[loss=2.753, ArTop10Accuracy=0.7791, over 11367.63 frames. ], batch size: 18, lr: 8.03e-03
+ 2024-08-06 12:32:53,176 INFO [trainer.py:765] (0/8) Epoch 15, batch 700, train_loss[loss=2.686, ArTop10Accuracy=0.7965, over 9513.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7784, over 11512.36 frames. ], batch size: 11, lr: 8.01e-03
+ 2024-08-06 12:34:18,254 INFO [trainer.py:765] (0/8) Epoch 15, batch 800, train_loss[loss=2.696, ArTop10Accuracy=0.7923, over 9231.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7779, over 11623.54 frames. ], batch size: 11, lr: 7.99e-03
+ 2024-08-06 12:35:34,727 INFO [trainer.py:765] (0/8) Epoch 15, batch 900, train_loss[loss=2.797, ArTop10Accuracy=0.7713, over 12987.00 frames. ], tot_loss[loss=2.756, ArTop10Accuracy=0.7785, over 11662.82 frames. ], batch size: 27, lr: 7.97e-03
+ 2024-08-06 12:36:50,540 INFO [trainer.py:765] (0/8) Epoch 15, batch 1000, train_loss[loss=2.785, ArTop10Accuracy=0.7808, over 12819.00 frames. ], tot_loss[loss=2.76, ArTop10Accuracy=0.7779, over 11884.84 frames. ], batch size: 27, lr: 7.95e-03
+ 2024-08-06 12:38:05,181 INFO [trainer.py:765] (0/8) Epoch 15, batch 1100, train_loss[loss=2.821, ArTop10Accuracy=0.7663, over 13737.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7766, over 11951.41 frames. ], batch size: 34, lr: 7.93e-03
+ 2024-08-06 12:38:12,841 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.293e+02 1.379e+02 1.467e+02 2.824e+02, threshold=2.759e+02, percent-clipped=0.1
+ 2024-08-06 12:39:18,789 INFO [trainer.py:765] (0/8) Epoch 15, batch 1200, train_loss[loss=2.888, ArTop10Accuracy=0.7535, over 12417.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.776, over 11865.99 frames. ], batch size: 101, lr: 7.91e-03
+ 2024-08-06 12:40:18,830 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 12:40:18,833 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-15.pt
+ 2024-08-06 12:42:17,620 INFO [trainer.py:765] (0/8) Epoch 16, batch 100, train_loss[loss=2.796, ArTop10Accuracy=0.7693, over 14427.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7806, over 4772.64 frames. ], batch size: 62, lr: 7.63e-03
+ 2024-08-06 12:43:49,565 INFO [trainer.py:765] (0/8) Epoch 16, batch 200, train_loss[loss=2.744, ArTop10Accuracy=0.7763, over 13896.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7806, over 7760.92 frames. ], batch size: 35, lr: 7.61e-03
+ 2024-08-06 12:45:18,501 INFO [trainer.py:765] (0/8) Epoch 16, batch 300, train_loss[loss=2.8, ArTop10Accuracy=0.7674, over 14070.00 frames. ], tot_loss[loss=2.742, ArTop10Accuracy=0.7808, over 9372.95 frames. ], batch size: 44, lr: 7.59e-03
+ 2024-08-06 12:46:45,208 INFO [trainer.py:765] (0/8) Epoch 16, batch 400, train_loss[loss=2.716, ArTop10Accuracy=0.7876, over 10245.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.7806, over 10296.61 frames. ], batch size: 14, lr: 7.58e-03
+ 2024-08-06 12:48:16,312 INFO [trainer.py:765] (0/8) Epoch 16, batch 500, train_loss[loss=2.686, ArTop10Accuracy=0.7917, over 12087.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.7818, over 10853.59 frames. ], batch size: 22, lr: 7.56e-03
+ 2024-08-06 12:49:46,641 INFO [trainer.py:765] (0/8) Epoch 16, batch 600, train_loss[loss=2.69, ArTop10Accuracy=0.792, over 11271.00 frames. ], tot_loss[loss=2.741, ArTop10Accuracy=0.7811, over 11366.38 frames. ], batch size: 18, lr: 7.54e-03
+ 2024-08-06 12:51:23,680 INFO [trainer.py:765] (0/8) Epoch 16, batch 700, train_loss[loss=2.612, ArTop10Accuracy=0.8047, over 9414.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.7808, over 11501.99 frames. ], batch size: 11, lr: 7.52e-03
+ 2024-08-06 12:52:43,500 INFO [trainer.py:765] (0/8) Epoch 16, batch 800, train_loss[loss=2.757, ArTop10Accuracy=0.7792, over 9558.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7797, over 11628.70 frames. ], batch size: 11, lr: 7.51e-03
+ 2024-08-06 12:53:06,014 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/checkpoint-20000.pt
+ 2024-08-06 12:53:08,969 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 12:53:15,494 INFO [trainer.py:811] (0/8) Epoch 16, validation: loss=2.816, ArTop10Accuracy=0.7678, over 1827537.00 frames.
+ 2024-08-06 12:53:15,495 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29524MB
+ 2024-08-06 12:53:16,187 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.291e+02 1.391e+02 1.487e+02 3.459e+02, threshold=2.783e+02, percent-clipped=0.1
+ 2024-08-06 12:54:06,480 INFO [trainer.py:765] (0/8) Epoch 16, batch 900, train_loss[loss=2.719, ArTop10Accuracy=0.7879, over 12855.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7799, over 11666.65 frames. ], batch size: 27, lr: 7.49e-03
+ 2024-08-06 12:55:19,791 INFO [trainer.py:765] (0/8) Epoch 16, batch 1000, train_loss[loss=2.721, ArTop10Accuracy=0.7857, over 12909.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.78, over 11872.94 frames. ], batch size: 27, lr: 7.47e-03
+ 2024-08-06 12:56:33,165 INFO [trainer.py:765] (0/8) Epoch 16, batch 1100, train_loss[loss=2.78, ArTop10Accuracy=0.7767, over 13479.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7784, over 11947.93 frames. ], batch size: 34, lr: 7.45e-03
+ 2024-08-06 12:57:48,485 INFO [trainer.py:765] (0/8) Epoch 16, batch 1200, train_loss[loss=2.835, ArTop10Accuracy=0.7596, over 12075.00 frames. ], tot_loss[loss=2.756, ArTop10Accuracy=0.7786, over 11864.13 frames. ], batch size: 103, lr: 7.44e-03
+ 2024-08-06 12:58:48,289 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 12:58:48,292 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-16.pt
+ 2024-08-06 13:00:47,900 INFO [trainer.py:765] (0/8) Epoch 17, batch 100, train_loss[loss=2.789, ArTop10Accuracy=0.7719, over 14058.00 frames. ], tot_loss[loss=2.741, ArTop10Accuracy=0.7811, over 4756.29 frames. ], batch size: 62, lr: 7.18e-03
+ 2024-08-06 13:02:19,302 INFO [trainer.py:765] (0/8) Epoch 17, batch 200, train_loss[loss=2.781, ArTop10Accuracy=0.7758, over 14010.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7818, over 7771.17 frames. ], batch size: 35, lr: 7.17e-03
+ 2024-08-06 13:03:45,518 INFO [trainer.py:765] (0/8) Epoch 17, batch 300, train_loss[loss=2.776, ArTop10Accuracy=0.7769, over 14058.00 frames. ], tot_loss[loss=2.73, ArTop10Accuracy=0.7832, over 9383.89 frames. ], batch size: 44, lr: 7.15e-03
+ 2024-08-06 13:05:21,760 INFO [trainer.py:765] (0/8) Epoch 17, batch 400, train_loss[loss=2.661, ArTop10Accuracy=0.8014, over 10203.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7843, over 10293.21 frames. ], batch size: 14, lr: 7.14e-03
+ 2024-08-06 13:06:47,021 INFO [trainer.py:765] (0/8) Epoch 17, batch 500, train_loss[loss=2.668, ArTop10Accuracy=0.7993, over 11934.00 frames. ], tot_loss[loss=2.723, ArTop10Accuracy=0.7846, over 10850.62 frames. ], batch size: 22, lr: 7.12e-03
+ 2024-08-06 13:07:39,880 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.293e+02 1.386e+02 1.488e+02 3.253e+02, threshold=2.772e+02, percent-clipped=0.1
+ 2024-08-06 13:08:22,689 INFO [trainer.py:765] (0/8) Epoch 17, batch 600, train_loss[loss=2.689, ArTop10Accuracy=0.7944, over 11556.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7845, over 11380.73 frames. ], batch size: 18, lr: 7.10e-03
+ 2024-08-06 13:09:54,835 INFO [trainer.py:765] (0/8) Epoch 17, batch 700, train_loss[loss=2.51, ArTop10Accuracy=0.8276, over 10056.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7833, over 11512.80 frames. ], batch size: 12, lr: 7.09e-03
+ 2024-08-06 13:11:19,481 INFO [trainer.py:765] (0/8) Epoch 17, batch 800, train_loss[loss=2.601, ArTop10Accuracy=0.8115, over 10008.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7831, over 11634.34 frames. ], batch size: 12, lr: 7.07e-03
+ 2024-08-06 13:12:35,670 INFO [trainer.py:765] (0/8) Epoch 17, batch 900, train_loss[loss=2.754, ArTop10Accuracy=0.7803, over 13182.00 frames. ], tot_loss[loss=2.731, ArTop10Accuracy=0.7836, over 11691.70 frames. ], batch size: 28, lr: 7.06e-03
+ 2024-08-06 13:13:53,063 INFO [trainer.py:765] (0/8) Epoch 17, batch 1000, train_loss[loss=2.715, ArTop10Accuracy=0.79, over 13344.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7822, over 11909.14 frames. ], batch size: 28, lr: 7.04e-03
+ 2024-08-06 13:15:08,483 INFO [trainer.py:765] (0/8) Epoch 17, batch 1100, train_loss[loss=2.693, ArTop10Accuracy=0.7931, over 13545.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7808, over 11965.37 frames. ], batch size: 34, lr: 7.02e-03
+ 2024-08-06 13:16:22,389 INFO [trainer.py:765] (0/8) Epoch 17, batch 1200, train_loss[loss=2.875, ArTop10Accuracy=0.7577, over 12714.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7807, over 11871.85 frames. ], batch size: 101, lr: 7.01e-03
+ 2024-08-06 13:17:21,130 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 13:17:21,134 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-17.pt
+ 2024-08-06 13:19:15,995 INFO [trainer.py:765] (0/8) Epoch 18, batch 100, train_loss[loss=2.79, ArTop10Accuracy=0.7737, over 14730.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.7815, over 4768.08 frames. ], batch size: 62, lr: 6.78e-03
+ 2024-08-06 13:20:46,597 INFO [trainer.py:765] (0/8) Epoch 18, batch 200, train_loss[loss=2.71, ArTop10Accuracy=0.7899, over 13821.00 frames. ], tot_loss[loss=2.73, ArTop10Accuracy=0.7832, over 7767.37 frames. ], batch size: 34, lr: 6.77e-03
+ 2024-08-06 13:21:55,105 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 13:22:04,751 INFO [trainer.py:811] (0/8) Epoch 18, validation: loss=2.817, ArTop10Accuracy=0.768, over 1827537.00 frames.
+ 2024-08-06 13:22:04,752 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29524MB
+ 2024-08-06 13:22:05,474 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.323e+02 1.409e+02 1.514e+02 3.209e+02, threshold=2.818e+02, percent-clipped=0.1
+ 2024-08-06 13:22:26,581 INFO [trainer.py:765] (0/8) Epoch 18, batch 300, train_loss[loss=2.789, ArTop10Accuracy=0.7743, over 14439.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7848, over 9396.72 frames. ], batch size: 45, lr: 6.76e-03
+ 2024-08-06 13:23:57,928 INFO [trainer.py:765] (0/8) Epoch 18, batch 400, train_loss[loss=2.672, ArTop10Accuracy=0.7928, over 10332.00 frames. ], tot_loss[loss=2.718, ArTop10Accuracy=0.7856, over 10300.70 frames. ], batch size: 14, lr: 6.74e-03
+ 2024-08-06 13:25:34,012 INFO [trainer.py:765] (0/8) Epoch 18, batch 500, train_loss[loss=2.645, ArTop10Accuracy=0.8047, over 12105.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7864, over 10847.39 frames. ], batch size: 22, lr: 6.73e-03
+ 2024-08-06 13:27:00,632 INFO [trainer.py:765] (0/8) Epoch 18, batch 600, train_loss[loss=2.638, ArTop10Accuracy=0.7969, over 12132.00 frames. ], tot_loss[loss=2.716, ArTop10Accuracy=0.7859, over 11382.46 frames. ], batch size: 19, lr: 6.71e-03
+ 2024-08-06 13:28:33,583 INFO [trainer.py:765] (0/8) Epoch 18, batch 700, train_loss[loss=2.602, ArTop10Accuracy=0.8091, over 10143.00 frames. ], tot_loss[loss=2.724, ArTop10Accuracy=0.7847, over 11529.41 frames. ], batch size: 12, lr: 6.70e-03
+ 2024-08-06 13:29:54,986 INFO [trainer.py:765] (0/8) Epoch 18, batch 800, train_loss[loss=2.684, ArTop10Accuracy=0.7945, over 10206.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7842, over 11648.40 frames. ], batch size: 12, lr: 6.68e-03
+ 2024-08-06 13:31:12,518 INFO [trainer.py:765] (0/8) Epoch 18, batch 900, train_loss[loss=2.679, ArTop10Accuracy=0.7971, over 12792.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7852, over 11695.39 frames. ], batch size: 27, lr: 6.67e-03
+ 2024-08-06 13:32:26,550 INFO [trainer.py:765] (0/8) Epoch 18, batch 1000, train_loss[loss=2.67, ArTop10Accuracy=0.7925, over 12756.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7841, over 11888.08 frames. ], batch size: 27, lr: 6.66e-03
+ 2024-08-06 13:33:41,496 INFO [trainer.py:765] (0/8) Epoch 18, batch 1100, train_loss[loss=2.722, ArTop10Accuracy=0.7837, over 13665.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7832, over 11958.37 frames. ], batch size: 34, lr: 6.64e-03
+ 2024-08-06 13:34:54,675 INFO [trainer.py:765] (0/8) Epoch 18, batch 1200, train_loss[loss=2.85, ArTop10Accuracy=0.7681, over 12753.00 frames. ], tot_loss[loss=2.731, ArTop10Accuracy=0.7834, over 11867.53 frames. ], batch size: 101, lr: 6.63e-03
+ 2024-08-06 13:35:51,064 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.340e+02 1.433e+02 1.533e+02 2.444e+02, threshold=2.867e+02, percent-clipped=0.0
+ 2024-08-06 13:35:54,972 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 13:35:54,974 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-18.pt
+ 2024-08-06 13:37:48,624 INFO [trainer.py:765] (0/8) Epoch 19, batch 100, train_loss[loss=2.798, ArTop10Accuracy=0.7718, over 14607.00 frames. ], tot_loss[loss=2.719, ArTop10Accuracy=0.7848, over 4757.47 frames. ], batch size: 62, lr: 6.43e-03
+ 2024-08-06 13:39:23,256 INFO [trainer.py:765] (0/8) Epoch 19, batch 200, train_loss[loss=2.751, ArTop10Accuracy=0.7775, over 13698.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7858, over 7745.97 frames. ], batch size: 34, lr: 6.41e-03
+ 2024-08-06 13:40:48,358 INFO [trainer.py:765] (0/8) Epoch 19, batch 300, train_loss[loss=2.768, ArTop10Accuracy=0.7791, over 14661.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7874, over 9366.48 frames. ], batch size: 45, lr: 6.40e-03
+ 2024-08-06 13:42:21,068 INFO [trainer.py:765] (0/8) Epoch 19, batch 400, train_loss[loss=2.615, ArTop10Accuracy=0.8009, over 10503.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7879, over 10283.24 frames. ], batch size: 14, lr: 6.39e-03
+ 2024-08-06 13:43:44,955 INFO [trainer.py:765] (0/8) Epoch 19, batch 500, train_loss[loss=2.719, ArTop10Accuracy=0.7864, over 12033.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7883, over 10843.55 frames. ], batch size: 22, lr: 6.37e-03
+ 2024-08-06 13:45:16,681 INFO [trainer.py:765] (0/8) Epoch 19, batch 600, train_loss[loss=2.701, ArTop10Accuracy=0.7897, over 11457.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.788, over 11364.68 frames. ], batch size: 18, lr: 6.36e-03
+ 2024-08-06 13:46:48,323 INFO [trainer.py:765] (0/8) Epoch 19, batch 700, train_loss[loss=2.563, ArTop10Accuracy=0.8169, over 10290.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7874, over 11523.58 frames. ], batch size: 12, lr: 6.35e-03
+ 2024-08-06 13:48:11,883 INFO [trainer.py:765] (0/8) Epoch 19, batch 800, train_loss[loss=2.58, ArTop10Accuracy=0.8112, over 10350.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7865, over 11661.89 frames. ], batch size: 12, lr: 6.34e-03
+ 2024-08-06 13:49:27,256 INFO [trainer.py:765] (0/8) Epoch 19, batch 900, train_loss[loss=2.645, ArTop10Accuracy=0.8003, over 13215.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7877, over 11704.49 frames. ], batch size: 27, lr: 6.32e-03
+ 2024-08-06 13:50:40,654 INFO [trainer.py:803] (0/8) Computing validation loss
+ 2024-08-06 13:50:50,535 INFO [trainer.py:811] (0/8) Epoch 19, validation: loss=2.818, ArTop10Accuracy=0.7679, over 1827537.00 frames.
+ 2024-08-06 13:50:50,535 INFO [trainer.py:814] (0/8) Maximum memory allocated so far is 29524MB
+ 2024-08-06 13:50:51,491 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.371e+02 1.455e+02 1.550e+02 3.697e+02, threshold=2.909e+02, percent-clipped=0.2
+ 2024-08-06 13:50:52,917 INFO [trainer.py:765] (0/8) Epoch 19, batch 1000, train_loss[loss=2.755, ArTop10Accuracy=0.7868, over 13050.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7866, over 11906.01 frames. ], batch size: 27, lr: 6.31e-03
+ 2024-08-06 13:52:08,264 INFO [trainer.py:765] (0/8) Epoch 19, batch 1100, train_loss[loss=2.738, ArTop10Accuracy=0.7811, over 13623.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7853, over 11973.95 frames. ], batch size: 34, lr: 6.30e-03
+ 2024-08-06 13:53:22,314 INFO [trainer.py:765] (0/8) Epoch 19, batch 1200, train_loss[loss=2.835, ArTop10Accuracy=0.7578, over 12549.00 frames. ], tot_loss[loss=2.723, ArTop10Accuracy=0.7847, over 11876.88 frames. ], batch size: 103, lr: 6.28e-03
+ 2024-08-06 13:54:21,954 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 13:54:21,958 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-19.pt
+ 2024-08-06 13:56:12,902 INFO [trainer.py:765] (0/8) Epoch 20, batch 100, train_loss[loss=2.747, ArTop10Accuracy=0.7778, over 14466.00 frames. ], tot_loss[loss=2.71, ArTop10Accuracy=0.7868, over 4752.63 frames. ], batch size: 63, lr: 6.10e-03
+ 2024-08-06 13:57:42,495 INFO [trainer.py:765] (0/8) Epoch 20, batch 200, train_loss[loss=2.695, ArTop10Accuracy=0.7915, over 13728.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7873, over 7731.34 frames. ], batch size: 34, lr: 6.09e-03
+ 2024-08-06 13:59:15,430 INFO [trainer.py:765] (0/8) Epoch 20, batch 300, train_loss[loss=2.759, ArTop10Accuracy=0.775, over 14556.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7876, over 9388.49 frames. ], batch size: 45, lr: 6.08e-03
+ 2024-08-06 14:00:44,358 INFO [trainer.py:765] (0/8) Epoch 20, batch 400, train_loss[loss=2.515, ArTop10Accuracy=0.8197, over 10503.00 frames. ], tot_loss[loss=2.699, ArTop10Accuracy=0.7887, over 10287.72 frames. ], batch size: 14, lr: 6.07e-03
+ 2024-08-06 14:02:14,855 INFO [trainer.py:765] (0/8) Epoch 20, batch 500, train_loss[loss=2.665, ArTop10Accuracy=0.7981, over 12078.00 frames. ], tot_loss[loss=2.693, ArTop10Accuracy=0.7902, over 10843.66 frames. ], batch size: 22, lr: 6.06e-03
+ 2024-08-06 14:03:40,853 INFO [trainer.py:765] (0/8) Epoch 20, batch 600, train_loss[loss=2.729, ArTop10Accuracy=0.7816, over 11331.00 frames. ], tot_loss[loss=2.695, ArTop10Accuracy=0.7897, over 11359.53 frames. ], batch size: 18, lr: 6.04e-03
+ 2024-08-06 14:05:13,864 INFO [trainer.py:765] (0/8) Epoch 20, batch 700, train_loss[loss=2.575, ArTop10Accuracy=0.8131, over 10071.00 frames. ], tot_loss[loss=2.7, ArTop10Accuracy=0.789, over 11524.60 frames. ], batch size: 12, lr: 6.03e-03
+ 2024-08-06 14:05:30,792 INFO [optim.py:386] (0/8) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.365e+02 1.456e+02 1.550e+02 3.525e+02, threshold=2.913e+02, percent-clipped=0.1
+ 2024-08-06 14:06:34,509 INFO [trainer.py:765] (0/8) Epoch 20, batch 800, train_loss[loss=2.737, ArTop10Accuracy=0.7809, over 10116.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7881, over 11642.44 frames. ], batch size: 12, lr: 6.02e-03
+ 2024-08-06 14:07:50,944 INFO [trainer.py:765] (0/8) Epoch 20, batch 900, train_loss[loss=2.648, ArTop10Accuracy=0.796, over 12945.00 frames. ], tot_loss[loss=2.701, ArTop10Accuracy=0.7887, over 11672.81 frames. ], batch size: 27, lr: 6.01e-03
+ 2024-08-06 14:09:07,174 INFO [trainer.py:765] (0/8) Epoch 20, batch 1000, train_loss[loss=2.709, ArTop10Accuracy=0.7859, over 13047.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7876, over 11885.74 frames. ], batch size: 27, lr: 6.00e-03
+ 2024-08-06 14:10:21,209 INFO [trainer.py:765] (0/8) Epoch 20, batch 1100, train_loss[loss=2.764, ArTop10Accuracy=0.7796, over 13680.00 frames. ], tot_loss[loss=2.716, ArTop10Accuracy=0.7862, over 11935.78 frames. ], batch size: 34, lr: 5.99e-03
+ 2024-08-06 14:11:37,813 INFO [trainer.py:765] (0/8) Epoch 20, batch 1200, train_loss[loss=2.803, ArTop10Accuracy=0.7718, over 11997.00 frames. ], tot_loss[loss=2.718, ArTop10Accuracy=0.7858, over 11870.84 frames. ], batch size: 101, lr: 5.98e-03
+ 2024-08-06 14:12:37,148 INFO [trainer.py:650] (0/8) Reaches end of dataloader.
+ 2024-08-06 14:12:37,151 INFO [checkpoint.py:75] (0/8) Saving checkpoint to exp/valle/epoch-20.pt
+ 2024-08-06 14:12:43,011 INFO [trainer.py:1069] (0/8) Done!
libritts-r/log/log-train-2024-08-06-08-06-14-1 ADDED
@@ -0,0 +1,336 @@
+ 2024-08-06 08:06:14,314 INFO [trainer.py:870] (1/8) Training started
+ 2024-08-06 08:06:14,315 INFO [trainer.py:889] (1/8) Device: cuda:1
+ 2024-08-06 08:06:14,315 INFO [trainer.py:890] (1/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:06:14,315 INFO [trainer.py:892] (1/8) About to create model
+ 2024-08-06 08:06:15,030 INFO [trainer.py:899] (1/8) Number of model parameters: 367386628
+ 2024-08-06 08:06:16,712 INFO [trainer.py:914] (1/8) Using DDP
+ 2024-08-06 08:06:19,149 INFO [datamodule.py:427] (1/8) About to get train cuts
+ 2024-08-06 08:06:19,151 INFO [datamodule.py:434] (1/8) About to get dev cuts
+ 2024-08-06 08:06:19,152 INFO [datamodule.py:292] (1/8) Disable SpecAugment
+ 2024-08-06 08:06:19,152 INFO [datamodule.py:294] (1/8) About to create train dataset
+ 2024-08-06 08:06:19,153 INFO [datamodule.py:323] (1/8) Using DynamicBucketingSampler
+ 2024-08-06 08:06:19,769 INFO [datamodule.py:344] (1/8) About to create train dataloader
+ 2024-08-06 08:06:19,769 INFO [datamodule.py:367] (1/8) About to create dev dataset
+ 2024-08-06 08:06:20,100 INFO [datamodule.py:388] (1/8) About to create dev dataloader
+ 2024-08-06 08:08:02,125 INFO [trainer.py:765] (1/8) Epoch 1, batch 100, train_loss[loss=4.313, ArTop10Accuracy=0.499, over 14373.00 frames. ], tot_loss[loss=5.051, ArTop10Accuracy=0.3736, over 4747.16 frames. ], batch size: 63, lr: 2.25e-02
+ 2024-08-06 08:09:28,831 INFO [trainer.py:765] (1/8) Epoch 1, batch 200, train_loss[loss=4.082, ArTop10Accuracy=0.5339, over 13701.00 frames. ], tot_loss[loss=4.494, ArTop10Accuracy=0.4669, over 7740.47 frames. ], batch size: 34, lr: 3.00e-02
+ 2024-08-06 08:10:52,432 INFO [trainer.py:765] (1/8) Epoch 1, batch 300, train_loss[loss=3.827, ArTop10Accuracy=0.5819, over 14076.00 frames. ], tot_loss[loss=4.214, ArTop10Accuracy=0.5136, over 9378.41 frames. ], batch size: 44, lr: 3.00e-02
+ 2024-08-06 08:12:12,703 INFO [trainer.py:765] (1/8) Epoch 1, batch 400, train_loss[loss=3.646, ArTop10Accuracy=0.6151, over 10353.00 frames. ], tot_loss[loss=4.028, ArTop10Accuracy=0.5453, over 10284.75 frames. ], batch size: 14, lr: 3.00e-02
+ 2024-08-06 08:13:40,054 INFO [trainer.py:765] (1/8) Epoch 1, batch 500, train_loss[loss=3.622, ArTop10Accuracy=0.6179, over 12669.00 frames. ], tot_loss[loss=3.883, ArTop10Accuracy=0.5706, over 10856.75 frames. ], batch size: 23, lr: 2.99e-02
+ 2024-08-06 08:15:00,247 INFO [trainer.py:765] (1/8) Epoch 1, batch 600, train_loss[loss=3.602, ArTop10Accuracy=0.6197, over 11541.00 frames. ], tot_loss[loss=3.77, ArTop10Accuracy=0.5906, over 11363.67 frames. ], batch size: 18, lr: 2.99e-02
+ 2024-08-06 08:16:26,429 INFO [trainer.py:765] (1/8) Epoch 1, batch 700, train_loss[loss=3.496, ArTop10Accuracy=0.6398, over 10332.00 frames. ], tot_loss[loss=3.689, ArTop10Accuracy=0.6051, over 11510.86 frames. ], batch size: 12, lr: 2.99e-02
+ 2024-08-06 08:17:43,022 INFO [trainer.py:765] (1/8) Epoch 1, batch 800, train_loss[loss=3.526, ArTop10Accuracy=0.635, over 10014.00 frames. ], tot_loss[loss=3.625, ArTop10Accuracy=0.6167, over 11655.02 frames. ], batch size: 12, lr: 2.98e-02
+ 2024-08-06 08:18:56,155 INFO [trainer.py:765] (1/8) Epoch 1, batch 900, train_loss[loss=3.49, ArTop10Accuracy=0.6426, over 12882.00 frames. ], tot_loss[loss=3.567, ArTop10Accuracy=0.6274, over 11695.58 frames. ], batch size: 27, lr: 2.98e-02
+ 2024-08-06 08:20:12,867 INFO [trainer.py:765] (1/8) Epoch 1, batch 1000, train_loss[loss=3.434, ArTop10Accuracy=0.6517, over 13002.00 frames. ], tot_loss[loss=3.525, ArTop10Accuracy=0.635, over 11887.94 frames. ], batch size: 27, lr: 2.97e-02
+ 2024-08-06 08:20:13,547 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 9.300e+01 1.871e+02 2.675e+02 4.030e+02 9.119e+03, threshold=5.351e+02, percent-clipped=0.0
+ 2024-08-06 08:21:29,161 INFO [trainer.py:765] (1/8) Epoch 1, batch 1100, train_loss[loss=3.453, ArTop10Accuracy=0.6489, over 14007.00 frames. ], tot_loss[loss=3.488, ArTop10Accuracy=0.6419, over 11969.79 frames. ], batch size: 35, lr: 2.96e-02
+ 2024-08-06 08:22:45,417 INFO [trainer.py:765] (1/8) Epoch 1, batch 1200, train_loss[loss=3.476, ArTop10Accuracy=0.644, over 12039.00 frames. ], tot_loss[loss=3.463, ArTop10Accuracy=0.6461, over 11878.23 frames. ], batch size: 101, lr: 2.96e-02
+ 2024-08-06 08:23:45,310 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 08:25:36,244 INFO [trainer.py:765] (1/8) Epoch 2, batch 100, train_loss[loss=3.4, ArTop10Accuracy=0.6565, over 14508.00 frames. ], tot_loss[loss=3.423, ArTop10Accuracy=0.6528, over 4764.29 frames. ], batch size: 62, lr: 2.90e-02
+ 2024-08-06 08:26:58,962 INFO [trainer.py:765] (1/8) Epoch 2, batch 200, train_loss[loss=3.323, ArTop10Accuracy=0.6679, over 13725.00 frames. ], tot_loss[loss=3.385, ArTop10Accuracy=0.6599, over 7749.22 frames. ], batch size: 34, lr: 2.89e-02
+ 2024-08-06 08:28:25,539 INFO [trainer.py:765] (1/8) Epoch 2, batch 300, train_loss[loss=3.403, ArTop10Accuracy=0.6592, over 14277.00 frames. ], tot_loss[loss=3.366, ArTop10Accuracy=0.6635, over 9375.13 frames. ], batch size: 44, lr: 2.89e-02
+ 2024-08-06 08:29:48,644 INFO [trainer.py:765] (1/8) Epoch 2, batch 400, train_loss[loss=3.391, ArTop10Accuracy=0.6543, over 11046.00 frames. ], tot_loss[loss=3.354, ArTop10Accuracy=0.666, over 10294.61 frames. ], batch size: 15, lr: 2.88e-02
+ 2024-08-06 08:31:22,906 INFO [trainer.py:765] (1/8) Epoch 2, batch 500, train_loss[loss=3.266, ArTop10Accuracy=0.6837, over 12363.00 frames. ], tot_loss[loss=3.337, ArTop10Accuracy=0.6696, over 10867.29 frames. ], batch size: 22, lr: 2.87e-02
+ 2024-08-06 08:32:45,694 INFO [trainer.py:765] (1/8) Epoch 2, batch 600, train_loss[loss=3.294, ArTop10Accuracy=0.6813, over 11376.00 frames. ], tot_loss[loss=3.327, ArTop10Accuracy=0.6714, over 11379.06 frames. ], batch size: 18, lr: 2.86e-02
+ 2024-08-06 08:34:13,587 INFO [trainer.py:765] (1/8) Epoch 2, batch 700, train_loss[loss=3.278, ArTop10Accuracy=0.6789, over 10239.00 frames. ], tot_loss[loss=3.322, ArTop10Accuracy=0.6723, over 11515.89 frames. ], batch size: 12, lr: 2.85e-02
+ 2024-08-06 08:34:31,179 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 08:34:40,887 INFO [trainer.py:811] (1/8) Epoch 2, validation: loss=3.277, ArTop10Accuracy=0.6803, over 1827537.00 frames.
+ 2024-08-06 08:34:40,888 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 31570MB
+ 2024-08-06 08:34:41,706 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 1.592e+02 2.200e+02 3.344e+02 2.949e+03, threshold=4.400e+02, percent-clipped=8.6
+ 2024-08-06 08:35:39,883 INFO [trainer.py:765] (1/8) Epoch 2, batch 800, train_loss[loss=3.314, ArTop10Accuracy=0.6699, over 9348.00 frames. ], tot_loss[loss=3.319, ArTop10Accuracy=0.673, over 11636.96 frames. ], batch size: 11, lr: 2.84e-02
+ 2024-08-06 08:36:56,377 INFO [trainer.py:765] (1/8) Epoch 2, batch 900, train_loss[loss=3.373, ArTop10Accuracy=0.6616, over 12846.00 frames. ], tot_loss[loss=3.305, ArTop10Accuracy=0.6758, over 11683.18 frames. ], batch size: 27, lr: 2.83e-02
+ 2024-08-06 08:38:10,518 INFO [trainer.py:765] (1/8) Epoch 2, batch 1000, train_loss[loss=3.233, ArTop10Accuracy=0.6893, over 12870.00 frames. ], tot_loss[loss=3.296, ArTop10Accuracy=0.6774, over 11873.81 frames. ], batch size: 27, lr: 2.82e-02
+ 2024-08-06 08:39:25,065 INFO [trainer.py:765] (1/8) Epoch 2, batch 1100, train_loss[loss=3.28, ArTop10Accuracy=0.6837, over 13569.00 frames. ], tot_loss[loss=3.291, ArTop10Accuracy=0.6783, over 11963.75 frames. ], batch size: 34, lr: 2.81e-02
+ 2024-08-06 08:40:38,225 INFO [trainer.py:765] (1/8) Epoch 2, batch 1200, train_loss[loss=3.337, ArTop10Accuracy=0.6672, over 12903.00 frames. ], tot_loss[loss=3.281, ArTop10Accuracy=0.6802, over 11861.92 frames. ], batch size: 101, lr: 2.80e-02
+ 2024-08-06 08:41:38,205 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 08:43:36,655 INFO [trainer.py:765] (1/8) Epoch 3, batch 100, train_loss[loss=3.334, ArTop10Accuracy=0.6681, over 14691.00 frames. ], tot_loss[loss=3.254, ArTop10Accuracy=0.6846, over 4778.14 frames. ], batch size: 62, lr: 2.67e-02
+ 2024-08-06 08:45:10,505 INFO [trainer.py:765] (1/8) Epoch 3, batch 200, train_loss[loss=3.187, ArTop10Accuracy=0.6983, over 13692.00 frames. ], tot_loss[loss=3.223, ArTop10Accuracy=0.6906, over 7780.48 frames. ], batch size: 34, lr: 2.66e-02
+ 2024-08-06 08:46:29,264 INFO [trainer.py:765] (1/8) Epoch 3, batch 300, train_loss[loss=3.197, ArTop10Accuracy=0.7005, over 14133.00 frames. ], tot_loss[loss=3.206, ArTop10Accuracy=0.6938, over 9395.78 frames. ], batch size: 44, lr: 2.64e-02
+ 2024-08-06 08:48:04,223 INFO [trainer.py:765] (1/8) Epoch 3, batch 400, train_loss[loss=3.106, ArTop10Accuracy=0.7194, over 10431.00 frames. ], tot_loss[loss=3.191, ArTop10Accuracy=0.6968, over 10278.07 frames. ], batch size: 14, lr: 2.63e-02
+ 2024-08-06 08:48:40,887 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 9.282e+01 1.561e+02 1.981e+02 2.686e+02 1.768e+03, threshold=3.962e+02, percent-clipped=7.6
+ 2024-08-06 08:49:25,548 INFO [trainer.py:765] (1/8) Epoch 3, batch 500, train_loss[loss=3.143, ArTop10Accuracy=0.7066, over 12162.00 frames. ], tot_loss[loss=3.171, ArTop10Accuracy=0.7005, over 10856.70 frames. ], batch size: 22, lr: 2.62e-02
+ 2024-08-06 08:51:00,483 INFO [trainer.py:765] (1/8) Epoch 3, batch 600, train_loss[loss=3.082, ArTop10Accuracy=0.723, over 11733.00 frames. ], tot_loss[loss=3.156, ArTop10Accuracy=0.7034, over 11384.37 frames. ], batch size: 18, lr: 2.61e-02
+ 2024-08-06 08:52:31,624 INFO [trainer.py:765] (1/8) Epoch 3, batch 700, train_loss[loss=3.085, ArTop10Accuracy=0.721, over 10002.00 frames. ], tot_loss[loss=3.15, ArTop10Accuracy=0.7044, over 11517.63 frames. ], batch size: 12, lr: 2.60e-02
+ 2024-08-06 08:53:57,395 INFO [trainer.py:765] (1/8) Epoch 3, batch 800, train_loss[loss=3.078, ArTop10Accuracy=0.7226, over 10086.00 frames. ], tot_loss[loss=3.142, ArTop10Accuracy=0.7064, over 11639.54 frames. ], batch size: 12, lr: 2.59e-02
+ 2024-08-06 08:55:15,124 INFO [trainer.py:765] (1/8) Epoch 3, batch 900, train_loss[loss=3.066, ArTop10Accuracy=0.7258, over 12849.00 frames. ], tot_loss[loss=3.12, ArTop10Accuracy=0.7104, over 11684.21 frames. ], batch size: 27, lr: 2.57e-02
+ 2024-08-06 08:56:31,564 INFO [trainer.py:765] (1/8) Epoch 3, batch 1000, train_loss[loss=3.051, ArTop10Accuracy=0.7245, over 12855.00 frames. ], tot_loss[loss=3.112, ArTop10Accuracy=0.7118, over 11895.25 frames. ], batch size: 27, lr: 2.56e-02
+ 2024-08-06 08:57:46,510 INFO [trainer.py:765] (1/8) Epoch 3, batch 1100, train_loss[loss=3.066, ArTop10Accuracy=0.7182, over 13584.00 frames. ], tot_loss[loss=3.104, ArTop10Accuracy=0.7134, over 11977.87 frames. ], batch size: 34, lr: 2.55e-02
+ 2024-08-06 08:59:01,403 INFO [trainer.py:765] (1/8) Epoch 3, batch 1200, train_loss[loss=3.159, ArTop10Accuracy=0.702, over 11196.00 frames. ], tot_loss[loss=3.095, ArTop10Accuracy=0.715, over 11868.07 frames. ], batch size: 103, lr: 2.54e-02
+ 2024-08-06 09:00:01,941 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 09:01:50,745 INFO [trainer.py:765] (1/8) Epoch 4, batch 100, train_loss[loss=3.027, ArTop10Accuracy=0.7318, over 14526.00 frames. ], tot_loss[loss=3.07, ArTop10Accuracy=0.7198, over 4767.61 frames. ], batch size: 62, lr: 2.38e-02
+ 2024-08-06 09:02:52,864 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 09:03:02,383 INFO [trainer.py:811] (1/8) Epoch 4, validation: loss=2.997, ArTop10Accuracy=0.7338, over 1827537.00 frames.
+ 2024-08-06 09:03:02,384 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 31570MB
+ 2024-08-06 09:03:03,368 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.499e+02 1.782e+02 2.273e+02 1.100e+03, threshold=3.565e+02, percent-clipped=4.7
+ 2024-08-06 09:03:29,277 INFO [trainer.py:765] (1/8) Epoch 4, batch 200, train_loss[loss=2.974, ArTop10Accuracy=0.7396, over 13569.00 frames. ], tot_loss[loss=3.051, ArTop10Accuracy=0.7232, over 7747.56 frames. ], batch size: 34, lr: 2.37e-02
+ 2024-08-06 09:05:01,738 INFO [trainer.py:765] (1/8) Epoch 4, batch 300, train_loss[loss=3.089, ArTop10Accuracy=0.7124, over 14157.00 frames. ], tot_loss[loss=3.041, ArTop10Accuracy=0.7252, over 9375.36 frames. ], batch size: 44, lr: 2.36e-02
+ 2024-08-06 09:06:28,155 INFO [trainer.py:765] (1/8) Epoch 4, batch 400, train_loss[loss=2.98, ArTop10Accuracy=0.7353, over 10224.00 frames. ], tot_loss[loss=3.034, ArTop10Accuracy=0.7266, over 10275.30 frames. ], batch size: 14, lr: 2.34e-02
+ 2024-08-06 09:08:01,929 INFO [trainer.py:765] (1/8) Epoch 4, batch 500, train_loss[loss=2.934, ArTop10Accuracy=0.7454, over 12306.00 frames. ], tot_loss[loss=3.025, ArTop10Accuracy=0.7283, over 10830.06 frames. ], batch size: 22, lr: 2.33e-02
+ 2024-08-06 09:09:28,546 INFO [trainer.py:765] (1/8) Epoch 4, batch 600, train_loss[loss=3.117, ArTop10Accuracy=0.7081, over 11433.00 frames. ], tot_loss[loss=3.018, ArTop10Accuracy=0.7296, over 11350.44 frames. ], batch size: 18, lr: 2.32e-02
+ 2024-08-06 09:10:59,871 INFO [trainer.py:765] (1/8) Epoch 4, batch 700, train_loss[loss=3.038, ArTop10Accuracy=0.7271, over 10323.00 frames. ], tot_loss[loss=3.024, ArTop10Accuracy=0.7283, over 11501.78 frames. ], batch size: 12, lr: 2.31e-02
+ 2024-08-06 09:12:17,518 INFO [trainer.py:765] (1/8) Epoch 4, batch 800, train_loss[loss=3.003, ArTop10Accuracy=0.7298, over 9342.00 frames. ], tot_loss[loss=3.023, ArTop10Accuracy=0.7284, over 11621.48 frames. ], batch size: 11, lr: 2.30e-02
+ 2024-08-06 09:13:33,218 INFO [trainer.py:765] (1/8) Epoch 4, batch 900, train_loss[loss=3.1, ArTop10Accuracy=0.7141, over 12897.00 frames. ], tot_loss[loss=3.016, ArTop10Accuracy=0.7296, over 11659.32 frames. ], batch size: 27, lr: 2.29e-02
+ 2024-08-06 09:14:47,526 INFO [trainer.py:765] (1/8) Epoch 4, batch 1000, train_loss[loss=3.042, ArTop10Accuracy=0.7206, over 12765.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7304, over 11873.50 frames. ], batch size: 27, lr: 2.28e-02
+ 2024-08-06 09:16:02,987 INFO [trainer.py:765] (1/8) Epoch 4, batch 1100, train_loss[loss=3.001, ArTop10Accuracy=0.7331, over 13710.00 frames. ], tot_loss[loss=3.014, ArTop10Accuracy=0.7303, over 11946.58 frames. ], batch size: 34, lr: 2.26e-02
+ 2024-08-06 09:16:53,297 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.440e+02 1.636e+02 1.968e+02 7.702e+02, threshold=3.273e+02, percent-clipped=1.3
+ 2024-08-06 09:17:18,350 INFO [trainer.py:765] (1/8) Epoch 4, batch 1200, train_loss[loss=3.076, ArTop10Accuracy=0.7165, over 12258.00 frames. ], tot_loss[loss=3.011, ArTop10Accuracy=0.7307, over 11834.91 frames. ], batch size: 101, lr: 2.25e-02
+ 2024-08-06 09:18:17,461 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 09:20:17,177 INFO [trainer.py:765] (1/8) Epoch 5, batch 100, train_loss[loss=2.968, ArTop10Accuracy=0.7418, over 14499.00 frames. ], tot_loss[loss=2.991, ArTop10Accuracy=0.7345, over 4753.20 frames. ], batch size: 62, lr: 2.10e-02
+ 2024-08-06 09:21:52,300 INFO [trainer.py:765] (1/8) Epoch 5, batch 200, train_loss[loss=3.047, ArTop10Accuracy=0.7255, over 13533.00 frames. ], tot_loss[loss=2.983, ArTop10Accuracy=0.7362, over 7733.64 frames. ], batch size: 34, lr: 2.09e-02
+ 2024-08-06 09:23:19,245 INFO [trainer.py:765] (1/8) Epoch 5, batch 300, train_loss[loss=3, ArTop10Accuracy=0.7324, over 14067.00 frames. ], tot_loss[loss=2.971, ArTop10Accuracy=0.7382, over 9372.02 frames. ], batch size: 44, lr: 2.08e-02
+ 2024-08-06 09:24:53,543 INFO [trainer.py:765] (1/8) Epoch 5, batch 400, train_loss[loss=2.851, ArTop10Accuracy=0.7643, over 10191.00 frames. ], tot_loss[loss=2.965, ArTop10Accuracy=0.7392, over 10286.55 frames. ], batch size: 14, lr: 2.07e-02
+ 2024-08-06 09:26:19,424 INFO [trainer.py:765] (1/8) Epoch 5, batch 500, train_loss[loss=2.986, ArTop10Accuracy=0.7376, over 12162.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7397, over 10872.84 frames. ], batch size: 22, lr: 2.06e-02
+ 2024-08-06 09:27:49,543 INFO [trainer.py:765] (1/8) Epoch 5, batch 600, train_loss[loss=2.903, ArTop10Accuracy=0.7532, over 11931.00 frames. ], tot_loss[loss=2.964, ArTop10Accuracy=0.7397, over 11399.96 frames. ], batch size: 19, lr: 2.05e-02
+ 2024-08-06 09:29:21,676 INFO [trainer.py:765] (1/8) Epoch 5, batch 700, train_loss[loss=2.898, ArTop10Accuracy=0.7536, over 9321.00 frames. ], tot_loss[loss=2.972, ArTop10Accuracy=0.7382, over 11531.87 frames. ], batch size: 11, lr: 2.04e-02
+ 2024-08-06 09:30:44,699 INFO [trainer.py:765] (1/8) Epoch 5, batch 800, train_loss[loss=3.077, ArTop10Accuracy=0.7202, over 9351.00 frames. ], tot_loss[loss=2.974, ArTop10Accuracy=0.738, over 11643.11 frames. ], batch size: 11, lr: 2.03e-02
+ 2024-08-06 09:31:51,245 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 09:32:00,761 INFO [trainer.py:811] (1/8) Epoch 5, validation: loss=2.926, ArTop10Accuracy=0.7466, over 1827537.00 frames.
+ 2024-08-06 09:32:00,761 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 31570MB
+ 2024-08-06 09:32:01,712 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.349e+02 1.525e+02 1.806e+02 1.007e+03, threshold=3.049e+02, percent-clipped=2.3
+ 2024-08-06 09:32:10,557 INFO [trainer.py:765] (1/8) Epoch 5, batch 900, train_loss[loss=2.939, ArTop10Accuracy=0.7484, over 12774.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7404, over 11677.07 frames. ], batch size: 27, lr: 2.02e-02
+ 2024-08-06 09:33:27,329 INFO [trainer.py:765] (1/8) Epoch 5, batch 1000, train_loss[loss=3, ArTop10Accuracy=0.7307, over 13125.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7405, over 11870.34 frames. ], batch size: 28, lr: 2.01e-02
+ 2024-08-06 09:34:42,306 INFO [trainer.py:765] (1/8) Epoch 5, batch 1100, train_loss[loss=2.929, ArTop10Accuracy=0.7503, over 13596.00 frames. ], tot_loss[loss=2.964, ArTop10Accuracy=0.7399, over 11943.56 frames. ], batch size: 34, lr: 2.00e-02
+ 2024-08-06 09:35:56,339 INFO [trainer.py:765] (1/8) Epoch 5, batch 1200, train_loss[loss=3.058, ArTop10Accuracy=0.7186, over 13242.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7399, over 11871.80 frames. ], batch size: 101, lr: 1.99e-02
+ 2024-08-06 09:36:55,360 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 09:38:52,668 INFO [trainer.py:765] (1/8) Epoch 6, batch 100, train_loss[loss=2.962, ArTop10Accuracy=0.7441, over 14187.00 frames. ], tot_loss[loss=2.946, ArTop10Accuracy=0.7427, over 4761.42 frames. ], batch size: 62, lr: 1.85e-02
+ 2024-08-06 09:40:19,840 INFO [trainer.py:765] (1/8) Epoch 6, batch 200, train_loss[loss=2.889, ArTop10Accuracy=0.7544, over 13533.00 frames. ], tot_loss[loss=2.934, ArTop10Accuracy=0.7452, over 7753.64 frames. ], batch size: 34, lr: 1.84e-02
+ 2024-08-06 09:41:52,971 INFO [trainer.py:765] (1/8) Epoch 6, batch 300, train_loss[loss=2.977, ArTop10Accuracy=0.7367, over 14202.00 frames. ], tot_loss[loss=2.928, ArTop10Accuracy=0.7462, over 9400.74 frames. ], batch size: 44, lr: 1.83e-02
+ 2024-08-06 09:43:17,833 INFO [trainer.py:765] (1/8) Epoch 6, batch 400, train_loss[loss=2.822, ArTop10Accuracy=0.7726, over 10521.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.747, over 10303.55 frames. ], batch size: 14, lr: 1.83e-02
+ 2024-08-06 09:44:54,134 INFO [trainer.py:765] (1/8) Epoch 6, batch 500, train_loss[loss=2.937, ArTop10Accuracy=0.7418, over 12210.00 frames. ], tot_loss[loss=2.918, ArTop10Accuracy=0.7481, over 10854.68 frames. ], batch size: 22, lr: 1.82e-02
+ 2024-08-06 09:46:22,879 INFO [trainer.py:765] (1/8) Epoch 6, batch 600, train_loss[loss=2.901, ArTop10Accuracy=0.7563, over 11883.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.7468, over 11386.09 frames. ], batch size: 19, lr: 1.81e-02
+ 2024-08-06 09:46:37,225 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.339e+02 1.480e+02 1.701e+02 7.506e+02, threshold=2.959e+02, percent-clipped=1.1
+ 2024-08-06 09:47:57,875 INFO [trainer.py:765] (1/8) Epoch 6, batch 700, train_loss[loss=2.899, ArTop10Accuracy=0.7566, over 10038.00 frames. ], tot_loss[loss=2.929, ArTop10Accuracy=0.7462, over 11537.98 frames. ], batch size: 12, lr: 1.80e-02
+ 2024-08-06 09:49:15,961 INFO [trainer.py:765] (1/8) Epoch 6, batch 800, train_loss[loss=2.845, ArTop10Accuracy=0.7569, over 9402.00 frames. ], tot_loss[loss=2.932, ArTop10Accuracy=0.7455, over 11662.00 frames. ], batch size: 11, lr: 1.79e-02
+ 2024-08-06 09:50:32,141 INFO [trainer.py:765] (1/8) Epoch 6, batch 900, train_loss[loss=2.907, ArTop10Accuracy=0.748, over 12954.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7466, over 11709.57 frames. ], batch size: 27, lr: 1.78e-02
+ 2024-08-06 09:51:47,303 INFO [trainer.py:765] (1/8) Epoch 6, batch 1000, train_loss[loss=2.976, ArTop10Accuracy=0.7375, over 12882.00 frames. ], tot_loss[loss=2.929, ArTop10Accuracy=0.7461, over 11880.37 frames. ], batch size: 27, lr: 1.77e-02
+ 2024-08-06 09:53:00,927 INFO [trainer.py:765] (1/8) Epoch 6, batch 1100, train_loss[loss=2.896, ArTop10Accuracy=0.7569, over 13659.00 frames. ], tot_loss[loss=2.929, ArTop10Accuracy=0.7462, over 11936.22 frames. ], batch size: 34, lr: 1.77e-02
+ 2024-08-06 09:54:14,343 INFO [trainer.py:765] (1/8) Epoch 6, batch 1200, train_loss[loss=3.025, ArTop10Accuracy=0.7304, over 12231.00 frames. ], tot_loss[loss=2.928, ArTop10Accuracy=0.7461, over 11865.74 frames. ], batch size: 101, lr: 1.76e-02
+ 2024-08-06 09:55:13,177 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 09:57:06,705 INFO [trainer.py:765] (1/8) Epoch 7, batch 100, train_loss[loss=3.002, ArTop10Accuracy=0.7374, over 14385.00 frames. ], tot_loss[loss=2.913, ArTop10Accuracy=0.7486, over 4745.63 frames. ], batch size: 62, lr: 1.64e-02
+ 2024-08-06 09:58:39,429 INFO [trainer.py:765] (1/8) Epoch 7, batch 200, train_loss[loss=2.902, ArTop10Accuracy=0.7502, over 13656.00 frames. ], tot_loss[loss=2.9, ArTop10Accuracy=0.7515, over 7751.20 frames. ], batch size: 34, lr: 1.64e-02
+ 2024-08-06 10:00:06,090 INFO [trainer.py:765] (1/8) Epoch 7, batch 300, train_loss[loss=2.908, ArTop10Accuracy=0.755, over 14034.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7522, over 9344.26 frames. ], batch size: 44, lr: 1.63e-02
+ 2024-08-06 10:00:40,514 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 10:00:50,245 INFO [trainer.py:811] (1/8) Epoch 7, validation: loss=2.88, ArTop10Accuracy=0.7554, over 1827537.00 frames.
+ 2024-08-06 10:00:50,246 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 31570MB
+ 2024-08-06 10:00:50,983 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.286e+02 1.429e+02 1.605e+02 1.020e+03, threshold=2.857e+02, percent-clipped=1.5
+ 2024-08-06 10:01:49,123 INFO [trainer.py:765] (1/8) Epoch 7, batch 400, train_loss[loss=2.808, ArTop10Accuracy=0.769, over 10485.00 frames. ], tot_loss[loss=2.893, ArTop10Accuracy=0.7531, over 10278.27 frames. ], batch size: 14, lr: 1.62e-02
+ 2024-08-06 10:03:21,463 INFO [trainer.py:765] (1/8) Epoch 7, batch 500, train_loss[loss=2.828, ArTop10Accuracy=0.7612, over 12168.00 frames. ], tot_loss[loss=2.889, ArTop10Accuracy=0.754, over 10832.15 frames. ], batch size: 22, lr: 1.61e-02
+ 2024-08-06 10:04:51,889 INFO [trainer.py:765] (1/8) Epoch 7, batch 600, train_loss[loss=2.795, ArTop10Accuracy=0.7731, over 11178.00 frames. ], tot_loss[loss=2.888, ArTop10Accuracy=0.7539, over 11361.58 frames. ], batch size: 18, lr: 1.61e-02
+ 2024-08-06 10:06:25,117 INFO [trainer.py:765] (1/8) Epoch 7, batch 700, train_loss[loss=2.795, ArTop10Accuracy=0.766, over 9279.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7525, over 11491.31 frames. ], batch size: 11, lr: 1.60e-02
+ 2024-08-06 10:07:46,955 INFO [trainer.py:765] (1/8) Epoch 7, batch 800, train_loss[loss=2.803, ArTop10Accuracy=0.7747, over 10359.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7525, over 11632.01 frames. ], batch size: 12, lr: 1.59e-02
+ 2024-08-06 10:09:02,828 INFO [trainer.py:765] (1/8) Epoch 7, batch 900, train_loss[loss=2.792, ArTop10Accuracy=0.7744, over 12621.00 frames. ], tot_loss[loss=2.892, ArTop10Accuracy=0.7532, over 11686.51 frames. ], batch size: 27, lr: 1.59e-02
+ 2024-08-06 10:10:19,642 INFO [trainer.py:765] (1/8) Epoch 7, batch 1000, train_loss[loss=2.938, ArTop10Accuracy=0.7412, over 13341.00 frames. ], tot_loss[loss=2.894, ArTop10Accuracy=0.7527, over 11857.32 frames. ], batch size: 28, lr: 1.58e-02
+ 2024-08-06 10:11:35,214 INFO [trainer.py:765] (1/8) Epoch 7, batch 1100, train_loss[loss=2.966, ArTop10Accuracy=0.7316, over 13683.00 frames. ], tot_loss[loss=2.902, ArTop10Accuracy=0.7512, over 11942.87 frames. ], batch size: 34, lr: 1.57e-02
+ 2024-08-06 10:12:48,210 INFO [trainer.py:765] (1/8) Epoch 7, batch 1200, train_loss[loss=3.03, ArTop10Accuracy=0.7313, over 12375.00 frames. ], tot_loss[loss=2.901, ArTop10Accuracy=0.7515, over 11863.93 frames. ], batch size: 101, lr: 1.57e-02
+ 2024-08-06 10:13:46,697 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 10:15:03,607 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.283e+02 1.410e+02 1.601e+02 1.017e+03, threshold=2.820e+02, percent-clipped=0.9
+ 2024-08-06 10:15:40,827 INFO [trainer.py:765] (1/8) Epoch 8, batch 100, train_loss[loss=2.922, ArTop10Accuracy=0.7479, over 14244.00 frames. ], tot_loss[loss=2.885, ArTop10Accuracy=0.7541, over 4746.16 frames. ], batch size: 62, lr: 1.47e-02
+ 2024-08-06 10:17:12,868 INFO [trainer.py:765] (1/8) Epoch 8, batch 200, train_loss[loss=2.894, ArTop10Accuracy=0.7525, over 13881.00 frames. ], tot_loss[loss=2.873, ArTop10Accuracy=0.7565, over 7754.49 frames. ], batch size: 34, lr: 1.46e-02
+ 2024-08-06 10:18:37,904 INFO [trainer.py:765] (1/8) Epoch 8, batch 300, train_loss[loss=2.901, ArTop10Accuracy=0.7532, over 14163.00 frames. ], tot_loss[loss=2.87, ArTop10Accuracy=0.7571, over 9382.60 frames. ], batch size: 44, lr: 1.46e-02
+ 2024-08-06 10:20:06,348 INFO [trainer.py:765] (1/8) Epoch 8, batch 400, train_loss[loss=2.749, ArTop10Accuracy=0.7818, over 10323.00 frames. ], tot_loss[loss=2.864, ArTop10Accuracy=0.7585, over 10284.68 frames. ], batch size: 14, lr: 1.45e-02
+ 2024-08-06 10:21:32,417 INFO [trainer.py:765] (1/8) Epoch 8, batch 500, train_loss[loss=2.842, ArTop10Accuracy=0.7659, over 12006.00 frames. ], tot_loss[loss=2.864, ArTop10Accuracy=0.7587, over 10864.33 frames. ], batch size: 22, lr: 1.45e-02
+ 2024-08-06 10:23:00,980 INFO [trainer.py:765] (1/8) Epoch 8, batch 600, train_loss[loss=2.892, ArTop10Accuracy=0.7495, over 11358.00 frames. ], tot_loss[loss=2.867, ArTop10Accuracy=0.7579, over 11374.38 frames. ], batch size: 18, lr: 1.44e-02
+ 2024-08-06 10:24:37,794 INFO [trainer.py:765] (1/8) Epoch 8, batch 700, train_loss[loss=2.749, ArTop10Accuracy=0.7816, over 10002.00 frames. ], tot_loss[loss=2.87, ArTop10Accuracy=0.7573, over 11528.07 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:25:56,091 INFO [trainer.py:765] (1/8) Epoch 8, batch 800, train_loss[loss=2.752, ArTop10Accuracy=0.775, over 9339.00 frames. ], tot_loss[loss=2.875, ArTop10Accuracy=0.7566, over 11620.96 frames. ], batch size: 11, lr: 1.43e-02
+ 2024-08-06 10:27:12,249 INFO [trainer.py:765] (1/8) Epoch 8, batch 900, train_loss[loss=2.869, ArTop10Accuracy=0.7567, over 13035.00 frames. ], tot_loss[loss=2.867, ArTop10Accuracy=0.7579, over 11677.37 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:28:25,269 INFO [trainer.py:765] (1/8) Epoch 8, batch 1000, train_loss[loss=2.85, ArTop10Accuracy=0.7617, over 12864.00 frames. ], tot_loss[loss=2.872, ArTop10Accuracy=0.7572, over 11871.35 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:29:07,161 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 10:29:16,830 INFO [trainer.py:811] (1/8) Epoch 8, validation: loss=2.858, ArTop10Accuracy=0.7594, over 1827537.00 frames.
+ 2024-08-06 10:29:16,831 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 31570MB
+ 2024-08-06 10:29:17,496 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.275e+02 1.390e+02 1.547e+02 3.717e+02, threshold=2.781e+02, percent-clipped=0.7
+ 2024-08-06 10:29:51,738 INFO [trainer.py:765] (1/8) Epoch 8, batch 1100, train_loss[loss=2.851, ArTop10Accuracy=0.7608, over 13545.00 frames. ], tot_loss[loss=2.88, ArTop10Accuracy=0.7554, over 11946.85 frames. ], batch size: 34, lr: 1.41e-02
+ 2024-08-06 10:31:05,952 INFO [trainer.py:765] (1/8) Epoch 8, batch 1200, train_loss[loss=3.013, ArTop10Accuracy=0.7242, over 12375.00 frames. ], tot_loss[loss=2.878, ArTop10Accuracy=0.756, over 11877.61 frames. ], batch size: 101, lr: 1.40e-02
+ 2024-08-06 10:32:05,554 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 10:34:01,262 INFO [trainer.py:765] (1/8) Epoch 9, batch 100, train_loss[loss=2.974, ArTop10Accuracy=0.7336, over 14712.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7598, over 4756.21 frames. ], batch size: 62, lr: 1.32e-02
+ 2024-08-06 10:35:31,778 INFO [trainer.py:765] (1/8) Epoch 9, batch 200, train_loss[loss=2.743, ArTop10Accuracy=0.782, over 13812.00 frames. ], tot_loss[loss=2.845, ArTop10Accuracy=0.7618, over 7742.43 frames. ], batch size: 34, lr: 1.32e-02
+ 2024-08-06 10:36:57,933 INFO [trainer.py:765] (1/8) Epoch 9, batch 300, train_loss[loss=2.905, ArTop10Accuracy=0.7537, over 14358.00 frames. ], tot_loss[loss=2.842, ArTop10Accuracy=0.7627, over 9372.16 frames. ], batch size: 45, lr: 1.31e-02
+ 2024-08-06 10:38:32,702 INFO [trainer.py:765] (1/8) Epoch 9, batch 400, train_loss[loss=2.77, ArTop10Accuracy=0.7783, over 10809.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.7625, over 10272.39 frames. ], batch size: 15, lr: 1.31e-02
+ 2024-08-06 10:39:59,262 INFO [trainer.py:765] (1/8) Epoch 9, batch 500, train_loss[loss=2.804, ArTop10Accuracy=0.7676, over 12750.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7636, over 10841.66 frames. ], batch size: 23, lr: 1.30e-02
+ 2024-08-06 10:41:29,694 INFO [trainer.py:765] (1/8) Epoch 9, batch 600, train_loss[loss=2.872, ArTop10Accuracy=0.7589, over 11277.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7634, over 11363.34 frames. ], batch size: 18, lr: 1.30e-02
+ 2024-08-06 10:42:58,446 INFO [trainer.py:765] (1/8) Epoch 9, batch 700, train_loss[loss=2.794, ArTop10Accuracy=0.7749, over 10224.00 frames. ], tot_loss[loss=2.847, ArTop10Accuracy=0.7615, over 11506.27 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:44:02,958 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.253e+02 1.352e+02 1.493e+02 7.010e+02, threshold=2.704e+02, percent-clipped=0.6
+ 2024-08-06 10:44:19,675 INFO [trainer.py:765] (1/8) Epoch 9, batch 800, train_loss[loss=2.758, ArTop10Accuracy=0.7845, over 9348.00 frames. ], tot_loss[loss=2.852, ArTop10Accuracy=0.7607, over 11633.24 frames. ], batch size: 11, lr: 1.29e-02
+ 2024-08-06 10:45:35,725 INFO [trainer.py:765] (1/8) Epoch 9, batch 900, train_loss[loss=2.776, ArTop10Accuracy=0.7739, over 12975.00 frames. ], tot_loss[loss=2.847, ArTop10Accuracy=0.7616, over 11688.38 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:46:51,277 INFO [trainer.py:765] (1/8) Epoch 9, batch 1000, train_loss[loss=2.904, ArTop10Accuracy=0.7464, over 12858.00 frames. ], tot_loss[loss=2.852, ArTop10Accuracy=0.7605, over 11884.24 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:48:06,253 INFO [trainer.py:765] (1/8) Epoch 9, batch 1100, train_loss[loss=2.846, ArTop10Accuracy=0.7651, over 13599.00 frames. ], tot_loss[loss=2.856, ArTop10Accuracy=0.7598, over 11944.83 frames. ], batch size: 34, lr: 1.28e-02
+ 2024-08-06 10:49:21,058 INFO [trainer.py:765] (1/8) Epoch 9, batch 1200, train_loss[loss=2.94, ArTop10Accuracy=0.7412, over 12285.00 frames. ], tot_loss[loss=2.852, ArTop10Accuracy=0.7606, over 11849.65 frames. ], batch size: 103, lr: 1.27e-02
+ 2024-08-06 10:50:22,395 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 10:52:12,332 INFO [trainer.py:765] (1/8) Epoch 10, batch 100, train_loss[loss=2.84, ArTop10Accuracy=0.7599, over 14454.00 frames. ], tot_loss[loss=2.84, ArTop10Accuracy=0.7628, over 4762.71 frames. ], batch size: 62, lr: 1.20e-02
+ 2024-08-06 10:53:44,591 INFO [trainer.py:765] (1/8) Epoch 10, batch 200, train_loss[loss=2.841, ArTop10Accuracy=0.7632, over 13776.00 frames. ], tot_loss[loss=2.832, ArTop10Accuracy=0.7643, over 7751.20 frames. ], batch size: 34, lr: 1.20e-02
+ 2024-08-06 10:55:08,096 INFO [trainer.py:765] (1/8) Epoch 10, batch 300, train_loss[loss=2.908, ArTop10Accuracy=0.7502, over 13872.00 frames. ], tot_loss[loss=2.828, ArTop10Accuracy=0.765, over 9380.76 frames. ], batch size: 44, lr: 1.19e-02
+ 2024-08-06 10:56:41,181 INFO [trainer.py:765] (1/8) Epoch 10, batch 400, train_loss[loss=2.775, ArTop10Accuracy=0.7748, over 10197.00 frames. ], tot_loss[loss=2.824, ArTop10Accuracy=0.7658, over 10286.84 frames. ], batch size: 14, lr: 1.19e-02
+ 2024-08-06 10:58:04,944 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 10:58:14,557 INFO [trainer.py:811] (1/8) Epoch 10, validation: loss=2.842, ArTop10Accuracy=0.7624, over 1827537.00 frames.
+ 2024-08-06 10:58:14,557 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 31570MB
+ 2024-08-06 10:58:15,576 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.228e+02 1.320e+02 1.458e+02 6.096e+02, threshold=2.641e+02, percent-clipped=0.6
+ 2024-08-06 10:58:15,583 INFO [trainer.py:765] (1/8) Epoch 10, batch 500, train_loss[loss=2.86, ArTop10Accuracy=0.7601, over 12069.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.7664, over 10851.85 frames. ], batch size: 22, lr: 1.19e-02
+ 2024-08-06 10:59:42,821 INFO [trainer.py:765] (1/8) Epoch 10, batch 600, train_loss[loss=2.747, ArTop10Accuracy=0.7847, over 11316.00 frames. ], tot_loss[loss=2.823, ArTop10Accuracy=0.7663, over 11362.02 frames. ], batch size: 18, lr: 1.18e-02
+ 2024-08-06 11:01:18,113 INFO [trainer.py:765] (1/8) Epoch 10, batch 700, train_loss[loss=2.815, ArTop10Accuracy=0.7639, over 10041.00 frames. ], tot_loss[loss=2.828, ArTop10Accuracy=0.7651, over 11512.01 frames. ], batch size: 12, lr: 1.18e-02
+ 2024-08-06 11:02:36,923 INFO [trainer.py:765] (1/8) Epoch 10, batch 800, train_loss[loss=2.721, ArTop10Accuracy=0.7851, over 9291.00 frames. ], tot_loss[loss=2.83, ArTop10Accuracy=0.7648, over 11633.61 frames. ], batch size: 11, lr: 1.17e-02
+ 2024-08-06 11:03:51,218 INFO [trainer.py:765] (1/8) Epoch 10, batch 900, train_loss[loss=2.86, ArTop10Accuracy=0.7579, over 12807.00 frames. ], tot_loss[loss=2.825, ArTop10Accuracy=0.7657, over 11681.50 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:05:06,358 INFO [trainer.py:765] (1/8) Epoch 10, batch 1000, train_loss[loss=2.904, ArTop10Accuracy=0.7478, over 12924.00 frames. ], tot_loss[loss=2.832, ArTop10Accuracy=0.7645, over 11872.55 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:06:21,725 INFO [trainer.py:765] (1/8) Epoch 10, batch 1100, train_loss[loss=2.817, ArTop10Accuracy=0.7649, over 13518.00 frames. ], tot_loss[loss=2.839, ArTop10Accuracy=0.763, over 11926.81 frames. ], batch size: 34, lr: 1.16e-02
+ 2024-08-06 11:07:34,778 INFO [trainer.py:765] (1/8) Epoch 10, batch 1200, train_loss[loss=2.922, ArTop10Accuracy=0.7471, over 12081.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7634, over 11849.57 frames. ], batch size: 101, lr: 1.16e-02
+ 2024-08-06 11:08:33,905 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 11:10:29,960 INFO [trainer.py:765] (1/8) Epoch 11, batch 100, train_loss[loss=2.921, ArTop10Accuracy=0.7486, over 14277.00 frames. ], tot_loss[loss=2.819, ArTop10Accuracy=0.7666, over 4744.36 frames. ], batch size: 62, lr: 1.10e-02
+ 2024-08-06 11:12:04,679 INFO [trainer.py:765] (1/8) Epoch 11, batch 200, train_loss[loss=2.759, ArTop10Accuracy=0.7787, over 13740.00 frames. ], tot_loss[loss=2.815, ArTop10Accuracy=0.7673, over 7726.69 frames. ], batch size: 34, lr: 1.10e-02
+ 2024-08-06 11:12:22,831 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 9.884e+01 1.240e+02 1.333e+02 1.457e+02 6.939e+02, threshold=2.667e+02, percent-clipped=0.1
+ 2024-08-06 11:13:31,551 INFO [trainer.py:765] (1/8) Epoch 11, batch 300, train_loss[loss=2.949, ArTop10Accuracy=0.739, over 14226.00 frames. ], tot_loss[loss=2.811, ArTop10Accuracy=0.7683, over 9344.10 frames. ], batch size: 44, lr: 1.09e-02
+ 2024-08-06 11:15:03,275 INFO [trainer.py:765] (1/8) Epoch 11, batch 400, train_loss[loss=2.758, ArTop10Accuracy=0.7821, over 11040.00 frames. ], tot_loss[loss=2.808, ArTop10Accuracy=0.7688, over 10269.43 frames. ], batch size: 15, lr: 1.09e-02
+ 2024-08-06 11:16:29,642 INFO [trainer.py:765] (1/8) Epoch 11, batch 500, train_loss[loss=2.739, ArTop10Accuracy=0.7819, over 12315.00 frames. ], tot_loss[loss=2.8, ArTop10Accuracy=0.7704, over 10842.51 frames. ], batch size: 22, lr: 1.09e-02
+ 2024-08-06 11:18:00,523 INFO [trainer.py:765] (1/8) Epoch 11, batch 600, train_loss[loss=2.793, ArTop10Accuracy=0.7793, over 11529.00 frames. ], tot_loss[loss=2.807, ArTop10Accuracy=0.7689, over 11373.43 frames. ], batch size: 18, lr: 1.08e-02
+ 2024-08-06 11:19:34,516 INFO [trainer.py:765] (1/8) Epoch 11, batch 700, train_loss[loss=2.696, ArTop10Accuracy=0.7903, over 9396.00 frames. ], tot_loss[loss=2.813, ArTop10Accuracy=0.768, over 11509.70 frames. ], batch size: 11, lr: 1.08e-02
+ 2024-08-06 11:20:55,487 INFO [trainer.py:765] (1/8) Epoch 11, batch 800, train_loss[loss=2.761, ArTop10Accuracy=0.7746, over 10377.00 frames. ], tot_loss[loss=2.815, ArTop10Accuracy=0.7673, over 11641.00 frames. ], batch size: 12, lr: 1.07e-02
+ 2024-08-06 11:22:13,712 INFO [trainer.py:765] (1/8) Epoch 11, batch 900, train_loss[loss=2.919, ArTop10Accuracy=0.7417, over 13068.00 frames. ], tot_loss[loss=2.811, ArTop10Accuracy=0.7683, over 11682.77 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:23:31,805 INFO [trainer.py:765] (1/8) Epoch 11, batch 1000, train_loss[loss=2.768, ArTop10Accuracy=0.7785, over 12609.00 frames. ], tot_loss[loss=2.813, ArTop10Accuracy=0.768, over 11877.30 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:24:46,908 INFO [trainer.py:765] (1/8) Epoch 11, batch 1100, train_loss[loss=2.88, ArTop10Accuracy=0.7552, over 13635.00 frames. ], tot_loss[loss=2.818, ArTop10Accuracy=0.767, over 11958.09 frames. ], batch size: 34, lr: 1.06e-02
+ 2024-08-06 11:26:00,739 INFO [trainer.py:765] (1/8) Epoch 11, batch 1200, train_loss[loss=2.993, ArTop10Accuracy=0.7277, over 12222.00 frames. ], tot_loss[loss=2.82, ArTop10Accuracy=0.7665, over 11887.81 frames. ], batch size: 101, lr: 1.06e-02
+ 2024-08-06 11:26:15,853 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 11:26:25,556 INFO [trainer.py:811] (1/8) Epoch 11, validation: loss=2.831, ArTop10Accuracy=0.7643, over 1827537.00 frames.
+ 2024-08-06 11:26:25,557 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 31570MB
+ 2024-08-06 11:26:26,191 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.251e+02 1.335e+02 1.441e+02 2.942e+02, threshold=2.669e+02, percent-clipped=0.1
+ 2024-08-06 11:27:09,715 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 11:29:03,456 INFO [trainer.py:765] (1/8) Epoch 12, batch 100, train_loss[loss=2.881, ArTop10Accuracy=0.7566, over 14679.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.769, over 4772.54 frames. ], batch size: 62, lr: 1.01e-02
+ 2024-08-06 11:30:30,679 INFO [trainer.py:765] (1/8) Epoch 12, batch 200, train_loss[loss=2.779, ArTop10Accuracy=0.7758, over 13632.00 frames. ], tot_loss[loss=2.798, ArTop10Accuracy=0.7707, over 7752.01 frames. ], batch size: 34, lr: 1.01e-02
+ 2024-08-06 11:31:57,661 INFO [trainer.py:765] (1/8) Epoch 12, batch 300, train_loss[loss=2.833, ArTop10Accuracy=0.7634, over 14193.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7732, over 9379.82 frames. ], batch size: 45, lr: 1.01e-02
+ 2024-08-06 11:33:30,744 INFO [trainer.py:765] (1/8) Epoch 12, batch 400, train_loss[loss=2.727, ArTop10Accuracy=0.7805, over 10134.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7731, over 10292.35 frames. ], batch size: 14, lr: 1.00e-02
+ 2024-08-06 11:34:55,737 INFO [trainer.py:765] (1/8) Epoch 12, batch 500, train_loss[loss=2.804, ArTop10Accuracy=0.7718, over 11982.00 frames. ], tot_loss[loss=2.781, ArTop10Accuracy=0.7739, over 10839.09 frames. ], batch size: 22, lr: 1.00e-02
+ 2024-08-06 11:36:29,367 INFO [trainer.py:765] (1/8) Epoch 12, batch 600, train_loss[loss=2.87, ArTop10Accuracy=0.7539, over 11367.00 frames. ], tot_loss[loss=2.788, ArTop10Accuracy=0.7727, over 11335.89 frames. ], batch size: 18, lr: 9.97e-03
+ 2024-08-06 11:38:00,349 INFO [trainer.py:765] (1/8) Epoch 12, batch 700, train_loss[loss=2.716, ArTop10Accuracy=0.7844, over 10320.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7717, over 11503.62 frames. ], batch size: 12, lr: 9.93e-03
+ 2024-08-06 11:39:23,617 INFO [trainer.py:765] (1/8) Epoch 12, batch 800, train_loss[loss=2.846, ArTop10Accuracy=0.7584, over 10128.00 frames. ], tot_loss[loss=2.798, ArTop10Accuracy=0.7709, over 11629.35 frames. ], batch size: 12, lr: 9.90e-03
+ 2024-08-06 11:40:39,895 INFO [trainer.py:765] (1/8) Epoch 12, batch 900, train_loss[loss=2.831, ArTop10Accuracy=0.7651, over 12711.00 frames. ], tot_loss[loss=2.793, ArTop10Accuracy=0.772, over 11677.66 frames. ], batch size: 27, lr: 9.87e-03
+ 2024-08-06 11:41:13,999 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.248e+02 1.348e+02 1.459e+02 5.540e+02, threshold=2.695e+02, percent-clipped=0.3
+ 2024-08-06 11:41:56,195 INFO [trainer.py:765] (1/8) Epoch 12, batch 1000, train_loss[loss=2.809, ArTop10Accuracy=0.7676, over 12882.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7709, over 11898.86 frames. ], batch size: 27, lr: 9.85e-03
+ 2024-08-06 11:43:14,326 INFO [trainer.py:765] (1/8) Epoch 12, batch 1100, train_loss[loss=2.782, ArTop10Accuracy=0.7774, over 13779.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7695, over 11949.96 frames. ], batch size: 34, lr: 9.82e-03
+ 2024-08-06 11:44:26,162 INFO [trainer.py:765] (1/8) Epoch 12, batch 1200, train_loss[loss=2.944, ArTop10Accuracy=0.7393, over 12054.00 frames. ], tot_loss[loss=2.803, ArTop10Accuracy=0.7701, over 11864.81 frames. ], batch size: 101, lr: 9.79e-03
+ 2024-08-06 11:45:26,869 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 11:47:26,604 INFO [trainer.py:765] (1/8) Epoch 13, batch 100, train_loss[loss=2.844, ArTop10Accuracy=0.7621, over 14415.00 frames. ], tot_loss[loss=2.788, ArTop10Accuracy=0.7719, over 4772.92 frames. ], batch size: 62, lr: 9.37e-03
+ 2024-08-06 11:48:54,785 INFO [trainer.py:765] (1/8) Epoch 13, batch 200, train_loss[loss=2.752, ArTop10Accuracy=0.7792, over 13590.00 frames. ], tot_loss[loss=2.782, ArTop10Accuracy=0.7732, over 7754.83 frames. ], batch size: 34, lr: 9.34e-03
+ 2024-08-06 11:50:20,521 INFO [trainer.py:765] (1/8) Epoch 13, batch 300, train_loss[loss=2.766, ArTop10Accuracy=0.7763, over 14256.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7748, over 9373.21 frames. ], batch size: 44, lr: 9.31e-03
+ 2024-08-06 11:51:48,771 INFO [trainer.py:765] (1/8) Epoch 13, batch 400, train_loss[loss=2.624, ArTop10Accuracy=0.8076, over 10278.00 frames. ], tot_loss[loss=2.773, ArTop10Accuracy=0.7757, over 10275.22 frames. ], batch size: 14, lr: 9.28e-03
+ 2024-08-06 11:53:13,412 INFO [trainer.py:765] (1/8) Epoch 13, batch 500, train_loss[loss=2.676, ArTop10Accuracy=0.798, over 12201.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7763, over 10857.58 frames. ], batch size: 22, lr: 9.26e-03
+ 2024-08-06 11:54:52,229 INFO [trainer.py:765] (1/8) Epoch 13, batch 600, train_loss[loss=2.698, ArTop10Accuracy=0.7882, over 11418.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7749, over 11355.82 frames. ], batch size: 18, lr: 9.23e-03
+ 2024-08-06 11:55:47,086 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 11:55:56,834 INFO [trainer.py:811] (1/8) Epoch 13, validation: loss=2.824, ArTop10Accuracy=0.7662, over 1827537.00 frames.
+ 2024-08-06 11:55:56,835 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 33972MB
+ 2024-08-06 11:55:57,718 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.343e+02 1.452e+02 4.888e+02, threshold=2.687e+02, percent-clipped=0.1
+ 2024-08-06 11:56:28,471 INFO [trainer.py:765] (1/8) Epoch 13, batch 700, train_loss[loss=2.772, ArTop10Accuracy=0.7774, over 9309.00 frames. ], tot_loss[loss=2.783, ArTop10Accuracy=0.7735, over 11488.70 frames. ], batch size: 11, lr: 9.20e-03
+ 2024-08-06 11:57:46,690 INFO [trainer.py:765] (1/8) Epoch 13, batch 800, train_loss[loss=2.713, ArTop10Accuracy=0.7875, over 10161.00 frames. ], tot_loss[loss=2.786, ArTop10Accuracy=0.773, over 11619.65 frames. ], batch size: 12, lr: 9.18e-03
+ 2024-08-06 11:59:03,290 INFO [trainer.py:765] (1/8) Epoch 13, batch 900, train_loss[loss=2.756, ArTop10Accuracy=0.7807, over 13041.00 frames. ], tot_loss[loss=2.782, ArTop10Accuracy=0.7737, over 11686.03 frames. ], batch size: 27, lr: 9.15e-03
+ 2024-08-06 12:00:19,180 INFO [trainer.py:765] (1/8) Epoch 13, batch 1000, train_loss[loss=2.81, ArTop10Accuracy=0.7706, over 12786.00 frames. ], tot_loss[loss=2.786, ArTop10Accuracy=0.773, over 11890.74 frames. ], batch size: 27, lr: 9.13e-03
+ 2024-08-06 12:01:34,885 INFO [trainer.py:765] (1/8) Epoch 13, batch 1100, train_loss[loss=2.775, ArTop10Accuracy=0.7717, over 13575.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7709, over 11954.47 frames. ], batch size: 34, lr: 9.10e-03
+ 2024-08-06 12:02:48,669 INFO [trainer.py:765] (1/8) Epoch 13, batch 1200, train_loss[loss=2.981, ArTop10Accuracy=0.7357, over 12087.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7714, over 11877.57 frames. ], batch size: 101, lr: 9.08e-03
+ 2024-08-06 12:03:48,616 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 12:05:45,337 INFO [trainer.py:765] (1/8) Epoch 14, batch 100, train_loss[loss=2.83, ArTop10Accuracy=0.766, over 14226.00 frames. ], tot_loss[loss=2.772, ArTop10Accuracy=0.7751, over 4764.55 frames. ], batch size: 62, lr: 8.71e-03
+ 2024-08-06 12:07:16,607 INFO [trainer.py:765] (1/8) Epoch 14, batch 200, train_loss[loss=2.838, ArTop10Accuracy=0.7595, over 13674.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.7773, over 7749.81 frames. ], batch size: 34, lr: 8.69e-03
+ 2024-08-06 12:08:44,317 INFO [trainer.py:765] (1/8) Epoch 14, batch 300, train_loss[loss=2.832, ArTop10Accuracy=0.7616, over 14316.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.777, over 9372.28 frames. ], batch size: 44, lr: 8.66e-03
+ 2024-08-06 12:10:01,135 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.266e+02 1.374e+02 1.483e+02 6.480e+02, threshold=2.748e+02, percent-clipped=0.2
+ 2024-08-06 12:10:10,232 INFO [trainer.py:765] (1/8) Epoch 14, batch 400, train_loss[loss=2.79, ArTop10Accuracy=0.7714, over 10845.00 frames. ], tot_loss[loss=2.761, ArTop10Accuracy=0.7776, over 10295.62 frames. ], batch size: 15, lr: 8.64e-03
+ 2024-08-06 12:11:36,157 INFO [trainer.py:765] (1/8) Epoch 14, batch 500, train_loss[loss=2.754, ArTop10Accuracy=0.7758, over 11991.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.778, over 10856.86 frames. ], batch size: 22, lr: 8.62e-03
+ 2024-08-06 12:13:06,000 INFO [trainer.py:765] (1/8) Epoch 14, batch 600, train_loss[loss=2.779, ArTop10Accuracy=0.7749, over 11514.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.7771, over 11376.97 frames. ], batch size: 18, lr: 8.59e-03
+ 2024-08-06 12:14:38,559 INFO [trainer.py:765] (1/8) Epoch 14, batch 700, train_loss[loss=2.725, ArTop10Accuracy=0.7884, over 9402.00 frames. ], tot_loss[loss=2.768, ArTop10Accuracy=0.776, over 11521.69 frames. ], batch size: 11, lr: 8.57e-03
+ 2024-08-06 12:15:58,076 INFO [trainer.py:765] (1/8) Epoch 14, batch 800, train_loss[loss=2.679, ArTop10Accuracy=0.795, over 9294.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7755, over 11635.91 frames. ], batch size: 11, lr: 8.55e-03
+ 2024-08-06 12:17:12,872 INFO [trainer.py:765] (1/8) Epoch 14, batch 900, train_loss[loss=2.759, ArTop10Accuracy=0.7819, over 13014.00 frames. ], tot_loss[loss=2.768, ArTop10Accuracy=0.7764, over 11670.90 frames. ], batch size: 27, lr: 8.52e-03
+ 2024-08-06 12:18:29,618 INFO [trainer.py:765] (1/8) Epoch 14, batch 1000, train_loss[loss=2.777, ArTop10Accuracy=0.7752, over 13023.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7758, over 11869.41 frames. ], batch size: 27, lr: 8.50e-03
+ 2024-08-06 12:19:45,382 INFO [trainer.py:765] (1/8) Epoch 14, batch 1100, train_loss[loss=2.772, ArTop10Accuracy=0.7752, over 13365.00 frames. ], tot_loss[loss=2.781, ArTop10Accuracy=0.7738, over 11956.46 frames. ], batch size: 34, lr: 8.48e-03
+ 2024-08-06 12:20:59,284 INFO [trainer.py:765] (1/8) Epoch 14, batch 1200, train_loss[loss=2.913, ArTop10Accuracy=0.749, over 11832.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7742, over 11850.00 frames. ], batch size: 101, lr: 8.46e-03
+ 2024-08-06 12:21:57,643 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 12:23:51,968 INFO [trainer.py:765] (1/8) Epoch 15, batch 100, train_loss[loss=2.838, ArTop10Accuracy=0.7632, over 14451.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7777, over 4747.89 frames. ], batch size: 62, lr: 8.14e-03
+ 2024-08-06 12:24:00,605 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 12:24:10,290 INFO [trainer.py:811] (1/8) Epoch 15, validation: loss=2.819, ArTop10Accuracy=0.7675, over 1827537.00 frames.
+ 2024-08-06 12:24:10,290 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 33972MB
+ 2024-08-06 12:24:11,100 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.284e+02 1.371e+02 1.488e+02 4.667e+02, threshold=2.743e+02, percent-clipped=0.2
+ 2024-08-06 12:25:29,992 INFO [trainer.py:765] (1/8) Epoch 15, batch 200, train_loss[loss=2.771, ArTop10Accuracy=0.7773, over 13839.00 frames. ], tot_loss[loss=2.76, ArTop10Accuracy=0.7774, over 7753.74 frames. ], batch size: 35, lr: 8.12e-03
+ 2024-08-06 12:26:58,701 INFO [trainer.py:765] (1/8) Epoch 15, batch 300, train_loss[loss=2.794, ArTop10Accuracy=0.7749, over 13956.00 frames. ], tot_loss[loss=2.756, ArTop10Accuracy=0.7783, over 9373.07 frames. ], batch size: 44, lr: 8.09e-03
+ 2024-08-06 12:28:28,538 INFO [trainer.py:765] (1/8) Epoch 15, batch 400, train_loss[loss=2.758, ArTop10Accuracy=0.777, over 10365.00 frames. ], tot_loss[loss=2.752, ArTop10Accuracy=0.7793, over 10266.45 frames. ], batch size: 14, lr: 8.07e-03
+ 2024-08-06 12:29:54,036 INFO [trainer.py:765] (1/8) Epoch 15, batch 500, train_loss[loss=2.73, ArTop10Accuracy=0.7861, over 12609.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7801, over 10847.41 frames. ], batch size: 23, lr: 8.05e-03
+ 2024-08-06 12:31:23,297 INFO [trainer.py:765] (1/8) Epoch 15, batch 600, train_loss[loss=2.736, ArTop10Accuracy=0.7852, over 11271.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7805, over 11369.05 frames. ], batch size: 18, lr: 8.03e-03
+ 2024-08-06 12:32:53,180 INFO [trainer.py:765] (1/8) Epoch 15, batch 700, train_loss[loss=2.627, ArTop10Accuracy=0.8074, over 10299.00 frames. ], tot_loss[loss=2.752, ArTop10Accuracy=0.7793, over 11529.74 frames. ], batch size: 12, lr: 8.01e-03
+ 2024-08-06 12:34:18,260 INFO [trainer.py:765] (1/8) Epoch 15, batch 800, train_loss[loss=2.647, ArTop10Accuracy=0.8006, over 9471.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7781, over 11626.40 frames. ], batch size: 11, lr: 7.99e-03
+ 2024-08-06 12:35:34,733 INFO [trainer.py:765] (1/8) Epoch 15, batch 900, train_loss[loss=2.757, ArTop10Accuracy=0.7784, over 12828.00 frames. ], tot_loss[loss=2.753, ArTop10Accuracy=0.7794, over 11678.28 frames. ], batch size: 27, lr: 7.97e-03
+ 2024-08-06 12:36:50,547 INFO [trainer.py:765] (1/8) Epoch 15, batch 1000, train_loss[loss=2.76, ArTop10Accuracy=0.7799, over 12825.00 frames. ], tot_loss[loss=2.76, ArTop10Accuracy=0.7781, over 11871.63 frames. ], batch size: 27, lr: 7.95e-03
+ 2024-08-06 12:38:05,183 INFO [trainer.py:765] (1/8) Epoch 15, batch 1100, train_loss[loss=2.717, ArTop10Accuracy=0.785, over 13494.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7765, over 11950.15 frames. ], batch size: 34, lr: 7.93e-03
+ 2024-08-06 12:38:12,847 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.293e+02 1.379e+02 1.467e+02 2.824e+02, threshold=2.759e+02, percent-clipped=0.1
+ 2024-08-06 12:39:18,795 INFO [trainer.py:765] (1/8) Epoch 15, batch 1200, train_loss[loss=2.892, ArTop10Accuracy=0.7571, over 12477.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7762, over 11866.65 frames. ], batch size: 101, lr: 7.91e-03
+ 2024-08-06 12:40:18,961 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 12:42:17,623 INFO [trainer.py:765] (1/8) Epoch 16, batch 100, train_loss[loss=2.815, ArTop10Accuracy=0.765, over 14853.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.781, over 4763.41 frames. ], batch size: 62, lr: 7.63e-03
+ 2024-08-06 12:43:49,569 INFO [trainer.py:765] (1/8) Epoch 16, batch 200, train_loss[loss=2.731, ArTop10Accuracy=0.7824, over 13905.00 frames. ], tot_loss[loss=2.739, ArTop10Accuracy=0.7817, over 7738.07 frames. ], batch size: 35, lr: 7.61e-03
+ 2024-08-06 12:45:18,507 INFO [trainer.py:765] (1/8) Epoch 16, batch 300, train_loss[loss=2.766, ArTop10Accuracy=0.7766, over 14256.00 frames. ], tot_loss[loss=2.735, ArTop10Accuracy=0.7825, over 9366.92 frames. ], batch size: 44, lr: 7.59e-03
+ 2024-08-06 12:46:45,212 INFO [trainer.py:765] (1/8) Epoch 16, batch 400, train_loss[loss=2.789, ArTop10Accuracy=0.7706, over 10821.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7829, over 10278.07 frames. ], batch size: 15, lr: 7.58e-03
+ 2024-08-06 12:48:16,316 INFO [trainer.py:765] (1/8) Epoch 16, batch 500, train_loss[loss=2.804, ArTop10Accuracy=0.771, over 12237.00 frames. ], tot_loss[loss=2.731, ArTop10Accuracy=0.7833, over 10829.44 frames. ], batch size: 22, lr: 7.56e-03
+ 2024-08-06 12:49:46,648 INFO [trainer.py:765] (1/8) Epoch 16, batch 600, train_loss[loss=2.69, ArTop10Accuracy=0.794, over 11436.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.7824, over 11359.23 frames. ], batch size: 18, lr: 7.54e-03
+ 2024-08-06 12:51:23,687 INFO [trainer.py:765] (1/8) Epoch 16, batch 700, train_loss[loss=2.575, ArTop10Accuracy=0.8169, over 10113.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7824, over 11499.13 frames. ], batch size: 12, lr: 7.52e-03
+ 2024-08-06 12:52:43,507 INFO [trainer.py:765] (1/8) Epoch 16, batch 800, train_loss[loss=2.739, ArTop10Accuracy=0.7811, over 9315.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.7812, over 11618.42 frames. ], batch size: 11, lr: 7.51e-03
+ 2024-08-06 12:53:06,020 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 12:53:15,496 INFO [trainer.py:811] (1/8) Epoch 16, validation: loss=2.816, ArTop10Accuracy=0.7678, over 1827537.00 frames.
+ 2024-08-06 12:53:15,496 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 33972MB
+ 2024-08-06 12:53:16,191 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.291e+02 1.391e+02 1.487e+02 3.459e+02, threshold=2.783e+02, percent-clipped=0.1
+ 2024-08-06 12:54:06,487 INFO [trainer.py:765] (1/8) Epoch 16, batch 900, train_loss[loss=2.656, ArTop10Accuracy=0.7962, over 12948.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.7822, over 11658.62 frames. ], batch size: 27, lr: 7.49e-03
+ 2024-08-06 12:55:19,797 INFO [trainer.py:765] (1/8) Epoch 16, batch 1000, train_loss[loss=2.713, ArTop10Accuracy=0.7918, over 12969.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7808, over 11870.63 frames. ], batch size: 27, lr: 7.47e-03
+ 2024-08-06 12:56:33,168 INFO [trainer.py:765] (1/8) Epoch 16, batch 1100, train_loss[loss=2.718, ArTop10Accuracy=0.7869, over 13368.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7786, over 11929.12 frames. ], batch size: 34, lr: 7.45e-03
+ 2024-08-06 12:57:48,491 INFO [trainer.py:765] (1/8) Epoch 16, batch 1200, train_loss[loss=2.885, ArTop10Accuracy=0.7546, over 12060.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7789, over 11851.65 frames. ], batch size: 101, lr: 7.44e-03
+ 2024-08-06 12:58:48,499 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 13:00:47,904 INFO [trainer.py:765] (1/8) Epoch 17, batch 100, train_loss[loss=2.817, ArTop10Accuracy=0.7671, over 14796.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7833, over 4759.58 frames. ], batch size: 62, lr: 7.18e-03
+ 2024-08-06 13:02:19,308 INFO [trainer.py:765] (1/8) Epoch 17, batch 200, train_loss[loss=2.825, ArTop10Accuracy=0.7651, over 13587.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7833, over 7743.12 frames. ], batch size: 34, lr: 7.17e-03
+ 2024-08-06 13:03:45,523 INFO [trainer.py:765] (1/8) Epoch 17, batch 300, train_loss[loss=2.757, ArTop10Accuracy=0.7768, over 14088.00 frames. ], tot_loss[loss=2.726, ArTop10Accuracy=0.7844, over 9362.71 frames. ], batch size: 44, lr: 7.15e-03
+ 2024-08-06 13:05:21,767 INFO [trainer.py:765] (1/8) Epoch 17, batch 400, train_loss[loss=2.723, ArTop10Accuracy=0.7856, over 10410.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7844, over 10281.41 frames. ], batch size: 14, lr: 7.14e-03
+ 2024-08-06 13:06:47,027 INFO [trainer.py:765] (1/8) Epoch 17, batch 500, train_loss[loss=2.711, ArTop10Accuracy=0.7892, over 12222.00 frames. ], tot_loss[loss=2.724, ArTop10Accuracy=0.7845, over 10840.39 frames. ], batch size: 22, lr: 7.12e-03
+ 2024-08-06 13:07:39,882 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.293e+02 1.386e+02 1.488e+02 3.253e+02, threshold=2.772e+02, percent-clipped=0.1
+ 2024-08-06 13:08:22,694 INFO [trainer.py:765] (1/8) Epoch 17, batch 600, train_loss[loss=2.759, ArTop10Accuracy=0.779, over 11331.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7837, over 11352.01 frames. ], batch size: 18, lr: 7.10e-03
+ 2024-08-06 13:09:54,842 INFO [trainer.py:765] (1/8) Epoch 17, batch 700, train_loss[loss=2.614, ArTop10Accuracy=0.8014, over 10098.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7825, over 11507.09 frames. ], batch size: 12, lr: 7.09e-03
+ 2024-08-06 13:11:19,487 INFO [trainer.py:765] (1/8) Epoch 17, batch 800, train_loss[loss=2.756, ArTop10Accuracy=0.7715, over 10443.00 frames. ], tot_loss[loss=2.739, ArTop10Accuracy=0.7816, over 11651.32 frames. ], batch size: 12, lr: 7.07e-03
+ 2024-08-06 13:12:35,676 INFO [trainer.py:765] (1/8) Epoch 17, batch 900, train_loss[loss=2.768, ArTop10Accuracy=0.7783, over 12885.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7827, over 11681.22 frames. ], batch size: 27, lr: 7.06e-03
+ 2024-08-06 13:13:53,068 INFO [trainer.py:765] (1/8) Epoch 17, batch 1000, train_loss[loss=2.674, ArTop10Accuracy=0.7989, over 12906.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.782, over 11879.24 frames. ], batch size: 27, lr: 7.04e-03
+ 2024-08-06 13:15:08,490 INFO [trainer.py:765] (1/8) Epoch 17, batch 1100, train_loss[loss=2.717, ArTop10Accuracy=0.7841, over 13662.00 frames. ], tot_loss[loss=2.745, ArTop10Accuracy=0.7807, over 11971.73 frames. ], batch size: 34, lr: 7.02e-03
+ 2024-08-06 13:16:22,394 INFO [trainer.py:765] (1/8) Epoch 17, batch 1200, train_loss[loss=2.911, ArTop10Accuracy=0.7513, over 11856.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7802, over 11884.24 frames. ], batch size: 103, lr: 7.01e-03
+ 2024-08-06 13:17:21,749 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
+ 2024-08-06 13:19:16,000 INFO [trainer.py:765] (1/8) Epoch 18, batch 100, train_loss[loss=2.769, ArTop10Accuracy=0.7795, over 14613.00 frames. ], tot_loss[loss=2.722, ArTop10Accuracy=0.7849, over 4769.80 frames. ], batch size: 63, lr: 6.78e-03
+ 2024-08-06 13:20:46,604 INFO [trainer.py:765] (1/8) Epoch 18, batch 200, train_loss[loss=2.692, ArTop10Accuracy=0.7902, over 13776.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7853, over 7773.05 frames. ], batch size: 34, lr: 6.77e-03
+ 2024-08-06 13:21:55,111 INFO [trainer.py:803] (1/8) Computing validation loss
+ 2024-08-06 13:22:04,751 INFO [trainer.py:811] (1/8) Epoch 18, validation: loss=2.817, ArTop10Accuracy=0.768, over 1827537.00 frames.
+ 2024-08-06 13:22:04,752 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 33972MB
+ 2024-08-06 13:22:05,479 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.323e+02 1.409e+02 1.514e+02 3.209e+02, threshold=2.818e+02, percent-clipped=0.1
293
+ 2024-08-06 13:22:26,587 INFO [trainer.py:765] (1/8) Epoch 18, batch 300, train_loss[loss=2.765, ArTop10Accuracy=0.7781, over 14328.00 frames. ], tot_loss[loss=2.717, ArTop10Accuracy=0.7859, over 9385.66 frames. ], batch size: 44, lr: 6.76e-03
294
+ 2024-08-06 13:23:57,935 INFO [trainer.py:765] (1/8) Epoch 18, batch 400, train_loss[loss=2.545, ArTop10Accuracy=0.8139, over 10287.00 frames. ], tot_loss[loss=2.717, ArTop10Accuracy=0.7857, over 10291.69 frames. ], batch size: 14, lr: 6.74e-03
295
+ 2024-08-06 13:25:34,019 INFO [trainer.py:765] (1/8) Epoch 18, batch 500, train_loss[loss=2.681, ArTop10Accuracy=0.7954, over 12234.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7866, over 10834.26 frames. ], batch size: 22, lr: 6.73e-03
296
+ 2024-08-06 13:27:00,640 INFO [trainer.py:765] (1/8) Epoch 18, batch 600, train_loss[loss=2.674, ArTop10Accuracy=0.794, over 11292.00 frames. ], tot_loss[loss=2.717, ArTop10Accuracy=0.7857, over 11355.11 frames. ], batch size: 18, lr: 6.71e-03
297
+ 2024-08-06 13:28:33,588 INFO [trainer.py:765] (1/8) Epoch 18, batch 700, train_loss[loss=2.683, ArTop10Accuracy=0.7953, over 10089.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7849, over 11511.68 frames. ], batch size: 12, lr: 6.70e-03
298
+ 2024-08-06 13:29:54,989 INFO [trainer.py:765] (1/8) Epoch 18, batch 800, train_loss[loss=2.677, ArTop10Accuracy=0.7978, over 10206.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7842, over 11652.90 frames. ], batch size: 12, lr: 6.68e-03
299
+ 2024-08-06 13:31:12,525 INFO [trainer.py:765] (1/8) Epoch 18, batch 900, train_loss[loss=2.734, ArTop10Accuracy=0.7881, over 12990.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7853, over 11691.99 frames. ], batch size: 27, lr: 6.67e-03
300
+ 2024-08-06 13:32:26,557 INFO [trainer.py:765] (1/8) Epoch 18, batch 1000, train_loss[loss=2.804, ArTop10Accuracy=0.77, over 12660.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7841, over 11895.92 frames. ], batch size: 27, lr: 6.66e-03
301
+ 2024-08-06 13:33:41,503 INFO [trainer.py:765] (1/8) Epoch 18, batch 1100, train_loss[loss=2.752, ArTop10Accuracy=0.7808, over 13716.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.7824, over 11960.66 frames. ], batch size: 35, lr: 6.64e-03
302
+ 2024-08-06 13:34:54,680 INFO [trainer.py:765] (1/8) Epoch 18, batch 1200, train_loss[loss=2.896, ArTop10Accuracy=0.753, over 12399.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7821, over 11856.69 frames. ], batch size: 101, lr: 6.63e-03
303
+ 2024-08-06 13:35:51,070 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.340e+02 1.433e+02 1.533e+02 2.444e+02, threshold=2.867e+02, percent-clipped=0.0
304
+ 2024-08-06 13:35:54,948 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
305
+ 2024-08-06 13:37:48,630 INFO [trainer.py:765] (1/8) Epoch 19, batch 100, train_loss[loss=2.699, ArTop10Accuracy=0.7911, over 14262.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7858, over 4758.72 frames. ], batch size: 63, lr: 6.43e-03
306
+ 2024-08-06 13:39:23,263 INFO [trainer.py:765] (1/8) Epoch 19, batch 200, train_loss[loss=2.776, ArTop10Accuracy=0.7724, over 13521.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7868, over 7762.49 frames. ], batch size: 34, lr: 6.41e-03
307
+ 2024-08-06 13:40:48,365 INFO [trainer.py:765] (1/8) Epoch 19, batch 300, train_loss[loss=2.801, ArTop10Accuracy=0.768, over 13980.00 frames. ], tot_loss[loss=2.712, ArTop10Accuracy=0.7867, over 9377.16 frames. ], batch size: 44, lr: 6.40e-03
308
+ 2024-08-06 13:42:21,073 INFO [trainer.py:765] (1/8) Epoch 19, batch 400, train_loss[loss=2.678, ArTop10Accuracy=0.7925, over 10728.00 frames. ], tot_loss[loss=2.705, ArTop10Accuracy=0.7881, over 10287.82 frames. ], batch size: 15, lr: 6.39e-03
309
+ 2024-08-06 13:43:44,961 INFO [trainer.py:765] (1/8) Epoch 19, batch 500, train_loss[loss=2.629, ArTop10Accuracy=0.8012, over 12252.00 frames. ], tot_loss[loss=2.701, ArTop10Accuracy=0.7885, over 10839.43 frames. ], batch size: 22, lr: 6.37e-03
310
+ 2024-08-06 13:45:16,687 INFO [trainer.py:765] (1/8) Epoch 19, batch 600, train_loss[loss=2.795, ArTop10Accuracy=0.7709, over 11283.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7882, over 11371.27 frames. ], batch size: 18, lr: 6.36e-03
311
+ 2024-08-06 13:46:48,328 INFO [trainer.py:765] (1/8) Epoch 19, batch 700, train_loss[loss=2.647, ArTop10Accuracy=0.7986, over 10188.00 frames. ], tot_loss[loss=2.712, ArTop10Accuracy=0.7867, over 11507.32 frames. ], batch size: 12, lr: 6.35e-03
312
+ 2024-08-06 13:48:11,890 INFO [trainer.py:765] (1/8) Epoch 19, batch 800, train_loss[loss=2.643, ArTop10Accuracy=0.8003, over 10218.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7864, over 11619.30 frames. ], batch size: 12, lr: 6.34e-03
313
+ 2024-08-06 13:49:27,263 INFO [trainer.py:765] (1/8) Epoch 19, batch 900, train_loss[loss=2.758, ArTop10Accuracy=0.777, over 13005.00 frames. ], tot_loss[loss=2.712, ArTop10Accuracy=0.7869, over 11663.49 frames. ], batch size: 27, lr: 6.32e-03
314
+ 2024-08-06 13:50:40,658 INFO [trainer.py:803] (1/8) Computing validation loss
315
+ 2024-08-06 13:50:50,535 INFO [trainer.py:811] (1/8) Epoch 19, validation: loss=2.818, ArTop10Accuracy=0.7679, over 1827537.00 frames.
316
+ 2024-08-06 13:50:50,536 INFO [trainer.py:814] (1/8) Maximum memory allocated so far is 33972MB
317
+ 2024-08-06 13:50:51,493 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.371e+02 1.455e+02 1.550e+02 3.697e+02, threshold=2.909e+02, percent-clipped=0.2
318
+ 2024-08-06 13:50:52,923 INFO [trainer.py:765] (1/8) Epoch 19, batch 1000, train_loss[loss=2.812, ArTop10Accuracy=0.7639, over 12567.00 frames. ], tot_loss[loss=2.718, ArTop10Accuracy=0.7857, over 11873.45 frames. ], batch size: 27, lr: 6.31e-03
319
+ 2024-08-06 13:52:08,273 INFO [trainer.py:765] (1/8) Epoch 19, batch 1100, train_loss[loss=2.733, ArTop10Accuracy=0.7851, over 13866.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7839, over 11972.86 frames. ], batch size: 34, lr: 6.30e-03
320
+ 2024-08-06 13:53:22,319 INFO [trainer.py:765] (1/8) Epoch 19, batch 1200, train_loss[loss=2.845, ArTop10Accuracy=0.7572, over 12054.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7836, over 11871.82 frames. ], batch size: 101, lr: 6.28e-03
321
+ 2024-08-06 13:54:21,985 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
322
+ 2024-08-06 13:56:12,911 INFO [trainer.py:765] (1/8) Epoch 20, batch 100, train_loss[loss=2.773, ArTop10Accuracy=0.7772, over 14652.00 frames. ], tot_loss[loss=2.711, ArTop10Accuracy=0.7862, over 4761.24 frames. ], batch size: 62, lr: 6.10e-03
323
+ 2024-08-06 13:57:42,499 INFO [trainer.py:765] (1/8) Epoch 20, batch 200, train_loss[loss=2.695, ArTop10Accuracy=0.7874, over 13536.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7882, over 7748.16 frames. ], batch size: 34, lr: 6.09e-03
324
+ 2024-08-06 13:59:15,436 INFO [trainer.py:765] (1/8) Epoch 20, batch 300, train_loss[loss=2.737, ArTop10Accuracy=0.779, over 14100.00 frames. ], tot_loss[loss=2.7, ArTop10Accuracy=0.7891, over 9381.75 frames. ], batch size: 44, lr: 6.08e-03
325
+ 2024-08-06 14:00:44,362 INFO [trainer.py:765] (1/8) Epoch 20, batch 400, train_loss[loss=2.589, ArTop10Accuracy=0.8097, over 10383.00 frames. ], tot_loss[loss=2.698, ArTop10Accuracy=0.7892, over 10302.26 frames. ], batch size: 14, lr: 6.07e-03
326
+ 2024-08-06 14:02:14,860 INFO [trainer.py:765] (1/8) Epoch 20, batch 500, train_loss[loss=2.766, ArTop10Accuracy=0.7745, over 12717.00 frames. ], tot_loss[loss=2.697, ArTop10Accuracy=0.7896, over 10858.59 frames. ], batch size: 23, lr: 6.06e-03
327
+ 2024-08-06 14:03:40,860 INFO [trainer.py:765] (1/8) Epoch 20, batch 600, train_loss[loss=2.662, ArTop10Accuracy=0.7981, over 11499.00 frames. ], tot_loss[loss=2.699, ArTop10Accuracy=0.7891, over 11367.87 frames. ], batch size: 18, lr: 6.04e-03
328
+ 2024-08-06 14:05:13,872 INFO [trainer.py:765] (1/8) Epoch 20, batch 700, train_loss[loss=2.667, ArTop10Accuracy=0.7949, over 10287.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7879, over 11530.59 frames. ], batch size: 12, lr: 6.03e-03
329
+ 2024-08-06 14:05:30,795 INFO [optim.py:386] (1/8) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.365e+02 1.456e+02 1.550e+02 3.525e+02, threshold=2.913e+02, percent-clipped=0.1
330
+ 2024-08-06 14:06:34,515 INFO [trainer.py:765] (1/8) Epoch 20, batch 800, train_loss[loss=2.604, ArTop10Accuracy=0.8108, over 9273.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7865, over 11640.77 frames. ], batch size: 11, lr: 6.02e-03
331
+ 2024-08-06 14:07:50,950 INFO [trainer.py:765] (1/8) Epoch 20, batch 900, train_loss[loss=2.748, ArTop10Accuracy=0.7846, over 12843.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7883, over 11683.45 frames. ], batch size: 27, lr: 6.01e-03
332
+ 2024-08-06 14:09:07,180 INFO [trainer.py:765] (1/8) Epoch 20, batch 1000, train_loss[loss=2.718, ArTop10Accuracy=0.7892, over 12714.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7881, over 11882.84 frames. ], batch size: 27, lr: 6.00e-03
333
+ 2024-08-06 14:10:21,216 INFO [trainer.py:765] (1/8) Epoch 20, batch 1100, train_loss[loss=2.752, ArTop10Accuracy=0.7788, over 14235.00 frames. ], tot_loss[loss=2.712, ArTop10Accuracy=0.7871, over 11959.70 frames. ], batch size: 35, lr: 5.99e-03
334
+ 2024-08-06 14:11:37,820 INFO [trainer.py:765] (1/8) Epoch 20, batch 1200, train_loss[loss=2.856, ArTop10Accuracy=0.758, over 12105.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7866, over 11876.99 frames. ], batch size: 101, lr: 5.98e-03
335
+ 2024-08-06 14:12:36,847 INFO [trainer.py:650] (1/8) Reaches end of dataloader.
336
+ 2024-08-06 14:12:36,850 INFO [trainer.py:1069] (1/8) Done!
libritts-r/log/log-train-2024-08-06-08-06-14-2 ADDED
@@ -0,0 +1,336 @@
1
+ 2024-08-06 08:06:14,316 INFO [trainer.py:870] (2/8) Training started
2
+ 2024-08-06 08:06:14,317 INFO [trainer.py:889] (2/8) Device: cuda:2
3
+ 2024-08-06 08:06:14,317 INFO [trainer.py:890] (2/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
4
+ 2024-08-06 08:06:14,317 INFO [trainer.py:892] (2/8) About to create model
5
+ 2024-08-06 08:06:15,078 INFO [trainer.py:899] (2/8) Number of model parameters: 367386628
6
+ 2024-08-06 08:06:16,215 INFO [trainer.py:914] (2/8) Using DDP
7
+ 2024-08-06 08:06:19,151 INFO [datamodule.py:427] (2/8) About to get train cuts
8
+ 2024-08-06 08:06:19,153 INFO [datamodule.py:434] (2/8) About to get dev cuts
9
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:292] (2/8) Disable SpecAugment
10
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:294] (2/8) About to create train dataset
11
+ 2024-08-06 08:06:19,156 INFO [datamodule.py:323] (2/8) Using DynamicBucketingSampler
12
+ 2024-08-06 08:06:19,769 INFO [datamodule.py:344] (2/8) About to create train dataloader
13
+ 2024-08-06 08:06:19,770 INFO [datamodule.py:367] (2/8) About to create dev dataset
14
+ 2024-08-06 08:06:20,096 INFO [datamodule.py:388] (2/8) About to create dev dataloader
15
+ 2024-08-06 08:08:02,122 INFO [trainer.py:765] (2/8) Epoch 1, batch 100, train_loss[loss=4.363, ArTop10Accuracy=0.494, over 14232.00 frames. ], tot_loss[loss=5.052, ArTop10Accuracy=0.3739, over 4752.73 frames. ], batch size: 62, lr: 2.25e-02
16
+ 2024-08-06 08:09:28,827 INFO [trainer.py:765] (2/8) Epoch 1, batch 200, train_loss[loss=3.997, ArTop10Accuracy=0.554, over 13785.00 frames. ], tot_loss[loss=4.487, ArTop10Accuracy=0.4685, over 7750.12 frames. ], batch size: 34, lr: 3.00e-02
17
+ 2024-08-06 08:10:52,428 INFO [trainer.py:765] (2/8) Epoch 1, batch 300, train_loss[loss=3.878, ArTop10Accuracy=0.5701, over 14358.00 frames. ], tot_loss[loss=4.217, ArTop10Accuracy=0.5127, over 9382.60 frames. ], batch size: 45, lr: 3.00e-02
18
+ 2024-08-06 08:12:12,699 INFO [trainer.py:765] (2/8) Epoch 1, batch 400, train_loss[loss=3.715, ArTop10Accuracy=0.6059, over 10305.00 frames. ], tot_loss[loss=4.027, ArTop10Accuracy=0.5453, over 10290.17 frames. ], batch size: 14, lr: 3.00e-02
19
+ 2024-08-06 08:13:40,050 INFO [trainer.py:765] (2/8) Epoch 1, batch 500, train_loss[loss=3.654, ArTop10Accuracy=0.6128, over 12216.00 frames. ], tot_loss[loss=3.879, ArTop10Accuracy=0.5711, over 10857.77 frames. ], batch size: 22, lr: 2.99e-02
20
+ 2024-08-06 08:15:00,245 INFO [trainer.py:765] (2/8) Epoch 1, batch 600, train_loss[loss=3.641, ArTop10Accuracy=0.6103, over 11523.00 frames. ], tot_loss[loss=3.768, ArTop10Accuracy=0.5906, over 11362.99 frames. ], batch size: 18, lr: 2.99e-02
21
+ 2024-08-06 08:16:26,425 INFO [trainer.py:765] (2/8) Epoch 1, batch 700, train_loss[loss=3.477, ArTop10Accuracy=0.6423, over 10293.00 frames. ], tot_loss[loss=3.69, ArTop10Accuracy=0.6047, over 11525.39 frames. ], batch size: 12, lr: 2.99e-02
22
+ 2024-08-06 08:17:43,020 INFO [trainer.py:765] (2/8) Epoch 1, batch 800, train_loss[loss=3.488, ArTop10Accuracy=0.6468, over 10005.00 frames. ], tot_loss[loss=3.627, ArTop10Accuracy=0.6161, over 11644.65 frames. ], batch size: 12, lr: 2.98e-02
23
+ 2024-08-06 08:18:56,151 INFO [trainer.py:765] (2/8) Epoch 1, batch 900, train_loss[loss=3.489, ArTop10Accuracy=0.6433, over 13104.00 frames. ], tot_loss[loss=3.569, ArTop10Accuracy=0.6268, over 11684.52 frames. ], batch size: 28, lr: 2.98e-02
24
+ 2024-08-06 08:20:12,863 INFO [trainer.py:765] (2/8) Epoch 1, batch 1000, train_loss[loss=3.458, ArTop10Accuracy=0.6483, over 12945.00 frames. ], tot_loss[loss=3.525, ArTop10Accuracy=0.6349, over 11858.22 frames. ], batch size: 27, lr: 2.97e-02
25
+ 2024-08-06 08:20:13,539 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 9.300e+01 1.871e+02 2.675e+02 4.030e+02 9.119e+03, threshold=5.351e+02, percent-clipped=0.0
26
+ 2024-08-06 08:21:29,155 INFO [trainer.py:765] (2/8) Epoch 1, batch 1100, train_loss[loss=3.455, ArTop10Accuracy=0.6509, over 13734.00 frames. ], tot_loss[loss=3.489, ArTop10Accuracy=0.6414, over 11951.02 frames. ], batch size: 34, lr: 2.96e-02
27
+ 2024-08-06 08:22:45,412 INFO [trainer.py:765] (2/8) Epoch 1, batch 1200, train_loss[loss=3.438, ArTop10Accuracy=0.6536, over 12531.00 frames. ], tot_loss[loss=3.463, ArTop10Accuracy=0.6462, over 11868.08 frames. ], batch size: 101, lr: 2.96e-02
28
+ 2024-08-06 08:23:45,264 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
29
+ 2024-08-06 08:25:36,238 INFO [trainer.py:765] (2/8) Epoch 2, batch 100, train_loss[loss=3.4, ArTop10Accuracy=0.6547, over 14385.00 frames. ], tot_loss[loss=3.411, ArTop10Accuracy=0.6553, over 4753.26 frames. ], batch size: 62, lr: 2.90e-02
30
+ 2024-08-06 08:26:58,956 INFO [trainer.py:765] (2/8) Epoch 2, batch 200, train_loss[loss=3.384, ArTop10Accuracy=0.6598, over 13692.00 frames. ], tot_loss[loss=3.387, ArTop10Accuracy=0.6599, over 7750.66 frames. ], batch size: 34, lr: 2.89e-02
31
+ 2024-08-06 08:28:25,533 INFO [trainer.py:765] (2/8) Epoch 2, batch 300, train_loss[loss=3.337, ArTop10Accuracy=0.6695, over 14655.00 frames. ], tot_loss[loss=3.368, ArTop10Accuracy=0.6638, over 9401.76 frames. ], batch size: 45, lr: 2.89e-02
32
+ 2024-08-06 08:29:48,637 INFO [trainer.py:765] (2/8) Epoch 2, batch 400, train_loss[loss=3.286, ArTop10Accuracy=0.6797, over 10335.00 frames. ], tot_loss[loss=3.354, ArTop10Accuracy=0.6663, over 10286.49 frames. ], batch size: 14, lr: 2.88e-02
33
+ 2024-08-06 08:31:22,900 INFO [trainer.py:765] (2/8) Epoch 2, batch 500, train_loss[loss=3.357, ArTop10Accuracy=0.6681, over 12267.00 frames. ], tot_loss[loss=3.345, ArTop10Accuracy=0.6679, over 10860.51 frames. ], batch size: 22, lr: 2.87e-02
34
+ 2024-08-06 08:32:45,689 INFO [trainer.py:765] (2/8) Epoch 2, batch 600, train_loss[loss=3.376, ArTop10Accuracy=0.6645, over 11415.00 frames. ], tot_loss[loss=3.334, ArTop10Accuracy=0.6704, over 11391.79 frames. ], batch size: 18, lr: 2.86e-02
35
+ 2024-08-06 08:34:13,581 INFO [trainer.py:765] (2/8) Epoch 2, batch 700, train_loss[loss=3.356, ArTop10Accuracy=0.6722, over 9474.00 frames. ], tot_loss[loss=3.329, ArTop10Accuracy=0.6711, over 11540.34 frames. ], batch size: 11, lr: 2.85e-02
36
+ 2024-08-06 08:34:31,173 INFO [trainer.py:803] (2/8) Computing validation loss
37
+ 2024-08-06 08:34:40,887 INFO [trainer.py:811] (2/8) Epoch 2, validation: loss=3.277, ArTop10Accuracy=0.6803, over 1827537.00 frames.
38
+ 2024-08-06 08:34:40,888 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
39
+ 2024-08-06 08:34:41,700 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 1.592e+02 2.200e+02 3.344e+02 2.949e+03, threshold=4.400e+02, percent-clipped=8.6
40
+ 2024-08-06 08:35:39,878 INFO [trainer.py:765] (2/8) Epoch 2, batch 800, train_loss[loss=3.26, ArTop10Accuracy=0.6839, over 10119.00 frames. ], tot_loss[loss=3.322, ArTop10Accuracy=0.6726, over 11672.83 frames. ], batch size: 12, lr: 2.84e-02
41
+ 2024-08-06 08:36:56,371 INFO [trainer.py:765] (2/8) Epoch 2, batch 900, train_loss[loss=3.232, ArTop10Accuracy=0.6876, over 13044.00 frames. ], tot_loss[loss=3.31, ArTop10Accuracy=0.6749, over 11707.63 frames. ], batch size: 27, lr: 2.83e-02
42
+ 2024-08-06 08:38:10,512 INFO [trainer.py:765] (2/8) Epoch 2, batch 1000, train_loss[loss=3.22, ArTop10Accuracy=0.6934, over 12762.00 frames. ], tot_loss[loss=3.302, ArTop10Accuracy=0.6761, over 11890.52 frames. ], batch size: 27, lr: 2.82e-02
43
+ 2024-08-06 08:39:25,060 INFO [trainer.py:765] (2/8) Epoch 2, batch 1100, train_loss[loss=3.299, ArTop10Accuracy=0.6775, over 13686.00 frames. ], tot_loss[loss=3.296, ArTop10Accuracy=0.6772, over 11940.66 frames. ], batch size: 34, lr: 2.81e-02
44
+ 2024-08-06 08:40:38,220 INFO [trainer.py:765] (2/8) Epoch 2, batch 1200, train_loss[loss=3.305, ArTop10Accuracy=0.6728, over 12009.00 frames. ], tot_loss[loss=3.285, ArTop10Accuracy=0.6794, over 11860.80 frames. ], batch size: 103, lr: 2.80e-02
45
+ 2024-08-06 08:41:38,236 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
46
+ 2024-08-06 08:43:36,650 INFO [trainer.py:765] (2/8) Epoch 3, batch 100, train_loss[loss=3.299, ArTop10Accuracy=0.6755, over 14610.00 frames. ], tot_loss[loss=3.25, ArTop10Accuracy=0.6855, over 4768.32 frames. ], batch size: 62, lr: 2.67e-02
47
+ 2024-08-06 08:45:10,499 INFO [trainer.py:765] (2/8) Epoch 3, batch 200, train_loss[loss=3.211, ArTop10Accuracy=0.6906, over 13572.00 frames. ], tot_loss[loss=3.222, ArTop10Accuracy=0.6906, over 7770.25 frames. ], batch size: 34, lr: 2.66e-02
48
+ 2024-08-06 08:46:29,258 INFO [trainer.py:765] (2/8) Epoch 3, batch 300, train_loss[loss=3.181, ArTop10Accuracy=0.6992, over 14205.00 frames. ], tot_loss[loss=3.203, ArTop10Accuracy=0.6945, over 9392.67 frames. ], batch size: 44, lr: 2.64e-02
49
+ 2024-08-06 08:48:04,217 INFO [trainer.py:765] (2/8) Epoch 3, batch 400, train_loss[loss=3.158, ArTop10Accuracy=0.7093, over 10824.00 frames. ], tot_loss[loss=3.188, ArTop10Accuracy=0.6974, over 10294.96 frames. ], batch size: 15, lr: 2.63e-02
50
+ 2024-08-06 08:48:40,880 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 9.282e+01 1.561e+02 1.981e+02 2.686e+02 1.768e+03, threshold=3.962e+02, percent-clipped=7.6
51
+ 2024-08-06 08:49:25,542 INFO [trainer.py:765] (2/8) Epoch 3, batch 500, train_loss[loss=3.09, ArTop10Accuracy=0.7159, over 12096.00 frames. ], tot_loss[loss=3.174, ArTop10Accuracy=0.7, over 10852.34 frames. ], batch size: 22, lr: 2.62e-02
52
+ 2024-08-06 08:51:00,477 INFO [trainer.py:765] (2/8) Epoch 3, batch 600, train_loss[loss=3.165, ArTop10Accuracy=0.7016, over 11343.00 frames. ], tot_loss[loss=3.157, ArTop10Accuracy=0.7035, over 11365.70 frames. ], batch size: 18, lr: 2.61e-02
53
+ 2024-08-06 08:52:31,618 INFO [trainer.py:765] (2/8) Epoch 3, batch 700, train_loss[loss=3.123, ArTop10Accuracy=0.7134, over 9264.00 frames. ], tot_loss[loss=3.15, ArTop10Accuracy=0.7048, over 11520.92 frames. ], batch size: 11, lr: 2.60e-02
54
+ 2024-08-06 08:53:57,388 INFO [trainer.py:765] (2/8) Epoch 3, batch 800, train_loss[loss=3.136, ArTop10Accuracy=0.7107, over 9384.00 frames. ], tot_loss[loss=3.143, ArTop10Accuracy=0.706, over 11642.41 frames. ], batch size: 11, lr: 2.59e-02
55
+ 2024-08-06 08:55:15,118 INFO [trainer.py:765] (2/8) Epoch 3, batch 900, train_loss[loss=3.106, ArTop10Accuracy=0.7141, over 12834.00 frames. ], tot_loss[loss=3.12, ArTop10Accuracy=0.7105, over 11681.59 frames. ], batch size: 27, lr: 2.57e-02
56
+ 2024-08-06 08:56:31,557 INFO [trainer.py:765] (2/8) Epoch 3, batch 1000, train_loss[loss=3.044, ArTop10Accuracy=0.7252, over 12933.00 frames. ], tot_loss[loss=3.113, ArTop10Accuracy=0.7119, over 11876.26 frames. ], batch size: 27, lr: 2.56e-02
57
+ 2024-08-06 08:57:46,505 INFO [trainer.py:765] (2/8) Epoch 3, batch 1100, train_loss[loss=3.112, ArTop10Accuracy=0.7162, over 13767.00 frames. ], tot_loss[loss=3.108, ArTop10Accuracy=0.7123, over 11954.94 frames. ], batch size: 34, lr: 2.55e-02
58
+ 2024-08-06 08:59:01,398 INFO [trainer.py:765] (2/8) Epoch 3, batch 1200, train_loss[loss=3.103, ArTop10Accuracy=0.712, over 12045.00 frames. ], tot_loss[loss=3.097, ArTop10Accuracy=0.7148, over 11876.24 frames. ], batch size: 101, lr: 2.54e-02
59
+ 2024-08-06 09:00:01,918 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
60
+ 2024-08-06 09:01:50,740 INFO [trainer.py:765] (2/8) Epoch 4, batch 100, train_loss[loss=3.141, ArTop10Accuracy=0.7039, over 14811.00 frames. ], tot_loss[loss=3.07, ArTop10Accuracy=0.7194, over 4779.88 frames. ], batch size: 62, lr: 2.38e-02
61
+ 2024-08-06 09:02:52,859 INFO [trainer.py:803] (2/8) Computing validation loss
62
+ 2024-08-06 09:03:02,383 INFO [trainer.py:811] (2/8) Epoch 4, validation: loss=2.997, ArTop10Accuracy=0.7338, over 1827537.00 frames.
63
+ 2024-08-06 09:03:02,384 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
64
+ 2024-08-06 09:03:03,362 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.499e+02 1.782e+02 2.273e+02 1.100e+03, threshold=3.565e+02, percent-clipped=4.7
65
+ 2024-08-06 09:03:29,271 INFO [trainer.py:765] (2/8) Epoch 4, batch 200, train_loss[loss=3.03, ArTop10Accuracy=0.7293, over 13884.00 frames. ], tot_loss[loss=3.046, ArTop10Accuracy=0.7239, over 7761.27 frames. ], batch size: 35, lr: 2.37e-02
66
+ 2024-08-06 09:05:01,732 INFO [trainer.py:765] (2/8) Epoch 4, batch 300, train_loss[loss=3.033, ArTop10Accuracy=0.7252, over 14427.00 frames. ], tot_loss[loss=3.04, ArTop10Accuracy=0.7251, over 9404.39 frames. ], batch size: 44, lr: 2.36e-02
67
+ 2024-08-06 09:06:28,149 INFO [trainer.py:765] (2/8) Epoch 4, batch 400, train_loss[loss=3.001, ArTop10Accuracy=0.7312, over 10125.00 frames. ], tot_loss[loss=3.034, ArTop10Accuracy=0.7263, over 10311.26 frames. ], batch size: 14, lr: 2.34e-02
68
+ 2024-08-06 09:08:01,925 INFO [trainer.py:765] (2/8) Epoch 4, batch 500, train_loss[loss=2.926, ArTop10Accuracy=0.7474, over 12684.00 frames. ], tot_loss[loss=3.029, ArTop10Accuracy=0.7274, over 10875.39 frames. ], batch size: 23, lr: 2.33e-02
69
+ 2024-08-06 09:09:28,540 INFO [trainer.py:765] (2/8) Epoch 4, batch 600, train_loss[loss=3.015, ArTop10Accuracy=0.7289, over 11412.00 frames. ], tot_loss[loss=3.027, ArTop10Accuracy=0.7277, over 11389.35 frames. ], batch size: 18, lr: 2.32e-02
70
+ 2024-08-06 09:10:59,865 INFO [trainer.py:765] (2/8) Epoch 4, batch 700, train_loss[loss=3.008, ArTop10Accuracy=0.7289, over 10233.00 frames. ], tot_loss[loss=3.025, ArTop10Accuracy=0.728, over 11523.83 frames. ], batch size: 12, lr: 2.31e-02
71
+ 2024-08-06 09:12:17,513 INFO [trainer.py:765] (2/8) Epoch 4, batch 800, train_loss[loss=2.89, ArTop10Accuracy=0.7544, over 10230.00 frames. ], tot_loss[loss=3.026, ArTop10Accuracy=0.7279, over 11647.88 frames. ], batch size: 12, lr: 2.30e-02
72
+ 2024-08-06 09:13:33,212 INFO [trainer.py:765] (2/8) Epoch 4, batch 900, train_loss[loss=3.084, ArTop10Accuracy=0.7148, over 13032.00 frames. ], tot_loss[loss=3.014, ArTop10Accuracy=0.7303, over 11678.88 frames. ], batch size: 27, lr: 2.29e-02
73
+ 2024-08-06 09:14:47,520 INFO [trainer.py:765] (2/8) Epoch 4, batch 1000, train_loss[loss=2.916, ArTop10Accuracy=0.7481, over 12747.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7304, over 11858.96 frames. ], batch size: 27, lr: 2.28e-02
74
+ 2024-08-06 09:16:02,981 INFO [trainer.py:765] (2/8) Epoch 4, batch 1100, train_loss[loss=2.947, ArTop10Accuracy=0.7418, over 13398.00 frames. ], tot_loss[loss=3.014, ArTop10Accuracy=0.7302, over 11919.73 frames. ], batch size: 34, lr: 2.26e-02
75
+ 2024-08-06 09:16:53,292 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.440e+02 1.636e+02 1.968e+02 7.702e+02, threshold=3.273e+02, percent-clipped=1.3
76
+ 2024-08-06 09:17:18,345 INFO [trainer.py:765] (2/8) Epoch 4, batch 1200, train_loss[loss=3.05, ArTop10Accuracy=0.722, over 11808.00 frames. ], tot_loss[loss=3.009, ArTop10Accuracy=0.7312, over 11852.81 frames. ], batch size: 101, lr: 2.25e-02
77
+ 2024-08-06 09:18:17,314 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
78
+ 2024-08-06 09:20:17,171 INFO [trainer.py:765] (2/8) Epoch 5, batch 100, train_loss[loss=3.07, ArTop10Accuracy=0.7177, over 14298.00 frames. ], tot_loss[loss=2.987, ArTop10Accuracy=0.7348, over 4753.11 frames. ], batch size: 62, lr: 2.10e-02
79
+ 2024-08-06 09:21:52,293 INFO [trainer.py:765] (2/8) Epoch 5, batch 200, train_loss[loss=2.957, ArTop10Accuracy=0.7421, over 13812.00 frames. ], tot_loss[loss=2.98, ArTop10Accuracy=0.7365, over 7752.56 frames. ], batch size: 34, lr: 2.09e-02
80
+ 2024-08-06 09:23:19,239 INFO [trainer.py:765] (2/8) Epoch 5, batch 300, train_loss[loss=3.04, ArTop10Accuracy=0.7233, over 14325.00 frames. ], tot_loss[loss=2.972, ArTop10Accuracy=0.7381, over 9367.98 frames. ], batch size: 44, lr: 2.08e-02
81
+ 2024-08-06 09:24:53,537 INFO [trainer.py:765] (2/8) Epoch 5, batch 400, train_loss[loss=2.909, ArTop10Accuracy=0.7522, over 10209.00 frames. ], tot_loss[loss=2.97, ArTop10Accuracy=0.7384, over 10284.90 frames. ], batch size: 14, lr: 2.07e-02
82
+ 2024-08-06 09:26:19,418 INFO [trainer.py:765] (2/8) Epoch 5, batch 500, train_loss[loss=2.91, ArTop10Accuracy=0.7482, over 12159.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7404, over 10847.24 frames. ], batch size: 22, lr: 2.06e-02
83
+ 2024-08-06 09:27:49,537 INFO [trainer.py:765] (2/8) Epoch 5, batch 600, train_loss[loss=2.969, ArTop10Accuracy=0.7339, over 11391.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.74, over 11368.24 frames. ], batch size: 18, lr: 2.05e-02
84
+ 2024-08-06 09:29:21,671 INFO [trainer.py:765] (2/8) Epoch 5, batch 700, train_loss[loss=2.931, ArTop10Accuracy=0.7457, over 10194.00 frames. ], tot_loss[loss=2.966, ArTop10Accuracy=0.7394, over 11518.22 frames. ], batch size: 12, lr: 2.04e-02
85
+ 2024-08-06 09:30:44,692 INFO [trainer.py:765] (2/8) Epoch 5, batch 800, train_loss[loss=2.974, ArTop10Accuracy=0.7366, over 9222.00 frames. ], tot_loss[loss=2.97, ArTop10Accuracy=0.7384, over 11625.17 frames. ], batch size: 11, lr: 2.03e-02
86
+ 2024-08-06 09:31:51,239 INFO [trainer.py:803] (2/8) Computing validation loss
87
+ 2024-08-06 09:32:00,761 INFO [trainer.py:811] (2/8) Epoch 5, validation: loss=2.926, ArTop10Accuracy=0.7466, over 1827537.00 frames.
88
+ 2024-08-06 09:32:00,762 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
89
+ 2024-08-06 09:32:01,706 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.349e+02 1.525e+02 1.806e+02 1.007e+03, threshold=3.049e+02, percent-clipped=2.3
90
+ 2024-08-06 09:32:10,550 INFO [trainer.py:765] (2/8) Epoch 5, batch 900, train_loss[loss=2.953, ArTop10Accuracy=0.7378, over 12915.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7405, over 11667.24 frames. ], batch size: 27, lr: 2.02e-02
91
+ 2024-08-06 09:33:27,322 INFO [trainer.py:765] (2/8) Epoch 5, batch 1000, train_loss[loss=2.969, ArTop10Accuracy=0.7443, over 12915.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7401, over 11855.31 frames. ], batch size: 27, lr: 2.01e-02
92
+ 2024-08-06 09:34:42,299 INFO [trainer.py:765] (2/8) Epoch 5, batch 1100, train_loss[loss=2.942, ArTop10Accuracy=0.7451, over 13662.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7403, over 11926.15 frames. ], batch size: 34, lr: 2.00e-02
93
+ 2024-08-06 09:35:56,330 INFO [trainer.py:765] (2/8) Epoch 5, batch 1200, train_loss[loss=3.037, ArTop10Accuracy=0.7247, over 12591.00 frames. ], tot_loss[loss=2.96, ArTop10Accuracy=0.7405, over 11843.81 frames. ], batch size: 101, lr: 1.99e-02
94
+ 2024-08-06 09:36:54,933 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
95
+ 2024-08-06 09:38:52,662 INFO [trainer.py:765] (2/8) Epoch 6, batch 100, train_loss[loss=3.01, ArTop10Accuracy=0.729, over 14643.00 frames. ], tot_loss[loss=2.948, ArTop10Accuracy=0.7425, over 4757.65 frames. ], batch size: 62, lr: 1.85e-02
96
+ 2024-08-06 09:40:19,834 INFO [trainer.py:765] (2/8) Epoch 6, batch 200, train_loss[loss=2.891, ArTop10Accuracy=0.7547, over 13698.00 frames. ], tot_loss[loss=2.936, ArTop10Accuracy=0.7452, over 7750.83 frames. ], batch size: 34, lr: 1.84e-02
97
+ 2024-08-06 09:41:52,964 INFO [trainer.py:765] (2/8) Epoch 6, batch 300, train_loss[loss=2.975, ArTop10Accuracy=0.7346, over 14988.00 frames. ], tot_loss[loss=2.934, ArTop10Accuracy=0.7453, over 9390.38 frames. ], batch size: 45, lr: 1.83e-02
98
+ 2024-08-06 09:43:17,827 INFO [trainer.py:765] (2/8) Epoch 6, batch 400, train_loss[loss=2.765, ArTop10Accuracy=0.7792, over 10242.00 frames. ], tot_loss[loss=2.931, ArTop10Accuracy=0.746, over 10311.50 frames. ], batch size: 14, lr: 1.83e-02
+ 2024-08-06 09:44:54,128 INFO [trainer.py:765] (2/8) Epoch 6, batch 500, train_loss[loss=2.926, ArTop10Accuracy=0.7468, over 12633.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.7472, over 10873.26 frames. ], batch size: 23, lr: 1.82e-02
+ 2024-08-06 09:46:22,873 INFO [trainer.py:765] (2/8) Epoch 6, batch 600, train_loss[loss=2.928, ArTop10Accuracy=0.7472, over 11355.00 frames. ], tot_loss[loss=2.924, ArTop10Accuracy=0.7471, over 11385.20 frames. ], batch size: 18, lr: 1.81e-02
+ 2024-08-06 09:46:37,219 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.339e+02 1.480e+02 1.701e+02 7.506e+02, threshold=2.959e+02, percent-clipped=1.1
+ 2024-08-06 09:47:57,867 INFO [trainer.py:765] (2/8) Epoch 6, batch 700, train_loss[loss=2.878, ArTop10Accuracy=0.7607, over 10119.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.7469, over 11525.26 frames. ], batch size: 12, lr: 1.80e-02
+ 2024-08-06 09:49:15,954 INFO [trainer.py:765] (2/8) Epoch 6, batch 800, train_loss[loss=2.852, ArTop10Accuracy=0.7609, over 9819.00 frames. ], tot_loss[loss=2.928, ArTop10Accuracy=0.7466, over 11650.70 frames. ], batch size: 12, lr: 1.79e-02
+ 2024-08-06 09:50:32,135 INFO [trainer.py:765] (2/8) Epoch 6, batch 900, train_loss[loss=2.791, ArTop10Accuracy=0.78, over 13071.00 frames. ], tot_loss[loss=2.924, ArTop10Accuracy=0.7474, over 11693.48 frames. ], batch size: 27, lr: 1.78e-02
+ 2024-08-06 09:51:47,297 INFO [trainer.py:765] (2/8) Epoch 6, batch 1000, train_loss[loss=2.926, ArTop10Accuracy=0.7512, over 12903.00 frames. ], tot_loss[loss=2.926, ArTop10Accuracy=0.747, over 11887.13 frames. ], batch size: 27, lr: 1.77e-02
+ 2024-08-06 09:53:00,921 INFO [trainer.py:765] (2/8) Epoch 6, batch 1100, train_loss[loss=2.984, ArTop10Accuracy=0.7382, over 13563.00 frames. ], tot_loss[loss=2.929, ArTop10Accuracy=0.7464, over 11928.49 frames. ], batch size: 34, lr: 1.77e-02
+ 2024-08-06 09:54:14,336 INFO [trainer.py:765] (2/8) Epoch 6, batch 1200, train_loss[loss=3.008, ArTop10Accuracy=0.731, over 12219.00 frames. ], tot_loss[loss=2.926, ArTop10Accuracy=0.7471, over 11866.28 frames. ], batch size: 101, lr: 1.76e-02
+ 2024-08-06 09:55:13,368 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 09:57:06,699 INFO [trainer.py:765] (2/8) Epoch 7, batch 100, train_loss[loss=2.987, ArTop10Accuracy=0.7334, over 14478.00 frames. ], tot_loss[loss=2.916, ArTop10Accuracy=0.7479, over 4762.83 frames. ], batch size: 62, lr: 1.64e-02
+ 2024-08-06 09:58:39,425 INFO [trainer.py:765] (2/8) Epoch 7, batch 200, train_loss[loss=2.928, ArTop10Accuracy=0.7454, over 13437.00 frames. ], tot_loss[loss=2.902, ArTop10Accuracy=0.7509, over 7759.42 frames. ], batch size: 34, lr: 1.64e-02
+ 2024-08-06 10:00:06,083 INFO [trainer.py:765] (2/8) Epoch 7, batch 300, train_loss[loss=2.956, ArTop10Accuracy=0.7386, over 14145.00 frames. ], tot_loss[loss=2.9, ArTop10Accuracy=0.7512, over 9375.25 frames. ], batch size: 44, lr: 1.63e-02
+ 2024-08-06 10:00:40,509 INFO [trainer.py:803] (2/8) Computing validation loss
+ 2024-08-06 10:00:50,245 INFO [trainer.py:811] (2/8) Epoch 7, validation: loss=2.88, ArTop10Accuracy=0.7554, over 1827537.00 frames.
+ 2024-08-06 10:00:50,246 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
+ 2024-08-06 10:00:50,977 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.286e+02 1.429e+02 1.605e+02 1.020e+03, threshold=2.857e+02, percent-clipped=1.5
+ 2024-08-06 10:01:49,116 INFO [trainer.py:765] (2/8) Epoch 7, batch 400, train_loss[loss=2.927, ArTop10Accuracy=0.7455, over 10119.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7518, over 10282.62 frames. ], batch size: 14, lr: 1.62e-02
+ 2024-08-06 10:03:21,457 INFO [trainer.py:765] (2/8) Epoch 7, batch 500, train_loss[loss=2.883, ArTop10Accuracy=0.7547, over 12735.00 frames. ], tot_loss[loss=2.892, ArTop10Accuracy=0.7532, over 10853.66 frames. ], batch size: 23, lr: 1.61e-02
+ 2024-08-06 10:04:51,883 INFO [trainer.py:765] (2/8) Epoch 7, batch 600, train_loss[loss=2.797, ArTop10Accuracy=0.7727, over 11964.00 frames. ], tot_loss[loss=2.891, ArTop10Accuracy=0.7535, over 11365.43 frames. ], batch size: 19, lr: 1.61e-02
+ 2024-08-06 10:06:25,110 INFO [trainer.py:765] (2/8) Epoch 7, batch 700, train_loss[loss=2.876, ArTop10Accuracy=0.7522, over 10353.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7524, over 11504.87 frames. ], batch size: 12, lr: 1.60e-02
+ 2024-08-06 10:07:46,949 INFO [trainer.py:765] (2/8) Epoch 7, batch 800, train_loss[loss=2.859, ArTop10Accuracy=0.7643, over 10203.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7524, over 11612.79 frames. ], batch size: 12, lr: 1.59e-02
+ 2024-08-06 10:09:02,821 INFO [trainer.py:765] (2/8) Epoch 7, batch 900, train_loss[loss=2.795, ArTop10Accuracy=0.7766, over 12912.00 frames. ], tot_loss[loss=2.892, ArTop10Accuracy=0.7533, over 11666.77 frames. ], batch size: 27, lr: 1.59e-02
+ 2024-08-06 10:10:19,636 INFO [trainer.py:765] (2/8) Epoch 7, batch 1000, train_loss[loss=2.902, ArTop10Accuracy=0.7527, over 12813.00 frames. ], tot_loss[loss=2.894, ArTop10Accuracy=0.7529, over 11861.16 frames. ], batch size: 27, lr: 1.58e-02
+ 2024-08-06 10:11:35,208 INFO [trainer.py:765] (2/8) Epoch 7, batch 1100, train_loss[loss=2.863, ArTop10Accuracy=0.7594, over 13686.00 frames. ], tot_loss[loss=2.9, ArTop10Accuracy=0.7517, over 11940.68 frames. ], batch size: 34, lr: 1.57e-02
+ 2024-08-06 10:12:48,204 INFO [trainer.py:765] (2/8) Epoch 7, batch 1200, train_loss[loss=3.009, ArTop10Accuracy=0.7302, over 11538.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7522, over 11860.24 frames. ], batch size: 103, lr: 1.57e-02
+ 2024-08-06 10:13:46,381 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 10:15:03,601 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.283e+02 1.410e+02 1.601e+02 1.017e+03, threshold=2.820e+02, percent-clipped=0.9
+ 2024-08-06 10:15:40,821 INFO [trainer.py:765] (2/8) Epoch 8, batch 100, train_loss[loss=2.902, ArTop10Accuracy=0.7498, over 14082.00 frames. ], tot_loss[loss=2.885, ArTop10Accuracy=0.7545, over 4743.10 frames. ], batch size: 62, lr: 1.47e-02
+ 2024-08-06 10:17:12,862 INFO [trainer.py:765] (2/8) Epoch 8, batch 200, train_loss[loss=2.889, ArTop10Accuracy=0.7539, over 13659.00 frames. ], tot_loss[loss=2.877, ArTop10Accuracy=0.7557, over 7747.35 frames. ], batch size: 34, lr: 1.46e-02
+ 2024-08-06 10:18:37,898 INFO [trainer.py:765] (2/8) Epoch 8, batch 300, train_loss[loss=2.899, ArTop10Accuracy=0.7508, over 14328.00 frames. ], tot_loss[loss=2.872, ArTop10Accuracy=0.7569, over 9362.59 frames. ], batch size: 44, lr: 1.46e-02
+ 2024-08-06 10:20:06,342 INFO [trainer.py:765] (2/8) Epoch 8, batch 400, train_loss[loss=2.801, ArTop10Accuracy=0.7699, over 10290.00 frames. ], tot_loss[loss=2.862, ArTop10Accuracy=0.7589, over 10271.46 frames. ], batch size: 14, lr: 1.45e-02
+ 2024-08-06 10:21:32,411 INFO [trainer.py:765] (2/8) Epoch 8, batch 500, train_loss[loss=2.819, ArTop10Accuracy=0.7623, over 12291.00 frames. ], tot_loss[loss=2.859, ArTop10Accuracy=0.7595, over 10811.25 frames. ], batch size: 22, lr: 1.45e-02
+ 2024-08-06 10:23:00,974 INFO [trainer.py:765] (2/8) Epoch 8, batch 600, train_loss[loss=2.875, ArTop10Accuracy=0.7605, over 11448.00 frames. ], tot_loss[loss=2.858, ArTop10Accuracy=0.7597, over 11342.02 frames. ], batch size: 18, lr: 1.44e-02
+ 2024-08-06 10:24:37,788 INFO [trainer.py:765] (2/8) Epoch 8, batch 700, train_loss[loss=2.805, ArTop10Accuracy=0.7703, over 9375.00 frames. ], tot_loss[loss=2.865, ArTop10Accuracy=0.7582, over 11499.48 frames. ], batch size: 11, lr: 1.43e-02
+ 2024-08-06 10:25:56,085 INFO [trainer.py:765] (2/8) Epoch 8, batch 800, train_loss[loss=2.802, ArTop10Accuracy=0.7732, over 9558.00 frames. ], tot_loss[loss=2.872, ArTop10Accuracy=0.7568, over 11628.09 frames. ], batch size: 11, lr: 1.43e-02
+ 2024-08-06 10:27:12,244 INFO [trainer.py:765] (2/8) Epoch 8, batch 900, train_loss[loss=2.901, ArTop10Accuracy=0.7464, over 13362.00 frames. ], tot_loss[loss=2.873, ArTop10Accuracy=0.7566, over 11687.71 frames. ], batch size: 28, lr: 1.42e-02
+ 2024-08-06 10:28:25,263 INFO [trainer.py:765] (2/8) Epoch 8, batch 1000, train_loss[loss=2.916, ArTop10Accuracy=0.7449, over 12993.00 frames. ], tot_loss[loss=2.877, ArTop10Accuracy=0.7559, over 11886.99 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:29:07,156 INFO [trainer.py:803] (2/8) Computing validation loss
+ 2024-08-06 10:29:16,830 INFO [trainer.py:811] (2/8) Epoch 8, validation: loss=2.858, ArTop10Accuracy=0.7594, over 1827537.00 frames.
+ 2024-08-06 10:29:16,831 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
+ 2024-08-06 10:29:17,491 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.275e+02 1.390e+02 1.547e+02 3.717e+02, threshold=2.781e+02, percent-clipped=0.7
+ 2024-08-06 10:29:51,731 INFO [trainer.py:765] (2/8) Epoch 8, batch 1100, train_loss[loss=2.966, ArTop10Accuracy=0.7374, over 13548.00 frames. ], tot_loss[loss=2.881, ArTop10Accuracy=0.7551, over 11952.90 frames. ], batch size: 34, lr: 1.41e-02
+ 2024-08-06 10:31:05,946 INFO [trainer.py:765] (2/8) Epoch 8, batch 1200, train_loss[loss=2.996, ArTop10Accuracy=0.7329, over 12714.00 frames. ], tot_loss[loss=2.878, ArTop10Accuracy=0.7555, over 11867.75 frames. ], batch size: 103, lr: 1.40e-02
+ 2024-08-06 10:32:05,631 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 10:34:01,257 INFO [trainer.py:765] (2/8) Epoch 9, batch 100, train_loss[loss=2.938, ArTop10Accuracy=0.7441, over 14586.00 frames. ], tot_loss[loss=2.864, ArTop10Accuracy=0.7581, over 4765.68 frames. ], batch size: 62, lr: 1.32e-02
+ 2024-08-06 10:35:31,773 INFO [trainer.py:765] (2/8) Epoch 9, batch 200, train_loss[loss=2.9, ArTop10Accuracy=0.7475, over 13782.00 frames. ], tot_loss[loss=2.856, ArTop10Accuracy=0.7596, over 7752.54 frames. ], batch size: 34, lr: 1.32e-02
+ 2024-08-06 10:36:57,928 INFO [trainer.py:765] (2/8) Epoch 9, batch 300, train_loss[loss=2.8, ArTop10Accuracy=0.7712, over 14121.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7611, over 9397.10 frames. ], batch size: 44, lr: 1.31e-02
+ 2024-08-06 10:38:32,697 INFO [trainer.py:765] (2/8) Epoch 9, batch 400, train_loss[loss=2.738, ArTop10Accuracy=0.7828, over 10422.00 frames. ], tot_loss[loss=2.847, ArTop10Accuracy=0.7617, over 10292.77 frames. ], batch size: 14, lr: 1.31e-02
+ 2024-08-06 10:39:59,258 INFO [trainer.py:765] (2/8) Epoch 9, batch 500, train_loss[loss=2.854, ArTop10Accuracy=0.7616, over 12159.00 frames. ], tot_loss[loss=2.842, ArTop10Accuracy=0.7626, over 10855.51 frames. ], batch size: 22, lr: 1.30e-02
+ 2024-08-06 10:41:29,688 INFO [trainer.py:765] (2/8) Epoch 9, batch 600, train_loss[loss=2.898, ArTop10Accuracy=0.7517, over 11454.00 frames. ], tot_loss[loss=2.847, ArTop10Accuracy=0.7617, over 11360.35 frames. ], batch size: 18, lr: 1.30e-02
+ 2024-08-06 10:42:58,441 INFO [trainer.py:765] (2/8) Epoch 9, batch 700, train_loss[loss=2.905, ArTop10Accuracy=0.7533, over 10239.00 frames. ], tot_loss[loss=2.848, ArTop10Accuracy=0.7615, over 11519.40 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:44:02,952 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.253e+02 1.352e+02 1.493e+02 7.010e+02, threshold=2.704e+02, percent-clipped=0.6
+ 2024-08-06 10:44:19,670 INFO [trainer.py:765] (2/8) Epoch 9, batch 800, train_loss[loss=2.826, ArTop10Accuracy=0.765, over 10275.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7612, over 11632.47 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:45:35,719 INFO [trainer.py:765] (2/8) Epoch 9, batch 900, train_loss[loss=2.826, ArTop10Accuracy=0.7678, over 12942.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.7624, over 11680.07 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:46:51,272 INFO [trainer.py:765] (2/8) Epoch 9, batch 1000, train_loss[loss=2.815, ArTop10Accuracy=0.7701, over 12792.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7614, over 11876.81 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:48:06,247 INFO [trainer.py:765] (2/8) Epoch 9, batch 1100, train_loss[loss=2.844, ArTop10Accuracy=0.765, over 13515.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7605, over 11950.97 frames. ], batch size: 34, lr: 1.28e-02
+ 2024-08-06 10:49:21,053 INFO [trainer.py:765] (2/8) Epoch 9, batch 1200, train_loss[loss=2.933, ArTop10Accuracy=0.7475, over 12606.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7604, over 11852.02 frames. ], batch size: 101, lr: 1.27e-02
+ 2024-08-06 10:50:22,705 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 10:52:12,326 INFO [trainer.py:765] (2/8) Epoch 10, batch 100, train_loss[loss=2.848, ArTop10Accuracy=0.7638, over 14544.00 frames. ], tot_loss[loss=2.835, ArTop10Accuracy=0.7632, over 4772.41 frames. ], batch size: 62, lr: 1.20e-02
+ 2024-08-06 10:53:44,586 INFO [trainer.py:765] (2/8) Epoch 10, batch 200, train_loss[loss=2.894, ArTop10Accuracy=0.7551, over 13695.00 frames. ], tot_loss[loss=2.83, ArTop10Accuracy=0.7647, over 7750.75 frames. ], batch size: 34, lr: 1.20e-02
+ 2024-08-06 10:55:08,090 INFO [trainer.py:765] (2/8) Epoch 10, batch 300, train_loss[loss=2.908, ArTop10Accuracy=0.7534, over 14178.00 frames. ], tot_loss[loss=2.827, ArTop10Accuracy=0.7653, over 9377.29 frames. ], batch size: 44, lr: 1.19e-02
+ 2024-08-06 10:56:41,175 INFO [trainer.py:765] (2/8) Epoch 10, batch 400, train_loss[loss=2.698, ArTop10Accuracy=0.7918, over 10212.00 frames. ], tot_loss[loss=2.826, ArTop10Accuracy=0.7655, over 10290.51 frames. ], batch size: 14, lr: 1.19e-02
+ 2024-08-06 10:58:04,938 INFO [trainer.py:803] (2/8) Computing validation loss
+ 2024-08-06 10:58:14,557 INFO [trainer.py:811] (2/8) Epoch 10, validation: loss=2.842, ArTop10Accuracy=0.7624, over 1827537.00 frames.
+ 2024-08-06 10:58:14,558 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
+ 2024-08-06 10:58:15,570 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.228e+02 1.320e+02 1.458e+02 6.096e+02, threshold=2.641e+02, percent-clipped=0.6
+ 2024-08-06 10:58:15,575 INFO [trainer.py:765] (2/8) Epoch 10, batch 500, train_loss[loss=2.761, ArTop10Accuracy=0.7821, over 12861.00 frames. ], tot_loss[loss=2.821, ArTop10Accuracy=0.7663, over 10857.40 frames. ], batch size: 23, lr: 1.19e-02
+ 2024-08-06 10:59:42,814 INFO [trainer.py:765] (2/8) Epoch 10, batch 600, train_loss[loss=2.725, ArTop10Accuracy=0.7842, over 11610.00 frames. ], tot_loss[loss=2.82, ArTop10Accuracy=0.7667, over 11367.33 frames. ], batch size: 18, lr: 1.18e-02
+ 2024-08-06 11:01:18,107 INFO [trainer.py:765] (2/8) Epoch 10, batch 700, train_loss[loss=2.806, ArTop10Accuracy=0.7697, over 9408.00 frames. ], tot_loss[loss=2.827, ArTop10Accuracy=0.7654, over 11509.69 frames. ], batch size: 11, lr: 1.18e-02
+ 2024-08-06 11:02:36,917 INFO [trainer.py:765] (2/8) Epoch 10, batch 800, train_loss[loss=2.718, ArTop10Accuracy=0.7886, over 9372.00 frames. ], tot_loss[loss=2.832, ArTop10Accuracy=0.7645, over 11622.04 frames. ], batch size: 11, lr: 1.17e-02
+ 2024-08-06 11:03:51,212 INFO [trainer.py:765] (2/8) Epoch 10, batch 900, train_loss[loss=2.789, ArTop10Accuracy=0.7745, over 12852.00 frames. ], tot_loss[loss=2.825, ArTop10Accuracy=0.7658, over 11675.50 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:05:06,352 INFO [trainer.py:765] (2/8) Epoch 10, batch 1000, train_loss[loss=2.797, ArTop10Accuracy=0.7696, over 13302.00 frames. ], tot_loss[loss=2.829, ArTop10Accuracy=0.7649, over 11874.18 frames. ], batch size: 28, lr: 1.17e-02
+ 2024-08-06 11:06:21,720 INFO [trainer.py:765] (2/8) Epoch 10, batch 1100, train_loss[loss=2.756, ArTop10Accuracy=0.7815, over 13491.00 frames. ], tot_loss[loss=2.832, ArTop10Accuracy=0.7642, over 11941.88 frames. ], batch size: 34, lr: 1.16e-02
+ 2024-08-06 11:07:34,772 INFO [trainer.py:765] (2/8) Epoch 10, batch 1200, train_loss[loss=2.953, ArTop10Accuracy=0.7409, over 12528.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7635, over 11862.54 frames. ], batch size: 101, lr: 1.16e-02
+ 2024-08-06 11:08:33,717 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 11:10:29,954 INFO [trainer.py:765] (2/8) Epoch 11, batch 100, train_loss[loss=2.905, ArTop10Accuracy=0.7495, over 14376.00 frames. ], tot_loss[loss=2.823, ArTop10Accuracy=0.766, over 4753.26 frames. ], batch size: 62, lr: 1.10e-02
+ 2024-08-06 11:12:04,673 INFO [trainer.py:765] (2/8) Epoch 11, batch 200, train_loss[loss=2.831, ArTop10Accuracy=0.7636, over 13989.00 frames. ], tot_loss[loss=2.814, ArTop10Accuracy=0.7677, over 7754.04 frames. ], batch size: 35, lr: 1.10e-02
+ 2024-08-06 11:12:22,825 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 9.884e+01 1.240e+02 1.333e+02 1.457e+02 6.939e+02, threshold=2.667e+02, percent-clipped=0.1
+ 2024-08-06 11:13:31,545 INFO [trainer.py:765] (2/8) Epoch 11, batch 300, train_loss[loss=2.864, ArTop10Accuracy=0.7564, over 14175.00 frames. ], tot_loss[loss=2.807, ArTop10Accuracy=0.7691, over 9363.48 frames. ], batch size: 44, lr: 1.09e-02
+ 2024-08-06 11:15:03,269 INFO [trainer.py:765] (2/8) Epoch 11, batch 400, train_loss[loss=2.861, ArTop10Accuracy=0.7604, over 10101.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7696, over 10275.08 frames. ], batch size: 14, lr: 1.09e-02
+ 2024-08-06 11:16:29,637 INFO [trainer.py:765] (2/8) Epoch 11, batch 500, train_loss[loss=2.773, ArTop10Accuracy=0.7726, over 12708.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7693, over 10826.48 frames. ], batch size: 23, lr: 1.09e-02
+ 2024-08-06 11:18:00,517 INFO [trainer.py:765] (2/8) Epoch 11, batch 600, train_loss[loss=2.764, ArTop10Accuracy=0.7791, over 11493.00 frames. ], tot_loss[loss=2.806, ArTop10Accuracy=0.7693, over 11350.63 frames. ], batch size: 18, lr: 1.08e-02
+ 2024-08-06 11:19:34,511 INFO [trainer.py:765] (2/8) Epoch 11, batch 700, train_loss[loss=2.662, ArTop10Accuracy=0.7939, over 9405.00 frames. ], tot_loss[loss=2.809, ArTop10Accuracy=0.7687, over 11500.78 frames. ], batch size: 11, lr: 1.08e-02
+ 2024-08-06 11:20:55,480 INFO [trainer.py:765] (2/8) Epoch 11, batch 800, train_loss[loss=2.791, ArTop10Accuracy=0.7738, over 10173.00 frames. ], tot_loss[loss=2.812, ArTop10Accuracy=0.7684, over 11644.67 frames. ], batch size: 12, lr: 1.07e-02
+ 2024-08-06 11:22:13,705 INFO [trainer.py:765] (2/8) Epoch 11, batch 900, train_loss[loss=2.791, ArTop10Accuracy=0.7762, over 13308.00 frames. ], tot_loss[loss=2.81, ArTop10Accuracy=0.7689, over 11696.54 frames. ], batch size: 28, lr: 1.07e-02
+ 2024-08-06 11:23:31,799 INFO [trainer.py:765] (2/8) Epoch 11, batch 1000, train_loss[loss=2.792, ArTop10Accuracy=0.766, over 12885.00 frames. ], tot_loss[loss=2.812, ArTop10Accuracy=0.7682, over 11886.02 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:24:46,902 INFO [trainer.py:765] (2/8) Epoch 11, batch 1100, train_loss[loss=2.821, ArTop10Accuracy=0.7629, over 13929.00 frames. ], tot_loss[loss=2.821, ArTop10Accuracy=0.7664, over 11966.48 frames. ], batch size: 35, lr: 1.06e-02
+ 2024-08-06 11:26:00,733 INFO [trainer.py:765] (2/8) Epoch 11, batch 1200, train_loss[loss=2.978, ArTop10Accuracy=0.7354, over 11682.00 frames. ], tot_loss[loss=2.819, ArTop10Accuracy=0.7667, over 11862.92 frames. ], batch size: 101, lr: 1.06e-02
+ 2024-08-06 11:26:15,847 INFO [trainer.py:803] (2/8) Computing validation loss
+ 2024-08-06 11:26:25,556 INFO [trainer.py:811] (2/8) Epoch 11, validation: loss=2.831, ArTop10Accuracy=0.7643, over 1827537.00 frames.
+ 2024-08-06 11:26:25,557 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
+ 2024-08-06 11:26:26,185 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.251e+02 1.335e+02 1.441e+02 2.942e+02, threshold=2.669e+02, percent-clipped=0.1
+ 2024-08-06 11:27:09,617 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 11:29:03,450 INFO [trainer.py:765] (2/8) Epoch 12, batch 100, train_loss[loss=2.878, ArTop10Accuracy=0.755, over 14742.00 frames. ], tot_loss[loss=2.809, ArTop10Accuracy=0.7682, over 4789.07 frames. ], batch size: 62, lr: 1.01e-02
+ 2024-08-06 11:30:30,672 INFO [trainer.py:765] (2/8) Epoch 12, batch 200, train_loss[loss=2.788, ArTop10Accuracy=0.7703, over 13956.00 frames. ], tot_loss[loss=2.802, ArTop10Accuracy=0.7696, over 7759.42 frames. ], batch size: 34, lr: 1.01e-02
+ 2024-08-06 11:31:57,655 INFO [trainer.py:765] (2/8) Epoch 12, batch 300, train_loss[loss=2.867, ArTop10Accuracy=0.7579, over 14355.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7703, over 9398.69 frames. ], batch size: 45, lr: 1.01e-02
+ 2024-08-06 11:33:30,738 INFO [trainer.py:765] (2/8) Epoch 12, batch 400, train_loss[loss=2.621, ArTop10Accuracy=0.8039, over 10920.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7713, over 10284.91 frames. ], batch size: 15, lr: 1.00e-02
+ 2024-08-06 11:34:55,731 INFO [trainer.py:765] (2/8) Epoch 12, batch 500, train_loss[loss=2.757, ArTop10Accuracy=0.7775, over 12078.00 frames. ], tot_loss[loss=2.789, ArTop10Accuracy=0.7725, over 10826.63 frames. ], batch size: 22, lr: 1.00e-02
+ 2024-08-06 11:36:29,362 INFO [trainer.py:765] (2/8) Epoch 12, batch 600, train_loss[loss=2.762, ArTop10Accuracy=0.7754, over 11370.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7722, over 11358.90 frames. ], batch size: 18, lr: 9.97e-03
+ 2024-08-06 11:38:00,343 INFO [trainer.py:765] (2/8) Epoch 12, batch 700, train_loss[loss=2.66, ArTop10Accuracy=0.7981, over 10167.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7716, over 11498.68 frames. ], batch size: 12, lr: 9.93e-03
+ 2024-08-06 11:39:23,610 INFO [trainer.py:765] (2/8) Epoch 12, batch 800, train_loss[loss=2.768, ArTop10Accuracy=0.7754, over 10056.00 frames. ], tot_loss[loss=2.796, ArTop10Accuracy=0.7714, over 11621.85 frames. ], batch size: 12, lr: 9.90e-03
+ 2024-08-06 11:40:39,889 INFO [trainer.py:765] (2/8) Epoch 12, batch 900, train_loss[loss=2.752, ArTop10Accuracy=0.7747, over 13314.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7718, over 11684.35 frames. ], batch size: 28, lr: 9.87e-03
+ 2024-08-06 11:41:13,993 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.248e+02 1.348e+02 1.459e+02 5.540e+02, threshold=2.695e+02, percent-clipped=0.3
+ 2024-08-06 11:41:56,189 INFO [trainer.py:765] (2/8) Epoch 12, batch 1000, train_loss[loss=2.823, ArTop10Accuracy=0.7637, over 13404.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7708, over 11891.06 frames. ], batch size: 28, lr: 9.85e-03
+ 2024-08-06 11:43:14,320 INFO [trainer.py:765] (2/8) Epoch 12, batch 1100, train_loss[loss=2.808, ArTop10Accuracy=0.7702, over 13656.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7695, over 11941.91 frames. ], batch size: 34, lr: 9.82e-03
+ 2024-08-06 11:44:26,155 INFO [trainer.py:765] (2/8) Epoch 12, batch 1200, train_loss[loss=2.945, ArTop10Accuracy=0.7388, over 12102.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7694, over 11857.94 frames. ], batch size: 101, lr: 9.79e-03
+ 2024-08-06 11:45:26,863 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 11:47:26,600 INFO [trainer.py:765] (2/8) Epoch 13, batch 100, train_loss[loss=2.827, ArTop10Accuracy=0.7635, over 14637.00 frames. ], tot_loss[loss=2.793, ArTop10Accuracy=0.7713, over 4744.99 frames. ], batch size: 62, lr: 9.37e-03
+ 2024-08-06 11:48:54,778 INFO [trainer.py:765] (2/8) Epoch 13, batch 200, train_loss[loss=2.797, ArTop10Accuracy=0.7655, over 13446.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7728, over 7731.58 frames. ], batch size: 34, lr: 9.34e-03
+ 2024-08-06 11:50:20,515 INFO [trainer.py:765] (2/8) Epoch 13, batch 300, train_loss[loss=2.8, ArTop10Accuracy=0.7724, over 14124.00 frames. ], tot_loss[loss=2.782, ArTop10Accuracy=0.7733, over 9354.51 frames. ], batch size: 44, lr: 9.31e-03
+ 2024-08-06 11:51:48,764 INFO [trainer.py:765] (2/8) Epoch 13, batch 400, train_loss[loss=2.705, ArTop10Accuracy=0.7874, over 10371.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7745, over 10278.47 frames. ], batch size: 14, lr: 9.28e-03
+ 2024-08-06 11:53:13,405 INFO [trainer.py:765] (2/8) Epoch 13, batch 500, train_loss[loss=2.679, ArTop10Accuracy=0.7927, over 12141.00 frames. ], tot_loss[loss=2.772, ArTop10Accuracy=0.7756, over 10843.83 frames. ], batch size: 22, lr: 9.26e-03
+ 2024-08-06 11:54:52,223 INFO [trainer.py:765] (2/8) Epoch 13, batch 600, train_loss[loss=2.73, ArTop10Accuracy=0.7851, over 11412.00 frames. ], tot_loss[loss=2.774, ArTop10Accuracy=0.7752, over 11366.55 frames. ], batch size: 18, lr: 9.23e-03
+ 2024-08-06 11:55:47,081 INFO [trainer.py:803] (2/8) Computing validation loss
+ 2024-08-06 11:55:56,835 INFO [trainer.py:811] (2/8) Epoch 13, validation: loss=2.824, ArTop10Accuracy=0.7662, over 1827537.00 frames.
+ 2024-08-06 11:55:56,835 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
+ 2024-08-06 11:55:57,711 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.343e+02 1.452e+02 4.888e+02, threshold=2.687e+02, percent-clipped=0.1
+ 2024-08-06 11:56:28,465 INFO [trainer.py:765] (2/8) Epoch 13, batch 700, train_loss[loss=2.789, ArTop10Accuracy=0.7731, over 10020.00 frames. ], tot_loss[loss=2.78, ArTop10Accuracy=0.7741, over 11510.09 frames. ], batch size: 12, lr: 9.20e-03
+ 2024-08-06 11:57:46,683 INFO [trainer.py:765] (2/8) Epoch 13, batch 800, train_loss[loss=2.646, ArTop10Accuracy=0.8021, over 10278.00 frames. ], tot_loss[loss=2.783, ArTop10Accuracy=0.7736, over 11638.26 frames. ], batch size: 12, lr: 9.18e-03
+ 2024-08-06 11:59:03,284 INFO [trainer.py:765] (2/8) Epoch 13, batch 900, train_loss[loss=2.749, ArTop10Accuracy=0.7827, over 12804.00 frames. ], tot_loss[loss=2.778, ArTop10Accuracy=0.7745, over 11690.54 frames. ], batch size: 27, lr: 9.15e-03
+ 2024-08-06 12:00:19,175 INFO [trainer.py:765] (2/8) Epoch 13, batch 1000, train_loss[loss=2.778, ArTop10Accuracy=0.7769, over 12894.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7733, over 11902.64 frames. ], batch size: 27, lr: 9.13e-03
+ 2024-08-06 12:01:34,879 INFO [trainer.py:765] (2/8) Epoch 13, batch 1100, train_loss[loss=2.878, ArTop10Accuracy=0.7554, over 13428.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7717, over 11968.13 frames. ], batch size: 34, lr: 9.10e-03
+ 2024-08-06 12:02:48,662 INFO [trainer.py:765] (2/8) Epoch 13, batch 1200, train_loss[loss=2.927, ArTop10Accuracy=0.7453, over 12432.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7716, over 11888.02 frames. ], batch size: 101, lr: 9.08e-03
+ 2024-08-06 12:03:48,159 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 12:05:45,332 INFO [trainer.py:765] (2/8) Epoch 14, batch 100, train_loss[loss=2.837, ArTop10Accuracy=0.7623, over 14376.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7738, over 4767.97 frames. ], batch size: 62, lr: 8.71e-03
+ 2024-08-06 12:07:16,602 INFO [trainer.py:765] (2/8) Epoch 14, batch 200, train_loss[loss=2.785, ArTop10Accuracy=0.7733, over 13857.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7755, over 7753.86 frames. ], batch size: 34, lr: 8.69e-03
+ 2024-08-06 12:08:44,311 INFO [trainer.py:765] (2/8) Epoch 14, batch 300, train_loss[loss=2.779, ArTop10Accuracy=0.772, over 14271.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.7764, over 9384.04 frames. ], batch size: 44, lr: 8.66e-03
+ 2024-08-06 12:10:01,129 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.266e+02 1.374e+02 1.483e+02 6.480e+02, threshold=2.748e+02, percent-clipped=0.2
+ 2024-08-06 12:10:10,227 INFO [trainer.py:765] (2/8) Epoch 14, batch 400, train_loss[loss=2.689, ArTop10Accuracy=0.7941, over 11049.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.7769, over 10302.72 frames. ], batch size: 15, lr: 8.64e-03
+ 2024-08-06 12:11:36,151 INFO [trainer.py:765] (2/8) Epoch 14, batch 500, train_loss[loss=2.766, ArTop10Accuracy=0.7788, over 12171.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.778, over 10839.36 frames. ], batch size: 22, lr: 8.62e-03
+ 2024-08-06 12:13:05,995 INFO [trainer.py:765] (2/8) Epoch 14, batch 600, train_loss[loss=2.759, ArTop10Accuracy=0.7802, over 11607.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.7772, over 11352.46 frames. ], batch size: 18, lr: 8.59e-03
+ 2024-08-06 12:14:38,554 INFO [trainer.py:765] (2/8) Epoch 14, batch 700, train_loss[loss=2.653, ArTop10Accuracy=0.7979, over 10074.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7766, over 11499.27 frames. ], batch size: 12, lr: 8.57e-03
+ 2024-08-06 12:15:58,072 INFO [trainer.py:765] (2/8) Epoch 14, batch 800, train_loss[loss=2.697, ArTop10Accuracy=0.7895, over 9354.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7758, over 11630.13 frames. ], batch size: 11, lr: 8.55e-03
+ 2024-08-06 12:17:12,866 INFO [trainer.py:765] (2/8) Epoch 14, batch 900, train_loss[loss=2.692, ArTop10Accuracy=0.7951, over 12969.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7764, over 11672.33 frames. ], batch size: 27, lr: 8.52e-03
+ 2024-08-06 12:18:29,612 INFO [trainer.py:765] (2/8) Epoch 14, batch 1000, train_loss[loss=2.764, ArTop10Accuracy=0.7829, over 12813.00 frames. ], tot_loss[loss=2.772, ArTop10Accuracy=0.7754, over 11880.22 frames. ], batch size: 27, lr: 8.50e-03
+ 2024-08-06 12:19:45,377 INFO [trainer.py:765] (2/8) Epoch 14, batch 1100, train_loss[loss=2.763, ArTop10Accuracy=0.7783, over 13455.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7746, over 11943.84 frames. ], batch size: 34, lr: 8.48e-03
+ 2024-08-06 12:20:59,279 INFO [trainer.py:765] (2/8) Epoch 14, batch 1200, train_loss[loss=2.902, ArTop10Accuracy=0.7476, over 11772.00 frames. ], tot_loss[loss=2.78, ArTop10Accuracy=0.774, over 11847.85 frames. ], batch size: 101, lr: 8.46e-03
+ 2024-08-06 12:21:58,162 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
+ 2024-08-06 12:23:51,961 INFO [trainer.py:765] (2/8) Epoch 15, batch 100, train_loss[loss=2.847, ArTop10Accuracy=0.7598, over 14640.00 frames. ], tot_loss[loss=2.77, ArTop10Accuracy=0.7752, over 4775.25 frames. ], batch size: 62, lr: 8.14e-03
+ 2024-08-06 12:24:00,598 INFO [trainer.py:803] (2/8) Computing validation loss
+ 2024-08-06 12:24:10,290 INFO [trainer.py:811] (2/8) Epoch 15, validation: loss=2.819, ArTop10Accuracy=0.7675, over 1827537.00 frames.
+ 2024-08-06 12:24:10,291 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
+ 2024-08-06 12:24:11,094 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.284e+02 1.371e+02 1.488e+02 4.667e+02, threshold=2.743e+02, percent-clipped=0.2
+ 2024-08-06 12:25:29,986 INFO [trainer.py:765] (2/8) Epoch 15, batch 200, train_loss[loss=2.764, ArTop10Accuracy=0.7793, over 13491.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7786, over 7767.90 frames. ], batch size: 34, lr: 8.12e-03
+ 2024-08-06 12:26:58,695 INFO [trainer.py:765] (2/8) Epoch 15, batch 300, train_loss[loss=2.755, ArTop10Accuracy=0.7769, over 14298.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7791, over 9413.88 frames. ], batch size: 44, lr: 8.09e-03
+ 2024-08-06 12:28:28,533 INFO [trainer.py:765] (2/8) Epoch 15, batch 400, train_loss[loss=2.735, ArTop10Accuracy=0.7841, over 10920.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.7793, over 10302.96 frames. ], batch size: 15, lr: 8.07e-03
+ 2024-08-06 12:29:54,031 INFO [trainer.py:765] (2/8) Epoch 15, batch 500, train_loss[loss=2.659, ArTop10Accuracy=0.7957, over 12117.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7804, over 10858.01 frames. ], batch size: 22, lr: 8.05e-03
247
+ 2024-08-06 12:31:23,293 INFO [trainer.py:765] (2/8) Epoch 15, batch 600, train_loss[loss=2.7, ArTop10Accuracy=0.7888, over 11391.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.7795, over 11375.98 frames. ], batch size: 18, lr: 8.03e-03
248
+ 2024-08-06 12:32:53,174 INFO [trainer.py:765] (2/8) Epoch 15, batch 700, train_loss[loss=2.552, ArTop10Accuracy=0.8162, over 9945.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7785, over 11508.10 frames. ], batch size: 12, lr: 8.01e-03
249
+ 2024-08-06 12:34:18,254 INFO [trainer.py:765] (2/8) Epoch 15, batch 800, train_loss[loss=2.783, ArTop10Accuracy=0.773, over 10314.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7781, over 11629.59 frames. ], batch size: 12, lr: 7.99e-03
250
+ 2024-08-06 12:35:34,728 INFO [trainer.py:765] (2/8) Epoch 15, batch 900, train_loss[loss=2.818, ArTop10Accuracy=0.7685, over 12936.00 frames. ], tot_loss[loss=2.752, ArTop10Accuracy=0.7794, over 11669.12 frames. ], batch size: 27, lr: 7.97e-03
251
+ 2024-08-06 12:36:50,541 INFO [trainer.py:765] (2/8) Epoch 15, batch 1000, train_loss[loss=2.774, ArTop10Accuracy=0.7705, over 12861.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7781, over 11870.57 frames. ], batch size: 27, lr: 7.95e-03
252
+ 2024-08-06 12:38:05,177 INFO [trainer.py:765] (2/8) Epoch 15, batch 1100, train_loss[loss=2.838, ArTop10Accuracy=0.763, over 13584.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.7767, over 11923.60 frames. ], batch size: 34, lr: 7.93e-03
253
+ 2024-08-06 12:38:12,841 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.293e+02 1.379e+02 1.467e+02 2.824e+02, threshold=2.759e+02, percent-clipped=0.1
254
+ 2024-08-06 12:39:18,789 INFO [trainer.py:765] (2/8) Epoch 15, batch 1200, train_loss[loss=2.862, ArTop10Accuracy=0.7544, over 12441.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.777, over 11837.70 frames. ], batch size: 101, lr: 7.91e-03
255
+ 2024-08-06 12:40:18,530 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
256
+ 2024-08-06 12:42:17,617 INFO [trainer.py:765] (2/8) Epoch 16, batch 100, train_loss[loss=2.803, ArTop10Accuracy=0.7697, over 14316.00 frames. ], tot_loss[loss=2.754, ArTop10Accuracy=0.7784, over 4759.65 frames. ], batch size: 63, lr: 7.63e-03
257
+ 2024-08-06 12:43:49,563 INFO [trainer.py:765] (2/8) Epoch 16, batch 200, train_loss[loss=2.635, ArTop10Accuracy=0.799, over 13557.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7797, over 7747.42 frames. ], batch size: 34, lr: 7.61e-03
258
+ 2024-08-06 12:45:18,502 INFO [trainer.py:765] (2/8) Epoch 16, batch 300, train_loss[loss=2.834, ArTop10Accuracy=0.7625, over 13779.00 frames. ], tot_loss[loss=2.741, ArTop10Accuracy=0.781, over 9373.27 frames. ], batch size: 44, lr: 7.59e-03
259
+ 2024-08-06 12:46:45,207 INFO [trainer.py:765] (2/8) Epoch 16, batch 400, train_loss[loss=2.708, ArTop10Accuracy=0.7908, over 10155.00 frames. ], tot_loss[loss=2.735, ArTop10Accuracy=0.7825, over 10260.65 frames. ], batch size: 14, lr: 7.58e-03
260
+ 2024-08-06 12:48:16,310 INFO [trainer.py:765] (2/8) Epoch 16, batch 500, train_loss[loss=2.684, ArTop10Accuracy=0.7932, over 12183.00 frames. ], tot_loss[loss=2.731, ArTop10Accuracy=0.7832, over 10821.97 frames. ], batch size: 22, lr: 7.56e-03
261
+ 2024-08-06 12:49:46,642 INFO [trainer.py:765] (2/8) Epoch 16, batch 600, train_loss[loss=2.672, ArTop10Accuracy=0.7957, over 11847.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7826, over 11354.91 frames. ], batch size: 19, lr: 7.54e-03
262
+ 2024-08-06 12:51:23,681 INFO [trainer.py:765] (2/8) Epoch 16, batch 700, train_loss[loss=2.805, ArTop10Accuracy=0.7707, over 9318.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7808, over 11503.01 frames. ], batch size: 11, lr: 7.52e-03
263
+ 2024-08-06 12:52:43,501 INFO [trainer.py:765] (2/8) Epoch 16, batch 800, train_loss[loss=2.669, ArTop10Accuracy=0.793, over 10242.00 frames. ], tot_loss[loss=2.752, ArTop10Accuracy=0.7792, over 11612.27 frames. ], batch size: 12, lr: 7.51e-03
264
+ 2024-08-06 12:53:06,014 INFO [trainer.py:803] (2/8) Computing validation loss
265
+ 2024-08-06 12:53:15,496 INFO [trainer.py:811] (2/8) Epoch 16, validation: loss=2.816, ArTop10Accuracy=0.7678, over 1827537.00 frames.
266
+ 2024-08-06 12:53:15,497 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
267
+ 2024-08-06 12:53:16,186 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.291e+02 1.391e+02 1.487e+02 3.459e+02, threshold=2.783e+02, percent-clipped=0.1
268
+ 2024-08-06 12:54:06,481 INFO [trainer.py:765] (2/8) Epoch 16, batch 900, train_loss[loss=2.798, ArTop10Accuracy=0.7704, over 12786.00 frames. ], tot_loss[loss=2.747, ArTop10Accuracy=0.78, over 11657.17 frames. ], batch size: 27, lr: 7.49e-03
269
+ 2024-08-06 12:55:19,792 INFO [trainer.py:765] (2/8) Epoch 16, batch 1000, train_loss[loss=2.778, ArTop10Accuracy=0.7748, over 12894.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7795, over 11860.56 frames. ], batch size: 27, lr: 7.47e-03
270
+ 2024-08-06 12:56:33,162 INFO [trainer.py:765] (2/8) Epoch 16, batch 1100, train_loss[loss=2.797, ArTop10Accuracy=0.7688, over 13869.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7783, over 11950.02 frames. ], batch size: 35, lr: 7.45e-03
271
+ 2024-08-06 12:57:48,485 INFO [trainer.py:765] (2/8) Epoch 16, batch 1200, train_loss[loss=2.856, ArTop10Accuracy=0.7589, over 12633.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7783, over 11848.50 frames. ], batch size: 103, lr: 7.44e-03
272
+ 2024-08-06 12:58:48,362 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
273
+ 2024-08-06 13:00:47,898 INFO [trainer.py:765] (2/8) Epoch 17, batch 100, train_loss[loss=2.763, ArTop10Accuracy=0.7751, over 14601.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.7817, over 4750.76 frames. ], batch size: 63, lr: 7.18e-03
274
+ 2024-08-06 13:02:19,302 INFO [trainer.py:765] (2/8) Epoch 17, batch 200, train_loss[loss=2.645, ArTop10Accuracy=0.7961, over 13347.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7829, over 7753.52 frames. ], batch size: 34, lr: 7.17e-03
275
+ 2024-08-06 13:03:45,517 INFO [trainer.py:765] (2/8) Epoch 17, batch 300, train_loss[loss=2.766, ArTop10Accuracy=0.7765, over 14475.00 frames. ], tot_loss[loss=2.73, ArTop10Accuracy=0.7835, over 9368.92 frames. ], batch size: 45, lr: 7.15e-03
276
+ 2024-08-06 13:05:21,761 INFO [trainer.py:765] (2/8) Epoch 17, batch 400, train_loss[loss=2.698, ArTop10Accuracy=0.7876, over 10851.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7838, over 10285.68 frames. ], batch size: 15, lr: 7.14e-03
277
+ 2024-08-06 13:06:47,021 INFO [trainer.py:765] (2/8) Epoch 17, batch 500, train_loss[loss=2.613, ArTop10Accuracy=0.8077, over 12156.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7855, over 10849.34 frames. ], batch size: 22, lr: 7.12e-03
278
+ 2024-08-06 13:07:39,876 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.293e+02 1.386e+02 1.488e+02 3.253e+02, threshold=2.772e+02, percent-clipped=0.1
279
+ 2024-08-06 13:08:22,688 INFO [trainer.py:765] (2/8) Epoch 17, batch 600, train_loss[loss=2.731, ArTop10Accuracy=0.783, over 11442.00 frames. ], tot_loss[loss=2.726, ArTop10Accuracy=0.7841, over 11370.60 frames. ], batch size: 18, lr: 7.10e-03
280
+ 2024-08-06 13:09:54,836 INFO [trainer.py:765] (2/8) Epoch 17, batch 700, train_loss[loss=2.633, ArTop10Accuracy=0.807, over 10050.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7831, over 11529.16 frames. ], batch size: 12, lr: 7.09e-03
281
+ 2024-08-06 13:11:19,480 INFO [trainer.py:765] (2/8) Epoch 17, batch 800, train_loss[loss=2.641, ArTop10Accuracy=0.8067, over 9264.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7829, over 11639.94 frames. ], batch size: 11, lr: 7.07e-03
282
+ 2024-08-06 13:12:35,670 INFO [trainer.py:765] (2/8) Epoch 17, batch 900, train_loss[loss=2.787, ArTop10Accuracy=0.7766, over 13002.00 frames. ], tot_loss[loss=2.73, ArTop10Accuracy=0.7836, over 11693.45 frames. ], batch size: 27, lr: 7.06e-03
283
+ 2024-08-06 13:13:53,061 INFO [trainer.py:765] (2/8) Epoch 17, batch 1000, train_loss[loss=2.69, ArTop10Accuracy=0.7868, over 12726.00 frames. ], tot_loss[loss=2.739, ArTop10Accuracy=0.7817, over 11891.49 frames. ], batch size: 27, lr: 7.04e-03
284
+ 2024-08-06 13:15:08,484 INFO [trainer.py:765] (2/8) Epoch 17, batch 1100, train_loss[loss=2.742, ArTop10Accuracy=0.7813, over 13857.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7807, over 11966.62 frames. ], batch size: 34, lr: 7.02e-03
285
+ 2024-08-06 13:16:22,388 INFO [trainer.py:765] (2/8) Epoch 17, batch 1200, train_loss[loss=2.858, ArTop10Accuracy=0.7581, over 12147.00 frames. ], tot_loss[loss=2.747, ArTop10Accuracy=0.7801, over 11878.52 frames. ], batch size: 103, lr: 7.01e-03
286
+ 2024-08-06 13:17:21,657 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
287
+ 2024-08-06 13:19:15,994 INFO [trainer.py:765] (2/8) Epoch 18, batch 100, train_loss[loss=2.798, ArTop10Accuracy=0.7686, over 14166.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7833, over 4761.06 frames. ], batch size: 62, lr: 6.78e-03
288
+ 2024-08-06 13:20:46,598 INFO [trainer.py:765] (2/8) Epoch 18, batch 200, train_loss[loss=2.716, ArTop10Accuracy=0.7841, over 13710.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7835, over 7760.09 frames. ], batch size: 34, lr: 6.77e-03
289
+ 2024-08-06 13:21:55,105 INFO [trainer.py:803] (2/8) Computing validation loss
290
+ 2024-08-06 13:22:04,751 INFO [trainer.py:811] (2/8) Epoch 18, validation: loss=2.817, ArTop10Accuracy=0.768, over 1827537.00 frames.
291
+ 2024-08-06 13:22:04,752 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
292
+ 2024-08-06 13:22:05,473 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.323e+02 1.409e+02 1.514e+02 3.209e+02, threshold=2.818e+02, percent-clipped=0.1
293
+ 2024-08-06 13:22:26,581 INFO [trainer.py:765] (2/8) Epoch 18, batch 300, train_loss[loss=2.701, ArTop10Accuracy=0.7888, over 14001.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7847, over 9379.73 frames. ], batch size: 44, lr: 6.76e-03
294
+ 2024-08-06 13:23:57,930 INFO [trainer.py:765] (2/8) Epoch 18, batch 400, train_loss[loss=2.628, ArTop10Accuracy=0.8027, over 10353.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.785, over 10253.95 frames. ], batch size: 14, lr: 6.74e-03
295
+ 2024-08-06 13:25:34,014 INFO [trainer.py:765] (2/8) Epoch 18, batch 500, train_loss[loss=2.724, ArTop10Accuracy=0.7832, over 12105.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7862, over 10832.76 frames. ], batch size: 22, lr: 6.73e-03
296
+ 2024-08-06 13:27:00,634 INFO [trainer.py:765] (2/8) Epoch 18, batch 600, train_loss[loss=2.664, ArTop10Accuracy=0.7917, over 11499.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7864, over 11359.21 frames. ], batch size: 18, lr: 6.71e-03
297
+ 2024-08-06 13:28:33,581 INFO [trainer.py:765] (2/8) Epoch 18, batch 700, train_loss[loss=2.679, ArTop10Accuracy=0.7905, over 10233.00 frames. ], tot_loss[loss=2.723, ArTop10Accuracy=0.7849, over 11509.88 frames. ], batch size: 12, lr: 6.70e-03
298
+ 2024-08-06 13:29:54,984 INFO [trainer.py:765] (2/8) Epoch 18, batch 800, train_loss[loss=2.624, ArTop10Accuracy=0.8056, over 10095.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7841, over 11633.45 frames. ], batch size: 12, lr: 6.68e-03
299
+ 2024-08-06 13:31:12,519 INFO [trainer.py:765] (2/8) Epoch 18, batch 900, train_loss[loss=2.763, ArTop10Accuracy=0.7727, over 12981.00 frames. ], tot_loss[loss=2.724, ArTop10Accuracy=0.7846, over 11688.74 frames. ], batch size: 27, lr: 6.67e-03
300
+ 2024-08-06 13:32:26,552 INFO [trainer.py:765] (2/8) Epoch 18, batch 1000, train_loss[loss=2.712, ArTop10Accuracy=0.7877, over 12930.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7838, over 11878.51 frames. ], batch size: 27, lr: 6.66e-03
301
+ 2024-08-06 13:33:41,497 INFO [trainer.py:765] (2/8) Epoch 18, batch 1100, train_loss[loss=2.725, ArTop10Accuracy=0.7836, over 13770.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7829, over 11933.41 frames. ], batch size: 34, lr: 6.64e-03
302
+ 2024-08-06 13:34:54,675 INFO [trainer.py:765] (2/8) Epoch 18, batch 1200, train_loss[loss=2.849, ArTop10Accuracy=0.7605, over 12456.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.7823, over 11823.99 frames. ], batch size: 101, lr: 6.63e-03
303
+ 2024-08-06 13:35:51,064 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.340e+02 1.433e+02 1.533e+02 2.444e+02, threshold=2.867e+02, percent-clipped=0.0
304
+ 2024-08-06 13:35:54,276 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
305
+ 2024-08-06 13:37:48,623 INFO [trainer.py:765] (2/8) Epoch 19, batch 100, train_loss[loss=2.829, ArTop10Accuracy=0.7638, over 14544.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7847, over 4759.09 frames. ], batch size: 63, lr: 6.43e-03
306
+ 2024-08-06 13:39:23,258 INFO [trainer.py:765] (2/8) Epoch 19, batch 200, train_loss[loss=2.664, ArTop10Accuracy=0.7977, over 13635.00 frames. ], tot_loss[loss=2.712, ArTop10Accuracy=0.7865, over 7744.99 frames. ], batch size: 34, lr: 6.41e-03
307
+ 2024-08-06 13:40:48,361 INFO [trainer.py:765] (2/8) Epoch 19, batch 300, train_loss[loss=2.759, ArTop10Accuracy=0.7783, over 14727.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7873, over 9362.71 frames. ], batch size: 45, lr: 6.40e-03
308
+ 2024-08-06 13:42:21,067 INFO [trainer.py:765] (2/8) Epoch 19, batch 400, train_loss[loss=2.777, ArTop10Accuracy=0.7715, over 10257.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7875, over 10289.48 frames. ], batch size: 14, lr: 6.39e-03
309
+ 2024-08-06 13:43:44,954 INFO [trainer.py:765] (2/8) Epoch 19, batch 500, train_loss[loss=2.716, ArTop10Accuracy=0.7888, over 12099.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7879, over 10829.04 frames. ], batch size: 22, lr: 6.37e-03
310
+ 2024-08-06 13:45:16,681 INFO [trainer.py:765] (2/8) Epoch 19, batch 600, train_loss[loss=2.644, ArTop10Accuracy=0.7979, over 11373.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7875, over 11342.18 frames. ], batch size: 18, lr: 6.36e-03
311
+ 2024-08-06 13:46:48,321 INFO [trainer.py:765] (2/8) Epoch 19, batch 700, train_loss[loss=2.622, ArTop10Accuracy=0.8048, over 9222.00 frames. ], tot_loss[loss=2.712, ArTop10Accuracy=0.7868, over 11489.32 frames. ], batch size: 11, lr: 6.35e-03
312
+ 2024-08-06 13:48:11,883 INFO [trainer.py:765] (2/8) Epoch 19, batch 800, train_loss[loss=2.682, ArTop10Accuracy=0.7907, over 9537.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7861, over 11598.77 frames. ], batch size: 11, lr: 6.34e-03
313
+ 2024-08-06 13:49:27,258 INFO [trainer.py:765] (2/8) Epoch 19, batch 900, train_loss[loss=2.691, ArTop10Accuracy=0.7925, over 13038.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7874, over 11650.51 frames. ], batch size: 28, lr: 6.32e-03
314
+ 2024-08-06 13:50:40,652 INFO [trainer.py:803] (2/8) Computing validation loss
315
+ 2024-08-06 13:50:50,536 INFO [trainer.py:811] (2/8) Epoch 19, validation: loss=2.818, ArTop10Accuracy=0.7679, over 1827537.00 frames.
316
+ 2024-08-06 13:50:50,536 INFO [trainer.py:814] (2/8) Maximum memory allocated so far is 33008MB
317
+ 2024-08-06 13:50:51,488 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.371e+02 1.455e+02 1.550e+02 3.697e+02, threshold=2.909e+02, percent-clipped=0.2
318
+ 2024-08-06 13:50:52,917 INFO [trainer.py:765] (2/8) Epoch 19, batch 1000, train_loss[loss=2.821, ArTop10Accuracy=0.7719, over 13188.00 frames. ], tot_loss[loss=2.718, ArTop10Accuracy=0.7857, over 11863.63 frames. ], batch size: 28, lr: 6.31e-03
319
+ 2024-08-06 13:52:08,266 INFO [trainer.py:765] (2/8) Epoch 19, batch 1100, train_loss[loss=2.794, ArTop10Accuracy=0.7726, over 13524.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7837, over 11957.72 frames. ], batch size: 34, lr: 6.30e-03
320
+ 2024-08-06 13:53:22,313 INFO [trainer.py:765] (2/8) Epoch 19, batch 1200, train_loss[loss=2.808, ArTop10Accuracy=0.7734, over 12612.00 frames. ], tot_loss[loss=2.729, ArTop10Accuracy=0.7834, over 11877.42 frames. ], batch size: 101, lr: 6.28e-03
321
+ 2024-08-06 13:54:22,076 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
322
+ 2024-08-06 13:56:12,905 INFO [trainer.py:765] (2/8) Epoch 20, batch 100, train_loss[loss=2.786, ArTop10Accuracy=0.7693, over 14730.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7874, over 4769.30 frames. ], batch size: 63, lr: 6.10e-03
323
+ 2024-08-06 13:57:42,493 INFO [trainer.py:765] (2/8) Epoch 20, batch 200, train_loss[loss=2.748, ArTop10Accuracy=0.7818, over 13584.00 frames. ], tot_loss[loss=2.7, ArTop10Accuracy=0.789, over 7782.09 frames. ], batch size: 34, lr: 6.09e-03
324
+ 2024-08-06 13:59:15,430 INFO [trainer.py:765] (2/8) Epoch 20, batch 300, train_loss[loss=2.791, ArTop10Accuracy=0.7734, over 14217.00 frames. ], tot_loss[loss=2.699, ArTop10Accuracy=0.7891, over 9382.25 frames. ], batch size: 44, lr: 6.08e-03
325
+ 2024-08-06 14:00:44,356 INFO [trainer.py:765] (2/8) Epoch 20, batch 400, train_loss[loss=2.715, ArTop10Accuracy=0.7895, over 10272.00 frames. ], tot_loss[loss=2.698, ArTop10Accuracy=0.7893, over 10264.93 frames. ], batch size: 14, lr: 6.07e-03
326
+ 2024-08-06 14:02:14,854 INFO [trainer.py:765] (2/8) Epoch 20, batch 500, train_loss[loss=2.714, ArTop10Accuracy=0.7842, over 12114.00 frames. ], tot_loss[loss=2.694, ArTop10Accuracy=0.79, over 10835.01 frames. ], batch size: 22, lr: 6.06e-03
327
+ 2024-08-06 14:03:40,853 INFO [trainer.py:765] (2/8) Epoch 20, batch 600, train_loss[loss=2.604, ArTop10Accuracy=0.8084, over 11556.00 frames. ], tot_loss[loss=2.7, ArTop10Accuracy=0.789, over 11368.42 frames. ], batch size: 18, lr: 6.04e-03
328
+ 2024-08-06 14:05:13,864 INFO [trainer.py:765] (2/8) Epoch 20, batch 700, train_loss[loss=2.532, ArTop10Accuracy=0.8161, over 9213.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7882, over 11519.69 frames. ], batch size: 11, lr: 6.03e-03
329
+ 2024-08-06 14:05:30,789 INFO [optim.py:386] (2/8) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.365e+02 1.456e+02 1.550e+02 3.525e+02, threshold=2.913e+02, percent-clipped=0.1
330
+ 2024-08-06 14:06:34,508 INFO [trainer.py:765] (2/8) Epoch 20, batch 800, train_loss[loss=2.548, ArTop10Accuracy=0.8172, over 10098.00 frames. ], tot_loss[loss=2.707, ArTop10Accuracy=0.7874, over 11627.31 frames. ], batch size: 12, lr: 6.02e-03
331
+ 2024-08-06 14:07:50,944 INFO [trainer.py:765] (2/8) Epoch 20, batch 900, train_loss[loss=2.708, ArTop10Accuracy=0.7879, over 12912.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7883, over 11679.18 frames. ], batch size: 27, lr: 6.01e-03
332
+ 2024-08-06 14:09:07,173 INFO [trainer.py:765] (2/8) Epoch 20, batch 1000, train_loss[loss=2.672, ArTop10Accuracy=0.7979, over 13386.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7883, over 11867.63 frames. ], batch size: 28, lr: 6.00e-03
333
+ 2024-08-06 14:10:21,210 INFO [trainer.py:765] (2/8) Epoch 20, batch 1100, train_loss[loss=2.742, ArTop10Accuracy=0.7785, over 13809.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7864, over 11939.19 frames. ], batch size: 34, lr: 5.99e-03
334
+ 2024-08-06 14:11:37,813 INFO [trainer.py:765] (2/8) Epoch 20, batch 1200, train_loss[loss=2.899, ArTop10Accuracy=0.7505, over 12645.00 frames. ], tot_loss[loss=2.718, ArTop10Accuracy=0.7857, over 11855.77 frames. ], batch size: 101, lr: 5.98e-03
335
+ 2024-08-06 14:12:37,479 INFO [trainer.py:650] (2/8) Reaches end of dataloader.
336
+ 2024-08-06 14:12:37,481 INFO [trainer.py:1069] (2/8) Done!
libritts-r/log/log-train-2024-08-06-08-06-14-3 ADDED
@@ -0,0 +1,336 @@
1
+ 2024-08-06 08:06:14,317 INFO [trainer.py:870] (3/8) Training started
2
+ 2024-08-06 08:06:14,318 INFO [trainer.py:889] (3/8) Device: cuda:3
3
+ 2024-08-06 08:06:14,318 INFO [trainer.py:890] (3/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
4
+ 2024-08-06 08:06:14,318 INFO [trainer.py:892] (3/8) About to create model
5
+ 2024-08-06 08:06:15,086 INFO [trainer.py:899] (3/8) Number of model parameters: 367386628
6
+ 2024-08-06 08:06:16,728 INFO [trainer.py:914] (3/8) Using DDP
7
+ 2024-08-06 08:06:19,151 INFO [datamodule.py:427] (3/8) About to get train cuts
8
+ 2024-08-06 08:06:19,153 INFO [datamodule.py:434] (3/8) About to get dev cuts
9
+ 2024-08-06 08:06:19,154 INFO [datamodule.py:292] (3/8) Disable SpecAugment
10
+ 2024-08-06 08:06:19,154 INFO [datamodule.py:294] (3/8) About to create train dataset
11
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:323] (3/8) Using DynamicBucketingSampler
12
+ 2024-08-06 08:06:19,758 INFO [datamodule.py:344] (3/8) About to create train dataloader
13
+ 2024-08-06 08:06:19,758 INFO [datamodule.py:367] (3/8) About to create dev dataset
14
+ 2024-08-06 08:06:20,081 INFO [datamodule.py:388] (3/8) About to create dev dataloader
15
+ 2024-08-06 08:08:02,124 INFO [trainer.py:765] (3/8) Epoch 1, batch 100, train_loss[loss=4.321, ArTop10Accuracy=0.494, over 14253.00 frames. ], tot_loss[loss=5.055, ArTop10Accuracy=0.3723, over 4750.99 frames. ], batch size: 62, lr: 2.25e-02
16
+ 2024-08-06 08:09:28,832 INFO [trainer.py:765] (3/8) Epoch 1, batch 200, train_loss[loss=4.028, ArTop10Accuracy=0.5454, over 13785.00 frames. ], tot_loss[loss=4.489, ArTop10Accuracy=0.4677, over 7752.98 frames. ], batch size: 34, lr: 3.00e-02
17
+ 2024-08-06 08:10:52,433 INFO [trainer.py:765] (3/8) Epoch 1, batch 300, train_loss[loss=3.866, ArTop10Accuracy=0.5741, over 14454.00 frames. ], tot_loss[loss=4.214, ArTop10Accuracy=0.5139, over 9356.97 frames. ], batch size: 44, lr: 3.00e-02
18
+ 2024-08-06 08:12:12,702 INFO [trainer.py:765] (3/8) Epoch 1, batch 400, train_loss[loss=3.711, ArTop10Accuracy=0.6045, over 10398.00 frames. ], tot_loss[loss=4.026, ArTop10Accuracy=0.5457, over 10273.01 frames. ], batch size: 14, lr: 3.00e-02
19
+ 2024-08-06 08:13:40,054 INFO [trainer.py:765] (3/8) Epoch 1, batch 500, train_loss[loss=3.637, ArTop10Accuracy=0.6122, over 12705.00 frames. ], tot_loss[loss=3.879, ArTop10Accuracy=0.5712, over 10836.94 frames. ], batch size: 23, lr: 2.99e-02
20
+ 2024-08-06 08:15:00,247 INFO [trainer.py:765] (3/8) Epoch 1, batch 600, train_loss[loss=3.602, ArTop10Accuracy=0.6212, over 11367.00 frames. ], tot_loss[loss=3.767, ArTop10Accuracy=0.5912, over 11362.36 frames. ], batch size: 18, lr: 2.99e-02
21
+ 2024-08-06 08:16:26,429 INFO [trainer.py:765] (3/8) Epoch 1, batch 700, train_loss[loss=3.538, ArTop10Accuracy=0.633, over 10170.00 frames. ], tot_loss[loss=3.684, ArTop10Accuracy=0.6062, over 11494.64 frames. ], batch size: 12, lr: 2.99e-02
22
+ 2024-08-06 08:17:43,022 INFO [trainer.py:765] (3/8) Epoch 1, batch 800, train_loss[loss=3.434, ArTop10Accuracy=0.6542, over 10245.00 frames. ], tot_loss[loss=3.624, ArTop10Accuracy=0.617, over 11656.86 frames. ], batch size: 12, lr: 2.98e-02
23
+ 2024-08-06 08:18:56,155 INFO [trainer.py:765] (3/8) Epoch 1, batch 900, train_loss[loss=3.534, ArTop10Accuracy=0.6336, over 12966.00 frames. ], tot_loss[loss=3.566, ArTop10Accuracy=0.6278, over 11705.99 frames. ], batch size: 27, lr: 2.98e-02
24
+ 2024-08-06 08:20:12,867 INFO [trainer.py:765] (3/8) Epoch 1, batch 1000, train_loss[loss=3.492, ArTop10Accuracy=0.6432, over 12921.00 frames. ], tot_loss[loss=3.524, ArTop10Accuracy=0.6352, over 11908.83 frames. ], batch size: 27, lr: 2.97e-02
25
+ 2024-08-06 08:20:13,547 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 9.300e+01 1.871e+02 2.675e+02 4.030e+02 9.119e+03, threshold=5.351e+02, percent-clipped=0.0
26
+ 2024-08-06 08:21:29,161 INFO [trainer.py:765] (3/8) Epoch 1, batch 1100, train_loss[loss=3.45, ArTop10Accuracy=0.6492, over 13404.00 frames. ], tot_loss[loss=3.486, ArTop10Accuracy=0.6422, over 11968.58 frames. ], batch size: 34, lr: 2.96e-02
27
+ 2024-08-06 08:22:45,417 INFO [trainer.py:765] (3/8) Epoch 1, batch 1200, train_loss[loss=3.485, ArTop10Accuracy=0.6411, over 12429.00 frames. ], tot_loss[loss=3.457, ArTop10Accuracy=0.6477, over 11881.54 frames. ], batch size: 101, lr: 2.96e-02
28
+ 2024-08-06 08:23:45,270 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
29
+ 2024-08-06 08:25:36,244 INFO [trainer.py:765] (3/8) Epoch 2, batch 100, train_loss[loss=3.434, ArTop10Accuracy=0.6529, over 14478.00 frames. ], tot_loss[loss=3.425, ArTop10Accuracy=0.6525, over 4759.18 frames. ], batch size: 62, lr: 2.90e-02
30
+ 2024-08-06 08:26:58,962 INFO [trainer.py:765] (3/8) Epoch 2, batch 200, train_loss[loss=3.295, ArTop10Accuracy=0.6847, over 13569.00 frames. ], tot_loss[loss=3.39, ArTop10Accuracy=0.6593, over 7756.33 frames. ], batch size: 34, lr: 2.89e-02
31
+ 2024-08-06 08:28:25,540 INFO [trainer.py:765] (3/8) Epoch 2, batch 300, train_loss[loss=3.367, ArTop10Accuracy=0.6621, over 14055.00 frames. ], tot_loss[loss=3.367, ArTop10Accuracy=0.6633, over 9383.94 frames. ], batch size: 44, lr: 2.89e-02
32
+ 2024-08-06 08:29:48,643 INFO [trainer.py:765] (3/8) Epoch 2, batch 400, train_loss[loss=3.229, ArTop10Accuracy=0.6936, over 10959.00 frames. ], tot_loss[loss=3.356, ArTop10Accuracy=0.6656, over 10302.25 frames. ], batch size: 15, lr: 2.88e-02
33
+ 2024-08-06 08:31:22,908 INFO [trainer.py:765] (3/8) Epoch 2, batch 500, train_loss[loss=3.37, ArTop10Accuracy=0.6632, over 12750.00 frames. ], tot_loss[loss=3.343, ArTop10Accuracy=0.668, over 10853.14 frames. ], batch size: 23, lr: 2.87e-02
34
+ 2024-08-06 08:32:45,694 INFO [trainer.py:765] (3/8) Epoch 2, batch 600, train_loss[loss=3.274, ArTop10Accuracy=0.684, over 11472.00 frames. ], tot_loss[loss=3.332, ArTop10Accuracy=0.6701, over 11378.49 frames. ], batch size: 18, lr: 2.86e-02
35
+ 2024-08-06 08:34:13,589 INFO [trainer.py:765] (3/8) Epoch 2, batch 700, train_loss[loss=3.395, ArTop10Accuracy=0.6511, over 9318.00 frames. ], tot_loss[loss=3.328, ArTop10Accuracy=0.671, over 11516.79 frames. ], batch size: 11, lr: 2.85e-02
36
+ 2024-08-06 08:34:31,181 INFO [trainer.py:803] (3/8) Computing validation loss
37
+ 2024-08-06 08:34:40,888 INFO [trainer.py:811] (3/8) Epoch 2, validation: loss=3.277, ArTop10Accuracy=0.6803, over 1827537.00 frames.
38
+ 2024-08-06 08:34:40,889 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 29540MB
39
+ 2024-08-06 08:34:41,706 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 1.592e+02 2.200e+02 3.344e+02 2.949e+03, threshold=4.400e+02, percent-clipped=8.6
40
+ 2024-08-06 08:35:39,883 INFO [trainer.py:765] (3/8) Epoch 2, batch 800, train_loss[loss=3.373, ArTop10Accuracy=0.6659, over 9150.00 frames. ], tot_loss[loss=3.325, ArTop10Accuracy=0.6717, over 11642.86 frames. ], batch size: 11, lr: 2.84e-02
41
+ 2024-08-06 08:36:56,377 INFO [trainer.py:765] (3/8) Epoch 2, batch 900, train_loss[loss=3.308, ArTop10Accuracy=0.6785, over 12996.00 frames. ], tot_loss[loss=3.31, ArTop10Accuracy=0.6747, over 11701.06 frames. ], batch size: 27, lr: 2.83e-02
42
+ 2024-08-06 08:38:10,517 INFO [trainer.py:765] (3/8) Epoch 2, batch 1000, train_loss[loss=3.3, ArTop10Accuracy=0.6792, over 12924.00 frames. ], tot_loss[loss=3.303, ArTop10Accuracy=0.6759, over 11883.62 frames. ], batch size: 27, lr: 2.82e-02
43
+ 2024-08-06 08:39:25,066 INFO [trainer.py:765] (3/8) Epoch 2, batch 1100, train_loss[loss=3.255, ArTop10Accuracy=0.6854, over 13749.00 frames. ], tot_loss[loss=3.294, ArTop10Accuracy=0.6775, over 11961.15 frames. ], batch size: 34, lr: 2.81e-02
44
+ 2024-08-06 08:40:38,226 INFO [trainer.py:765] (3/8) Epoch 2, batch 1200, train_loss[loss=3.319, ArTop10Accuracy=0.6732, over 12072.00 frames. ], tot_loss[loss=3.284, ArTop10Accuracy=0.6796, over 11891.67 frames. ], batch size: 101, lr: 2.80e-02
45
+ 2024-08-06 08:41:38,664 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
46
+ 2024-08-06 08:43:36,656 INFO [trainer.py:765] (3/8) Epoch 3, batch 100, train_loss[loss=3.244, ArTop10Accuracy=0.6856, over 14499.00 frames. ], tot_loss[loss=3.254, ArTop10Accuracy=0.6846, over 4767.19 frames. ], batch size: 62, lr: 2.67e-02
47
+ 2024-08-06 08:45:10,506 INFO [trainer.py:765] (3/8) Epoch 3, batch 200, train_loss[loss=3.139, ArTop10Accuracy=0.7055, over 13473.00 frames. ], tot_loss[loss=3.23, ArTop10Accuracy=0.6888, over 7744.68 frames. ], batch size: 34, lr: 2.66e-02
48
+ 2024-08-06 08:46:29,264 INFO [trainer.py:765] (3/8) Epoch 3, batch 300, train_loss[loss=3.263, ArTop10Accuracy=0.6859, over 14445.00 frames. ], tot_loss[loss=3.209, ArTop10Accuracy=0.6931, over 9375.03 frames. ], batch size: 45, lr: 2.64e-02
49
+ 2024-08-06 08:48:04,225 INFO [trainer.py:765] (3/8) Epoch 3, batch 400, train_loss[loss=3.121, ArTop10Accuracy=0.7107, over 10851.00 frames. ], tot_loss[loss=3.191, ArTop10Accuracy=0.6967, over 10285.27 frames. ], batch size: 15, lr: 2.63e-02
50
+ 2024-08-06 08:48:40,887 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 9.282e+01 1.561e+02 1.981e+02 2.686e+02 1.768e+03, threshold=3.962e+02, percent-clipped=7.6
51
+ 2024-08-06 08:49:25,547 INFO [trainer.py:765] (3/8) Epoch 3, batch 500, train_loss[loss=3.155, ArTop10Accuracy=0.709, over 12711.00 frames. ], tot_loss[loss=3.174, ArTop10Accuracy=0.7001, over 10844.23 frames. ], batch size: 23, lr: 2.62e-02
52
+ 2024-08-06 08:51:00,483 INFO [trainer.py:765] (3/8) Epoch 3, batch 600, train_loss[loss=3.06, ArTop10Accuracy=0.725, over 11397.00 frames. ], tot_loss[loss=3.157, ArTop10Accuracy=0.7033, over 11347.80 frames. ], batch size: 18, lr: 2.61e-02
+ 2024-08-06 08:52:31,624 INFO [trainer.py:765] (3/8) Epoch 3, batch 700, train_loss[loss=3.042, ArTop10Accuracy=0.7264, over 10038.00 frames. ], tot_loss[loss=3.145, ArTop10Accuracy=0.7056, over 11508.63 frames. ], batch size: 12, lr: 2.60e-02
+ 2024-08-06 08:53:57,394 INFO [trainer.py:765] (3/8) Epoch 3, batch 800, train_loss[loss=3.221, ArTop10Accuracy=0.6856, over 10071.00 frames. ], tot_loss[loss=3.138, ArTop10Accuracy=0.7071, over 11643.06 frames. ], batch size: 12, lr: 2.59e-02
+ 2024-08-06 08:55:15,124 INFO [trainer.py:765] (3/8) Epoch 3, batch 900, train_loss[loss=3.071, ArTop10Accuracy=0.7171, over 12846.00 frames. ], tot_loss[loss=3.117, ArTop10Accuracy=0.711, over 11684.97 frames. ], batch size: 27, lr: 2.57e-02
+ 2024-08-06 08:56:31,563 INFO [trainer.py:765] (3/8) Epoch 3, batch 1000, train_loss[loss=3.188, ArTop10Accuracy=0.6936, over 12897.00 frames. ], tot_loss[loss=3.112, ArTop10Accuracy=0.7118, over 11865.36 frames. ], batch size: 27, lr: 2.56e-02
+ 2024-08-06 08:57:46,512 INFO [trainer.py:765] (3/8) Epoch 3, batch 1100, train_loss[loss=3.058, ArTop10Accuracy=0.7222, over 13737.00 frames. ], tot_loss[loss=3.105, ArTop10Accuracy=0.7131, over 11941.48 frames. ], batch size: 34, lr: 2.55e-02
+ 2024-08-06 08:59:01,405 INFO [trainer.py:765] (3/8) Epoch 3, batch 1200, train_loss[loss=3.175, ArTop10Accuracy=0.7042, over 12597.00 frames. ], tot_loss[loss=3.095, ArTop10Accuracy=0.7152, over 11835.39 frames. ], batch size: 101, lr: 2.54e-02
+ 2024-08-06 09:00:02,031 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 09:01:50,747 INFO [trainer.py:765] (3/8) Epoch 4, batch 100, train_loss[loss=3.104, ArTop10Accuracy=0.7088, over 14130.00 frames. ], tot_loss[loss=3.069, ArTop10Accuracy=0.7194, over 4744.71 frames. ], batch size: 62, lr: 2.38e-02
+ 2024-08-06 09:02:52,864 INFO [trainer.py:803] (3/8) Computing validation loss
+ 2024-08-06 09:03:02,384 INFO [trainer.py:811] (3/8) Epoch 4, validation: loss=2.997, ArTop10Accuracy=0.7338, over 1827537.00 frames.
+ 2024-08-06 09:03:02,385 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 29540MB
+ 2024-08-06 09:03:03,370 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.499e+02 1.782e+02 2.273e+02 1.100e+03, threshold=3.565e+02, percent-clipped=4.7
+ 2024-08-06 09:03:29,279 INFO [trainer.py:765] (3/8) Epoch 4, batch 200, train_loss[loss=3.02, ArTop10Accuracy=0.7308, over 13695.00 frames. ], tot_loss[loss=3.044, ArTop10Accuracy=0.7244, over 7751.43 frames. ], batch size: 34, lr: 2.37e-02
+ 2024-08-06 09:05:01,738 INFO [trainer.py:765] (3/8) Epoch 4, batch 300, train_loss[loss=3.065, ArTop10Accuracy=0.7227, over 14499.00 frames. ], tot_loss[loss=3.04, ArTop10Accuracy=0.7256, over 9375.57 frames. ], batch size: 45, lr: 2.36e-02
+ 2024-08-06 09:06:28,156 INFO [trainer.py:765] (3/8) Epoch 4, batch 400, train_loss[loss=2.988, ArTop10Accuracy=0.7383, over 10809.00 frames. ], tot_loss[loss=3.03, ArTop10Accuracy=0.7275, over 10281.95 frames. ], batch size: 15, lr: 2.34e-02
+ 2024-08-06 09:08:01,930 INFO [trainer.py:765] (3/8) Epoch 4, batch 500, train_loss[loss=2.973, ArTop10Accuracy=0.7446, over 12387.00 frames. ], tot_loss[loss=3.022, ArTop10Accuracy=0.729, over 10821.67 frames. ], batch size: 22, lr: 2.33e-02
+ 2024-08-06 09:09:28,547 INFO [trainer.py:765] (3/8) Epoch 4, batch 600, train_loss[loss=2.865, ArTop10Accuracy=0.7627, over 11403.00 frames. ], tot_loss[loss=3.02, ArTop10Accuracy=0.7295, over 11351.68 frames. ], batch size: 18, lr: 2.32e-02
+ 2024-08-06 09:10:59,872 INFO [trainer.py:765] (3/8) Epoch 4, batch 700, train_loss[loss=2.839, ArTop10Accuracy=0.765, over 9333.00 frames. ], tot_loss[loss=3.022, ArTop10Accuracy=0.7291, over 11490.84 frames. ], batch size: 11, lr: 2.31e-02
+ 2024-08-06 09:12:17,519 INFO [trainer.py:765] (3/8) Epoch 4, batch 800, train_loss[loss=2.979, ArTop10Accuracy=0.7389, over 9318.00 frames. ], tot_loss[loss=3.021, ArTop10Accuracy=0.7293, over 11615.33 frames. ], batch size: 11, lr: 2.30e-02
+ 2024-08-06 09:13:33,219 INFO [trainer.py:765] (3/8) Epoch 4, batch 900, train_loss[loss=2.958, ArTop10Accuracy=0.7471, over 12633.00 frames. ], tot_loss[loss=3.011, ArTop10Accuracy=0.7311, over 11664.01 frames. ], batch size: 27, lr: 2.29e-02
+ 2024-08-06 09:14:47,526 INFO [trainer.py:765] (3/8) Epoch 4, batch 1000, train_loss[loss=2.934, ArTop10Accuracy=0.7402, over 12672.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7307, over 11855.93 frames. ], batch size: 27, lr: 2.28e-02
+ 2024-08-06 09:16:02,988 INFO [trainer.py:765] (3/8) Epoch 4, batch 1100, train_loss[loss=3.039, ArTop10Accuracy=0.7209, over 13689.00 frames. ], tot_loss[loss=3.015, ArTop10Accuracy=0.7302, over 11936.22 frames. ], batch size: 34, lr: 2.26e-02
+ 2024-08-06 09:16:53,297 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.440e+02 1.636e+02 1.968e+02 7.702e+02, threshold=3.273e+02, percent-clipped=1.3
+ 2024-08-06 09:17:18,350 INFO [trainer.py:765] (3/8) Epoch 4, batch 1200, train_loss[loss=3.08, ArTop10Accuracy=0.7212, over 13128.00 frames. ], tot_loss[loss=3.012, ArTop10Accuracy=0.7306, over 11871.12 frames. ], batch size: 103, lr: 2.25e-02
+ 2024-08-06 09:18:17,420 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 09:20:17,177 INFO [trainer.py:765] (3/8) Epoch 5, batch 100, train_loss[loss=3.008, ArTop10Accuracy=0.7247, over 14334.00 frames. ], tot_loss[loss=2.988, ArTop10Accuracy=0.7348, over 4769.18 frames. ], batch size: 63, lr: 2.10e-02
+ 2024-08-06 09:21:52,302 INFO [trainer.py:765] (3/8) Epoch 5, batch 200, train_loss[loss=3.035, ArTop10Accuracy=0.7238, over 13557.00 frames. ], tot_loss[loss=2.981, ArTop10Accuracy=0.7363, over 7749.95 frames. ], batch size: 34, lr: 2.09e-02
+ 2024-08-06 09:23:19,247 INFO [trainer.py:765] (3/8) Epoch 5, batch 300, train_loss[loss=3.01, ArTop10Accuracy=0.7287, over 14130.00 frames. ], tot_loss[loss=2.969, ArTop10Accuracy=0.7386, over 9362.27 frames. ], batch size: 44, lr: 2.08e-02
+ 2024-08-06 09:24:53,543 INFO [trainer.py:765] (3/8) Epoch 5, batch 400, train_loss[loss=2.831, ArTop10Accuracy=0.7688, over 10419.00 frames. ], tot_loss[loss=2.967, ArTop10Accuracy=0.7389, over 10263.43 frames. ], batch size: 14, lr: 2.07e-02
+ 2024-08-06 09:26:19,424 INFO [trainer.py:765] (3/8) Epoch 5, batch 500, train_loss[loss=2.948, ArTop10Accuracy=0.7401, over 12189.00 frames. ], tot_loss[loss=2.964, ArTop10Accuracy=0.7394, over 10831.03 frames. ], batch size: 22, lr: 2.06e-02
+ 2024-08-06 09:27:49,543 INFO [trainer.py:765] (3/8) Epoch 5, batch 600, train_loss[loss=3.025, ArTop10Accuracy=0.7193, over 11202.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7395, over 11339.96 frames. ], batch size: 18, lr: 2.05e-02
+ 2024-08-06 09:29:21,676 INFO [trainer.py:765] (3/8) Epoch 5, batch 700, train_loss[loss=2.829, ArTop10Accuracy=0.7669, over 9447.00 frames. ], tot_loss[loss=2.965, ArTop10Accuracy=0.7396, over 11490.41 frames. ], batch size: 11, lr: 2.04e-02
+ 2024-08-06 09:30:44,699 INFO [trainer.py:765] (3/8) Epoch 5, batch 800, train_loss[loss=2.886, ArTop10Accuracy=0.7585, over 9441.00 frames. ], tot_loss[loss=2.969, ArTop10Accuracy=0.7388, over 11618.93 frames. ], batch size: 11, lr: 2.03e-02
+ 2024-08-06 09:31:51,245 INFO [trainer.py:803] (3/8) Computing validation loss
+ 2024-08-06 09:32:00,762 INFO [trainer.py:811] (3/8) Epoch 5, validation: loss=2.926, ArTop10Accuracy=0.7466, over 1827537.00 frames.
+ 2024-08-06 09:32:00,763 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 29540MB
+ 2024-08-06 09:32:01,714 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.349e+02 1.525e+02 1.806e+02 1.007e+03, threshold=3.049e+02, percent-clipped=2.3
+ 2024-08-06 09:32:10,560 INFO [trainer.py:765] (3/8) Epoch 5, batch 900, train_loss[loss=2.864, ArTop10Accuracy=0.758, over 12900.00 frames. ], tot_loss[loss=2.958, ArTop10Accuracy=0.741, over 11669.20 frames. ], batch size: 27, lr: 2.02e-02
+ 2024-08-06 09:33:27,329 INFO [trainer.py:765] (3/8) Epoch 5, batch 1000, train_loss[loss=2.95, ArTop10Accuracy=0.7414, over 12960.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7401, over 11887.77 frames. ], batch size: 27, lr: 2.01e-02
+ 2024-08-06 09:34:42,307 INFO [trainer.py:765] (3/8) Epoch 5, batch 1100, train_loss[loss=2.925, ArTop10Accuracy=0.7515, over 13506.00 frames. ], tot_loss[loss=2.967, ArTop10Accuracy=0.7391, over 11948.88 frames. ], batch size: 34, lr: 2.00e-02
+ 2024-08-06 09:35:56,338 INFO [trainer.py:765] (3/8) Epoch 5, batch 1200, train_loss[loss=2.999, ArTop10Accuracy=0.7334, over 12924.00 frames. ], tot_loss[loss=2.967, ArTop10Accuracy=0.7391, over 11862.59 frames. ], batch size: 101, lr: 1.99e-02
+ 2024-08-06 09:36:55,670 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 09:38:52,670 INFO [trainer.py:765] (3/8) Epoch 6, batch 100, train_loss[loss=3, ArTop10Accuracy=0.7349, over 14295.00 frames. ], tot_loss[loss=2.947, ArTop10Accuracy=0.7423, over 4739.69 frames. ], batch size: 62, lr: 1.85e-02
+ 2024-08-06 09:40:19,840 INFO [trainer.py:765] (3/8) Epoch 6, batch 200, train_loss[loss=2.892, ArTop10Accuracy=0.7548, over 13599.00 frames. ], tot_loss[loss=2.937, ArTop10Accuracy=0.7444, over 7746.65 frames. ], batch size: 34, lr: 1.84e-02
+ 2024-08-06 09:41:52,970 INFO [trainer.py:765] (3/8) Epoch 6, batch 300, train_loss[loss=3.024, ArTop10Accuracy=0.7262, over 14010.00 frames. ], tot_loss[loss=2.93, ArTop10Accuracy=0.746, over 9367.16 frames. ], batch size: 44, lr: 1.83e-02
+ 2024-08-06 09:43:17,833 INFO [trainer.py:765] (3/8) Epoch 6, batch 400, train_loss[loss=2.828, ArTop10Accuracy=0.7645, over 10341.00 frames. ], tot_loss[loss=2.926, ArTop10Accuracy=0.7468, over 10282.94 frames. ], batch size: 14, lr: 1.83e-02
+ 2024-08-06 09:44:54,133 INFO [trainer.py:765] (3/8) Epoch 6, batch 500, train_loss[loss=2.917, ArTop10Accuracy=0.7487, over 12615.00 frames. ], tot_loss[loss=2.92, ArTop10Accuracy=0.7477, over 10844.16 frames. ], batch size: 23, lr: 1.82e-02
+ 2024-08-06 09:46:22,878 INFO [trainer.py:765] (3/8) Epoch 6, batch 600, train_loss[loss=2.954, ArTop10Accuracy=0.7389, over 11496.00 frames. ], tot_loss[loss=2.92, ArTop10Accuracy=0.7478, over 11382.94 frames. ], batch size: 18, lr: 1.81e-02
+ 2024-08-06 09:46:37,225 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.339e+02 1.480e+02 1.701e+02 7.506e+02, threshold=2.959e+02, percent-clipped=1.1
+ 2024-08-06 09:47:57,876 INFO [trainer.py:765] (3/8) Epoch 6, batch 700, train_loss[loss=2.81, ArTop10Accuracy=0.7641, over 9426.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7465, over 11545.50 frames. ], batch size: 11, lr: 1.80e-02
+ 2024-08-06 09:49:15,960 INFO [trainer.py:765] (3/8) Epoch 6, batch 800, train_loss[loss=2.978, ArTop10Accuracy=0.738, over 10194.00 frames. ], tot_loss[loss=2.929, ArTop10Accuracy=0.7462, over 11653.83 frames. ], batch size: 12, lr: 1.79e-02
+ 2024-08-06 09:50:32,141 INFO [trainer.py:765] (3/8) Epoch 6, batch 900, train_loss[loss=2.958, ArTop10Accuracy=0.7424, over 12861.00 frames. ], tot_loss[loss=2.923, ArTop10Accuracy=0.7476, over 11703.21 frames. ], batch size: 27, lr: 1.78e-02
+ 2024-08-06 09:51:47,305 INFO [trainer.py:765] (3/8) Epoch 6, batch 1000, train_loss[loss=2.948, ArTop10Accuracy=0.7387, over 12723.00 frames. ], tot_loss[loss=2.922, ArTop10Accuracy=0.7476, over 11897.90 frames. ], batch size: 27, lr: 1.77e-02
+ 2024-08-06 09:53:00,926 INFO [trainer.py:765] (3/8) Epoch 6, batch 1100, train_loss[loss=2.986, ArTop10Accuracy=0.7328, over 13701.00 frames. ], tot_loss[loss=2.928, ArTop10Accuracy=0.7464, over 11964.42 frames. ], batch size: 34, lr: 1.77e-02
+ 2024-08-06 09:54:14,342 INFO [trainer.py:765] (3/8) Epoch 6, batch 1200, train_loss[loss=3.044, ArTop10Accuracy=0.7283, over 12891.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7465, over 11877.83 frames. ], batch size: 101, lr: 1.76e-02
+ 2024-08-06 09:55:13,384 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 09:57:06,705 INFO [trainer.py:765] (3/8) Epoch 7, batch 100, train_loss[loss=2.902, ArTop10Accuracy=0.7499, over 14643.00 frames. ], tot_loss[loss=2.913, ArTop10Accuracy=0.7485, over 4780.37 frames. ], batch size: 63, lr: 1.64e-02
+ 2024-08-06 09:58:39,432 INFO [trainer.py:765] (3/8) Epoch 7, batch 200, train_loss[loss=2.904, ArTop10Accuracy=0.7525, over 13992.00 frames. ], tot_loss[loss=2.905, ArTop10Accuracy=0.7503, over 7757.95 frames. ], batch size: 35, lr: 1.64e-02
+ 2024-08-06 10:00:06,089 INFO [trainer.py:765] (3/8) Epoch 7, batch 300, train_loss[loss=2.937, ArTop10Accuracy=0.7427, over 14214.00 frames. ], tot_loss[loss=2.901, ArTop10Accuracy=0.7509, over 9379.79 frames. ], batch size: 45, lr: 1.63e-02
+ 2024-08-06 10:00:40,516 INFO [trainer.py:803] (3/8) Computing validation loss
+ 2024-08-06 10:00:50,245 INFO [trainer.py:811] (3/8) Epoch 7, validation: loss=2.88, ArTop10Accuracy=0.7554, over 1827537.00 frames.
+ 2024-08-06 10:00:50,246 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 30046MB
+ 2024-08-06 10:00:50,983 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.286e+02 1.429e+02 1.605e+02 1.020e+03, threshold=2.857e+02, percent-clipped=1.5
+ 2024-08-06 10:01:49,122 INFO [trainer.py:765] (3/8) Epoch 7, batch 400, train_loss[loss=2.901, ArTop10Accuracy=0.7524, over 10989.00 frames. ], tot_loss[loss=2.895, ArTop10Accuracy=0.7526, over 10298.60 frames. ], batch size: 15, lr: 1.62e-02
+ 2024-08-06 10:03:21,465 INFO [trainer.py:765] (3/8) Epoch 7, batch 500, train_loss[loss=2.942, ArTop10Accuracy=0.7404, over 12828.00 frames. ], tot_loss[loss=2.891, ArTop10Accuracy=0.7535, over 10868.22 frames. ], batch size: 23, lr: 1.61e-02
+ 2024-08-06 10:04:51,889 INFO [trainer.py:765] (3/8) Epoch 7, batch 600, train_loss[loss=2.859, ArTop10Accuracy=0.7626, over 11541.00 frames. ], tot_loss[loss=2.894, ArTop10Accuracy=0.7528, over 11398.95 frames. ], batch size: 18, lr: 1.61e-02
+ 2024-08-06 10:06:25,118 INFO [trainer.py:765] (3/8) Epoch 7, batch 700, train_loss[loss=2.801, ArTop10Accuracy=0.7696, over 10050.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7524, over 11508.96 frames. ], batch size: 12, lr: 1.60e-02
+ 2024-08-06 10:07:46,954 INFO [trainer.py:765] (3/8) Epoch 7, batch 800, train_loss[loss=2.846, ArTop10Accuracy=0.7637, over 10302.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7525, over 11633.29 frames. ], batch size: 12, lr: 1.59e-02
+ 2024-08-06 10:09:02,830 INFO [trainer.py:765] (3/8) Epoch 7, batch 900, train_loss[loss=2.874, ArTop10Accuracy=0.7574, over 12936.00 frames. ], tot_loss[loss=2.888, ArTop10Accuracy=0.7541, over 11680.46 frames. ], batch size: 27, lr: 1.59e-02
+ 2024-08-06 10:10:19,642 INFO [trainer.py:765] (3/8) Epoch 7, batch 1000, train_loss[loss=2.844, ArTop10Accuracy=0.7628, over 12774.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7524, over 11883.71 frames. ], batch size: 27, lr: 1.58e-02
+ 2024-08-06 10:11:35,215 INFO [trainer.py:765] (3/8) Epoch 7, batch 1100, train_loss[loss=2.865, ArTop10Accuracy=0.7577, over 13803.00 frames. ], tot_loss[loss=2.9, ArTop10Accuracy=0.7517, over 11945.40 frames. ], batch size: 34, lr: 1.57e-02
+ 2024-08-06 10:12:48,210 INFO [trainer.py:765] (3/8) Epoch 7, batch 1200, train_loss[loss=2.986, ArTop10Accuracy=0.7322, over 12135.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7522, over 11873.39 frames. ], batch size: 101, lr: 1.57e-02
+ 2024-08-06 10:13:46,761 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 10:15:03,607 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.283e+02 1.410e+02 1.601e+02 1.017e+03, threshold=2.820e+02, percent-clipped=0.9
+ 2024-08-06 10:15:40,826 INFO [trainer.py:765] (3/8) Epoch 8, batch 100, train_loss[loss=2.98, ArTop10Accuracy=0.7361, over 14430.00 frames. ], tot_loss[loss=2.887, ArTop10Accuracy=0.7539, over 4774.76 frames. ], batch size: 62, lr: 1.47e-02
+ 2024-08-06 10:17:12,867 INFO [trainer.py:765] (3/8) Epoch 8, batch 200, train_loss[loss=2.818, ArTop10Accuracy=0.7666, over 13695.00 frames. ], tot_loss[loss=2.876, ArTop10Accuracy=0.7558, over 7759.07 frames. ], batch size: 34, lr: 1.46e-02
+ 2024-08-06 10:18:37,904 INFO [trainer.py:765] (3/8) Epoch 8, batch 300, train_loss[loss=2.93, ArTop10Accuracy=0.7451, over 14208.00 frames. ], tot_loss[loss=2.868, ArTop10Accuracy=0.7573, over 9366.52 frames. ], batch size: 44, lr: 1.46e-02
+ 2024-08-06 10:20:06,350 INFO [trainer.py:765] (3/8) Epoch 8, batch 400, train_loss[loss=2.762, ArTop10Accuracy=0.7819, over 10890.00 frames. ], tot_loss[loss=2.865, ArTop10Accuracy=0.7582, over 10274.78 frames. ], batch size: 15, lr: 1.45e-02
+ 2024-08-06 10:21:32,417 INFO [trainer.py:765] (3/8) Epoch 8, batch 500, train_loss[loss=2.789, ArTop10Accuracy=0.7733, over 12693.00 frames. ], tot_loss[loss=2.859, ArTop10Accuracy=0.7593, over 10844.23 frames. ], batch size: 23, lr: 1.45e-02
+ 2024-08-06 10:23:00,980 INFO [trainer.py:765] (3/8) Epoch 8, batch 600, train_loss[loss=2.75, ArTop10Accuracy=0.779, over 11301.00 frames. ], tot_loss[loss=2.864, ArTop10Accuracy=0.7585, over 11361.98 frames. ], batch size: 18, lr: 1.44e-02
+ 2024-08-06 10:24:37,793 INFO [trainer.py:765] (3/8) Epoch 8, batch 700, train_loss[loss=2.707, ArTop10Accuracy=0.7901, over 10263.00 frames. ], tot_loss[loss=2.87, ArTop10Accuracy=0.7573, over 11497.41 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:25:56,091 INFO [trainer.py:765] (3/8) Epoch 8, batch 800, train_loss[loss=2.766, ArTop10Accuracy=0.7851, over 9222.00 frames. ], tot_loss[loss=2.875, ArTop10Accuracy=0.7566, over 11602.97 frames. ], batch size: 11, lr: 1.43e-02
+ 2024-08-06 10:27:12,251 INFO [trainer.py:765] (3/8) Epoch 8, batch 900, train_loss[loss=2.853, ArTop10Accuracy=0.7619, over 12891.00 frames. ], tot_loss[loss=2.865, ArTop10Accuracy=0.7586, over 11666.59 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:28:25,269 INFO [trainer.py:765] (3/8) Epoch 8, batch 1000, train_loss[loss=2.875, ArTop10Accuracy=0.7523, over 12798.00 frames. ], tot_loss[loss=2.872, ArTop10Accuracy=0.7572, over 11869.24 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:29:07,161 INFO [trainer.py:803] (3/8) Computing validation loss
+ 2024-08-06 10:29:16,830 INFO [trainer.py:811] (3/8) Epoch 8, validation: loss=2.858, ArTop10Accuracy=0.7594, over 1827537.00 frames.
+ 2024-08-06 10:29:16,831 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 33011MB
+ 2024-08-06 10:29:17,497 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.275e+02 1.390e+02 1.547e+02 3.717e+02, threshold=2.781e+02, percent-clipped=0.7
+ 2024-08-06 10:29:51,737 INFO [trainer.py:765] (3/8) Epoch 8, batch 1100, train_loss[loss=2.957, ArTop10Accuracy=0.737, over 13581.00 frames. ], tot_loss[loss=2.88, ArTop10Accuracy=0.7556, over 11946.69 frames. ], batch size: 34, lr: 1.41e-02
+ 2024-08-06 10:31:05,952 INFO [trainer.py:765] (3/8) Epoch 8, batch 1200, train_loss[loss=2.957, ArTop10Accuracy=0.7357, over 12132.00 frames. ], tot_loss[loss=2.877, ArTop10Accuracy=0.7559, over 11833.29 frames. ], batch size: 101, lr: 1.40e-02
+ 2024-08-06 10:32:05,668 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 10:34:01,262 INFO [trainer.py:765] (3/8) Epoch 9, batch 100, train_loss[loss=2.938, ArTop10Accuracy=0.7428, over 14349.00 frames. ], tot_loss[loss=2.857, ArTop10Accuracy=0.7592, over 4768.86 frames. ], batch size: 62, lr: 1.32e-02
+ 2024-08-06 10:35:31,778 INFO [trainer.py:765] (3/8) Epoch 9, batch 200, train_loss[loss=2.876, ArTop10Accuracy=0.7568, over 13407.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7602, over 7749.90 frames. ], batch size: 34, lr: 1.32e-02
+ 2024-08-06 10:36:57,933 INFO [trainer.py:765] (3/8) Epoch 9, batch 300, train_loss[loss=2.883, ArTop10Accuracy=0.7558, over 14283.00 frames. ], tot_loss[loss=2.851, ArTop10Accuracy=0.7606, over 9375.68 frames. ], batch size: 44, lr: 1.31e-02
+ 2024-08-06 10:38:32,703 INFO [trainer.py:765] (3/8) Epoch 9, batch 400, train_loss[loss=2.786, ArTop10Accuracy=0.7768, over 10629.00 frames. ], tot_loss[loss=2.848, ArTop10Accuracy=0.7612, over 10299.19 frames. ], batch size: 14, lr: 1.31e-02
+ 2024-08-06 10:39:59,262 INFO [trainer.py:765] (3/8) Epoch 9, batch 500, train_loss[loss=2.803, ArTop10Accuracy=0.7703, over 12306.00 frames. ], tot_loss[loss=2.844, ArTop10Accuracy=0.7624, over 10831.99 frames. ], batch size: 22, lr: 1.30e-02
+ 2024-08-06 10:41:29,696 INFO [trainer.py:765] (3/8) Epoch 9, batch 600, train_loss[loss=2.849, ArTop10Accuracy=0.7581, over 11571.00 frames. ], tot_loss[loss=2.845, ArTop10Accuracy=0.7621, over 11341.92 frames. ], batch size: 18, lr: 1.30e-02
+ 2024-08-06 10:42:58,446 INFO [trainer.py:765] (3/8) Epoch 9, batch 700, train_loss[loss=2.93, ArTop10Accuracy=0.7417, over 9555.00 frames. ], tot_loss[loss=2.847, ArTop10Accuracy=0.7619, over 11496.78 frames. ], batch size: 11, lr: 1.29e-02
+ 2024-08-06 10:44:02,958 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.253e+02 1.352e+02 1.493e+02 7.010e+02, threshold=2.704e+02, percent-clipped=0.6
+ 2024-08-06 10:44:19,675 INFO [trainer.py:765] (3/8) Epoch 9, batch 800, train_loss[loss=2.76, ArTop10Accuracy=0.7781, over 10053.00 frames. ], tot_loss[loss=2.853, ArTop10Accuracy=0.7607, over 11611.66 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:45:35,725 INFO [trainer.py:765] (3/8) Epoch 9, batch 900, train_loss[loss=2.775, ArTop10Accuracy=0.7765, over 13281.00 frames. ], tot_loss[loss=2.845, ArTop10Accuracy=0.7624, over 11670.68 frames. ], batch size: 28, lr: 1.28e-02
+ 2024-08-06 10:46:51,277 INFO [trainer.py:765] (3/8) Epoch 9, batch 1000, train_loss[loss=2.849, ArTop10Accuracy=0.7631, over 12930.00 frames. ], tot_loss[loss=2.847, ArTop10Accuracy=0.7619, over 11874.30 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:48:06,252 INFO [trainer.py:765] (3/8) Epoch 9, batch 1100, train_loss[loss=2.854, ArTop10Accuracy=0.7604, over 13707.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7606, over 11943.12 frames. ], batch size: 34, lr: 1.28e-02
+ 2024-08-06 10:49:21,059 INFO [trainer.py:765] (3/8) Epoch 9, batch 1200, train_loss[loss=2.971, ArTop10Accuracy=0.7419, over 12234.00 frames. ], tot_loss[loss=2.856, ArTop10Accuracy=0.76, over 11871.25 frames. ], batch size: 101, lr: 1.27e-02
+ 2024-08-06 10:50:21,935 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 10:52:12,332 INFO [trainer.py:765] (3/8) Epoch 10, batch 100, train_loss[loss=2.91, ArTop10Accuracy=0.7482, over 14496.00 frames. ], tot_loss[loss=2.845, ArTop10Accuracy=0.7617, over 4757.99 frames. ], batch size: 62, lr: 1.20e-02
+ 2024-08-06 10:53:44,592 INFO [trainer.py:765] (3/8) Epoch 10, batch 200, train_loss[loss=2.72, ArTop10Accuracy=0.79, over 13527.00 frames. ], tot_loss[loss=2.831, ArTop10Accuracy=0.7645, over 7751.44 frames. ], batch size: 34, lr: 1.20e-02
+ 2024-08-06 10:55:08,096 INFO [trainer.py:765] (3/8) Epoch 10, batch 300, train_loss[loss=2.869, ArTop10Accuracy=0.757, over 14196.00 frames. ], tot_loss[loss=2.828, ArTop10Accuracy=0.7654, over 9370.30 frames. ], batch size: 44, lr: 1.19e-02
+ 2024-08-06 10:56:41,182 INFO [trainer.py:765] (3/8) Epoch 10, batch 400, train_loss[loss=2.822, ArTop10Accuracy=0.7626, over 10914.00 frames. ], tot_loss[loss=2.824, ArTop10Accuracy=0.7661, over 10267.64 frames. ], batch size: 15, lr: 1.19e-02
+ 2024-08-06 10:58:04,944 INFO [trainer.py:803] (3/8) Computing validation loss
+ 2024-08-06 10:58:14,559 INFO [trainer.py:811] (3/8) Epoch 10, validation: loss=2.842, ArTop10Accuracy=0.7624, over 1827537.00 frames.
+ 2024-08-06 10:58:14,560 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 33011MB
+ 2024-08-06 10:58:15,578 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.228e+02 1.320e+02 1.458e+02 6.096e+02, threshold=2.641e+02, percent-clipped=0.6
+ 2024-08-06 10:58:15,585 INFO [trainer.py:765] (3/8) Epoch 10, batch 500, train_loss[loss=2.765, ArTop10Accuracy=0.7807, over 12336.00 frames. ], tot_loss[loss=2.819, ArTop10Accuracy=0.7668, over 10858.82 frames. ], batch size: 22, lr: 1.19e-02
+ 2024-08-06 10:59:42,820 INFO [trainer.py:765] (3/8) Epoch 10, batch 600, train_loss[loss=2.878, ArTop10Accuracy=0.7531, over 11310.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.7663, over 11375.27 frames. ], batch size: 18, lr: 1.18e-02
+ 2024-08-06 11:01:18,113 INFO [trainer.py:765] (3/8) Epoch 10, batch 700, train_loss[loss=2.746, ArTop10Accuracy=0.7848, over 9285.00 frames. ], tot_loss[loss=2.828, ArTop10Accuracy=0.7654, over 11525.92 frames. ], batch size: 11, lr: 1.18e-02
+ 2024-08-06 11:02:36,923 INFO [trainer.py:765] (3/8) Epoch 10, batch 800, train_loss[loss=2.666, ArTop10Accuracy=0.7947, over 10209.00 frames. ], tot_loss[loss=2.829, ArTop10Accuracy=0.7651, over 11652.28 frames. ], batch size: 12, lr: 1.17e-02
+ 2024-08-06 11:03:51,217 INFO [trainer.py:765] (3/8) Epoch 10, batch 900, train_loss[loss=2.843, ArTop10Accuracy=0.764, over 12903.00 frames. ], tot_loss[loss=2.824, ArTop10Accuracy=0.766, over 11698.22 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:05:06,357 INFO [trainer.py:765] (3/8) Epoch 10, batch 1000, train_loss[loss=2.825, ArTop10Accuracy=0.7681, over 12927.00 frames. ], tot_loss[loss=2.831, ArTop10Accuracy=0.7648, over 11889.20 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:06:21,727 INFO [trainer.py:765] (3/8) Epoch 10, batch 1100, train_loss[loss=2.881, ArTop10Accuracy=0.7536, over 13725.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7633, over 11982.89 frames. ], batch size: 34, lr: 1.16e-02
+ 2024-08-06 11:07:34,778 INFO [trainer.py:765] (3/8) Epoch 10, batch 1200, train_loss[loss=2.931, ArTop10Accuracy=0.7462, over 12060.00 frames. ], tot_loss[loss=2.836, ArTop10Accuracy=0.7637, over 11894.39 frames. ], batch size: 101, lr: 1.16e-02
+ 2024-08-06 11:08:33,651 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 11:10:29,961 INFO [trainer.py:765] (3/8) Epoch 11, batch 100, train_loss[loss=2.872, ArTop10Accuracy=0.7564, over 14637.00 frames. ], tot_loss[loss=2.823, ArTop10Accuracy=0.7654, over 4739.55 frames. ], batch size: 63, lr: 1.10e-02
+ 2024-08-06 11:12:04,681 INFO [trainer.py:765] (3/8) Epoch 11, batch 200, train_loss[loss=2.874, ArTop10Accuracy=0.755, over 13779.00 frames. ], tot_loss[loss=2.816, ArTop10Accuracy=0.767, over 7746.84 frames. ], batch size: 34, lr: 1.10e-02
+ 2024-08-06 11:12:22,831 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 9.884e+01 1.240e+02 1.333e+02 1.457e+02 6.939e+02, threshold=2.667e+02, percent-clipped=0.1
+ 2024-08-06 11:13:31,554 INFO [trainer.py:765] (3/8) Epoch 11, batch 300, train_loss[loss=2.886, ArTop10Accuracy=0.7523, over 14241.00 frames. ], tot_loss[loss=2.811, ArTop10Accuracy=0.7679, over 9372.27 frames. ], batch size: 44, lr: 1.09e-02
+ 2024-08-06 11:15:03,275 INFO [trainer.py:765] (3/8) Epoch 11, batch 400, train_loss[loss=2.797, ArTop10Accuracy=0.7679, over 10716.00 frames. ], tot_loss[loss=2.809, ArTop10Accuracy=0.7686, over 10280.11 frames. ], batch size: 15, lr: 1.09e-02
+ 2024-08-06 11:16:29,643 INFO [trainer.py:765] (3/8) Epoch 11, batch 500, train_loss[loss=2.834, ArTop10Accuracy=0.7655, over 12348.00 frames. ], tot_loss[loss=2.806, ArTop10Accuracy=0.7691, over 10847.00 frames. ], batch size: 22, lr: 1.09e-02
+ 2024-08-06 11:18:00,524 INFO [trainer.py:765] (3/8) Epoch 11, batch 600, train_loss[loss=2.729, ArTop10Accuracy=0.7832, over 11418.00 frames. ], tot_loss[loss=2.809, ArTop10Accuracy=0.7687, over 11384.89 frames. ], batch size: 18, lr: 1.08e-02
+ 2024-08-06 11:19:34,519 INFO [trainer.py:765] (3/8) Epoch 11, batch 700, train_loss[loss=2.69, ArTop10Accuracy=0.7945, over 10086.00 frames. ], tot_loss[loss=2.815, ArTop10Accuracy=0.7676, over 11518.99 frames. ], batch size: 12, lr: 1.08e-02
+ 2024-08-06 11:20:55,489 INFO [trainer.py:765] (3/8) Epoch 11, batch 800, train_loss[loss=2.668, ArTop10Accuracy=0.7995, over 10152.00 frames. ], tot_loss[loss=2.815, ArTop10Accuracy=0.7676, over 11650.91 frames. ], batch size: 12, lr: 1.07e-02
+ 2024-08-06 11:22:13,711 INFO [trainer.py:765] (3/8) Epoch 11, batch 900, train_loss[loss=2.8, ArTop10Accuracy=0.7695, over 12996.00 frames. ], tot_loss[loss=2.814, ArTop10Accuracy=0.7678, over 11706.55 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:23:31,804 INFO [trainer.py:765] (3/8) Epoch 11, batch 1000, train_loss[loss=2.759, ArTop10Accuracy=0.7778, over 12738.00 frames. ], tot_loss[loss=2.816, ArTop10Accuracy=0.7677, over 11899.04 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:24:46,908 INFO [trainer.py:765] (3/8) Epoch 11, batch 1100, train_loss[loss=2.76, ArTop10Accuracy=0.7812, over 13725.00 frames. ], tot_loss[loss=2.82, ArTop10Accuracy=0.7667, over 11968.01 frames. ], batch size: 34, lr: 1.06e-02
+ 2024-08-06 11:26:00,739 INFO [trainer.py:765] (3/8) Epoch 11, batch 1200, train_loss[loss=2.928, ArTop10Accuracy=0.7442, over 12543.00 frames. ], tot_loss[loss=2.82, ArTop10Accuracy=0.7664, over 11889.44 frames. ], batch size: 101, lr: 1.06e-02
+ 2024-08-06 11:26:15,853 INFO [trainer.py:803] (3/8) Computing validation loss
+ 2024-08-06 11:26:25,556 INFO [trainer.py:811] (3/8) Epoch 11, validation: loss=2.831, ArTop10Accuracy=0.7643, over 1827537.00 frames.
+ 2024-08-06 11:26:25,556 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 33011MB
+ 2024-08-06 11:26:26,191 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.251e+02 1.335e+02 1.441e+02 2.942e+02, threshold=2.669e+02, percent-clipped=0.1
+ 2024-08-06 11:27:09,618 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
+ 2024-08-06 11:29:03,457 INFO [trainer.py:765] (3/8) Epoch 12, batch 100, train_loss[loss=2.846, ArTop10Accuracy=0.7631, over 14118.00 frames. ], tot_loss[loss=2.806, ArTop10Accuracy=0.7687, over 4773.56 frames. ], batch size: 62, lr: 1.01e-02
+ 2024-08-06 11:30:30,680 INFO [trainer.py:765] (3/8) Epoch 12, batch 200, train_loss[loss=2.78, ArTop10Accuracy=0.7769, over 13701.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7701, over 7763.97 frames. ], batch size: 34, lr: 1.01e-02
+ 2024-08-06 11:31:57,661 INFO [trainer.py:765] (3/8) Epoch 12, batch 300, train_loss[loss=2.809, ArTop10Accuracy=0.7704, over 14559.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7712, over 9355.86 frames. ], batch size: 45, lr: 1.01e-02
+ 2024-08-06 11:33:30,744 INFO [trainer.py:765] (3/8) Epoch 12, batch 400, train_loss[loss=2.739, ArTop10Accuracy=0.7841, over 10419.00 frames. ], tot_loss[loss=2.791, ArTop10Accuracy=0.7719, over 10277.50 frames. ], batch size: 14, lr: 1.00e-02
+ 2024-08-06 11:34:55,739 INFO [trainer.py:765] (3/8) Epoch 12, batch 500, train_loss[loss=2.723, ArTop10Accuracy=0.7865, over 12318.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7731, over 10829.72 frames. ], batch size: 22, lr: 1.00e-02
+ 2024-08-06 11:36:29,367 INFO [trainer.py:765] (3/8) Epoch 12, batch 600, train_loss[loss=2.752, ArTop10Accuracy=0.7847, over 12060.00 frames. ], tot_loss[loss=2.787, ArTop10Accuracy=0.7728, over 11358.64 frames. ], batch size: 19, lr: 9.97e-03
+ 2024-08-06 11:38:00,349 INFO [trainer.py:765] (3/8) Epoch 12, batch 700, train_loss[loss=2.783, ArTop10Accuracy=0.7784, over 10062.00 frames. ], tot_loss[loss=2.791, ArTop10Accuracy=0.772, over 11511.89 frames. ], batch size: 12, lr: 9.93e-03
+ 2024-08-06 11:39:23,617 INFO [trainer.py:765] (3/8) Epoch 12, batch 800, train_loss[loss=2.752, ArTop10Accuracy=0.785, over 9558.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7708, over 11639.07 frames. ], batch size: 11, lr: 9.90e-03
201
+ 2024-08-06 11:40:39,895 INFO [trainer.py:765] (3/8) Epoch 12, batch 900, train_loss[loss=2.831, ArTop10Accuracy=0.7677, over 12933.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7714, over 11694.97 frames. ], batch size: 27, lr: 9.87e-03
202
+ 2024-08-06 11:41:14,001 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.248e+02 1.348e+02 1.459e+02 5.540e+02, threshold=2.695e+02, percent-clipped=0.3
203
+ 2024-08-06 11:41:56,195 INFO [trainer.py:765] (3/8) Epoch 12, batch 1000, train_loss[loss=2.767, ArTop10Accuracy=0.7761, over 12909.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7708, over 11880.84 frames. ], batch size: 27, lr: 9.85e-03
204
+ 2024-08-06 11:43:14,326 INFO [trainer.py:765] (3/8) Epoch 12, batch 1100, train_loss[loss=2.787, ArTop10Accuracy=0.7675, over 13797.00 frames. ], tot_loss[loss=2.802, ArTop10Accuracy=0.77, over 11937.72 frames. ], batch size: 34, lr: 9.82e-03
205
+ 2024-08-06 11:44:26,162 INFO [trainer.py:765] (3/8) Epoch 12, batch 1200, train_loss[loss=2.935, ArTop10Accuracy=0.7424, over 12132.00 frames. ], tot_loss[loss=2.801, ArTop10Accuracy=0.7701, over 11867.60 frames. ], batch size: 101, lr: 9.79e-03
206
+ 2024-08-06 11:45:26,537 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
207
+ 2024-08-06 11:47:26,606 INFO [trainer.py:765] (3/8) Epoch 13, batch 100, train_loss[loss=2.859, ArTop10Accuracy=0.7575, over 14892.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7707, over 4740.93 frames. ], batch size: 62, lr: 9.37e-03
208
+ 2024-08-06 11:48:54,785 INFO [trainer.py:765] (3/8) Epoch 13, batch 200, train_loss[loss=2.754, ArTop10Accuracy=0.7835, over 14028.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7729, over 7741.31 frames. ], batch size: 35, lr: 9.34e-03
209
+ 2024-08-06 11:50:20,521 INFO [trainer.py:765] (3/8) Epoch 13, batch 300, train_loss[loss=2.751, ArTop10Accuracy=0.7841, over 14265.00 frames. ], tot_loss[loss=2.774, ArTop10Accuracy=0.7751, over 9363.54 frames. ], batch size: 44, lr: 9.31e-03
210
+ 2024-08-06 11:51:48,772 INFO [trainer.py:765] (3/8) Epoch 13, batch 400, train_loss[loss=2.735, ArTop10Accuracy=0.7855, over 10485.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.775, over 10270.67 frames. ], batch size: 14, lr: 9.28e-03
211
+ 2024-08-06 11:53:13,413 INFO [trainer.py:765] (3/8) Epoch 13, batch 500, train_loss[loss=2.73, ArTop10Accuracy=0.7846, over 12183.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7753, over 10846.70 frames. ], batch size: 22, lr: 9.26e-03
212
+ 2024-08-06 11:54:52,229 INFO [trainer.py:765] (3/8) Epoch 13, batch 600, train_loss[loss=2.769, ArTop10Accuracy=0.7767, over 11760.00 frames. ], tot_loss[loss=2.775, ArTop10Accuracy=0.7753, over 11382.98 frames. ], batch size: 19, lr: 9.23e-03
213
+ 2024-08-06 11:55:47,087 INFO [trainer.py:803] (3/8) Computing validation loss
214
+ 2024-08-06 11:55:56,835 INFO [trainer.py:811] (3/8) Epoch 13, validation: loss=2.824, ArTop10Accuracy=0.7662, over 1827537.00 frames.
215
+ 2024-08-06 11:55:56,835 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 33011MB
216
+ 2024-08-06 11:55:57,718 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.343e+02 1.452e+02 4.888e+02, threshold=2.687e+02, percent-clipped=0.1
217
+ 2024-08-06 11:56:28,472 INFO [trainer.py:765] (3/8) Epoch 13, batch 700, train_loss[loss=2.726, ArTop10Accuracy=0.7868, over 10173.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7747, over 11525.72 frames. ], batch size: 12, lr: 9.20e-03
218
+ 2024-08-06 11:57:46,689 INFO [trainer.py:765] (3/8) Epoch 13, batch 800, train_loss[loss=2.736, ArTop10Accuracy=0.7815, over 10086.00 frames. ], tot_loss[loss=2.784, ArTop10Accuracy=0.7734, over 11612.96 frames. ], batch size: 12, lr: 9.18e-03
219
+ 2024-08-06 11:59:03,292 INFO [trainer.py:765] (3/8) Epoch 13, batch 900, train_loss[loss=2.768, ArTop10Accuracy=0.7799, over 12903.00 frames. ], tot_loss[loss=2.782, ArTop10Accuracy=0.7737, over 11684.55 frames. ], batch size: 27, lr: 9.15e-03
220
+ 2024-08-06 12:00:19,180 INFO [trainer.py:765] (3/8) Epoch 13, batch 1000, train_loss[loss=2.768, ArTop10Accuracy=0.7764, over 12711.00 frames. ], tot_loss[loss=2.786, ArTop10Accuracy=0.773, over 11887.42 frames. ], batch size: 27, lr: 9.13e-03
221
+ 2024-08-06 12:01:34,887 INFO [trainer.py:765] (3/8) Epoch 13, batch 1100, train_loss[loss=2.819, ArTop10Accuracy=0.7671, over 13707.00 frames. ], tot_loss[loss=2.791, ArTop10Accuracy=0.772, over 11953.14 frames. ], batch size: 34, lr: 9.10e-03
222
+ 2024-08-06 12:02:48,671 INFO [trainer.py:765] (3/8) Epoch 13, batch 1200, train_loss[loss=2.951, ArTop10Accuracy=0.7404, over 11715.00 frames. ], tot_loss[loss=2.793, ArTop10Accuracy=0.7714, over 11872.35 frames. ], batch size: 101, lr: 9.08e-03
223
+ 2024-08-06 12:03:48,181 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
224
+ 2024-08-06 12:05:45,340 INFO [trainer.py:765] (3/8) Epoch 14, batch 100, train_loss[loss=2.837, ArTop10Accuracy=0.7671, over 14409.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7742, over 4764.27 frames. ], batch size: 62, lr: 8.71e-03
225
+ 2024-08-06 12:07:16,610 INFO [trainer.py:765] (3/8) Epoch 14, batch 200, train_loss[loss=2.752, ArTop10Accuracy=0.7797, over 13509.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7761, over 7760.98 frames. ], batch size: 34, lr: 8.69e-03
226
+ 2024-08-06 12:08:44,317 INFO [trainer.py:765] (3/8) Epoch 14, batch 300, train_loss[loss=2.763, ArTop10Accuracy=0.7764, over 14157.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.7774, over 9387.83 frames. ], batch size: 44, lr: 8.66e-03
227
+ 2024-08-06 12:10:01,137 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.266e+02 1.374e+02 1.483e+02 6.480e+02, threshold=2.748e+02, percent-clipped=0.2
228
+ 2024-08-06 12:10:10,233 INFO [trainer.py:765] (3/8) Epoch 14, batch 400, train_loss[loss=2.688, ArTop10Accuracy=0.7928, over 10413.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7781, over 10296.32 frames. ], batch size: 14, lr: 8.64e-03
229
+ 2024-08-06 12:11:36,157 INFO [trainer.py:765] (3/8) Epoch 14, batch 500, train_loss[loss=2.743, ArTop10Accuracy=0.7808, over 12153.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7781, over 10850.15 frames. ], batch size: 22, lr: 8.62e-03
230
+ 2024-08-06 12:13:05,999 INFO [trainer.py:765] (3/8) Epoch 14, batch 600, train_loss[loss=2.798, ArTop10Accuracy=0.7717, over 11496.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.7775, over 11365.20 frames. ], batch size: 18, lr: 8.59e-03
231
+ 2024-08-06 12:14:38,559 INFO [trainer.py:765] (3/8) Epoch 14, batch 700, train_loss[loss=2.733, ArTop10Accuracy=0.7762, over 10191.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7765, over 11522.21 frames. ], batch size: 12, lr: 8.57e-03
232
+ 2024-08-06 12:15:58,076 INFO [trainer.py:765] (3/8) Epoch 14, batch 800, train_loss[loss=2.694, ArTop10Accuracy=0.793, over 9315.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7762, over 11630.20 frames. ], batch size: 11, lr: 8.55e-03
233
+ 2024-08-06 12:17:12,872 INFO [trainer.py:765] (3/8) Epoch 14, batch 900, train_loss[loss=2.755, ArTop10Accuracy=0.7847, over 12834.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7768, over 11689.30 frames. ], batch size: 27, lr: 8.52e-03
234
+ 2024-08-06 12:18:29,620 INFO [trainer.py:765] (3/8) Epoch 14, batch 1000, train_loss[loss=2.773, ArTop10Accuracy=0.7776, over 12876.00 frames. ], tot_loss[loss=2.772, ArTop10Accuracy=0.7755, over 11875.98 frames. ], batch size: 27, lr: 8.50e-03
235
+ 2024-08-06 12:19:45,382 INFO [trainer.py:765] (3/8) Epoch 14, batch 1100, train_loss[loss=2.798, ArTop10Accuracy=0.7736, over 13728.00 frames. ], tot_loss[loss=2.781, ArTop10Accuracy=0.7738, over 11970.38 frames. ], batch size: 34, lr: 8.48e-03
236
+ 2024-08-06 12:20:59,285 INFO [trainer.py:765] (3/8) Epoch 14, batch 1200, train_loss[loss=2.908, ArTop10Accuracy=0.7524, over 12462.00 frames. ], tot_loss[loss=2.783, ArTop10Accuracy=0.7734, over 11864.36 frames. ], batch size: 101, lr: 8.46e-03
237
+ 2024-08-06 12:21:58,393 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
238
+ 2024-08-06 12:23:51,968 INFO [trainer.py:765] (3/8) Epoch 15, batch 100, train_loss[loss=2.804, ArTop10Accuracy=0.7692, over 14889.00 frames. ], tot_loss[loss=2.761, ArTop10Accuracy=0.7771, over 4760.90 frames. ], batch size: 65, lr: 8.14e-03
239
+ 2024-08-06 12:24:00,605 INFO [trainer.py:803] (3/8) Computing validation loss
240
+ 2024-08-06 12:24:10,290 INFO [trainer.py:811] (3/8) Epoch 15, validation: loss=2.819, ArTop10Accuracy=0.7675, over 1827537.00 frames.
241
+ 2024-08-06 12:24:10,291 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 33011MB
242
+ 2024-08-06 12:24:11,100 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.284e+02 1.371e+02 1.488e+02 4.667e+02, threshold=2.743e+02, percent-clipped=0.2
243
+ 2024-08-06 12:25:29,995 INFO [trainer.py:765] (3/8) Epoch 15, batch 200, train_loss[loss=2.662, ArTop10Accuracy=0.7959, over 13275.00 frames. ], tot_loss[loss=2.752, ArTop10Accuracy=0.7792, over 7758.27 frames. ], batch size: 34, lr: 8.12e-03
244
+ 2024-08-06 12:26:58,702 INFO [trainer.py:765] (3/8) Epoch 15, batch 300, train_loss[loss=2.777, ArTop10Accuracy=0.772, over 13758.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.7796, over 9380.39 frames. ], batch size: 44, lr: 8.09e-03
245
+ 2024-08-06 12:28:28,540 INFO [trainer.py:765] (3/8) Epoch 15, batch 400, train_loss[loss=2.664, ArTop10Accuracy=0.7956, over 10533.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.7797, over 10288.58 frames. ], batch size: 14, lr: 8.07e-03
246
+ 2024-08-06 12:29:54,038 INFO [trainer.py:765] (3/8) Epoch 15, batch 500, train_loss[loss=2.802, ArTop10Accuracy=0.7713, over 12066.00 frames. ], tot_loss[loss=2.749, ArTop10Accuracy=0.7799, over 10860.11 frames. ], batch size: 22, lr: 8.05e-03
247
+ 2024-08-06 12:31:23,299 INFO [trainer.py:765] (3/8) Epoch 15, batch 600, train_loss[loss=2.704, ArTop10Accuracy=0.7892, over 11985.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7796, over 11365.41 frames. ], batch size: 19, lr: 8.03e-03
248
+ 2024-08-06 12:32:53,182 INFO [trainer.py:765] (3/8) Epoch 15, batch 700, train_loss[loss=2.61, ArTop10Accuracy=0.8052, over 10227.00 frames. ], tot_loss[loss=2.756, ArTop10Accuracy=0.7784, over 11530.22 frames. ], batch size: 12, lr: 8.01e-03
249
+ 2024-08-06 12:34:18,261 INFO [trainer.py:765] (3/8) Epoch 15, batch 800, train_loss[loss=2.733, ArTop10Accuracy=0.7805, over 10134.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.7775, over 11637.96 frames. ], batch size: 12, lr: 7.99e-03
250
+ 2024-08-06 12:35:34,733 INFO [trainer.py:765] (3/8) Epoch 15, batch 900, train_loss[loss=2.748, ArTop10Accuracy=0.779, over 12927.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7785, over 11705.92 frames. ], batch size: 27, lr: 7.97e-03
251
+ 2024-08-06 12:36:50,547 INFO [trainer.py:765] (3/8) Epoch 15, batch 1000, train_loss[loss=2.757, ArTop10Accuracy=0.7794, over 12849.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.7777, over 11893.13 frames. ], batch size: 27, lr: 7.95e-03
252
+ 2024-08-06 12:38:05,186 INFO [trainer.py:765] (3/8) Epoch 15, batch 1100, train_loss[loss=2.801, ArTop10Accuracy=0.7653, over 13638.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7765, over 11952.94 frames. ], batch size: 34, lr: 7.93e-03
253
+ 2024-08-06 12:38:12,847 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.293e+02 1.379e+02 1.467e+02 2.824e+02, threshold=2.759e+02, percent-clipped=0.1
254
+ 2024-08-06 12:39:18,795 INFO [trainer.py:765] (3/8) Epoch 15, batch 1200, train_loss[loss=2.927, ArTop10Accuracy=0.7428, over 12492.00 frames. ], tot_loss[loss=2.768, ArTop10Accuracy=0.7763, over 11874.98 frames. ], batch size: 101, lr: 7.91e-03
255
+ 2024-08-06 12:40:18,968 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
256
+ 2024-08-06 12:42:17,625 INFO [trainer.py:765] (3/8) Epoch 16, batch 100, train_loss[loss=2.855, ArTop10Accuracy=0.7584, over 14046.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7784, over 4773.43 frames. ], batch size: 62, lr: 7.63e-03
257
+ 2024-08-06 12:43:49,570 INFO [trainer.py:765] (3/8) Epoch 16, batch 200, train_loss[loss=2.744, ArTop10Accuracy=0.7782, over 13509.00 frames. ], tot_loss[loss=2.741, ArTop10Accuracy=0.7811, over 7745.98 frames. ], batch size: 34, lr: 7.61e-03
258
+ 2024-08-06 12:45:18,508 INFO [trainer.py:765] (3/8) Epoch 16, batch 300, train_loss[loss=2.82, ArTop10Accuracy=0.7672, over 14163.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7828, over 9367.42 frames. ], batch size: 44, lr: 7.59e-03
259
+ 2024-08-06 12:46:45,215 INFO [trainer.py:765] (3/8) Epoch 16, batch 400, train_loss[loss=2.704, ArTop10Accuracy=0.7902, over 10095.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.7819, over 10274.41 frames. ], batch size: 14, lr: 7.58e-03
260
+ 2024-08-06 12:48:16,317 INFO [trainer.py:765] (3/8) Epoch 16, batch 500, train_loss[loss=2.739, ArTop10Accuracy=0.7879, over 12132.00 frames. ], tot_loss[loss=2.735, ArTop10Accuracy=0.7824, over 10839.76 frames. ], batch size: 22, lr: 7.56e-03
261
+ 2024-08-06 12:49:46,648 INFO [trainer.py:765] (3/8) Epoch 16, batch 600, train_loss[loss=2.728, ArTop10Accuracy=0.7856, over 11466.00 frames. ], tot_loss[loss=2.74, ArTop10Accuracy=0.7816, over 11376.36 frames. ], batch size: 18, lr: 7.54e-03
262
+ 2024-08-06 12:51:23,687 INFO [trainer.py:765] (3/8) Epoch 16, batch 700, train_loss[loss=2.741, ArTop10Accuracy=0.7755, over 10176.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.7809, over 11521.68 frames. ], batch size: 12, lr: 7.52e-03
263
+ 2024-08-06 12:52:43,507 INFO [trainer.py:765] (3/8) Epoch 16, batch 800, train_loss[loss=2.648, ArTop10Accuracy=0.8006, over 9210.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7801, over 11645.23 frames. ], batch size: 11, lr: 7.51e-03
264
+ 2024-08-06 12:53:06,022 INFO [trainer.py:803] (3/8) Computing validation loss
265
+ 2024-08-06 12:53:15,496 INFO [trainer.py:811] (3/8) Epoch 16, validation: loss=2.816, ArTop10Accuracy=0.7678, over 1827537.00 frames.
266
+ 2024-08-06 12:53:15,497 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 33011MB
267
+ 2024-08-06 12:53:16,192 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.291e+02 1.391e+02 1.487e+02 3.459e+02, threshold=2.783e+02, percent-clipped=0.1
268
+ 2024-08-06 12:54:06,486 INFO [trainer.py:765] (3/8) Epoch 16, batch 900, train_loss[loss=2.737, ArTop10Accuracy=0.7819, over 12945.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.781, over 11677.13 frames. ], batch size: 27, lr: 7.49e-03
269
+ 2024-08-06 12:55:19,797 INFO [trainer.py:765] (3/8) Epoch 16, batch 1000, train_loss[loss=2.689, ArTop10Accuracy=0.7908, over 13293.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.78, over 11861.96 frames. ], batch size: 28, lr: 7.47e-03
270
+ 2024-08-06 12:56:33,168 INFO [trainer.py:765] (3/8) Epoch 16, batch 1100, train_loss[loss=2.79, ArTop10Accuracy=0.7754, over 13731.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7783, over 11946.49 frames. ], batch size: 34, lr: 7.45e-03
271
+ 2024-08-06 12:57:48,491 INFO [trainer.py:765] (3/8) Epoch 16, batch 1200, train_loss[loss=2.893, ArTop10Accuracy=0.7484, over 12243.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7782, over 11839.50 frames. ], batch size: 101, lr: 7.44e-03
272
+ 2024-08-06 12:58:48,504 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
273
+ 2024-08-06 13:00:47,906 INFO [trainer.py:765] (3/8) Epoch 17, batch 100, train_loss[loss=2.789, ArTop10Accuracy=0.7714, over 14616.00 frames. ], tot_loss[loss=2.74, ArTop10Accuracy=0.7809, over 4765.07 frames. ], batch size: 63, lr: 7.18e-03
274
+ 2024-08-06 13:02:19,309 INFO [trainer.py:765] (3/8) Epoch 17, batch 200, train_loss[loss=2.732, ArTop10Accuracy=0.7834, over 13602.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7823, over 7748.91 frames. ], batch size: 34, lr: 7.17e-03
275
+ 2024-08-06 13:03:45,523 INFO [trainer.py:765] (3/8) Epoch 17, batch 300, train_loss[loss=2.795, ArTop10Accuracy=0.7725, over 13977.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7836, over 9359.46 frames. ], batch size: 44, lr: 7.15e-03
276
+ 2024-08-06 13:05:21,766 INFO [trainer.py:765] (3/8) Epoch 17, batch 400, train_loss[loss=2.692, ArTop10Accuracy=0.7914, over 10851.00 frames. ], tot_loss[loss=2.729, ArTop10Accuracy=0.7836, over 10278.48 frames. ], batch size: 15, lr: 7.14e-03
277
+ 2024-08-06 13:06:47,027 INFO [trainer.py:765] (3/8) Epoch 17, batch 500, train_loss[loss=2.7, ArTop10Accuracy=0.7846, over 12618.00 frames. ], tot_loss[loss=2.723, ArTop10Accuracy=0.7847, over 10854.47 frames. ], batch size: 23, lr: 7.12e-03
278
+ 2024-08-06 13:07:39,884 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.293e+02 1.386e+02 1.488e+02 3.253e+02, threshold=2.772e+02, percent-clipped=0.1
279
+ 2024-08-06 13:08:22,694 INFO [trainer.py:765] (3/8) Epoch 17, batch 600, train_loss[loss=2.655, ArTop10Accuracy=0.8002, over 11547.00 frames. ], tot_loss[loss=2.726, ArTop10Accuracy=0.784, over 11386.03 frames. ], batch size: 18, lr: 7.10e-03
280
+ 2024-08-06 13:09:54,842 INFO [trainer.py:765] (3/8) Epoch 17, batch 700, train_loss[loss=2.522, ArTop10Accuracy=0.8214, over 10182.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7836, over 11523.38 frames. ], batch size: 12, lr: 7.09e-03
281
+ 2024-08-06 13:11:19,487 INFO [trainer.py:765] (3/8) Epoch 17, batch 800, train_loss[loss=2.54, ArTop10Accuracy=0.8153, over 10068.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7831, over 11645.95 frames. ], batch size: 12, lr: 7.07e-03
282
+ 2024-08-06 13:12:35,676 INFO [trainer.py:765] (3/8) Epoch 17, batch 900, train_loss[loss=2.767, ArTop10Accuracy=0.7784, over 12936.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7831, over 11694.16 frames. ], batch size: 27, lr: 7.06e-03
283
+ 2024-08-06 13:13:53,067 INFO [trainer.py:765] (3/8) Epoch 17, batch 1000, train_loss[loss=2.722, ArTop10Accuracy=0.7889, over 12888.00 frames. ], tot_loss[loss=2.739, ArTop10Accuracy=0.782, over 11891.73 frames. ], batch size: 27, lr: 7.04e-03
284
+ 2024-08-06 13:15:08,490 INFO [trainer.py:765] (3/8) Epoch 17, batch 1100, train_loss[loss=2.697, ArTop10Accuracy=0.7897, over 13773.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7811, over 11951.73 frames. ], batch size: 34, lr: 7.02e-03
285
+ 2024-08-06 13:16:22,395 INFO [trainer.py:765] (3/8) Epoch 17, batch 1200, train_loss[loss=2.828, ArTop10Accuracy=0.7645, over 12483.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7804, over 11856.50 frames. ], batch size: 101, lr: 7.01e-03
286
+ 2024-08-06 13:17:21,345 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
287
+ 2024-08-06 13:19:16,000 INFO [trainer.py:765] (3/8) Epoch 18, batch 100, train_loss[loss=2.748, ArTop10Accuracy=0.7797, over 14040.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7818, over 4757.97 frames. ], batch size: 62, lr: 6.78e-03
288
+ 2024-08-06 13:20:46,607 INFO [trainer.py:765] (3/8) Epoch 18, batch 200, train_loss[loss=2.765, ArTop10Accuracy=0.7724, over 13836.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7828, over 7748.93 frames. ], batch size: 34, lr: 6.77e-03
289
+ 2024-08-06 13:21:55,111 INFO [trainer.py:803] (3/8) Computing validation loss
290
+ 2024-08-06 13:22:04,751 INFO [trainer.py:811] (3/8) Epoch 18, validation: loss=2.817, ArTop10Accuracy=0.768, over 1827537.00 frames.
291
+ 2024-08-06 13:22:04,752 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 33011MB
292
+ 2024-08-06 13:22:05,480 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.323e+02 1.409e+02 1.514e+02 3.209e+02, threshold=2.818e+02, percent-clipped=0.1
293
+ 2024-08-06 13:22:26,587 INFO [trainer.py:765] (3/8) Epoch 18, batch 300, train_loss[loss=2.719, ArTop10Accuracy=0.7826, over 14019.00 frames. ], tot_loss[loss=2.719, ArTop10Accuracy=0.7851, over 9382.02 frames. ], batch size: 44, lr: 6.76e-03
294
+ 2024-08-06 13:23:57,935 INFO [trainer.py:765] (3/8) Epoch 18, batch 400, train_loss[loss=2.703, ArTop10Accuracy=0.7932, over 10263.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7862, over 10301.86 frames. ], batch size: 14, lr: 6.74e-03
295
+ 2024-08-06 13:25:34,019 INFO [trainer.py:765] (3/8) Epoch 18, batch 500, train_loss[loss=2.681, ArTop10Accuracy=0.7939, over 12159.00 frames. ], tot_loss[loss=2.716, ArTop10Accuracy=0.7859, over 10841.49 frames. ], batch size: 22, lr: 6.73e-03
296
+ 2024-08-06 13:27:00,639 INFO [trainer.py:765] (3/8) Epoch 18, batch 600, train_loss[loss=2.633, ArTop10Accuracy=0.8064, over 11547.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.785, over 11359.83 frames. ], batch size: 18, lr: 6.71e-03
297
+ 2024-08-06 13:28:33,588 INFO [trainer.py:765] (3/8) Epoch 18, batch 700, train_loss[loss=2.561, ArTop10Accuracy=0.814, over 10119.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7849, over 11503.18 frames. ], batch size: 12, lr: 6.70e-03
298
+ 2024-08-06 13:29:54,991 INFO [trainer.py:765] (3/8) Epoch 18, batch 800, train_loss[loss=2.603, ArTop10Accuracy=0.8063, over 10134.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7842, over 11650.29 frames. ], batch size: 12, lr: 6.68e-03
299
+ 2024-08-06 13:31:12,525 INFO [trainer.py:765] (3/8) Epoch 18, batch 900, train_loss[loss=2.859, ArTop10Accuracy=0.7596, over 12912.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7852, over 11680.26 frames. ], batch size: 27, lr: 6.67e-03
300
+ 2024-08-06 13:32:26,557 INFO [trainer.py:765] (3/8) Epoch 18, batch 1000, train_loss[loss=2.705, ArTop10Accuracy=0.7892, over 13044.00 frames. ], tot_loss[loss=2.724, ArTop10Accuracy=0.7844, over 11875.32 frames. ], batch size: 27, lr: 6.66e-03
301
+ 2024-08-06 13:33:41,503 INFO [trainer.py:765] (3/8) Epoch 18, batch 1100, train_loss[loss=2.799, ArTop10Accuracy=0.7727, over 13740.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7828, over 11949.67 frames. ], batch size: 34, lr: 6.64e-03
302
+ 2024-08-06 13:34:54,679 INFO [trainer.py:765] (3/8) Epoch 18, batch 1200, train_loss[loss=2.883, ArTop10Accuracy=0.7567, over 12735.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.7818, over 11873.72 frames. ], batch size: 101, lr: 6.63e-03
303
+ 2024-08-06 13:35:51,070 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.340e+02 1.433e+02 1.533e+02 2.444e+02, threshold=2.867e+02, percent-clipped=0.0
304
+ 2024-08-06 13:35:54,614 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
305
+ 2024-08-06 13:37:48,631 INFO [trainer.py:765] (3/8) Epoch 19, batch 100, train_loss[loss=2.814, ArTop10Accuracy=0.7616, over 14505.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.786, over 4772.08 frames. ], batch size: 62, lr: 6.43e-03
306
+ 2024-08-06 13:39:23,263 INFO [trainer.py:765] (3/8) Epoch 19, batch 200, train_loss[loss=2.737, ArTop10Accuracy=0.7761, over 13692.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.786, over 7752.10 frames. ], batch size: 34, lr: 6.41e-03
307
+ 2024-08-06 13:40:48,365 INFO [trainer.py:765] (3/8) Epoch 19, batch 300, train_loss[loss=2.774, ArTop10Accuracy=0.7773, over 13896.00 frames. ], tot_loss[loss=2.71, ArTop10Accuracy=0.7867, over 9365.27 frames. ], batch size: 44, lr: 6.40e-03
308
+ 2024-08-06 13:42:21,074 INFO [trainer.py:765] (3/8) Epoch 19, batch 400, train_loss[loss=2.689, ArTop10Accuracy=0.7877, over 10443.00 frames. ], tot_loss[loss=2.705, ArTop10Accuracy=0.7878, over 10266.11 frames. ], batch size: 14, lr: 6.39e-03
309
+ 2024-08-06 13:43:44,962 INFO [trainer.py:765] (3/8) Epoch 19, batch 500, train_loss[loss=2.773, ArTop10Accuracy=0.7813, over 12192.00 frames. ], tot_loss[loss=2.707, ArTop10Accuracy=0.7875, over 10846.68 frames. ], batch size: 22, lr: 6.37e-03
310
+ 2024-08-06 13:45:16,688 INFO [trainer.py:765] (3/8) Epoch 19, batch 600, train_loss[loss=2.611, ArTop10Accuracy=0.8074, over 11343.00 frames. ], tot_loss[loss=2.71, ArTop10Accuracy=0.7871, over 11365.54 frames. ], batch size: 18, lr: 6.36e-03
311
+ 2024-08-06 13:46:48,330 INFO [trainer.py:765] (3/8) Epoch 19, batch 700, train_loss[loss=2.617, ArTop10Accuracy=0.8064, over 9999.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7864, over 11505.08 frames. ], batch size: 12, lr: 6.35e-03
312
+ 2024-08-06 13:48:11,890 INFO [trainer.py:765] (3/8) Epoch 19, batch 800, train_loss[loss=2.594, ArTop10Accuracy=0.8097, over 9999.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7854, over 11627.45 frames. ], batch size: 12, lr: 6.34e-03
313
+ 2024-08-06 13:49:27,265 INFO [trainer.py:765] (3/8) Epoch 19, batch 900, train_loss[loss=2.669, ArTop10Accuracy=0.7927, over 13359.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7863, over 11681.49 frames. ], batch size: 28, lr: 6.32e-03
314
+ 2024-08-06 13:50:40,660 INFO [trainer.py:803] (3/8) Computing validation loss
315
+ 2024-08-06 13:50:50,537 INFO [trainer.py:811] (3/8) Epoch 19, validation: loss=2.818, ArTop10Accuracy=0.7679, over 1827537.00 frames.
316
+ 2024-08-06 13:50:50,537 INFO [trainer.py:814] (3/8) Maximum memory allocated so far is 33011MB
317
+ 2024-08-06 13:50:51,496 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.371e+02 1.455e+02 1.550e+02 3.697e+02, threshold=2.909e+02, percent-clipped=0.2
318
+ 2024-08-06 13:50:52,922 INFO [trainer.py:765] (3/8) Epoch 19, batch 1000, train_loss[loss=2.747, ArTop10Accuracy=0.782, over 12804.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7851, over 11885.30 frames. ], batch size: 27, lr: 6.31e-03
319
+ 2024-08-06 13:52:08,271 INFO [trainer.py:765] (3/8) Epoch 19, batch 1100, train_loss[loss=2.752, ArTop10Accuracy=0.7856, over 13539.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7839, over 11950.86 frames. ], batch size: 34, lr: 6.30e-03
320
+ 2024-08-06 13:53:22,318 INFO [trainer.py:765] (3/8) Epoch 19, batch 1200, train_loss[loss=2.822, ArTop10Accuracy=0.7666, over 12456.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7836, over 11872.46 frames. ], batch size: 101, lr: 6.28e-03
321
+ 2024-08-06 13:54:21,909 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
322
+ 2024-08-06 13:56:12,911 INFO [trainer.py:765] (3/8) Epoch 20, batch 100, train_loss[loss=2.75, ArTop10Accuracy=0.7818, over 14421.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7869, over 4732.71 frames. ], batch size: 62, lr: 6.10e-03
323
+ 2024-08-06 13:57:42,502 INFO [trainer.py:765] (3/8) Epoch 20, batch 200, train_loss[loss=2.767, ArTop10Accuracy=0.7772, over 13650.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.788, over 7747.52 frames. ], batch size: 34, lr: 6.09e-03
324
+ 2024-08-06 13:59:15,437 INFO [trainer.py:765] (3/8) Epoch 20, batch 300, train_loss[loss=2.739, ArTop10Accuracy=0.7793, over 14226.00 frames. ], tot_loss[loss=2.697, ArTop10Accuracy=0.7894, over 9362.38 frames. ], batch size: 44, lr: 6.08e-03
325
+ 2024-08-06 14:00:44,363 INFO [trainer.py:765] (3/8) Epoch 20, batch 400, train_loss[loss=2.645, ArTop10Accuracy=0.7971, over 10269.00 frames. ], tot_loss[loss=2.695, ArTop10Accuracy=0.7897, over 10275.36 frames. ], batch size: 14, lr: 6.07e-03
326
+ 2024-08-06 14:02:14,860 INFO [trainer.py:765] (3/8) Epoch 20, batch 500, train_loss[loss=2.717, ArTop10Accuracy=0.786, over 12165.00 frames. ], tot_loss[loss=2.693, ArTop10Accuracy=0.7903, over 10821.64 frames. ], batch size: 22, lr: 6.06e-03
327
+ 2024-08-06 14:03:40,861 INFO [trainer.py:765] (3/8) Epoch 20, batch 600, train_loss[loss=2.75, ArTop10Accuracy=0.78, over 11436.00 frames. ], tot_loss[loss=2.695, ArTop10Accuracy=0.7902, over 11359.02 frames. ], batch size: 18, lr: 6.04e-03
328
+ 2024-08-06 14:05:13,871 INFO [trainer.py:765] (3/8) Epoch 20, batch 700, train_loss[loss=2.699, ArTop10Accuracy=0.7882, over 9402.00 frames. ], tot_loss[loss=2.702, ArTop10Accuracy=0.7887, over 11513.76 frames. ], batch size: 11, lr: 6.03e-03
329
+ 2024-08-06 14:05:30,797 INFO [optim.py:386] (3/8) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.365e+02 1.456e+02 1.550e+02 3.525e+02, threshold=2.913e+02, percent-clipped=0.1
330
+ 2024-08-06 14:06:34,515 INFO [trainer.py:765] (3/8) Epoch 20, batch 800, train_loss[loss=2.575, ArTop10Accuracy=0.8192, over 10212.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7877, over 11662.80 frames. ], batch size: 12, lr: 6.02e-03
331
+ 2024-08-06 14:07:50,950 INFO [trainer.py:765] (3/8) Epoch 20, batch 900, train_loss[loss=2.687, ArTop10Accuracy=0.7923, over 12840.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7885, over 11704.16 frames. ], batch size: 27, lr: 6.01e-03
332
+ 2024-08-06 14:09:07,179 INFO [trainer.py:765] (3/8) Epoch 20, batch 1000, train_loss[loss=2.767, ArTop10Accuracy=0.776, over 13011.00 frames. ], tot_loss[loss=2.705, ArTop10Accuracy=0.7881, over 11884.06 frames. ], batch size: 27, lr: 6.00e-03
333
+ 2024-08-06 14:10:21,215 INFO [trainer.py:765] (3/8) Epoch 20, batch 1100, train_loss[loss=2.713, ArTop10Accuracy=0.7884, over 13860.00 frames. ], tot_loss[loss=2.712, ArTop10Accuracy=0.7867, over 11958.77 frames. ], batch size: 34, lr: 5.99e-03
334
+ 2024-08-06 14:11:37,819 INFO [trainer.py:765] (3/8) Epoch 20, batch 1200, train_loss[loss=2.817, ArTop10Accuracy=0.7719, over 11877.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7866, over 11870.45 frames. ], batch size: 101, lr: 5.98e-03
335
+ 2024-08-06 14:12:36,807 INFO [trainer.py:650] (3/8) Reaches end of dataloader.
336
+ 2024-08-06 14:12:36,810 INFO [trainer.py:1069] (3/8) Done!
libritts-r/log/log-train-2024-08-06-08-06-14-4 ADDED
@@ -0,0 +1,336 @@
1
+ 2024-08-06 08:06:14,318 INFO [trainer.py:870] (4/8) Training started
2
+ 2024-08-06 08:06:14,319 INFO [trainer.py:889] (4/8) Device: cuda:4
3
+ 2024-08-06 08:06:14,319 INFO [trainer.py:890] (4/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:06:14,319 INFO [trainer.py:892] (4/8) About to create model
+ 2024-08-06 08:06:15,010 INFO [trainer.py:899] (4/8) Number of model parameters: 367386628
+ 2024-08-06 08:06:16,222 INFO [trainer.py:914] (4/8) Using DDP
+ 2024-08-06 08:06:19,151 INFO [datamodule.py:427] (4/8) About to get train cuts
+ 2024-08-06 08:06:19,153 INFO [datamodule.py:434] (4/8) About to get dev cuts
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:292] (4/8) Disable SpecAugment
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:294] (4/8) About to create train dataset
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:323] (4/8) Using DynamicBucketingSampler
+ 2024-08-06 08:06:19,763 INFO [datamodule.py:344] (4/8) About to create train dataloader
+ 2024-08-06 08:06:19,763 INFO [datamodule.py:367] (4/8) About to create dev dataset
+ 2024-08-06 08:06:20,087 INFO [datamodule.py:388] (4/8) About to create dev dataloader
+ 2024-08-06 08:08:02,120 INFO [trainer.py:765] (4/8) Epoch 1, batch 100, train_loss[loss=4.335, ArTop10Accuracy=0.4992, over 14349.00 frames. ], tot_loss[loss=5.058, ArTop10Accuracy=0.3727, over 4771.89 frames. ], batch size: 62, lr: 2.25e-02
+ 2024-08-06 08:09:28,827 INFO [trainer.py:765] (4/8) Epoch 1, batch 200, train_loss[loss=4.111, ArTop10Accuracy=0.5308, over 13680.00 frames. ], tot_loss[loss=4.496, ArTop10Accuracy=0.4669, over 7746.35 frames. ], batch size: 34, lr: 3.00e-02
+ 2024-08-06 08:10:52,430 INFO [trainer.py:765] (4/8) Epoch 1, batch 300, train_loss[loss=3.881, ArTop10Accuracy=0.5686, over 14085.00 frames. ], tot_loss[loss=4.218, ArTop10Accuracy=0.5129, over 9372.51 frames. ], batch size: 44, lr: 3.00e-02
+ 2024-08-06 08:12:12,698 INFO [trainer.py:765] (4/8) Epoch 1, batch 400, train_loss[loss=3.738, ArTop10Accuracy=0.6, over 10314.00 frames. ], tot_loss[loss=4.027, ArTop10Accuracy=0.5454, over 10297.26 frames. ], batch size: 14, lr: 3.00e-02
+ 2024-08-06 08:13:40,049 INFO [trainer.py:765] (4/8) Epoch 1, batch 500, train_loss[loss=3.696, ArTop10Accuracy=0.6047, over 12063.00 frames. ], tot_loss[loss=3.883, ArTop10Accuracy=0.5703, over 10855.34 frames. ], batch size: 22, lr: 2.99e-02
+ 2024-08-06 08:15:00,242 INFO [trainer.py:765] (4/8) Epoch 1, batch 600, train_loss[loss=3.559, ArTop10Accuracy=0.6271, over 11481.00 frames. ], tot_loss[loss=3.773, ArTop10Accuracy=0.5898, over 11365.23 frames. ], batch size: 18, lr: 2.99e-02
+ 2024-08-06 08:16:26,424 INFO [trainer.py:765] (4/8) Epoch 1, batch 700, train_loss[loss=3.57, ArTop10Accuracy=0.6244, over 10320.00 frames. ], tot_loss[loss=3.695, ArTop10Accuracy=0.6037, over 11513.99 frames. ], batch size: 12, lr: 2.99e-02
+ 2024-08-06 08:17:43,017 INFO [trainer.py:765] (4/8) Epoch 1, batch 800, train_loss[loss=3.429, ArTop10Accuracy=0.6523, over 9978.00 frames. ], tot_loss[loss=3.627, ArTop10Accuracy=0.6163, over 11645.81 frames. ], batch size: 12, lr: 2.98e-02
+ 2024-08-06 08:18:56,150 INFO [trainer.py:765] (4/8) Epoch 1, batch 900, train_loss[loss=3.458, ArTop10Accuracy=0.6464, over 12951.00 frames. ], tot_loss[loss=3.567, ArTop10Accuracy=0.6273, over 11687.10 frames. ], batch size: 27, lr: 2.98e-02
+ 2024-08-06 08:20:12,862 INFO [trainer.py:765] (4/8) Epoch 1, batch 1000, train_loss[loss=3.476, ArTop10Accuracy=0.6408, over 13488.00 frames. ], tot_loss[loss=3.524, ArTop10Accuracy=0.635, over 11889.35 frames. ], batch size: 28, lr: 2.97e-02
+ 2024-08-06 08:20:13,539 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 9.300e+01 1.871e+02 2.675e+02 4.030e+02 9.119e+03, threshold=5.351e+02, percent-clipped=0.0
+ 2024-08-06 08:21:29,154 INFO [trainer.py:765] (4/8) Epoch 1, batch 1100, train_loss[loss=3.469, ArTop10Accuracy=0.6412, over 13692.00 frames. ], tot_loss[loss=3.487, ArTop10Accuracy=0.6416, over 11959.37 frames. ], batch size: 34, lr: 2.96e-02
+ 2024-08-06 08:22:45,410 INFO [trainer.py:765] (4/8) Epoch 1, batch 1200, train_loss[loss=3.468, ArTop10Accuracy=0.6428, over 11691.00 frames. ], tot_loss[loss=3.456, ArTop10Accuracy=0.6475, over 11856.20 frames. ], batch size: 101, lr: 2.96e-02
+ 2024-08-06 08:23:45,262 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 08:25:36,236 INFO [trainer.py:765] (4/8) Epoch 2, batch 100, train_loss[loss=3.453, ArTop10Accuracy=0.6483, over 14559.00 frames. ], tot_loss[loss=3.419, ArTop10Accuracy=0.6533, over 4753.85 frames. ], batch size: 62, lr: 2.90e-02
+ 2024-08-06 08:26:58,955 INFO [trainer.py:765] (4/8) Epoch 2, batch 200, train_loss[loss=3.27, ArTop10Accuracy=0.6853, over 13752.00 frames. ], tot_loss[loss=3.384, ArTop10Accuracy=0.6604, over 7757.10 frames. ], batch size: 34, lr: 2.89e-02
+ 2024-08-06 08:28:25,533 INFO [trainer.py:765] (4/8) Epoch 2, batch 300, train_loss[loss=3.402, ArTop10Accuracy=0.6578, over 14046.00 frames. ], tot_loss[loss=3.371, ArTop10Accuracy=0.6631, over 9382.03 frames. ], batch size: 44, lr: 2.89e-02
+ 2024-08-06 08:29:48,636 INFO [trainer.py:765] (4/8) Epoch 2, batch 400, train_loss[loss=3.355, ArTop10Accuracy=0.6619, over 10944.00 frames. ], tot_loss[loss=3.358, ArTop10Accuracy=0.6657, over 10312.86 frames. ], batch size: 15, lr: 2.88e-02
+ 2024-08-06 08:31:22,902 INFO [trainer.py:765] (4/8) Epoch 2, batch 500, train_loss[loss=3.212, ArTop10Accuracy=0.6956, over 12171.00 frames. ], tot_loss[loss=3.339, ArTop10Accuracy=0.6692, over 10869.97 frames. ], batch size: 22, lr: 2.87e-02
+ 2024-08-06 08:32:45,688 INFO [trainer.py:765] (4/8) Epoch 2, batch 600, train_loss[loss=3.308, ArTop10Accuracy=0.6743, over 11418.00 frames. ], tot_loss[loss=3.329, ArTop10Accuracy=0.671, over 11384.57 frames. ], batch size: 18, lr: 2.86e-02
+ 2024-08-06 08:34:13,582 INFO [trainer.py:765] (4/8) Epoch 2, batch 700, train_loss[loss=3.313, ArTop10Accuracy=0.6793, over 9951.00 frames. ], tot_loss[loss=3.325, ArTop10Accuracy=0.6719, over 11534.00 frames. ], batch size: 12, lr: 2.85e-02
+ 2024-08-06 08:34:31,175 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 08:34:40,888 INFO [trainer.py:811] (4/8) Epoch 2, validation: loss=3.277, ArTop10Accuracy=0.6803, over 1827537.00 frames.
+ 2024-08-06 08:34:40,889 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 28695MB
+ 2024-08-06 08:34:41,699 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 1.592e+02 2.200e+02 3.344e+02 2.949e+03, threshold=4.400e+02, percent-clipped=8.6
+ 2024-08-06 08:35:39,878 INFO [trainer.py:765] (4/8) Epoch 2, batch 800, train_loss[loss=3.2, ArTop10Accuracy=0.6972, over 9570.00 frames. ], tot_loss[loss=3.318, ArTop10Accuracy=0.6735, over 11652.56 frames. ], batch size: 11, lr: 2.84e-02
+ 2024-08-06 08:36:56,371 INFO [trainer.py:765] (4/8) Epoch 2, batch 900, train_loss[loss=3.262, ArTop10Accuracy=0.6776, over 12861.00 frames. ], tot_loss[loss=3.305, ArTop10Accuracy=0.6758, over 11691.96 frames. ], batch size: 27, lr: 2.83e-02
+ 2024-08-06 08:38:10,511 INFO [trainer.py:765] (4/8) Epoch 2, batch 1000, train_loss[loss=3.307, ArTop10Accuracy=0.6773, over 13053.00 frames. ], tot_loss[loss=3.299, ArTop10Accuracy=0.677, over 11892.14 frames. ], batch size: 27, lr: 2.82e-02
+ 2024-08-06 08:39:25,059 INFO [trainer.py:765] (4/8) Epoch 2, batch 1100, train_loss[loss=3.159, ArTop10Accuracy=0.7048, over 13839.00 frames. ], tot_loss[loss=3.292, ArTop10Accuracy=0.6781, over 11930.52 frames. ], batch size: 34, lr: 2.81e-02
+ 2024-08-06 08:40:38,219 INFO [trainer.py:765] (4/8) Epoch 2, batch 1200, train_loss[loss=3.333, ArTop10Accuracy=0.6674, over 13452.00 frames. ], tot_loss[loss=3.283, ArTop10Accuracy=0.6799, over 11836.01 frames. ], batch size: 101, lr: 2.80e-02
+ 2024-08-06 08:41:38,601 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 08:43:36,649 INFO [trainer.py:765] (4/8) Epoch 3, batch 100, train_loss[loss=3.256, ArTop10Accuracy=0.6832, over 14394.00 frames. ], tot_loss[loss=3.244, ArTop10Accuracy=0.6866, over 4768.62 frames. ], batch size: 62, lr: 2.67e-02
+ 2024-08-06 08:45:10,500 INFO [trainer.py:765] (4/8) Epoch 3, batch 200, train_loss[loss=3.201, ArTop10Accuracy=0.695, over 13674.00 frames. ], tot_loss[loss=3.221, ArTop10Accuracy=0.6908, over 7764.36 frames. ], batch size: 34, lr: 2.66e-02
+ 2024-08-06 08:46:29,258 INFO [trainer.py:765] (4/8) Epoch 3, batch 300, train_loss[loss=3.233, ArTop10Accuracy=0.6863, over 14310.00 frames. ], tot_loss[loss=3.207, ArTop10Accuracy=0.6938, over 9365.66 frames. ], batch size: 44, lr: 2.64e-02
+ 2024-08-06 08:48:04,219 INFO [trainer.py:765] (4/8) Epoch 3, batch 400, train_loss[loss=3.129, ArTop10Accuracy=0.7089, over 10473.00 frames. ], tot_loss[loss=3.192, ArTop10Accuracy=0.6969, over 10282.72 frames. ], batch size: 14, lr: 2.63e-02
+ 2024-08-06 08:48:40,881 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 9.282e+01 1.561e+02 1.981e+02 2.686e+02 1.768e+03, threshold=3.962e+02, percent-clipped=7.6
+ 2024-08-06 08:49:25,541 INFO [trainer.py:765] (4/8) Epoch 3, batch 500, train_loss[loss=3.17, ArTop10Accuracy=0.7073, over 12351.00 frames. ], tot_loss[loss=3.172, ArTop10Accuracy=0.7009, over 10852.39 frames. ], batch size: 22, lr: 2.62e-02
+ 2024-08-06 08:51:00,477 INFO [trainer.py:765] (4/8) Epoch 3, batch 600, train_loss[loss=3.056, ArTop10Accuracy=0.7223, over 11325.00 frames. ], tot_loss[loss=3.16, ArTop10Accuracy=0.7028, over 11381.83 frames. ], batch size: 18, lr: 2.61e-02
+ 2024-08-06 08:52:31,618 INFO [trainer.py:765] (4/8) Epoch 3, batch 700, train_loss[loss=3.058, ArTop10Accuracy=0.7241, over 10176.00 frames. ], tot_loss[loss=3.143, ArTop10Accuracy=0.7062, over 11521.06 frames. ], batch size: 12, lr: 2.60e-02
+ 2024-08-06 08:53:57,389 INFO [trainer.py:765] (4/8) Epoch 3, batch 800, train_loss[loss=3.078, ArTop10Accuracy=0.7212, over 9276.00 frames. ], tot_loss[loss=3.137, ArTop10Accuracy=0.7072, over 11622.44 frames. ], batch size: 11, lr: 2.59e-02
+ 2024-08-06 08:55:15,119 INFO [trainer.py:765] (4/8) Epoch 3, batch 900, train_loss[loss=3.061, ArTop10Accuracy=0.7179, over 13047.00 frames. ], tot_loss[loss=3.123, ArTop10Accuracy=0.7099, over 11666.34 frames. ], batch size: 27, lr: 2.57e-02
+ 2024-08-06 08:56:31,557 INFO [trainer.py:765] (4/8) Epoch 3, batch 1000, train_loss[loss=3.183, ArTop10Accuracy=0.6972, over 12882.00 frames. ], tot_loss[loss=3.112, ArTop10Accuracy=0.7119, over 11867.25 frames. ], batch size: 27, lr: 2.56e-02
+ 2024-08-06 08:57:46,506 INFO [trainer.py:765] (4/8) Epoch 3, batch 1100, train_loss[loss=2.998, ArTop10Accuracy=0.7333, over 13554.00 frames. ], tot_loss[loss=3.105, ArTop10Accuracy=0.7132, over 11926.75 frames. ], batch size: 34, lr: 2.55e-02
+ 2024-08-06 08:59:01,399 INFO [trainer.py:765] (4/8) Epoch 3, batch 1200, train_loss[loss=3.151, ArTop10Accuracy=0.7024, over 13326.00 frames. ], tot_loss[loss=3.097, ArTop10Accuracy=0.7145, over 11854.55 frames. ], batch size: 101, lr: 2.54e-02
+ 2024-08-06 09:00:01,980 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 09:01:50,742 INFO [trainer.py:765] (4/8) Epoch 4, batch 100, train_loss[loss=3.127, ArTop10Accuracy=0.7081, over 14670.00 frames. ], tot_loss[loss=3.065, ArTop10Accuracy=0.7201, over 4761.72 frames. ], batch size: 64, lr: 2.38e-02
+ 2024-08-06 09:02:52,859 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 09:03:02,384 INFO [trainer.py:811] (4/8) Epoch 4, validation: loss=2.997, ArTop10Accuracy=0.7338, over 1827537.00 frames.
+ 2024-08-06 09:03:02,385 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 29513MB
+ 2024-08-06 09:03:03,364 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.499e+02 1.782e+02 2.273e+02 1.100e+03, threshold=3.565e+02, percent-clipped=4.7
+ 2024-08-06 09:03:29,273 INFO [trainer.py:765] (4/8) Epoch 4, batch 200, train_loss[loss=3.069, ArTop10Accuracy=0.718, over 13527.00 frames. ], tot_loss[loss=3.041, ArTop10Accuracy=0.7249, over 7754.25 frames. ], batch size: 34, lr: 2.37e-02
+ 2024-08-06 09:05:01,733 INFO [trainer.py:765] (4/8) Epoch 4, batch 300, train_loss[loss=3.133, ArTop10Accuracy=0.7066, over 14562.00 frames. ], tot_loss[loss=3.038, ArTop10Accuracy=0.7259, over 9353.08 frames. ], batch size: 45, lr: 2.36e-02
+ 2024-08-06 09:06:28,151 INFO [trainer.py:765] (4/8) Epoch 4, batch 400, train_loss[loss=2.94, ArTop10Accuracy=0.7486, over 10116.00 frames. ], tot_loss[loss=3.034, ArTop10Accuracy=0.7265, over 10275.79 frames. ], batch size: 14, lr: 2.34e-02
+ 2024-08-06 09:08:01,925 INFO [trainer.py:765] (4/8) Epoch 4, batch 500, train_loss[loss=3.045, ArTop10Accuracy=0.7286, over 12501.00 frames. ], tot_loss[loss=3.029, ArTop10Accuracy=0.7272, over 10828.04 frames. ], batch size: 23, lr: 2.33e-02
+ 2024-08-06 09:09:28,540 INFO [trainer.py:765] (4/8) Epoch 4, batch 600, train_loss[loss=2.955, ArTop10Accuracy=0.7423, over 11589.00 frames. ], tot_loss[loss=3.024, ArTop10Accuracy=0.7284, over 11374.57 frames. ], batch size: 18, lr: 2.32e-02
+ 2024-08-06 09:10:59,865 INFO [trainer.py:765] (4/8) Epoch 4, batch 700, train_loss[loss=3.009, ArTop10Accuracy=0.7394, over 10125.00 frames. ], tot_loss[loss=3.026, ArTop10Accuracy=0.7277, over 11515.59 frames. ], batch size: 12, lr: 2.31e-02
+ 2024-08-06 09:12:17,513 INFO [trainer.py:765] (4/8) Epoch 4, batch 800, train_loss[loss=2.969, ArTop10Accuracy=0.7425, over 9366.00 frames. ], tot_loss[loss=3.022, ArTop10Accuracy=0.7288, over 11640.09 frames. ], batch size: 11, lr: 2.30e-02
+ 2024-08-06 09:13:33,212 INFO [trainer.py:765] (4/8) Epoch 4, batch 900, train_loss[loss=2.992, ArTop10Accuracy=0.7306, over 12924.00 frames. ], tot_loss[loss=3.012, ArTop10Accuracy=0.7308, over 11686.06 frames. ], batch size: 27, lr: 2.29e-02
+ 2024-08-06 09:14:47,520 INFO [trainer.py:765] (4/8) Epoch 4, batch 1000, train_loss[loss=2.935, ArTop10Accuracy=0.7421, over 12690.00 frames. ], tot_loss[loss=3.011, ArTop10Accuracy=0.7308, over 11873.84 frames. ], batch size: 27, lr: 2.28e-02
+ 2024-08-06 09:16:02,982 INFO [trainer.py:765] (4/8) Epoch 4, batch 1100, train_loss[loss=2.965, ArTop10Accuracy=0.741, over 13602.00 frames. ], tot_loss[loss=3.011, ArTop10Accuracy=0.7308, over 11940.21 frames. ], batch size: 34, lr: 2.26e-02
+ 2024-08-06 09:16:53,291 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.440e+02 1.636e+02 1.968e+02 7.702e+02, threshold=3.273e+02, percent-clipped=1.3
+ 2024-08-06 09:17:18,344 INFO [trainer.py:765] (4/8) Epoch 4, batch 1200, train_loss[loss=3.053, ArTop10Accuracy=0.7237, over 12633.00 frames. ], tot_loss[loss=3.01, ArTop10Accuracy=0.7309, over 11854.12 frames. ], batch size: 101, lr: 2.25e-02
+ 2024-08-06 09:18:17,349 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 09:20:17,170 INFO [trainer.py:765] (4/8) Epoch 5, batch 100, train_loss[loss=3.024, ArTop10Accuracy=0.725, over 14337.00 frames. ], tot_loss[loss=2.997, ArTop10Accuracy=0.7326, over 4765.65 frames. ], batch size: 62, lr: 2.10e-02
+ 2024-08-06 09:21:52,295 INFO [trainer.py:765] (4/8) Epoch 5, batch 200, train_loss[loss=2.968, ArTop10Accuracy=0.7412, over 13647.00 frames. ], tot_loss[loss=2.972, ArTop10Accuracy=0.7375, over 7764.93 frames. ], batch size: 34, lr: 2.09e-02
+ 2024-08-06 09:23:19,240 INFO [trainer.py:765] (4/8) Epoch 5, batch 300, train_loss[loss=3.021, ArTop10Accuracy=0.7283, over 14367.00 frames. ], tot_loss[loss=2.966, ArTop10Accuracy=0.7392, over 9381.01 frames. ], batch size: 45, lr: 2.08e-02
+ 2024-08-06 09:24:53,536 INFO [trainer.py:765] (4/8) Epoch 5, batch 400, train_loss[loss=2.943, ArTop10Accuracy=0.7394, over 10296.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.74, over 10297.28 frames. ], batch size: 14, lr: 2.07e-02
+ 2024-08-06 09:26:19,417 INFO [trainer.py:765] (4/8) Epoch 5, batch 500, train_loss[loss=2.9, ArTop10Accuracy=0.7498, over 12066.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7404, over 10856.08 frames. ], batch size: 22, lr: 2.06e-02
+ 2024-08-06 09:27:49,536 INFO [trainer.py:765] (4/8) Epoch 5, batch 600, train_loss[loss=3.014, ArTop10Accuracy=0.7307, over 11511.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7401, over 11385.83 frames. ], batch size: 18, lr: 2.05e-02
+ 2024-08-06 09:29:21,669 INFO [trainer.py:765] (4/8) Epoch 5, batch 700, train_loss[loss=2.976, ArTop10Accuracy=0.733, over 9171.00 frames. ], tot_loss[loss=2.967, ArTop10Accuracy=0.7393, over 11512.86 frames. ], batch size: 11, lr: 2.04e-02
+ 2024-08-06 09:30:44,692 INFO [trainer.py:765] (4/8) Epoch 5, batch 800, train_loss[loss=2.901, ArTop10Accuracy=0.7489, over 10170.00 frames. ], tot_loss[loss=2.971, ArTop10Accuracy=0.7385, over 11642.97 frames. ], batch size: 12, lr: 2.03e-02
+ 2024-08-06 09:31:51,238 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 09:32:00,762 INFO [trainer.py:811] (4/8) Epoch 5, validation: loss=2.926, ArTop10Accuracy=0.7466, over 1827537.00 frames.
+ 2024-08-06 09:32:00,763 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 09:32:01,708 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.349e+02 1.525e+02 1.806e+02 1.007e+03, threshold=3.049e+02, percent-clipped=2.3
+ 2024-08-06 09:32:10,553 INFO [trainer.py:765] (4/8) Epoch 5, batch 900, train_loss[loss=2.998, ArTop10Accuracy=0.7307, over 12870.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7404, over 11683.19 frames. ], batch size: 27, lr: 2.02e-02
+ 2024-08-06 09:33:27,322 INFO [trainer.py:765] (4/8) Epoch 5, batch 1000, train_loss[loss=2.89, ArTop10Accuracy=0.7553, over 12822.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7399, over 11878.40 frames. ], batch size: 27, lr: 2.01e-02
+ 2024-08-06 09:34:42,299 INFO [trainer.py:765] (4/8) Epoch 5, batch 1100, train_loss[loss=2.96, ArTop10Accuracy=0.7399, over 13461.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7402, over 11949.48 frames. ], batch size: 34, lr: 2.00e-02
+ 2024-08-06 09:35:56,331 INFO [trainer.py:765] (4/8) Epoch 5, batch 1200, train_loss[loss=3.116, ArTop10Accuracy=0.7083, over 11610.00 frames. ], tot_loss[loss=2.957, ArTop10Accuracy=0.7409, over 11851.67 frames. ], batch size: 101, lr: 1.99e-02
+ 2024-08-06 09:36:55,326 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 09:38:52,664 INFO [trainer.py:765] (4/8) Epoch 6, batch 100, train_loss[loss=3.011, ArTop10Accuracy=0.7297, over 14652.00 frames. ], tot_loss[loss=2.953, ArTop10Accuracy=0.7414, over 4763.22 frames. ], batch size: 63, lr: 1.85e-02
+ 2024-08-06 09:40:19,833 INFO [trainer.py:765] (4/8) Epoch 6, batch 200, train_loss[loss=2.924, ArTop10Accuracy=0.749, over 13821.00 frames. ], tot_loss[loss=2.935, ArTop10Accuracy=0.7448, over 7747.45 frames. ], batch size: 34, lr: 1.84e-02
+ 2024-08-06 09:41:52,964 INFO [trainer.py:765] (4/8) Epoch 6, batch 300, train_loss[loss=2.895, ArTop10Accuracy=0.7482, over 14133.00 frames. ], tot_loss[loss=2.931, ArTop10Accuracy=0.7456, over 9381.03 frames. ], batch size: 44, lr: 1.83e-02
+ 2024-08-06 09:43:17,827 INFO [trainer.py:765] (4/8) Epoch 6, batch 400, train_loss[loss=2.934, ArTop10Accuracy=0.7428, over 10410.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7465, over 10294.73 frames. ], batch size: 14, lr: 1.83e-02
+ 2024-08-06 09:44:54,127 INFO [trainer.py:765] (4/8) Epoch 6, batch 500, train_loss[loss=2.914, ArTop10Accuracy=0.7514, over 12327.00 frames. ], tot_loss[loss=2.916, ArTop10Accuracy=0.7488, over 10858.88 frames. ], batch size: 22, lr: 1.82e-02
+ 2024-08-06 09:46:22,872 INFO [trainer.py:765] (4/8) Epoch 6, batch 600, train_loss[loss=2.957, ArTop10Accuracy=0.7467, over 11445.00 frames. ], tot_loss[loss=2.921, ArTop10Accuracy=0.7477, over 11366.12 frames. ], batch size: 18, lr: 1.81e-02
+ 2024-08-06 09:46:37,219 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.339e+02 1.480e+02 1.701e+02 7.506e+02, threshold=2.959e+02, percent-clipped=1.1
+ 2024-08-06 09:47:57,869 INFO [trainer.py:765] (4/8) Epoch 6, batch 700, train_loss[loss=2.881, ArTop10Accuracy=0.7528, over 10191.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.7467, over 11534.85 frames. ], batch size: 12, lr: 1.80e-02
+ 2024-08-06 09:49:15,954 INFO [trainer.py:765] (4/8) Epoch 6, batch 800, train_loss[loss=2.915, ArTop10Accuracy=0.7503, over 9345.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7465, over 11644.71 frames. ], batch size: 11, lr: 1.79e-02
+ 2024-08-06 09:50:32,134 INFO [trainer.py:765] (4/8) Epoch 6, batch 900, train_loss[loss=2.904, ArTop10Accuracy=0.7514, over 13062.00 frames. ], tot_loss[loss=2.923, ArTop10Accuracy=0.7472, over 11680.49 frames. ], batch size: 27, lr: 1.78e-02
+ 2024-08-06 09:51:47,298 INFO [trainer.py:765] (4/8) Epoch 6, batch 1000, train_loss[loss=2.913, ArTop10Accuracy=0.752, over 12957.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.747, over 11873.10 frames. ], batch size: 27, lr: 1.77e-02
+ 2024-08-06 09:53:00,920 INFO [trainer.py:765] (4/8) Epoch 6, batch 1100, train_loss[loss=2.91, ArTop10Accuracy=0.7514, over 13686.00 frames. ], tot_loss[loss=2.931, ArTop10Accuracy=0.7458, over 11944.91 frames. ], batch size: 34, lr: 1.77e-02
+ 2024-08-06 09:54:14,336 INFO [trainer.py:765] (4/8) Epoch 6, batch 1200, train_loss[loss=3.056, ArTop10Accuracy=0.7208, over 12609.00 frames. ], tot_loss[loss=2.93, ArTop10Accuracy=0.7461, over 11868.14 frames. ], batch size: 101, lr: 1.76e-02
+ 2024-08-06 09:55:13,263 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 09:57:06,699 INFO [trainer.py:765] (4/8) Epoch 7, batch 100, train_loss[loss=2.98, ArTop10Accuracy=0.7353, over 14310.00 frames. ], tot_loss[loss=2.918, ArTop10Accuracy=0.748, over 4748.26 frames. ], batch size: 62, lr: 1.64e-02
+ 2024-08-06 09:58:39,426 INFO [trainer.py:765] (4/8) Epoch 7, batch 200, train_loss[loss=2.916, ArTop10Accuracy=0.7468, over 13752.00 frames. ], tot_loss[loss=2.906, ArTop10Accuracy=0.7504, over 7746.19 frames. ], batch size: 34, lr: 1.64e-02
+ 2024-08-06 10:00:06,083 INFO [trainer.py:765] (4/8) Epoch 7, batch 300, train_loss[loss=2.979, ArTop10Accuracy=0.7354, over 13800.00 frames. ], tot_loss[loss=2.898, ArTop10Accuracy=0.7517, over 9374.86 frames. ], batch size: 44, lr: 1.63e-02
+ 2024-08-06 10:00:40,509 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 10:00:50,245 INFO [trainer.py:811] (4/8) Epoch 7, validation: loss=2.88, ArTop10Accuracy=0.7554, over 1827537.00 frames.
+ 2024-08-06 10:00:50,246 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 10:00:50,977 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.286e+02 1.429e+02 1.605e+02 1.020e+03, threshold=2.857e+02, percent-clipped=1.5
+ 2024-08-06 10:01:49,118 INFO [trainer.py:765] (4/8) Epoch 7, batch 400, train_loss[loss=2.852, ArTop10Accuracy=0.765, over 10215.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7524, over 10288.13 frames. ], batch size: 14, lr: 1.62e-02
+ 2024-08-06 10:03:21,457 INFO [trainer.py:765] (4/8) Epoch 7, batch 500, train_loss[loss=2.888, ArTop10Accuracy=0.7599, over 12327.00 frames. ], tot_loss[loss=2.891, ArTop10Accuracy=0.7534, over 10853.35 frames. ], batch size: 22, lr: 1.61e-02
+ 2024-08-06 10:04:51,882 INFO [trainer.py:765] (4/8) Epoch 7, batch 600, train_loss[loss=2.909, ArTop10Accuracy=0.7526, over 11847.00 frames. ], tot_loss[loss=2.895, ArTop10Accuracy=0.7529, over 11343.07 frames. ], batch size: 19, lr: 1.61e-02
+ 2024-08-06 10:06:25,112 INFO [trainer.py:765] (4/8) Epoch 7, batch 700, train_loss[loss=2.939, ArTop10Accuracy=0.7475, over 9381.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7526, over 11502.86 frames. ], batch size: 11, lr: 1.60e-02
+ 2024-08-06 10:07:46,948 INFO [trainer.py:765] (4/8) Epoch 7, batch 800, train_loss[loss=2.904, ArTop10Accuracy=0.7507, over 10071.00 frames. ], tot_loss[loss=2.902, ArTop10Accuracy=0.7514, over 11636.93 frames. ], batch size: 12, lr: 1.59e-02
+ 2024-08-06 10:09:02,824 INFO [trainer.py:765] (4/8) Epoch 7, batch 900, train_loss[loss=2.835, ArTop10Accuracy=0.7591, over 12993.00 frames. ], tot_loss[loss=2.893, ArTop10Accuracy=0.753, over 11682.69 frames. ], batch size: 27, lr: 1.59e-02
+ 2024-08-06 10:10:19,636 INFO [trainer.py:765] (4/8) Epoch 7, batch 1000, train_loss[loss=2.856, ArTop10Accuracy=0.7663, over 12762.00 frames. ], tot_loss[loss=2.898, ArTop10Accuracy=0.7523, over 11896.36 frames. ], batch size: 27, lr: 1.58e-02
+ 2024-08-06 10:11:35,208 INFO [trainer.py:765] (4/8) Epoch 7, batch 1100, train_loss[loss=2.936, ArTop10Accuracy=0.7445, over 13755.00 frames. ], tot_loss[loss=2.902, ArTop10Accuracy=0.7512, over 11966.74 frames. ], batch size: 34, lr: 1.57e-02
+ 2024-08-06 10:12:48,204 INFO [trainer.py:765] (4/8) Epoch 7, batch 1200, train_loss[loss=3.002, ArTop10Accuracy=0.7326, over 12930.00 frames. ], tot_loss[loss=2.898, ArTop10Accuracy=0.7519, over 11872.20 frames. ], batch size: 101, lr: 1.57e-02
+ 2024-08-06 10:13:46,750 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 10:15:03,600 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.283e+02 1.410e+02 1.601e+02 1.017e+03, threshold=2.820e+02, percent-clipped=0.9
+ 2024-08-06 10:15:40,820 INFO [trainer.py:765] (4/8) Epoch 8, batch 100, train_loss[loss=3.008, ArTop10Accuracy=0.7324, over 14205.00 frames. ], tot_loss[loss=2.887, ArTop10Accuracy=0.754, over 4763.79 frames. ], batch size: 62, lr: 1.47e-02
+ 2024-08-06 10:17:12,861 INFO [trainer.py:765] (4/8) Epoch 8, batch 200, train_loss[loss=2.874, ArTop10Accuracy=0.7629, over 13785.00 frames. ], tot_loss[loss=2.872, ArTop10Accuracy=0.757, over 7763.58 frames. ], batch size: 34, lr: 1.46e-02
+ 2024-08-06 10:18:37,897 INFO [trainer.py:765] (4/8) Epoch 8, batch 300, train_loss[loss=2.891, ArTop10Accuracy=0.7523, over 14205.00 frames. ], tot_loss[loss=2.865, ArTop10Accuracy=0.7584, over 9375.13 frames. ], batch size: 44, lr: 1.46e-02
+ 2024-08-06 10:20:06,341 INFO [trainer.py:765] (4/8) Epoch 8, batch 400, train_loss[loss=2.892, ArTop10Accuracy=0.751, over 10953.00 frames. ], tot_loss[loss=2.865, ArTop10Accuracy=0.7583, over 10289.35 frames. ], batch size: 15, lr: 1.45e-02
+ 2024-08-06 10:21:32,410 INFO [trainer.py:765] (4/8) Epoch 8, batch 500, train_loss[loss=2.888, ArTop10Accuracy=0.7557, over 12642.00 frames. ], tot_loss[loss=2.859, ArTop10Accuracy=0.7591, over 10849.37 frames. ], batch size: 23, lr: 1.45e-02
+ 2024-08-06 10:23:00,973 INFO [trainer.py:765] (4/8) Epoch 8, batch 600, train_loss[loss=2.915, ArTop10Accuracy=0.7512, over 11388.00 frames. ], tot_loss[loss=2.862, ArTop10Accuracy=0.7587, over 11353.14 frames. ], batch size: 18, lr: 1.44e-02
+ 2024-08-06 10:24:37,787 INFO [trainer.py:765] (4/8) Epoch 8, batch 700, train_loss[loss=2.855, ArTop10Accuracy=0.7624, over 10257.00 frames. ], tot_loss[loss=2.866, ArTop10Accuracy=0.7579, over 11516.65 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:25:56,086 INFO [trainer.py:765] (4/8) Epoch 8, batch 800, train_loss[loss=2.831, ArTop10Accuracy=0.7627, over 10239.00 frames. ], tot_loss[loss=2.873, ArTop10Accuracy=0.7568, over 11657.44 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:27:12,244 INFO [trainer.py:765] (4/8) Epoch 8, batch 900, train_loss[loss=2.981, ArTop10Accuracy=0.7387, over 13305.00 frames. ], tot_loss[loss=2.865, ArTop10Accuracy=0.7582, over 11699.99 frames. ], batch size: 28, lr: 1.42e-02
+ 2024-08-06 10:28:25,262 INFO [trainer.py:765] (4/8) Epoch 8, batch 1000, train_loss[loss=2.895, ArTop10Accuracy=0.7544, over 13005.00 frames. ], tot_loss[loss=2.871, ArTop10Accuracy=0.7573, over 11892.93 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:29:07,154 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 10:29:16,831 INFO [trainer.py:811] (4/8) Epoch 8, validation: loss=2.858, ArTop10Accuracy=0.7594, over 1827537.00 frames.
+ 2024-08-06 10:29:16,831 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 10:29:17,490 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.275e+02 1.390e+02 1.547e+02 3.717e+02, threshold=2.781e+02, percent-clipped=0.7
+ 2024-08-06 10:29:51,731 INFO [trainer.py:765] (4/8) Epoch 8, batch 1100, train_loss[loss=2.842, ArTop10Accuracy=0.762, over 13689.00 frames. ], tot_loss[loss=2.873, ArTop10Accuracy=0.7568, over 11939.37 frames. ], batch size: 34, lr: 1.41e-02
+ 2024-08-06 10:31:05,947 INFO [trainer.py:765] (4/8) Epoch 8, batch 1200, train_loss[loss=2.955, ArTop10Accuracy=0.743, over 12402.00 frames. ], tot_loss[loss=2.875, ArTop10Accuracy=0.7565, over 11857.03 frames. ], batch size: 101, lr: 1.40e-02
+ 2024-08-06 10:32:05,791 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 10:34:01,255 INFO [trainer.py:765] (4/8) Epoch 9, batch 100, train_loss[loss=2.904, ArTop10Accuracy=0.7568, over 14307.00 frames. ], tot_loss[loss=2.863, ArTop10Accuracy=0.7586, over 4737.05 frames. ], batch size: 62, lr: 1.32e-02
+ 2024-08-06 10:35:31,771 INFO [trainer.py:765] (4/8) Epoch 9, batch 200, train_loss[loss=2.822, ArTop10Accuracy=0.7641, over 13818.00 frames. ], tot_loss[loss=2.855, ArTop10Accuracy=0.76, over 7743.59 frames. ], batch size: 35, lr: 1.32e-02
+ 2024-08-06 10:36:57,926 INFO [trainer.py:765] (4/8) Epoch 9, batch 300, train_loss[loss=2.909, ArTop10Accuracy=0.7482, over 13983.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7611, over 9372.82 frames. ], batch size: 44, lr: 1.31e-02
+ 2024-08-06 10:38:32,696 INFO [trainer.py:765] (4/8) Epoch 9, batch 400, train_loss[loss=2.76, ArTop10Accuracy=0.7829, over 10272.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7612, over 10289.62 frames. ], batch size: 14, lr: 1.31e-02
+ 2024-08-06 10:39:59,255 INFO [trainer.py:765] (4/8) Epoch 9, batch 500, train_loss[loss=2.807, ArTop10Accuracy=0.7688, over 12525.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.7624, over 10855.66 frames. ], batch size: 23, lr: 1.30e-02
+ 2024-08-06 10:41:29,689 INFO [trainer.py:765] (4/8) Epoch 9, batch 600, train_loss[loss=2.761, ArTop10Accuracy=0.7807, over 11481.00 frames. ], tot_loss[loss=2.846, ArTop10Accuracy=0.7619, over 11380.55 frames. ], batch size: 18, lr: 1.30e-02
+ 2024-08-06 10:42:58,439 INFO [trainer.py:765] (4/8) Epoch 9, batch 700, train_loss[loss=2.828, ArTop10Accuracy=0.7617, over 10086.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7613, over 11524.13 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:44:02,952 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.253e+02 1.352e+02 1.493e+02 7.010e+02, threshold=2.704e+02, percent-clipped=0.6
+ 2024-08-06 10:44:19,668 INFO [trainer.py:765] (4/8) Epoch 9, batch 800, train_loss[loss=2.768, ArTop10Accuracy=0.7773, over 10254.00 frames. ], tot_loss[loss=2.852, ArTop10Accuracy=0.7609, over 11649.42 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:45:35,718 INFO [trainer.py:765] (4/8) Epoch 9, batch 900, train_loss[loss=2.88, ArTop10Accuracy=0.753, over 13434.00 frames. ], tot_loss[loss=2.847, ArTop10Accuracy=0.7618, over 11670.68 frames. ], batch size: 28, lr: 1.28e-02
+ 2024-08-06 10:46:51,270 INFO [trainer.py:765] (4/8) Epoch 9, batch 1000, train_loss[loss=2.869, ArTop10Accuracy=0.7541, over 12966.00 frames. ], tot_loss[loss=2.85, ArTop10Accuracy=0.761, over 11876.51 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:48:06,246 INFO [trainer.py:765] (4/8) Epoch 9, batch 1100, train_loss[loss=2.955, ArTop10Accuracy=0.7404, over 13590.00 frames. ], tot_loss[loss=2.856, ArTop10Accuracy=0.7598, over 11951.07 frames. ], batch size: 34, lr: 1.28e-02
+ 2024-08-06 10:49:21,052 INFO [trainer.py:765] (4/8) Epoch 9, batch 1200, train_loss[loss=2.974, ArTop10Accuracy=0.7371, over 12891.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7605, over 11860.30 frames. ], batch size: 101, lr: 1.27e-02
+ 2024-08-06 10:50:22,648 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 10:52:12,326 INFO [trainer.py:765] (4/8) Epoch 10, batch 100, train_loss[loss=2.903, ArTop10Accuracy=0.7494, over 14361.00 frames. ], tot_loss[loss=2.84, ArTop10Accuracy=0.7629, over 4760.61 frames. ], batch size: 62, lr: 1.20e-02
+ 2024-08-06 10:53:44,585 INFO [trainer.py:765] (4/8) Epoch 10, batch 200, train_loss[loss=2.808, ArTop10Accuracy=0.7728, over 13860.00 frames. ], tot_loss[loss=2.832, ArTop10Accuracy=0.7645, over 7751.90 frames. ], batch size: 34, lr: 1.20e-02
+ 2024-08-06 10:55:08,089 INFO [trainer.py:765] (4/8) Epoch 10, batch 300, train_loss[loss=2.899, ArTop10Accuracy=0.7537, over 14238.00 frames. ], tot_loss[loss=2.829, ArTop10Accuracy=0.765, over 9382.80 frames. ], batch size: 44, lr: 1.19e-02
+ 2024-08-06 10:56:41,176 INFO [trainer.py:765] (4/8) Epoch 10, batch 400, train_loss[loss=2.607, ArTop10Accuracy=0.8052, over 10920.00 frames. ], tot_loss[loss=2.825, ArTop10Accuracy=0.7657, over 10285.01 frames. ], batch size: 15, lr: 1.19e-02
+ 2024-08-06 10:58:04,937 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 10:58:14,559 INFO [trainer.py:811] (4/8) Epoch 10, validation: loss=2.842, ArTop10Accuracy=0.7624, over 1827537.00 frames.
+ 2024-08-06 10:58:14,560 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 10:58:15,573 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.228e+02 1.320e+02 1.458e+02 6.096e+02, threshold=2.641e+02, percent-clipped=0.6
+ 2024-08-06 10:58:15,577 INFO [trainer.py:765] (4/8) Epoch 10, batch 500, train_loss[loss=2.741, ArTop10Accuracy=0.7833, over 12168.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.7663, over 10832.02 frames. ], batch size: 22, lr: 1.19e-02
+ 2024-08-06 10:59:42,814 INFO [trainer.py:765] (4/8) Epoch 10, batch 600, train_loss[loss=2.833, ArTop10Accuracy=0.7675, over 11478.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.7663, over 11348.40 frames. ], batch size: 18, lr: 1.18e-02
+ 2024-08-06 11:01:18,107 INFO [trainer.py:765] (4/8) Epoch 10, batch 700, train_loss[loss=2.833, ArTop10Accuracy=0.77, over 10155.00 frames. ], tot_loss[loss=2.831, ArTop10Accuracy=0.7646, over 11499.12 frames. ], batch size: 12, lr: 1.18e-02
+ 2024-08-06 11:02:36,917 INFO [trainer.py:765] (4/8) Epoch 10, batch 800, train_loss[loss=2.736, ArTop10Accuracy=0.7802, over 9588.00 frames. ], tot_loss[loss=2.834, ArTop10Accuracy=0.764, over 11598.00 frames. ], batch size: 11, lr: 1.17e-02
+ 2024-08-06 11:03:51,211 INFO [trainer.py:765] (4/8) Epoch 10, batch 900, train_loss[loss=2.81, ArTop10Accuracy=0.7674, over 12879.00 frames. ], tot_loss[loss=2.829, ArTop10Accuracy=0.7651, over 11668.20 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:05:06,351 INFO [trainer.py:765] (4/8) Epoch 10, batch 1000, train_loss[loss=2.774, ArTop10Accuracy=0.7766, over 13230.00 frames. ], tot_loss[loss=2.832, ArTop10Accuracy=0.7643, over 11870.10 frames. ], batch size: 28, lr: 1.17e-02
+ 2024-08-06 11:06:21,722 INFO [trainer.py:765] (4/8) Epoch 10, batch 1100, train_loss[loss=2.834, ArTop10Accuracy=0.7654, over 14001.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7635, over 11949.67 frames. ], batch size: 34, lr: 1.16e-02
+ 2024-08-06 11:07:34,771 INFO [trainer.py:765] (4/8) Epoch 10, batch 1200, train_loss[loss=2.926, ArTop10Accuracy=0.7422, over 12183.00 frames. ], tot_loss[loss=2.839, ArTop10Accuracy=0.7631, over 11860.35 frames. ], batch size: 101, lr: 1.16e-02
+ 2024-08-06 11:08:33,545 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 11:10:29,955 INFO [trainer.py:765] (4/8) Epoch 11, batch 100, train_loss[loss=2.894, ArTop10Accuracy=0.7514, over 14163.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.7656, over 4760.24 frames. ], batch size: 62, lr: 1.10e-02
+ 2024-08-06 11:12:04,675 INFO [trainer.py:765] (4/8) Epoch 11, batch 200, train_loss[loss=2.819, ArTop10Accuracy=0.7616, over 13581.00 frames. ], tot_loss[loss=2.816, ArTop10Accuracy=0.7671, over 7747.57 frames. ], batch size: 34, lr: 1.10e-02
+ 2024-08-06 11:12:22,826 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 9.884e+01 1.240e+02 1.333e+02 1.457e+02 6.939e+02, threshold=2.667e+02, percent-clipped=0.1
+ 2024-08-06 11:13:31,548 INFO [trainer.py:765] (4/8) Epoch 11, batch 300, train_loss[loss=2.827, ArTop10Accuracy=0.7685, over 14136.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7695, over 9352.68 frames. ], batch size: 44, lr: 1.09e-02
+ 2024-08-06 11:15:03,269 INFO [trainer.py:765] (4/8) Epoch 11, batch 400, train_loss[loss=2.65, ArTop10Accuracy=0.7958, over 10311.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7695, over 10269.57 frames. ], batch size: 14, lr: 1.09e-02
+ 2024-08-06 11:16:29,637 INFO [trainer.py:765] (4/8) Epoch 11, batch 500, train_loss[loss=2.803, ArTop10Accuracy=0.7683, over 12186.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7709, over 10871.11 frames. ], batch size: 22, lr: 1.09e-02
+ 2024-08-06 11:18:00,517 INFO [trainer.py:765] (4/8) Epoch 11, batch 600, train_loss[loss=2.703, ArTop10Accuracy=0.7925, over 11367.00 frames. ], tot_loss[loss=2.802, ArTop10Accuracy=0.7702, over 11379.50 frames. ], batch size: 18, lr: 1.08e-02
+ 2024-08-06 11:19:34,514 INFO [trainer.py:765] (4/8) Epoch 11, batch 700, train_loss[loss=2.706, ArTop10Accuracy=0.7954, over 10206.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7696, over 11531.85 frames. ], batch size: 12, lr: 1.08e-02
+ 2024-08-06 11:20:55,484 INFO [trainer.py:765] (4/8) Epoch 11, batch 800, train_loss[loss=2.785, ArTop10Accuracy=0.7746, over 10131.00 frames. ], tot_loss[loss=2.812, ArTop10Accuracy=0.7683, over 11652.45 frames. ], batch size: 12, lr: 1.07e-02
+ 2024-08-06 11:22:13,705 INFO [trainer.py:765] (4/8) Epoch 11, batch 900, train_loss[loss=2.852, ArTop10Accuracy=0.7595, over 12939.00 frames. ], tot_loss[loss=2.808, ArTop10Accuracy=0.7692, over 11699.34 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:23:31,799 INFO [trainer.py:765] (4/8) Epoch 11, batch 1000, train_loss[loss=2.784, ArTop10Accuracy=0.776, over 12765.00 frames. ], tot_loss[loss=2.811, ArTop10Accuracy=0.7685, over 11909.07 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:24:46,902 INFO [trainer.py:765] (4/8) Epoch 11, batch 1100, train_loss[loss=2.783, ArTop10Accuracy=0.7739, over 13785.00 frames. ], tot_loss[loss=2.82, ArTop10Accuracy=0.7666, over 11994.73 frames. ], batch size: 34, lr: 1.06e-02
+ 2024-08-06 11:26:00,733 INFO [trainer.py:765] (4/8) Epoch 11, batch 1200, train_loss[loss=2.906, ArTop10Accuracy=0.7499, over 12528.00 frames. ], tot_loss[loss=2.821, ArTop10Accuracy=0.7665, over 11900.31 frames. ], batch size: 101, lr: 1.06e-02
+ 2024-08-06 11:26:15,847 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 11:26:25,556 INFO [trainer.py:811] (4/8) Epoch 11, validation: loss=2.831, ArTop10Accuracy=0.7643, over 1827537.00 frames.
+ 2024-08-06 11:26:25,557 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 11:26:26,185 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.251e+02 1.335e+02 1.441e+02 2.942e+02, threshold=2.669e+02, percent-clipped=0.1
+ 2024-08-06 11:27:09,520 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 11:29:03,450 INFO [trainer.py:765] (4/8) Epoch 12, batch 100, train_loss[loss=2.851, ArTop10Accuracy=0.7621, over 14574.00 frames. ], tot_loss[loss=2.803, ArTop10Accuracy=0.7693, over 4761.92 frames. ], batch size: 62, lr: 1.01e-02
+ 2024-08-06 11:30:30,674 INFO [trainer.py:765] (4/8) Epoch 12, batch 200, train_loss[loss=2.84, ArTop10Accuracy=0.7634, over 13653.00 frames. ], tot_loss[loss=2.801, ArTop10Accuracy=0.7697, over 7757.17 frames. ], batch size: 34, lr: 1.01e-02
+ 2024-08-06 11:31:57,655 INFO [trainer.py:765] (4/8) Epoch 12, batch 300, train_loss[loss=2.84, ArTop10Accuracy=0.7657, over 14268.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7713, over 9378.27 frames. ], batch size: 44, lr: 1.01e-02
+ 2024-08-06 11:33:30,739 INFO [trainer.py:765] (4/8) Epoch 12, batch 400, train_loss[loss=2.648, ArTop10Accuracy=0.7979, over 10299.00 frames. ], tot_loss[loss=2.793, ArTop10Accuracy=0.7716, over 10283.00 frames. ], batch size: 14, lr: 1.00e-02
+ 2024-08-06 11:34:55,733 INFO [trainer.py:765] (4/8) Epoch 12, batch 500, train_loss[loss=2.764, ArTop10Accuracy=0.7742, over 12129.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7722, over 10856.75 frames. ], batch size: 22, lr: 1.00e-02
+ 2024-08-06 11:36:29,363 INFO [trainer.py:765] (4/8) Epoch 12, batch 600, train_loss[loss=2.737, ArTop10Accuracy=0.7859, over 11379.00 frames. ], tot_loss[loss=2.792, ArTop10Accuracy=0.7718, over 11376.60 frames. ], batch size: 18, lr: 9.97e-03
+ 2024-08-06 11:38:00,343 INFO [trainer.py:765] (4/8) Epoch 12, batch 700, train_loss[loss=2.838, ArTop10Accuracy=0.7632, over 10191.00 frames. ], tot_loss[loss=2.796, ArTop10Accuracy=0.771, over 11525.72 frames. ], batch size: 12, lr: 9.93e-03
+ 2024-08-06 11:39:23,610 INFO [trainer.py:765] (4/8) Epoch 12, batch 800, train_loss[loss=2.668, ArTop10Accuracy=0.7935, over 10128.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7706, over 11638.44 frames. ], batch size: 12, lr: 9.90e-03
+ 2024-08-06 11:40:39,889 INFO [trainer.py:765] (4/8) Epoch 12, batch 900, train_loss[loss=2.768, ArTop10Accuracy=0.7778, over 12996.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7715, over 11692.40 frames. ], batch size: 27, lr: 9.87e-03
+ 2024-08-06 11:41:13,995 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.248e+02 1.348e+02 1.459e+02 5.540e+02, threshold=2.695e+02, percent-clipped=0.3
+ 2024-08-06 11:41:56,188 INFO [trainer.py:765] (4/8) Epoch 12, batch 1000, train_loss[loss=2.773, ArTop10Accuracy=0.7733, over 12681.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7706, over 11884.29 frames. ], batch size: 27, lr: 9.85e-03
+ 2024-08-06 11:43:14,321 INFO [trainer.py:765] (4/8) Epoch 12, batch 1100, train_loss[loss=2.818, ArTop10Accuracy=0.7713, over 13422.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7694, over 11954.58 frames. ], batch size: 34, lr: 9.82e-03
+ 2024-08-06 11:44:26,156 INFO [trainer.py:765] (4/8) Epoch 12, batch 1200, train_loss[loss=2.938, ArTop10Accuracy=0.7448, over 12807.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7695, over 11862.54 frames. ], batch size: 101, lr: 9.79e-03
+ 2024-08-06 11:45:26,431 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 11:47:26,599 INFO [trainer.py:765] (4/8) Epoch 13, batch 100, train_loss[loss=2.825, ArTop10Accuracy=0.766, over 14238.00 frames. ], tot_loss[loss=2.792, ArTop10Accuracy=0.7713, over 4764.73 frames. ], batch size: 62, lr: 9.37e-03
+ 2024-08-06 11:48:54,778 INFO [trainer.py:765] (4/8) Epoch 13, batch 200, train_loss[loss=2.834, ArTop10Accuracy=0.7646, over 13965.00 frames. ], tot_loss[loss=2.783, ArTop10Accuracy=0.773, over 7763.78 frames. ], batch size: 35, lr: 9.34e-03
+ 2024-08-06 11:50:20,514 INFO [trainer.py:765] (4/8) Epoch 13, batch 300, train_loss[loss=2.807, ArTop10Accuracy=0.7683, over 14352.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7743, over 9392.08 frames. ], batch size: 44, lr: 9.31e-03
+ 2024-08-06 11:51:48,764 INFO [trainer.py:765] (4/8) Epoch 13, batch 400, train_loss[loss=2.714, ArTop10Accuracy=0.785, over 10140.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7749, over 10285.29 frames. ], batch size: 14, lr: 9.28e-03
+ 2024-08-06 11:53:13,406 INFO [trainer.py:765] (4/8) Epoch 13, batch 500, train_loss[loss=2.727, ArTop10Accuracy=0.7851, over 12174.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7765, over 10845.60 frames. ], batch size: 22, lr: 9.26e-03
+ 2024-08-06 11:54:52,222 INFO [trainer.py:765] (4/8) Epoch 13, batch 600, train_loss[loss=2.715, ArTop10Accuracy=0.7804, over 11475.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.775, over 11351.00 frames. ], batch size: 18, lr: 9.23e-03
+ 2024-08-06 11:55:47,080 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 11:55:56,835 INFO [trainer.py:811] (4/8) Epoch 13, validation: loss=2.824, ArTop10Accuracy=0.7662, over 1827537.00 frames.
+ 2024-08-06 11:55:56,835 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 11:55:57,711 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.343e+02 1.452e+02 4.888e+02, threshold=2.687e+02, percent-clipped=0.1
+ 2024-08-06 11:56:28,464 INFO [trainer.py:765] (4/8) Epoch 13, batch 700, train_loss[loss=2.75, ArTop10Accuracy=0.7831, over 10083.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7747, over 11501.17 frames. ], batch size: 12, lr: 9.20e-03
+ 2024-08-06 11:57:46,682 INFO [trainer.py:765] (4/8) Epoch 13, batch 800, train_loss[loss=2.675, ArTop10Accuracy=0.7886, over 10248.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7744, over 11619.44 frames. ], batch size: 12, lr: 9.18e-03
+ 2024-08-06 11:59:03,286 INFO [trainer.py:765] (4/8) Epoch 13, batch 900, train_loss[loss=2.771, ArTop10Accuracy=0.7767, over 12915.00 frames. ], tot_loss[loss=2.775, ArTop10Accuracy=0.7752, over 11675.58 frames. ], batch size: 27, lr: 9.15e-03
+ 2024-08-06 12:00:19,173 INFO [trainer.py:765] (4/8) Epoch 13, batch 1000, train_loss[loss=2.828, ArTop10Accuracy=0.7686, over 12804.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7734, over 11872.36 frames. ], batch size: 27, lr: 9.13e-03
+ 2024-08-06 12:01:34,880 INFO [trainer.py:765] (4/8) Epoch 13, batch 1100, train_loss[loss=2.804, ArTop10Accuracy=0.7651, over 13485.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7723, over 11951.26 frames. ], batch size: 34, lr: 9.10e-03
+ 2024-08-06 12:02:48,662 INFO [trainer.py:765] (4/8) Epoch 13, batch 1200, train_loss[loss=2.902, ArTop10Accuracy=0.7484, over 12114.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7723, over 11865.52 frames. ], batch size: 101, lr: 9.08e-03
+ 2024-08-06 12:03:48,339 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 12:05:45,333 INFO [trainer.py:765] (4/8) Epoch 14, batch 100, train_loss[loss=2.835, ArTop10Accuracy=0.762, over 14472.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7746, over 4782.62 frames. ], batch size: 62, lr: 8.71e-03
+ 2024-08-06 12:07:16,602 INFO [trainer.py:765] (4/8) Epoch 14, batch 200, train_loss[loss=2.814, ArTop10Accuracy=0.7662, over 13683.00 frames. ], tot_loss[loss=2.773, ArTop10Accuracy=0.7753, over 7786.17 frames. ], batch size: 34, lr: 8.69e-03
+ 2024-08-06 12:08:44,310 INFO [trainer.py:765] (4/8) Epoch 14, batch 300, train_loss[loss=2.756, ArTop10Accuracy=0.7761, over 14625.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7768, over 9408.24 frames. ], batch size: 45, lr: 8.66e-03
+ 2024-08-06 12:10:01,130 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.266e+02 1.374e+02 1.483e+02 6.480e+02, threshold=2.748e+02, percent-clipped=0.2
+ 2024-08-06 12:10:10,225 INFO [trainer.py:765] (4/8) Epoch 14, batch 400, train_loss[loss=2.663, ArTop10Accuracy=0.7999, over 10509.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.7769, over 10317.67 frames. ], batch size: 14, lr: 8.64e-03
+ 2024-08-06 12:11:36,149 INFO [trainer.py:765] (4/8) Epoch 14, batch 500, train_loss[loss=2.836, ArTop10Accuracy=0.7666, over 12150.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.7771, over 10867.61 frames. ], batch size: 22, lr: 8.62e-03
+ 2024-08-06 12:13:05,992 INFO [trainer.py:765] (4/8) Epoch 14, batch 600, train_loss[loss=2.737, ArTop10Accuracy=0.7799, over 11397.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.777, over 11388.73 frames. ], batch size: 18, lr: 8.59e-03
+ 2024-08-06 12:14:38,553 INFO [trainer.py:765] (4/8) Epoch 14, batch 700, train_loss[loss=2.761, ArTop10Accuracy=0.7818, over 9318.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7759, over 11531.31 frames. ], batch size: 11, lr: 8.57e-03
+ 2024-08-06 12:15:58,068 INFO [trainer.py:765] (4/8) Epoch 14, batch 800, train_loss[loss=2.574, ArTop10Accuracy=0.8119, over 10068.00 frames. ], tot_loss[loss=2.774, ArTop10Accuracy=0.7752, over 11637.37 frames. ], batch size: 12, lr: 8.55e-03
+ 2024-08-06 12:17:12,864 INFO [trainer.py:765] (4/8) Epoch 14, batch 900, train_loss[loss=2.758, ArTop10Accuracy=0.7791, over 13287.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7766, over 11696.62 frames. ], batch size: 28, lr: 8.52e-03
+ 2024-08-06 12:18:29,613 INFO [trainer.py:765] (4/8) Epoch 14, batch 1000, train_loss[loss=2.746, ArTop10Accuracy=0.7813, over 12909.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7758, over 11892.84 frames. ], batch size: 27, lr: 8.50e-03
+ 2024-08-06 12:19:45,375 INFO [trainer.py:765] (4/8) Epoch 14, batch 1100, train_loss[loss=2.739, ArTop10Accuracy=0.7804, over 13647.00 frames. ], tot_loss[loss=2.775, ArTop10Accuracy=0.7752, over 11926.48 frames. ], batch size: 34, lr: 8.48e-03
+ 2024-08-06 12:20:59,277 INFO [trainer.py:765] (4/8) Epoch 14, batch 1200, train_loss[loss=2.904, ArTop10Accuracy=0.7477, over 12768.00 frames. ], tot_loss[loss=2.774, ArTop10Accuracy=0.7754, over 11863.44 frames. ], batch size: 101, lr: 8.46e-03
+ 2024-08-06 12:21:58,313 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 12:23:51,961 INFO [trainer.py:765] (4/8) Epoch 15, batch 100, train_loss[loss=2.757, ArTop10Accuracy=0.7769, over 14058.00 frames. ], tot_loss[loss=2.763, ArTop10Accuracy=0.7767, over 4741.67 frames. ], batch size: 62, lr: 8.14e-03
+ 2024-08-06 12:24:00,599 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 12:24:10,290 INFO [trainer.py:811] (4/8) Epoch 15, validation: loss=2.819, ArTop10Accuracy=0.7675, over 1827537.00 frames.
+ 2024-08-06 12:24:10,291 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 12:24:11,094 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.284e+02 1.371e+02 1.488e+02 4.667e+02, threshold=2.743e+02, percent-clipped=0.2
+ 2024-08-06 12:25:29,988 INFO [trainer.py:765] (4/8) Epoch 15, batch 200, train_loss[loss=2.727, ArTop10Accuracy=0.7861, over 13497.00 frames. ], tot_loss[loss=2.756, ArTop10Accuracy=0.7786, over 7747.87 frames. ], batch size: 34, lr: 8.12e-03
+ 2024-08-06 12:26:58,694 INFO [trainer.py:765] (4/8) Epoch 15, batch 300, train_loss[loss=2.79, ArTop10Accuracy=0.7734, over 14127.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7789, over 9366.22 frames. ], batch size: 44, lr: 8.09e-03
+ 2024-08-06 12:28:28,533 INFO [trainer.py:765] (4/8) Epoch 15, batch 400, train_loss[loss=2.737, ArTop10Accuracy=0.7757, over 10197.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.7798, over 10275.21 frames. ], batch size: 14, lr: 8.07e-03
+ 2024-08-06 12:29:54,032 INFO [trainer.py:765] (4/8) Epoch 15, batch 500, train_loss[loss=2.684, ArTop10Accuracy=0.7923, over 11910.00 frames. ], tot_loss[loss=2.745, ArTop10Accuracy=0.7806, over 10839.43 frames. ], batch size: 22, lr: 8.05e-03
+ 2024-08-06 12:31:23,292 INFO [trainer.py:765] (4/8) Epoch 15, batch 600, train_loss[loss=2.711, ArTop10Accuracy=0.7829, over 11328.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7795, over 11360.31 frames. ], batch size: 18, lr: 8.03e-03
+ 2024-08-06 12:32:53,175 INFO [trainer.py:765] (4/8) Epoch 15, batch 700, train_loss[loss=2.798, ArTop10Accuracy=0.7657, over 9354.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7787, over 11509.82 frames. ], batch size: 11, lr: 8.01e-03
+ 2024-08-06 12:34:18,254 INFO [trainer.py:765] (4/8) Epoch 15, batch 800, train_loss[loss=2.694, ArTop10Accuracy=0.7858, over 9429.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7778, over 11617.64 frames. ], batch size: 11, lr: 7.99e-03
+ 2024-08-06 12:35:34,726 INFO [trainer.py:765] (4/8) Epoch 15, batch 900, train_loss[loss=2.779, ArTop10Accuracy=0.7811, over 13008.00 frames. ], tot_loss[loss=2.754, ArTop10Accuracy=0.7789, over 11663.27 frames. ], batch size: 27, lr: 7.97e-03
+ 2024-08-06 12:36:50,540 INFO [trainer.py:765] (4/8) Epoch 15, batch 1000, train_loss[loss=2.755, ArTop10Accuracy=0.7811, over 12786.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7782, over 11867.55 frames. ], batch size: 27, lr: 7.95e-03
+ 2024-08-06 12:38:05,179 INFO [trainer.py:765] (4/8) Epoch 15, batch 1100, train_loss[loss=2.727, ArTop10Accuracy=0.781, over 13656.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.7768, over 11960.61 frames. ], batch size: 34, lr: 7.93e-03
+ 2024-08-06 12:38:12,841 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.293e+02 1.379e+02 1.467e+02 2.824e+02, threshold=2.759e+02, percent-clipped=0.1
+ 2024-08-06 12:39:18,788 INFO [trainer.py:765] (4/8) Epoch 15, batch 1200, train_loss[loss=2.875, ArTop10Accuracy=0.7581, over 12324.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7764, over 11867.43 frames. ], batch size: 101, lr: 7.91e-03
+ 2024-08-06 12:40:18,729 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 12:42:17,617 INFO [trainer.py:765] (4/8) Epoch 16, batch 100, train_loss[loss=2.72, ArTop10Accuracy=0.7843, over 14628.00 frames. ], tot_loss[loss=2.754, ArTop10Accuracy=0.7785, over 4756.83 frames. ], batch size: 63, lr: 7.63e-03
+ 2024-08-06 12:43:49,563 INFO [trainer.py:765] (4/8) Epoch 16, batch 200, train_loss[loss=2.772, ArTop10Accuracy=0.7808, over 13596.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7796, over 7758.14 frames. ], batch size: 34, lr: 7.61e-03
+ 2024-08-06 12:45:18,501 INFO [trainer.py:765] (4/8) Epoch 16, batch 300, train_loss[loss=2.786, ArTop10Accuracy=0.7746, over 14376.00 frames. ], tot_loss[loss=2.742, ArTop10Accuracy=0.7808, over 9384.75 frames. ], batch size: 44, lr: 7.59e-03
+ 2024-08-06 12:46:45,207 INFO [trainer.py:765] (4/8) Epoch 16, batch 400, train_loss[loss=2.673, ArTop10Accuracy=0.7931, over 10800.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.7816, over 10273.40 frames. ], batch size: 15, lr: 7.58e-03
+ 2024-08-06 12:48:16,309 INFO [trainer.py:765] (4/8) Epoch 16, batch 500, train_loss[loss=2.668, ArTop10Accuracy=0.7959, over 12543.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7828, over 10823.89 frames. ], batch size: 23, lr: 7.56e-03
+ 2024-08-06 12:49:46,641 INFO [trainer.py:765] (4/8) Epoch 16, batch 600, train_loss[loss=2.696, ArTop10Accuracy=0.7945, over 11832.00 frames. ], tot_loss[loss=2.739, ArTop10Accuracy=0.7818, over 11356.77 frames. ], batch size: 19, lr: 7.54e-03
+ 2024-08-06 12:51:23,681 INFO [trainer.py:765] (4/8) Epoch 16, batch 700, train_loss[loss=2.622, ArTop10Accuracy=0.8066, over 9279.00 frames. ], tot_loss[loss=2.742, ArTop10Accuracy=0.7812, over 11496.79 frames. ], batch size: 11, lr: 7.52e-03
+ 2024-08-06 12:52:43,500 INFO [trainer.py:765] (4/8) Epoch 16, batch 800, train_loss[loss=2.665, ArTop10Accuracy=0.7968, over 9534.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7802, over 11622.91 frames. ], batch size: 11, lr: 7.51e-03
+ 2024-08-06 12:53:06,015 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 12:53:15,497 INFO [trainer.py:811] (4/8) Epoch 16, validation: loss=2.816, ArTop10Accuracy=0.7678, over 1827537.00 frames.
+ 2024-08-06 12:53:15,497 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 12:53:16,186 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.291e+02 1.391e+02 1.487e+02 3.459e+02, threshold=2.783e+02, percent-clipped=0.1
+ 2024-08-06 12:54:06,480 INFO [trainer.py:765] (4/8) Epoch 16, batch 900, train_loss[loss=2.758, ArTop10Accuracy=0.7755, over 12792.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.7814, over 11673.14 frames. ], batch size: 27, lr: 7.49e-03
+ 2024-08-06 12:55:19,790 INFO [trainer.py:765] (4/8) Epoch 16, batch 1000, train_loss[loss=2.729, ArTop10Accuracy=0.7823, over 12786.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7803, over 11883.80 frames. ], batch size: 27, lr: 7.47e-03
+ 2024-08-06 12:56:33,162 INFO [trainer.py:765] (4/8) Epoch 16, batch 1100, train_loss[loss=2.841, ArTop10Accuracy=0.761, over 13731.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7788, over 11965.31 frames. ], batch size: 34, lr: 7.45e-03
+ 2024-08-06 12:57:48,484 INFO [trainer.py:765] (4/8) Epoch 16, batch 1200, train_loss[loss=2.889, ArTop10Accuracy=0.7509, over 13242.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7784, over 11864.74 frames. ], batch size: 101, lr: 7.44e-03
+ 2024-08-06 12:58:48,452 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 13:00:47,899 INFO [trainer.py:765] (4/8) Epoch 17, batch 100, train_loss[loss=2.808, ArTop10Accuracy=0.7735, over 14139.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.782, over 4762.26 frames. ], batch size: 62, lr: 7.18e-03
+ 2024-08-06 13:02:19,301 INFO [trainer.py:765] (4/8) Epoch 17, batch 200, train_loss[loss=2.696, ArTop10Accuracy=0.7905, over 13575.00 frames. ], tot_loss[loss=2.731, ArTop10Accuracy=0.783, over 7754.67 frames. ], batch size: 34, lr: 7.17e-03
+ 2024-08-06 13:03:45,516 INFO [trainer.py:765] (4/8) Epoch 17, batch 300, train_loss[loss=2.778, ArTop10Accuracy=0.774, over 14085.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7836, over 9361.00 frames. ], batch size: 44, lr: 7.15e-03
+ 2024-08-06 13:05:21,759 INFO [trainer.py:765] (4/8) Epoch 17, batch 400, train_loss[loss=2.697, ArTop10Accuracy=0.7889, over 10224.00 frames. ], tot_loss[loss=2.729, ArTop10Accuracy=0.7835, over 10286.59 frames. ], batch size: 14, lr: 7.14e-03
+ 2024-08-06 13:06:47,020 INFO [trainer.py:765] (4/8) Epoch 17, batch 500, train_loss[loss=2.703, ArTop10Accuracy=0.7954, over 12390.00 frames. ], tot_loss[loss=2.722, ArTop10Accuracy=0.7849, over 10860.98 frames. ], batch size: 23, lr: 7.12e-03
+ 2024-08-06 13:07:39,878 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.293e+02 1.386e+02 1.488e+02 3.253e+02, threshold=2.772e+02, percent-clipped=0.1
+ 2024-08-06 13:08:22,687 INFO [trainer.py:765] (4/8) Epoch 17, batch 600, train_loss[loss=2.644, ArTop10Accuracy=0.8019, over 11319.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7842, over 11399.75 frames. ], batch size: 18, lr: 7.10e-03
+ 2024-08-06 13:09:54,835 INFO [trainer.py:765] (4/8) Epoch 17, batch 700, train_loss[loss=2.647, ArTop10Accuracy=0.7977, over 9441.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7829, over 11531.09 frames. ], batch size: 11, lr: 7.09e-03
+ 2024-08-06 13:11:19,480 INFO [trainer.py:765] (4/8) Epoch 17, batch 800, train_loss[loss=2.671, ArTop10Accuracy=0.7933, over 9414.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.7824, over 11649.00 frames. ], batch size: 11, lr: 7.07e-03
+ 2024-08-06 13:12:35,669 INFO [trainer.py:765] (4/8) Epoch 17, batch 900, train_loss[loss=2.667, ArTop10Accuracy=0.7941, over 12930.00 frames. ], tot_loss[loss=2.731, ArTop10Accuracy=0.7833, over 11681.78 frames. ], batch size: 27, lr: 7.06e-03
+ 2024-08-06 13:13:53,061 INFO [trainer.py:765] (4/8) Epoch 17, batch 1000, train_loss[loss=2.74, ArTop10Accuracy=0.7801, over 13290.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.7819, over 11875.05 frames. ], batch size: 28, lr: 7.04e-03
+ 2024-08-06 13:15:08,483 INFO [trainer.py:765] (4/8) Epoch 17, batch 1100, train_loss[loss=2.772, ArTop10Accuracy=0.7688, over 13890.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7805, over 11955.85 frames. ], batch size: 34, lr: 7.02e-03
+ 2024-08-06 13:16:22,387 INFO [trainer.py:765] (4/8) Epoch 17, batch 1200, train_loss[loss=2.87, ArTop10Accuracy=0.7565, over 12078.00 frames. ], tot_loss[loss=2.745, ArTop10Accuracy=0.7806, over 11841.53 frames. ], batch size: 101, lr: 7.01e-03
+ 2024-08-06 13:17:21,505 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
+ 2024-08-06 13:19:15,993 INFO [trainer.py:765] (4/8) Epoch 18, batch 100, train_loss[loss=2.768, ArTop10Accuracy=0.7747, over 14724.00 frames. ], tot_loss[loss=2.726, ArTop10Accuracy=0.7841, over 4762.63 frames. ], batch size: 62, lr: 6.78e-03
+ 2024-08-06 13:20:46,601 INFO [trainer.py:765] (4/8) Epoch 18, batch 200, train_loss[loss=2.718, ArTop10Accuracy=0.7839, over 13710.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7852, over 7740.19 frames. ], batch size: 34, lr: 6.77e-03
+ 2024-08-06 13:21:55,104 INFO [trainer.py:803] (4/8) Computing validation loss
+ 2024-08-06 13:22:04,751 INFO [trainer.py:811] (4/8) Epoch 18, validation: loss=2.817, ArTop10Accuracy=0.768, over 1827537.00 frames.
+ 2024-08-06 13:22:04,752 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
+ 2024-08-06 13:22:05,473 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.323e+02 1.409e+02 1.514e+02 3.209e+02, threshold=2.818e+02, percent-clipped=0.1
+ 2024-08-06 13:22:26,580 INFO [trainer.py:765] (4/8) Epoch 18, batch 300, train_loss[loss=2.816, ArTop10Accuracy=0.7642, over 14331.00 frames. ], tot_loss[loss=2.719, ArTop10Accuracy=0.7853, over 9360.96 frames. ], batch size: 45, lr: 6.76e-03
+ 2024-08-06 13:23:57,929 INFO [trainer.py:765] (4/8) Epoch 18, batch 400, train_loss[loss=2.63, ArTop10Accuracy=0.8021, over 10269.00 frames. ], tot_loss[loss=2.719, ArTop10Accuracy=0.7853, over 10295.59 frames. ], batch size: 14, lr: 6.74e-03
+ 2024-08-06 13:25:34,012 INFO [trainer.py:765] (4/8) Epoch 18, batch 500, train_loss[loss=2.756, ArTop10Accuracy=0.7784, over 12132.00 frames. ], tot_loss[loss=2.718, ArTop10Accuracy=0.7855, over 10847.92 frames. ], batch size: 22, lr: 6.73e-03
+ 2024-08-06 13:27:00,633 INFO [trainer.py:765] (4/8) Epoch 18, batch 600, train_loss[loss=2.646, ArTop10Accuracy=0.8065, over 11325.00 frames. ], tot_loss[loss=2.719, ArTop10Accuracy=0.7854, over 11377.58 frames. ], batch size: 18, lr: 6.71e-03
+ 2024-08-06 13:28:33,581 INFO [trainer.py:765] (4/8) Epoch 18, batch 700, train_loss[loss=2.732, ArTop10Accuracy=0.7826, over 10032.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.785, over 11521.88 frames. ], batch size: 12, lr: 6.70e-03
+ 2024-08-06 13:29:54,984 INFO [trainer.py:765] (4/8) Epoch 18, batch 800, train_loss[loss=2.64, ArTop10Accuracy=0.804, over 9444.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7844, over 11624.46 frames. ], batch size: 11, lr: 6.68e-03
+ 2024-08-06 13:31:12,518 INFO [trainer.py:765] (4/8) Epoch 18, batch 900, train_loss[loss=2.733, ArTop10Accuracy=0.7867, over 13254.00 frames. ], tot_loss[loss=2.722, ArTop10Accuracy=0.7851, over 11690.93 frames. ], batch size: 28, lr: 6.67e-03
+ 2024-08-06 13:32:26,550 INFO [trainer.py:765] (4/8) Epoch 18, batch 1000, train_loss[loss=2.754, ArTop10Accuracy=0.7792, over 12873.00 frames. ], tot_loss[loss=2.729, ArTop10Accuracy=0.7838, over 11892.74 frames. ], batch size: 27, lr: 6.66e-03
+ 2024-08-06 13:33:41,496 INFO [trainer.py:765] (4/8) Epoch 18, batch 1100, train_loss[loss=2.731, ArTop10Accuracy=0.7878, over 13854.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7828, over 11966.56 frames. ], batch size: 34, lr: 6.64e-03
+ 2024-08-06 13:34:54,673 INFO [trainer.py:765] (4/8) Epoch 18, batch 1200, train_loss[loss=2.876, ArTop10Accuracy=0.755, over 11688.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7828, over 11879.62 frames. ], batch size: 103, lr: 6.63e-03
+ 2024-08-06 13:35:51,064 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.340e+02 1.433e+02 1.533e+02 2.444e+02, threshold=2.867e+02, percent-clipped=0.0
304
+ 2024-08-06 13:35:54,218 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
305
+ 2024-08-06 13:37:48,623 INFO [trainer.py:765] (4/8) Epoch 19, batch 100, train_loss[loss=2.786, ArTop10Accuracy=0.773, over 14562.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7871, over 4763.34 frames. ], batch size: 62, lr: 6.43e-03
306
+ 2024-08-06 13:39:23,256 INFO [trainer.py:765] (4/8) Epoch 19, batch 200, train_loss[loss=2.706, ArTop10Accuracy=0.782, over 13527.00 frames. ], tot_loss[loss=2.711, ArTop10Accuracy=0.7867, over 7744.85 frames. ], batch size: 34, lr: 6.41e-03
307
+ 2024-08-06 13:40:48,358 INFO [trainer.py:765] (4/8) Epoch 19, batch 300, train_loss[loss=2.735, ArTop10Accuracy=0.7868, over 14472.00 frames. ], tot_loss[loss=2.71, ArTop10Accuracy=0.7871, over 9377.25 frames. ], batch size: 46, lr: 6.40e-03
308
+ 2024-08-06 13:42:21,067 INFO [trainer.py:765] (4/8) Epoch 19, batch 400, train_loss[loss=2.586, ArTop10Accuracy=0.8117, over 10197.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7883, over 10290.26 frames. ], batch size: 14, lr: 6.39e-03
309
+ 2024-08-06 13:43:44,954 INFO [trainer.py:765] (4/8) Epoch 19, batch 500, train_loss[loss=2.667, ArTop10Accuracy=0.7974, over 12102.00 frames. ], tot_loss[loss=2.697, ArTop10Accuracy=0.7896, over 10853.44 frames. ], batch size: 22, lr: 6.37e-03
310
+ 2024-08-06 13:45:16,681 INFO [trainer.py:765] (4/8) Epoch 19, batch 600, train_loss[loss=2.625, ArTop10Accuracy=0.8068, over 11361.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7886, over 11367.34 frames. ], batch size: 18, lr: 6.36e-03
311
+ 2024-08-06 13:46:48,324 INFO [trainer.py:765] (4/8) Epoch 19, batch 700, train_loss[loss=2.687, ArTop10Accuracy=0.7831, over 10386.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7867, over 11508.06 frames. ], batch size: 12, lr: 6.35e-03
312
+ 2024-08-06 13:48:11,883 INFO [trainer.py:765] (4/8) Epoch 19, batch 800, train_loss[loss=2.696, ArTop10Accuracy=0.7907, over 10185.00 frames. ], tot_loss[loss=2.717, ArTop10Accuracy=0.7858, over 11635.75 frames. ], batch size: 12, lr: 6.34e-03
313
+ 2024-08-06 13:49:27,258 INFO [trainer.py:765] (4/8) Epoch 19, batch 900, train_loss[loss=2.68, ArTop10Accuracy=0.7937, over 12957.00 frames. ], tot_loss[loss=2.711, ArTop10Accuracy=0.7868, over 11686.07 frames. ], batch size: 27, lr: 6.32e-03
314
+ 2024-08-06 13:50:40,653 INFO [trainer.py:803] (4/8) Computing validation loss
315
+ 2024-08-06 13:50:50,537 INFO [trainer.py:811] (4/8) Epoch 19, validation: loss=2.818, ArTop10Accuracy=0.7679, over 1827537.00 frames.
316
+ 2024-08-06 13:50:50,537 INFO [trainer.py:814] (4/8) Maximum memory allocated so far is 32729MB
317
+ 2024-08-06 13:50:51,489 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.371e+02 1.455e+02 1.550e+02 3.697e+02, threshold=2.909e+02, percent-clipped=0.2
318
+ 2024-08-06 13:50:52,915 INFO [trainer.py:765] (4/8) Epoch 19, batch 1000, train_loss[loss=2.761, ArTop10Accuracy=0.7747, over 12699.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7853, over 11884.10 frames. ], batch size: 27, lr: 6.31e-03
319
+ 2024-08-06 13:52:08,265 INFO [trainer.py:765] (4/8) Epoch 19, batch 1100, train_loss[loss=2.701, ArTop10Accuracy=0.7904, over 13695.00 frames. ], tot_loss[loss=2.724, ArTop10Accuracy=0.7845, over 11953.71 frames. ], batch size: 34, lr: 6.30e-03
320
+ 2024-08-06 13:53:22,313 INFO [trainer.py:765] (4/8) Epoch 19, batch 1200, train_loss[loss=2.831, ArTop10Accuracy=0.7577, over 12249.00 frames. ], tot_loss[loss=2.726, ArTop10Accuracy=0.7842, over 11861.23 frames. ], batch size: 101, lr: 6.28e-03
321
+ 2024-08-06 13:54:21,708 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
322
+ 2024-08-06 13:56:12,907 INFO [trainer.py:765] (4/8) Epoch 20, batch 100, train_loss[loss=2.789, ArTop10Accuracy=0.7679, over 14760.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7857, over 4756.56 frames. ], batch size: 62, lr: 6.10e-03
323
+ 2024-08-06 13:57:42,497 INFO [trainer.py:765] (4/8) Epoch 20, batch 200, train_loss[loss=2.639, ArTop10Accuracy=0.8007, over 13737.00 frames. ], tot_loss[loss=2.705, ArTop10Accuracy=0.7879, over 7746.44 frames. ], batch size: 34, lr: 6.09e-03
324
+ 2024-08-06 13:59:15,430 INFO [trainer.py:765] (4/8) Epoch 20, batch 300, train_loss[loss=2.762, ArTop10Accuracy=0.7798, over 14253.00 frames. ], tot_loss[loss=2.699, ArTop10Accuracy=0.789, over 9366.73 frames. ], batch size: 45, lr: 6.08e-03
325
+ 2024-08-06 14:00:44,356 INFO [trainer.py:765] (4/8) Epoch 20, batch 400, train_loss[loss=2.555, ArTop10Accuracy=0.8139, over 10905.00 frames. ], tot_loss[loss=2.696, ArTop10Accuracy=0.7895, over 10302.20 frames. ], batch size: 15, lr: 6.07e-03
326
+ 2024-08-06 14:02:14,855 INFO [trainer.py:765] (4/8) Epoch 20, batch 500, train_loss[loss=2.66, ArTop10Accuracy=0.7958, over 12114.00 frames. ], tot_loss[loss=2.692, ArTop10Accuracy=0.7904, over 10858.12 frames. ], batch size: 22, lr: 6.06e-03
327
+ 2024-08-06 14:03:40,856 INFO [trainer.py:765] (4/8) Epoch 20, batch 600, train_loss[loss=2.597, ArTop10Accuracy=0.8091, over 11571.00 frames. ], tot_loss[loss=2.695, ArTop10Accuracy=0.7899, over 11385.90 frames. ], batch size: 18, lr: 6.04e-03
328
+ 2024-08-06 14:05:13,864 INFO [trainer.py:765] (4/8) Epoch 20, batch 700, train_loss[loss=2.717, ArTop10Accuracy=0.7839, over 9984.00 frames. ], tot_loss[loss=2.699, ArTop10Accuracy=0.7892, over 11521.50 frames. ], batch size: 12, lr: 6.03e-03
329
+ 2024-08-06 14:05:30,791 INFO [optim.py:386] (4/8) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.365e+02 1.456e+02 1.550e+02 3.525e+02, threshold=2.913e+02, percent-clipped=0.1
330
+ 2024-08-06 14:06:34,509 INFO [trainer.py:765] (4/8) Epoch 20, batch 800, train_loss[loss=2.721, ArTop10Accuracy=0.7837, over 10083.00 frames. ], tot_loss[loss=2.705, ArTop10Accuracy=0.7881, over 11637.46 frames. ], batch size: 12, lr: 6.02e-03
331
+ 2024-08-06 14:07:50,944 INFO [trainer.py:765] (4/8) Epoch 20, batch 900, train_loss[loss=2.635, ArTop10Accuracy=0.8005, over 12861.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7881, over 11700.37 frames. ], batch size: 27, lr: 6.01e-03
332
+ 2024-08-06 14:09:07,173 INFO [trainer.py:765] (4/8) Epoch 20, batch 1000, train_loss[loss=2.693, ArTop10Accuracy=0.7967, over 12675.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7876, over 11883.15 frames. ], batch size: 27, lr: 6.00e-03
333
+ 2024-08-06 14:10:21,210 INFO [trainer.py:765] (4/8) Epoch 20, batch 1100, train_loss[loss=2.709, ArTop10Accuracy=0.7851, over 13629.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7864, over 11931.12 frames. ], batch size: 34, lr: 5.99e-03
334
+ 2024-08-06 14:11:37,813 INFO [trainer.py:765] (4/8) Epoch 20, batch 1200, train_loss[loss=2.855, ArTop10Accuracy=0.7594, over 11973.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7863, over 11830.00 frames. ], batch size: 105, lr: 5.98e-03
335
+ 2024-08-06 14:12:37,299 INFO [trainer.py:650] (4/8) Reaches end of dataloader.
336
+ 2024-08-06 14:12:37,301 INFO [trainer.py:1069] (4/8) Done!
libritts-r/log/log-train-2024-08-06-08-06-14-5 ADDED
@@ -0,0 +1,336 @@
1
+ 2024-08-06 08:06:14,312 INFO [trainer.py:870] (5/8) Training started
2
+ 2024-08-06 08:06:14,313 INFO [trainer.py:889] (5/8) Device: cuda:5
3
+ 2024-08-06 08:06:14,314 INFO [trainer.py:890] (5/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
4
+ 2024-08-06 08:06:14,314 INFO [trainer.py:892] (5/8) About to create model
5
+ 2024-08-06 08:06:15,008 INFO [trainer.py:899] (5/8) Number of model parameters: 367386628
6
+ 2024-08-06 08:06:16,222 INFO [trainer.py:914] (5/8) Using DDP
7
+ 2024-08-06 08:06:19,151 INFO [datamodule.py:427] (5/8) About to get train cuts
8
+ 2024-08-06 08:06:19,153 INFO [datamodule.py:434] (5/8) About to get dev cuts
9
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:292] (5/8) Disable SpecAugment
10
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:294] (5/8) About to create train dataset
11
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:323] (5/8) Using DynamicBucketingSampler
12
+ 2024-08-06 08:06:19,766 INFO [datamodule.py:344] (5/8) About to create train dataloader
13
+ 2024-08-06 08:06:19,766 INFO [datamodule.py:367] (5/8) About to create dev dataset
14
+ 2024-08-06 08:06:20,091 INFO [datamodule.py:388] (5/8) About to create dev dataloader
15
+ 2024-08-06 08:08:02,120 INFO [trainer.py:765] (5/8) Epoch 1, batch 100, train_loss[loss=4.267, ArTop10Accuracy=0.5104, over 13962.00 frames. ], tot_loss[loss=5.049, ArTop10Accuracy=0.3742, over 4764.60 frames. ], batch size: 62, lr: 2.25e-02
16
+ 2024-08-06 08:09:28,828 INFO [trainer.py:765] (5/8) Epoch 1, batch 200, train_loss[loss=4.009, ArTop10Accuracy=0.5501, over 13701.00 frames. ], tot_loss[loss=4.489, ArTop10Accuracy=0.4683, over 7752.76 frames. ], batch size: 34, lr: 3.00e-02
17
+ 2024-08-06 08:10:52,429 INFO [trainer.py:765] (5/8) Epoch 1, batch 300, train_loss[loss=3.902, ArTop10Accuracy=0.5643, over 14151.00 frames. ], tot_loss[loss=4.21, ArTop10Accuracy=0.5149, over 9369.31 frames. ], batch size: 44, lr: 3.00e-02
18
+ 2024-08-06 08:12:12,699 INFO [trainer.py:765] (5/8) Epoch 1, batch 400, train_loss[loss=3.705, ArTop10Accuracy=0.605, over 10998.00 frames. ], tot_loss[loss=4.023, ArTop10Accuracy=0.5465, over 10273.59 frames. ], batch size: 15, lr: 3.00e-02
19
+ 2024-08-06 08:13:40,050 INFO [trainer.py:765] (5/8) Epoch 1, batch 500, train_loss[loss=3.613, ArTop10Accuracy=0.6219, over 12171.00 frames. ], tot_loss[loss=3.878, ArTop10Accuracy=0.5715, over 10848.61 frames. ], batch size: 22, lr: 2.99e-02
20
+ 2024-08-06 08:15:00,243 INFO [trainer.py:765] (5/8) Epoch 1, batch 600, train_loss[loss=3.56, ArTop10Accuracy=0.6298, over 11346.00 frames. ], tot_loss[loss=3.765, ArTop10Accuracy=0.5916, over 11350.53 frames. ], batch size: 18, lr: 2.99e-02
21
+ 2024-08-06 08:16:26,424 INFO [trainer.py:765] (5/8) Epoch 1, batch 700, train_loss[loss=3.414, ArTop10Accuracy=0.6566, over 10089.00 frames. ], tot_loss[loss=3.691, ArTop10Accuracy=0.6047, over 11494.81 frames. ], batch size: 12, lr: 2.99e-02
22
+ 2024-08-06 08:17:43,017 INFO [trainer.py:765] (5/8) Epoch 1, batch 800, train_loss[loss=3.544, ArTop10Accuracy=0.6297, over 9378.00 frames. ], tot_loss[loss=3.625, ArTop10Accuracy=0.6168, over 11636.05 frames. ], batch size: 11, lr: 2.98e-02
23
+ 2024-08-06 08:18:56,151 INFO [trainer.py:765] (5/8) Epoch 1, batch 900, train_loss[loss=3.446, ArTop10Accuracy=0.649, over 12843.00 frames. ], tot_loss[loss=3.57, ArTop10Accuracy=0.6266, over 11681.46 frames. ], batch size: 27, lr: 2.98e-02
24
+ 2024-08-06 08:20:12,862 INFO [trainer.py:765] (5/8) Epoch 1, batch 1000, train_loss[loss=3.474, ArTop10Accuracy=0.6463, over 12768.00 frames. ], tot_loss[loss=3.531, ArTop10Accuracy=0.6336, over 11878.58 frames. ], batch size: 27, lr: 2.97e-02
25
+ 2024-08-06 08:20:13,539 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 9.300e+01 1.871e+02 2.675e+02 4.030e+02 9.119e+03, threshold=5.351e+02, percent-clipped=0.0
26
+ 2024-08-06 08:21:29,155 INFO [trainer.py:765] (5/8) Epoch 1, batch 1100, train_loss[loss=3.411, ArTop10Accuracy=0.6511, over 13878.00 frames. ], tot_loss[loss=3.494, ArTop10Accuracy=0.6401, over 11961.72 frames. ], batch size: 35, lr: 2.96e-02
27
+ 2024-08-06 08:22:45,411 INFO [trainer.py:765] (5/8) Epoch 1, batch 1200, train_loss[loss=3.503, ArTop10Accuracy=0.6388, over 11898.00 frames. ], tot_loss[loss=3.465, ArTop10Accuracy=0.6456, over 11873.58 frames. ], batch size: 101, lr: 2.96e-02
28
+ 2024-08-06 08:23:45,288 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
29
+ 2024-08-06 08:25:36,237 INFO [trainer.py:765] (5/8) Epoch 2, batch 100, train_loss[loss=3.392, ArTop10Accuracy=0.655, over 14622.00 frames. ], tot_loss[loss=3.424, ArTop10Accuracy=0.652, over 4773.72 frames. ], batch size: 62, lr: 2.90e-02
30
+ 2024-08-06 08:26:58,955 INFO [trainer.py:765] (5/8) Epoch 2, batch 200, train_loss[loss=3.378, ArTop10Accuracy=0.6575, over 13425.00 frames. ], tot_loss[loss=3.386, ArTop10Accuracy=0.6597, over 7764.85 frames. ], batch size: 34, lr: 2.89e-02
31
+ 2024-08-06 08:28:25,534 INFO [trainer.py:765] (5/8) Epoch 2, batch 300, train_loss[loss=3.37, ArTop10Accuracy=0.6646, over 14022.00 frames. ], tot_loss[loss=3.371, ArTop10Accuracy=0.6625, over 9389.25 frames. ], batch size: 44, lr: 2.89e-02
32
+ 2024-08-06 08:29:48,637 INFO [trainer.py:765] (5/8) Epoch 2, batch 400, train_loss[loss=3.421, ArTop10Accuracy=0.6513, over 10383.00 frames. ], tot_loss[loss=3.358, ArTop10Accuracy=0.6654, over 10290.14 frames. ], batch size: 14, lr: 2.88e-02
33
+ 2024-08-06 08:31:22,902 INFO [trainer.py:765] (5/8) Epoch 2, batch 500, train_loss[loss=3.412, ArTop10Accuracy=0.6543, over 12753.00 frames. ], tot_loss[loss=3.343, ArTop10Accuracy=0.6681, over 10849.81 frames. ], batch size: 23, lr: 2.87e-02
34
+ 2024-08-06 08:32:45,687 INFO [trainer.py:765] (5/8) Epoch 2, batch 600, train_loss[loss=3.332, ArTop10Accuracy=0.67, over 11454.00 frames. ], tot_loss[loss=3.331, ArTop10Accuracy=0.6706, over 11349.06 frames. ], batch size: 18, lr: 2.86e-02
35
+ 2024-08-06 08:34:13,583 INFO [trainer.py:765] (5/8) Epoch 2, batch 700, train_loss[loss=3.326, ArTop10Accuracy=0.6678, over 10104.00 frames. ], tot_loss[loss=3.326, ArTop10Accuracy=0.6716, over 11504.73 frames. ], batch size: 12, lr: 2.85e-02
36
+ 2024-08-06 08:34:31,175 INFO [trainer.py:803] (5/8) Computing validation loss
37
+ 2024-08-06 08:34:40,888 INFO [trainer.py:811] (5/8) Epoch 2, validation: loss=3.277, ArTop10Accuracy=0.6803, over 1827537.00 frames.
38
+ 2024-08-06 08:34:40,889 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 28868MB
39
+ 2024-08-06 08:34:41,700 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 1.592e+02 2.200e+02 3.344e+02 2.949e+03, threshold=4.400e+02, percent-clipped=8.6
40
+ 2024-08-06 08:35:39,877 INFO [trainer.py:765] (5/8) Epoch 2, batch 800, train_loss[loss=3.238, ArTop10Accuracy=0.6932, over 9108.00 frames. ], tot_loss[loss=3.319, ArTop10Accuracy=0.6731, over 11622.67 frames. ], batch size: 11, lr: 2.84e-02
41
+ 2024-08-06 08:36:56,372 INFO [trainer.py:765] (5/8) Epoch 2, batch 900, train_loss[loss=3.164, ArTop10Accuracy=0.6972, over 12777.00 frames. ], tot_loss[loss=3.305, ArTop10Accuracy=0.6756, over 11673.76 frames. ], batch size: 27, lr: 2.83e-02
42
+ 2024-08-06 08:38:10,511 INFO [trainer.py:765] (5/8) Epoch 2, batch 1000, train_loss[loss=3.184, ArTop10Accuracy=0.6983, over 13158.00 frames. ], tot_loss[loss=3.299, ArTop10Accuracy=0.6768, over 11869.74 frames. ], batch size: 27, lr: 2.82e-02
43
+ 2024-08-06 08:39:25,059 INFO [trainer.py:765] (5/8) Epoch 2, batch 1100, train_loss[loss=3.256, ArTop10Accuracy=0.6888, over 13548.00 frames. ], tot_loss[loss=3.292, ArTop10Accuracy=0.6782, over 11932.93 frames. ], batch size: 34, lr: 2.81e-02
44
+ 2024-08-06 08:40:38,220 INFO [trainer.py:765] (5/8) Epoch 2, batch 1200, train_loss[loss=3.328, ArTop10Accuracy=0.6618, over 12423.00 frames. ], tot_loss[loss=3.283, ArTop10Accuracy=0.6796, over 11869.40 frames. ], batch size: 101, lr: 2.80e-02
45
+ 2024-08-06 08:41:38,289 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
46
+ 2024-08-06 08:43:36,649 INFO [trainer.py:765] (5/8) Epoch 3, batch 100, train_loss[loss=3.275, ArTop10Accuracy=0.6816, over 14241.00 frames. ], tot_loss[loss=3.244, ArTop10Accuracy=0.6861, over 4767.69 frames. ], batch size: 62, lr: 2.67e-02
47
+ 2024-08-06 08:45:10,500 INFO [trainer.py:765] (5/8) Epoch 3, batch 200, train_loss[loss=3.273, ArTop10Accuracy=0.6772, over 13731.00 frames. ], tot_loss[loss=3.223, ArTop10Accuracy=0.6902, over 7752.86 frames. ], batch size: 34, lr: 2.66e-02
48
+ 2024-08-06 08:46:29,257 INFO [trainer.py:765] (5/8) Epoch 3, batch 300, train_loss[loss=3.183, ArTop10Accuracy=0.6995, over 14106.00 frames. ], tot_loss[loss=3.205, ArTop10Accuracy=0.6939, over 9389.07 frames. ], batch size: 44, lr: 2.64e-02
49
+ 2024-08-06 08:48:04,218 INFO [trainer.py:765] (5/8) Epoch 3, batch 400, train_loss[loss=3.116, ArTop10Accuracy=0.7163, over 10269.00 frames. ], tot_loss[loss=3.19, ArTop10Accuracy=0.6969, over 10295.87 frames. ], batch size: 14, lr: 2.63e-02
50
+ 2024-08-06 08:48:40,881 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 9.282e+01 1.561e+02 1.981e+02 2.686e+02 1.768e+03, threshold=3.962e+02, percent-clipped=7.6
51
+ 2024-08-06 08:49:25,541 INFO [trainer.py:765] (5/8) Epoch 3, batch 500, train_loss[loss=3.086, ArTop10Accuracy=0.7171, over 12081.00 frames. ], tot_loss[loss=3.174, ArTop10Accuracy=0.7001, over 10836.69 frames. ], batch size: 22, lr: 2.62e-02
52
+ 2024-08-06 08:51:00,476 INFO [trainer.py:765] (5/8) Epoch 3, batch 600, train_loss[loss=3.137, ArTop10Accuracy=0.7034, over 11385.00 frames. ], tot_loss[loss=3.153, ArTop10Accuracy=0.7042, over 11343.41 frames. ], batch size: 18, lr: 2.61e-02
53
+ 2024-08-06 08:52:31,617 INFO [trainer.py:765] (5/8) Epoch 3, batch 700, train_loss[loss=3.084, ArTop10Accuracy=0.7191, over 9357.00 frames. ], tot_loss[loss=3.145, ArTop10Accuracy=0.7056, over 11508.14 frames. ], batch size: 11, lr: 2.60e-02
54
+ 2024-08-06 08:53:57,388 INFO [trainer.py:765] (5/8) Epoch 3, batch 800, train_loss[loss=3.123, ArTop10Accuracy=0.7103, over 9462.00 frames. ], tot_loss[loss=3.138, ArTop10Accuracy=0.7073, over 11610.68 frames. ], batch size: 11, lr: 2.59e-02
55
+ 2024-08-06 08:55:15,117 INFO [trainer.py:765] (5/8) Epoch 3, batch 900, train_loss[loss=2.989, ArTop10Accuracy=0.7359, over 12843.00 frames. ], tot_loss[loss=3.12, ArTop10Accuracy=0.7107, over 11648.07 frames. ], batch size: 27, lr: 2.57e-02
56
+ 2024-08-06 08:56:31,557 INFO [trainer.py:765] (5/8) Epoch 3, batch 1000, train_loss[loss=3.13, ArTop10Accuracy=0.7095, over 13044.00 frames. ], tot_loss[loss=3.112, ArTop10Accuracy=0.712, over 11857.18 frames. ], batch size: 27, lr: 2.56e-02
57
+ 2024-08-06 08:57:46,505 INFO [trainer.py:765] (5/8) Epoch 3, batch 1100, train_loss[loss=3.133, ArTop10Accuracy=0.7092, over 13536.00 frames. ], tot_loss[loss=3.108, ArTop10Accuracy=0.7126, over 11942.38 frames. ], batch size: 34, lr: 2.55e-02
58
+ 2024-08-06 08:59:01,399 INFO [trainer.py:765] (5/8) Epoch 3, batch 1200, train_loss[loss=3.122, ArTop10Accuracy=0.7062, over 11571.00 frames. ], tot_loss[loss=3.098, ArTop10Accuracy=0.7146, over 11849.80 frames. ], batch size: 101, lr: 2.54e-02
59
+ 2024-08-06 09:00:01,730 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
60
+ 2024-08-06 09:01:50,741 INFO [trainer.py:765] (5/8) Epoch 4, batch 100, train_loss[loss=3.116, ArTop10Accuracy=0.7056, over 14949.00 frames. ], tot_loss[loss=3.071, ArTop10Accuracy=0.7187, over 4748.08 frames. ], batch size: 62, lr: 2.38e-02
61
+ 2024-08-06 09:02:52,858 INFO [trainer.py:803] (5/8) Computing validation loss
62
+ 2024-08-06 09:03:02,384 INFO [trainer.py:811] (5/8) Epoch 4, validation: loss=2.997, ArTop10Accuracy=0.7338, over 1827537.00 frames.
63
+ 2024-08-06 09:03:02,385 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 29481MB
64
+ 2024-08-06 09:03:03,364 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.499e+02 1.782e+02 2.273e+02 1.100e+03, threshold=3.565e+02, percent-clipped=4.7
65
+ 2024-08-06 09:03:29,273 INFO [trainer.py:765] (5/8) Epoch 4, batch 200, train_loss[loss=2.949, ArTop10Accuracy=0.744, over 13677.00 frames. ], tot_loss[loss=3.048, ArTop10Accuracy=0.7237, over 7770.03 frames. ], batch size: 34, lr: 2.37e-02
66
+ 2024-08-06 09:05:01,732 INFO [trainer.py:765] (5/8) Epoch 4, batch 300, train_loss[loss=3.044, ArTop10Accuracy=0.7282, over 14460.00 frames. ], tot_loss[loss=3.036, ArTop10Accuracy=0.7261, over 9385.26 frames. ], batch size: 45, lr: 2.36e-02
67
+ 2024-08-06 09:06:28,150 INFO [trainer.py:765] (5/8) Epoch 4, batch 400, train_loss[loss=2.864, ArTop10Accuracy=0.7614, over 10161.00 frames. ], tot_loss[loss=3.032, ArTop10Accuracy=0.7271, over 10287.27 frames. ], batch size: 14, lr: 2.34e-02
68
+ 2024-08-06 09:08:01,924 INFO [trainer.py:765] (5/8) Epoch 4, batch 500, train_loss[loss=2.968, ArTop10Accuracy=0.7405, over 12393.00 frames. ], tot_loss[loss=3.022, ArTop10Accuracy=0.729, over 10846.36 frames. ], batch size: 22, lr: 2.33e-02
69
+ 2024-08-06 09:09:28,540 INFO [trainer.py:765] (5/8) Epoch 4, batch 600, train_loss[loss=3.045, ArTop10Accuracy=0.7254, over 11475.00 frames. ], tot_loss[loss=3.019, ArTop10Accuracy=0.7295, over 11367.69 frames. ], batch size: 18, lr: 2.32e-02
70
+ 2024-08-06 09:10:59,865 INFO [trainer.py:765] (5/8) Epoch 4, batch 700, train_loss[loss=2.952, ArTop10Accuracy=0.7418, over 10227.00 frames. ], tot_loss[loss=3.023, ArTop10Accuracy=0.7287, over 11527.79 frames. ], batch size: 12, lr: 2.31e-02
71
+ 2024-08-06 09:12:17,512 INFO [trainer.py:765] (5/8) Epoch 4, batch 800, train_loss[loss=3.026, ArTop10Accuracy=0.7246, over 9501.00 frames. ], tot_loss[loss=3.023, ArTop10Accuracy=0.7287, over 11616.75 frames. ], batch size: 11, lr: 2.30e-02
72
+ 2024-08-06 09:13:33,212 INFO [trainer.py:765] (5/8) Epoch 4, batch 900, train_loss[loss=2.961, ArTop10Accuracy=0.7415, over 12981.00 frames. ], tot_loss[loss=3.014, ArTop10Accuracy=0.7305, over 11675.99 frames. ], batch size: 27, lr: 2.29e-02
73
+ 2024-08-06 09:14:47,519 INFO [trainer.py:765] (5/8) Epoch 4, batch 1000, train_loss[loss=3.037, ArTop10Accuracy=0.7291, over 12837.00 frames. ], tot_loss[loss=3.011, ArTop10Accuracy=0.731, over 11879.91 frames. ], batch size: 27, lr: 2.28e-02
74
+ 2024-08-06 09:16:02,981 INFO [trainer.py:765] (5/8) Epoch 4, batch 1100, train_loss[loss=3.094, ArTop10Accuracy=0.7178, over 13704.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7306, over 11943.31 frames. ], batch size: 34, lr: 2.26e-02
75
+ 2024-08-06 09:16:53,291 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.440e+02 1.636e+02 1.968e+02 7.702e+02, threshold=3.273e+02, percent-clipped=1.3
76
+ 2024-08-06 09:17:18,344 INFO [trainer.py:765] (5/8) Epoch 4, batch 1200, train_loss[loss=3.051, ArTop10Accuracy=0.7233, over 12387.00 frames. ], tot_loss[loss=3.011, ArTop10Accuracy=0.7312, over 11857.01 frames. ], batch size: 101, lr: 2.25e-02
77
+ 2024-08-06 09:18:17,719 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
78
+ 2024-08-06 09:20:17,170 INFO [trainer.py:765] (5/8) Epoch 5, batch 100, train_loss[loss=3.023, ArTop10Accuracy=0.731, over 14667.00 frames. ], tot_loss[loss=2.989, ArTop10Accuracy=0.7344, over 4786.09 frames. ], batch size: 62, lr: 2.10e-02
79
+ 2024-08-06 09:21:52,295 INFO [trainer.py:765] (5/8) Epoch 5, batch 200, train_loss[loss=2.991, ArTop10Accuracy=0.7299, over 13593.00 frames. ], tot_loss[loss=2.981, ArTop10Accuracy=0.736, over 7754.44 frames. ], batch size: 34, lr: 2.09e-02
80
+ 2024-08-06 09:23:19,240 INFO [trainer.py:765] (5/8) Epoch 5, batch 300, train_loss[loss=2.998, ArTop10Accuracy=0.7285, over 14391.00 frames. ], tot_loss[loss=2.972, ArTop10Accuracy=0.7377, over 9395.87 frames. ], batch size: 44, lr: 2.08e-02
81
+ 2024-08-06 09:24:53,536 INFO [trainer.py:765] (5/8) Epoch 5, batch 400, train_loss[loss=2.832, ArTop10Accuracy=0.7682, over 10143.00 frames. ], tot_loss[loss=2.969, ArTop10Accuracy=0.7386, over 10302.88 frames. ], batch size: 14, lr: 2.07e-02
82
+ 2024-08-06 09:26:19,417 INFO [trainer.py:765] (5/8) Epoch 5, batch 500, train_loss[loss=3.012, ArTop10Accuracy=0.7274, over 12156.00 frames. ], tot_loss[loss=2.965, ArTop10Accuracy=0.7393, over 10871.70 frames. ], batch size: 22, lr: 2.06e-02
83
+ 2024-08-06 09:27:49,537 INFO [trainer.py:765] (5/8) Epoch 5, batch 600, train_loss[loss=2.916, ArTop10Accuracy=0.7567, over 11493.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7399, over 11392.96 frames. ], batch size: 18, lr: 2.05e-02
84
+ 2024-08-06 09:29:21,669 INFO [trainer.py:765] (5/8) Epoch 5, batch 700, train_loss[loss=2.98, ArTop10Accuracy=0.7363, over 10116.00 frames. ], tot_loss[loss=2.966, ArTop10Accuracy=0.7392, over 11554.05 frames. ], batch size: 12, lr: 2.04e-02
85
+ 2024-08-06 09:30:44,692 INFO [trainer.py:765] (5/8) Epoch 5, batch 800, train_loss[loss=3.029, ArTop10Accuracy=0.7166, over 10044.00 frames. ], tot_loss[loss=2.967, ArTop10Accuracy=0.7391, over 11668.31 frames. ], batch size: 12, lr: 2.03e-02
86
+ 2024-08-06 09:31:51,238 INFO [trainer.py:803] (5/8) Computing validation loss
87
+ 2024-08-06 09:32:00,762 INFO [trainer.py:811] (5/8) Epoch 5, validation: loss=2.926, ArTop10Accuracy=0.7466, over 1827537.00 frames.
88
+ 2024-08-06 09:32:00,763 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 29481MB
89
+ 2024-08-06 09:32:01,708 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.349e+02 1.525e+02 1.806e+02 1.007e+03, threshold=3.049e+02, percent-clipped=2.3
90
+ 2024-08-06 09:32:10,553 INFO [trainer.py:765] (5/8) Epoch 5, batch 900, train_loss[loss=2.925, ArTop10Accuracy=0.7434, over 12921.00 frames. ], tot_loss[loss=2.959, ArTop10Accuracy=0.7408, over 11719.21 frames. ], batch size: 27, lr: 2.02e-02
91
+ 2024-08-06 09:33:27,323 INFO [trainer.py:765] (5/8) Epoch 5, batch 1000, train_loss[loss=2.894, ArTop10Accuracy=0.7564, over 12846.00 frames. ], tot_loss[loss=2.96, ArTop10Accuracy=0.7406, over 11894.84 frames. ], batch size: 27, lr: 2.01e-02
92
+ 2024-08-06 09:34:42,300 INFO [trainer.py:765] (5/8) Epoch 5, batch 1100, train_loss[loss=2.979, ArTop10Accuracy=0.7327, over 13755.00 frames. ], tot_loss[loss=2.966, ArTop10Accuracy=0.7394, over 11957.99 frames. ], batch size: 34, lr: 2.00e-02
93
+ 2024-08-06 09:35:56,331 INFO [trainer.py:765] (5/8) Epoch 5, batch 1200, train_loss[loss=3.056, ArTop10Accuracy=0.7168, over 12480.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7402, over 11875.87 frames. ], batch size: 101, lr: 1.99e-02
94
+ 2024-08-06 09:36:55,627 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
95
+ 2024-08-06 09:38:52,664 INFO [trainer.py:765] (5/8) Epoch 6, batch 100, train_loss[loss=3.02, ArTop10Accuracy=0.7256, over 14367.00 frames. ], tot_loss[loss=2.952, ArTop10Accuracy=0.7414, over 4758.76 frames. ], batch size: 62, lr: 1.85e-02
96
+ 2024-08-06 09:40:19,833 INFO [trainer.py:765] (5/8) Epoch 6, batch 200, train_loss[loss=2.915, ArTop10Accuracy=0.7538, over 13674.00 frames. ], tot_loss[loss=2.94, ArTop10Accuracy=0.7438, over 7757.35 frames. ], batch size: 34, lr: 1.84e-02
97
+ 2024-08-06 09:41:52,964 INFO [trainer.py:765] (5/8) Epoch 6, batch 300, train_loss[loss=2.919, ArTop10Accuracy=0.7497, over 14718.00 frames. ], tot_loss[loss=2.932, ArTop10Accuracy=0.7457, over 9372.92 frames. ], batch size: 45, lr: 1.83e-02
98
+ 2024-08-06 09:43:17,827 INFO [trainer.py:765] (5/8) Epoch 6, batch 400, train_loss[loss=2.957, ArTop10Accuracy=0.7436, over 10209.00 frames. ], tot_loss[loss=2.93, ArTop10Accuracy=0.7461, over 10291.45 frames. ], batch size: 14, lr: 1.83e-02
99
+ 2024-08-06 09:44:54,128 INFO [trainer.py:765] (5/8) Epoch 6, batch 500, train_loss[loss=2.95, ArTop10Accuracy=0.7403, over 12219.00 frames. ], tot_loss[loss=2.921, ArTop10Accuracy=0.748, over 10847.88 frames. ], batch size: 22, lr: 1.82e-02
100
+ 2024-08-06 09:46:22,872 INFO [trainer.py:765] (5/8) Epoch 6, batch 600, train_loss[loss=2.853, ArTop10Accuracy=0.7617, over 11301.00 frames. ], tot_loss[loss=2.918, ArTop10Accuracy=0.7483, over 11379.44 frames. ], batch size: 18, lr: 1.81e-02
101
+ 2024-08-06 09:46:37,219 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.339e+02 1.480e+02 1.701e+02 7.506e+02, threshold=2.959e+02, percent-clipped=1.1
102
+ 2024-08-06 09:47:57,870 INFO [trainer.py:765] (5/8) Epoch 6, batch 700, train_loss[loss=2.774, ArTop10Accuracy=0.7798, over 10119.00 frames. ], tot_loss[loss=2.924, ArTop10Accuracy=0.7473, over 11527.84 frames. ], batch size: 12, lr: 1.80e-02
103
+ 2024-08-06 09:49:15,954 INFO [trainer.py:765] (5/8) Epoch 6, batch 800, train_loss[loss=2.951, ArTop10Accuracy=0.7472, over 10164.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7466, over 11640.78 frames. ], batch size: 12, lr: 1.79e-02
104
+ 2024-08-06 09:50:32,135 INFO [trainer.py:765] (5/8) Epoch 6, batch 900, train_loss[loss=2.91, ArTop10Accuracy=0.7532, over 12720.00 frames. ], tot_loss[loss=2.921, ArTop10Accuracy=0.7477, over 11692.38 frames. ], batch size: 27, lr: 1.78e-02
105
+ 2024-08-06 09:51:47,298 INFO [trainer.py:765] (5/8) Epoch 6, batch 1000, train_loss[loss=2.88, ArTop10Accuracy=0.7574, over 12849.00 frames. ], tot_loss[loss=2.923, ArTop10Accuracy=0.7472, over 11888.94 frames. ], batch size: 27, lr: 1.77e-02
106
+ 2024-08-06 09:53:00,920 INFO [trainer.py:765] (5/8) Epoch 6, batch 1100, train_loss[loss=2.869, ArTop10Accuracy=0.7568, over 13800.00 frames. ], tot_loss[loss=2.928, ArTop10Accuracy=0.7463, over 11949.08 frames. ], batch size: 34, lr: 1.77e-02
107
+ 2024-08-06 09:54:14,336 INFO [trainer.py:765] (5/8) Epoch 6, batch 1200, train_loss[loss=2.998, ArTop10Accuracy=0.7337, over 12078.00 frames. ], tot_loss[loss=2.926, ArTop10Accuracy=0.7469, over 11859.24 frames. ], batch size: 101, lr: 1.76e-02
108
+ 2024-08-06 09:55:13,309 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
109
+ 2024-08-06 09:57:06,698 INFO [trainer.py:765] (5/8) Epoch 7, batch 100, train_loss[loss=2.986, ArTop10Accuracy=0.7338, over 14781.00 frames. ], tot_loss[loss=2.912, ArTop10Accuracy=0.7488, over 4773.34 frames. ], batch size: 64, lr: 1.64e-02
110
+ 2024-08-06 09:58:39,425 INFO [trainer.py:765] (5/8) Epoch 7, batch 200, train_loss[loss=2.944, ArTop10Accuracy=0.7442, over 13797.00 frames. ], tot_loss[loss=2.904, ArTop10Accuracy=0.7506, over 7754.99 frames. ], batch size: 34, lr: 1.64e-02
+ 2024-08-06 10:00:06,083 INFO [trainer.py:765] (5/8) Epoch 7, batch 300, train_loss[loss=2.971, ArTop10Accuracy=0.7391, over 14244.00 frames. ], tot_loss[loss=2.9, ArTop10Accuracy=0.7514, over 9363.68 frames. ], batch size: 44, lr: 1.63e-02
+ 2024-08-06 10:00:40,510 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 10:00:50,245 INFO [trainer.py:811] (5/8) Epoch 7, validation: loss=2.88, ArTop10Accuracy=0.7554, over 1827537.00 frames.
+ 2024-08-06 10:00:50,246 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 29481MB
+ 2024-08-06 10:00:50,976 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.286e+02 1.429e+02 1.605e+02 1.020e+03, threshold=2.857e+02, percent-clipped=1.5
+ 2024-08-06 10:01:49,117 INFO [trainer.py:765] (5/8) Epoch 7, batch 400, train_loss[loss=2.894, ArTop10Accuracy=0.753, over 10830.00 frames. ], tot_loss[loss=2.894, ArTop10Accuracy=0.7526, over 10274.54 frames. ], batch size: 15, lr: 1.62e-02
+ 2024-08-06 10:03:21,459 INFO [trainer.py:765] (5/8) Epoch 7, batch 500, train_loss[loss=2.883, ArTop10Accuracy=0.7543, over 12246.00 frames. ], tot_loss[loss=2.891, ArTop10Accuracy=0.7534, over 10842.54 frames. ], batch size: 22, lr: 1.61e-02
+ 2024-08-06 10:04:51,882 INFO [trainer.py:765] (5/8) Epoch 7, batch 600, train_loss[loss=2.761, ArTop10Accuracy=0.7776, over 11928.00 frames. ], tot_loss[loss=2.89, ArTop10Accuracy=0.7535, over 11391.82 frames. ], batch size: 19, lr: 1.61e-02
+ 2024-08-06 10:06:25,111 INFO [trainer.py:765] (5/8) Epoch 7, batch 700, train_loss[loss=2.848, ArTop10Accuracy=0.7626, over 10137.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7521, over 11519.69 frames. ], batch size: 12, lr: 1.60e-02
+ 2024-08-06 10:07:46,948 INFO [trainer.py:765] (5/8) Epoch 7, batch 800, train_loss[loss=2.722, ArTop10Accuracy=0.7914, over 10032.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7523, over 11636.71 frames. ], batch size: 12, lr: 1.59e-02
+ 2024-08-06 10:09:02,824 INFO [trainer.py:765] (5/8) Epoch 7, batch 900, train_loss[loss=2.903, ArTop10Accuracy=0.7506, over 13089.00 frames. ], tot_loss[loss=2.891, ArTop10Accuracy=0.7534, over 11693.06 frames. ], batch size: 27, lr: 1.59e-02
+ 2024-08-06 10:10:19,635 INFO [trainer.py:765] (5/8) Epoch 7, batch 1000, train_loss[loss=2.899, ArTop10Accuracy=0.7489, over 12903.00 frames. ], tot_loss[loss=2.895, ArTop10Accuracy=0.7529, over 11887.87 frames. ], batch size: 27, lr: 1.58e-02
+ 2024-08-06 10:11:35,208 INFO [trainer.py:765] (5/8) Epoch 7, batch 1100, train_loss[loss=2.961, ArTop10Accuracy=0.7398, over 13668.00 frames. ], tot_loss[loss=2.899, ArTop10Accuracy=0.7522, over 11947.64 frames. ], batch size: 34, lr: 1.57e-02
+ 2024-08-06 10:12:48,205 INFO [trainer.py:765] (5/8) Epoch 7, batch 1200, train_loss[loss=3.019, ArTop10Accuracy=0.7253, over 12288.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7522, over 11875.89 frames. ], batch size: 101, lr: 1.57e-02
+ 2024-08-06 10:13:46,878 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 10:15:03,601 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.283e+02 1.410e+02 1.601e+02 1.017e+03, threshold=2.820e+02, percent-clipped=0.9
+ 2024-08-06 10:15:40,821 INFO [trainer.py:765] (5/8) Epoch 8, batch 100, train_loss[loss=2.939, ArTop10Accuracy=0.7438, over 14424.00 frames. ], tot_loss[loss=2.88, ArTop10Accuracy=0.7551, over 4781.58 frames. ], batch size: 62, lr: 1.47e-02
+ 2024-08-06 10:17:12,861 INFO [trainer.py:765] (5/8) Epoch 8, batch 200, train_loss[loss=2.922, ArTop10Accuracy=0.7507, over 13815.00 frames. ], tot_loss[loss=2.872, ArTop10Accuracy=0.7568, over 7780.65 frames. ], batch size: 34, lr: 1.46e-02
+ 2024-08-06 10:18:37,898 INFO [trainer.py:765] (5/8) Epoch 8, batch 300, train_loss[loss=2.912, ArTop10Accuracy=0.743, over 14070.00 frames. ], tot_loss[loss=2.868, ArTop10Accuracy=0.7575, over 9380.95 frames. ], batch size: 44, lr: 1.46e-02
+ 2024-08-06 10:20:06,342 INFO [trainer.py:765] (5/8) Epoch 8, batch 400, train_loss[loss=2.768, ArTop10Accuracy=0.7759, over 10383.00 frames. ], tot_loss[loss=2.867, ArTop10Accuracy=0.7579, over 10283.30 frames. ], batch size: 14, lr: 1.45e-02
+ 2024-08-06 10:21:32,411 INFO [trainer.py:765] (5/8) Epoch 8, batch 500, train_loss[loss=2.79, ArTop10Accuracy=0.7696, over 12282.00 frames. ], tot_loss[loss=2.864, ArTop10Accuracy=0.7583, over 10854.83 frames. ], batch size: 22, lr: 1.45e-02
+ 2024-08-06 10:23:00,974 INFO [trainer.py:765] (5/8) Epoch 8, batch 600, train_loss[loss=2.803, ArTop10Accuracy=0.769, over 11862.00 frames. ], tot_loss[loss=2.863, ArTop10Accuracy=0.7586, over 11351.68 frames. ], batch size: 19, lr: 1.44e-02
+ 2024-08-06 10:24:37,787 INFO [trainer.py:765] (5/8) Epoch 8, batch 700, train_loss[loss=2.795, ArTop10Accuracy=0.7727, over 9330.00 frames. ], tot_loss[loss=2.87, ArTop10Accuracy=0.7574, over 11504.70 frames. ], batch size: 11, lr: 1.43e-02
+ 2024-08-06 10:25:56,085 INFO [trainer.py:765] (5/8) Epoch 8, batch 800, train_loss[loss=2.741, ArTop10Accuracy=0.7868, over 10188.00 frames. ], tot_loss[loss=2.875, ArTop10Accuracy=0.7567, over 11641.71 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:27:12,244 INFO [trainer.py:765] (5/8) Epoch 8, batch 900, train_loss[loss=2.89, ArTop10Accuracy=0.7553, over 12810.00 frames. ], tot_loss[loss=2.868, ArTop10Accuracy=0.7581, over 11686.35 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:28:25,263 INFO [trainer.py:765] (5/8) Epoch 8, batch 1000, train_loss[loss=2.892, ArTop10Accuracy=0.7476, over 13041.00 frames. ], tot_loss[loss=2.868, ArTop10Accuracy=0.7578, over 11879.42 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:29:07,155 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 10:29:16,831 INFO [trainer.py:811] (5/8) Epoch 8, validation: loss=2.858, ArTop10Accuracy=0.7594, over 1827537.00 frames.
+ 2024-08-06 10:29:16,831 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 32717MB
+ 2024-08-06 10:29:17,490 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.275e+02 1.390e+02 1.547e+02 3.717e+02, threshold=2.781e+02, percent-clipped=0.7
+ 2024-08-06 10:29:51,730 INFO [trainer.py:765] (5/8) Epoch 8, batch 1100, train_loss[loss=2.859, ArTop10Accuracy=0.7609, over 13695.00 frames. ], tot_loss[loss=2.874, ArTop10Accuracy=0.7563, over 11977.45 frames. ], batch size: 34, lr: 1.41e-02
+ 2024-08-06 10:31:05,945 INFO [trainer.py:765] (5/8) Epoch 8, batch 1200, train_loss[loss=3.031, ArTop10Accuracy=0.7265, over 12033.00 frames. ], tot_loss[loss=2.875, ArTop10Accuracy=0.7565, over 11874.16 frames. ], batch size: 101, lr: 1.40e-02
+ 2024-08-06 10:32:05,333 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 10:34:01,255 INFO [trainer.py:765] (5/8) Epoch 9, batch 100, train_loss[loss=2.908, ArTop10Accuracy=0.7534, over 14391.00 frames. ], tot_loss[loss=2.861, ArTop10Accuracy=0.7583, over 4756.33 frames. ], batch size: 62, lr: 1.32e-02
+ 2024-08-06 10:35:31,771 INFO [trainer.py:765] (5/8) Epoch 9, batch 200, train_loss[loss=2.846, ArTop10Accuracy=0.761, over 13746.00 frames. ], tot_loss[loss=2.855, ArTop10Accuracy=0.7598, over 7743.84 frames. ], batch size: 34, lr: 1.32e-02
+ 2024-08-06 10:36:57,927 INFO [trainer.py:765] (5/8) Epoch 9, batch 300, train_loss[loss=2.858, ArTop10Accuracy=0.7584, over 14196.00 frames. ], tot_loss[loss=2.85, ArTop10Accuracy=0.7611, over 9395.83 frames. ], batch size: 44, lr: 1.31e-02
+ 2024-08-06 10:38:32,696 INFO [trainer.py:765] (5/8) Epoch 9, batch 400, train_loss[loss=2.8, ArTop10Accuracy=0.7774, over 10344.00 frames. ], tot_loss[loss=2.846, ArTop10Accuracy=0.7622, over 10296.88 frames. ], batch size: 14, lr: 1.31e-02
+ 2024-08-06 10:39:59,255 INFO [trainer.py:765] (5/8) Epoch 9, batch 500, train_loss[loss=2.821, ArTop10Accuracy=0.7724, over 12003.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.7627, over 10841.83 frames. ], batch size: 22, lr: 1.30e-02
+ 2024-08-06 10:41:29,689 INFO [trainer.py:765] (5/8) Epoch 9, batch 600, train_loss[loss=2.779, ArTop10Accuracy=0.7783, over 11358.00 frames. ], tot_loss[loss=2.844, ArTop10Accuracy=0.7625, over 11341.87 frames. ], batch size: 18, lr: 1.30e-02
+ 2024-08-06 10:42:58,440 INFO [trainer.py:765] (5/8) Epoch 9, batch 700, train_loss[loss=2.838, ArTop10Accuracy=0.7582, over 9384.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7613, over 11516.65 frames. ], batch size: 11, lr: 1.29e-02
+ 2024-08-06 10:44:02,952 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.253e+02 1.352e+02 1.493e+02 7.010e+02, threshold=2.704e+02, percent-clipped=0.6
+ 2024-08-06 10:44:19,668 INFO [trainer.py:765] (5/8) Epoch 9, batch 800, train_loss[loss=2.791, ArTop10Accuracy=0.765, over 9273.00 frames. ], tot_loss[loss=2.853, ArTop10Accuracy=0.7604, over 11630.55 frames. ], batch size: 11, lr: 1.29e-02
+ 2024-08-06 10:45:35,718 INFO [trainer.py:765] (5/8) Epoch 9, batch 900, train_loss[loss=2.822, ArTop10Accuracy=0.7728, over 13212.00 frames. ], tot_loss[loss=2.848, ArTop10Accuracy=0.7615, over 11669.97 frames. ], batch size: 28, lr: 1.28e-02
+ 2024-08-06 10:46:51,270 INFO [trainer.py:765] (5/8) Epoch 9, batch 1000, train_loss[loss=2.769, ArTop10Accuracy=0.7772, over 12984.00 frames. ], tot_loss[loss=2.851, ArTop10Accuracy=0.7609, over 11902.65 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:48:06,247 INFO [trainer.py:765] (5/8) Epoch 9, batch 1100, train_loss[loss=2.867, ArTop10Accuracy=0.7579, over 13776.00 frames. ], tot_loss[loss=2.857, ArTop10Accuracy=0.7596, over 11970.09 frames. ], batch size: 34, lr: 1.28e-02
+ 2024-08-06 10:49:21,053 INFO [trainer.py:765] (5/8) Epoch 9, batch 1200, train_loss[loss=2.949, ArTop10Accuracy=0.7393, over 12036.00 frames. ], tot_loss[loss=2.858, ArTop10Accuracy=0.7596, over 11856.75 frames. ], batch size: 101, lr: 1.27e-02
+ 2024-08-06 10:50:22,407 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 10:52:12,325 INFO [trainer.py:765] (5/8) Epoch 10, batch 100, train_loss[loss=2.852, ArTop10Accuracy=0.758, over 14463.00 frames. ], tot_loss[loss=2.84, ArTop10Accuracy=0.7628, over 4742.65 frames. ], batch size: 62, lr: 1.20e-02
+ 2024-08-06 10:53:44,585 INFO [trainer.py:765] (5/8) Epoch 10, batch 200, train_loss[loss=2.813, ArTop10Accuracy=0.7699, over 13758.00 frames. ], tot_loss[loss=2.831, ArTop10Accuracy=0.7645, over 7757.02 frames. ], batch size: 34, lr: 1.20e-02
+ 2024-08-06 10:55:08,089 INFO [trainer.py:765] (5/8) Epoch 10, batch 300, train_loss[loss=2.914, ArTop10Accuracy=0.7479, over 14004.00 frames. ], tot_loss[loss=2.827, ArTop10Accuracy=0.7654, over 9373.26 frames. ], batch size: 44, lr: 1.19e-02
+ 2024-08-06 10:56:41,176 INFO [trainer.py:765] (5/8) Epoch 10, batch 400, train_loss[loss=2.773, ArTop10Accuracy=0.7801, over 10770.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.7663, over 10273.04 frames. ], batch size: 15, lr: 1.19e-02
+ 2024-08-06 10:58:04,937 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 10:58:14,559 INFO [trainer.py:811] (5/8) Epoch 10, validation: loss=2.842, ArTop10Accuracy=0.7624, over 1827537.00 frames.
+ 2024-08-06 10:58:14,560 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 32720MB
+ 2024-08-06 10:58:15,573 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.228e+02 1.320e+02 1.458e+02 6.096e+02, threshold=2.641e+02, percent-clipped=0.6
+ 2024-08-06 10:58:15,577 INFO [trainer.py:765] (5/8) Epoch 10, batch 500, train_loss[loss=2.785, ArTop10Accuracy=0.773, over 11997.00 frames. ], tot_loss[loss=2.816, ArTop10Accuracy=0.7675, over 10852.04 frames. ], batch size: 22, lr: 1.19e-02
+ 2024-08-06 10:59:42,814 INFO [trainer.py:765] (5/8) Epoch 10, batch 600, train_loss[loss=2.791, ArTop10Accuracy=0.7766, over 11439.00 frames. ], tot_loss[loss=2.82, ArTop10Accuracy=0.7666, over 11373.96 frames. ], batch size: 18, lr: 1.18e-02
+ 2024-08-06 11:01:18,107 INFO [trainer.py:765] (5/8) Epoch 10, batch 700, train_loss[loss=2.726, ArTop10Accuracy=0.79, over 9531.00 frames. ], tot_loss[loss=2.824, ArTop10Accuracy=0.7659, over 11496.29 frames. ], batch size: 11, lr: 1.18e-02
+ 2024-08-06 11:02:36,917 INFO [trainer.py:765] (5/8) Epoch 10, batch 800, train_loss[loss=2.834, ArTop10Accuracy=0.7639, over 9489.00 frames. ], tot_loss[loss=2.83, ArTop10Accuracy=0.7648, over 11633.51 frames. ], batch size: 11, lr: 1.17e-02
+ 2024-08-06 11:03:51,212 INFO [trainer.py:765] (5/8) Epoch 10, batch 900, train_loss[loss=2.879, ArTop10Accuracy=0.7498, over 12795.00 frames. ], tot_loss[loss=2.825, ArTop10Accuracy=0.7655, over 11693.93 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:05:06,351 INFO [trainer.py:765] (5/8) Epoch 10, batch 1000, train_loss[loss=2.797, ArTop10Accuracy=0.7712, over 12963.00 frames. ], tot_loss[loss=2.826, ArTop10Accuracy=0.7653, over 11888.02 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:06:21,721 INFO [trainer.py:765] (5/8) Epoch 10, batch 1100, train_loss[loss=2.854, ArTop10Accuracy=0.7611, over 13578.00 frames. ], tot_loss[loss=2.833, ArTop10Accuracy=0.7641, over 11957.09 frames. ], batch size: 34, lr: 1.16e-02
+ 2024-08-06 11:07:34,771 INFO [trainer.py:765] (5/8) Epoch 10, batch 1200, train_loss[loss=2.927, ArTop10Accuracy=0.7442, over 12105.00 frames. ], tot_loss[loss=2.835, ArTop10Accuracy=0.7636, over 11866.59 frames. ], batch size: 101, lr: 1.16e-02
+ 2024-08-06 11:08:33,901 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 11:10:29,955 INFO [trainer.py:765] (5/8) Epoch 11, batch 100, train_loss[loss=2.92, ArTop10Accuracy=0.7496, over 14313.00 frames. ], tot_loss[loss=2.817, ArTop10Accuracy=0.7665, over 4766.81 frames. ], batch size: 62, lr: 1.10e-02
+ 2024-08-06 11:12:04,674 INFO [trainer.py:765] (5/8) Epoch 11, batch 200, train_loss[loss=2.854, ArTop10Accuracy=0.7574, over 13833.00 frames. ], tot_loss[loss=2.813, ArTop10Accuracy=0.7673, over 7738.81 frames. ], batch size: 35, lr: 1.10e-02
+ 2024-08-06 11:12:22,826 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 9.884e+01 1.240e+02 1.333e+02 1.457e+02 6.939e+02, threshold=2.667e+02, percent-clipped=0.1
+ 2024-08-06 11:13:31,549 INFO [trainer.py:765] (5/8) Epoch 11, batch 300, train_loss[loss=2.892, ArTop10Accuracy=0.757, over 14262.00 frames. ], tot_loss[loss=2.809, ArTop10Accuracy=0.7685, over 9368.06 frames. ], batch size: 44, lr: 1.09e-02
+ 2024-08-06 11:15:03,269 INFO [trainer.py:765] (5/8) Epoch 11, batch 400, train_loss[loss=2.728, ArTop10Accuracy=0.7802, over 10251.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7695, over 10282.75 frames. ], batch size: 14, lr: 1.09e-02
+ 2024-08-06 11:16:29,637 INFO [trainer.py:765] (5/8) Epoch 11, batch 500, train_loss[loss=2.883, ArTop10Accuracy=0.752, over 12291.00 frames. ], tot_loss[loss=2.802, ArTop10Accuracy=0.7702, over 10837.26 frames. ], batch size: 22, lr: 1.09e-02
+ 2024-08-06 11:18:00,517 INFO [trainer.py:765] (5/8) Epoch 11, batch 600, train_loss[loss=2.792, ArTop10Accuracy=0.7794, over 11952.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7697, over 11348.71 frames. ], batch size: 19, lr: 1.08e-02
+ 2024-08-06 11:19:34,514 INFO [trainer.py:765] (5/8) Epoch 11, batch 700, train_loss[loss=2.643, ArTop10Accuracy=0.8037, over 10194.00 frames. ], tot_loss[loss=2.808, ArTop10Accuracy=0.7689, over 11504.54 frames. ], batch size: 12, lr: 1.08e-02
+ 2024-08-06 11:20:55,484 INFO [trainer.py:765] (5/8) Epoch 11, batch 800, train_loss[loss=2.71, ArTop10Accuracy=0.797, over 9279.00 frames. ], tot_loss[loss=2.814, ArTop10Accuracy=0.7677, over 11622.96 frames. ], batch size: 11, lr: 1.07e-02
+ 2024-08-06 11:22:13,706 INFO [trainer.py:765] (5/8) Epoch 11, batch 900, train_loss[loss=2.818, ArTop10Accuracy=0.7658, over 13197.00 frames. ], tot_loss[loss=2.807, ArTop10Accuracy=0.7691, over 11678.82 frames. ], batch size: 28, lr: 1.07e-02
+ 2024-08-06 11:23:31,799 INFO [trainer.py:765] (5/8) Epoch 11, batch 1000, train_loss[loss=2.855, ArTop10Accuracy=0.7635, over 12798.00 frames. ], tot_loss[loss=2.812, ArTop10Accuracy=0.7684, over 11876.21 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:24:46,902 INFO [trainer.py:765] (5/8) Epoch 11, batch 1100, train_loss[loss=2.822, ArTop10Accuracy=0.77, over 13593.00 frames. ], tot_loss[loss=2.817, ArTop10Accuracy=0.7673, over 11945.29 frames. ], batch size: 34, lr: 1.06e-02
+ 2024-08-06 11:26:00,733 INFO [trainer.py:765] (5/8) Epoch 11, batch 1200, train_loss[loss=2.923, ArTop10Accuracy=0.7414, over 12033.00 frames. ], tot_loss[loss=2.822, ArTop10Accuracy=0.7665, over 11870.12 frames. ], batch size: 101, lr: 1.06e-02
+ 2024-08-06 11:26:15,848 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 11:26:25,556 INFO [trainer.py:811] (5/8) Epoch 11, validation: loss=2.831, ArTop10Accuracy=0.7643, over 1827537.00 frames.
+ 2024-08-06 11:26:25,557 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 33004MB
+ 2024-08-06 11:26:26,186 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.251e+02 1.335e+02 1.441e+02 2.942e+02, threshold=2.669e+02, percent-clipped=0.1
+ 2024-08-06 11:27:09,681 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 11:29:03,451 INFO [trainer.py:765] (5/8) Epoch 12, batch 100, train_loss[loss=2.86, ArTop10Accuracy=0.7608, over 14634.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7687, over 4753.89 frames. ], batch size: 63, lr: 1.01e-02
+ 2024-08-06 11:30:30,674 INFO [trainer.py:765] (5/8) Epoch 12, batch 200, train_loss[loss=2.759, ArTop10Accuracy=0.7796, over 13662.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7702, over 7750.57 frames. ], batch size: 34, lr: 1.01e-02
+ 2024-08-06 11:31:57,655 INFO [trainer.py:765] (5/8) Epoch 12, batch 300, train_loss[loss=2.788, ArTop10Accuracy=0.7695, over 14505.00 frames. ], tot_loss[loss=2.792, ArTop10Accuracy=0.772, over 9362.26 frames. ], batch size: 44, lr: 1.01e-02
+ 2024-08-06 11:33:30,737 INFO [trainer.py:765] (5/8) Epoch 12, batch 400, train_loss[loss=2.694, ArTop10Accuracy=0.7914, over 10332.00 frames. ], tot_loss[loss=2.791, ArTop10Accuracy=0.7723, over 10271.04 frames. ], batch size: 14, lr: 1.00e-02
+ 2024-08-06 11:34:55,733 INFO [trainer.py:765] (5/8) Epoch 12, batch 500, train_loss[loss=2.766, ArTop10Accuracy=0.7771, over 12225.00 frames. ], tot_loss[loss=2.786, ArTop10Accuracy=0.7732, over 10823.88 frames. ], batch size: 22, lr: 1.00e-02
+ 2024-08-06 11:36:29,361 INFO [trainer.py:765] (5/8) Epoch 12, batch 600, train_loss[loss=2.677, ArTop10Accuracy=0.792, over 11376.00 frames. ], tot_loss[loss=2.792, ArTop10Accuracy=0.7719, over 11337.58 frames. ], batch size: 18, lr: 9.97e-03
+ 2024-08-06 11:38:00,343 INFO [trainer.py:765] (5/8) Epoch 12, batch 700, train_loss[loss=2.785, ArTop10Accuracy=0.7774, over 9279.00 frames. ], tot_loss[loss=2.796, ArTop10Accuracy=0.7713, over 11485.09 frames. ], batch size: 11, lr: 9.93e-03
+ 2024-08-06 11:39:23,610 INFO [trainer.py:765] (5/8) Epoch 12, batch 800, train_loss[loss=2.656, ArTop10Accuracy=0.7993, over 10065.00 frames. ], tot_loss[loss=2.801, ArTop10Accuracy=0.7704, over 11623.21 frames. ], batch size: 12, lr: 9.90e-03
+ 2024-08-06 11:40:39,889 INFO [trainer.py:765] (5/8) Epoch 12, batch 900, train_loss[loss=2.795, ArTop10Accuracy=0.771, over 12933.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7712, over 11693.14 frames. ], batch size: 27, lr: 9.87e-03
+ 2024-08-06 11:41:13,995 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.248e+02 1.348e+02 1.459e+02 5.540e+02, threshold=2.695e+02, percent-clipped=0.3
+ 2024-08-06 11:41:56,189 INFO [trainer.py:765] (5/8) Epoch 12, batch 1000, train_loss[loss=2.816, ArTop10Accuracy=0.7683, over 12939.00 frames. ], tot_loss[loss=2.8, ArTop10Accuracy=0.7707, over 11895.98 frames. ], batch size: 27, lr: 9.85e-03
+ 2024-08-06 11:43:14,320 INFO [trainer.py:765] (5/8) Epoch 12, batch 1100, train_loss[loss=2.788, ArTop10Accuracy=0.7715, over 13668.00 frames. ], tot_loss[loss=2.803, ArTop10Accuracy=0.7701, over 11955.74 frames. ], batch size: 34, lr: 9.82e-03
+ 2024-08-06 11:44:26,155 INFO [trainer.py:765] (5/8) Epoch 12, batch 1200, train_loss[loss=2.949, ArTop10Accuracy=0.7443, over 12600.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7698, over 11871.36 frames. ], batch size: 101, lr: 9.79e-03
+ 2024-08-06 11:45:26,265 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 11:47:26,600 INFO [trainer.py:765] (5/8) Epoch 13, batch 100, train_loss[loss=2.828, ArTop10Accuracy=0.7665, over 14454.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7707, over 4769.49 frames. ], batch size: 62, lr: 9.37e-03
+ 2024-08-06 11:48:54,779 INFO [trainer.py:765] (5/8) Epoch 13, batch 200, train_loss[loss=2.763, ArTop10Accuracy=0.7791, over 13644.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.773, over 7764.91 frames. ], batch size: 34, lr: 9.34e-03
+ 2024-08-06 11:50:20,516 INFO [trainer.py:765] (5/8) Epoch 13, batch 300, train_loss[loss=2.849, ArTop10Accuracy=0.7621, over 14187.00 frames. ], tot_loss[loss=2.778, ArTop10Accuracy=0.7741, over 9372.51 frames. ], batch size: 45, lr: 9.31e-03
+ 2024-08-06 11:51:48,765 INFO [trainer.py:765] (5/8) Epoch 13, batch 400, train_loss[loss=2.674, ArTop10Accuracy=0.7952, over 10284.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7745, over 10272.45 frames. ], batch size: 14, lr: 9.28e-03
+ 2024-08-06 11:53:13,407 INFO [trainer.py:765] (5/8) Epoch 13, batch 500, train_loss[loss=2.734, ArTop10Accuracy=0.7823, over 12261.00 frames. ], tot_loss[loss=2.77, ArTop10Accuracy=0.7759, over 10847.65 frames. ], batch size: 22, lr: 9.26e-03
+ 2024-08-06 11:54:52,223 INFO [trainer.py:765] (5/8) Epoch 13, batch 600, train_loss[loss=2.73, ArTop10Accuracy=0.7878, over 11385.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7746, over 11374.02 frames. ], batch size: 18, lr: 9.23e-03
+ 2024-08-06 11:55:47,080 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 11:55:56,835 INFO [trainer.py:811] (5/8) Epoch 13, validation: loss=2.824, ArTop10Accuracy=0.7662, over 1827537.00 frames.
+ 2024-08-06 11:55:56,835 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 33004MB
+ 2024-08-06 11:55:57,712 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.343e+02 1.452e+02 4.888e+02, threshold=2.687e+02, percent-clipped=0.1
+ 2024-08-06 11:56:28,465 INFO [trainer.py:765] (5/8) Epoch 13, batch 700, train_loss[loss=2.767, ArTop10Accuracy=0.7719, over 10383.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7743, over 11511.62 frames. ], batch size: 12, lr: 9.20e-03
+ 2024-08-06 11:57:46,684 INFO [trainer.py:765] (5/8) Epoch 13, batch 800, train_loss[loss=2.706, ArTop10Accuracy=0.789, over 10110.00 frames. ], tot_loss[loss=2.783, ArTop10Accuracy=0.7736, over 11648.41 frames. ], batch size: 12, lr: 9.18e-03
+ 2024-08-06 11:59:03,287 INFO [trainer.py:765] (5/8) Epoch 13, batch 900, train_loss[loss=2.762, ArTop10Accuracy=0.7781, over 12837.00 frames. ], tot_loss[loss=2.78, ArTop10Accuracy=0.7742, over 11698.79 frames. ], batch size: 27, lr: 9.15e-03
+ 2024-08-06 12:00:19,174 INFO [trainer.py:765] (5/8) Epoch 13, batch 1000, train_loss[loss=2.702, ArTop10Accuracy=0.7918, over 13026.00 frames. ], tot_loss[loss=2.783, ArTop10Accuracy=0.774, over 11892.62 frames. ], batch size: 27, lr: 9.13e-03
+ 2024-08-06 12:01:34,881 INFO [trainer.py:765] (5/8) Epoch 13, batch 1100, train_loss[loss=2.782, ArTop10Accuracy=0.7729, over 13821.00 frames. ], tot_loss[loss=2.791, ArTop10Accuracy=0.7723, over 11969.07 frames. ], batch size: 34, lr: 9.10e-03
+ 2024-08-06 12:02:48,663 INFO [trainer.py:765] (5/8) Epoch 13, batch 1200, train_loss[loss=2.887, ArTop10Accuracy=0.7551, over 12504.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7727, over 11873.91 frames. ], batch size: 101, lr: 9.08e-03
+ 2024-08-06 12:03:48,484 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 12:05:45,334 INFO [trainer.py:765] (5/8) Epoch 14, batch 100, train_loss[loss=2.834, ArTop10Accuracy=0.7626, over 14121.00 frames. ], tot_loss[loss=2.778, ArTop10Accuracy=0.7746, over 4755.27 frames. ], batch size: 62, lr: 8.71e-03
+ 2024-08-06 12:07:16,604 INFO [trainer.py:765] (5/8) Epoch 14, batch 200, train_loss[loss=2.797, ArTop10Accuracy=0.7675, over 13734.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7765, over 7756.67 frames. ], batch size: 34, lr: 8.69e-03
+ 2024-08-06 12:08:44,311 INFO [trainer.py:765] (5/8) Epoch 14, batch 300, train_loss[loss=2.841, ArTop10Accuracy=0.76, over 14163.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7764, over 9360.38 frames. ], batch size: 44, lr: 8.66e-03
+ 2024-08-06 12:10:01,130 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.266e+02 1.374e+02 1.483e+02 6.480e+02, threshold=2.748e+02, percent-clipped=0.2
+ 2024-08-06 12:10:10,226 INFO [trainer.py:765] (5/8) Epoch 14, batch 400, train_loss[loss=2.808, ArTop10Accuracy=0.7624, over 10140.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7767, over 10281.93 frames. ], batch size: 14, lr: 8.64e-03
+ 2024-08-06 12:11:36,150 INFO [trainer.py:765] (5/8) Epoch 14, batch 500, train_loss[loss=2.77, ArTop10Accuracy=0.7749, over 12003.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7782, over 10833.99 frames. ], batch size: 22, lr: 8.62e-03
+ 2024-08-06 12:13:05,993 INFO [trainer.py:765] (5/8) Epoch 14, batch 600, train_loss[loss=2.66, ArTop10Accuracy=0.7942, over 11427.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.7773, over 11372.99 frames. ], batch size: 18, lr: 8.59e-03
+ 2024-08-06 12:14:38,552 INFO [trainer.py:765] (5/8) Epoch 14, batch 700, train_loss[loss=2.681, ArTop10Accuracy=0.7937, over 10236.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7763, over 11512.30 frames. ], batch size: 12, lr: 8.57e-03
+ 2024-08-06 12:15:58,069 INFO [trainer.py:765] (5/8) Epoch 14, batch 800, train_loss[loss=2.677, ArTop10Accuracy=0.7916, over 9333.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7759, over 11636.50 frames. ], batch size: 11, lr: 8.55e-03
+ 2024-08-06 12:17:12,866 INFO [trainer.py:765] (5/8) Epoch 14, batch 900, train_loss[loss=2.798, ArTop10Accuracy=0.7775, over 12933.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.7774, over 11669.10 frames. ], batch size: 27, lr: 8.52e-03
+ 2024-08-06 12:18:29,615 INFO [trainer.py:765] (5/8) Epoch 14, batch 1000, train_loss[loss=2.859, ArTop10Accuracy=0.7621, over 13071.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7765, over 11877.89 frames. ], batch size: 27, lr: 8.50e-03
+ 2024-08-06 12:19:45,377 INFO [trainer.py:765] (5/8) Epoch 14, batch 1100, train_loss[loss=2.76, ArTop10Accuracy=0.7808, over 13647.00 frames. ], tot_loss[loss=2.778, ArTop10Accuracy=0.7747, over 11945.73 frames. ], batch size: 34, lr: 8.48e-03
+ 2024-08-06 12:20:59,279 INFO [trainer.py:765] (5/8) Epoch 14, batch 1200, train_loss[loss=2.936, ArTop10Accuracy=0.7421, over 11955.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7744, over 11860.24 frames. ], batch size: 101, lr: 8.46e-03
+ 2024-08-06 12:21:57,889 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 12:23:51,962 INFO [trainer.py:765] (5/8) Epoch 15, batch 100, train_loss[loss=2.862, ArTop10Accuracy=0.7631, over 14766.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7762, over 4772.35 frames. ], batch size: 63, lr: 8.14e-03
+ 2024-08-06 12:24:00,598 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 12:24:10,290 INFO [trainer.py:811] (5/8) Epoch 15, validation: loss=2.819, ArTop10Accuracy=0.7675, over 1827537.00 frames.
+ 2024-08-06 12:24:10,291 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 33004MB
+ 2024-08-06 12:24:11,094 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.284e+02 1.371e+02 1.488e+02 4.667e+02, threshold=2.743e+02, percent-clipped=0.2
+ 2024-08-06 12:25:29,989 INFO [trainer.py:765] (5/8) Epoch 15, batch 200, train_loss[loss=2.767, ArTop10Accuracy=0.7776, over 13629.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7778, over 7753.42 frames. ], batch size: 34, lr: 8.12e-03
+ 2024-08-06 12:26:58,695 INFO [trainer.py:765] (5/8) Epoch 15, batch 300, train_loss[loss=2.774, ArTop10Accuracy=0.7769, over 13872.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.7796, over 9374.66 frames. ], batch size: 44, lr: 8.09e-03
+ 2024-08-06 12:28:28,535 INFO [trainer.py:765] (5/8) Epoch 15, batch 400, train_loss[loss=2.682, ArTop10Accuracy=0.7932, over 10188.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7801, over 10289.44 frames. ], batch size: 14, lr: 8.07e-03
+ 2024-08-06 12:29:54,031 INFO [trainer.py:765] (5/8) Epoch 15, batch 500, train_loss[loss=2.693, ArTop10Accuracy=0.7891, over 12228.00 frames. ], tot_loss[loss=2.747, ArTop10Accuracy=0.7803, over 10854.86 frames. ], batch size: 22, lr: 8.05e-03
+ 2024-08-06 12:31:23,293 INFO [trainer.py:765] (5/8) Epoch 15, batch 600, train_loss[loss=2.795, ArTop10Accuracy=0.7702, over 11559.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7795, over 11372.03 frames. ], batch size: 18, lr: 8.03e-03
+ 2024-08-06 12:32:53,176 INFO [trainer.py:765] (5/8) Epoch 15, batch 700, train_loss[loss=2.946, ArTop10Accuracy=0.7441, over 9426.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7785, over 11509.96 frames. ], batch size: 11, lr: 8.01e-03
+ 2024-08-06 12:34:18,254 INFO [trainer.py:765] (5/8) Epoch 15, batch 800, train_loss[loss=2.731, ArTop10Accuracy=0.7853, over 10077.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7778, over 11655.46 frames. ], batch size: 12, lr: 7.99e-03
+ 2024-08-06 12:35:34,726 INFO [trainer.py:765] (5/8) Epoch 15, batch 900, train_loss[loss=2.748, ArTop10Accuracy=0.7775, over 12918.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7782, over 11703.63 frames. ], batch size: 27, lr: 7.97e-03
+ 2024-08-06 12:36:50,540 INFO [trainer.py:765] (5/8) Epoch 15, batch 1000, train_loss[loss=2.679, ArTop10Accuracy=0.7947, over 12975.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7781, over 11898.75 frames. ], batch size: 27, lr: 7.95e-03
+ 2024-08-06 12:38:05,179 INFO [trainer.py:765] (5/8) Epoch 15, batch 1100, train_loss[loss=2.721, ArTop10Accuracy=0.7841, over 13785.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.7767, over 11973.07 frames. ], batch size: 34, lr: 7.93e-03
+ 2024-08-06 12:38:12,841 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.293e+02 1.379e+02 1.467e+02 2.824e+02, threshold=2.759e+02, percent-clipped=0.1
+ 2024-08-06 12:39:18,789 INFO [trainer.py:765] (5/8) Epoch 15, batch 1200, train_loss[loss=2.935, ArTop10Accuracy=0.7409, over 12156.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7763, over 11863.35 frames. ], batch size: 101, lr: 7.91e-03
+ 2024-08-06 12:40:18,769 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 12:42:17,619 INFO [trainer.py:765] (5/8) Epoch 16, batch 100, train_loss[loss=2.761, ArTop10Accuracy=0.7795, over 14676.00 frames. ], tot_loss[loss=2.752, ArTop10Accuracy=0.779, over 4775.40 frames. ], batch size: 62, lr: 7.63e-03
+ 2024-08-06 12:43:49,563 INFO [trainer.py:765] (5/8) Epoch 16, batch 200, train_loss[loss=2.658, ArTop10Accuracy=0.7943, over 13419.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7803, over 7764.60 frames. ], batch size: 34, lr: 7.61e-03
+ 2024-08-06 12:45:18,502 INFO [trainer.py:765] (5/8) Epoch 16, batch 300, train_loss[loss=2.827, ArTop10Accuracy=0.7627, over 14460.00 frames. ], tot_loss[loss=2.742, ArTop10Accuracy=0.7812, over 9390.93 frames. ], batch size: 44, lr: 7.59e-03
+ 2024-08-06 12:46:45,209 INFO [trainer.py:765] (5/8) Epoch 16, batch 400, train_loss[loss=2.647, ArTop10Accuracy=0.8053, over 10230.00 frames. ], tot_loss[loss=2.739, ArTop10Accuracy=0.7819, over 10315.27 frames. ], batch size: 14, lr: 7.58e-03
+ 2024-08-06 12:48:16,311 INFO [trainer.py:765] (5/8) Epoch 16, batch 500, train_loss[loss=2.762, ArTop10Accuracy=0.7751, over 12336.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7821, over 10875.67 frames. ], batch size: 22, lr: 7.56e-03
+ 2024-08-06 12:49:46,642 INFO [trainer.py:765] (5/8) Epoch 16, batch 600, train_loss[loss=2.76, ArTop10Accuracy=0.7807, over 11430.00 frames. ], tot_loss[loss=2.74, ArTop10Accuracy=0.7817, over 11381.50 frames. ], batch size: 18, lr: 7.54e-03
+ 2024-08-06 12:51:23,681 INFO [trainer.py:765] (5/8) Epoch 16, batch 700, train_loss[loss=2.55, ArTop10Accuracy=0.8138, over 9411.00 frames. ], tot_loss[loss=2.74, ArTop10Accuracy=0.7817, over 11524.49 frames. ], batch size: 11, lr: 7.52e-03
+ 2024-08-06 12:52:43,501 INFO [trainer.py:765] (5/8) Epoch 16, batch 800, train_loss[loss=2.647, ArTop10Accuracy=0.8003, over 10152.00 frames. ], tot_loss[loss=2.745, ArTop10Accuracy=0.7807, over 11650.13 frames. ], batch size: 12, lr: 7.51e-03
+ 2024-08-06 12:53:06,015 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 12:53:15,497 INFO [trainer.py:811] (5/8) Epoch 16, validation: loss=2.816, ArTop10Accuracy=0.7678, over 1827537.00 frames.
+ 2024-08-06 12:53:15,497 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 33004MB
+ 2024-08-06 12:53:16,186 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.291e+02 1.391e+02 1.487e+02 3.459e+02, threshold=2.783e+02, percent-clipped=0.1
+ 2024-08-06 12:54:06,480 INFO [trainer.py:765] (5/8) Epoch 16, batch 900, train_loss[loss=2.733, ArTop10Accuracy=0.7827, over 13089.00 frames. ], tot_loss[loss=2.74, ArTop10Accuracy=0.7817, over 11714.19 frames. ], batch size: 27, lr: 7.49e-03
+ 2024-08-06 12:55:19,791 INFO [trainer.py:765] (5/8) Epoch 16, batch 1000, train_loss[loss=2.714, ArTop10Accuracy=0.7895, over 12903.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7809, over 11908.25 frames. ], batch size: 27, lr: 7.47e-03
+ 2024-08-06 12:56:33,163 INFO [trainer.py:765] (5/8) Epoch 16, batch 1100, train_loss[loss=2.734, ArTop10Accuracy=0.7784, over 13548.00 frames. ], tot_loss[loss=2.756, ArTop10Accuracy=0.7786, over 11968.05 frames. ], batch size: 34, lr: 7.45e-03
+ 2024-08-06 12:57:48,485 INFO [trainer.py:765] (5/8) Epoch 16, batch 1200, train_loss[loss=2.914, ArTop10Accuracy=0.7486, over 12042.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7789, over 11883.70 frames. ], batch size: 101, lr: 7.44e-03
+ 2024-08-06 12:58:48,462 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 13:00:47,900 INFO [trainer.py:765] (5/8) Epoch 17, batch 100, train_loss[loss=2.79, ArTop10Accuracy=0.7739, over 14514.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7802, over 4746.89 frames. ], batch size: 62, lr: 7.18e-03
+ 2024-08-06 13:02:19,302 INFO [trainer.py:765] (5/8) Epoch 17, batch 200, train_loss[loss=2.762, ArTop10Accuracy=0.7752, over 13503.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7818, over 7745.07 frames. ], batch size: 34, lr: 7.17e-03
+ 2024-08-06 13:03:45,517 INFO [trainer.py:765] (5/8) Epoch 17, batch 300, train_loss[loss=2.704, ArTop10Accuracy=0.7863, over 14109.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7837, over 9363.13 frames. ], batch size: 44, lr: 7.15e-03
+ 2024-08-06 13:05:21,760 INFO [trainer.py:765] (5/8) Epoch 17, batch 400, train_loss[loss=2.597, ArTop10Accuracy=0.8068, over 10308.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.784, over 10298.50 frames. ], batch size: 14, lr: 7.14e-03
+ 2024-08-06 13:06:47,021 INFO [trainer.py:765] (5/8) Epoch 17, batch 500, train_loss[loss=2.65, ArTop10Accuracy=0.7994, over 12138.00 frames. ], tot_loss[loss=2.723, ArTop10Accuracy=0.7848, over 10855.95 frames. ], batch size: 22, lr: 7.12e-03
+ 2024-08-06 13:07:39,878 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.293e+02 1.386e+02 1.488e+02 3.253e+02, threshold=2.772e+02, percent-clipped=0.1
+ 2024-08-06 13:08:22,688 INFO [trainer.py:765] (5/8) Epoch 17, batch 600, train_loss[loss=2.731, ArTop10Accuracy=0.7827, over 11508.00 frames. ], tot_loss[loss=2.729, ArTop10Accuracy=0.7836, over 11353.38 frames. ], batch size: 18, lr: 7.10e-03
+ 2024-08-06 13:09:54,835 INFO [trainer.py:765] (5/8) Epoch 17, batch 700, train_loss[loss=2.661, ArTop10Accuracy=0.8012, over 10047.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7837, over 11499.97 frames. ], batch size: 12, lr: 7.09e-03
+ 2024-08-06 13:11:19,480 INFO [trainer.py:765] (5/8) Epoch 17, batch 800, train_loss[loss=2.683, ArTop10Accuracy=0.7903, over 9426.00 frames. ], tot_loss[loss=2.731, ArTop10Accuracy=0.7833, over 11612.45 frames. ], batch size: 11, lr: 7.07e-03
+ 2024-08-06 13:12:35,669 INFO [trainer.py:765] (5/8) Epoch 17, batch 900, train_loss[loss=2.679, ArTop10Accuracy=0.7909, over 13299.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.784, over 11667.58 frames. ], batch size: 28, lr: 7.06e-03
+ 2024-08-06 13:13:53,061 INFO [trainer.py:765] (5/8) Epoch 17, batch 1000, train_loss[loss=2.747, ArTop10Accuracy=0.7768, over 12849.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7829, over 11871.51 frames. ], batch size: 27, lr: 7.04e-03
+ 2024-08-06 13:15:08,484 INFO [trainer.py:765] (5/8) Epoch 17, batch 1100, train_loss[loss=2.783, ArTop10Accuracy=0.7756, over 13917.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.781, over 11963.27 frames. ], batch size: 34, lr: 7.02e-03
+ 2024-08-06 13:16:22,388 INFO [trainer.py:765] (5/8) Epoch 17, batch 1200, train_loss[loss=2.881, ArTop10Accuracy=0.7528, over 11922.00 frames. ], tot_loss[loss=2.742, ArTop10Accuracy=0.7812, over 11883.35 frames. ], batch size: 101, lr: 7.01e-03
+ 2024-08-06 13:17:21,256 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 13:19:15,994 INFO [trainer.py:765] (5/8) Epoch 18, batch 100, train_loss[loss=2.836, ArTop10Accuracy=0.7618, over 14517.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7823, over 4741.07 frames. ], batch size: 62, lr: 6.78e-03
+ 2024-08-06 13:20:46,600 INFO [trainer.py:765] (5/8) Epoch 18, batch 200, train_loss[loss=2.73, ArTop10Accuracy=0.7836, over 13572.00 frames. ], tot_loss[loss=2.723, ArTop10Accuracy=0.7844, over 7726.79 frames. ], batch size: 34, lr: 6.77e-03
+ 2024-08-06 13:21:55,105 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 13:22:04,751 INFO [trainer.py:811] (5/8) Epoch 18, validation: loss=2.817, ArTop10Accuracy=0.768, over 1827537.00 frames.
+ 2024-08-06 13:22:04,752 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 33004MB
+ 2024-08-06 13:22:05,473 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.323e+02 1.409e+02 1.514e+02 3.209e+02, threshold=2.818e+02, percent-clipped=0.1
+ 2024-08-06 13:22:26,581 INFO [trainer.py:765] (5/8) Epoch 18, batch 300, train_loss[loss=2.857, ArTop10Accuracy=0.7546, over 14451.00 frames. ], tot_loss[loss=2.716, ArTop10Accuracy=0.7859, over 9365.24 frames. ], batch size: 44, lr: 6.76e-03
+ 2024-08-06 13:23:57,930 INFO [trainer.py:765] (5/8) Epoch 18, batch 400, train_loss[loss=2.635, ArTop10Accuracy=0.7974, over 10905.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7862, over 10266.45 frames. ], batch size: 15, lr: 6.74e-03
+ 2024-08-06 13:25:34,013 INFO [trainer.py:765] (5/8) Epoch 18, batch 500, train_loss[loss=2.698, ArTop10Accuracy=0.7882, over 12288.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7875, over 10841.88 frames. ], batch size: 22, lr: 6.73e-03
+ 2024-08-06 13:27:00,633 INFO [trainer.py:765] (5/8) Epoch 18, batch 600, train_loss[loss=2.6, ArTop10Accuracy=0.8053, over 11361.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.786, over 11357.44 frames. ], batch size: 18, lr: 6.71e-03
+ 2024-08-06 13:28:33,583 INFO [trainer.py:765] (5/8) Epoch 18, batch 700, train_loss[loss=2.631, ArTop10Accuracy=0.7964, over 9192.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.785, over 11507.07 frames. ], batch size: 11, lr: 6.70e-03
+ 2024-08-06 13:29:54,984 INFO [trainer.py:765] (5/8) Epoch 18, batch 800, train_loss[loss=2.628, ArTop10Accuracy=0.8056, over 9312.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7848, over 11606.69 frames. ], batch size: 11, lr: 6.68e-03
+ 2024-08-06 13:31:12,519 INFO [trainer.py:765] (5/8) Epoch 18, batch 900, train_loss[loss=2.791, ArTop10Accuracy=0.7691, over 12933.00 frames. ], tot_loss[loss=2.719, ArTop10Accuracy=0.7852, over 11666.15 frames. ], batch size: 27, lr: 6.67e-03
+ 2024-08-06 13:32:26,551 INFO [trainer.py:765] (5/8) Epoch 18, batch 1000, train_loss[loss=2.696, ArTop10Accuracy=0.7862, over 12858.00 frames. ], tot_loss[loss=2.729, ArTop10Accuracy=0.7835, over 11872.20 frames. ], batch size: 27, lr: 6.66e-03
+ 2024-08-06 13:33:41,497 INFO [trainer.py:765] (5/8) Epoch 18, batch 1100, train_loss[loss=2.694, ArTop10Accuracy=0.788, over 13674.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.7822, over 11940.32 frames. ], batch size: 34, lr: 6.64e-03
+ 2024-08-06 13:34:54,674 INFO [trainer.py:765] (5/8) Epoch 18, batch 1200, train_loss[loss=2.855, ArTop10Accuracy=0.7583, over 12042.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.782, over 11858.01 frames. ], batch size: 101, lr: 6.63e-03
+ 2024-08-06 13:35:51,064 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.340e+02 1.433e+02 1.533e+02 2.444e+02, threshold=2.867e+02, percent-clipped=0.0
+ 2024-08-06 13:35:54,247 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 13:37:48,624 INFO [trainer.py:765] (5/8) Epoch 19, batch 100, train_loss[loss=2.784, ArTop10Accuracy=0.7731, over 14520.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7851, over 4772.92 frames. ], batch size: 62, lr: 6.43e-03
+ 2024-08-06 13:39:23,257 INFO [trainer.py:765] (5/8) Epoch 19, batch 200, train_loss[loss=2.721, ArTop10Accuracy=0.7922, over 13578.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7863, over 7762.37 frames. ], batch size: 34, lr: 6.41e-03
+ 2024-08-06 13:40:48,360 INFO [trainer.py:765] (5/8) Epoch 19, batch 300, train_loss[loss=2.743, ArTop10Accuracy=0.7811, over 14157.00 frames. ], tot_loss[loss=2.712, ArTop10Accuracy=0.7867, over 9395.02 frames. ], batch size: 44, lr: 6.40e-03
+ 2024-08-06 13:42:21,067 INFO [trainer.py:765] (5/8) Epoch 19, batch 400, train_loss[loss=2.584, ArTop10Accuracy=0.8123, over 10896.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7878, over 10297.34 frames. ], batch size: 15, lr: 6.39e-03
+ 2024-08-06 13:43:44,955 INFO [trainer.py:765] (5/8) Epoch 19, batch 500, train_loss[loss=2.74, ArTop10Accuracy=0.7834, over 12279.00 frames. ], tot_loss[loss=2.701, ArTop10Accuracy=0.7886, over 10864.55 frames. ], batch size: 22, lr: 6.37e-03
+ 2024-08-06 13:45:16,682 INFO [trainer.py:765] (5/8) Epoch 19, batch 600, train_loss[loss=2.702, ArTop10Accuracy=0.7867, over 11439.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.788, over 11385.29 frames. ], batch size: 18, lr: 6.36e-03
+ 2024-08-06 13:46:48,324 INFO [trainer.py:765] (5/8) Epoch 19, batch 700, train_loss[loss=2.638, ArTop10Accuracy=0.8023, over 10281.00 frames. ], tot_loss[loss=2.71, ArTop10Accuracy=0.7873, over 11528.57 frames. ], batch size: 12, lr: 6.35e-03
+ 2024-08-06 13:48:11,884 INFO [trainer.py:765] (5/8) Epoch 19, batch 800, train_loss[loss=2.69, ArTop10Accuracy=0.7886, over 9447.00 frames. ], tot_loss[loss=2.716, ArTop10Accuracy=0.786, over 11643.01 frames. ], batch size: 11, lr: 6.34e-03
+ 2024-08-06 13:49:27,259 INFO [trainer.py:765] (5/8) Epoch 19, batch 900, train_loss[loss=2.693, ArTop10Accuracy=0.7895, over 12918.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7871, over 11699.57 frames. ], batch size: 27, lr: 6.32e-03
+ 2024-08-06 13:50:40,655 INFO [trainer.py:803] (5/8) Computing validation loss
+ 2024-08-06 13:50:50,537 INFO [trainer.py:811] (5/8) Epoch 19, validation: loss=2.818, ArTop10Accuracy=0.7679, over 1827537.00 frames.
+ 2024-08-06 13:50:50,537 INFO [trainer.py:814] (5/8) Maximum memory allocated so far is 33004MB
+ 2024-08-06 13:50:51,490 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.371e+02 1.455e+02 1.550e+02 3.697e+02, threshold=2.909e+02, percent-clipped=0.2
+ 2024-08-06 13:50:52,916 INFO [trainer.py:765] (5/8) Epoch 19, batch 1000, train_loss[loss=2.719, ArTop10Accuracy=0.7876, over 12822.00 frames. ], tot_loss[loss=2.717, ArTop10Accuracy=0.7857, over 11884.36 frames. ], batch size: 27, lr: 6.31e-03
+ 2024-08-06 13:52:08,266 INFO [trainer.py:765] (5/8) Epoch 19, batch 1100, train_loss[loss=2.753, ArTop10Accuracy=0.7806, over 13827.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7842, over 11932.60 frames. ], batch size: 34, lr: 6.30e-03
+ 2024-08-06 13:53:22,314 INFO [trainer.py:765] (5/8) Epoch 19, batch 1200, train_loss[loss=2.875, ArTop10Accuracy=0.754, over 12696.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7839, over 11864.89 frames. ], batch size: 101, lr: 6.28e-03
+ 2024-08-06 13:54:21,608 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 13:56:12,905 INFO [trainer.py:765] (5/8) Epoch 20, batch 100, train_loss[loss=2.771, ArTop10Accuracy=0.774, over 14373.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7863, over 4760.92 frames. ], batch size: 62, lr: 6.10e-03
+ 2024-08-06 13:57:42,495 INFO [trainer.py:765] (5/8) Epoch 20, batch 200, train_loss[loss=2.687, ArTop10Accuracy=0.7925, over 13419.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.788, over 7758.04 frames. ], batch size: 34, lr: 6.09e-03
+ 2024-08-06 13:59:15,430 INFO [trainer.py:765] (5/8) Epoch 20, batch 300, train_loss[loss=2.741, ArTop10Accuracy=0.7801, over 13872.00 frames. ], tot_loss[loss=2.698, ArTop10Accuracy=0.7891, over 9376.52 frames. ], batch size: 44, lr: 6.08e-03
+ 2024-08-06 14:00:44,357 INFO [trainer.py:765] (5/8) Epoch 20, batch 400, train_loss[loss=2.606, ArTop10Accuracy=0.8038, over 10281.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7882, over 10286.27 frames. ], batch size: 14, lr: 6.07e-03
+ 2024-08-06 14:02:14,855 INFO [trainer.py:765] (5/8) Epoch 20, batch 500, train_loss[loss=2.693, ArTop10Accuracy=0.7924, over 12243.00 frames. ], tot_loss[loss=2.7, ArTop10Accuracy=0.7887, over 10847.60 frames. ], batch size: 22, lr: 6.06e-03
+ 2024-08-06 14:03:40,856 INFO [trainer.py:765] (5/8) Epoch 20, batch 600, train_loss[loss=2.575, ArTop10Accuracy=0.8142, over 11583.00 frames. ], tot_loss[loss=2.701, ArTop10Accuracy=0.7885, over 11371.01 frames. ], batch size: 18, lr: 6.04e-03
+ 2024-08-06 14:05:13,864 INFO [trainer.py:765] (5/8) Epoch 20, batch 700, train_loss[loss=2.636, ArTop10Accuracy=0.8101, over 10137.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.788, over 11525.42 frames. ], batch size: 12, lr: 6.03e-03
+ 2024-08-06 14:05:30,791 INFO [optim.py:386] (5/8) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.365e+02 1.456e+02 1.550e+02 3.525e+02, threshold=2.913e+02, percent-clipped=0.1
+ 2024-08-06 14:06:34,509 INFO [trainer.py:765] (5/8) Epoch 20, batch 800, train_loss[loss=2.63, ArTop10Accuracy=0.805, over 9363.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7871, over 11647.49 frames. ], batch size: 11, lr: 6.02e-03
+ 2024-08-06 14:07:50,944 INFO [trainer.py:765] (5/8) Epoch 20, batch 900, train_loss[loss=2.69, ArTop10Accuracy=0.7948, over 12888.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7881, over 11693.94 frames. ], batch size: 27, lr: 6.01e-03
+ 2024-08-06 14:09:07,173 INFO [trainer.py:765] (5/8) Epoch 20, batch 1000, train_loss[loss=2.754, ArTop10Accuracy=0.7762, over 12837.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7873, over 11889.47 frames. ], batch size: 27, lr: 6.00e-03
+ 2024-08-06 14:10:21,209 INFO [trainer.py:765] (5/8) Epoch 20, batch 1100, train_loss[loss=2.775, ArTop10Accuracy=0.776, over 13869.00 frames. ], tot_loss[loss=2.716, ArTop10Accuracy=0.7859, over 11949.23 frames. ], batch size: 35, lr: 5.99e-03
+ 2024-08-06 14:11:37,813 INFO [trainer.py:765] (5/8) Epoch 20, batch 1200, train_loss[loss=2.857, ArTop10Accuracy=0.7624, over 11853.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7862, over 11859.92 frames. ], batch size: 101, lr: 5.98e-03
+ 2024-08-06 14:12:37,384 INFO [trainer.py:650] (5/8) Reaches end of dataloader.
+ 2024-08-06 14:12:37,386 INFO [trainer.py:1069] (5/8) Done!
libritts-r/log/log-train-2024-08-06-08-06-14-6 ADDED
@@ -0,0 +1,336 @@
+ 2024-08-06 08:06:14,313 INFO [trainer.py:870] (6/8) Training started
+ 2024-08-06 08:06:14,315 INFO [trainer.py:889] (6/8) Device: cuda:6
+ 2024-08-06 08:06:14,315 INFO [trainer.py:890] (6/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 08:06:14,315 INFO [trainer.py:892] (6/8) About to create model
+ 2024-08-06 08:06:15,003 INFO [trainer.py:899] (6/8) Number of model parameters: 367386628
+ 2024-08-06 08:06:16,221 INFO [trainer.py:914] (6/8) Using DDP
+ 2024-08-06 08:06:19,152 INFO [datamodule.py:427] (6/8) About to get train cuts
+ 2024-08-06 08:06:19,154 INFO [datamodule.py:434] (6/8) About to get dev cuts
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:292] (6/8) Disable SpecAugment
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:294] (6/8) About to create train dataset
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:323] (6/8) Using DynamicBucketingSampler
+ 2024-08-06 08:06:19,762 INFO [datamodule.py:344] (6/8) About to create train dataloader
+ 2024-08-06 08:06:19,762 INFO [datamodule.py:367] (6/8) About to create dev dataset
+ 2024-08-06 08:06:20,082 INFO [datamodule.py:388] (6/8) About to create dev dataloader
+ 2024-08-06 08:08:02,121 INFO [trainer.py:765] (6/8) Epoch 1, batch 100, train_loss[loss=4.388, ArTop10Accuracy=0.4801, over 14610.00 frames. ], tot_loss[loss=5.055, ArTop10Accuracy=0.3726, over 4770.95 frames. ], batch size: 63, lr: 2.25e-02
+ 2024-08-06 08:09:28,827 INFO [trainer.py:765] (6/8) Epoch 1, batch 200, train_loss[loss=3.986, ArTop10Accuracy=0.5533, over 13587.00 frames. ], tot_loss[loss=4.49, ArTop10Accuracy=0.4676, over 7758.92 frames. ], batch size: 34, lr: 3.00e-02
+ 2024-08-06 08:10:52,429 INFO [trainer.py:765] (6/8) Epoch 1, batch 300, train_loss[loss=3.87, ArTop10Accuracy=0.5696, over 14733.00 frames. ], tot_loss[loss=4.219, ArTop10Accuracy=0.5124, over 9377.57 frames. ], batch size: 45, lr: 3.00e-02
+ 2024-08-06 08:12:12,699 INFO [trainer.py:765] (6/8) Epoch 1, batch 400, train_loss[loss=3.622, ArTop10Accuracy=0.6205, over 10365.00 frames. ], tot_loss[loss=4.032, ArTop10Accuracy=0.5444, over 10277.76 frames. ], batch size: 14, lr: 3.00e-02
+ 2024-08-06 08:13:40,049 INFO [trainer.py:765] (6/8) Epoch 1, batch 500, train_loss[loss=3.739, ArTop10Accuracy=0.5912, over 12429.00 frames. ], tot_loss[loss=3.887, ArTop10Accuracy=0.5694, over 10839.94 frames. ], batch size: 22, lr: 2.99e-02
+ 2024-08-06 08:15:00,242 INFO [trainer.py:765] (6/8) Epoch 1, batch 600, train_loss[loss=3.558, ArTop10Accuracy=0.6331, over 11361.00 frames. ], tot_loss[loss=3.772, ArTop10Accuracy=0.5898, over 11361.50 frames. ], batch size: 18, lr: 2.99e-02
+ 2024-08-06 08:16:26,423 INFO [trainer.py:765] (6/8) Epoch 1, batch 700, train_loss[loss=3.451, ArTop10Accuracy=0.6472, over 10158.00 frames. ], tot_loss[loss=3.689, ArTop10Accuracy=0.6045, over 11517.12 frames. ], batch size: 12, lr: 2.99e-02
+ 2024-08-06 08:17:43,017 INFO [trainer.py:765] (6/8) Epoch 1, batch 800, train_loss[loss=3.352, ArTop10Accuracy=0.6725, over 9489.00 frames. ], tot_loss[loss=3.627, ArTop10Accuracy=0.6162, over 11639.17 frames. ], batch size: 11, lr: 2.98e-02
+ 2024-08-06 08:18:56,150 INFO [trainer.py:765] (6/8) Epoch 1, batch 900, train_loss[loss=3.437, ArTop10Accuracy=0.6495, over 12990.00 frames. ], tot_loss[loss=3.567, ArTop10Accuracy=0.6269, over 11692.94 frames. ], batch size: 27, lr: 2.98e-02
+ 2024-08-06 08:20:12,862 INFO [trainer.py:765] (6/8) Epoch 1, batch 1000, train_loss[loss=3.39, ArTop10Accuracy=0.6602, over 12999.00 frames. ], tot_loss[loss=3.525, ArTop10Accuracy=0.6346, over 11885.92 frames. ], batch size: 27, lr: 2.97e-02
+ 2024-08-06 08:20:13,538 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 9.300e+01 1.871e+02 2.675e+02 4.030e+02 9.119e+03, threshold=5.351e+02, percent-clipped=0.0
+ 2024-08-06 08:21:29,154 INFO [trainer.py:765] (6/8) Epoch 1, batch 1100, train_loss[loss=3.509, ArTop10Accuracy=0.6415, over 13734.00 frames. ], tot_loss[loss=3.494, ArTop10Accuracy=0.6401, over 11960.16 frames. ], batch size: 34, lr: 2.96e-02
+ 2024-08-06 08:22:45,411 INFO [trainer.py:765] (6/8) Epoch 1, batch 1200, train_loss[loss=3.423, ArTop10Accuracy=0.6589, over 11565.00 frames. ], tot_loss[loss=3.464, ArTop10Accuracy=0.6459, over 11864.25 frames. ], batch size: 101, lr: 2.96e-02
+ 2024-08-06 08:23:45,150 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 08:25:36,237 INFO [trainer.py:765] (6/8) Epoch 2, batch 100, train_loss[loss=3.402, ArTop10Accuracy=0.6551, over 14169.00 frames. ], tot_loss[loss=3.421, ArTop10Accuracy=0.6534, over 4752.27 frames. ], batch size: 62, lr: 2.90e-02
+ 2024-08-06 08:26:58,956 INFO [trainer.py:765] (6/8) Epoch 2, batch 200, train_loss[loss=3.363, ArTop10Accuracy=0.6631, over 13458.00 frames. ], tot_loss[loss=3.391, ArTop10Accuracy=0.6587, over 7750.75 frames. ], batch size: 34, lr: 2.89e-02
+ 2024-08-06 08:28:25,532 INFO [trainer.py:765] (6/8) Epoch 2, batch 300, train_loss[loss=3.304, ArTop10Accuracy=0.6729, over 14391.00 frames. ], tot_loss[loss=3.375, ArTop10Accuracy=0.6616, over 9369.14 frames. ], batch size: 45, lr: 2.89e-02
+ 2024-08-06 08:29:48,636 INFO [trainer.py:765] (6/8) Epoch 2, batch 400, train_loss[loss=3.287, ArTop10Accuracy=0.6818, over 10422.00 frames. ], tot_loss[loss=3.355, ArTop10Accuracy=0.6658, over 10264.80 frames. ], batch size: 14, lr: 2.88e-02
+ 2024-08-06 08:31:22,901 INFO [trainer.py:765] (6/8) Epoch 2, batch 500, train_loss[loss=3.302, ArTop10Accuracy=0.6806, over 12180.00 frames. ], tot_loss[loss=3.337, ArTop10Accuracy=0.6695, over 10824.54 frames. ], batch size: 22, lr: 2.87e-02
+ 2024-08-06 08:32:45,686 INFO [trainer.py:765] (6/8) Epoch 2, batch 600, train_loss[loss=3.304, ArTop10Accuracy=0.6782, over 11436.00 frames. ], tot_loss[loss=3.329, ArTop10Accuracy=0.6711, over 11355.80 frames. ], batch size: 18, lr: 2.86e-02
+ 2024-08-06 08:34:13,582 INFO [trainer.py:765] (6/8) Epoch 2, batch 700, train_loss[loss=3.079, ArTop10Accuracy=0.7212, over 9345.00 frames. ], tot_loss[loss=3.324, ArTop10Accuracy=0.6721, over 11500.32 frames. ], batch size: 11, lr: 2.85e-02
+ 2024-08-06 08:34:31,174 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 08:34:40,888 INFO [trainer.py:811] (6/8) Epoch 2, validation: loss=3.277, ArTop10Accuracy=0.6803, over 1827537.00 frames.
+ 2024-08-06 08:34:40,889 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 29113MB
+ 2024-08-06 08:34:41,699 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 1.592e+02 2.200e+02 3.344e+02 2.949e+03, threshold=4.400e+02, percent-clipped=8.6
+ 2024-08-06 08:35:39,876 INFO [trainer.py:765] (6/8) Epoch 2, batch 800, train_loss[loss=3.286, ArTop10Accuracy=0.6826, over 10155.00 frames. ], tot_loss[loss=3.321, ArTop10Accuracy=0.6728, over 11631.80 frames. ], batch size: 12, lr: 2.84e-02
+ 2024-08-06 08:36:56,370 INFO [trainer.py:765] (6/8) Epoch 2, batch 900, train_loss[loss=3.287, ArTop10Accuracy=0.6823, over 13350.00 frames. ], tot_loss[loss=3.309, ArTop10Accuracy=0.6751, over 11666.99 frames. ], batch size: 28, lr: 2.83e-02
+ 2024-08-06 08:38:10,510 INFO [trainer.py:765] (6/8) Epoch 2, batch 1000, train_loss[loss=3.288, ArTop10Accuracy=0.6815, over 13026.00 frames. ], tot_loss[loss=3.3, ArTop10Accuracy=0.6765, over 11879.59 frames. ], batch size: 27, lr: 2.82e-02
+ 2024-08-06 08:39:25,058 INFO [trainer.py:765] (6/8) Epoch 2, batch 1100, train_loss[loss=3.25, ArTop10Accuracy=0.6835, over 13587.00 frames. ], tot_loss[loss=3.296, ArTop10Accuracy=0.6776, over 11944.38 frames. ], batch size: 34, lr: 2.81e-02
+ 2024-08-06 08:40:38,219 INFO [trainer.py:765] (6/8) Epoch 2, batch 1200, train_loss[loss=3.286, ArTop10Accuracy=0.6777, over 11184.00 frames. ], tot_loss[loss=3.286, ArTop10Accuracy=0.6793, over 11847.29 frames. ], batch size: 101, lr: 2.80e-02
+ 2024-08-06 08:41:38,257 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 08:43:36,648 INFO [trainer.py:765] (6/8) Epoch 3, batch 100, train_loss[loss=3.289, ArTop10Accuracy=0.6722, over 14634.00 frames. ], tot_loss[loss=3.251, ArTop10Accuracy=0.6851, over 4755.45 frames. ], batch size: 62, lr: 2.67e-02
+ 2024-08-06 08:45:10,500 INFO [trainer.py:765] (6/8) Epoch 3, batch 200, train_loss[loss=3.181, ArTop10Accuracy=0.7005, over 13515.00 frames. ], tot_loss[loss=3.221, ArTop10Accuracy=0.6909, over 7745.42 frames. ], batch size: 34, lr: 2.66e-02
+ 2024-08-06 08:46:29,257 INFO [trainer.py:765] (6/8) Epoch 3, batch 300, train_loss[loss=3.261, ArTop10Accuracy=0.6856, over 14103.00 frames. ], tot_loss[loss=3.206, ArTop10Accuracy=0.6938, over 9364.34 frames. ], batch size: 44, lr: 2.64e-02
+ 2024-08-06 08:48:04,218 INFO [trainer.py:765] (6/8) Epoch 3, batch 400, train_loss[loss=3.105, ArTop10Accuracy=0.7125, over 10353.00 frames. ], tot_loss[loss=3.191, ArTop10Accuracy=0.6967, over 10275.21 frames. ], batch size: 14, lr: 2.63e-02
+ 2024-08-06 08:48:40,881 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 9.282e+01 1.561e+02 1.981e+02 2.686e+02 1.768e+03, threshold=3.962e+02, percent-clipped=7.6
+ 2024-08-06 08:49:25,541 INFO [trainer.py:765] (6/8) Epoch 3, batch 500, train_loss[loss=3.089, ArTop10Accuracy=0.7152, over 12294.00 frames. ], tot_loss[loss=3.171, ArTop10Accuracy=0.7005, over 10826.07 frames. ], batch size: 22, lr: 2.62e-02
+ 2024-08-06 08:51:00,476 INFO [trainer.py:765] (6/8) Epoch 3, batch 600, train_loss[loss=3.138, ArTop10Accuracy=0.7107, over 11346.00 frames. ], tot_loss[loss=3.155, ArTop10Accuracy=0.7036, over 11363.23 frames. ], batch size: 18, lr: 2.61e-02
+ 2024-08-06 08:52:31,618 INFO [trainer.py:765] (6/8) Epoch 3, batch 700, train_loss[loss=3.117, ArTop10Accuracy=0.7064, over 9513.00 frames. ], tot_loss[loss=3.144, ArTop10Accuracy=0.7058, over 11502.94 frames. ], batch size: 11, lr: 2.60e-02
+ 2024-08-06 08:53:57,388 INFO [trainer.py:765] (6/8) Epoch 3, batch 800, train_loss[loss=3.005, ArTop10Accuracy=0.7328, over 10665.00 frames. ], tot_loss[loss=3.136, ArTop10Accuracy=0.7073, over 11636.91 frames. ], batch size: 13, lr: 2.59e-02
+ 2024-08-06 08:55:15,117 INFO [trainer.py:765] (6/8) Epoch 3, batch 900, train_loss[loss=3.093, ArTop10Accuracy=0.7182, over 13086.00 frames. ], tot_loss[loss=3.119, ArTop10Accuracy=0.7107, over 11682.79 frames. ], batch size: 27, lr: 2.57e-02
+ 2024-08-06 08:56:31,557 INFO [trainer.py:765] (6/8) Epoch 3, batch 1000, train_loss[loss=2.997, ArTop10Accuracy=0.7294, over 12915.00 frames. ], tot_loss[loss=3.109, ArTop10Accuracy=0.7126, over 11878.15 frames. ], batch size: 27, lr: 2.56e-02
+ 2024-08-06 08:57:46,506 INFO [trainer.py:765] (6/8) Epoch 3, batch 1100, train_loss[loss=3.086, ArTop10Accuracy=0.7154, over 13635.00 frames. ], tot_loss[loss=3.102, ArTop10Accuracy=0.7138, over 11951.69 frames. ], batch size: 34, lr: 2.55e-02
+ 2024-08-06 08:59:01,399 INFO [trainer.py:765] (6/8) Epoch 3, batch 1200, train_loss[loss=3.159, ArTop10Accuracy=0.7025, over 11697.00 frames. ], tot_loss[loss=3.094, ArTop10Accuracy=0.7154, over 11858.17 frames. ], batch size: 101, lr: 2.54e-02
+ 2024-08-06 09:00:02,076 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 09:01:50,740 INFO [trainer.py:765] (6/8) Epoch 4, batch 100, train_loss[loss=3.078, ArTop10Accuracy=0.7206, over 14586.00 frames. ], tot_loss[loss=3.077, ArTop10Accuracy=0.7175, over 4762.22 frames. ], batch size: 62, lr: 2.38e-02
+ 2024-08-06 09:02:52,858 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 09:03:02,384 INFO [trainer.py:811] (6/8) Epoch 4, validation: loss=2.997, ArTop10Accuracy=0.7338, over 1827537.00 frames.
+ 2024-08-06 09:03:02,385 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 29374MB
+ 2024-08-06 09:03:03,364 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.499e+02 1.782e+02 2.273e+02 1.100e+03, threshold=3.565e+02, percent-clipped=4.7
+ 2024-08-06 09:03:29,274 INFO [trainer.py:765] (6/8) Epoch 4, batch 200, train_loss[loss=3.044, ArTop10Accuracy=0.7246, over 13827.00 frames. ], tot_loss[loss=3.049, ArTop10Accuracy=0.7232, over 7750.55 frames. ], batch size: 35, lr: 2.37e-02
+ 2024-08-06 09:05:01,732 INFO [trainer.py:765] (6/8) Epoch 4, batch 300, train_loss[loss=3.137, ArTop10Accuracy=0.7043, over 14121.00 frames. ], tot_loss[loss=3.041, ArTop10Accuracy=0.7249, over 9386.87 frames. ], batch size: 44, lr: 2.36e-02
+ 2024-08-06 09:06:28,150 INFO [trainer.py:765] (6/8) Epoch 4, batch 400, train_loss[loss=3.036, ArTop10Accuracy=0.7249, over 10518.00 frames. ], tot_loss[loss=3.037, ArTop10Accuracy=0.7257, over 10309.40 frames. ], batch size: 14, lr: 2.34e-02
+ 2024-08-06 09:08:01,925 INFO [trainer.py:765] (6/8) Epoch 4, batch 500, train_loss[loss=2.881, ArTop10Accuracy=0.7551, over 12159.00 frames. ], tot_loss[loss=3.028, ArTop10Accuracy=0.7275, over 10832.97 frames. ], batch size: 22, lr: 2.33e-02
+ 2024-08-06 09:09:28,540 INFO [trainer.py:765] (6/8) Epoch 4, batch 600, train_loss[loss=2.984, ArTop10Accuracy=0.7324, over 11529.00 frames. ], tot_loss[loss=3.025, ArTop10Accuracy=0.728, over 11364.42 frames. ], batch size: 18, lr: 2.32e-02
+ 2024-08-06 09:10:59,865 INFO [trainer.py:765] (6/8) Epoch 4, batch 700, train_loss[loss=2.895, ArTop10Accuracy=0.7564, over 10038.00 frames. ], tot_loss[loss=3.024, ArTop10Accuracy=0.7284, over 11504.85 frames. ], batch size: 12, lr: 2.31e-02
+ 2024-08-06 09:12:17,513 INFO [trainer.py:765] (6/8) Epoch 4, batch 800, train_loss[loss=2.829, ArTop10Accuracy=0.7753, over 10317.00 frames. ], tot_loss[loss=3.021, ArTop10Accuracy=0.729, over 11618.34 frames. ], batch size: 12, lr: 2.30e-02
+ 2024-08-06 09:13:33,212 INFO [trainer.py:765] (6/8) Epoch 4, batch 900, train_loss[loss=2.953, ArTop10Accuracy=0.7436, over 13008.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7307, over 11683.53 frames. ], batch size: 27, lr: 2.29e-02
+ 2024-08-06 09:14:47,520 INFO [trainer.py:765] (6/8) Epoch 4, batch 1000, train_loss[loss=3.061, ArTop10Accuracy=0.7249, over 12897.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7306, over 11877.18 frames. ], batch size: 27, lr: 2.28e-02
+ 2024-08-06 09:16:02,982 INFO [trainer.py:765] (6/8) Epoch 4, batch 1100, train_loss[loss=3.017, ArTop10Accuracy=0.7294, over 13554.00 frames. ], tot_loss[loss=3.015, ArTop10Accuracy=0.7302, over 11959.80 frames. ], batch size: 34, lr: 2.26e-02
+ 2024-08-06 09:16:53,291 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.440e+02 1.636e+02 1.968e+02 7.702e+02, threshold=3.273e+02, percent-clipped=1.3
+ 2024-08-06 09:17:18,344 INFO [trainer.py:765] (6/8) Epoch 4, batch 1200, train_loss[loss=3.105, ArTop10Accuracy=0.712, over 12048.00 frames. ], tot_loss[loss=3.01, ArTop10Accuracy=0.7309, over 11874.78 frames. ], batch size: 103, lr: 2.25e-02
+ 2024-08-06 09:18:17,022 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 09:20:17,171 INFO [trainer.py:765] (6/8) Epoch 5, batch 100, train_loss[loss=3.024, ArTop10Accuracy=0.729, over 14520.00 frames. ], tot_loss[loss=2.998, ArTop10Accuracy=0.7323, over 4763.86 frames. ], batch size: 62, lr: 2.10e-02
+ 2024-08-06 09:21:52,296 INFO [trainer.py:765] (6/8) Epoch 5, batch 200, train_loss[loss=2.932, ArTop10Accuracy=0.7488, over 13890.00 frames. ], tot_loss[loss=2.978, ArTop10Accuracy=0.7364, over 7748.36 frames. ], batch size: 34, lr: 2.09e-02
+ 2024-08-06 09:23:19,241 INFO [trainer.py:765] (6/8) Epoch 5, batch 300, train_loss[loss=2.922, ArTop10Accuracy=0.7481, over 14118.00 frames. ], tot_loss[loss=2.969, ArTop10Accuracy=0.7383, over 9371.93 frames. ], batch size: 45, lr: 2.08e-02
+ 2024-08-06 09:24:53,537 INFO [trainer.py:765] (6/8) Epoch 5, batch 400, train_loss[loss=2.855, ArTop10Accuracy=0.7614, over 10344.00 frames. ], tot_loss[loss=2.97, ArTop10Accuracy=0.7382, over 10282.75 frames. ], batch size: 14, lr: 2.07e-02
+ 2024-08-06 09:26:19,418 INFO [trainer.py:765] (6/8) Epoch 5, batch 500, train_loss[loss=2.962, ArTop10Accuracy=0.7411, over 12372.00 frames. ], tot_loss[loss=2.966, ArTop10Accuracy=0.739, over 10822.00 frames. ], batch size: 22, lr: 2.06e-02
+ 2024-08-06 09:27:49,538 INFO [trainer.py:765] (6/8) Epoch 5, batch 600, train_loss[loss=2.859, ArTop10Accuracy=0.7623, over 11424.00 frames. ], tot_loss[loss=2.963, ArTop10Accuracy=0.7396, over 11344.24 frames. ], batch size: 18, lr: 2.05e-02
+ 2024-08-06 09:29:21,670 INFO [trainer.py:765] (6/8) Epoch 5, batch 700, train_loss[loss=2.829, ArTop10Accuracy=0.766, over 10110.00 frames. ], tot_loss[loss=2.964, ArTop10Accuracy=0.7394, over 11498.36 frames. ], batch size: 12, lr: 2.04e-02
+ 2024-08-06 09:30:44,694 INFO [trainer.py:765] (6/8) Epoch 5, batch 800, train_loss[loss=2.993, ArTop10Accuracy=0.7335, over 10182.00 frames. ], tot_loss[loss=2.968, ArTop10Accuracy=0.7386, over 11628.28 frames. ], batch size: 12, lr: 2.03e-02
+ 2024-08-06 09:31:51,240 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 09:32:00,762 INFO [trainer.py:811] (6/8) Epoch 5, validation: loss=2.926, ArTop10Accuracy=0.7466, over 1827537.00 frames.
+ 2024-08-06 09:32:00,763 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 29654MB
+ 2024-08-06 09:32:01,709 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.349e+02 1.525e+02 1.806e+02 1.007e+03, threshold=3.049e+02, percent-clipped=2.3
+ 2024-08-06 09:32:10,554 INFO [trainer.py:765] (6/8) Epoch 5, batch 900, train_loss[loss=2.987, ArTop10Accuracy=0.7291, over 13041.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7401, over 11691.73 frames. ], batch size: 27, lr: 2.02e-02
+ 2024-08-06 09:33:27,323 INFO [trainer.py:765] (6/8) Epoch 5, batch 1000, train_loss[loss=2.989, ArTop10Accuracy=0.7325, over 13020.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.74, over 11890.41 frames. ], batch size: 27, lr: 2.01e-02
+ 2024-08-06 09:34:42,300 INFO [trainer.py:765] (6/8) Epoch 5, batch 1100, train_loss[loss=2.954, ArTop10Accuracy=0.7405, over 13821.00 frames. ], tot_loss[loss=2.964, ArTop10Accuracy=0.7393, over 11936.71 frames. ], batch size: 34, lr: 2.00e-02
+ 2024-08-06 09:35:56,331 INFO [trainer.py:765] (6/8) Epoch 5, batch 1200, train_loss[loss=3.119, ArTop10Accuracy=0.7086, over 12636.00 frames. ], tot_loss[loss=2.961, ArTop10Accuracy=0.7402, over 11856.38 frames. ], batch size: 101, lr: 1.99e-02
+ 2024-08-06 09:36:55,716 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 09:38:52,665 INFO [trainer.py:765] (6/8) Epoch 6, batch 100, train_loss[loss=2.955, ArTop10Accuracy=0.7445, over 14772.00 frames. ], tot_loss[loss=2.951, ArTop10Accuracy=0.7416, over 4751.62 frames. ], batch size: 62, lr: 1.85e-02
+ 2024-08-06 09:40:19,834 INFO [trainer.py:765] (6/8) Epoch 6, batch 200, train_loss[loss=2.913, ArTop10Accuracy=0.7489, over 13596.00 frames. ], tot_loss[loss=2.936, ArTop10Accuracy=0.7445, over 7733.30 frames. ], batch size: 34, lr: 1.84e-02
+ 2024-08-06 09:41:52,965 INFO [trainer.py:765] (6/8) Epoch 6, batch 300, train_loss[loss=2.968, ArTop10Accuracy=0.7397, over 13965.00 frames. ], tot_loss[loss=2.929, ArTop10Accuracy=0.7459, over 9366.45 frames. ], batch size: 44, lr: 1.83e-02
+ 2024-08-06 09:43:17,828 INFO [trainer.py:765] (6/8) Epoch 6, batch 400, train_loss[loss=2.856, ArTop10Accuracy=0.7616, over 10935.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.7471, over 10286.23 frames. ], batch size: 15, lr: 1.83e-02
+ 2024-08-06 09:44:54,128 INFO [trainer.py:765] (6/8) Epoch 6, batch 500, train_loss[loss=2.918, ArTop10Accuracy=0.7506, over 12210.00 frames. ], tot_loss[loss=2.923, ArTop10Accuracy=0.7475, over 10852.19 frames. ], batch size: 22, lr: 1.82e-02
+ 2024-08-06 09:46:22,873 INFO [trainer.py:765] (6/8) Epoch 6, batch 600, train_loss[loss=2.782, ArTop10Accuracy=0.7736, over 11358.00 frames. ], tot_loss[loss=2.92, ArTop10Accuracy=0.7481, over 11364.24 frames. ], batch size: 18, lr: 1.81e-02
+ 2024-08-06 09:46:37,219 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.339e+02 1.480e+02 1.701e+02 7.506e+02, threshold=2.959e+02, percent-clipped=1.1
+ 2024-08-06 09:47:57,870 INFO [trainer.py:765] (6/8) Epoch 6, batch 700, train_loss[loss=2.855, ArTop10Accuracy=0.7598, over 10134.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.7472, over 11502.69 frames. ], batch size: 12, lr: 1.80e-02
+ 2024-08-06 09:49:15,955 INFO [trainer.py:765] (6/8) Epoch 6, batch 800, train_loss[loss=2.854, ArTop10Accuracy=0.759, over 10095.00 frames. ], tot_loss[loss=2.926, ArTop10Accuracy=0.7469, over 11620.14 frames. ], batch size: 12, lr: 1.79e-02
+ 2024-08-06 09:50:32,135 INFO [trainer.py:765] (6/8) Epoch 6, batch 900, train_loss[loss=2.932, ArTop10Accuracy=0.7518, over 13065.00 frames. ], tot_loss[loss=2.922, ArTop10Accuracy=0.7477, over 11670.94 frames. ], batch size: 27, lr: 1.78e-02
+ 2024-08-06 09:51:47,299 INFO [trainer.py:765] (6/8) Epoch 6, batch 1000, train_loss[loss=2.987, ArTop10Accuracy=0.735, over 12957.00 frames. ], tot_loss[loss=2.926, ArTop10Accuracy=0.7468, over 11889.76 frames. ], batch size: 27, lr: 1.77e-02
+ 2024-08-06 09:53:00,921 INFO [trainer.py:765] (6/8) Epoch 6, batch 1100, train_loss[loss=2.839, ArTop10Accuracy=0.7588, over 13614.00 frames. ], tot_loss[loss=2.93, ArTop10Accuracy=0.7459, over 11947.39 frames. ], batch size: 34, lr: 1.77e-02
+ 2024-08-06 09:54:14,337 INFO [trainer.py:765] (6/8) Epoch 6, batch 1200, train_loss[loss=3.013, ArTop10Accuracy=0.7295, over 12051.00 frames. ], tot_loss[loss=2.929, ArTop10Accuracy=0.746, over 11869.99 frames. ], batch size: 101, lr: 1.76e-02
+ 2024-08-06 09:55:13,008 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 09:57:06,699 INFO [trainer.py:765] (6/8) Epoch 7, batch 100, train_loss[loss=2.878, ArTop10Accuracy=0.7565, over 14559.00 frames. ], tot_loss[loss=2.912, ArTop10Accuracy=0.7495, over 4765.49 frames. ], batch size: 63, lr: 1.64e-02
+ 2024-08-06 09:58:39,426 INFO [trainer.py:765] (6/8) Epoch 7, batch 200, train_loss[loss=2.901, ArTop10Accuracy=0.7472, over 13467.00 frames. ], tot_loss[loss=2.9, ArTop10Accuracy=0.7514, over 7756.69 frames. ], batch size: 34, lr: 1.64e-02
+ 2024-08-06 10:00:06,083 INFO [trainer.py:765] (6/8) Epoch 7, batch 300, train_loss[loss=2.904, ArTop10Accuracy=0.7493, over 14169.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7525, over 9366.34 frames. ], batch size: 44, lr: 1.63e-02
+ 2024-08-06 10:00:40,509 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 10:00:50,245 INFO [trainer.py:811] (6/8) Epoch 7, validation: loss=2.88, ArTop10Accuracy=0.7554, over 1827537.00 frames.
+ 2024-08-06 10:00:50,246 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 29654MB
+ 2024-08-06 10:00:50,976 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.286e+02 1.429e+02 1.605e+02 1.020e+03, threshold=2.857e+02, percent-clipped=1.5
+ 2024-08-06 10:01:49,115 INFO [trainer.py:765] (6/8) Epoch 7, batch 400, train_loss[loss=2.873, ArTop10Accuracy=0.7615, over 10770.00 frames. ], tot_loss[loss=2.894, ArTop10Accuracy=0.7531, over 10290.52 frames. ], batch size: 15, lr: 1.62e-02
+ 2024-08-06 10:03:21,455 INFO [trainer.py:765] (6/8) Epoch 7, batch 500, train_loss[loss=2.785, ArTop10Accuracy=0.779, over 12291.00 frames. ], tot_loss[loss=2.89, ArTop10Accuracy=0.7536, over 10870.88 frames. ], batch size: 22, lr: 1.61e-02
+ 2024-08-06 10:04:51,881 INFO [trainer.py:765] (6/8) Epoch 7, batch 600, train_loss[loss=2.849, ArTop10Accuracy=0.7606, over 11370.00 frames. ], tot_loss[loss=2.892, ArTop10Accuracy=0.753, over 11381.86 frames. ], batch size: 18, lr: 1.61e-02
+ 2024-08-06 10:06:25,112 INFO [trainer.py:765] (6/8) Epoch 7, batch 700, train_loss[loss=2.794, ArTop10Accuracy=0.7743, over 10017.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7523, over 11532.56 frames. ], batch size: 12, lr: 1.60e-02
+ 2024-08-06 10:07:46,948 INFO [trainer.py:765] (6/8) Epoch 7, batch 800, train_loss[loss=2.775, ArTop10Accuracy=0.7791, over 10104.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7525, over 11648.59 frames. ], batch size: 12, lr: 1.59e-02
+ 2024-08-06 10:09:02,822 INFO [trainer.py:765] (6/8) Epoch 7, batch 900, train_loss[loss=2.816, ArTop10Accuracy=0.7693, over 13011.00 frames. ], tot_loss[loss=2.89, ArTop10Accuracy=0.7538, over 11702.50 frames. ], batch size: 27, lr: 1.59e-02
+ 2024-08-06 10:10:19,636 INFO [trainer.py:765] (6/8) Epoch 7, batch 1000, train_loss[loss=2.9, ArTop10Accuracy=0.7526, over 12774.00 frames. ], tot_loss[loss=2.897, ArTop10Accuracy=0.7523, over 11906.77 frames. ], batch size: 27, lr: 1.58e-02
+ 2024-08-06 10:11:35,208 INFO [trainer.py:765] (6/8) Epoch 7, batch 1100, train_loss[loss=2.894, ArTop10Accuracy=0.7517, over 13650.00 frames. ], tot_loss[loss=2.902, ArTop10Accuracy=0.7512, over 11965.81 frames. ], batch size: 34, lr: 1.57e-02
+ 2024-08-06 10:12:48,204 INFO [trainer.py:765] (6/8) Epoch 7, batch 1200, train_loss[loss=3.006, ArTop10Accuracy=0.7278, over 12288.00 frames. ], tot_loss[loss=2.898, ArTop10Accuracy=0.752, over 11848.72 frames. ], batch size: 101, lr: 1.57e-02
+ 2024-08-06 10:13:46,782 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 10:15:03,600 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.283e+02 1.410e+02 1.601e+02 1.017e+03, threshold=2.820e+02, percent-clipped=0.9
+ 2024-08-06 10:15:40,820 INFO [trainer.py:765] (6/8) Epoch 8, batch 100, train_loss[loss=2.916, ArTop10Accuracy=0.7516, over 14502.00 frames. ], tot_loss[loss=2.882, ArTop10Accuracy=0.7548, over 4756.16 frames. ], batch size: 62, lr: 1.47e-02
+ 2024-08-06 10:17:12,861 INFO [trainer.py:765] (6/8) Epoch 8, batch 200, train_loss[loss=2.881, ArTop10Accuracy=0.7545, over 13539.00 frames. ], tot_loss[loss=2.873, ArTop10Accuracy=0.7563, over 7759.48 frames. ], batch size: 34, lr: 1.46e-02
+ 2024-08-06 10:18:37,898 INFO [trainer.py:765] (6/8) Epoch 8, batch 300, train_loss[loss=2.96, ArTop10Accuracy=0.7408, over 13821.00 frames. ], tot_loss[loss=2.871, ArTop10Accuracy=0.7571, over 9369.67 frames. ], batch size: 44, lr: 1.46e-02
+ 2024-08-06 10:20:06,341 INFO [trainer.py:765] (6/8) Epoch 8, batch 400, train_loss[loss=2.908, ArTop10Accuracy=0.7466, over 10806.00 frames. ], tot_loss[loss=2.868, ArTop10Accuracy=0.7576, over 10268.57 frames. ], batch size: 15, lr: 1.45e-02
+ 2024-08-06 10:21:32,411 INFO [trainer.py:765] (6/8) Epoch 8, batch 500, train_loss[loss=2.894, ArTop10Accuracy=0.7548, over 12519.00 frames. ], tot_loss[loss=2.863, ArTop10Accuracy=0.7589, over 10821.20 frames. ], batch size: 23, lr: 1.45e-02
+ 2024-08-06 10:23:00,974 INFO [trainer.py:765] (6/8) Epoch 8, batch 600, train_loss[loss=2.724, ArTop10Accuracy=0.786, over 11475.00 frames. ], tot_loss[loss=2.864, ArTop10Accuracy=0.7585, over 11354.19 frames. ], batch size: 18, lr: 1.44e-02
+ 2024-08-06 10:24:37,788 INFO [trainer.py:765] (6/8) Epoch 8, batch 700, train_loss[loss=2.836, ArTop10Accuracy=0.7659, over 10110.00 frames. ], tot_loss[loss=2.87, ArTop10Accuracy=0.7572, over 11488.52 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:25:56,085 INFO [trainer.py:765] (6/8) Epoch 8, batch 800, train_loss[loss=2.815, ArTop10Accuracy=0.7729, over 10086.00 frames. ], tot_loss[loss=2.873, ArTop10Accuracy=0.7568, over 11614.89 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:27:12,245 INFO [trainer.py:765] (6/8) Epoch 8, batch 900, train_loss[loss=2.778, ArTop10Accuracy=0.7739, over 13353.00 frames. ], tot_loss[loss=2.868, ArTop10Accuracy=0.758, over 11678.16 frames. ], batch size: 28, lr: 1.42e-02
+ 2024-08-06 10:28:25,263 INFO [trainer.py:765] (6/8) Epoch 8, batch 1000, train_loss[loss=2.856, ArTop10Accuracy=0.7623, over 12663.00 frames. ], tot_loss[loss=2.871, ArTop10Accuracy=0.7573, over 11880.24 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:29:07,155 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 10:29:16,831 INFO [trainer.py:811] (6/8) Epoch 8, validation: loss=2.858, ArTop10Accuracy=0.7594, over 1827537.00 frames.
+ 2024-08-06 10:29:16,831 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 29654MB
+ 2024-08-06 10:29:17,491 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.275e+02 1.390e+02 1.547e+02 3.717e+02, threshold=2.781e+02, percent-clipped=0.7
+ 2024-08-06 10:29:51,731 INFO [trainer.py:765] (6/8) Epoch 8, batch 1100, train_loss[loss=2.932, ArTop10Accuracy=0.7465, over 13584.00 frames. ], tot_loss[loss=2.877, ArTop10Accuracy=0.7559, over 11946.18 frames. ], batch size: 34, lr: 1.41e-02
+ 2024-08-06 10:31:05,946 INFO [trainer.py:765] (6/8) Epoch 8, batch 1200, train_loss[loss=2.927, ArTop10Accuracy=0.7476, over 12045.00 frames. ], tot_loss[loss=2.877, ArTop10Accuracy=0.7557, over 11857.21 frames. ], batch size: 101, lr: 1.40e-02
+ 2024-08-06 10:32:05,758 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 10:34:01,256 INFO [trainer.py:765] (6/8) Epoch 9, batch 100, train_loss[loss=2.9, ArTop10Accuracy=0.7515, over 14604.00 frames. ], tot_loss[loss=2.86, ArTop10Accuracy=0.7584, over 4783.71 frames. ], batch size: 62, lr: 1.32e-02
+ 2024-08-06 10:35:31,773 INFO [trainer.py:765] (6/8) Epoch 9, batch 200, train_loss[loss=2.829, ArTop10Accuracy=0.7662, over 13380.00 frames. ], tot_loss[loss=2.851, ArTop10Accuracy=0.7607, over 7756.80 frames. ], batch size: 34, lr: 1.32e-02
+ 2024-08-06 10:36:57,928 INFO [trainer.py:765] (6/8) Epoch 9, batch 300, train_loss[loss=2.847, ArTop10Accuracy=0.7628, over 14202.00 frames. ], tot_loss[loss=2.847, ArTop10Accuracy=0.7615, over 9387.52 frames. ], batch size: 44, lr: 1.31e-02
+ 2024-08-06 10:38:32,697 INFO [trainer.py:765] (6/8) Epoch 9, batch 400, train_loss[loss=2.716, ArTop10Accuracy=0.7936, over 10851.00 frames. ], tot_loss[loss=2.846, ArTop10Accuracy=0.7619, over 10294.73 frames. ], batch size: 15, lr: 1.31e-02
+ 2024-08-06 10:39:59,256 INFO [trainer.py:765] (6/8) Epoch 9, batch 500, train_loss[loss=2.8, ArTop10Accuracy=0.7729, over 12210.00 frames. ], tot_loss[loss=2.842, ArTop10Accuracy=0.7626, over 10867.43 frames. ], batch size: 22, lr: 1.30e-02
+ 2024-08-06 10:41:29,691 INFO [trainer.py:765] (6/8) Epoch 9, batch 600, train_loss[loss=2.781, ArTop10Accuracy=0.7675, over 11394.00 frames. ], tot_loss[loss=2.844, ArTop10Accuracy=0.7622, over 11375.14 frames. ], batch size: 18, lr: 1.30e-02
+ 2024-08-06 10:42:58,441 INFO [trainer.py:765] (6/8) Epoch 9, batch 700, train_loss[loss=2.918, ArTop10Accuracy=0.7417, over 9318.00 frames. ], tot_loss[loss=2.848, ArTop10Accuracy=0.7614, over 11500.36 frames. ], batch size: 11, lr: 1.29e-02
+ 2024-08-06 10:44:02,952 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.253e+02 1.352e+02 1.493e+02 7.010e+02, threshold=2.704e+02, percent-clipped=0.6
+ 2024-08-06 10:44:19,670 INFO [trainer.py:765] (6/8) Epoch 9, batch 800, train_loss[loss=2.715, ArTop10Accuracy=0.7933, over 10152.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7614, over 11616.55 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:45:35,720 INFO [trainer.py:765] (6/8) Epoch 9, batch 900, train_loss[loss=2.749, ArTop10Accuracy=0.7809, over 13059.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.7623, over 11671.85 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:46:51,272 INFO [trainer.py:765] (6/8) Epoch 9, batch 1000, train_loss[loss=2.793, ArTop10Accuracy=0.7717, over 13047.00 frames. ], tot_loss[loss=2.848, ArTop10Accuracy=0.7618, over 11855.39 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:48:06,247 INFO [trainer.py:765] (6/8) Epoch 9, batch 1100, train_loss[loss=2.91, ArTop10Accuracy=0.7485, over 13737.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7603, over 11939.27 frames. ], batch size: 34, lr: 1.28e-02
+ 2024-08-06 10:49:21,054 INFO [trainer.py:765] (6/8) Epoch 9, batch 1200, train_loss[loss=2.971, ArTop10Accuracy=0.7364, over 11847.00 frames. ], tot_loss[loss=2.857, ArTop10Accuracy=0.7598, over 11871.90 frames. ], batch size: 101, lr: 1.27e-02
+ 2024-08-06 10:50:22,739 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 10:52:12,325 INFO [trainer.py:765] (6/8) Epoch 10, batch 100, train_loss[loss=2.875, ArTop10Accuracy=0.7539, over 14637.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.762, over 4761.86 frames. ], batch size: 63, lr: 1.20e-02
+ 2024-08-06 10:53:44,585 INFO [trainer.py:765] (6/8) Epoch 10, batch 200, train_loss[loss=2.909, ArTop10Accuracy=0.7444, over 13647.00 frames. ], tot_loss[loss=2.835, ArTop10Accuracy=0.7635, over 7759.99 frames. ], batch size: 34, lr: 1.20e-02
+ 2024-08-06 10:55:08,089 INFO [trainer.py:765] (6/8) Epoch 10, batch 300, train_loss[loss=2.861, ArTop10Accuracy=0.761, over 14100.00 frames. ], tot_loss[loss=2.827, ArTop10Accuracy=0.765, over 9360.91 frames. ], batch size: 44, lr: 1.19e-02
+ 2024-08-06 10:56:41,175 INFO [trainer.py:765] (6/8) Epoch 10, batch 400, train_loss[loss=2.692, ArTop10Accuracy=0.7867, over 10296.00 frames. ], tot_loss[loss=2.823, ArTop10Accuracy=0.766, over 10289.78 frames. ], batch size: 14, lr: 1.19e-02
+ 2024-08-06 10:58:04,937 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 10:58:14,559 INFO [trainer.py:811] (6/8) Epoch 10, validation: loss=2.842, ArTop10Accuracy=0.7624, over 1827537.00 frames.
+ 2024-08-06 10:58:14,560 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 29654MB
+ 2024-08-06 10:58:15,572 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.228e+02 1.320e+02 1.458e+02 6.096e+02, threshold=2.641e+02, percent-clipped=0.6
+ 2024-08-06 10:58:15,576 INFO [trainer.py:765] (6/8) Epoch 10, batch 500, train_loss[loss=2.872, ArTop10Accuracy=0.7575, over 12288.00 frames. ], tot_loss[loss=2.823, ArTop10Accuracy=0.7662, over 10829.88 frames. ], batch size: 22, lr: 1.19e-02
+ 2024-08-06 10:59:42,814 INFO [trainer.py:765] (6/8) Epoch 10, batch 600, train_loss[loss=2.743, ArTop10Accuracy=0.776, over 11544.00 frames. ], tot_loss[loss=2.825, ArTop10Accuracy=0.7657, over 11345.47 frames. ], batch size: 18, lr: 1.18e-02
+ 2024-08-06 11:01:18,107 INFO [trainer.py:765] (6/8) Epoch 10, batch 700, train_loss[loss=2.692, ArTop10Accuracy=0.7882, over 10173.00 frames. ], tot_loss[loss=2.828, ArTop10Accuracy=0.765, over 11518.39 frames. ], batch size: 12, lr: 1.18e-02
+ 2024-08-06 11:02:36,917 INFO [trainer.py:765] (6/8) Epoch 10, batch 800, train_loss[loss=2.795, ArTop10Accuracy=0.7671, over 9525.00 frames. ], tot_loss[loss=2.83, ArTop10Accuracy=0.7647, over 11631.64 frames. ], batch size: 11, lr: 1.17e-02
+ 2024-08-06 11:03:51,211 INFO [trainer.py:765] (6/8) Epoch 10, batch 900, train_loss[loss=2.774, ArTop10Accuracy=0.7739, over 12909.00 frames. ], tot_loss[loss=2.828, ArTop10Accuracy=0.7651, over 11689.97 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:05:06,351 INFO [trainer.py:765] (6/8) Epoch 10, batch 1000, train_loss[loss=2.859, ArTop10Accuracy=0.7609, over 12915.00 frames. ], tot_loss[loss=2.834, ArTop10Accuracy=0.7639, over 11887.16 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:06:21,722 INFO [trainer.py:765] (6/8) Epoch 10, batch 1100, train_loss[loss=2.86, ArTop10Accuracy=0.7558, over 13509.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7633, over 11950.67 frames. ], batch size: 34, lr: 1.16e-02
+ 2024-08-06 11:07:34,772 INFO [trainer.py:765] (6/8) Epoch 10, batch 1200, train_loss[loss=2.952, ArTop10Accuracy=0.7372, over 12240.00 frames. ], tot_loss[loss=2.841, ArTop10Accuracy=0.7625, over 11878.92 frames. ], batch size: 102, lr: 1.16e-02
+ 2024-08-06 11:08:33,387 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 11:10:29,954 INFO [trainer.py:765] (6/8) Epoch 11, batch 100, train_loss[loss=2.866, ArTop10Accuracy=0.7539, over 14262.00 frames. ], tot_loss[loss=2.82, ArTop10Accuracy=0.7664, over 4767.89 frames. ], batch size: 62, lr: 1.10e-02
+ 2024-08-06 11:12:04,673 INFO [trainer.py:765] (6/8) Epoch 11, batch 200, train_loss[loss=2.809, ArTop10Accuracy=0.7704, over 13536.00 frames. ], tot_loss[loss=2.812, ArTop10Accuracy=0.7679, over 7753.99 frames. ], batch size: 34, lr: 1.10e-02
+ 2024-08-06 11:12:22,825 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 9.884e+01 1.240e+02 1.333e+02 1.457e+02 6.939e+02, threshold=2.667e+02, percent-clipped=0.1
+ 2024-08-06 11:13:31,547 INFO [trainer.py:765] (6/8) Epoch 11, batch 300, train_loss[loss=2.827, ArTop10Accuracy=0.7662, over 13899.00 frames. ], tot_loss[loss=2.808, ArTop10Accuracy=0.7685, over 9366.37 frames. ], batch size: 44, lr: 1.09e-02
+ 2024-08-06 11:15:03,268 INFO [trainer.py:765] (6/8) Epoch 11, batch 400, train_loss[loss=2.802, ArTop10Accuracy=0.7696, over 10353.00 frames. ], tot_loss[loss=2.807, ArTop10Accuracy=0.7689, over 10271.71 frames. ], batch size: 14, lr: 1.09e-02
+ 2024-08-06 11:16:29,636 INFO [trainer.py:765] (6/8) Epoch 11, batch 500, train_loss[loss=2.814, ArTop10Accuracy=0.7663, over 12087.00 frames. ], tot_loss[loss=2.801, ArTop10Accuracy=0.7703, over 10832.24 frames. ], batch size: 22, lr: 1.09e-02
+ 2024-08-06 11:18:00,516 INFO [trainer.py:765] (6/8) Epoch 11, batch 600, train_loss[loss=2.82, ArTop10Accuracy=0.7657, over 11532.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7695, over 11361.59 frames. ], batch size: 18, lr: 1.08e-02
+ 2024-08-06 11:19:34,513 INFO [trainer.py:765] (6/8) Epoch 11, batch 700, train_loss[loss=2.577, ArTop10Accuracy=0.8116, over 10242.00 frames. ], tot_loss[loss=2.806, ArTop10Accuracy=0.7693, over 11536.07 frames. ], batch size: 12, lr: 1.08e-02
+ 2024-08-06 11:20:55,482 INFO [trainer.py:765] (6/8) Epoch 11, batch 800, train_loss[loss=2.719, ArTop10Accuracy=0.7877, over 9489.00 frames. ], tot_loss[loss=2.811, ArTop10Accuracy=0.7682, over 11637.50 frames. ], batch size: 11, lr: 1.07e-02
+ 2024-08-06 11:22:13,704 INFO [trainer.py:765] (6/8) Epoch 11, batch 900, train_loss[loss=2.831, ArTop10Accuracy=0.7682, over 12921.00 frames. ], tot_loss[loss=2.809, ArTop10Accuracy=0.7688, over 11674.79 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:23:31,798 INFO [trainer.py:765] (6/8) Epoch 11, batch 1000, train_loss[loss=2.826, ArTop10Accuracy=0.7626, over 12708.00 frames. ], tot_loss[loss=2.816, ArTop10Accuracy=0.7673, over 11858.76 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:24:46,901 INFO [trainer.py:765] (6/8) Epoch 11, batch 1100, train_loss[loss=2.864, ArTop10Accuracy=0.759, over 13605.00 frames. ], tot_loss[loss=2.826, ArTop10Accuracy=0.7654, over 11944.73 frames. ], batch size: 34, lr: 1.06e-02
+ 2024-08-06 11:26:00,732 INFO [trainer.py:765] (6/8) Epoch 11, batch 1200, train_loss[loss=2.94, ArTop10Accuracy=0.7391, over 12612.00 frames. ], tot_loss[loss=2.826, ArTop10Accuracy=0.7655, over 11881.65 frames. ], batch size: 101, lr: 1.06e-02
+ 2024-08-06 11:26:15,846 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 11:26:25,556 INFO [trainer.py:811] (6/8) Epoch 11, validation: loss=2.831, ArTop10Accuracy=0.7643, over 1827537.00 frames.
+ 2024-08-06 11:26:25,557 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 30358MB
+ 2024-08-06 11:26:26,184 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.251e+02 1.335e+02 1.441e+02 2.942e+02, threshold=2.669e+02, percent-clipped=0.1
+ 2024-08-06 11:27:09,581 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 11:29:03,451 INFO [trainer.py:765] (6/8) Epoch 12, batch 100, train_loss[loss=2.865, ArTop10Accuracy=0.7524, over 14409.00 frames. ], tot_loss[loss=2.809, ArTop10Accuracy=0.7679, over 4752.94 frames. ], batch size: 62, lr: 1.01e-02
+ 2024-08-06 11:30:30,674 INFO [trainer.py:765] (6/8) Epoch 12, batch 200, train_loss[loss=2.826, ArTop10Accuracy=0.7657, over 13659.00 frames. ], tot_loss[loss=2.801, ArTop10Accuracy=0.7698, over 7751.77 frames. ], batch size: 34, lr: 1.01e-02
+ 2024-08-06 11:31:57,655 INFO [trainer.py:765] (6/8) Epoch 12, batch 300, train_loss[loss=2.881, ArTop10Accuracy=0.7502, over 14427.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7709, over 9386.69 frames. ], batch size: 45, lr: 1.01e-02
+ 2024-08-06 11:33:30,738 INFO [trainer.py:765] (6/8) Epoch 12, batch 400, train_loss[loss=2.72, ArTop10Accuracy=0.7858, over 10209.00 frames. ], tot_loss[loss=2.791, ArTop10Accuracy=0.7717, over 10294.20 frames. ], batch size: 14, lr: 1.00e-02
+ 2024-08-06 11:34:55,733 INFO [trainer.py:765] (6/8) Epoch 12, batch 500, train_loss[loss=2.798, ArTop10Accuracy=0.7727, over 12255.00 frames. ], tot_loss[loss=2.787, ArTop10Accuracy=0.7727, over 10843.06 frames. ], batch size: 22, lr: 1.00e-02
+ 2024-08-06 11:36:29,361 INFO [trainer.py:765] (6/8) Epoch 12, batch 600, train_loss[loss=2.749, ArTop10Accuracy=0.7803, over 12000.00 frames. ], tot_loss[loss=2.79, ArTop10Accuracy=0.7723, over 11370.61 frames. ], batch size: 19, lr: 9.97e-03
+ 2024-08-06 11:38:00,344 INFO [trainer.py:765] (6/8) Epoch 12, batch 700, train_loss[loss=2.655, ArTop10Accuracy=0.8034, over 10221.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7707, over 11515.54 frames. ], batch size: 12, lr: 9.93e-03
+ 2024-08-06 11:39:23,611 INFO [trainer.py:765] (6/8) Epoch 12, batch 800, train_loss[loss=2.715, ArTop10Accuracy=0.7911, over 10062.00 frames. ], tot_loss[loss=2.8, ArTop10Accuracy=0.77, over 11640.74 frames. ], batch size: 12, lr: 9.90e-03
+ 2024-08-06 11:40:39,889 INFO [trainer.py:765] (6/8) Epoch 12, batch 900, train_loss[loss=2.857, ArTop10Accuracy=0.756, over 12738.00 frames. ], tot_loss[loss=2.797, ArTop10Accuracy=0.7708, over 11687.97 frames. ], batch size: 27, lr: 9.87e-03
+ 2024-08-06 11:41:13,995 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.248e+02 1.348e+02 1.459e+02 5.540e+02, threshold=2.695e+02, percent-clipped=0.3
+ 2024-08-06 11:41:56,188 INFO [trainer.py:765] (6/8) Epoch 12, batch 1000, train_loss[loss=2.799, ArTop10Accuracy=0.7682, over 12924.00 frames. ], tot_loss[loss=2.799, ArTop10Accuracy=0.7705, over 11894.35 frames. ], batch size: 27, lr: 9.85e-03
+ 2024-08-06 11:43:14,320 INFO [trainer.py:765] (6/8) Epoch 12, batch 1100, train_loss[loss=2.824, ArTop10Accuracy=0.7649, over 13695.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7697, over 11970.40 frames. ], batch size: 34, lr: 9.82e-03
+ 2024-08-06 11:44:26,156 INFO [trainer.py:765] (6/8) Epoch 12, batch 1200, train_loss[loss=2.943, ArTop10Accuracy=0.7438, over 11631.00 frames. ], tot_loss[loss=2.806, ArTop10Accuracy=0.7694, over 11876.83 frames. ], batch size: 101, lr: 9.79e-03
+ 2024-08-06 11:45:26,431 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 11:47:26,600 INFO [trainer.py:765] (6/8) Epoch 13, batch 100, train_loss[loss=2.817, ArTop10Accuracy=0.7696, over 14379.00 frames. ], tot_loss[loss=2.798, ArTop10Accuracy=0.77, over 4763.63 frames. ], batch size: 62, lr: 9.37e-03
+ 2024-08-06 11:48:54,778 INFO [trainer.py:765] (6/8) Epoch 13, batch 200, train_loss[loss=2.781, ArTop10Accuracy=0.78, over 13704.00 frames. ], tot_loss[loss=2.788, ArTop10Accuracy=0.7723, over 7763.99 frames. ], batch size: 34, lr: 9.34e-03
+ 2024-08-06 11:50:20,515 INFO [trainer.py:765] (6/8) Epoch 13, batch 300, train_loss[loss=2.859, ArTop10Accuracy=0.7575, over 14160.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7741, over 9394.50 frames. ], batch size: 44, lr: 9.31e-03
+ 2024-08-06 11:51:48,764 INFO [trainer.py:765] (6/8) Epoch 13, batch 400, train_loss[loss=2.823, ArTop10Accuracy=0.7634, over 10341.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7748, over 10317.47 frames. ], batch size: 14, lr: 9.28e-03
+ 2024-08-06 11:53:13,406 INFO [trainer.py:765] (6/8) Epoch 13, batch 500, train_loss[loss=2.745, ArTop10Accuracy=0.7847, over 12111.00 frames. ], tot_loss[loss=2.774, ArTop10Accuracy=0.7749, over 10867.95 frames. ], batch size: 22, lr: 9.26e-03
+ 2024-08-06 11:54:52,222 INFO [trainer.py:765] (6/8) Epoch 13, batch 600, train_loss[loss=2.773, ArTop10Accuracy=0.7749, over 11472.00 frames. ], tot_loss[loss=2.781, ArTop10Accuracy=0.7737, over 11389.84 frames. ], batch size: 18, lr: 9.23e-03
213
+ 2024-08-06 11:55:47,079 INFO [trainer.py:803] (6/8) Computing validation loss
214
+ 2024-08-06 11:55:56,834 INFO [trainer.py:811] (6/8) Epoch 13, validation: loss=2.824, ArTop10Accuracy=0.7662, over 1827537.00 frames.
215
+ 2024-08-06 11:55:56,835 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 32996MB
216
+ 2024-08-06 11:55:57,712 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.343e+02 1.452e+02 4.888e+02, threshold=2.687e+02, percent-clipped=0.1
217
+ 2024-08-06 11:56:28,464 INFO [trainer.py:765] (6/8) Epoch 13, batch 700, train_loss[loss=2.755, ArTop10Accuracy=0.7743, over 9240.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7732, over 11529.27 frames. ], batch size: 11, lr: 9.20e-03
218
+ 2024-08-06 11:57:46,683 INFO [trainer.py:765] (6/8) Epoch 13, batch 800, train_loss[loss=2.678, ArTop10Accuracy=0.7995, over 10161.00 frames. ], tot_loss[loss=2.787, ArTop10Accuracy=0.7728, over 11666.26 frames. ], batch size: 12, lr: 9.18e-03
219
+ 2024-08-06 11:59:03,286 INFO [trainer.py:765] (6/8) Epoch 13, batch 900, train_loss[loss=2.735, ArTop10Accuracy=0.7828, over 12972.00 frames. ], tot_loss[loss=2.78, ArTop10Accuracy=0.7743, over 11696.54 frames. ], batch size: 27, lr: 9.15e-03
220
+ 2024-08-06 12:00:19,174 INFO [trainer.py:765] (6/8) Epoch 13, batch 1000, train_loss[loss=2.802, ArTop10Accuracy=0.7653, over 13188.00 frames. ], tot_loss[loss=2.784, ArTop10Accuracy=0.7734, over 11875.95 frames. ], batch size: 28, lr: 9.13e-03
221
+ 2024-08-06 12:01:34,881 INFO [trainer.py:765] (6/8) Epoch 13, batch 1100, train_loss[loss=2.776, ArTop10Accuracy=0.7804, over 13716.00 frames. ], tot_loss[loss=2.791, ArTop10Accuracy=0.7721, over 11956.17 frames. ], batch size: 34, lr: 9.10e-03
222
+ 2024-08-06 12:02:48,662 INFO [trainer.py:765] (6/8) Epoch 13, batch 1200, train_loss[loss=2.953, ArTop10Accuracy=0.7397, over 13203.00 frames. ], tot_loss[loss=2.793, ArTop10Accuracy=0.7716, over 11874.72 frames. ], batch size: 101, lr: 9.08e-03
223
+ 2024-08-06 12:03:48,490 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
224
+ 2024-08-06 12:05:45,334 INFO [trainer.py:765] (6/8) Epoch 14, batch 100, train_loss[loss=2.79, ArTop10Accuracy=0.7743, over 14433.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7745, over 4759.85 frames. ], batch size: 62, lr: 8.71e-03
225
+ 2024-08-06 12:07:16,604 INFO [trainer.py:765] (6/8) Epoch 14, batch 200, train_loss[loss=2.769, ArTop10Accuracy=0.779, over 13692.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7753, over 7748.44 frames. ], batch size: 34, lr: 8.69e-03
226
+ 2024-08-06 12:08:44,311 INFO [trainer.py:765] (6/8) Epoch 14, batch 300, train_loss[loss=2.83, ArTop10Accuracy=0.7658, over 14106.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.7766, over 9362.74 frames. ], batch size: 44, lr: 8.66e-03
227
+ 2024-08-06 12:10:01,130 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.266e+02 1.374e+02 1.483e+02 6.480e+02, threshold=2.748e+02, percent-clipped=0.2
228
+ 2024-08-06 12:10:10,227 INFO [trainer.py:765] (6/8) Epoch 14, batch 400, train_loss[loss=2.692, ArTop10Accuracy=0.7942, over 10272.00 frames. ], tot_loss[loss=2.76, ArTop10Accuracy=0.7778, over 10277.08 frames. ], batch size: 14, lr: 8.64e-03
229
+ 2024-08-06 12:11:36,151 INFO [trainer.py:765] (6/8) Epoch 14, batch 500, train_loss[loss=2.784, ArTop10Accuracy=0.774, over 12231.00 frames. ], tot_loss[loss=2.76, ArTop10Accuracy=0.7779, over 10848.71 frames. ], batch size: 22, lr: 8.62e-03
230
+ 2024-08-06 12:13:05,994 INFO [trainer.py:765] (6/8) Epoch 14, batch 600, train_loss[loss=2.734, ArTop10Accuracy=0.782, over 11403.00 frames. ], tot_loss[loss=2.761, ArTop10Accuracy=0.7779, over 11369.47 frames. ], batch size: 18, lr: 8.59e-03
231
+ 2024-08-06 12:14:38,553 INFO [trainer.py:765] (6/8) Epoch 14, batch 700, train_loss[loss=2.72, ArTop10Accuracy=0.7841, over 9363.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.777, over 11518.27 frames. ], batch size: 11, lr: 8.57e-03
232
+ 2024-08-06 12:15:58,070 INFO [trainer.py:765] (6/8) Epoch 14, batch 800, train_loss[loss=2.736, ArTop10Accuracy=0.7816, over 10083.00 frames. ], tot_loss[loss=2.769, ArTop10Accuracy=0.7761, over 11641.03 frames. ], batch size: 12, lr: 8.55e-03
233
+ 2024-08-06 12:17:12,866 INFO [trainer.py:765] (6/8) Epoch 14, batch 900, train_loss[loss=2.893, ArTop10Accuracy=0.7504, over 12873.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7767, over 11702.17 frames. ], batch size: 27, lr: 8.52e-03
234
+ 2024-08-06 12:18:29,614 INFO [trainer.py:765] (6/8) Epoch 14, batch 1000, train_loss[loss=2.769, ArTop10Accuracy=0.775, over 12948.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7759, over 11886.54 frames. ], batch size: 27, lr: 8.50e-03
235
+ 2024-08-06 12:19:45,377 INFO [trainer.py:765] (6/8) Epoch 14, batch 1100, train_loss[loss=2.792, ArTop10Accuracy=0.7721, over 13494.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7743, over 11923.88 frames. ], batch size: 34, lr: 8.48e-03
236
+ 2024-08-06 12:20:59,279 INFO [trainer.py:765] (6/8) Epoch 14, batch 1200, train_loss[loss=2.957, ArTop10Accuracy=0.7376, over 13002.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7743, over 11856.84 frames. ], batch size: 104, lr: 8.46e-03
+ 2024-08-06 12:21:58,343 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 12:23:51,961 INFO [trainer.py:765] (6/8) Epoch 15, batch 100, train_loss[loss=2.806, ArTop10Accuracy=0.7697, over 14721.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.7764, over 4782.17 frames. ], batch size: 62, lr: 8.14e-03
+ 2024-08-06 12:24:00,599 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 12:24:10,290 INFO [trainer.py:811] (6/8) Epoch 15, validation: loss=2.819, ArTop10Accuracy=0.7675, over 1827537.00 frames.
+ 2024-08-06 12:24:10,291 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 32996MB
+ 2024-08-06 12:24:11,094 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.284e+02 1.371e+02 1.488e+02 4.667e+02, threshold=2.743e+02, percent-clipped=0.2
+ 2024-08-06 12:25:29,987 INFO [trainer.py:765] (6/8) Epoch 15, batch 200, train_loss[loss=2.735, ArTop10Accuracy=0.784, over 13680.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7782, over 7762.01 frames. ], batch size: 34, lr: 8.12e-03
+ 2024-08-06 12:26:58,695 INFO [trainer.py:765] (6/8) Epoch 15, batch 300, train_loss[loss=2.759, ArTop10Accuracy=0.778, over 14037.00 frames. ], tot_loss[loss=2.754, ArTop10Accuracy=0.7789, over 9372.04 frames. ], batch size: 44, lr: 8.09e-03
+ 2024-08-06 12:28:28,533 INFO [trainer.py:765] (6/8) Epoch 15, batch 400, train_loss[loss=2.646, ArTop10Accuracy=0.7968, over 10359.00 frames. ], tot_loss[loss=2.752, ArTop10Accuracy=0.7791, over 10297.47 frames. ], batch size: 14, lr: 8.07e-03
+ 2024-08-06 12:29:54,031 INFO [trainer.py:765] (6/8) Epoch 15, batch 500, train_loss[loss=2.737, ArTop10Accuracy=0.7873, over 12321.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7803, over 10856.16 frames. ], batch size: 22, lr: 8.05e-03
+ 2024-08-06 12:31:23,292 INFO [trainer.py:765] (6/8) Epoch 15, batch 600, train_loss[loss=2.682, ArTop10Accuracy=0.7952, over 11307.00 frames. ], tot_loss[loss=2.749, ArTop10Accuracy=0.7797, over 11384.85 frames. ], batch size: 18, lr: 8.03e-03
+ 2024-08-06 12:32:53,175 INFO [trainer.py:765] (6/8) Epoch 15, batch 700, train_loss[loss=2.718, ArTop10Accuracy=0.7914, over 10119.00 frames. ], tot_loss[loss=2.752, ArTop10Accuracy=0.7794, over 11527.71 frames. ], batch size: 12, lr: 8.01e-03
+ 2024-08-06 12:34:18,254 INFO [trainer.py:765] (6/8) Epoch 15, batch 800, train_loss[loss=2.788, ArTop10Accuracy=0.7694, over 10101.00 frames. ], tot_loss[loss=2.756, ArTop10Accuracy=0.7787, over 11637.91 frames. ], batch size: 12, lr: 7.99e-03
+ 2024-08-06 12:35:34,726 INFO [trainer.py:765] (6/8) Epoch 15, batch 900, train_loss[loss=2.783, ArTop10Accuracy=0.7685, over 13059.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7798, over 11686.68 frames. ], batch size: 27, lr: 7.97e-03
+ 2024-08-06 12:36:50,540 INFO [trainer.py:765] (6/8) Epoch 15, batch 1000, train_loss[loss=2.65, ArTop10Accuracy=0.8008, over 12771.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.779, over 11872.69 frames. ], batch size: 27, lr: 7.95e-03
+ 2024-08-06 12:38:05,178 INFO [trainer.py:765] (6/8) Epoch 15, batch 1100, train_loss[loss=2.793, ArTop10Accuracy=0.7725, over 13560.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.777, over 11943.50 frames. ], batch size: 34, lr: 7.93e-03
+ 2024-08-06 12:38:12,841 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.293e+02 1.379e+02 1.467e+02 2.824e+02, threshold=2.759e+02, percent-clipped=0.1
+ 2024-08-06 12:39:18,788 INFO [trainer.py:765] (6/8) Epoch 15, batch 1200, train_loss[loss=2.871, ArTop10Accuracy=0.7557, over 12357.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7769, over 11881.76 frames. ], batch size: 101, lr: 7.91e-03
+ 2024-08-06 12:40:18,514 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 12:42:17,618 INFO [trainer.py:765] (6/8) Epoch 16, batch 100, train_loss[loss=2.841, ArTop10Accuracy=0.7601, over 14646.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7803, over 4751.87 frames. ], batch size: 62, lr: 7.63e-03
+ 2024-08-06 12:43:49,564 INFO [trainer.py:765] (6/8) Epoch 16, batch 200, train_loss[loss=2.797, ArTop10Accuracy=0.7643, over 13659.00 frames. ], tot_loss[loss=2.74, ArTop10Accuracy=0.7815, over 7742.92 frames. ], batch size: 34, lr: 7.61e-03
+ 2024-08-06 12:45:18,501 INFO [trainer.py:765] (6/8) Epoch 16, batch 300, train_loss[loss=2.836, ArTop10Accuracy=0.7623, over 14430.00 frames. ], tot_loss[loss=2.739, ArTop10Accuracy=0.7816, over 9375.84 frames. ], batch size: 44, lr: 7.59e-03
+ 2024-08-06 12:46:45,208 INFO [trainer.py:765] (6/8) Epoch 16, batch 400, train_loss[loss=2.658, ArTop10Accuracy=0.8016, over 10086.00 frames. ], tot_loss[loss=2.739, ArTop10Accuracy=0.7816, over 10295.80 frames. ], batch size: 14, lr: 7.58e-03
+ 2024-08-06 12:48:16,310 INFO [trainer.py:765] (6/8) Epoch 16, batch 500, train_loss[loss=2.604, ArTop10Accuracy=0.8112, over 12246.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7829, over 10853.76 frames. ], batch size: 22, lr: 7.56e-03
+ 2024-08-06 12:49:46,641 INFO [trainer.py:765] (6/8) Epoch 16, batch 600, train_loss[loss=2.725, ArTop10Accuracy=0.7851, over 11598.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.782, over 11364.55 frames. ], batch size: 18, lr: 7.54e-03
+ 2024-08-06 12:51:23,681 INFO [trainer.py:765] (6/8) Epoch 16, batch 700, train_loss[loss=2.59, ArTop10Accuracy=0.8143, over 10077.00 frames. ], tot_loss[loss=2.742, ArTop10Accuracy=0.781, over 11512.75 frames. ], batch size: 12, lr: 7.52e-03
+ 2024-08-06 12:52:43,500 INFO [trainer.py:765] (6/8) Epoch 16, batch 800, train_loss[loss=2.67, ArTop10Accuracy=0.7927, over 9582.00 frames. ], tot_loss[loss=2.746, ArTop10Accuracy=0.7802, over 11628.57 frames. ], batch size: 11, lr: 7.51e-03
+ 2024-08-06 12:53:06,015 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 12:53:15,497 INFO [trainer.py:811] (6/8) Epoch 16, validation: loss=2.816, ArTop10Accuracy=0.7678, over 1827537.00 frames.
+ 2024-08-06 12:53:15,497 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 32996MB
+ 2024-08-06 12:53:16,186 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.291e+02 1.391e+02 1.487e+02 3.459e+02, threshold=2.783e+02, percent-clipped=0.1
+ 2024-08-06 12:54:06,482 INFO [trainer.py:765] (6/8) Epoch 16, batch 900, train_loss[loss=2.788, ArTop10Accuracy=0.772, over 12915.00 frames. ], tot_loss[loss=2.741, ArTop10Accuracy=0.7813, over 11680.34 frames. ], batch size: 27, lr: 7.49e-03
+ 2024-08-06 12:55:19,791 INFO [trainer.py:765] (6/8) Epoch 16, batch 1000, train_loss[loss=2.737, ArTop10Accuracy=0.7785, over 12723.00 frames. ], tot_loss[loss=2.748, ArTop10Accuracy=0.7801, over 11873.29 frames. ], batch size: 27, lr: 7.47e-03
+ 2024-08-06 12:56:33,162 INFO [trainer.py:765] (6/8) Epoch 16, batch 1100, train_loss[loss=2.79, ArTop10Accuracy=0.7745, over 13440.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7782, over 11956.98 frames. ], batch size: 34, lr: 7.45e-03
+ 2024-08-06 12:57:48,485 INFO [trainer.py:765] (6/8) Epoch 16, batch 1200, train_loss[loss=2.865, ArTop10Accuracy=0.7548, over 12549.00 frames. ], tot_loss[loss=2.755, ArTop10Accuracy=0.7788, over 11849.71 frames. ], batch size: 101, lr: 7.44e-03
+ 2024-08-06 12:58:48,420 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 13:00:47,899 INFO [trainer.py:765] (6/8) Epoch 17, batch 100, train_loss[loss=2.875, ArTop10Accuracy=0.76, over 14706.00 frames. ], tot_loss[loss=2.742, ArTop10Accuracy=0.7804, over 4765.55 frames. ], batch size: 62, lr: 7.18e-03
+ 2024-08-06 13:02:19,302 INFO [trainer.py:765] (6/8) Epoch 17, batch 200, train_loss[loss=2.777, ArTop10Accuracy=0.7749, over 13542.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7822, over 7755.64 frames. ], batch size: 34, lr: 7.17e-03
+ 2024-08-06 13:03:45,515 INFO [trainer.py:765] (6/8) Epoch 17, batch 300, train_loss[loss=2.828, ArTop10Accuracy=0.7632, over 14058.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7828, over 9393.18 frames. ], batch size: 44, lr: 7.15e-03
+ 2024-08-06 13:05:21,759 INFO [trainer.py:765] (6/8) Epoch 17, batch 400, train_loss[loss=2.758, ArTop10Accuracy=0.7779, over 10314.00 frames. ], tot_loss[loss=2.73, ArTop10Accuracy=0.7832, over 10294.26 frames. ], batch size: 14, lr: 7.14e-03
+ 2024-08-06 13:06:47,020 INFO [trainer.py:765] (6/8) Epoch 17, batch 500, train_loss[loss=2.681, ArTop10Accuracy=0.7888, over 12366.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7843, over 10838.45 frames. ], batch size: 22, lr: 7.12e-03
+ 2024-08-06 13:07:39,878 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.293e+02 1.386e+02 1.488e+02 3.253e+02, threshold=2.772e+02, percent-clipped=0.1
+ 2024-08-06 13:08:22,687 INFO [trainer.py:765] (6/8) Epoch 17, batch 600, train_loss[loss=2.668, ArTop10Accuracy=0.794, over 11466.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7836, over 11379.70 frames. ], batch size: 18, lr: 7.10e-03
+ 2024-08-06 13:09:54,834 INFO [trainer.py:765] (6/8) Epoch 17, batch 700, train_loss[loss=2.621, ArTop10Accuracy=0.8043, over 10122.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7823, over 11526.17 frames. ], batch size: 12, lr: 7.09e-03
+ 2024-08-06 13:11:19,479 INFO [trainer.py:765] (6/8) Epoch 17, batch 800, train_loss[loss=2.704, ArTop10Accuracy=0.79, over 9333.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7818, over 11640.17 frames. ], batch size: 11, lr: 7.07e-03
+ 2024-08-06 13:12:35,668 INFO [trainer.py:765] (6/8) Epoch 17, batch 900, train_loss[loss=2.734, ArTop10Accuracy=0.7787, over 12672.00 frames. ], tot_loss[loss=2.735, ArTop10Accuracy=0.7826, over 11668.05 frames. ], batch size: 27, lr: 7.06e-03
+ 2024-08-06 13:13:53,060 INFO [trainer.py:765] (6/8) Epoch 17, batch 1000, train_loss[loss=2.713, ArTop10Accuracy=0.7853, over 12852.00 frames. ], tot_loss[loss=2.74, ArTop10Accuracy=0.7818, over 11869.79 frames. ], batch size: 27, lr: 7.04e-03
+ 2024-08-06 13:15:08,483 INFO [trainer.py:765] (6/8) Epoch 17, batch 1100, train_loss[loss=2.706, ArTop10Accuracy=0.7856, over 13716.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.781, over 11962.45 frames. ], batch size: 34, lr: 7.02e-03
+ 2024-08-06 13:16:22,387 INFO [trainer.py:765] (6/8) Epoch 17, batch 1200, train_loss[loss=2.909, ArTop10Accuracy=0.7491, over 12246.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.781, over 11873.02 frames. ], batch size: 101, lr: 7.01e-03
+ 2024-08-06 13:17:22,043 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 13:19:15,993 INFO [trainer.py:765] (6/8) Epoch 18, batch 100, train_loss[loss=2.776, ArTop10Accuracy=0.7781, over 14277.00 frames. ], tot_loss[loss=2.73, ArTop10Accuracy=0.783, over 4750.54 frames. ], batch size: 62, lr: 6.78e-03
+ 2024-08-06 13:20:46,600 INFO [trainer.py:765] (6/8) Epoch 18, batch 200, train_loss[loss=2.672, ArTop10Accuracy=0.7921, over 13611.00 frames. ], tot_loss[loss=2.726, ArTop10Accuracy=0.7837, over 7748.30 frames. ], batch size: 34, lr: 6.77e-03
+ 2024-08-06 13:21:55,104 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 13:22:04,751 INFO [trainer.py:811] (6/8) Epoch 18, validation: loss=2.817, ArTop10Accuracy=0.768, over 1827537.00 frames.
+ 2024-08-06 13:22:04,752 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 32996MB
+ 2024-08-06 13:22:05,473 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.323e+02 1.409e+02 1.514e+02 3.209e+02, threshold=2.818e+02, percent-clipped=0.1
+ 2024-08-06 13:22:26,581 INFO [trainer.py:765] (6/8) Epoch 18, batch 300, train_loss[loss=2.838, ArTop10Accuracy=0.7638, over 13896.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7851, over 9377.58 frames. ], batch size: 44, lr: 6.76e-03
+ 2024-08-06 13:23:57,930 INFO [trainer.py:765] (6/8) Epoch 18, batch 400, train_loss[loss=2.641, ArTop10Accuracy=0.8014, over 10359.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7864, over 10291.94 frames. ], batch size: 14, lr: 6.74e-03
+ 2024-08-06 13:25:34,013 INFO [trainer.py:765] (6/8) Epoch 18, batch 500, train_loss[loss=2.668, ArTop10Accuracy=0.7913, over 12363.00 frames. ], tot_loss[loss=2.707, ArTop10Accuracy=0.7878, over 10844.48 frames. ], batch size: 22, lr: 6.73e-03
+ 2024-08-06 13:27:00,634 INFO [trainer.py:765] (6/8) Epoch 18, batch 600, train_loss[loss=2.764, ArTop10Accuracy=0.7797, over 11328.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7862, over 11349.84 frames. ], batch size: 18, lr: 6.71e-03
+ 2024-08-06 13:28:33,582 INFO [trainer.py:765] (6/8) Epoch 18, batch 700, train_loss[loss=2.591, ArTop10Accuracy=0.8194, over 10095.00 frames. ], tot_loss[loss=2.722, ArTop10Accuracy=0.7846, over 11508.81 frames. ], batch size: 12, lr: 6.70e-03
+ 2024-08-06 13:29:54,986 INFO [trainer.py:765] (6/8) Epoch 18, batch 800, train_loss[loss=2.675, ArTop10Accuracy=0.7975, over 10242.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.7839, over 11641.75 frames. ], batch size: 12, lr: 6.68e-03
+ 2024-08-06 13:31:12,519 INFO [trainer.py:765] (6/8) Epoch 18, batch 900, train_loss[loss=2.736, ArTop10Accuracy=0.785, over 12867.00 frames. ], tot_loss[loss=2.723, ArTop10Accuracy=0.7847, over 11684.78 frames. ], batch size: 27, lr: 6.67e-03
+ 2024-08-06 13:32:26,552 INFO [trainer.py:765] (6/8) Epoch 18, batch 1000, train_loss[loss=2.675, ArTop10Accuracy=0.7963, over 13053.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7837, over 11887.14 frames. ], batch size: 27, lr: 6.66e-03
+ 2024-08-06 13:33:41,498 INFO [trainer.py:765] (6/8) Epoch 18, batch 1100, train_loss[loss=2.771, ArTop10Accuracy=0.7765, over 13674.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7822, over 11950.20 frames. ], batch size: 34, lr: 6.64e-03
+ 2024-08-06 13:34:54,674 INFO [trainer.py:765] (6/8) Epoch 18, batch 1200, train_loss[loss=2.864, ArTop10Accuracy=0.756, over 12729.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7823, over 11875.52 frames. ], batch size: 103, lr: 6.63e-03
+ 2024-08-06 13:35:51,064 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.340e+02 1.433e+02 1.533e+02 2.444e+02, threshold=2.867e+02, percent-clipped=0.0
+ 2024-08-06 13:35:54,178 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 13:37:48,625 INFO [trainer.py:765] (6/8) Epoch 19, batch 100, train_loss[loss=2.803, ArTop10Accuracy=0.7693, over 14697.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.784, over 4767.13 frames. ], batch size: 62, lr: 6.43e-03
+ 2024-08-06 13:39:23,257 INFO [trainer.py:765] (6/8) Epoch 19, batch 200, train_loss[loss=2.731, ArTop10Accuracy=0.7833, over 13434.00 frames. ], tot_loss[loss=2.714, ArTop10Accuracy=0.7859, over 7768.01 frames. ], batch size: 34, lr: 6.41e-03
+ 2024-08-06 13:40:48,360 INFO [trainer.py:765] (6/8) Epoch 19, batch 300, train_loss[loss=2.752, ArTop10Accuracy=0.7792, over 14064.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7868, over 9391.40 frames. ], batch size: 44, lr: 6.40e-03
+ 2024-08-06 13:42:21,067 INFO [trainer.py:765] (6/8) Epoch 19, batch 400, train_loss[loss=2.602, ArTop10Accuracy=0.8087, over 10113.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.7881, over 10296.90 frames. ], batch size: 14, lr: 6.39e-03
+ 2024-08-06 13:43:44,955 INFO [trainer.py:765] (6/8) Epoch 19, batch 500, train_loss[loss=2.709, ArTop10Accuracy=0.7881, over 12324.00 frames. ], tot_loss[loss=2.702, ArTop10Accuracy=0.7884, over 10837.31 frames. ], batch size: 22, lr: 6.37e-03
+ 2024-08-06 13:45:16,682 INFO [trainer.py:765] (6/8) Epoch 19, batch 600, train_loss[loss=2.716, ArTop10Accuracy=0.7878, over 11298.00 frames. ], tot_loss[loss=2.709, ArTop10Accuracy=0.7873, over 11358.08 frames. ], batch size: 18, lr: 6.36e-03
+ 2024-08-06 13:46:48,324 INFO [trainer.py:765] (6/8) Epoch 19, batch 700, train_loss[loss=2.64, ArTop10Accuracy=0.7995, over 9423.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7867, over 11509.57 frames. ], batch size: 11, lr: 6.35e-03
+ 2024-08-06 13:48:11,884 INFO [trainer.py:765] (6/8) Epoch 19, batch 800, train_loss[loss=2.655, ArTop10Accuracy=0.7969, over 9987.00 frames. ], tot_loss[loss=2.718, ArTop10Accuracy=0.7858, over 11618.78 frames. ], batch size: 12, lr: 6.34e-03
+ 2024-08-06 13:49:27,259 INFO [trainer.py:765] (6/8) Epoch 19, batch 900, train_loss[loss=2.658, ArTop10Accuracy=0.7996, over 12774.00 frames. ], tot_loss[loss=2.717, ArTop10Accuracy=0.7859, over 11675.63 frames. ], batch size: 27, lr: 6.32e-03
+ 2024-08-06 13:50:40,655 INFO [trainer.py:803] (6/8) Computing validation loss
+ 2024-08-06 13:50:50,537 INFO [trainer.py:811] (6/8) Epoch 19, validation: loss=2.818, ArTop10Accuracy=0.7679, over 1827537.00 frames.
+ 2024-08-06 13:50:50,537 INFO [trainer.py:814] (6/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 13:50:51,489 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.371e+02 1.455e+02 1.550e+02 3.697e+02, threshold=2.909e+02, percent-clipped=0.2
+ 2024-08-06 13:50:52,915 INFO [trainer.py:765] (6/8) Epoch 19, batch 1000, train_loss[loss=2.717, ArTop10Accuracy=0.7874, over 13323.00 frames. ], tot_loss[loss=2.718, ArTop10Accuracy=0.7857, over 11862.01 frames. ], batch size: 28, lr: 6.31e-03
+ 2024-08-06 13:52:08,265 INFO [trainer.py:765] (6/8) Epoch 19, batch 1100, train_loss[loss=2.755, ArTop10Accuracy=0.7818, over 13548.00 frames. ], tot_loss[loss=2.724, ArTop10Accuracy=0.7847, over 11965.46 frames. ], batch size: 34, lr: 6.30e-03
+ 2024-08-06 13:53:22,311 INFO [trainer.py:765] (6/8) Epoch 19, batch 1200, train_loss[loss=2.826, ArTop10Accuracy=0.7675, over 12309.00 frames. ], tot_loss[loss=2.726, ArTop10Accuracy=0.7843, over 11867.08 frames. ], batch size: 101, lr: 6.28e-03
+ 2024-08-06 13:54:21,695 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 13:56:12,904 INFO [trainer.py:765] (6/8) Epoch 20, batch 100, train_loss[loss=2.797, ArTop10Accuracy=0.7733, over 14463.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7859, over 4772.15 frames. ], batch size: 62, lr: 6.10e-03
+ 2024-08-06 13:57:42,494 INFO [trainer.py:765] (6/8) Epoch 20, batch 200, train_loss[loss=2.71, ArTop10Accuracy=0.793, over 13584.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7872, over 7731.54 frames. ], batch size: 34, lr: 6.09e-03
+ 2024-08-06 13:59:15,430 INFO [trainer.py:765] (6/8) Epoch 20, batch 300, train_loss[loss=2.815, ArTop10Accuracy=0.7639, over 14337.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7882, over 9366.66 frames. ], batch size: 44, lr: 6.08e-03
+ 2024-08-06 14:00:44,356 INFO [trainer.py:765] (6/8) Epoch 20, batch 400, train_loss[loss=2.476, ArTop10Accuracy=0.8339, over 10269.00 frames. ], tot_loss[loss=2.698, ArTop10Accuracy=0.7895, over 10281.80 frames. ], batch size: 14, lr: 6.07e-03
+ 2024-08-06 14:02:14,854 INFO [trainer.py:765] (6/8) Epoch 20, batch 500, train_loss[loss=2.669, ArTop10Accuracy=0.7987, over 12246.00 frames. ], tot_loss[loss=2.695, ArTop10Accuracy=0.7901, over 10844.32 frames. ], batch size: 22, lr: 6.06e-03
+ 2024-08-06 14:03:40,855 INFO [trainer.py:765] (6/8) Epoch 20, batch 600, train_loss[loss=2.728, ArTop10Accuracy=0.7828, over 11280.00 frames. ], tot_loss[loss=2.698, ArTop10Accuracy=0.7897, over 11360.98 frames. ], batch size: 18, lr: 6.04e-03
+ 2024-08-06 14:05:13,864 INFO [trainer.py:765] (6/8) Epoch 20, batch 700, train_loss[loss=2.634, ArTop10Accuracy=0.8074, over 9540.00 frames. ], tot_loss[loss=2.701, ArTop10Accuracy=0.7889, over 11505.93 frames. ], batch size: 11, lr: 6.03e-03
+ 2024-08-06 14:05:30,791 INFO [optim.py:386] (6/8) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.365e+02 1.456e+02 1.550e+02 3.525e+02, threshold=2.913e+02, percent-clipped=0.1
+ 2024-08-06 14:06:34,509 INFO [trainer.py:765] (6/8) Epoch 20, batch 800, train_loss[loss=2.617, ArTop10Accuracy=0.8054, over 9417.00 frames. ], tot_loss[loss=2.708, ArTop10Accuracy=0.7877, over 11629.12 frames. ], batch size: 11, lr: 6.02e-03
+ 2024-08-06 14:07:50,944 INFO [trainer.py:765] (6/8) Epoch 20, batch 900, train_loss[loss=2.679, ArTop10Accuracy=0.7932, over 12960.00 frames. ], tot_loss[loss=2.702, ArTop10Accuracy=0.7887, over 11692.16 frames. ], batch size: 27, lr: 6.01e-03
+ 2024-08-06 14:09:07,172 INFO [trainer.py:765] (6/8) Epoch 20, batch 1000, train_loss[loss=2.667, ArTop10Accuracy=0.7966, over 12981.00 frames. ], tot_loss[loss=2.707, ArTop10Accuracy=0.7878, over 11875.75 frames. ], batch size: 27, lr: 6.00e-03
+ 2024-08-06 14:10:21,209 INFO [trainer.py:765] (6/8) Epoch 20, batch 1100, train_loss[loss=2.763, ArTop10Accuracy=0.7709, over 13563.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.786, over 11962.47 frames. ], batch size: 34, lr: 5.99e-03
+ 2024-08-06 14:11:37,812 INFO [trainer.py:765] (6/8) Epoch 20, batch 1200, train_loss[loss=2.868, ArTop10Accuracy=0.7602, over 12078.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7859, over 11889.77 frames. ], batch size: 101, lr: 5.98e-03
+ 2024-08-06 14:12:37,393 INFO [trainer.py:650] (6/8) Reaches end of dataloader.
+ 2024-08-06 14:12:37,395 INFO [trainer.py:1069] (6/8) Done!
libritts-r/log/log-train-2024-08-06-08-06-14-7 ADDED
@@ -0,0 +1,336 @@
1
+ 2024-08-06 08:06:14,313 INFO [trainer.py:870] (7/8) Training started
2
+ 2024-08-06 08:06:14,314 INFO [trainer.py:889] (7/8) Device: cuda:7
3
+ 2024-08-06 08:06:14,314 INFO [trainer.py:890] (7/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6867463', 'IP address': '0.104.202.7'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 20000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 
'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
4
+ 2024-08-06 08:06:14,314 INFO [trainer.py:892] (7/8) About to create model
5
+ 2024-08-06 08:06:15,084 INFO [trainer.py:899] (7/8) Number of model parameters: 367386628
6
+ 2024-08-06 08:06:16,739 INFO [trainer.py:914] (7/8) Using DDP
7
+ 2024-08-06 08:06:19,151 INFO [datamodule.py:427] (7/8) About to get train cuts
8
+ 2024-08-06 08:06:19,153 INFO [datamodule.py:434] (7/8) About to get dev cuts
9
+ 2024-08-06 08:06:19,154 INFO [datamodule.py:292] (7/8) Disable SpecAugment
10
+ 2024-08-06 08:06:19,154 INFO [datamodule.py:294] (7/8) About to create train dataset
11
+ 2024-08-06 08:06:19,155 INFO [datamodule.py:323] (7/8) Using DynamicBucketingSampler
12
+ 2024-08-06 08:06:19,768 INFO [datamodule.py:344] (7/8) About to create train dataloader
13
+ 2024-08-06 08:06:19,768 INFO [datamodule.py:367] (7/8) About to create dev dataset
14
+ 2024-08-06 08:06:20,094 INFO [datamodule.py:388] (7/8) About to create dev dataloader
15
+ 2024-08-06 08:08:02,126 INFO [trainer.py:765] (7/8) Epoch 1, batch 100, train_loss[loss=4.362, ArTop10Accuracy=0.4896, over 14457.00 frames. ], tot_loss[loss=5.049, ArTop10Accuracy=0.3745, over 4762.40 frames. ], batch size: 62, lr: 2.25e-02
16
+ 2024-08-06 08:09:28,833 INFO [trainer.py:765] (7/8) Epoch 1, batch 200, train_loss[loss=4.118, ArTop10Accuracy=0.5253, over 13509.00 frames. ], tot_loss[loss=4.493, ArTop10Accuracy=0.467, over 7761.95 frames. ], batch size: 34, lr: 3.00e-02
17
+ 2024-08-06 08:10:52,434 INFO [trainer.py:765] (7/8) Epoch 1, batch 300, train_loss[loss=3.871, ArTop10Accuracy=0.5676, over 14031.00 frames. ], tot_loss[loss=4.216, ArTop10Accuracy=0.5135, over 9370.01 frames. ], batch size: 44, lr: 3.00e-02
18
+ 2024-08-06 08:12:12,702 INFO [trainer.py:765] (7/8) Epoch 1, batch 400, train_loss[loss=3.659, ArTop10Accuracy=0.6127, over 10974.00 frames. ], tot_loss[loss=4.026, ArTop10Accuracy=0.5457, over 10298.14 frames. ], batch size: 15, lr: 3.00e-02
19
+ 2024-08-06 08:13:40,053 INFO [trainer.py:765] (7/8) Epoch 1, batch 500, train_loss[loss=3.59, ArTop10Accuracy=0.627, over 12051.00 frames. ], tot_loss[loss=3.88, ArTop10Accuracy=0.5713, over 10855.87 frames. ], batch size: 22, lr: 2.99e-02
20
+ 2024-08-06 08:15:00,246 INFO [trainer.py:765] (7/8) Epoch 1, batch 600, train_loss[loss=3.581, ArTop10Accuracy=0.6283, over 11385.00 frames. ], tot_loss[loss=3.765, ArTop10Accuracy=0.5915, over 11370.53 frames. ], batch size: 18, lr: 2.99e-02
+ 2024-08-06 08:16:26,428 INFO [trainer.py:765] (7/8) Epoch 1, batch 700, train_loss[loss=3.46, ArTop10Accuracy=0.6474, over 9840.00 frames. ], tot_loss[loss=3.686, ArTop10Accuracy=0.6055, over 11504.33 frames. ], batch size: 12, lr: 2.99e-02
+ 2024-08-06 08:17:43,021 INFO [trainer.py:765] (7/8) Epoch 1, batch 800, train_loss[loss=3.483, ArTop10Accuracy=0.6427, over 9441.00 frames. ], tot_loss[loss=3.625, ArTop10Accuracy=0.6167, over 11619.59 frames. ], batch size: 11, lr: 2.98e-02
+ 2024-08-06 08:18:56,154 INFO [trainer.py:765] (7/8) Epoch 1, batch 900, train_loss[loss=3.48, ArTop10Accuracy=0.6414, over 13068.00 frames. ], tot_loss[loss=3.565, ArTop10Accuracy=0.6279, over 11679.43 frames. ], batch size: 27, lr: 2.98e-02
+ 2024-08-06 08:20:12,866 INFO [trainer.py:765] (7/8) Epoch 1, batch 1000, train_loss[loss=3.465, ArTop10Accuracy=0.6448, over 13329.00 frames. ], tot_loss[loss=3.524, ArTop10Accuracy=0.6351, over 11878.27 frames. ], batch size: 28, lr: 2.97e-02
+ 2024-08-06 08:20:13,547 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 9.300e+01 1.871e+02 2.675e+02 4.030e+02 9.119e+03, threshold=5.351e+02, percent-clipped=0.0
+ 2024-08-06 08:21:29,160 INFO [trainer.py:765] (7/8) Epoch 1, batch 1100, train_loss[loss=3.453, ArTop10Accuracy=0.6489, over 14163.00 frames. ], tot_loss[loss=3.487, ArTop10Accuracy=0.6419, over 11947.56 frames. ], batch size: 35, lr: 2.96e-02
+ 2024-08-06 08:22:45,419 INFO [trainer.py:765] (7/8) Epoch 1, batch 1200, train_loss[loss=3.473, ArTop10Accuracy=0.6507, over 12900.00 frames. ], tot_loss[loss=3.462, ArTop10Accuracy=0.6467, over 11842.32 frames. ], batch size: 103, lr: 2.96e-02
+ 2024-08-06 08:23:45,173 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 08:25:36,245 INFO [trainer.py:765] (7/8) Epoch 2, batch 100, train_loss[loss=3.444, ArTop10Accuracy=0.6432, over 14592.00 frames. ], tot_loss[loss=3.421, ArTop10Accuracy=0.6525, over 4756.08 frames. ], batch size: 62, lr: 2.90e-02
+ 2024-08-06 08:26:58,964 INFO [trainer.py:765] (7/8) Epoch 2, batch 200, train_loss[loss=3.272, ArTop10Accuracy=0.6805, over 13635.00 frames. ], tot_loss[loss=3.39, ArTop10Accuracy=0.6589, over 7735.02 frames. ], batch size: 34, lr: 2.89e-02
+ 2024-08-06 08:28:25,540 INFO [trainer.py:765] (7/8) Epoch 2, batch 300, train_loss[loss=3.393, ArTop10Accuracy=0.6549, over 14247.00 frames. ], tot_loss[loss=3.372, ArTop10Accuracy=0.6625, over 9363.55 frames. ], batch size: 44, lr: 2.89e-02
+ 2024-08-06 08:29:48,645 INFO [trainer.py:765] (7/8) Epoch 2, batch 400, train_loss[loss=3.26, ArTop10Accuracy=0.6889, over 10353.00 frames. ], tot_loss[loss=3.355, ArTop10Accuracy=0.6658, over 10272.53 frames. ], batch size: 14, lr: 2.88e-02
+ 2024-08-06 08:31:22,910 INFO [trainer.py:765] (7/8) Epoch 2, batch 500, train_loss[loss=3.372, ArTop10Accuracy=0.6614, over 12180.00 frames. ], tot_loss[loss=3.339, ArTop10Accuracy=0.6693, over 10849.42 frames. ], batch size: 22, lr: 2.87e-02
+ 2024-08-06 08:32:45,693 INFO [trainer.py:765] (7/8) Epoch 2, batch 600, train_loss[loss=3.357, ArTop10Accuracy=0.6623, over 11406.00 frames. ], tot_loss[loss=3.329, ArTop10Accuracy=0.6711, over 11357.08 frames. ], batch size: 18, lr: 2.86e-02
+ 2024-08-06 08:34:13,589 INFO [trainer.py:765] (7/8) Epoch 2, batch 700, train_loss[loss=3.281, ArTop10Accuracy=0.6847, over 9357.00 frames. ], tot_loss[loss=3.323, ArTop10Accuracy=0.6721, over 11496.48 frames. ], batch size: 11, lr: 2.85e-02
+ 2024-08-06 08:34:31,180 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 08:34:40,888 INFO [trainer.py:811] (7/8) Epoch 2, validation: loss=3.277, ArTop10Accuracy=0.6803, over 1827537.00 frames.
+ 2024-08-06 08:34:40,889 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 28320MB
+ 2024-08-06 08:34:41,706 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 1.592e+02 2.200e+02 3.344e+02 2.949e+03, threshold=4.400e+02, percent-clipped=8.6
+ 2024-08-06 08:35:39,884 INFO [trainer.py:765] (7/8) Epoch 2, batch 800, train_loss[loss=3.251, ArTop10Accuracy=0.6922, over 10062.00 frames. ], tot_loss[loss=3.323, ArTop10Accuracy=0.6723, over 11621.00 frames. ], batch size: 12, lr: 2.84e-02
+ 2024-08-06 08:36:56,378 INFO [trainer.py:765] (7/8) Epoch 2, batch 900, train_loss[loss=3.305, ArTop10Accuracy=0.6733, over 12777.00 frames. ], tot_loss[loss=3.309, ArTop10Accuracy=0.6752, over 11675.12 frames. ], batch size: 27, lr: 2.83e-02
+ 2024-08-06 08:38:10,518 INFO [trainer.py:765] (7/8) Epoch 2, batch 1000, train_loss[loss=3.269, ArTop10Accuracy=0.6866, over 12738.00 frames. ], tot_loss[loss=3.299, ArTop10Accuracy=0.677, over 11877.04 frames. ], batch size: 27, lr: 2.82e-02
+ 2024-08-06 08:39:25,065 INFO [trainer.py:765] (7/8) Epoch 2, batch 1100, train_loss[loss=3.277, ArTop10Accuracy=0.6814, over 13695.00 frames. ], tot_loss[loss=3.289, ArTop10Accuracy=0.679, over 11951.32 frames. ], batch size: 34, lr: 2.81e-02
+ 2024-08-06 08:40:38,225 INFO [trainer.py:765] (7/8) Epoch 2, batch 1200, train_loss[loss=3.377, ArTop10Accuracy=0.6608, over 12480.00 frames. ], tot_loss[loss=3.278, ArTop10Accuracy=0.681, over 11872.33 frames. ], batch size: 103, lr: 2.80e-02
+ 2024-08-06 08:41:38,406 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 08:43:36,655 INFO [trainer.py:765] (7/8) Epoch 3, batch 100, train_loss[loss=3.251, ArTop10Accuracy=0.6868, over 14670.00 frames. ], tot_loss[loss=3.247, ArTop10Accuracy=0.6854, over 4767.29 frames. ], batch size: 62, lr: 2.67e-02
+ 2024-08-06 08:45:10,507 INFO [trainer.py:765] (7/8) Epoch 3, batch 200, train_loss[loss=3.166, ArTop10Accuracy=0.7034, over 13593.00 frames. ], tot_loss[loss=3.22, ArTop10Accuracy=0.6913, over 7727.64 frames. ], batch size: 34, lr: 2.66e-02
+ 2024-08-06 08:46:29,264 INFO [trainer.py:765] (7/8) Epoch 3, batch 300, train_loss[loss=3.151, ArTop10Accuracy=0.7055, over 14394.00 frames. ], tot_loss[loss=3.198, ArTop10Accuracy=0.6953, over 9351.33 frames. ], batch size: 45, lr: 2.64e-02
+ 2024-08-06 08:48:04,224 INFO [trainer.py:765] (7/8) Epoch 3, batch 400, train_loss[loss=3.084, ArTop10Accuracy=0.7187, over 10278.00 frames. ], tot_loss[loss=3.18, ArTop10Accuracy=0.6989, over 10262.26 frames. ], batch size: 14, lr: 2.63e-02
+ 2024-08-06 08:48:40,887 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 9.282e+01 1.561e+02 1.981e+02 2.686e+02 1.768e+03, threshold=3.962e+02, percent-clipped=7.6
+ 2024-08-06 08:49:25,547 INFO [trainer.py:765] (7/8) Epoch 3, batch 500, train_loss[loss=3.081, ArTop10Accuracy=0.7195, over 12162.00 frames. ], tot_loss[loss=3.166, ArTop10Accuracy=0.7019, over 10825.79 frames. ], batch size: 22, lr: 2.62e-02
+ 2024-08-06 08:51:00,482 INFO [trainer.py:765] (7/8) Epoch 3, batch 600, train_loss[loss=3.024, ArTop10Accuracy=0.7292, over 11460.00 frames. ], tot_loss[loss=3.154, ArTop10Accuracy=0.7042, over 11342.13 frames. ], batch size: 18, lr: 2.61e-02
+ 2024-08-06 08:52:31,623 INFO [trainer.py:765] (7/8) Epoch 3, batch 700, train_loss[loss=3.017, ArTop10Accuracy=0.7306, over 9441.00 frames. ], tot_loss[loss=3.146, ArTop10Accuracy=0.7056, over 11498.02 frames. ], batch size: 11, lr: 2.60e-02
+ 2024-08-06 08:53:57,394 INFO [trainer.py:765] (7/8) Epoch 3, batch 800, train_loss[loss=3.106, ArTop10Accuracy=0.7194, over 9345.00 frames. ], tot_loss[loss=3.141, ArTop10Accuracy=0.7065, over 11638.04 frames. ], batch size: 11, lr: 2.59e-02
+ 2024-08-06 08:55:15,124 INFO [trainer.py:765] (7/8) Epoch 3, batch 900, train_loss[loss=3.124, ArTop10Accuracy=0.7148, over 12777.00 frames. ], tot_loss[loss=3.12, ArTop10Accuracy=0.7106, over 11686.70 frames. ], batch size: 27, lr: 2.57e-02
+ 2024-08-06 08:56:31,563 INFO [trainer.py:765] (7/8) Epoch 3, batch 1000, train_loss[loss=3.048, ArTop10Accuracy=0.7245, over 13221.00 frames. ], tot_loss[loss=3.112, ArTop10Accuracy=0.7118, over 11891.42 frames. ], batch size: 28, lr: 2.56e-02
+ 2024-08-06 08:57:46,512 INFO [trainer.py:765] (7/8) Epoch 3, batch 1100, train_loss[loss=3.064, ArTop10Accuracy=0.7176, over 13782.00 frames. ], tot_loss[loss=3.105, ArTop10Accuracy=0.713, over 11944.77 frames. ], batch size: 34, lr: 2.55e-02
+ 2024-08-06 08:59:01,406 INFO [trainer.py:765] (7/8) Epoch 3, batch 1200, train_loss[loss=3.197, ArTop10Accuracy=0.6902, over 11523.00 frames. ], tot_loss[loss=3.094, ArTop10Accuracy=0.7152, over 11857.67 frames. ], batch size: 101, lr: 2.54e-02
+ 2024-08-06 09:00:01,956 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 09:01:50,747 INFO [trainer.py:765] (7/8) Epoch 4, batch 100, train_loss[loss=3.081, ArTop10Accuracy=0.7135, over 14556.00 frames. ], tot_loss[loss=3.068, ArTop10Accuracy=0.7198, over 4752.46 frames. ], batch size: 62, lr: 2.38e-02
+ 2024-08-06 09:02:52,864 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 09:03:02,384 INFO [trainer.py:811] (7/8) Epoch 4, validation: loss=2.997, ArTop10Accuracy=0.7338, over 1827537.00 frames.
+ 2024-08-06 09:03:02,385 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 32854MB
+ 2024-08-06 09:03:03,370 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.499e+02 1.782e+02 2.273e+02 1.100e+03, threshold=3.565e+02, percent-clipped=4.7
+ 2024-08-06 09:03:29,279 INFO [trainer.py:765] (7/8) Epoch 4, batch 200, train_loss[loss=3.04, ArTop10Accuracy=0.7267, over 13431.00 frames. ], tot_loss[loss=3.05, ArTop10Accuracy=0.7235, over 7758.26 frames. ], batch size: 34, lr: 2.37e-02
+ 2024-08-06 09:05:01,738 INFO [trainer.py:765] (7/8) Epoch 4, batch 300, train_loss[loss=3.08, ArTop10Accuracy=0.7196, over 14442.00 frames. ], tot_loss[loss=3.04, ArTop10Accuracy=0.7252, over 9376.27 frames. ], batch size: 45, lr: 2.36e-02
+ 2024-08-06 09:06:28,158 INFO [trainer.py:765] (7/8) Epoch 4, batch 400, train_loss[loss=2.945, ArTop10Accuracy=0.7439, over 11004.00 frames. ], tot_loss[loss=3.035, ArTop10Accuracy=0.7263, over 10292.54 frames. ], batch size: 15, lr: 2.34e-02
+ 2024-08-06 09:08:01,931 INFO [trainer.py:765] (7/8) Epoch 4, batch 500, train_loss[loss=3.035, ArTop10Accuracy=0.7318, over 12075.00 frames. ], tot_loss[loss=3.026, ArTop10Accuracy=0.7281, over 10859.34 frames. ], batch size: 22, lr: 2.33e-02
+ 2024-08-06 09:09:28,546 INFO [trainer.py:765] (7/8) Epoch 4, batch 600, train_loss[loss=3.041, ArTop10Accuracy=0.7257, over 11595.00 frames. ], tot_loss[loss=3.022, ArTop10Accuracy=0.7289, over 11371.96 frames. ], batch size: 18, lr: 2.32e-02
+ 2024-08-06 09:10:59,871 INFO [trainer.py:765] (7/8) Epoch 4, batch 700, train_loss[loss=2.907, ArTop10Accuracy=0.7516, over 9414.00 frames. ], tot_loss[loss=3.02, ArTop10Accuracy=0.7293, over 11502.51 frames. ], batch size: 11, lr: 2.31e-02
+ 2024-08-06 09:12:17,518 INFO [trainer.py:765] (7/8) Epoch 4, batch 800, train_loss[loss=3.083, ArTop10Accuracy=0.7137, over 10341.00 frames. ], tot_loss[loss=3.021, ArTop10Accuracy=0.7287, over 11643.99 frames. ], batch size: 12, lr: 2.30e-02
+ 2024-08-06 09:13:33,218 INFO [trainer.py:765] (7/8) Epoch 4, batch 900, train_loss[loss=2.993, ArTop10Accuracy=0.7329, over 13167.00 frames. ], tot_loss[loss=3.013, ArTop10Accuracy=0.7305, over 11694.20 frames. ], batch size: 28, lr: 2.29e-02
+ 2024-08-06 09:14:47,526 INFO [trainer.py:765] (7/8) Epoch 4, batch 1000, train_loss[loss=2.966, ArTop10Accuracy=0.7427, over 13272.00 frames. ], tot_loss[loss=3.011, ArTop10Accuracy=0.7308, over 11890.45 frames. ], batch size: 28, lr: 2.28e-02
+ 2024-08-06 09:16:02,988 INFO [trainer.py:765] (7/8) Epoch 4, batch 1100, train_loss[loss=3.074, ArTop10Accuracy=0.7127, over 14052.00 frames. ], tot_loss[loss=3.012, ArTop10Accuracy=0.7307, over 11951.97 frames. ], batch size: 35, lr: 2.26e-02
+ 2024-08-06 09:16:53,297 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.440e+02 1.636e+02 1.968e+02 7.702e+02, threshold=3.273e+02, percent-clipped=1.3
+ 2024-08-06 09:17:18,350 INFO [trainer.py:765] (7/8) Epoch 4, batch 1200, train_loss[loss=3.072, ArTop10Accuracy=0.7225, over 12207.00 frames. ], tot_loss[loss=3.011, ArTop10Accuracy=0.7309, over 11891.15 frames. ], batch size: 101, lr: 2.25e-02
+ 2024-08-06 09:18:17,193 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 09:20:17,179 INFO [trainer.py:765] (7/8) Epoch 5, batch 100, train_loss[loss=2.969, ArTop10Accuracy=0.738, over 14637.00 frames. ], tot_loss[loss=2.991, ArTop10Accuracy=0.7338, over 4753.25 frames. ], batch size: 63, lr: 2.10e-02
+ 2024-08-06 09:21:52,302 INFO [trainer.py:765] (7/8) Epoch 5, batch 200, train_loss[loss=2.949, ArTop10Accuracy=0.742, over 13872.00 frames. ], tot_loss[loss=2.98, ArTop10Accuracy=0.736, over 7749.37 frames. ], batch size: 34, lr: 2.09e-02
+ 2024-08-06 09:23:19,247 INFO [trainer.py:765] (7/8) Epoch 5, batch 300, train_loss[loss=2.975, ArTop10Accuracy=0.737, over 14184.00 frames. ], tot_loss[loss=2.972, ArTop10Accuracy=0.7381, over 9371.28 frames. ], batch size: 44, lr: 2.08e-02
+ 2024-08-06 09:24:53,543 INFO [trainer.py:765] (7/8) Epoch 5, batch 400, train_loss[loss=2.977, ArTop10Accuracy=0.7381, over 10341.00 frames. ], tot_loss[loss=2.966, ArTop10Accuracy=0.7394, over 10277.25 frames. ], batch size: 14, lr: 2.07e-02
+ 2024-08-06 09:26:19,423 INFO [trainer.py:765] (7/8) Epoch 5, batch 500, train_loss[loss=2.844, ArTop10Accuracy=0.7612, over 12285.00 frames. ], tot_loss[loss=2.96, ArTop10Accuracy=0.7406, over 10834.88 frames. ], batch size: 22, lr: 2.06e-02
+ 2024-08-06 09:27:49,543 INFO [trainer.py:765] (7/8) Epoch 5, batch 600, train_loss[loss=2.869, ArTop10Accuracy=0.7593, over 11508.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7401, over 11349.06 frames. ], batch size: 18, lr: 2.05e-02
+ 2024-08-06 09:29:21,675 INFO [trainer.py:765] (7/8) Epoch 5, batch 700, train_loss[loss=2.854, ArTop10Accuracy=0.7634, over 10257.00 frames. ], tot_loss[loss=2.967, ArTop10Accuracy=0.7394, over 11495.18 frames. ], batch size: 12, lr: 2.04e-02
+ 2024-08-06 09:30:44,698 INFO [trainer.py:765] (7/8) Epoch 5, batch 800, train_loss[loss=2.925, ArTop10Accuracy=0.7477, over 10026.00 frames. ], tot_loss[loss=2.969, ArTop10Accuracy=0.7388, over 11633.89 frames. ], batch size: 12, lr: 2.03e-02
+ 2024-08-06 09:31:51,245 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 09:32:00,762 INFO [trainer.py:811] (7/8) Epoch 5, validation: loss=2.926, ArTop10Accuracy=0.7466, over 1827537.00 frames.
+ 2024-08-06 09:32:00,763 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 09:32:01,716 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.349e+02 1.525e+02 1.806e+02 1.007e+03, threshold=3.049e+02, percent-clipped=2.3
+ 2024-08-06 09:32:10,561 INFO [trainer.py:765] (7/8) Epoch 5, batch 900, train_loss[loss=2.964, ArTop10Accuracy=0.7428, over 12957.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7401, over 11679.29 frames. ], batch size: 27, lr: 2.02e-02
+ 2024-08-06 09:33:27,329 INFO [trainer.py:765] (7/8) Epoch 5, batch 1000, train_loss[loss=2.99, ArTop10Accuracy=0.7287, over 13071.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7401, over 11873.13 frames. ], batch size: 27, lr: 2.01e-02
+ 2024-08-06 09:34:42,306 INFO [trainer.py:765] (7/8) Epoch 5, batch 1100, train_loss[loss=2.925, ArTop10Accuracy=0.744, over 13614.00 frames. ], tot_loss[loss=2.962, ArTop10Accuracy=0.7402, over 11964.34 frames. ], batch size: 34, lr: 2.00e-02
+ 2024-08-06 09:35:56,337 INFO [trainer.py:765] (7/8) Epoch 5, batch 1200, train_loss[loss=3.056, ArTop10Accuracy=0.7254, over 12483.00 frames. ], tot_loss[loss=2.96, ArTop10Accuracy=0.7404, over 11842.33 frames. ], batch size: 101, lr: 1.99e-02
+ 2024-08-06 09:36:55,307 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 09:38:52,672 INFO [trainer.py:765] (7/8) Epoch 6, batch 100, train_loss[loss=2.971, ArTop10Accuracy=0.7363, over 14304.00 frames. ], tot_loss[loss=2.949, ArTop10Accuracy=0.7418, over 4755.22 frames. ], batch size: 62, lr: 1.85e-02
+ 2024-08-06 09:40:19,840 INFO [trainer.py:765] (7/8) Epoch 6, batch 200, train_loss[loss=2.939, ArTop10Accuracy=0.7429, over 13713.00 frames. ], tot_loss[loss=2.942, ArTop10Accuracy=0.7432, over 7740.92 frames. ], batch size: 34, lr: 1.84e-02
+ 2024-08-06 09:41:52,973 INFO [trainer.py:765] (7/8) Epoch 6, batch 300, train_loss[loss=2.971, ArTop10Accuracy=0.7397, over 14376.00 frames. ], tot_loss[loss=2.934, ArTop10Accuracy=0.7451, over 9374.66 frames. ], batch size: 45, lr: 1.83e-02
+ 2024-08-06 09:43:17,836 INFO [trainer.py:765] (7/8) Epoch 6, batch 400, train_loss[loss=2.886, ArTop10Accuracy=0.7542, over 10356.00 frames. ], tot_loss[loss=2.926, ArTop10Accuracy=0.7469, over 10317.75 frames. ], batch size: 14, lr: 1.83e-02
+ 2024-08-06 09:44:54,136 INFO [trainer.py:765] (7/8) Epoch 6, batch 500, train_loss[loss=2.996, ArTop10Accuracy=0.7306, over 12210.00 frames. ], tot_loss[loss=2.917, ArTop10Accuracy=0.7488, over 10868.63 frames. ], batch size: 22, lr: 1.82e-02
+ 2024-08-06 09:46:22,879 INFO [trainer.py:765] (7/8) Epoch 6, batch 600, train_loss[loss=2.871, ArTop10Accuracy=0.7539, over 11418.00 frames. ], tot_loss[loss=2.922, ArTop10Accuracy=0.7478, over 11372.97 frames. ], batch size: 18, lr: 1.81e-02
+ 2024-08-06 09:46:37,226 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.339e+02 1.480e+02 1.701e+02 7.506e+02, threshold=2.959e+02, percent-clipped=1.1
+ 2024-08-06 09:47:57,878 INFO [trainer.py:765] (7/8) Epoch 6, batch 700, train_loss[loss=2.864, ArTop10Accuracy=0.7643, over 9345.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7467, over 11521.82 frames. ], batch size: 11, lr: 1.80e-02
+ 2024-08-06 09:49:15,961 INFO [trainer.py:765] (7/8) Epoch 6, batch 800, train_loss[loss=2.941, ArTop10Accuracy=0.7444, over 10086.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.7472, over 11648.74 frames. ], batch size: 12, lr: 1.79e-02
+ 2024-08-06 09:50:32,141 INFO [trainer.py:765] (7/8) Epoch 6, batch 900, train_loss[loss=2.924, ArTop10Accuracy=0.7467, over 12903.00 frames. ], tot_loss[loss=2.921, ArTop10Accuracy=0.7481, over 11680.34 frames. ], batch size: 27, lr: 1.78e-02
+ 2024-08-06 09:51:47,308 INFO [trainer.py:765] (7/8) Epoch 6, batch 1000, train_loss[loss=2.873, ArTop10Accuracy=0.7545, over 12843.00 frames. ], tot_loss[loss=2.923, ArTop10Accuracy=0.7476, over 11881.10 frames. ], batch size: 27, lr: 1.77e-02
+ 2024-08-06 09:53:00,927 INFO [trainer.py:765] (7/8) Epoch 6, batch 1100, train_loss[loss=2.894, ArTop10Accuracy=0.7513, over 14028.00 frames. ], tot_loss[loss=2.927, ArTop10Accuracy=0.7467, over 11949.15 frames. ], batch size: 35, lr: 1.77e-02
+ 2024-08-06 09:54:14,343 INFO [trainer.py:765] (7/8) Epoch 6, batch 1200, train_loss[loss=3.005, ArTop10Accuracy=0.7324, over 12450.00 frames. ], tot_loss[loss=2.925, ArTop10Accuracy=0.7471, over 11859.26 frames. ], batch size: 101, lr: 1.76e-02
+ 2024-08-06 09:55:13,167 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 09:57:06,705 INFO [trainer.py:765] (7/8) Epoch 7, batch 100, train_loss[loss=2.988, ArTop10Accuracy=0.7343, over 14619.00 frames. ], tot_loss[loss=2.914, ArTop10Accuracy=0.7489, over 4754.07 frames. ], batch size: 62, lr: 1.64e-02
+ 2024-08-06 09:58:39,433 INFO [trainer.py:765] (7/8) Epoch 7, batch 200, train_loss[loss=2.872, ArTop10Accuracy=0.7554, over 13872.00 frames. ], tot_loss[loss=2.906, ArTop10Accuracy=0.7506, over 7746.71 frames. ], batch size: 35, lr: 1.64e-02
+ 2024-08-06 10:00:06,089 INFO [trainer.py:765] (7/8) Epoch 7, batch 300, train_loss[loss=2.922, ArTop10Accuracy=0.7488, over 14031.00 frames. ], tot_loss[loss=2.899, ArTop10Accuracy=0.7517, over 9369.72 frames. ], batch size: 44, lr: 1.63e-02
+ 2024-08-06 10:00:40,515 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 10:00:50,245 INFO [trainer.py:811] (7/8) Epoch 7, validation: loss=2.88, ArTop10Accuracy=0.7554, over 1827537.00 frames.
+ 2024-08-06 10:00:50,246 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 10:00:50,983 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.286e+02 1.429e+02 1.605e+02 1.020e+03, threshold=2.857e+02, percent-clipped=1.5
+ 2024-08-06 10:01:49,123 INFO [trainer.py:765] (7/8) Epoch 7, batch 400, train_loss[loss=2.751, ArTop10Accuracy=0.7792, over 10182.00 frames. ], tot_loss[loss=2.894, ArTop10Accuracy=0.7523, over 10267.82 frames. ], batch size: 14, lr: 1.62e-02
+ 2024-08-06 10:03:21,465 INFO [trainer.py:765] (7/8) Epoch 7, batch 500, train_loss[loss=2.903, ArTop10Accuracy=0.7533, over 12147.00 frames. ], tot_loss[loss=2.891, ArTop10Accuracy=0.7532, over 10812.25 frames. ], batch size: 22, lr: 1.61e-02
+ 2024-08-06 10:04:51,890 INFO [trainer.py:765] (7/8) Epoch 7, batch 600, train_loss[loss=2.828, ArTop10Accuracy=0.7683, over 11442.00 frames. ], tot_loss[loss=2.893, ArTop10Accuracy=0.753, over 11357.88 frames. ], batch size: 18, lr: 1.61e-02
+ 2024-08-06 10:06:25,118 INFO [trainer.py:765] (7/8) Epoch 7, batch 700, train_loss[loss=2.884, ArTop10Accuracy=0.7584, over 10107.00 frames. ], tot_loss[loss=2.899, ArTop10Accuracy=0.7519, over 11531.01 frames. ], batch size: 12, lr: 1.60e-02
+ 2024-08-06 10:07:46,957 INFO [trainer.py:765] (7/8) Epoch 7, batch 800, train_loss[loss=2.865, ArTop10Accuracy=0.7595, over 10149.00 frames. ], tot_loss[loss=2.898, ArTop10Accuracy=0.752, over 11660.33 frames. ], batch size: 12, lr: 1.59e-02
+ 2024-08-06 10:09:02,830 INFO [trainer.py:765] (7/8) Epoch 7, batch 900, train_loss[loss=2.844, ArTop10Accuracy=0.7642, over 13023.00 frames. ], tot_loss[loss=2.892, ArTop10Accuracy=0.7533, over 11695.51 frames. ], batch size: 27, lr: 1.59e-02
+ 2024-08-06 10:10:19,642 INFO [trainer.py:765] (7/8) Epoch 7, batch 1000, train_loss[loss=2.875, ArTop10Accuracy=0.7589, over 13020.00 frames. ], tot_loss[loss=2.893, ArTop10Accuracy=0.753, over 11897.82 frames. ], batch size: 27, lr: 1.58e-02
+ 2024-08-06 10:11:35,214 INFO [trainer.py:765] (7/8) Epoch 7, batch 1100, train_loss[loss=2.877, ArTop10Accuracy=0.76, over 13656.00 frames. ], tot_loss[loss=2.903, ArTop10Accuracy=0.751, over 11971.94 frames. ], batch size: 34, lr: 1.57e-02
+ 2024-08-06 10:12:48,210 INFO [trainer.py:765] (7/8) Epoch 7, batch 1200, train_loss[loss=3.014, ArTop10Accuracy=0.7296, over 11841.00 frames. ], tot_loss[loss=2.901, ArTop10Accuracy=0.7513, over 11848.41 frames. ], batch size: 101, lr: 1.57e-02
+ 2024-08-06 10:13:46,715 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 10:15:03,607 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.283e+02 1.410e+02 1.601e+02 1.017e+03, threshold=2.820e+02, percent-clipped=0.9
+ 2024-08-06 10:15:40,827 INFO [trainer.py:765] (7/8) Epoch 8, batch 100, train_loss[loss=2.945, ArTop10Accuracy=0.7462, over 14433.00 frames. ], tot_loss[loss=2.896, ArTop10Accuracy=0.7518, over 4743.12 frames. ], batch size: 62, lr: 1.47e-02
+ 2024-08-06 10:17:12,869 INFO [trainer.py:765] (7/8) Epoch 8, batch 200, train_loss[loss=2.804, ArTop10Accuracy=0.773, over 13626.00 frames. ], tot_loss[loss=2.878, ArTop10Accuracy=0.7555, over 7739.33 frames. ], batch size: 34, lr: 1.46e-02
+ 2024-08-06 10:18:37,904 INFO [trainer.py:765] (7/8) Epoch 8, batch 300, train_loss[loss=2.89, ArTop10Accuracy=0.7555, over 14010.00 frames. ], tot_loss[loss=2.87, ArTop10Accuracy=0.7572, over 9379.75 frames. ], batch size: 44, lr: 1.46e-02
+ 2024-08-06 10:20:06,347 INFO [trainer.py:765] (7/8) Epoch 8, batch 400, train_loss[loss=2.805, ArTop10Accuracy=0.7662, over 10131.00 frames. ], tot_loss[loss=2.867, ArTop10Accuracy=0.7579, over 10274.71 frames. ], batch size: 14, lr: 1.45e-02
+ 2024-08-06 10:21:32,417 INFO [trainer.py:765] (7/8) Epoch 8, batch 500, train_loss[loss=2.796, ArTop10Accuracy=0.7714, over 12276.00 frames. ], tot_loss[loss=2.862, ArTop10Accuracy=0.7587, over 10839.39 frames. ], batch size: 22, lr: 1.45e-02
+ 2024-08-06 10:23:00,980 INFO [trainer.py:765] (7/8) Epoch 8, batch 600, train_loss[loss=2.787, ArTop10Accuracy=0.7728, over 11571.00 frames. ], tot_loss[loss=2.86, ArTop10Accuracy=0.7592, over 11359.98 frames. ], batch size: 18, lr: 1.44e-02
+ 2024-08-06 10:24:37,793 INFO [trainer.py:765] (7/8) Epoch 8, batch 700, train_loss[loss=2.754, ArTop10Accuracy=0.7844, over 10128.00 frames. ], tot_loss[loss=2.866, ArTop10Accuracy=0.7583, over 11502.93 frames. ], batch size: 12, lr: 1.43e-02
+ 2024-08-06 10:25:56,094 INFO [trainer.py:765] (7/8) Epoch 8, batch 800, train_loss[loss=2.865, ArTop10Accuracy=0.7526, over 9339.00 frames. ], tot_loss[loss=2.87, ArTop10Accuracy=0.7573, over 11619.76 frames. ], batch size: 11, lr: 1.43e-02
+ 2024-08-06 10:27:12,252 INFO [trainer.py:765] (7/8) Epoch 8, batch 900, train_loss[loss=2.84, ArTop10Accuracy=0.7642, over 13110.00 frames. ], tot_loss[loss=2.866, ArTop10Accuracy=0.758, over 11686.94 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:28:25,269 INFO [trainer.py:765] (7/8) Epoch 8, batch 1000, train_loss[loss=2.851, ArTop10Accuracy=0.7588, over 12954.00 frames. ], tot_loss[loss=2.869, ArTop10Accuracy=0.7574, over 11893.33 frames. ], batch size: 27, lr: 1.42e-02
+ 2024-08-06 10:29:07,161 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 10:29:16,830 INFO [trainer.py:811] (7/8) Epoch 8, validation: loss=2.858, ArTop10Accuracy=0.7594, over 1827537.00 frames.
+ 2024-08-06 10:29:16,831 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 10:29:17,497 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.275e+02 1.390e+02 1.547e+02 3.717e+02, threshold=2.781e+02, percent-clipped=0.7
+ 2024-08-06 10:29:51,738 INFO [trainer.py:765] (7/8) Epoch 8, batch 1100, train_loss[loss=2.876, ArTop10Accuracy=0.753, over 13728.00 frames. ], tot_loss[loss=2.877, ArTop10Accuracy=0.756, over 11966.78 frames. ], batch size: 34, lr: 1.41e-02
+ 2024-08-06 10:31:05,955 INFO [trainer.py:765] (7/8) Epoch 8, batch 1200, train_loss[loss=2.958, ArTop10Accuracy=0.7403, over 12834.00 frames. ], tot_loss[loss=2.877, ArTop10Accuracy=0.756, over 11882.51 frames. ], batch size: 103, lr: 1.40e-02
+ 2024-08-06 10:32:05,689 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 10:34:01,263 INFO [trainer.py:765] (7/8) Epoch 9, batch 100, train_loss[loss=2.89, ArTop10Accuracy=0.7558, over 14211.00 frames. ], tot_loss[loss=2.861, ArTop10Accuracy=0.7584, over 4769.26 frames. ], batch size: 62, lr: 1.32e-02
+ 2024-08-06 10:35:31,779 INFO [trainer.py:765] (7/8) Epoch 9, batch 200, train_loss[loss=2.84, ArTop10Accuracy=0.7597, over 13761.00 frames. ], tot_loss[loss=2.853, ArTop10Accuracy=0.7601, over 7753.37 frames. ], batch size: 34, lr: 1.32e-02
+ 2024-08-06 10:36:57,933 INFO [trainer.py:765] (7/8) Epoch 9, batch 300, train_loss[loss=2.919, ArTop10Accuracy=0.7475, over 14613.00 frames. ], tot_loss[loss=2.846, ArTop10Accuracy=0.7615, over 9383.26 frames. ], batch size: 44, lr: 1.31e-02
+ 2024-08-06 10:38:32,706 INFO [trainer.py:765] (7/8) Epoch 9, batch 400, train_loss[loss=2.867, ArTop10Accuracy=0.7587, over 10458.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.7625, over 10292.17 frames. ], batch size: 14, lr: 1.31e-02
+ 2024-08-06 10:39:59,263 INFO [trainer.py:765] (7/8) Epoch 9, batch 500, train_loss[loss=2.778, ArTop10Accuracy=0.7717, over 12252.00 frames. ], tot_loss[loss=2.839, ArTop10Accuracy=0.7633, over 10865.89 frames. ], batch size: 22, lr: 1.30e-02
+ 2024-08-06 10:41:29,697 INFO [trainer.py:765] (7/8) Epoch 9, batch 600, train_loss[loss=2.849, ArTop10Accuracy=0.7639, over 11511.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.7624, over 11375.79 frames. ], batch size: 18, lr: 1.30e-02
+ 2024-08-06 10:42:58,448 INFO [trainer.py:765] (7/8) Epoch 9, batch 700, train_loss[loss=2.758, ArTop10Accuracy=0.7781, over 10119.00 frames. ], tot_loss[loss=2.846, ArTop10Accuracy=0.7617, over 11528.05 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:44:02,958 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.253e+02 1.352e+02 1.493e+02 7.010e+02, threshold=2.704e+02, percent-clipped=0.6
+ 2024-08-06 10:44:19,676 INFO [trainer.py:765] (7/8) Epoch 9, batch 800, train_loss[loss=2.705, ArTop10Accuracy=0.7909, over 10119.00 frames. ], tot_loss[loss=2.849, ArTop10Accuracy=0.7611, over 11632.30 frames. ], batch size: 12, lr: 1.29e-02
+ 2024-08-06 10:45:35,729 INFO [trainer.py:765] (7/8) Epoch 9, batch 900, train_loss[loss=2.831, ArTop10Accuracy=0.7652, over 12933.00 frames. ], tot_loss[loss=2.843, ArTop10Accuracy=0.7623, over 11672.27 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:46:51,278 INFO [trainer.py:765] (7/8) Epoch 9, batch 1000, train_loss[loss=2.873, ArTop10Accuracy=0.753, over 13110.00 frames. ], tot_loss[loss=2.85, ArTop10Accuracy=0.7612, over 11899.80 frames. ], batch size: 27, lr: 1.28e-02
+ 2024-08-06 10:48:06,254 INFO [trainer.py:765] (7/8) Epoch 9, batch 1100, train_loss[loss=2.87, ArTop10Accuracy=0.7525, over 13704.00 frames. ], tot_loss[loss=2.854, ArTop10Accuracy=0.7602, over 11962.63 frames. ], batch size: 34, lr: 1.28e-02
+ 2024-08-06 10:49:21,061 INFO [trainer.py:765] (7/8) Epoch 9, batch 1200, train_loss[loss=2.96, ArTop10Accuracy=0.738, over 12012.00 frames. ], tot_loss[loss=2.853, ArTop10Accuracy=0.7602, over 11866.13 frames. ], batch size: 101, lr: 1.27e-02
+ 2024-08-06 10:50:21,919 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 10:52:12,333 INFO [trainer.py:765] (7/8) Epoch 10, batch 100, train_loss[loss=2.963, ArTop10Accuracy=0.7393, over 14571.00 frames. ], tot_loss[loss=2.845, ArTop10Accuracy=0.7611, over 4761.85 frames. ], batch size: 63, lr: 1.20e-02
+ 2024-08-06 10:53:44,592 INFO [trainer.py:765] (7/8) Epoch 10, batch 200, train_loss[loss=2.862, ArTop10Accuracy=0.7616, over 13656.00 frames. ], tot_loss[loss=2.838, ArTop10Accuracy=0.7631, over 7751.64 frames. ], batch size: 34, lr: 1.20e-02
+ 2024-08-06 10:55:08,097 INFO [trainer.py:765] (7/8) Epoch 10, batch 300, train_loss[loss=2.885, ArTop10Accuracy=0.7585, over 14229.00 frames. ], tot_loss[loss=2.831, ArTop10Accuracy=0.7644, over 9362.96 frames. ], batch size: 44, lr: 1.19e-02
+ 2024-08-06 10:56:41,184 INFO [trainer.py:765] (7/8) Epoch 10, batch 400, train_loss[loss=2.817, ArTop10Accuracy=0.7678, over 10332.00 frames. ], tot_loss[loss=2.829, ArTop10Accuracy=0.7651, over 10282.26 frames. ], batch size: 14, lr: 1.19e-02
+ 2024-08-06 10:58:04,944 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 10:58:14,560 INFO [trainer.py:811] (7/8) Epoch 10, validation: loss=2.842, ArTop10Accuracy=0.7624, over 1827537.00 frames.
+ 2024-08-06 10:58:14,560 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 10:58:15,580 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.228e+02 1.320e+02 1.458e+02 6.096e+02, threshold=2.641e+02, percent-clipped=0.6
+ 2024-08-06 10:58:15,587 INFO [trainer.py:765] (7/8) Epoch 10, batch 500, train_loss[loss=2.76, ArTop10Accuracy=0.7781, over 12621.00 frames. ], tot_loss[loss=2.824, ArTop10Accuracy=0.7657, over 10826.79 frames. ], batch size: 23, lr: 1.19e-02
+ 2024-08-06 10:59:42,823 INFO [trainer.py:765] (7/8) Epoch 10, batch 600, train_loss[loss=2.799, ArTop10Accuracy=0.7657, over 11370.00 frames. ], tot_loss[loss=2.827, ArTop10Accuracy=0.7653, over 11339.91 frames. ], batch size: 18, lr: 1.18e-02
+ 2024-08-06 11:01:18,113 INFO [trainer.py:765] (7/8) Epoch 10, batch 700, train_loss[loss=2.74, ArTop10Accuracy=0.7784, over 10212.00 frames. ], tot_loss[loss=2.832, ArTop10Accuracy=0.7643, over 11501.92 frames. ], batch size: 12, lr: 1.18e-02
+ 2024-08-06 11:02:36,923 INFO [trainer.py:765] (7/8) Epoch 10, batch 800, train_loss[loss=2.903, ArTop10Accuracy=0.7523, over 10266.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7636, over 11651.31 frames. ], batch size: 12, lr: 1.17e-02
+ 2024-08-06 11:03:51,218 INFO [trainer.py:765] (7/8) Epoch 10, batch 900, train_loss[loss=2.735, ArTop10Accuracy=0.7816, over 13029.00 frames. ], tot_loss[loss=2.828, ArTop10Accuracy=0.7653, over 11691.04 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:05:06,357 INFO [trainer.py:765] (7/8) Epoch 10, batch 1000, train_loss[loss=2.796, ArTop10Accuracy=0.767, over 12981.00 frames. ], tot_loss[loss=2.832, ArTop10Accuracy=0.7644, over 11907.49 frames. ], batch size: 27, lr: 1.17e-02
+ 2024-08-06 11:06:21,730 INFO [trainer.py:765] (7/8) Epoch 10, batch 1100, train_loss[loss=2.942, ArTop10Accuracy=0.7401, over 13641.00 frames. ], tot_loss[loss=2.836, ArTop10Accuracy=0.7638, over 11975.93 frames. ], batch size: 34, lr: 1.16e-02
+ 2024-08-06 11:07:34,778 INFO [trainer.py:765] (7/8) Epoch 10, batch 1200, train_loss[loss=2.959, ArTop10Accuracy=0.7386, over 12051.00 frames. ], tot_loss[loss=2.837, ArTop10Accuracy=0.7635, over 11872.57 frames. ], batch size: 103, lr: 1.16e-02
+ 2024-08-06 11:08:34,013 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 11:10:29,961 INFO [trainer.py:765] (7/8) Epoch 11, batch 100, train_loss[loss=2.832, ArTop10Accuracy=0.7628, over 14784.00 frames. ], tot_loss[loss=2.819, ArTop10Accuracy=0.766, over 4757.71 frames. ], batch size: 63, lr: 1.10e-02
+ 2024-08-06 11:12:04,680 INFO [trainer.py:765] (7/8) Epoch 11, batch 200, train_loss[loss=2.848, ArTop10Accuracy=0.7591, over 13668.00 frames. ], tot_loss[loss=2.812, ArTop10Accuracy=0.7679, over 7745.12 frames. ], batch size: 34, lr: 1.10e-02
+ 2024-08-06 11:12:22,833 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 9.884e+01 1.240e+02 1.333e+02 1.457e+02 6.939e+02, threshold=2.667e+02, percent-clipped=0.1
+ 2024-08-06 11:13:31,557 INFO [trainer.py:765] (7/8) Epoch 11, batch 300, train_loss[loss=2.783, ArTop10Accuracy=0.7766, over 14295.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7693, over 9379.76 frames. ], batch size: 44, lr: 1.09e-02
+ 2024-08-06 11:15:03,276 INFO [trainer.py:765] (7/8) Epoch 11, batch 400, train_loss[loss=2.744, ArTop10Accuracy=0.7853, over 10455.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7697, over 10293.17 frames. ], batch size: 14, lr: 1.09e-02
+ 2024-08-06 11:16:29,644 INFO [trainer.py:765] (7/8) Epoch 11, batch 500, train_loss[loss=2.775, ArTop10Accuracy=0.7748, over 12375.00 frames. ], tot_loss[loss=2.801, ArTop10Accuracy=0.7703, over 10869.57 frames. ], batch size: 22, lr: 1.09e-02
+ 2024-08-06 11:18:00,524 INFO [trainer.py:765] (7/8) Epoch 11, batch 600, train_loss[loss=2.686, ArTop10Accuracy=0.789, over 11559.00 frames. ], tot_loss[loss=2.803, ArTop10Accuracy=0.7701, over 11358.60 frames. ], batch size: 18, lr: 1.08e-02
+ 2024-08-06 11:19:34,519 INFO [trainer.py:765] (7/8) Epoch 11, batch 700, train_loss[loss=2.631, ArTop10Accuracy=0.8006, over 9285.00 frames. ], tot_loss[loss=2.809, ArTop10Accuracy=0.7687, over 11492.14 frames. ], batch size: 11, lr: 1.08e-02
+ 2024-08-06 11:20:55,489 INFO [trainer.py:765] (7/8) Epoch 11, batch 800, train_loss[loss=2.737, ArTop10Accuracy=0.7849, over 10227.00 frames. ], tot_loss[loss=2.816, ArTop10Accuracy=0.7675, over 11631.71 frames. ], batch size: 12, lr: 1.07e-02
+ 2024-08-06 11:22:13,711 INFO [trainer.py:765] (7/8) Epoch 11, batch 900, train_loss[loss=2.9, ArTop10Accuracy=0.7485, over 13065.00 frames. ], tot_loss[loss=2.812, ArTop10Accuracy=0.7683, over 11678.33 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:23:31,805 INFO [trainer.py:765] (7/8) Epoch 11, batch 1000, train_loss[loss=2.812, ArTop10Accuracy=0.767, over 13032.00 frames. ], tot_loss[loss=2.816, ArTop10Accuracy=0.7674, over 11884.80 frames. ], batch size: 27, lr: 1.07e-02
+ 2024-08-06 11:24:46,908 INFO [trainer.py:765] (7/8) Epoch 11, batch 1100, train_loss[loss=2.843, ArTop10Accuracy=0.7617, over 13743.00 frames. ], tot_loss[loss=2.824, ArTop10Accuracy=0.7661, over 11952.21 frames. ], batch size: 34, lr: 1.06e-02
+ 2024-08-06 11:26:00,740 INFO [trainer.py:765] (7/8) Epoch 11, batch 1200, train_loss[loss=2.971, ArTop10Accuracy=0.7319, over 11898.00 frames. ], tot_loss[loss=2.827, ArTop10Accuracy=0.7654, over 11844.39 frames. ], batch size: 101, lr: 1.06e-02
+ 2024-08-06 11:26:15,853 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 11:26:25,556 INFO [trainer.py:811] (7/8) Epoch 11, validation: loss=2.831, ArTop10Accuracy=0.7643, over 1827537.00 frames.
+ 2024-08-06 11:26:25,556 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 11:26:26,191 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.251e+02 1.335e+02 1.441e+02 2.942e+02, threshold=2.669e+02, percent-clipped=0.1
+ 2024-08-06 11:27:09,788 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 11:29:03,457 INFO [trainer.py:765] (7/8) Epoch 12, batch 100, train_loss[loss=2.906, ArTop10Accuracy=0.7525, over 14562.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7693, over 4755.02 frames. ], batch size: 62, lr: 1.01e-02
+ 2024-08-06 11:30:30,680 INFO [trainer.py:765] (7/8) Epoch 12, batch 200, train_loss[loss=2.829, ArTop10Accuracy=0.7614, over 13722.00 frames. ], tot_loss[loss=2.796, ArTop10Accuracy=0.771, over 7750.22 frames. ], batch size: 34, lr: 1.01e-02
+ 2024-08-06 11:31:57,661 INFO [trainer.py:765] (7/8) Epoch 12, batch 300, train_loss[loss=2.844, ArTop10Accuracy=0.7598, over 14127.00 frames. ], tot_loss[loss=2.796, ArTop10Accuracy=0.7709, over 9370.85 frames. ], batch size: 45, lr: 1.01e-02
+ 2024-08-06 11:33:30,744 INFO [trainer.py:765] (7/8) Epoch 12, batch 400, train_loss[loss=2.777, ArTop10Accuracy=0.7773, over 10335.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7714, over 10290.92 frames. ], batch size: 14, lr: 1.00e-02
+ 2024-08-06 11:34:55,741 INFO [trainer.py:765] (7/8) Epoch 12, batch 500, train_loss[loss=2.726, ArTop10Accuracy=0.7936, over 12180.00 frames. ], tot_loss[loss=2.787, ArTop10Accuracy=0.7728, over 10849.09 frames. ], batch size: 22, lr: 1.00e-02
+ 2024-08-06 11:36:29,367 INFO [trainer.py:765] (7/8) Epoch 12, batch 600, train_loss[loss=2.77, ArTop10Accuracy=0.7734, over 11346.00 frames. ], tot_loss[loss=2.795, ArTop10Accuracy=0.7714, over 11371.49 frames. ], batch size: 18, lr: 9.97e-03
+ 2024-08-06 11:38:00,350 INFO [trainer.py:765] (7/8) Epoch 12, batch 700, train_loss[loss=2.692, ArTop10Accuracy=0.7974, over 9297.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7696, over 11504.84 frames. ], batch size: 11, lr: 9.93e-03
+ 2024-08-06 11:39:23,617 INFO [trainer.py:765] (7/8) Epoch 12, batch 800, train_loss[loss=2.756, ArTop10Accuracy=0.7772, over 10002.00 frames. ], tot_loss[loss=2.807, ArTop10Accuracy=0.7691, over 11641.67 frames. ], batch size: 12, lr: 9.90e-03
+ 2024-08-06 11:40:39,895 INFO [trainer.py:765] (7/8) Epoch 12, batch 900, train_loss[loss=2.87, ArTop10Accuracy=0.7533, over 13086.00 frames. ], tot_loss[loss=2.801, ArTop10Accuracy=0.7701, over 11681.88 frames. ], batch size: 27, lr: 9.87e-03
+ 2024-08-06 11:41:14,001 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.248e+02 1.348e+02 1.459e+02 5.540e+02, threshold=2.695e+02, percent-clipped=0.3
+ 2024-08-06 11:41:56,195 INFO [trainer.py:765] (7/8) Epoch 12, batch 1000, train_loss[loss=2.782, ArTop10Accuracy=0.7761, over 12906.00 frames. ], tot_loss[loss=2.804, ArTop10Accuracy=0.7696, over 11878.21 frames. ], batch size: 27, lr: 9.85e-03
+ 2024-08-06 11:43:14,326 INFO [trainer.py:765] (7/8) Epoch 12, batch 1100, train_loss[loss=2.795, ArTop10Accuracy=0.7758, over 13584.00 frames. ], tot_loss[loss=2.808, ArTop10Accuracy=0.7688, over 11961.32 frames. ], batch size: 34, lr: 9.82e-03
+ 2024-08-06 11:44:26,162 INFO [trainer.py:765] (7/8) Epoch 12, batch 1200, train_loss[loss=2.877, ArTop10Accuracy=0.751, over 12174.00 frames. ], tot_loss[loss=2.805, ArTop10Accuracy=0.7697, over 11871.83 frames. ], batch size: 101, lr: 9.79e-03
+ 2024-08-06 11:45:26,840 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 11:47:26,608 INFO [trainer.py:765] (7/8) Epoch 13, batch 100, train_loss[loss=2.856, ArTop10Accuracy=0.7638, over 14538.00 frames. ], tot_loss[loss=2.788, ArTop10Accuracy=0.7722, over 4752.84 frames. ], batch size: 63, lr: 9.37e-03
+ 2024-08-06 11:48:54,784 INFO [trainer.py:765] (7/8) Epoch 13, batch 200, train_loss[loss=2.783, ArTop10Accuracy=0.7744, over 13587.00 frames. ], tot_loss[loss=2.785, ArTop10Accuracy=0.7726, over 7712.35 frames. ], batch size: 34, lr: 9.34e-03
+ 2024-08-06 11:50:20,521 INFO [trainer.py:765] (7/8) Epoch 13, batch 300, train_loss[loss=2.848, ArTop10Accuracy=0.7612, over 14127.00 frames. ], tot_loss[loss=2.78, ArTop10Accuracy=0.7738, over 9346.13 frames. ], batch size: 44, lr: 9.31e-03
+ 2024-08-06 11:51:48,770 INFO [trainer.py:765] (7/8) Epoch 13, batch 400, train_loss[loss=2.655, ArTop10Accuracy=0.7974, over 10497.00 frames. ], tot_loss[loss=2.778, ArTop10Accuracy=0.7744, over 10273.88 frames. ], batch size: 14, lr: 9.28e-03
+ 2024-08-06 11:53:13,413 INFO [trainer.py:765] (7/8) Epoch 13, batch 500, train_loss[loss=2.696, ArTop10Accuracy=0.7907, over 12237.00 frames. ], tot_loss[loss=2.774, ArTop10Accuracy=0.7752, over 10830.80 frames. ], batch size: 22, lr: 9.26e-03
+ 2024-08-06 11:54:52,228 INFO [trainer.py:765] (7/8) Epoch 13, batch 600, train_loss[loss=2.751, ArTop10Accuracy=0.7839, over 11358.00 frames. ], tot_loss[loss=2.776, ArTop10Accuracy=0.7748, over 11350.36 frames. ], batch size: 18, lr: 9.23e-03
+ 2024-08-06 11:55:47,087 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 11:55:56,834 INFO [trainer.py:811] (7/8) Epoch 13, validation: loss=2.824, ArTop10Accuracy=0.7662, over 1827537.00 frames.
+ 2024-08-06 11:55:56,835 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 11:55:57,718 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.343e+02 1.452e+02 4.888e+02, threshold=2.687e+02, percent-clipped=0.1
+ 2024-08-06 11:56:28,470 INFO [trainer.py:765] (7/8) Epoch 13, batch 700, train_loss[loss=2.716, ArTop10Accuracy=0.7877, over 9372.00 frames. ], tot_loss[loss=2.778, ArTop10Accuracy=0.7744, over 11502.85 frames. ], batch size: 11, lr: 9.20e-03
+ 2024-08-06 11:57:46,688 INFO [trainer.py:765] (7/8) Epoch 13, batch 800, train_loss[loss=2.765, ArTop10Accuracy=0.7779, over 9270.00 frames. ], tot_loss[loss=2.781, ArTop10Accuracy=0.7737, over 11637.14 frames. ], batch size: 11, lr: 9.18e-03
+ 2024-08-06 11:59:03,294 INFO [trainer.py:765] (7/8) Epoch 13, batch 900, train_loss[loss=2.746, ArTop10Accuracy=0.7813, over 12927.00 frames. ], tot_loss[loss=2.779, ArTop10Accuracy=0.7742, over 11686.24 frames. ], batch size: 27, lr: 9.15e-03
+ 2024-08-06 12:00:19,179 INFO [trainer.py:765] (7/8) Epoch 13, batch 1000, train_loss[loss=2.763, ArTop10Accuracy=0.784, over 12996.00 frames. ], tot_loss[loss=2.787, ArTop10Accuracy=0.7727, over 11882.38 frames. ], batch size: 27, lr: 9.13e-03
+ 2024-08-06 12:01:34,888 INFO [trainer.py:765] (7/8) Epoch 13, batch 1100, train_loss[loss=2.788, ArTop10Accuracy=0.7706, over 13569.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7711, over 11950.05 frames. ], batch size: 34, lr: 9.10e-03
+ 2024-08-06 12:02:48,668 INFO [trainer.py:765] (7/8) Epoch 13, batch 1200, train_loss[loss=2.882, ArTop10Accuracy=0.7557, over 12348.00 frames. ], tot_loss[loss=2.794, ArTop10Accuracy=0.7714, over 11841.55 frames. ], batch size: 101, lr: 9.08e-03
+ 2024-08-06 12:03:48,262 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 12:05:45,342 INFO [trainer.py:765] (7/8) Epoch 14, batch 100, train_loss[loss=2.801, ArTop10Accuracy=0.7691, over 14286.00 frames. ], tot_loss[loss=2.774, ArTop10Accuracy=0.7744, over 4789.12 frames. ], batch size: 62, lr: 8.71e-03
+ 2024-08-06 12:07:16,612 INFO [trainer.py:765] (7/8) Epoch 14, batch 200, train_loss[loss=2.759, ArTop10Accuracy=0.7766, over 13548.00 frames. ], tot_loss[loss=2.77, ArTop10Accuracy=0.7754, over 7763.87 frames. ], batch size: 34, lr: 8.69e-03
+ 2024-08-06 12:08:44,319 INFO [trainer.py:765] (7/8) Epoch 14, batch 300, train_loss[loss=2.798, ArTop10Accuracy=0.7739, over 14325.00 frames. ], tot_loss[loss=2.763, ArTop10Accuracy=0.7772, over 9404.43 frames. ], batch size: 45, lr: 8.66e-03
+ 2024-08-06 12:10:01,138 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.266e+02 1.374e+02 1.483e+02 6.480e+02, threshold=2.748e+02, percent-clipped=0.2
+ 2024-08-06 12:10:10,233 INFO [trainer.py:765] (7/8) Epoch 14, batch 400, train_loss[loss=2.795, ArTop10Accuracy=0.7679, over 10785.00 frames. ], tot_loss[loss=2.764, ArTop10Accuracy=0.777, over 10299.80 frames. ], batch size: 15, lr: 8.64e-03
+ 2024-08-06 12:11:36,157 INFO [trainer.py:765] (7/8) Epoch 14, batch 500, train_loss[loss=2.688, ArTop10Accuracy=0.7904, over 12291.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7783, over 10862.13 frames. ], batch size: 22, lr: 8.62e-03
+ 2024-08-06 12:13:05,999 INFO [trainer.py:765] (7/8) Epoch 14, batch 600, train_loss[loss=2.724, ArTop10Accuracy=0.7831, over 11988.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.7775, over 11389.76 frames. ], batch size: 19, lr: 8.59e-03
+ 2024-08-06 12:14:38,559 INFO [trainer.py:765] (7/8) Epoch 14, batch 700, train_loss[loss=2.728, ArTop10Accuracy=0.7858, over 10293.00 frames. ], tot_loss[loss=2.765, ArTop10Accuracy=0.7771, over 11532.92 frames. ], batch size: 12, lr: 8.57e-03
+ 2024-08-06 12:15:58,076 INFO [trainer.py:765] (7/8) Epoch 14, batch 800, train_loss[loss=2.684, ArTop10Accuracy=0.7962, over 9432.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7758, over 11619.66 frames. ], batch size: 11, lr: 8.55e-03
+ 2024-08-06 12:17:12,872 INFO [trainer.py:765] (7/8) Epoch 14, batch 900, train_loss[loss=2.82, ArTop10Accuracy=0.7688, over 12945.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7764, over 11694.97 frames. ], batch size: 27, lr: 8.52e-03
+ 2024-08-06 12:18:29,621 INFO [trainer.py:765] (7/8) Epoch 14, batch 1000, train_loss[loss=2.708, ArTop10Accuracy=0.791, over 13026.00 frames. ], tot_loss[loss=2.771, ArTop10Accuracy=0.7758, over 11882.66 frames. ], batch size: 27, lr: 8.50e-03
+ 2024-08-06 12:19:45,383 INFO [trainer.py:765] (7/8) Epoch 14, batch 1100, train_loss[loss=2.799, ArTop10Accuracy=0.7704, over 13656.00 frames. ], tot_loss[loss=2.778, ArTop10Accuracy=0.7744, over 11972.38 frames. ], batch size: 35, lr: 8.48e-03
+ 2024-08-06 12:20:59,284 INFO [trainer.py:765] (7/8) Epoch 14, batch 1200, train_loss[loss=2.889, ArTop10Accuracy=0.7537, over 11946.00 frames. ], tot_loss[loss=2.777, ArTop10Accuracy=0.7747, over 11883.44 frames. ], batch size: 101, lr: 8.46e-03
+ 2024-08-06 12:21:58,392 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 12:23:51,969 INFO [trainer.py:765] (7/8) Epoch 15, batch 100, train_loss[loss=2.843, ArTop10Accuracy=0.7642, over 14451.00 frames. ], tot_loss[loss=2.762, ArTop10Accuracy=0.7777, over 4748.98 frames. ], batch size: 62, lr: 8.14e-03
+ 2024-08-06 12:24:00,606 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 12:24:10,290 INFO [trainer.py:811] (7/8) Epoch 15, validation: loss=2.819, ArTop10Accuracy=0.7675, over 1827537.00 frames.
+ 2024-08-06 12:24:10,291 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 12:24:11,100 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.284e+02 1.371e+02 1.488e+02 4.667e+02, threshold=2.743e+02, percent-clipped=0.2
+ 2024-08-06 12:25:29,995 INFO [trainer.py:765] (7/8) Epoch 15, batch 200, train_loss[loss=2.79, ArTop10Accuracy=0.7691, over 13515.00 frames. ], tot_loss[loss=2.756, ArTop10Accuracy=0.7787, over 7748.40 frames. ], batch size: 34, lr: 8.12e-03
+ 2024-08-06 12:26:58,700 INFO [trainer.py:765] (7/8) Epoch 15, batch 300, train_loss[loss=2.801, ArTop10Accuracy=0.7692, over 13812.00 frames. ], tot_loss[loss=2.749, ArTop10Accuracy=0.78, over 9360.32 frames. ], batch size: 44, lr: 8.09e-03
+ 2024-08-06 12:28:28,541 INFO [trainer.py:765] (7/8) Epoch 15, batch 400, train_loss[loss=2.615, ArTop10Accuracy=0.8103, over 11025.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7795, over 10292.66 frames. ], batch size: 15, lr: 8.07e-03
+ 2024-08-06 12:29:54,040 INFO [trainer.py:765] (7/8) Epoch 15, batch 500, train_loss[loss=2.716, ArTop10Accuracy=0.7863, over 12057.00 frames. ], tot_loss[loss=2.747, ArTop10Accuracy=0.7803, over 10846.17 frames. ], batch size: 22, lr: 8.05e-03
+ 2024-08-06 12:31:23,300 INFO [trainer.py:765] (7/8) Epoch 15, batch 600, train_loss[loss=2.654, ArTop10Accuracy=0.7999, over 11283.00 frames. ], tot_loss[loss=2.754, ArTop10Accuracy=0.7792, over 11370.72 frames. ], batch size: 18, lr: 8.03e-03
+ 2024-08-06 12:32:53,182 INFO [trainer.py:765] (7/8) Epoch 15, batch 700, train_loss[loss=2.632, ArTop10Accuracy=0.8045, over 10308.00 frames. ], tot_loss[loss=2.757, ArTop10Accuracy=0.7785, over 11518.17 frames. ], batch size: 12, lr: 8.01e-03
+ 2024-08-06 12:34:18,261 INFO [trainer.py:765] (7/8) Epoch 15, batch 800, train_loss[loss=2.534, ArTop10Accuracy=0.8238, over 10224.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7781, over 11628.39 frames. ], batch size: 12, lr: 7.99e-03
+ 2024-08-06 12:35:34,733 INFO [trainer.py:765] (7/8) Epoch 15, batch 900, train_loss[loss=2.727, ArTop10Accuracy=0.786, over 12687.00 frames. ], tot_loss[loss=2.754, ArTop10Accuracy=0.7792, over 11667.32 frames. ], batch size: 27, lr: 7.97e-03
+ 2024-08-06 12:36:50,547 INFO [trainer.py:765] (7/8) Epoch 15, batch 1000, train_loss[loss=2.801, ArTop10Accuracy=0.7699, over 12762.00 frames. ], tot_loss[loss=2.759, ArTop10Accuracy=0.7781, over 11870.70 frames. ], batch size: 27, lr: 7.95e-03
+ 2024-08-06 12:38:05,188 INFO [trainer.py:765] (7/8) Epoch 15, batch 1100, train_loss[loss=2.793, ArTop10Accuracy=0.7714, over 13743.00 frames. ], tot_loss[loss=2.767, ArTop10Accuracy=0.7765, over 11953.51 frames. ], batch size: 34, lr: 7.93e-03
+ 2024-08-06 12:38:12,847 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.293e+02 1.379e+02 1.467e+02 2.824e+02, threshold=2.759e+02, percent-clipped=0.1
+ 2024-08-06 12:39:18,795 INFO [trainer.py:765] (7/8) Epoch 15, batch 1200, train_loss[loss=2.905, ArTop10Accuracy=0.7478, over 12228.00 frames. ], tot_loss[loss=2.766, ArTop10Accuracy=0.7768, over 11851.53 frames. ], batch size: 101, lr: 7.91e-03
+ 2024-08-06 12:40:18,896 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 12:42:17,627 INFO [trainer.py:765] (7/8) Epoch 16, batch 100, train_loss[loss=2.832, ArTop10Accuracy=0.7641, over 14472.00 frames. ], tot_loss[loss=2.747, ArTop10Accuracy=0.7805, over 4759.95 frames. ], batch size: 62, lr: 7.63e-03
+ 2024-08-06 12:43:49,572 INFO [trainer.py:765] (7/8) Epoch 16, batch 200, train_loss[loss=2.744, ArTop10Accuracy=0.7798, over 13692.00 frames. ], tot_loss[loss=2.742, ArTop10Accuracy=0.7813, over 7743.89 frames. ], batch size: 34, lr: 7.61e-03
+ 2024-08-06 12:45:18,508 INFO [trainer.py:765] (7/8) Epoch 16, batch 300, train_loss[loss=2.813, ArTop10Accuracy=0.7684, over 14073.00 frames. ], tot_loss[loss=2.741, ArTop10Accuracy=0.7815, over 9383.12 frames. ], batch size: 44, lr: 7.59e-03
+ 2024-08-06 12:46:45,215 INFO [trainer.py:765] (7/8) Epoch 16, batch 400, train_loss[loss=2.627, ArTop10Accuracy=0.8022, over 10203.00 frames. ], tot_loss[loss=2.737, ArTop10Accuracy=0.7819, over 10285.65 frames. ], batch size: 14, lr: 7.58e-03
+ 2024-08-06 12:48:16,319 INFO [trainer.py:765] (7/8) Epoch 16, batch 500, train_loss[loss=2.705, ArTop10Accuracy=0.7881, over 12114.00 frames. ], tot_loss[loss=2.733, ArTop10Accuracy=0.7826, over 10840.99 frames. ], batch size: 22, lr: 7.56e-03
+ 2024-08-06 12:49:46,651 INFO [trainer.py:765] (7/8) Epoch 16, batch 600, train_loss[loss=2.678, ArTop10Accuracy=0.7914, over 11430.00 frames. ], tot_loss[loss=2.738, ArTop10Accuracy=0.7818, over 11353.37 frames. ], batch size: 18, lr: 7.54e-03
+ 2024-08-06 12:51:23,687 INFO [trainer.py:765] (7/8) Epoch 16, batch 700, train_loss[loss=2.576, ArTop10Accuracy=0.8136, over 10032.00 frames. ], tot_loss[loss=2.741, ArTop10Accuracy=0.7812, over 11511.29 frames. ], batch size: 12, lr: 7.52e-03
+ 2024-08-06 12:52:43,507 INFO [trainer.py:765] (7/8) Epoch 16, batch 800, train_loss[loss=2.719, ArTop10Accuracy=0.7814, over 9390.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.7794, over 11637.75 frames. ], batch size: 11, lr: 7.51e-03
+ 2024-08-06 12:53:06,022 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 12:53:15,499 INFO [trainer.py:811] (7/8) Epoch 16, validation: loss=2.816, ArTop10Accuracy=0.7678, over 1827537.00 frames.
+ 2024-08-06 12:53:15,499 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 12:53:16,192 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.291e+02 1.391e+02 1.487e+02 3.459e+02, threshold=2.783e+02, percent-clipped=0.1
+ 2024-08-06 12:54:06,487 INFO [trainer.py:765] (7/8) Epoch 16, batch 900, train_loss[loss=2.729, ArTop10Accuracy=0.7871, over 12846.00 frames. ], tot_loss[loss=2.75, ArTop10Accuracy=0.7795, over 11697.95 frames. ], batch size: 27, lr: 7.49e-03
+ 2024-08-06 12:55:19,797 INFO [trainer.py:765] (7/8) Epoch 16, batch 1000, train_loss[loss=2.687, ArTop10Accuracy=0.7908, over 12930.00 frames. ], tot_loss[loss=2.751, ArTop10Accuracy=0.7794, over 11884.37 frames. ], batch size: 27, lr: 7.47e-03
+ 2024-08-06 12:56:33,171 INFO [trainer.py:765] (7/8) Epoch 16, batch 1100, train_loss[loss=2.778, ArTop10Accuracy=0.7749, over 13665.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7782, over 11949.15 frames. ], batch size: 34, lr: 7.45e-03
+ 2024-08-06 12:57:48,491 INFO [trainer.py:765] (7/8) Epoch 16, batch 1200, train_loss[loss=2.839, ArTop10Accuracy=0.7655, over 12969.00 frames. ], tot_loss[loss=2.758, ArTop10Accuracy=0.7781, over 11864.36 frames. ], batch size: 103, lr: 7.44e-03
+ 2024-08-06 12:58:48,019 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 13:00:47,906 INFO [trainer.py:765] (7/8) Epoch 17, batch 100, train_loss[loss=2.792, ArTop10Accuracy=0.7732, over 14166.00 frames. ], tot_loss[loss=2.743, ArTop10Accuracy=0.7807, over 4760.01 frames. ], batch size: 62, lr: 7.18e-03
+ 2024-08-06 13:02:19,308 INFO [trainer.py:765] (7/8) Epoch 17, batch 200, train_loss[loss=2.814, ArTop10Accuracy=0.7661, over 13584.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7825, over 7731.47 frames. ], batch size: 34, lr: 7.17e-03
+ 2024-08-06 13:03:45,523 INFO [trainer.py:765] (7/8) Epoch 17, batch 300, train_loss[loss=2.769, ArTop10Accuracy=0.7733, over 14361.00 frames. ], tot_loss[loss=2.731, ArTop10Accuracy=0.7829, over 9385.16 frames. ], batch size: 45, lr: 7.15e-03
+ 2024-08-06 13:05:21,768 INFO [trainer.py:765] (7/8) Epoch 17, batch 400, train_loss[loss=2.632, ArTop10Accuracy=0.8014, over 10311.00 frames. ], tot_loss[loss=2.728, ArTop10Accuracy=0.7838, over 10317.90 frames. ], batch size: 14, lr: 7.14e-03
+ 2024-08-06 13:06:47,027 INFO [trainer.py:765] (7/8) Epoch 17, batch 500, train_loss[loss=2.792, ArTop10Accuracy=0.7735, over 12228.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7845, over 10862.57 frames. ], batch size: 22, lr: 7.12e-03
+ 2024-08-06 13:07:39,886 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.293e+02 1.386e+02 1.488e+02 3.253e+02, threshold=2.772e+02, percent-clipped=0.1
+ 2024-08-06 13:08:22,694 INFO [trainer.py:765] (7/8) Epoch 17, batch 600, train_loss[loss=2.656, ArTop10Accuracy=0.8009, over 11367.00 frames. ], tot_loss[loss=2.726, ArTop10Accuracy=0.7845, over 11383.18 frames. ], batch size: 18, lr: 7.10e-03
+ 2024-08-06 13:09:54,842 INFO [trainer.py:765] (7/8) Epoch 17, batch 700, train_loss[loss=2.583, ArTop10Accuracy=0.8144, over 9984.00 frames. ], tot_loss[loss=2.732, ArTop10Accuracy=0.7831, over 11520.02 frames. ], batch size: 12, lr: 7.09e-03
+ 2024-08-06 13:11:19,487 INFO [trainer.py:765] (7/8) Epoch 17, batch 800, train_loss[loss=2.614, ArTop10Accuracy=0.8099, over 9345.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.7823, over 11651.14 frames. ], batch size: 11, lr: 7.07e-03
+ 2024-08-06 13:12:35,676 INFO [trainer.py:765] (7/8) Epoch 17, batch 900, train_loss[loss=2.75, ArTop10Accuracy=0.7803, over 12879.00 frames. ], tot_loss[loss=2.736, ArTop10Accuracy=0.7824, over 11692.53 frames. ], batch size: 27, lr: 7.06e-03
+ 2024-08-06 13:13:53,068 INFO [trainer.py:765] (7/8) Epoch 17, batch 1000, train_loss[loss=2.727, ArTop10Accuracy=0.7848, over 12855.00 frames. ], tot_loss[loss=2.744, ArTop10Accuracy=0.7809, over 11882.04 frames. ], batch size: 27, lr: 7.04e-03
+ 2024-08-06 13:15:08,492 INFO [trainer.py:765] (7/8) Epoch 17, batch 1100, train_loss[loss=2.74, ArTop10Accuracy=0.7812, over 14052.00 frames. ], tot_loss[loss=2.747, ArTop10Accuracy=0.7804, over 11950.75 frames. ], batch size: 35, lr: 7.02e-03
+ 2024-08-06 13:16:22,394 INFO [trainer.py:765] (7/8) Epoch 17, batch 1200, train_loss[loss=2.863, ArTop10Accuracy=0.7594, over 12015.00 frames. ], tot_loss[loss=2.747, ArTop10Accuracy=0.7804, over 11855.95 frames. ], batch size: 101, lr: 7.01e-03
+ 2024-08-06 13:17:21,213 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 13:19:16,001 INFO [trainer.py:765] (7/8) Epoch 18, batch 100, train_loss[loss=2.784, ArTop10Accuracy=0.7674, over 14079.00 frames. ], tot_loss[loss=2.724, ArTop10Accuracy=0.7844, over 4751.48 frames. ], batch size: 62, lr: 6.78e-03
+ 2024-08-06 13:20:46,608 INFO [trainer.py:765] (7/8) Epoch 18, batch 200, train_loss[loss=2.649, ArTop10Accuracy=0.7999, over 13605.00 frames. ], tot_loss[loss=2.722, ArTop10Accuracy=0.7849, over 7735.17 frames. ], batch size: 34, lr: 6.77e-03
+ 2024-08-06 13:21:55,110 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 13:22:04,751 INFO [trainer.py:811] (7/8) Epoch 18, validation: loss=2.817, ArTop10Accuracy=0.768, over 1827537.00 frames.
+ 2024-08-06 13:22:04,752 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
+ 2024-08-06 13:22:05,480 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.323e+02 1.409e+02 1.514e+02 3.209e+02, threshold=2.818e+02, percent-clipped=0.1
+ 2024-08-06 13:22:26,587 INFO [trainer.py:765] (7/8) Epoch 18, batch 300, train_loss[loss=2.752, ArTop10Accuracy=0.7807, over 14289.00 frames. ], tot_loss[loss=2.716, ArTop10Accuracy=0.7863, over 9361.90 frames. ], batch size: 44, lr: 6.76e-03
+ 2024-08-06 13:23:57,938 INFO [trainer.py:765] (7/8) Epoch 18, batch 400, train_loss[loss=2.653, ArTop10Accuracy=0.7984, over 10791.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.7863, over 10296.98 frames. ], batch size: 15, lr: 6.74e-03
+ 2024-08-06 13:25:34,019 INFO [trainer.py:765] (7/8) Epoch 18, batch 500, train_loss[loss=2.708, ArTop10Accuracy=0.7846, over 12288.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7865, over 10862.32 frames. ], batch size: 22, lr: 6.73e-03
+ 2024-08-06 13:27:00,640 INFO [trainer.py:765] (7/8) Epoch 18, batch 600, train_loss[loss=2.674, ArTop10Accuracy=0.7969, over 11604.00 frames. ], tot_loss[loss=2.719, ArTop10Accuracy=0.7857, over 11368.16 frames. ], batch size: 18, lr: 6.71e-03
+ 2024-08-06 13:28:33,590 INFO [trainer.py:765] (7/8) Epoch 18, batch 700, train_loss[loss=2.703, ArTop10Accuracy=0.7906, over 10236.00 frames. ], tot_loss[loss=2.724, ArTop10Accuracy=0.7847, over 11505.06 frames. ], batch size: 12, lr: 6.70e-03
+ 2024-08-06 13:29:54,993 INFO [trainer.py:765] (7/8) Epoch 18, batch 800, train_loss[loss=2.754, ArTop10Accuracy=0.7758, over 10239.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7845, over 11629.35 frames. ], batch size: 12, lr: 6.68e-03
+ 2024-08-06 13:31:12,525 INFO [trainer.py:765] (7/8) Epoch 18, batch 900, train_loss[loss=2.723, ArTop10Accuracy=0.7859, over 12897.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7852, over 11685.87 frames. ], batch size: 27, lr: 6.67e-03
+ 2024-08-06 13:32:26,558 INFO [trainer.py:765] (7/8) Epoch 18, batch 1000, train_loss[loss=2.771, ArTop10Accuracy=0.7752, over 12876.00 frames. ], tot_loss[loss=2.727, ArTop10Accuracy=0.784, over 11871.23 frames. ], batch size: 27, lr: 6.66e-03
+ 2024-08-06 13:33:41,504 INFO [trainer.py:765] (7/8) Epoch 18, batch 1100, train_loss[loss=2.757, ArTop10Accuracy=0.7748, over 13779.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7827, over 11937.42 frames. ], batch size: 34, lr: 6.64e-03
+ 2024-08-06 13:34:54,682 INFO [trainer.py:765] (7/8) Epoch 18, batch 1200, train_loss[loss=2.863, ArTop10Accuracy=0.7601, over 12321.00 frames. ], tot_loss[loss=2.734, ArTop10Accuracy=0.7828, over 11837.69 frames. ], batch size: 101, lr: 6.63e-03
+ 2024-08-06 13:35:51,070 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.340e+02 1.433e+02 1.533e+02 2.444e+02, threshold=2.867e+02, percent-clipped=0.0
+ 2024-08-06 13:35:54,176 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
+ 2024-08-06 13:37:48,630 INFO [trainer.py:765] (7/8) Epoch 19, batch 100, train_loss[loss=2.798, ArTop10Accuracy=0.7683, over 14376.00 frames. ], tot_loss[loss=2.723, ArTop10Accuracy=0.7846, over 4770.95 frames. ], batch size: 62, lr: 6.43e-03
+ 2024-08-06 13:39:23,263 INFO [trainer.py:765] (7/8) Epoch 19, batch 200, train_loss[loss=2.726, ArTop10Accuracy=0.7834, over 13749.00 frames. ], tot_loss[loss=2.721, ArTop10Accuracy=0.7848, over 7745.36 frames. ], batch size: 34, lr: 6.41e-03
+ 2024-08-06 13:40:48,366 INFO [trainer.py:765] (7/8) Epoch 19, batch 300, train_loss[loss=2.731, ArTop10Accuracy=0.7845, over 14517.00 frames. ], tot_loss[loss=2.713, ArTop10Accuracy=0.7862, over 9373.26 frames. ], batch size: 44, lr: 6.40e-03
+ 2024-08-06 13:42:21,074 INFO [trainer.py:765] (7/8) Epoch 19, batch 400, train_loss[loss=2.731, ArTop10Accuracy=0.7861, over 10323.00 frames. ], tot_loss[loss=2.705, ArTop10Accuracy=0.7879, over 10293.26 frames. ], batch size: 14, lr: 6.39e-03
+ 2024-08-06 13:43:44,961 INFO [trainer.py:765] (7/8) Epoch 19, batch 500, train_loss[loss=2.731, ArTop10Accuracy=0.7784, over 12534.00 frames. ], tot_loss[loss=2.7, ArTop10Accuracy=0.789, over 10849.95 frames. ], batch size: 23, lr: 6.37e-03
+ 2024-08-06 13:45:16,688 INFO [trainer.py:765] (7/8) Epoch 19, batch 600, train_loss[loss=2.637, ArTop10Accuracy=0.8015, over 11577.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7885, over 11382.15 frames. ], batch size: 18, lr: 6.36e-03
+ 2024-08-06 13:46:48,330 INFO [trainer.py:765] (7/8) Epoch 19, batch 700, train_loss[loss=2.646, ArTop10Accuracy=0.7997, over 10125.00 frames. ], tot_loss[loss=2.71, ArTop10Accuracy=0.787, over 11524.94 frames. ], batch size: 12, lr: 6.35e-03
+ 2024-08-06 13:48:11,890 INFO [trainer.py:765] (7/8) Epoch 19, batch 800, train_loss[loss=2.537, ArTop10Accuracy=0.8126, over 10260.00 frames. ], tot_loss[loss=2.715, ArTop10Accuracy=0.786, over 11639.76 frames. ], batch size: 12, lr: 6.34e-03
+ 2024-08-06 13:49:27,268 INFO [trainer.py:765] (7/8) Epoch 19, batch 900, train_loss[loss=2.684, ArTop10Accuracy=0.7954, over 13089.00 frames. ], tot_loss[loss=2.71, ArTop10Accuracy=0.7871, over 11707.01 frames. ], batch size: 27, lr: 6.32e-03
+ 2024-08-06 13:50:40,660 INFO [trainer.py:803] (7/8) Computing validation loss
+ 2024-08-06 13:50:50,536 INFO [trainer.py:811] (7/8) Epoch 19, validation: loss=2.818, ArTop10Accuracy=0.7679, over 1827537.00 frames.
+ 2024-08-06 13:50:50,537 INFO [trainer.py:814] (7/8) Maximum memory allocated so far is 33001MB
317
+ 2024-08-06 13:50:51,497 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.371e+02 1.455e+02 1.550e+02 3.697e+02, threshold=2.909e+02, percent-clipped=0.2
318
+ 2024-08-06 13:50:52,921 INFO [trainer.py:765] (7/8) Epoch 19, batch 1000, train_loss[loss=2.774, ArTop10Accuracy=0.773, over 12831.00 frames. ], tot_loss[loss=2.716, ArTop10Accuracy=0.7858, over 11889.69 frames. ], batch size: 27, lr: 6.31e-03
319
+ 2024-08-06 13:52:08,274 INFO [trainer.py:765] (7/8) Epoch 19, batch 1100, train_loss[loss=2.682, ArTop10Accuracy=0.792, over 13770.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.784, over 11949.85 frames. ], batch size: 34, lr: 6.30e-03
320
+ 2024-08-06 13:53:22,320 INFO [trainer.py:765] (7/8) Epoch 19, batch 1200, train_loss[loss=2.872, ArTop10Accuracy=0.7564, over 12657.00 frames. ], tot_loss[loss=2.725, ArTop10Accuracy=0.7842, over 11853.51 frames. ], batch size: 103, lr: 6.28e-03
321
+ 2024-08-06 13:54:21,906 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
322
+ 2024-08-06 13:56:12,912 INFO [trainer.py:765] (7/8) Epoch 20, batch 100, train_loss[loss=2.794, ArTop10Accuracy=0.7677, over 14673.00 frames. ], tot_loss[loss=2.706, ArTop10Accuracy=0.7875, over 4740.81 frames. ], batch size: 62, lr: 6.10e-03
323
+ 2024-08-06 13:57:42,501 INFO [trainer.py:765] (7/8) Epoch 20, batch 200, train_loss[loss=2.691, ArTop10Accuracy=0.7936, over 13551.00 frames. ], tot_loss[loss=2.704, ArTop10Accuracy=0.788, over 7749.97 frames. ], batch size: 34, lr: 6.09e-03
324
+ 2024-08-06 13:59:15,436 INFO [trainer.py:765] (7/8) Epoch 20, batch 300, train_loss[loss=2.764, ArTop10Accuracy=0.7777, over 13926.00 frames. ], tot_loss[loss=2.696, ArTop10Accuracy=0.7898, over 9400.91 frames. ], batch size: 44, lr: 6.08e-03
325
+ 2024-08-06 14:00:44,362 INFO [trainer.py:765] (7/8) Epoch 20, batch 400, train_loss[loss=2.598, ArTop10Accuracy=0.8109, over 10725.00 frames. ], tot_loss[loss=2.696, ArTop10Accuracy=0.7898, over 10319.38 frames. ], batch size: 15, lr: 6.07e-03
326
+ 2024-08-06 14:02:14,860 INFO [trainer.py:765] (7/8) Epoch 20, batch 500, train_loss[loss=2.713, ArTop10Accuracy=0.7871, over 12249.00 frames. ], tot_loss[loss=2.692, ArTop10Accuracy=0.7905, over 10864.98 frames. ], batch size: 22, lr: 6.06e-03
327
+ 2024-08-06 14:03:40,862 INFO [trainer.py:765] (7/8) Epoch 20, batch 600, train_loss[loss=2.639, ArTop10Accuracy=0.8041, over 11127.00 frames. ], tot_loss[loss=2.694, ArTop10Accuracy=0.79, over 11364.86 frames. ], batch size: 18, lr: 6.04e-03
328
+ 2024-08-06 14:05:13,870 INFO [trainer.py:765] (7/8) Epoch 20, batch 700, train_loss[loss=2.727, ArTop10Accuracy=0.7852, over 10017.00 frames. ], tot_loss[loss=2.701, ArTop10Accuracy=0.7889, over 11514.99 frames. ], batch size: 12, lr: 6.03e-03
329
+ 2024-08-06 14:05:30,798 INFO [optim.py:386] (7/8) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.365e+02 1.456e+02 1.550e+02 3.525e+02, threshold=2.913e+02, percent-clipped=0.1
330
+ 2024-08-06 14:06:34,515 INFO [trainer.py:765] (7/8) Epoch 20, batch 800, train_loss[loss=2.786, ArTop10Accuracy=0.7703, over 10329.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7884, over 11642.78 frames. ], batch size: 12, lr: 6.02e-03
331
+ 2024-08-06 14:07:50,950 INFO [trainer.py:765] (7/8) Epoch 20, batch 900, train_loss[loss=2.812, ArTop10Accuracy=0.7704, over 12780.00 frames. ], tot_loss[loss=2.703, ArTop10Accuracy=0.7884, over 11670.27 frames. ], batch size: 27, lr: 6.01e-03
332
+ 2024-08-06 14:09:07,180 INFO [trainer.py:765] (7/8) Epoch 20, batch 1000, train_loss[loss=2.698, ArTop10Accuracy=0.7897, over 12972.00 frames. ], tot_loss[loss=2.71, ArTop10Accuracy=0.7871, over 11868.65 frames. ], batch size: 27, lr: 6.00e-03
333
+ 2024-08-06 14:10:21,216 INFO [trainer.py:765] (7/8) Epoch 20, batch 1100, train_loss[loss=2.704, ArTop10Accuracy=0.7918, over 13494.00 frames. ], tot_loss[loss=2.717, ArTop10Accuracy=0.7857, over 11953.96 frames. ], batch size: 34, lr: 5.99e-03
334
+ 2024-08-06 14:11:37,819 INFO [trainer.py:765] (7/8) Epoch 20, batch 1200, train_loss[loss=2.794, ArTop10Accuracy=0.7705, over 11277.00 frames. ], tot_loss[loss=2.72, ArTop10Accuracy=0.7853, over 11847.41 frames. ], batch size: 101, lr: 5.98e-03
335
+ 2024-08-06 14:12:37,438 INFO [trainer.py:650] (7/8) Reaches end of dataloader.
336
+ 2024-08-06 14:12:37,442 INFO [trainer.py:1069] (7/8) Done!
libritts-r/log/log-train-2024-08-06-14-23-41-0 ADDED
The diff for this file is too large to render. See raw diff
 
libritts-r/log/log-train-2024-08-06-14-23-41-1 ADDED
The diff for this file is too large to render. See raw diff
 
libritts-r/log/log-train-2024-08-06-14-23-41-2 ADDED
The diff for this file is too large to render. See raw diff
 
libritts-r/log/log-train-2024-08-06-14-23-41-3 ADDED
The diff for this file is too large to render. See raw diff
 
libritts-r/log/log-train-2024-08-06-14-23-41-4 ADDED
The diff for this file is too large to render. See raw diff
 
libritts-r/log/log-train-2024-08-06-14-23-41-5 ADDED
The diff for this file is too large to render. See raw diff
 
libritts-r/log/log-train-2024-08-06-14-23-41-6 ADDED
The diff for this file is too large to render. See raw diff
 
libritts-r/log/log-train-2024-08-06-14-23-41-7 ADDED
The diff for this file is too large to render. See raw diff
 
libritts-r/tensorboard_stage1/events.out.tfevents.1722931336.6867463.3160.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e4fdcf08995c4e16faac48f01f61e78a24c67e8d0f4ca11829b8f07000ebe0c
+ size 135
libritts-r/tensorboard_stage1/events.out.tfevents.1722931437.6867463.17896.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c65dd1cf55da46628acbeaccf4d3fdba589d779fb4908ec4adf688777d119534
+ size 88
libritts-r/tensorboard_stage1/events.out.tfevents.1722931574.6867463.20306.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e402ea5b9f65580ee8f730e153fdfe780d5b67538a8246fc3f618309f88ca1d
+ size 103227
libritts-r/tensorboard_stage2/events.out.tfevents.1722954221.6867463.1063288.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0a6ca0899fbb9fecb7e2a6a080cf907cb090980428971e0d996640a6d8a8e93
+ size 434306
libritts/log/log-train-2024-08-06-03-01-46-0 ADDED
@@ -0,0 +1,15 @@
+ 2024-08-06 03:01:46,136 INFO [trainer.py:870] (0/8) Training started
+ 2024-08-06 03:01:46,140 INFO [trainer.py:889] (0/8) Device: cuda:0
+ 2024-08-06 03:01:46,141 INFO [trainer.py:890] (0/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': True, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:01:46,141 INFO [trainer.py:892] (0/8) About to create model
+ 2024-08-06 03:01:47,114 INFO [trainer.py:899] (0/8) Number of model parameters: 367386628
+ 2024-08-06 03:01:47,976 INFO [trainer.py:914] (0/8) Using DDP
+ 2024-08-06 03:01:50,133 INFO [datamodule.py:427] (0/8) About to get train cuts
+ 2024-08-06 03:01:50,144 INFO [datamodule.py:434] (0/8) About to get dev cuts
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:292] (0/8) Disable SpecAugment
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:294] (0/8) About to create train dataset
+ 2024-08-06 03:01:50,152 INFO [datamodule.py:323] (0/8) Using DynamicBucketingSampler
+ 2024-08-06 03:01:50,769 INFO [datamodule.py:344] (0/8) About to create train dataloader
+ 2024-08-06 03:01:50,769 INFO [datamodule.py:367] (0/8) About to create dev dataset
+ 2024-08-06 03:01:51,096 INFO [datamodule.py:388] (0/8) About to create dev dataloader
+ 2024-08-06 03:01:51,096 INFO [trainer.py:1104] (0/8) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
libritts/log/log-train-2024-08-06-03-01-46-1 ADDED
@@ -0,0 +1,15 @@
+ 2024-08-06 03:01:46,127 INFO [trainer.py:870] (1/8) Training started
+ 2024-08-06 03:01:46,128 INFO [trainer.py:889] (1/8) Device: cuda:1
+ 2024-08-06 03:01:46,128 INFO [trainer.py:890] (1/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': True, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:01:46,128 INFO [trainer.py:892] (1/8) About to create model
+ 2024-08-06 03:01:47,130 INFO [trainer.py:899] (1/8) Number of model parameters: 367386628
+ 2024-08-06 03:01:48,044 INFO [trainer.py:914] (1/8) Using DDP
+ 2024-08-06 03:01:50,132 INFO [datamodule.py:427] (1/8) About to get train cuts
+ 2024-08-06 03:01:50,144 INFO [datamodule.py:434] (1/8) About to get dev cuts
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:292] (1/8) Disable SpecAugment
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:294] (1/8) About to create train dataset
+ 2024-08-06 03:01:50,152 INFO [datamodule.py:323] (1/8) Using DynamicBucketingSampler
+ 2024-08-06 03:01:50,773 INFO [datamodule.py:344] (1/8) About to create train dataloader
+ 2024-08-06 03:01:50,773 INFO [datamodule.py:367] (1/8) About to create dev dataset
+ 2024-08-06 03:01:51,101 INFO [datamodule.py:388] (1/8) About to create dev dataloader
+ 2024-08-06 03:01:51,101 INFO [trainer.py:1104] (1/8) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
libritts/log/log-train-2024-08-06-03-01-46-2 ADDED
@@ -0,0 +1,15 @@
+ 2024-08-06 03:01:46,149 INFO [trainer.py:870] (2/8) Training started
+ 2024-08-06 03:01:46,150 INFO [trainer.py:889] (2/8) Device: cuda:2
+ 2024-08-06 03:01:46,150 INFO [trainer.py:890] (2/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': True, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:01:46,150 INFO [trainer.py:892] (2/8) About to create model
+ 2024-08-06 03:01:46,898 INFO [trainer.py:899] (2/8) Number of model parameters: 367386628
+ 2024-08-06 03:01:47,589 INFO [trainer.py:914] (2/8) Using DDP
+ 2024-08-06 03:01:50,133 INFO [datamodule.py:427] (2/8) About to get train cuts
+ 2024-08-06 03:01:50,144 INFO [datamodule.py:434] (2/8) About to get dev cuts
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:292] (2/8) Disable SpecAugment
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:294] (2/8) About to create train dataset
+ 2024-08-06 03:01:50,152 INFO [datamodule.py:323] (2/8) Using DynamicBucketingSampler
+ 2024-08-06 03:01:50,778 INFO [datamodule.py:344] (2/8) About to create train dataloader
+ 2024-08-06 03:01:50,778 INFO [datamodule.py:367] (2/8) About to create dev dataset
+ 2024-08-06 03:01:51,108 INFO [datamodule.py:388] (2/8) About to create dev dataloader
+ 2024-08-06 03:01:51,108 INFO [trainer.py:1104] (2/8) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
libritts/log/log-train-2024-08-06-03-01-46-3 ADDED
@@ -0,0 +1,15 @@
+ 2024-08-06 03:01:46,149 INFO [trainer.py:870] (3/8) Training started
+ 2024-08-06 03:01:46,150 INFO [trainer.py:889] (3/8) Device: cuda:3
+ 2024-08-06 03:01:46,151 INFO [trainer.py:890] (3/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': True, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:01:46,151 INFO [trainer.py:892] (3/8) About to create model
+ 2024-08-06 03:01:46,887 INFO [trainer.py:899] (3/8) Number of model parameters: 367386628
+ 2024-08-06 03:01:47,590 INFO [trainer.py:914] (3/8) Using DDP
+ 2024-08-06 03:01:50,133 INFO [datamodule.py:427] (3/8) About to get train cuts
+ 2024-08-06 03:01:50,144 INFO [datamodule.py:434] (3/8) About to get dev cuts
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:292] (3/8) Disable SpecAugment
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:294] (3/8) About to create train dataset
+ 2024-08-06 03:01:50,153 INFO [datamodule.py:323] (3/8) Using DynamicBucketingSampler
+ 2024-08-06 03:01:50,772 INFO [datamodule.py:344] (3/8) About to create train dataloader
+ 2024-08-06 03:01:50,773 INFO [datamodule.py:367] (3/8) About to create dev dataset
+ 2024-08-06 03:01:51,100 INFO [datamodule.py:388] (3/8) About to create dev dataloader
+ 2024-08-06 03:01:51,100 INFO [trainer.py:1104] (3/8) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
libritts/log/log-train-2024-08-06-03-01-46-4 ADDED
@@ -0,0 +1,15 @@
+ 2024-08-06 03:01:46,133 INFO [trainer.py:870] (4/8) Training started
+ 2024-08-06 03:01:46,133 INFO [trainer.py:889] (4/8) Device: cuda:4
+ 2024-08-06 03:01:46,134 INFO [trainer.py:890] (4/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': True, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:01:46,134 INFO [trainer.py:892] (4/8) About to create model
+ 2024-08-06 03:01:47,116 INFO [trainer.py:899] (4/8) Number of model parameters: 367386628
+ 2024-08-06 03:01:47,977 INFO [trainer.py:914] (4/8) Using DDP
+ 2024-08-06 03:01:50,133 INFO [datamodule.py:427] (4/8) About to get train cuts
+ 2024-08-06 03:01:50,144 INFO [datamodule.py:434] (4/8) About to get dev cuts
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:292] (4/8) Disable SpecAugment
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:294] (4/8) About to create train dataset
+ 2024-08-06 03:01:50,152 INFO [datamodule.py:323] (4/8) Using DynamicBucketingSampler
+ 2024-08-06 03:01:50,768 INFO [datamodule.py:344] (4/8) About to create train dataloader
+ 2024-08-06 03:01:50,768 INFO [datamodule.py:367] (4/8) About to create dev dataset
+ 2024-08-06 03:01:51,099 INFO [datamodule.py:388] (4/8) About to create dev dataloader
+ 2024-08-06 03:01:51,099 INFO [trainer.py:1104] (4/8) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
libritts/log/log-train-2024-08-06-03-01-46-5 ADDED
@@ -0,0 +1,15 @@
+ 2024-08-06 03:01:46,128 INFO [trainer.py:870] (5/8) Training started
+ 2024-08-06 03:01:46,129 INFO [trainer.py:889] (5/8) Device: cuda:5
+ 2024-08-06 03:01:46,129 INFO [trainer.py:890] (5/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': True, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:01:46,129 INFO [trainer.py:892] (5/8) About to create model
+ 2024-08-06 03:01:47,129 INFO [trainer.py:899] (5/8) Number of model parameters: 367386628
+ 2024-08-06 03:01:48,055 INFO [trainer.py:914] (5/8) Using DDP
+ 2024-08-06 03:01:50,131 INFO [datamodule.py:427] (5/8) About to get train cuts
+ 2024-08-06 03:01:50,144 INFO [datamodule.py:434] (5/8) About to get dev cuts
+ 2024-08-06 03:01:50,150 INFO [datamodule.py:292] (5/8) Disable SpecAugment
+ 2024-08-06 03:01:50,150 INFO [datamodule.py:294] (5/8) About to create train dataset
+ 2024-08-06 03:01:50,152 INFO [datamodule.py:323] (5/8) Using DynamicBucketingSampler
+ 2024-08-06 03:01:50,777 INFO [datamodule.py:344] (5/8) About to create train dataloader
+ 2024-08-06 03:01:50,777 INFO [datamodule.py:367] (5/8) About to create dev dataset
+ 2024-08-06 03:01:51,111 INFO [datamodule.py:388] (5/8) About to create dev dataloader
+ 2024-08-06 03:01:51,111 INFO [trainer.py:1104] (5/8) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
libritts/log/log-train-2024-08-06-03-01-46-6 ADDED
@@ -0,0 +1,15 @@
+ 2024-08-06 03:01:46,128 INFO [trainer.py:870] (6/8) Training started
+ 2024-08-06 03:01:46,129 INFO [trainer.py:889] (6/8) Device: cuda:6
+ 2024-08-06 03:01:46,129 INFO [trainer.py:890] (6/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': True, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:01:46,129 INFO [trainer.py:892] (6/8) About to create model
+ 2024-08-06 03:01:46,923 INFO [trainer.py:899] (6/8) Number of model parameters: 367386628
+ 2024-08-06 03:01:47,594 INFO [trainer.py:914] (6/8) Using DDP
+ 2024-08-06 03:01:50,130 INFO [datamodule.py:427] (6/8) About to get train cuts
+ 2024-08-06 03:01:50,144 INFO [datamodule.py:434] (6/8) About to get dev cuts
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:292] (6/8) Disable SpecAugment
+ 2024-08-06 03:01:50,151 INFO [datamodule.py:294] (6/8) About to create train dataset
+ 2024-08-06 03:01:50,152 INFO [datamodule.py:323] (6/8) Using DynamicBucketingSampler
+ 2024-08-06 03:01:50,772 INFO [datamodule.py:344] (6/8) About to create train dataloader
+ 2024-08-06 03:01:50,772 INFO [datamodule.py:367] (6/8) About to create dev dataset
+ 2024-08-06 03:01:51,100 INFO [datamodule.py:388] (6/8) About to create dev dataloader
+ 2024-08-06 03:01:51,100 INFO [trainer.py:1104] (6/8) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
libritts/log/log-train-2024-08-06-03-01-46-7 ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 2024-08-06 03:01:46,127 INFO [trainer.py:870] (7/8) Training started
2
+ 2024-08-06 03:01:46,128 INFO [trainer.py:889] (7/8) Device: cuda:7
3
+ 2024-08-06 03:01:46,128 INFO [trainer.py:890] (7/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': True, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': 
True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:01:46,128 INFO [trainer.py:892] (7/8) About to create model
+ 2024-08-06 03:01:47,129 INFO [trainer.py:899] (7/8) Number of model parameters: 367386628
+ 2024-08-06 03:01:48,055 INFO [trainer.py:914] (7/8) Using DDP
+ 2024-08-06 03:01:50,131 INFO [datamodule.py:427] (7/8) About to get train cuts
+ 2024-08-06 03:01:50,144 INFO [datamodule.py:434] (7/8) About to get dev cuts
+ 2024-08-06 03:01:50,150 INFO [datamodule.py:292] (7/8) Disable SpecAugment
+ 2024-08-06 03:01:50,150 INFO [datamodule.py:294] (7/8) About to create train dataset
+ 2024-08-06 03:01:50,152 INFO [datamodule.py:323] (7/8) Using DynamicBucketingSampler
+ 2024-08-06 03:01:50,766 INFO [datamodule.py:344] (7/8) About to create train dataloader
+ 2024-08-06 03:01:50,766 INFO [datamodule.py:367] (7/8) About to create dev dataset
+ 2024-08-06 03:01:51,090 INFO [datamodule.py:388] (7/8) About to create dev dataloader
+ 2024-08-06 03:01:51,091 INFO [trainer.py:1104] (7/8) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
libritts/log/log-train-2024-08-06-03-26-50-0 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 03:26:50,289 INFO [trainer.py:870] (0/8) Training started
+ 2024-08-06 03:26:50,293 INFO [trainer.py:889] (0/8) Device: cuda:0
+ 2024-08-06 03:26:50,293 INFO [trainer.py:890] (0/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': 
True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:26:50,293 INFO [trainer.py:892] (0/8) About to create model
+ 2024-08-06 03:26:51,081 INFO [trainer.py:899] (0/8) Number of model parameters: 367386628
+ 2024-08-06 03:26:51,961 INFO [trainer.py:914] (0/8) Using DDP
+ 2024-08-06 03:26:54,035 INFO [datamodule.py:427] (0/8) About to get train cuts
+ 2024-08-06 03:26:54,036 INFO [datamodule.py:434] (0/8) About to get dev cuts
+ 2024-08-06 03:26:54,037 INFO [datamodule.py:292] (0/8) Disable SpecAugment
+ 2024-08-06 03:26:54,037 INFO [datamodule.py:294] (0/8) About to create train dataset
+ 2024-08-06 03:26:54,038 INFO [datamodule.py:323] (0/8) Using DynamicBucketingSampler
+ 2024-08-06 03:26:54,644 INFO [datamodule.py:344] (0/8) About to create train dataloader
+ 2024-08-06 03:26:54,644 INFO [datamodule.py:367] (0/8) About to create dev dataset
+ 2024-08-06 03:26:54,968 INFO [datamodule.py:388] (0/8) About to create dev dataloader
libritts/log/log-train-2024-08-06-03-26-50-1 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 03:26:50,319 INFO [trainer.py:870] (1/8) Training started
+ 2024-08-06 03:26:50,320 INFO [trainer.py:889] (1/8) Device: cuda:1
+ 2024-08-06 03:26:50,320 INFO [trainer.py:890] (1/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': 
True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:26:50,320 INFO [trainer.py:892] (1/8) About to create model
+ 2024-08-06 03:26:51,037 INFO [trainer.py:899] (1/8) Number of model parameters: 367386628
+ 2024-08-06 03:26:51,836 INFO [trainer.py:914] (1/8) Using DDP
+ 2024-08-06 03:26:54,036 INFO [datamodule.py:427] (1/8) About to get train cuts
+ 2024-08-06 03:26:54,038 INFO [datamodule.py:434] (1/8) About to get dev cuts
+ 2024-08-06 03:26:54,039 INFO [datamodule.py:292] (1/8) Disable SpecAugment
+ 2024-08-06 03:26:54,039 INFO [datamodule.py:294] (1/8) About to create train dataset
+ 2024-08-06 03:26:54,040 INFO [datamodule.py:323] (1/8) Using DynamicBucketingSampler
+ 2024-08-06 03:26:54,657 INFO [datamodule.py:344] (1/8) About to create train dataloader
+ 2024-08-06 03:26:54,657 INFO [datamodule.py:367] (1/8) About to create dev dataset
+ 2024-08-06 03:26:54,990 INFO [datamodule.py:388] (1/8) About to create dev dataloader
libritts/log/log-train-2024-08-06-03-26-50-2 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 03:26:50,323 INFO [trainer.py:870] (2/8) Training started
+ 2024-08-06 03:26:50,324 INFO [trainer.py:889] (2/8) Device: cuda:2
+ 2024-08-06 03:26:50,324 INFO [trainer.py:890] (2/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': 
True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:26:50,324 INFO [trainer.py:892] (2/8) About to create model
+ 2024-08-06 03:26:51,090 INFO [trainer.py:899] (2/8) Number of model parameters: 367386628
+ 2024-08-06 03:26:51,890 INFO [trainer.py:914] (2/8) Using DDP
+ 2024-08-06 03:26:54,038 INFO [datamodule.py:427] (2/8) About to get train cuts
+ 2024-08-06 03:26:54,040 INFO [datamodule.py:434] (2/8) About to get dev cuts
+ 2024-08-06 03:26:54,041 INFO [datamodule.py:292] (2/8) Disable SpecAugment
+ 2024-08-06 03:26:54,041 INFO [datamodule.py:294] (2/8) About to create train dataset
+ 2024-08-06 03:26:54,041 INFO [datamodule.py:323] (2/8) Using DynamicBucketingSampler
+ 2024-08-06 03:26:54,668 INFO [datamodule.py:344] (2/8) About to create train dataloader
+ 2024-08-06 03:26:54,669 INFO [datamodule.py:367] (2/8) About to create dev dataset
+ 2024-08-06 03:26:55,004 INFO [datamodule.py:388] (2/8) About to create dev dataloader
libritts/log/log-train-2024-08-06-03-26-50-3 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 03:26:50,316 INFO [trainer.py:870] (3/8) Training started
+ 2024-08-06 03:26:50,317 INFO [trainer.py:889] (3/8) Device: cuda:3
+ 2024-08-06 03:26:50,317 INFO [trainer.py:890] (3/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': 
True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:26:50,317 INFO [trainer.py:892] (3/8) About to create model
+ 2024-08-06 03:26:51,135 INFO [trainer.py:899] (3/8) Number of model parameters: 367386628
+ 2024-08-06 03:26:51,959 INFO [trainer.py:914] (3/8) Using DDP
+ 2024-08-06 03:26:54,035 INFO [datamodule.py:427] (3/8) About to get train cuts
+ 2024-08-06 03:26:54,036 INFO [datamodule.py:434] (3/8) About to get dev cuts
+ 2024-08-06 03:26:54,038 INFO [datamodule.py:292] (3/8) Disable SpecAugment
+ 2024-08-06 03:26:54,038 INFO [datamodule.py:294] (3/8) About to create train dataset
+ 2024-08-06 03:26:54,038 INFO [datamodule.py:323] (3/8) Using DynamicBucketingSampler
+ 2024-08-06 03:26:54,678 INFO [datamodule.py:344] (3/8) About to create train dataloader
+ 2024-08-06 03:26:54,679 INFO [datamodule.py:367] (3/8) About to create dev dataset
+ 2024-08-06 03:26:55,067 INFO [datamodule.py:388] (3/8) About to create dev dataloader
libritts/log/log-train-2024-08-06-03-26-50-4 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 03:26:50,273 INFO [trainer.py:870] (4/8) Training started
+ 2024-08-06 03:26:50,274 INFO [trainer.py:889] (4/8) Device: cuda:4
+ 2024-08-06 03:26:50,274 INFO [trainer.py:890] (4/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': 
True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:26:50,274 INFO [trainer.py:892] (4/8) About to create model
+ 2024-08-06 03:26:51,035 INFO [trainer.py:899] (4/8) Number of model parameters: 367386628
+ 2024-08-06 03:26:51,876 INFO [trainer.py:914] (4/8) Using DDP
+ 2024-08-06 03:26:54,038 INFO [datamodule.py:427] (4/8) About to get train cuts
+ 2024-08-06 03:26:54,039 INFO [datamodule.py:434] (4/8) About to get dev cuts
+ 2024-08-06 03:26:54,041 INFO [datamodule.py:292] (4/8) Disable SpecAugment
+ 2024-08-06 03:26:54,041 INFO [datamodule.py:294] (4/8) About to create train dataset
+ 2024-08-06 03:26:54,041 INFO [datamodule.py:323] (4/8) Using DynamicBucketingSampler
+ 2024-08-06 03:26:54,651 INFO [datamodule.py:344] (4/8) About to create train dataloader
+ 2024-08-06 03:26:54,651 INFO [datamodule.py:367] (4/8) About to create dev dataset
+ 2024-08-06 03:26:54,977 INFO [datamodule.py:388] (4/8) About to create dev dataloader
libritts/log/log-train-2024-08-06-03-26-50-5 ADDED
@@ -0,0 +1,14 @@
+ 2024-08-06 03:26:50,320 INFO [trainer.py:870] (5/8) Training started
+ 2024-08-06 03:26:50,321 INFO [trainer.py:889] (5/8) Device: cuda:5
+ 2024-08-06 03:26:50,321 INFO [trainer.py:890] (5/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '279b0c87015a615b81b147251814d737a548f397', 'k2-git-date': 'Wed May 24 22:24:09 2023', 'lhotse-version': '1.26.0', 'torch-version': '2.0.1+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'main', 'icefall-git-sha1': '7d2e5f4-dirty', 'icefall-git-date': 'Tue Aug 6 02:59:12 2024', 'icefall-path': '/workspace/icefall_llm', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/usr/local/lib/python3.10/dist-packages/lhotse/__init__.py', 'hostname': '6865771', 'IP address': '0.104.195.107'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('exp/valle'), 'optimizer_name': 'ScaledAdam', 'scheduler_name': 'Eden', 'base_lr': 0.03, 'warmup_steps': 200, 'seed': 42, 'inf_check': False, 'save_every_n': 1000, 'keep_last_k': 20, 'average_period': 0, 'accumulate_grad_steps': 1, 'dtype': 'bfloat16', 'filter_min_duration': 0.5, 'filter_max_duration': 14.0, 'train_stage': 1, 'visualize': False, 'oom_check': False, 'model_name': 'valle', 'decoder_dim': 1024, 'nhead': 16, 'num_decoder_layers': 12, 'scale_factor': 1.0, 'norm_first': True, 'add_prenet': False, 'prefix_mode': 1, 'share_embedding': True, 'prepend_bos': False, 'num_quantizers': 8, 'scaling_xformers': False, 'manifest_dir': PosixPath('data/tokenized'), 'max_duration': 320, 'bucketing_sampler': True, 'num_buckets': 6, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 0.1, 'on_the_fly_feats': False, 'shuffle': True, 'buffer_size': 40000, 'shuffle_buffer_size': 100000, 'drop_last': False, 'return_cuts': 
True, 'num_workers': 8, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'input_strategy': 'PrecomputedFeatures', 'dataset': 'libritts', 'text_tokens': 'data/tokenized/unique_text_tokens.k2symbols', 'sampling_rate': 24000}
+ 2024-08-06 03:26:50,321 INFO [trainer.py:892] (5/8) About to create model
+ 2024-08-06 03:26:51,111 INFO [trainer.py:899] (5/8) Number of model parameters: 367386628
+ 2024-08-06 03:26:51,974 INFO [trainer.py:914] (5/8) Using DDP
+ 2024-08-06 03:26:54,033 INFO [datamodule.py:427] (5/8) About to get train cuts
+ 2024-08-06 03:26:54,036 INFO [datamodule.py:434] (5/8) About to get dev cuts
+ 2024-08-06 03:26:54,037 INFO [datamodule.py:292] (5/8) Disable SpecAugment
+ 2024-08-06 03:26:54,037 INFO [datamodule.py:294] (5/8) About to create train dataset
+ 2024-08-06 03:26:54,038 INFO [datamodule.py:323] (5/8) Using DynamicBucketingSampler
+ 2024-08-06 03:26:54,648 INFO [datamodule.py:344] (5/8) About to create train dataloader
+ 2024-08-06 03:26:54,649 INFO [datamodule.py:367] (5/8) About to create dev dataset
+ 2024-08-06 03:26:54,978 INFO [datamodule.py:388] (5/8) About to create dev dataloader