[2024-05-17 07:48:45,981][00034] Saving configuration to /kaggle/working/train_dir/default_experiment/config.json...
[2024-05-17 07:48:45,983][00034] Rollout worker 0 uses device cpu
[2024-05-17 07:48:45,984][00034] Rollout worker 1 uses device cpu
[2024-05-17 07:48:45,985][00034] Rollout worker 2 uses device cpu
[2024-05-17 07:48:45,986][00034] Rollout worker 3 uses device cpu
[2024-05-17 07:48:45,987][00034] Rollout worker 4 uses device cpu
[2024-05-17 07:48:45,988][00034] Rollout worker 5 uses device cpu
[2024-05-17 07:48:45,989][00034] Rollout worker 6 uses device cpu
[2024-05-17 07:48:45,990][00034] Rollout worker 7 uses device cpu
[2024-05-17 07:48:46,151][00034] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-05-17 07:48:46,152][00034] InferenceWorker_p0-w0: min num requests: 2
[2024-05-17 07:48:46,191][00034] Starting all processes...
[2024-05-17 07:48:46,191][00034] Starting process learner_proc0
[2024-05-17 07:48:46,294][00034] Starting all processes...
[2024-05-17 07:48:46,302][00034] Starting process inference_proc0-0
[2024-05-17 07:48:46,302][00034] Starting process rollout_proc0
[2024-05-17 07:48:46,303][00034] Starting process rollout_proc1
[2024-05-17 07:48:46,303][00034] Starting process rollout_proc2
[2024-05-17 07:48:46,303][00034] Starting process rollout_proc3
[2024-05-17 07:48:46,303][00034] Starting process rollout_proc4
[2024-05-17 07:48:46,304][00034] Starting process rollout_proc5
[2024-05-17 07:48:46,304][00034] Starting process rollout_proc6
[2024-05-17 07:48:46,307][00034] Starting process rollout_proc7
[2024-05-17 07:48:55,292][00138] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-05-17 07:48:55,300][00138] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-05-17 07:48:55,349][00138] Num visible devices: 1
[2024-05-17 07:48:55,421][00138] Setting fixed seed 0
[2024-05-17 07:48:55,424][00138] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-05-17 07:48:55,424][00138] Initializing actor-critic model on device cuda:0
[2024-05-17 07:48:55,425][00138] RunningMeanStd input shape: (3, 72, 128)
[2024-05-17 07:48:55,428][00138] RunningMeanStd input shape: (1,)
[2024-05-17 07:48:55,474][00138] ConvEncoder: input_channels=3
[2024-05-17 07:48:55,799][00156] Worker 4 uses CPU cores [0]
[2024-05-17 07:48:55,807][00157] Worker 5 uses CPU cores [1]
[2024-05-17 07:48:55,881][00154] Worker 1 uses CPU cores [1]
[2024-05-17 07:48:55,990][00155] Worker 3 uses CPU cores [3]
[2024-05-17 07:48:55,991][00152] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-05-17 07:48:55,992][00152] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-05-17 07:48:56,017][00152] Num visible devices: 1
[2024-05-17 07:48:56,024][00158] Worker 6 uses CPU cores [2]
[2024-05-17 07:48:56,043][00138] Conv encoder output size: 512
[2024-05-17 07:48:56,044][00138] Policy head output size: 512
[2024-05-17 07:48:56,051][00159] Worker 7 uses CPU cores [3]
[2024-05-17 07:48:56,098][00138] Created Actor Critic model with architecture:
[2024-05-17 07:48:56,099][00138] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
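
The module tree printed above maps onto a compact PyTorch sketch like the one below. The overall layout is read off the log (three Conv2d+ELU stages, a Linear+ELU projection to 512, a GRU(512, 512) core, and 512-dim heads producing 5 action logits and a scalar value); the kernel sizes, strides, and channel counts are assumptions, and the observation/returns normalizers are omitted. This is an illustrative reconstruction, not Sample Factory's own code.

import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Illustrative stand-in for the ActorCriticSharedWeights model above."""

    def __init__(self, num_actions=5):
        super().__init__()
        # Conv head over the resized Doom frame, 3 x 72 x 128 (C, H, W).
        # Filter sizes/strides below are assumed, not taken from this run.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        conv_out = self._conv_out_size((3, 72, 128))
        # "Conv encoder output size: 512" in the log.
        self.mlp = nn.Sequential(nn.Linear(conv_out, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                       # ModelCoreRNN
        self.critic_linear = nn.Linear(512, 1)             # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def _conv_out_size(self, shape):
        with torch.no_grad():
            return self.conv_head(torch.zeros(1, *shape)).numel()

    def forward(self, obs, rnn_state):
        x = self.conv_head(obs).flatten(1)
        x = self.mlp(x)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state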
[2024-05-17 07:48:56,110][00151] Worker 0 uses CPU cores [0]
[2024-05-17 07:48:56,167][00153] Worker 2 uses CPU cores [2]
[2024-05-17 07:48:56,387][00138] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-05-17 07:48:58,437][00138] No checkpoints found
[2024-05-17 07:48:58,438][00138] Did not load from checkpoint, starting from scratch!
[2024-05-17 07:48:58,438][00138] Initialized policy 0 weights for model version 0
[2024-05-17 07:48:58,442][00138] LearnerWorker_p0 finished initialization!
[2024-05-17 07:48:58,442][00138] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-05-17 07:48:58,545][00152] RunningMeanStd input shape: (3, 72, 128)
[2024-05-17 07:48:58,546][00152] RunningMeanStd input shape: (1,)
[2024-05-17 07:48:58,562][00152] ConvEncoder: input_channels=3
[2024-05-17 07:48:58,690][00152] Conv encoder output size: 512
[2024-05-17 07:48:58,690][00152] Policy head output size: 512
[2024-05-17 07:48:58,746][00034] Inference worker 0-0 is ready!
[2024-05-17 07:48:58,747][00034] All inference workers are ready! Signal rollout workers to start!
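
The startup handshake above (learner initializes, the inference worker reports ready, and only then are rollout workers signalled to start) is a standard ready-event pattern. A minimal toy sketch of the same idea with multiprocessing; this is not Sample Factory's actual IPC, just the shape of it:

import multiprocessing as mp

def inference_worker(ready):
    # (load the policy, move it to the GPU, open request queues, ...)
    ready.set()  # analogous to "Inference worker 0-0 is ready!"

def rollout_worker(worker_idx, ready):
    ready.wait()  # collect experience only once inference can serve actions
    print(f"rollout_proc{worker_idx} starting")

if __name__ == "__main__":
    ready = mp.Event()
    workers = [mp.Process(target=inference_worker, args=(ready,))]
    workers += [mp.Process(target=rollout_worker, args=(i, ready)) for i in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()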
[2024-05-17 07:48:58,858][00156] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 07:48:58,860][00158] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 07:48:58,860][00151] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 07:48:58,861][00155] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 07:48:58,861][00154] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 07:48:58,860][00153] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 07:48:58,862][00157] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 07:48:58,863][00159] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 07:48:59,758][00159] Decorrelating experience for 0 frames...
[2024-05-17 07:48:59,759][00157] Decorrelating experience for 0 frames...
[2024-05-17 07:49:00,224][00153] Decorrelating experience for 0 frames...
[2024-05-17 07:49:00,227][00158] Decorrelating experience for 0 frames...
[2024-05-17 07:49:00,248][00156] Decorrelating experience for 0 frames...
[2024-05-17 07:49:00,260][00151] Decorrelating experience for 0 frames...
[2024-05-17 07:49:00,592][00154] Decorrelating experience for 0 frames...
[2024-05-17 07:49:00,637][00157] Decorrelating experience for 32 frames...
[2024-05-17 07:49:00,689][00034] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
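
Status lines in this format repeat roughly every 5 seconds; the FPS windows read "nan" until the first averaging window fills. A small helper for pulling the throughput numbers out of these lines for plotting, with the regex written against the exact format shown here:

import re

FPS_RE = re.compile(
    r"Fps is \(10 sec: (?P<fps10>[\d.na]+), 60 sec: (?P<fps60>[\d.na]+), "
    r"300 sec: (?P<fps300>[\d.na]+)\)\. Total num frames: (?P<frames>\d+)"
)

def parse_fps(line):
    """Return the FPS windows and frame count from one status line, or None."""
    m = FPS_RE.search(line)
    return {k: float(v) for k, v in m.groupdict().items()} if m else None

Applied to the line above this yields fps10=nan and frames=0; applied to later lines it gives the real throughput curve.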
[2024-05-17 07:49:00,771][00156] Decorrelating experience for 32 frames...
[2024-05-17 07:49:00,793][00155] Decorrelating experience for 0 frames...
[2024-05-17 07:49:00,818][00153] Decorrelating experience for 32 frames...
[2024-05-17 07:49:01,443][00159] Decorrelating experience for 32 frames...
[2024-05-17 07:49:01,776][00158] Decorrelating experience for 32 frames...
[2024-05-17 07:49:01,789][00154] Decorrelating experience for 32 frames...
[2024-05-17 07:49:01,873][00157] Decorrelating experience for 64 frames...
[2024-05-17 07:49:01,907][00156] Decorrelating experience for 64 frames...
[2024-05-17 07:49:01,989][00151] Decorrelating experience for 32 frames...
[2024-05-17 07:49:02,021][00153] Decorrelating experience for 64 frames...
[2024-05-17 07:49:02,376][00159] Decorrelating experience for 64 frames...
[2024-05-17 07:49:02,492][00154] Decorrelating experience for 64 frames...
[2024-05-17 07:49:02,547][00158] Decorrelating experience for 64 frames...
[2024-05-17 07:49:02,722][00151] Decorrelating experience for 64 frames...
[2024-05-17 07:49:02,969][00156] Decorrelating experience for 96 frames...
[2024-05-17 07:49:03,003][00155] Decorrelating experience for 32 frames...
[2024-05-17 07:49:03,126][00154] Decorrelating experience for 96 frames...
[2024-05-17 07:49:03,509][00159] Decorrelating experience for 96 frames...
[2024-05-17 07:49:03,602][00153] Decorrelating experience for 96 frames...
[2024-05-17 07:49:03,920][00156] Decorrelating experience for 128 frames...
[2024-05-17 07:49:04,086][00157] Decorrelating experience for 96 frames...
[2024-05-17 07:49:04,088][00155] Decorrelating experience for 64 frames...
[2024-05-17 07:49:04,222][00151] Decorrelating experience for 96 frames...
[2024-05-17 07:49:04,637][00154] Decorrelating experience for 128 frames...
[2024-05-17 07:49:04,710][00158] Decorrelating experience for 96 frames...
[2024-05-17 07:49:05,172][00159] Decorrelating experience for 128 frames...
[2024-05-17 07:49:05,229][00155] Decorrelating experience for 96 frames...
[2024-05-17 07:49:05,394][00154] Decorrelating experience for 160 frames...
[2024-05-17 07:49:05,413][00153] Decorrelating experience for 128 frames...
[2024-05-17 07:49:05,689][00034] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-05-17 07:49:05,790][00151] Decorrelating experience for 128 frames...
[2024-05-17 07:49:05,826][00158] Decorrelating experience for 128 frames...
[2024-05-17 07:49:05,967][00157] Decorrelating experience for 128 frames...
[2024-05-17 07:49:06,142][00034] Heartbeat connected on Batcher_0
[2024-05-17 07:49:06,146][00034] Heartbeat connected on LearnerWorker_p0
[2024-05-17 07:49:06,178][00034] Heartbeat connected on InferenceWorker_p0-w0
[2024-05-17 07:49:06,458][00151] Decorrelating experience for 160 frames...
[2024-05-17 07:49:06,469][00158] Decorrelating experience for 160 frames...
[2024-05-17 07:49:06,491][00155] Decorrelating experience for 128 frames...
[2024-05-17 07:49:06,683][00154] Decorrelating experience for 192 frames...
[2024-05-17 07:49:06,852][00159] Decorrelating experience for 160 frames...
[2024-05-17 07:49:07,088][00151] Decorrelating experience for 192 frames...
[2024-05-17 07:49:07,236][00153] Decorrelating experience for 160 frames...
[2024-05-17 07:49:07,442][00155] Decorrelating experience for 160 frames...
[2024-05-17 07:49:07,605][00157] Decorrelating experience for 160 frames...
[2024-05-17 07:49:08,053][00154] Decorrelating experience for 224 frames...
[2024-05-17 07:49:08,103][00155] Decorrelating experience for 192 frames...
[2024-05-17 07:49:08,275][00153] Decorrelating experience for 192 frames...
[2024-05-17 07:49:08,277][00156] Decorrelating experience for 160 frames...
[2024-05-17 07:49:08,371][00151] Decorrelating experience for 224 frames...
[2024-05-17 07:49:08,414][00034] Heartbeat connected on RolloutWorker_w1
[2024-05-17 07:49:08,682][00158] Decorrelating experience for 192 frames...
[2024-05-17 07:49:08,807][00034] Heartbeat connected on RolloutWorker_w0
[2024-05-17 07:49:08,811][00157] Decorrelating experience for 192 frames...
[2024-05-17 07:49:09,285][00156] Decorrelating experience for 192 frames...
[2024-05-17 07:49:09,472][00159] Decorrelating experience for 192 frames...
[2024-05-17 07:49:09,512][00155] Decorrelating experience for 224 frames...
[2024-05-17 07:49:09,607][00157] Decorrelating experience for 224 frames...
[2024-05-17 07:49:09,850][00153] Decorrelating experience for 224 frames...
[2024-05-17 07:49:09,986][00034] Heartbeat connected on RolloutWorker_w3
[2024-05-17 07:49:10,081][00034] Heartbeat connected on RolloutWorker_w5
[2024-05-17 07:49:10,124][00158] Decorrelating experience for 224 frames...
[2024-05-17 07:49:10,309][00034] Heartbeat connected on RolloutWorker_w2
[2024-05-17 07:49:10,547][00034] Heartbeat connected on RolloutWorker_w6
[2024-05-17 07:49:10,689][00034] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-05-17 07:49:10,694][00034] Avg episode reward: [(0, '0.320')]
[2024-05-17 07:49:11,106][00159] Decorrelating experience for 224 frames...
[2024-05-17 07:49:11,130][00156] Decorrelating experience for 224 frames...
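
The "Decorrelating experience" phase above warms each rollout worker up in 32-frame chunks (0 through 224) so that episode boundaries are spread out across workers instead of starting in lockstep. A toy sketch of the idea, assuming a Gymnasium-style env API; this is not the library's implementation:

def decorrelate(env, num_frames):
    # Idle through random-action frames so episode boundaries end up
    # staggered across workers (Gymnasium-style step/reset API assumed).
    obs, info = env.reset()
    for _ in range(num_frames):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, info = env.reset()
    return obs

# The log shows warm-up progressing in 32-frame chunks up to 224 frames:
# for chunk in range(8):
#     decorrelate(env, 32)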
[2024-05-17 07:49:11,758][00034] Heartbeat connected on RolloutWorker_w4
[2024-05-17 07:49:11,784][00034] Heartbeat connected on RolloutWorker_w7
[2024-05-17 07:49:11,999][00138] Signal inference workers to stop experience collection...
[2024-05-17 07:49:12,018][00152] InferenceWorker_p0-w0: stopping experience collection
[2024-05-17 07:49:14,733][00138] Signal inference workers to resume experience collection...
[2024-05-17 07:49:14,734][00152] InferenceWorker_p0-w0: resuming experience collection
[2024-05-17 07:49:15,689][00034] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 183.7. Samples: 2756. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-05-17 07:49:15,693][00034] Avg episode reward: [(0, '1.804')]
[2024-05-17 07:49:19,322][00152] Updated weights for policy 0, policy_version 10 (0.0251)
[2024-05-17 07:49:20,689][00034] Fps is (10 sec: 4915.3, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 49152. Throughput: 0: 597.0. Samples: 11940. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:49:20,690][00034] Avg episode reward: [(0, '4.011')]
[2024-05-17 07:49:24,622][00152] Updated weights for policy 0, policy_version 20 (0.0022)
[2024-05-17 07:49:25,689][00034] Fps is (10 sec: 8192.3, 60 sec: 3440.7, 300 sec: 3440.7). Total num frames: 86016. Throughput: 0: 732.8. Samples: 18320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:49:25,690][00034] Avg episode reward: [(0, '4.435')]
[2024-05-17 07:49:29,354][00152] Updated weights for policy 0, policy_version 30 (0.0022)
[2024-05-17 07:49:30,689][00034] Fps is (10 sec: 8191.9, 60 sec: 4369.1, 300 sec: 4369.1). Total num frames: 131072. Throughput: 0: 1017.6. Samples: 30528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:49:30,691][00034] Avg episode reward: [(0, '4.370')]
[2024-05-17 07:49:30,740][00138] Saving new best policy, reward=4.370!
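
"Saving new best policy" fires whenever the average episode reward beats the previous best, and it recurs throughout the log as the reward climbs. The bookkeeping amounts to something like this sketch; the file name and comparison details are assumptions:

import torch

best_reward = float("-inf")

def maybe_save_best(model, avg_episode_reward, path="best_policy.pth"):
    """Save whenever the rolling average episode reward improves (sketch;
    the real learner also versions the file and logs the event)."""
    global best_reward
    if avg_episode_reward > best_reward:
        best_reward = avg_episode_reward
        torch.save(model.state_dict(), path)  # path is illustrative
        print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")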
[2024-05-17 07:49:33,845][00152] Updated weights for policy 0, policy_version 40 (0.0018)
[2024-05-17 07:49:35,689][00034] Fps is (10 sec: 9011.2, 60 sec: 5032.3, 300 sec: 5032.3). Total num frames: 176128. Throughput: 0: 1258.1. Samples: 44032. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:49:35,693][00034] Avg episode reward: [(0, '4.667')]
[2024-05-17 07:49:35,695][00138] Saving new best policy, reward=4.667!
[2024-05-17 07:49:38,436][00152] Updated weights for policy 0, policy_version 50 (0.0018)
[2024-05-17 07:49:40,689][00034] Fps is (10 sec: 9011.2, 60 sec: 5529.6, 300 sec: 5529.6). Total num frames: 221184. Throughput: 0: 1266.4. Samples: 50656. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:49:40,691][00034] Avg episode reward: [(0, '4.410')]
[2024-05-17 07:49:43,130][00152] Updated weights for policy 0, policy_version 60 (0.0016)
[2024-05-17 07:49:45,689][00034] Fps is (10 sec: 9011.1, 60 sec: 5916.5, 300 sec: 5916.5). Total num frames: 266240. Throughput: 0: 1423.9. Samples: 64076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:49:45,691][00034] Avg episode reward: [(0, '4.295')]
[2024-05-17 07:49:47,619][00152] Updated weights for policy 0, policy_version 70 (0.0031)
[2024-05-17 07:49:50,689][00034] Fps is (10 sec: 9011.2, 60 sec: 6225.9, 300 sec: 6225.9). Total num frames: 311296. Throughput: 0: 1723.1. Samples: 77540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:49:50,690][00034] Avg episode reward: [(0, '4.570')]
[2024-05-17 07:49:52,265][00152] Updated weights for policy 0, policy_version 80 (0.0018)
[2024-05-17 07:49:55,689][00034] Fps is (10 sec: 9011.0, 60 sec: 6479.1, 300 sec: 6479.1). Total num frames: 356352. Throughput: 0: 1872.5. Samples: 84264. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-05-17 07:49:55,691][00034] Avg episode reward: [(0, '4.296')]
[2024-05-17 07:49:57,448][00152] Updated weights for policy 0, policy_version 90 (0.0018)
[2024-05-17 07:50:00,689][00034] Fps is (10 sec: 8192.1, 60 sec: 6553.6, 300 sec: 6553.6). Total num frames: 393216. Throughput: 0: 2081.1. Samples: 96404. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:50:00,693][00034] Avg episode reward: [(0, '4.117')]
[2024-05-17 07:50:02,010][00152] Updated weights for policy 0, policy_version 100 (0.0017)
[2024-05-17 07:50:05,689][00034] Fps is (10 sec: 8192.2, 60 sec: 7304.5, 300 sec: 6742.7). Total num frames: 438272. Throughput: 0: 2173.4. Samples: 109744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 4.0)
[2024-05-17 07:50:05,691][00034] Avg episode reward: [(0, '4.429')]
[2024-05-17 07:50:06,555][00152] Updated weights for policy 0, policy_version 110 (0.0027)
[2024-05-17 07:50:10,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8055.5, 300 sec: 6904.7). Total num frames: 483328. Throughput: 0: 2180.5. Samples: 116444. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:50:10,691][00034] Avg episode reward: [(0, '4.652')]
[2024-05-17 07:50:11,060][00152] Updated weights for policy 0, policy_version 120 (0.0016)
[2024-05-17 07:50:15,538][00152] Updated weights for policy 0, policy_version 130 (0.0016)
[2024-05-17 07:50:15,689][00034] Fps is (10 sec: 9420.7, 60 sec: 8806.4, 300 sec: 7099.7). Total num frames: 532480. Throughput: 0: 2208.6. Samples: 129916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:50:15,691][00034] Avg episode reward: [(0, '4.429')]
[2024-05-17 07:50:20,095][00152] Updated weights for policy 0, policy_version 140 (0.0019)
[2024-05-17 07:50:20,689][00034] Fps is (10 sec: 9420.7, 60 sec: 8806.4, 300 sec: 7219.2). Total num frames: 577536. Throughput: 0: 2214.0. Samples: 143664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:50:20,691][00034] Avg episode reward: [(0, '4.503')]
[2024-05-17 07:50:24,653][00152] Updated weights for policy 0, policy_version 150 (0.0020)
[2024-05-17 07:50:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8942.9, 300 sec: 7324.6). Total num frames: 622592. Throughput: 0: 2217.1. Samples: 150424. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:50:25,690][00034] Avg episode reward: [(0, '4.471')]
[2024-05-17 07:50:29,647][00152] Updated weights for policy 0, policy_version 160 (0.0027)
[2024-05-17 07:50:30,689][00034] Fps is (10 sec: 8601.1, 60 sec: 8874.6, 300 sec: 7372.8). Total num frames: 663552. Throughput: 0: 2194.4. Samples: 162824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-05-17 07:50:30,691][00034] Avg episode reward: [(0, '4.550')]
[2024-05-17 07:50:34,132][00152] Updated weights for policy 0, policy_version 170 (0.0021)
[2024-05-17 07:50:35,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8874.7, 300 sec: 7459.0). Total num frames: 708608. Throughput: 0: 2196.2. Samples: 176368. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:50:35,690][00034] Avg episode reward: [(0, '4.494')]
[2024-05-17 07:50:38,589][00152] Updated weights for policy 0, policy_version 180 (0.0023)
[2024-05-17 07:50:40,689][00034] Fps is (10 sec: 9011.8, 60 sec: 8874.7, 300 sec: 7536.6). Total num frames: 753664. Throughput: 0: 2198.1. Samples: 183180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-05-17 07:50:40,691][00034] Avg episode reward: [(0, '4.694')]
[2024-05-17 07:50:40,698][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000184_753664.pth...
[2024-05-17 07:50:40,791][00138] Saving new best policy, reward=4.694!
[2024-05-17 07:50:43,540][00152] Updated weights for policy 0, policy_version 190 (0.0019)
[2024-05-17 07:50:45,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 7606.9). Total num frames: 798720. Throughput: 0: 2226.4. Samples: 196592. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0)
[2024-05-17 07:50:45,691][00034] Avg episode reward: [(0, '4.803')]
[2024-05-17 07:50:45,693][00138] Saving new best policy, reward=4.803!
[2024-05-17 07:50:47,763][00152] Updated weights for policy 0, policy_version 200 (0.0018)
[2024-05-17 07:50:50,693][00034] Fps is (10 sec: 8598.4, 60 sec: 8805.9, 300 sec: 7633.2). Total num frames: 839680. Throughput: 0: 2233.3. Samples: 210252. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:50:50,697][00034] Avg episode reward: [(0, '4.870')]
[2024-05-17 07:50:50,738][00138] Saving new best policy, reward=4.870!
[2024-05-17 07:50:52,636][00152] Updated weights for policy 0, policy_version 210 (0.0029)
[2024-05-17 07:50:55,689][00034] Fps is (10 sec: 8601.3, 60 sec: 8806.4, 300 sec: 7693.3). Total num frames: 884736. Throughput: 0: 2224.8. Samples: 216560. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 07:50:55,691][00034] Avg episode reward: [(0, '4.898')]
[2024-05-17 07:50:55,695][00138] Saving new best policy, reward=4.898!
[2024-05-17 07:50:57,149][00152] Updated weights for policy 0, policy_version 220 (0.0023)
[2024-05-17 07:51:00,689][00034] Fps is (10 sec: 8604.7, 60 sec: 8874.7, 300 sec: 7714.1). Total num frames: 925696. Throughput: 0: 2219.9. Samples: 229812. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:51:00,692][00034] Avg episode reward: [(0, '5.032')]
[2024-05-17 07:51:00,711][00138] Saving new best policy, reward=5.032!
[2024-05-17 07:51:02,509][00152] Updated weights for policy 0, policy_version 230 (0.0021)
[2024-05-17 07:51:05,689][00034] Fps is (10 sec: 8192.1, 60 sec: 8806.4, 300 sec: 7733.2). Total num frames: 966656. Throughput: 0: 2179.6. Samples: 241748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:51:05,691][00034] Avg episode reward: [(0, '5.285')]
[2024-05-17 07:51:05,730][00138] Saving new best policy, reward=5.285!
[2024-05-17 07:51:07,126][00152] Updated weights for policy 0, policy_version 240 (0.0023)
[2024-05-17 07:51:10,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 7782.4). Total num frames: 1011712. Throughput: 0: 2177.3. Samples: 248404. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:51:10,693][00034] Avg episode reward: [(0, '5.420')]
[2024-05-17 07:51:10,747][00138] Saving new best policy, reward=5.420!
[2024-05-17 07:51:11,638][00152] Updated weights for policy 0, policy_version 250 (0.0016)
[2024-05-17 07:51:15,689][00034] Fps is (10 sec: 9011.4, 60 sec: 8738.1, 300 sec: 7827.9). Total num frames: 1056768. Throughput: 0: 2201.3. Samples: 261880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-05-17 07:51:15,691][00034] Avg episode reward: [(0, '5.655')]
[2024-05-17 07:51:15,692][00138] Saving new best policy, reward=5.655!
[2024-05-17 07:51:16,220][00152] Updated weights for policy 0, policy_version 260 (0.0029)
[2024-05-17 07:51:20,689][00034] Fps is (10 sec: 9010.8, 60 sec: 8738.1, 300 sec: 7870.2). Total num frames: 1101824. Throughput: 0: 2200.2. Samples: 275376. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:51:20,691][00034] Avg episode reward: [(0, '5.473')]
[2024-05-17 07:51:20,755][00152] Updated weights for policy 0, policy_version 270 (0.0022)
[2024-05-17 07:51:25,360][00152] Updated weights for policy 0, policy_version 280 (0.0016)
[2024-05-17 07:51:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 7909.5). Total num frames: 1146880. Throughput: 0: 2196.8. Samples: 282036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 07:51:25,691][00034] Avg episode reward: [(0, '5.538')]
[2024-05-17 07:51:29,822][00152] Updated weights for policy 0, policy_version 290 (0.0018)
[2024-05-17 07:51:30,689][00034] Fps is (10 sec: 9011.6, 60 sec: 8806.5, 300 sec: 7946.2). Total num frames: 1191936. Throughput: 0: 2199.6. Samples: 295576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:51:30,691][00034] Avg episode reward: [(0, '6.105')]
[2024-05-17 07:51:30,700][00138] Saving new best policy, reward=6.105!
[2024-05-17 07:51:34,962][00152] Updated weights for policy 0, policy_version 300 (0.0022)
[2024-05-17 07:51:35,689][00034] Fps is (10 sec: 8601.5, 60 sec: 8738.1, 300 sec: 7954.2). Total num frames: 1232896. Throughput: 0: 2169.4. Samples: 307868. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:51:35,691][00034] Avg episode reward: [(0, '5.997')]
[2024-05-17 07:51:39,587][00152] Updated weights for policy 0, policy_version 310 (0.0021)
[2024-05-17 07:51:40,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 7987.2). Total num frames: 1277952. Throughput: 0: 2181.0. Samples: 314704. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:51:40,692][00034] Avg episode reward: [(0, '6.933')]
[2024-05-17 07:51:40,702][00138] Saving new best policy, reward=6.933!
[2024-05-17 07:51:44,020][00152] Updated weights for policy 0, policy_version 320 (0.0016)
[2024-05-17 07:51:45,689][00034] Fps is (10 sec: 9011.3, 60 sec: 8738.1, 300 sec: 8018.2). Total num frames: 1323008. Throughput: 0: 2186.5. Samples: 328204. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-05-17 07:51:45,690][00034] Avg episode reward: [(0, '7.432')]
[2024-05-17 07:51:45,750][00138] Saving new best policy, reward=7.432!
[2024-05-17 07:51:48,451][00152] Updated weights for policy 0, policy_version 330 (0.0029)
[2024-05-17 07:51:50,689][00034] Fps is (10 sec: 9010.9, 60 sec: 8806.9, 300 sec: 8047.4). Total num frames: 1368064. Throughput: 0: 2223.9. Samples: 341824. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:51:50,691][00034] Avg episode reward: [(0, '7.213')]
[2024-05-17 07:51:53,154][00152] Updated weights for policy 0, policy_version 340 (0.0019)
[2024-05-17 07:51:55,689][00034] Fps is (10 sec: 9011.1, 60 sec: 8806.4, 300 sec: 8075.0). Total num frames: 1413120. Throughput: 0: 2225.0. Samples: 348528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:51:55,691][00034] Avg episode reward: [(0, '8.107')]
[2024-05-17 07:51:55,692][00138] Saving new best policy, reward=8.107!
[2024-05-17 07:51:57,697][00152] Updated weights for policy 0, policy_version 350 (0.0028)
[2024-05-17 07:52:00,689][00034] Fps is (10 sec: 9011.4, 60 sec: 8874.7, 300 sec: 8101.0). Total num frames: 1458176. Throughput: 0: 2222.2. Samples: 361880. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:52:00,691][00034] Avg episode reward: [(0, '8.846')]
[2024-05-17 07:52:00,698][00138] Saving new best policy, reward=8.846!
[2024-05-17 07:52:02,227][00152] Updated weights for policy 0, policy_version 360 (0.0018)
[2024-05-17 07:52:05,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8874.7, 300 sec: 8103.4). Total num frames: 1499136. Throughput: 0: 2194.5. Samples: 374128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 4.0)
[2024-05-17 07:52:05,690][00034] Avg episode reward: [(0, '8.167')]
[2024-05-17 07:52:07,524][00152] Updated weights for policy 0, policy_version 370 (0.0022)
[2024-05-17 07:52:10,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8806.4, 300 sec: 8105.8). Total num frames: 1540096. Throughput: 0: 2192.9. Samples: 380716. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:52:10,691][00034] Avg episode reward: [(0, '9.629')]
[2024-05-17 07:52:10,741][00138] Saving new best policy, reward=9.629!
[2024-05-17 07:52:12,162][00152] Updated weights for policy 0, policy_version 380 (0.0022)
[2024-05-17 07:52:15,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8129.0). Total num frames: 1585152. Throughput: 0: 2182.0. Samples: 393768. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:52:15,691][00034] Avg episode reward: [(0, '10.320')]
[2024-05-17 07:52:15,693][00138] Saving new best policy, reward=10.320!
[2024-05-17 07:52:16,815][00152] Updated weights for policy 0, policy_version 390 (0.0016)
[2024-05-17 07:52:20,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.5, 300 sec: 8151.0). Total num frames: 1630208. Throughput: 0: 2205.7. Samples: 407124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-05-17 07:52:20,693][00034] Avg episode reward: [(0, '10.034')]
[2024-05-17 07:52:21,357][00152] Updated weights for policy 0, policy_version 400 (0.0023)
[2024-05-17 07:52:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8172.0). Total num frames: 1675264. Throughput: 0: 2200.5. Samples: 413728. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:52:25,691][00034] Avg episode reward: [(0, '9.527')]
[2024-05-17 07:52:26,015][00152] Updated weights for policy 0, policy_version 410 (0.0017)
[2024-05-17 07:52:30,617][00152] Updated weights for policy 0, policy_version 420 (0.0016)
[2024-05-17 07:52:30,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8192.0). Total num frames: 1720320. Throughput: 0: 2198.8. Samples: 427148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:52:30,690][00034] Avg episode reward: [(0, '9.455')]
[2024-05-17 07:52:35,079][00152] Updated weights for policy 0, policy_version 430 (0.0027)
[2024-05-17 07:52:35,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 8211.1). Total num frames: 1765376. Throughput: 0: 2195.8. Samples: 440636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:52:35,695][00034] Avg episode reward: [(0, '10.758')]
[2024-05-17 07:52:35,697][00138] Saving new best policy, reward=10.758!
[2024-05-17 07:52:40,154][00152] Updated weights for policy 0, policy_version 440 (0.0016)
[2024-05-17 07:52:40,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8210.6). Total num frames: 1806336. Throughput: 0: 2173.7. Samples: 446344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:52:40,690][00034] Avg episode reward: [(0, '12.389')]
[2024-05-17 07:52:40,699][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000441_1806336.pth...
[2024-05-17 07:52:40,795][00138] Saving new best policy, reward=12.389!
[2024-05-17 07:52:44,623][00152] Updated weights for policy 0, policy_version 450 (0.0017)
[2024-05-17 07:52:45,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8228.4). Total num frames: 1851392. Throughput: 0: 2172.8. Samples: 459656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-05-17 07:52:45,693][00034] Avg episode reward: [(0, '12.146')]
[2024-05-17 07:52:49,092][00152] Updated weights for policy 0, policy_version 460 (0.0017)
[2024-05-17 07:52:50,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8738.2, 300 sec: 8227.6). Total num frames: 1892352. Throughput: 0: 2206.2. Samples: 473408. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:52:50,690][00034] Avg episode reward: [(0, '13.126')]
[2024-05-17 07:52:50,746][00138] Saving new best policy, reward=13.126!
[2024-05-17 07:52:53,688][00152] Updated weights for policy 0, policy_version 470 (0.0021)
[2024-05-17 07:52:55,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8244.3). Total num frames: 1937408. Throughput: 0: 2208.5. Samples: 480100. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:52:55,693][00034] Avg episode reward: [(0, '14.880')]
[2024-05-17 07:52:55,718][00138] Saving new best policy, reward=14.880!
[2024-05-17 07:52:58,233][00152] Updated weights for policy 0, policy_version 480 (0.0016)
[2024-05-17 07:53:00,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8260.3). Total num frames: 1982464. Throughput: 0: 2222.4. Samples: 493776. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-05-17 07:53:00,691][00034] Avg episode reward: [(0, '13.821')]
[2024-05-17 07:53:02,812][00152] Updated weights for policy 0, policy_version 490 (0.0022)
[2024-05-17 07:53:05,689][00034] Fps is (10 sec: 9420.8, 60 sec: 8874.7, 300 sec: 8292.3). Total num frames: 2031616. Throughput: 0: 2225.2. Samples: 507256. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:53:05,690][00034] Avg episode reward: [(0, '13.640')]
[2024-05-17 07:53:07,434][00152] Updated weights for policy 0, policy_version 500 (0.0022)
[2024-05-17 07:53:10,689][00034] Fps is (10 sec: 9010.8, 60 sec: 8874.6, 300 sec: 8290.3). Total num frames: 2072576. Throughput: 0: 2221.6. Samples: 513700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:53:10,691][00034] Avg episode reward: [(0, '14.359')]
[2024-05-17 07:53:12,367][00152] Updated weights for policy 0, policy_version 510 (0.0016)
[2024-05-17 07:53:15,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8874.7, 300 sec: 8304.4). Total num frames: 2117632. Throughput: 0: 2206.7. Samples: 526448. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:53:15,691][00034] Avg episode reward: [(0, '14.960')]
[2024-05-17 07:53:15,693][00138] Saving new best policy, reward=14.960!
[2024-05-17 07:53:17,072][00152] Updated weights for policy 0, policy_version 520 (0.0021)
[2024-05-17 07:53:20,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.6, 300 sec: 8318.0). Total num frames: 2162688. Throughput: 0: 2208.0. Samples: 539996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:53:20,691][00034] Avg episode reward: [(0, '15.806')]
[2024-05-17 07:53:20,702][00138] Saving new best policy, reward=15.806!
[2024-05-17 07:53:21,579][00152] Updated weights for policy 0, policy_version 530 (0.0019)
[2024-05-17 07:53:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 8331.1). Total num frames: 2207744. Throughput: 0: 2229.9. Samples: 546688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-05-17 07:53:25,691][00034] Avg episode reward: [(0, '14.104')]
[2024-05-17 07:53:25,836][00152] Updated weights for policy 0, policy_version 540 (0.0016)
[2024-05-17 07:53:30,511][00152] Updated weights for policy 0, policy_version 550 (0.0018)
[2024-05-17 07:53:30,689][00034] Fps is (10 sec: 9011.1, 60 sec: 8874.6, 300 sec: 8343.7). Total num frames: 2252800. Throughput: 0: 2240.2. Samples: 560468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0)
[2024-05-17 07:53:30,692][00034] Avg episode reward: [(0, '15.263')]
[2024-05-17 07:53:34,939][00152] Updated weights for policy 0, policy_version 560 (0.0019)
[2024-05-17 07:53:35,689][00034] Fps is (10 sec: 9011.1, 60 sec: 8874.7, 300 sec: 8355.8). Total num frames: 2297856. Throughput: 0: 2238.5. Samples: 574140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 4.0)
[2024-05-17 07:53:35,691][00034] Avg episode reward: [(0, '16.833')]
[2024-05-17 07:53:35,693][00138] Saving new best policy, reward=16.833!
[2024-05-17 07:53:39,382][00152] Updated weights for policy 0, policy_version 570 (0.0016)
[2024-05-17 07:53:40,691][00034] Fps is (10 sec: 8600.2, 60 sec: 8874.4, 300 sec: 8352.9). Total num frames: 2338816. Throughput: 0: 2238.6. Samples: 580844. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:53:40,698][00034] Avg episode reward: [(0, '16.987')]
[2024-05-17 07:53:40,735][00138] Saving new best policy, reward=16.987!
[2024-05-17 07:53:44,406][00152] Updated weights for policy 0, policy_version 580 (0.0023)
[2024-05-17 07:53:45,689][00034] Fps is (10 sec: 8601.7, 60 sec: 8874.7, 300 sec: 8364.5). Total num frames: 2383872. Throughput: 0: 2212.6. Samples: 593344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 07:53:45,690][00034] Avg episode reward: [(0, '18.914')]
[2024-05-17 07:53:45,692][00138] Saving new best policy, reward=18.914!
[2024-05-17 07:53:48,981][00152] Updated weights for policy 0, policy_version 590 (0.0016)
[2024-05-17 07:53:50,689][00034] Fps is (10 sec: 9422.6, 60 sec: 9011.2, 300 sec: 8389.7). Total num frames: 2433024. Throughput: 0: 2217.7. Samples: 607052. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:53:50,691][00034] Avg episode reward: [(0, '18.751')]
[2024-05-17 07:53:53,447][00152] Updated weights for policy 0, policy_version 600 (0.0026)
[2024-05-17 07:53:55,689][00034] Fps is (10 sec: 9011.1, 60 sec: 8942.9, 300 sec: 8386.4). Total num frames: 2473984. Throughput: 0: 2225.6. Samples: 613852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:53:55,691][00034] Avg episode reward: [(0, '18.102')]
[2024-05-17 07:53:58,043][00152] Updated weights for policy 0, policy_version 610 (0.0015)
[2024-05-17 07:54:00,689][00034] Fps is (10 sec: 8601.7, 60 sec: 8942.9, 300 sec: 8539.1). Total num frames: 2519040. Throughput: 0: 2244.7. Samples: 627460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 07:54:00,691][00034] Avg episode reward: [(0, '18.811')]
[2024-05-17 07:54:02,858][00152] Updated weights for policy 0, policy_version 620 (0.0017)
[2024-05-17 07:54:05,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 8691.9). Total num frames: 2564096. Throughput: 0: 2236.6. Samples: 640640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:54:05,691][00034] Avg episode reward: [(0, '21.229')]
[2024-05-17 07:54:05,693][00138] Saving new best policy, reward=21.229!
[2024-05-17 07:54:07,205][00152] Updated weights for policy 0, policy_version 630 (0.0016)
[2024-05-17 07:54:10,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8943.0, 300 sec: 8830.7). Total num frames: 2609152. Throughput: 0: 2236.4. Samples: 647328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0)
[2024-05-17 07:54:10,691][00034] Avg episode reward: [(0, '20.949')]
[2024-05-17 07:54:11,977][00152] Updated weights for policy 0, policy_version 640 (0.0016)
[2024-05-17 07:54:15,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8806.4, 300 sec: 8802.9). Total num frames: 2646016. Throughput: 0: 2201.1. Samples: 659516. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:54:15,691][00034] Avg episode reward: [(0, '19.127')]
[2024-05-17 07:54:17,027][00152] Updated weights for policy 0, policy_version 650 (0.0020)
[2024-05-17 07:54:20,689][00034] Fps is (10 sec: 8191.9, 60 sec: 8806.4, 300 sec: 8830.7). Total num frames: 2691072. Throughput: 0: 2182.8. Samples: 672368. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:54:20,691][00034] Avg episode reward: [(0, '20.624')]
[2024-05-17 07:54:21,905][00152] Updated weights for policy 0, policy_version 660 (0.0024)
[2024-05-17 07:54:25,689][00034] Fps is (10 sec: 9010.8, 60 sec: 8806.3, 300 sec: 8830.7). Total num frames: 2736128. Throughput: 0: 2177.4. Samples: 678824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:54:25,691][00034] Avg episode reward: [(0, '20.268')]
[2024-05-17 07:54:26,580][00152] Updated weights for policy 0, policy_version 670 (0.0022)
[2024-05-17 07:54:30,689][00034] Fps is (10 sec: 8601.7, 60 sec: 8738.2, 300 sec: 8816.8). Total num frames: 2777088. Throughput: 0: 2184.6. Samples: 691652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:54:30,691][00034] Avg episode reward: [(0, '19.978')]
[2024-05-17 07:54:31,466][00152] Updated weights for policy 0, policy_version 680 (0.0039)
[2024-05-17 07:54:35,689][00034] Fps is (10 sec: 8601.9, 60 sec: 8738.1, 300 sec: 8816.8). Total num frames: 2822144. Throughput: 0: 2169.8. Samples: 704692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:54:35,691][00034] Avg episode reward: [(0, '21.276')]
[2024-05-17 07:54:35,692][00138] Saving new best policy, reward=21.276!
[2024-05-17 07:54:36,242][00152] Updated weights for policy 0, policy_version 690 (0.0016)
[2024-05-17 07:54:40,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8738.4, 300 sec: 8802.9). Total num frames: 2863104. Throughput: 0: 2161.5. Samples: 711120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:54:40,691][00034] Avg episode reward: [(0, '21.449')]
[2024-05-17 07:54:40,699][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000699_2863104.pth...
[2024-05-17 07:54:40,805][00138] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000184_753664.pth
[2024-05-17 07:54:40,815][00138] Saving new best policy, reward=21.449!
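
Periodic checkpoints are rotated: the filename encodes the policy version and total env frames (checkpoint_000000699_2863104.pth is version 699 at 2,863,104 frames), and as each new one lands the oldest is removed, as in the pair of lines above. A hedged sketch of that scheme; the keep-last count is assumed from the log, not read from the config:

import glob
import os

import torch

def save_checkpoint(state, ckpt_dir, policy_version, env_frames, keep_last=2):
    """Mirror of the log's naming scheme, checkpoint_<version>_<frames>.pth,
    with older periodic checkpoints pruned."""
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_frames}.pth")
    torch.save(state, path)
    # Zero-padded versions make lexicographic order equal chronological order.
    for old in sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))[:-keep_last]:
        os.remove(old)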
[2024-05-17 07:54:41,118][00152] Updated weights for policy 0, policy_version 700 (0.0017)
[2024-05-17 07:54:45,689][00034] Fps is (10 sec: 7782.4, 60 sec: 8601.6, 300 sec: 8775.2). Total num frames: 2899968. Throughput: 0: 2129.4. Samples: 723284. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:54:45,691][00034] Avg episode reward: [(0, '21.375')]
[2024-05-17 07:54:46,400][00152] Updated weights for policy 0, policy_version 710 (0.0020)
[2024-05-17 07:54:50,689][00034] Fps is (10 sec: 7782.4, 60 sec: 8465.1, 300 sec: 8761.3). Total num frames: 2940928. Throughput: 0: 2095.6. Samples: 734944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:54:50,693][00034] Avg episode reward: [(0, '20.359')]
[2024-05-17 07:54:51,409][00152] Updated weights for policy 0, policy_version 720 (0.0021)
[2024-05-17 07:54:55,689][00034] Fps is (10 sec: 8601.5, 60 sec: 8533.3, 300 sec: 8789.0). Total num frames: 2985984. Throughput: 0: 2087.5. Samples: 741268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:54:55,693][00034] Avg episode reward: [(0, '20.657')]
[2024-05-17 07:54:56,245][00152] Updated weights for policy 0, policy_version 730 (0.0024)
[2024-05-17 07:55:00,689][00034] Fps is (10 sec: 8601.5, 60 sec: 8465.1, 300 sec: 8775.2). Total num frames: 3026944. Throughput: 0: 2103.3. Samples: 754164. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-05-17 07:55:00,691][00034] Avg episode reward: [(0, '21.722')]
[2024-05-17 07:55:00,700][00138] Saving new best policy, reward=21.722!
[2024-05-17 07:55:01,221][00152] Updated weights for policy 0, policy_version 740 (0.0017)
[2024-05-17 07:55:05,689][00034] Fps is (10 sec: 8192.1, 60 sec: 8396.8, 300 sec: 8761.3). Total num frames: 3067904. Throughput: 0: 2101.2. Samples: 766924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:55:05,690][00034] Avg episode reward: [(0, '23.079')]
[2024-05-17 07:55:05,693][00138] Saving new best policy, reward=23.079!
[2024-05-17 07:55:05,905][00152] Updated weights for policy 0, policy_version 750 (0.0016)
[2024-05-17 07:55:10,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8328.5, 300 sec: 8733.5). Total num frames: 3108864. Throughput: 0: 2096.9. Samples: 773184. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-05-17 07:55:10,691][00034] Avg episode reward: [(0, '23.549')]
[2024-05-17 07:55:10,700][00138] Saving new best policy, reward=23.549!
[2024-05-17 07:55:10,898][00152] Updated weights for policy 0, policy_version 760 (0.0024)
[2024-05-17 07:55:15,440][00152] Updated weights for policy 0, policy_version 770 (0.0016)
[2024-05-17 07:55:15,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8465.1, 300 sec: 8733.5). Total num frames: 3153920. Throughput: 0: 2089.2. Samples: 785668. Policy #0 lag: (min: 0.0, avg: 0.9, max: 4.0)
[2024-05-17 07:55:15,693][00034] Avg episode reward: [(0, '25.232')]
[2024-05-17 07:55:15,695][00138] Saving new best policy, reward=25.232!
[2024-05-17 07:55:20,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8328.5, 300 sec: 8705.7). Total num frames: 3190784. Throughput: 0: 2058.2. Samples: 797312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-05-17 07:55:20,694][00034] Avg episode reward: [(0, '26.393')]
[2024-05-17 07:55:20,704][00138] Saving new best policy, reward=26.393!
[2024-05-17 07:55:20,913][00152] Updated weights for policy 0, policy_version 780 (0.0019)
[2024-05-17 07:55:25,689][00034] Fps is (10 sec: 7782.5, 60 sec: 8260.3, 300 sec: 8705.8). Total num frames: 3231744. Throughput: 0: 2050.9. Samples: 803412. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:55:25,691][00034] Avg episode reward: [(0, '23.842')]
[2024-05-17 07:55:25,887][00152] Updated weights for policy 0, policy_version 790 (0.0024)
[2024-05-17 07:55:30,450][00152] Updated weights for policy 0, policy_version 800 (0.0020)
[2024-05-17 07:55:30,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8328.5, 300 sec: 8705.7). Total num frames: 3276800. Throughput: 0: 2073.4. Samples: 816588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:55:30,693][00034] Avg episode reward: [(0, '24.722')]
[2024-05-17 07:55:35,053][00152] Updated weights for policy 0, policy_version 810 (0.0020)
[2024-05-17 07:55:35,689][00034] Fps is (10 sec: 9011.1, 60 sec: 8328.5, 300 sec: 8705.7). Total num frames: 3321856. Throughput: 0: 2113.1. Samples: 830032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:55:35,691][00034] Avg episode reward: [(0, '26.841')]
[2024-05-17 07:55:35,693][00138] Saving new best policy, reward=26.841!
[2024-05-17 07:55:39,627][00152] Updated weights for policy 0, policy_version 820 (0.0031)
[2024-05-17 07:55:40,689][00034] Fps is (10 sec: 9010.9, 60 sec: 8396.8, 300 sec: 8705.7). Total num frames: 3366912. Throughput: 0: 2117.9. Samples: 836576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0)
[2024-05-17 07:55:40,691][00034] Avg episode reward: [(0, '25.518')]
[2024-05-17 07:55:44,280][00152] Updated weights for policy 0, policy_version 830 (0.0017)
[2024-05-17 07:55:45,689][00034] Fps is (10 sec: 9011.0, 60 sec: 8533.3, 300 sec: 8719.7). Total num frames: 3411968. Throughput: 0: 2131.6. Samples: 850088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:55:45,691][00034] Avg episode reward: [(0, '22.746')]
[2024-05-17 07:55:49,187][00152] Updated weights for policy 0, policy_version 840 (0.0031)
[2024-05-17 07:55:50,689][00034] Fps is (10 sec: 8601.9, 60 sec: 8533.3, 300 sec: 8705.7). Total num frames: 3452928. Throughput: 0: 2121.1. Samples: 862372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:55:50,693][00034] Avg episode reward: [(0, '23.744')]
[2024-05-17 07:55:53,741][00152] Updated weights for policy 0, policy_version 850 (0.0016)
[2024-05-17 07:55:55,689][00034] Fps is (10 sec: 8192.2, 60 sec: 8465.1, 300 sec: 8705.7). Total num frames: 3493888. Throughput: 0: 2129.6. Samples: 869016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:55:55,691][00034] Avg episode reward: [(0, '24.842')]
[2024-05-17 07:55:58,305][00152] Updated weights for policy 0, policy_version 860 (0.0024)
[2024-05-17 07:56:00,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8719.6). Total num frames: 3538944. Throughput: 0: 2150.8. Samples: 882456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:56:00,690][00034] Avg episode reward: [(0, '24.234')]
[2024-05-17 07:56:03,119][00152] Updated weights for policy 0, policy_version 870 (0.0023)
[2024-05-17 07:56:05,689][00034] Fps is (10 sec: 9011.3, 60 sec: 8601.6, 300 sec: 8719.6). Total num frames: 3584000. Throughput: 0: 2183.9. Samples: 895588. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:56:05,694][00034] Avg episode reward: [(0, '23.104')]
[2024-05-17 07:56:07,920][00152] Updated weights for policy 0, policy_version 880 (0.0023)
[2024-05-17 07:56:10,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 3629056. Throughput: 0: 2196.5. Samples: 902256. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:56:10,693][00034] Avg episode reward: [(0, '22.968')]
[2024-05-17 07:56:12,272][00152] Updated weights for policy 0, policy_version 890 (0.0026)
[2024-05-17 07:56:15,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 3674112. Throughput: 0: 2203.0. Samples: 915724. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-05-17 07:56:15,693][00034] Avg episode reward: [(0, '24.367')]
[2024-05-17 07:56:17,055][00152] Updated weights for policy 0, policy_version 900 (0.0020)
[2024-05-17 07:56:20,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 3715072. Throughput: 0: 2203.1. Samples: 929172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:56:20,691][00034] Avg episode reward: [(0, '25.675')]
[2024-05-17 07:56:22,076][00152] Updated weights for policy 0, policy_version 910 (0.0017)
[2024-05-17 07:56:25,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 3760128. Throughput: 0: 2181.2. Samples: 934728. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:56:25,691][00034] Avg episode reward: [(0, '25.285')]
[2024-05-17 07:56:26,448][00152] Updated weights for policy 0, policy_version 920 (0.0016)
[2024-05-17 07:56:30,689][00034] Fps is (10 sec: 9011.0, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 3805184. Throughput: 0: 2178.2. Samples: 948108. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-05-17 07:56:30,691][00034] Avg episode reward: [(0, '25.278')]
[2024-05-17 07:56:31,323][00152] Updated weights for policy 0, policy_version 930 (0.0021)
[2024-05-17 07:56:35,664][00152] Updated weights for policy 0, policy_version 940 (0.0016)
[2024-05-17 07:56:35,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 3850240. Throughput: 0: 2201.4. Samples: 961436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:56:35,691][00034] Avg episode reward: [(0, '25.703')]
[2024-05-17 07:56:40,325][00152] Updated weights for policy 0, policy_version 950 (0.0023)
[2024-05-17 07:56:40,689][00034] Fps is (10 sec: 8601.4, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 3891200. Throughput: 0: 2205.1. Samples: 968248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:56:40,691][00034] Avg episode reward: [(0, '25.140')]
[2024-05-17 07:56:40,700][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000950_3891200.pth...
[2024-05-17 07:56:40,804][00138] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000441_1806336.pth
[2024-05-17 07:56:44,827][00152] Updated weights for policy 0, policy_version 960 (0.0016)
[2024-05-17 07:56:45,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 3940352. Throughput: 0: 2201.4. Samples: 981520. Policy #0 lag: (min: 0.0, avg: 0.8, max: 4.0)
[2024-05-17 07:56:45,693][00034] Avg episode reward: [(0, '23.722')]
[2024-05-17 07:56:49,418][00152] Updated weights for policy 0, policy_version 970 (0.0023)
[2024-05-17 07:56:50,689][00034] Fps is (10 sec: 9011.5, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 3981312. Throughput: 0: 2215.9. Samples: 995304. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:56:50,691][00034] Avg episode reward: [(0, '26.203')]
[2024-05-17 07:56:54,352][00152] Updated weights for policy 0, policy_version 980 (0.0024)
[2024-05-17 07:56:55,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8806.4, 300 sec: 8691.9). Total num frames: 4022272. Throughput: 0: 2208.4. Samples: 1001632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:56:55,694][00034] Avg episode reward: [(0, '24.705')]
[2024-05-17 07:56:59,034][00152] Updated weights for policy 0, policy_version 990 (0.0017)
[2024-05-17 07:57:00,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 4067328. Throughput: 0: 2185.2. Samples: 1014060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:57:00,691][00034] Avg episode reward: [(0, '27.169')]
[2024-05-17 07:57:00,699][00138] Saving new best policy, reward=27.169!
[2024-05-17 07:57:03,706][00152] Updated weights for policy 0, policy_version 1000 (0.0018)
[2024-05-17 07:57:05,689][00034] Fps is (10 sec: 9011.3, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 4112384. Throughput: 0: 2182.3. Samples: 1027376. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0)
[2024-05-17 07:57:05,691][00034] Avg episode reward: [(0, '28.066')]
[2024-05-17 07:57:05,693][00138] Saving new best policy, reward=28.066!
[2024-05-17 07:57:08,233][00152] Updated weights for policy 0, policy_version 1010 (0.0023)
[2024-05-17 07:57:10,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 4153344. Throughput: 0: 2211.5. Samples: 1034248. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-05-17 07:57:10,691][00034] Avg episode reward: [(0, '26.119')]
[2024-05-17 07:57:12,615][00152] Updated weights for policy 0, policy_version 1020 (0.0016)
[2024-05-17 07:57:15,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 4202496. Throughput: 0: 2216.8. Samples: 1047864. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:57:15,691][00034] Avg episode reward: [(0, '25.665')]
[2024-05-17 07:57:17,288][00152] Updated weights for policy 0, policy_version 1030 (0.0026)
[2024-05-17 07:57:20,689][00034] Fps is (10 sec: 9420.8, 60 sec: 8874.7, 300 sec: 8719.6). Total num frames: 4247552. Throughput: 0: 2219.6. Samples: 1061316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0)
[2024-05-17 07:57:20,691][00034] Avg episode reward: [(0, '25.589')]
[2024-05-17 07:57:21,883][00152] Updated weights for policy 0, policy_version 1040 (0.0020)
[2024-05-17 07:57:25,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 4288512. Throughput: 0: 2220.5. Samples: 1068168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:57:25,691][00034] Avg episode reward: [(0, '26.035')]
[2024-05-17 07:57:26,892][00152] Updated weights for policy 0, policy_version 1050 (0.0015)
[2024-05-17 07:57:30,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 4333568. Throughput: 0: 2204.5. Samples: 1080724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:57:30,691][00034] Avg episode reward: [(0, '29.102')]
[2024-05-17 07:57:30,700][00138] Saving new best policy, reward=29.102!
[2024-05-17 07:57:31,268][00152] Updated weights for policy 0, policy_version 1060 (0.0019)
[2024-05-17 07:57:35,689][00034] Fps is (10 sec: 9010.6, 60 sec: 8806.3, 300 sec: 8719.6). Total num frames: 4378624. Throughput: 0: 2199.7. Samples: 1094292. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:57:35,694][00034] Avg episode reward: [(0, '28.493')]
[2024-05-17 07:57:35,933][00152] Updated weights for policy 0, policy_version 1070 (0.0017)
[2024-05-17 07:57:40,350][00152] Updated weights for policy 0, policy_version 1080 (0.0020)
[2024-05-17 07:57:40,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 8719.6). Total num frames: 4423680. Throughput: 0: 2211.0. Samples: 1101128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:57:40,691][00034] Avg episode reward: [(0, '27.753')]
[2024-05-17 07:57:44,778][00152] Updated weights for policy 0, policy_version 1090 (0.0021)
[2024-05-17 07:57:45,689][00034] Fps is (10 sec: 9011.8, 60 sec: 8806.4, 300 sec: 8733.5). Total num frames: 4468736. Throughput: 0: 2237.6. Samples: 1114752. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:57:45,692][00034] Avg episode reward: [(0, '27.971')]
[2024-05-17 07:57:49,483][00152] Updated weights for policy 0, policy_version 1100 (0.0018)
[2024-05-17 07:57:50,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 8733.5). Total num frames: 4513792. Throughput: 0: 2245.9. Samples: 1128440. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:57:50,694][00034] Avg episode reward: [(0, '28.880')]
[2024-05-17 07:57:53,898][00152] Updated weights for policy 0, policy_version 1110 (0.0020)
[2024-05-17 07:57:55,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8942.9, 300 sec: 8733.5). Total num frames: 4558848. Throughput: 0: 2245.2. Samples: 1135284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:57:55,693][00034] Avg episode reward: [(0, '26.303')]
[2024-05-17 07:57:58,947][00152] Updated weights for policy 0, policy_version 1120 (0.0037)
[2024-05-17 07:58:00,689][00034] Fps is (10 sec: 8601.5, 60 sec: 8874.7, 300 sec: 8705.7). Total num frames: 4599808. Throughput: 0: 2211.9. Samples: 1147400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:58:00,691][00034] Avg episode reward: [(0, '25.903')]
[2024-05-17 07:58:03,668][00152] Updated weights for policy 0, policy_version 1130 (0.0016)
[2024-05-17 07:58:05,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8874.7, 300 sec: 8719.6). Total num frames: 4644864. Throughput: 0: 2208.4. Samples: 1160692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-05-17 07:58:05,691][00034] Avg episode reward: [(0, '24.704')]
[2024-05-17 07:58:08,082][00152] Updated weights for policy 0, policy_version 1140 (0.0021)
[2024-05-17 07:58:10,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8942.9, 300 sec: 8719.6). Total num frames: 4689920. Throughput: 0: 2207.2. Samples: 1167492. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:58:10,691][00034] Avg episode reward: [(0, '25.414')]
[2024-05-17 07:58:12,680][00152] Updated weights for policy 0, policy_version 1150 (0.0021)
[2024-05-17 07:58:15,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 8719.6). Total num frames: 4734976. Throughput: 0: 2220.8. Samples: 1180660. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:58:15,691][00034] Avg episode reward: [(0, '23.987')]
[2024-05-17 07:58:17,554][00152] Updated weights for policy 0, policy_version 1160 (0.0022)
[2024-05-17 07:58:20,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 8719.6). Total num frames: 4780032. Throughput: 0: 2218.5. Samples: 1194124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:58:20,691][00034] Avg episode reward: [(0, '22.686')]
[2024-05-17 07:58:22,071][00152] Updated weights for policy 0, policy_version 1170 (0.0023)
[2024-05-17 07:58:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8942.9, 300 sec: 8719.6). Total num frames: 4825088. Throughput: 0: 2215.1. Samples: 1200808. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:58:25,690][00034] Avg episode reward: [(0, '24.655')]
[2024-05-17 07:58:26,621][00152] Updated weights for policy 0, policy_version 1180 (0.0024)
[2024-05-17 07:58:30,689][00034] Fps is (10 sec: 8191.8, 60 sec: 8806.4, 300 sec: 8691.8). Total num frames: 4861952. Throughput: 0: 2206.3. Samples: 1214036. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:58:30,691][00034] Avg episode reward: [(0, '25.822')]
[2024-05-17 07:58:31,653][00152] Updated weights for policy 0, policy_version 1190 (0.0019)
[2024-05-17 07:58:35,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8806.5, 300 sec: 8705.8). Total num frames: 4907008. Throughput: 0: 2177.5. Samples: 1226428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:58:35,691][00034] Avg episode reward: [(0, '25.622')]
[2024-05-17 07:58:36,293][00152] Updated weights for policy 0, policy_version 1200 (0.0019)
[2024-05-17 07:58:40,689][00034] Fps is (10 sec: 9011.4, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 4952064. Throughput: 0: 2175.9. Samples: 1233200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:58:40,693][00034] Avg episode reward: [(0, '23.203')]
[2024-05-17 07:58:40,702][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001209_4952064.pth...
[2024-05-17 07:58:40,799][00138] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000699_2863104.pth
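Note: the paired "Saving ... / Removing ..." lines implement keep-the-last-N checkpoint rotation. A minimal sketch, assuming the zero-padded filename scheme seen above and a keep count of 2 (both assumptions, not confirmed by the log):

    import glob
    import os

    import torch

    def save_and_rotate(model, ckpt_dir, policy_version, env_steps, keep=2):
        name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
        torch.save({"model": model.state_dict()}, os.path.join(ckpt_dir, name))
        # Zero-padded versions make lexicographic order chronological.
        ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
        for old in ckpts[:-keep]:
            os.remove(old)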
[2024-05-17 07:58:40,941][00152] Updated weights for policy 0, policy_version 1210 (0.0016)
[2024-05-17 07:58:45,580][00152] Updated weights for policy 0, policy_version 1220 (0.0018)
[2024-05-17 07:58:45,689][00034] Fps is (10 sec: 9011.1, 60 sec: 8806.4, 300 sec: 8691.9). Total num frames: 4997120. Throughput: 0: 2202.4. Samples: 1246508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 07:58:45,691][00034] Avg episode reward: [(0, '23.400')]
[2024-05-17 07:58:50,019][00152] Updated weights for policy 0, policy_version 1230 (0.0016)
[2024-05-17 07:58:50,689][00034] Fps is (10 sec: 9011.3, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 5042176. Throughput: 0: 2204.7. Samples: 1259904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:58:50,690][00034] Avg episode reward: [(0, '25.706')]
[2024-05-17 07:58:54,636][00152] Updated weights for policy 0, policy_version 1240 (0.0026)
[2024-05-17 07:58:55,689][00034] Fps is (10 sec: 9011.3, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 5087232. Throughput: 0: 2201.4. Samples: 1266556. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 07:58:55,691][00034] Avg episode reward: [(0, '29.762')]
[2024-05-17 07:58:55,692][00138] Saving new best policy, reward=29.762!
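Note: "Saving new best policy" fires whenever the smoothed episode reward exceeds the best value seen so far, keeping a best-reward checkpoint alongside the rolling ones. A sketch (the class and save_fn are illustrative placeholders):

    class BestPolicyTracker:
        def __init__(self, save_fn):
            self.best_reward = float("-inf")
            self.save_fn = save_fn  # placeholder for the real checkpoint call

        def update(self, avg_episode_reward):
            if avg_episode_reward > self.best_reward:
                self.best_reward = avg_episode_reward
                print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
                self.save_fn()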
[2024-05-17 07:58:59,195][00152] Updated weights for policy 0, policy_version 1250 (0.0022)
[2024-05-17 07:59:00,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8691.9). Total num frames: 5128192. Throughput: 0: 2212.6. Samples: 1280228. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:59:00,691][00034] Avg episode reward: [(0, '26.799')]
[2024-05-17 07:59:04,314][00152] Updated weights for policy 0, policy_version 1260 (0.0016)
[2024-05-17 07:59:05,689][00034] Fps is (10 sec: 8601.3, 60 sec: 8806.3, 300 sec: 8691.8). Total num frames: 5173248. Throughput: 0: 2188.7. Samples: 1292616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 07:59:05,691][00034] Avg episode reward: [(0, '26.059')]
[2024-05-17 07:59:08,915][00152] Updated weights for policy 0, policy_version 1270 (0.0021)
[2024-05-17 07:59:10,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 5218304. Throughput: 0: 2191.8. Samples: 1299440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 07:59:10,690][00034] Avg episode reward: [(0, '26.001')]
[2024-05-17 07:59:13,066][00152] Updated weights for policy 0, policy_version 1280 (0.0019)
[2024-05-17 07:59:15,689][00034] Fps is (10 sec: 9011.6, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 5263360. Throughput: 0: 2203.3. Samples: 1313184. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:59:15,691][00034] Avg episode reward: [(0, '29.034')]
[2024-05-17 07:59:17,445][00152] Updated weights for policy 0, policy_version 1290 (0.0017)
[2024-05-17 07:59:20,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 5308416. Throughput: 0: 2237.0. Samples: 1327092. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 07:59:20,690][00034] Avg episode reward: [(0, '28.008')]
[2024-05-17 07:59:21,956][00152] Updated weights for policy 0, policy_version 1300 (0.0020)
[2024-05-17 07:59:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8733.5). Total num frames: 5353472. Throughput: 0: 2237.2. Samples: 1333872. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 07:59:25,693][00034] Avg episode reward: [(0, '26.332')]
[2024-05-17 07:59:26,624][00152] Updated weights for policy 0, policy_version 1310 (0.0021)
[2024-05-17 07:59:30,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8943.0, 300 sec: 8733.5). Total num frames: 5398528. Throughput: 0: 2249.1. Samples: 1347716. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:59:30,691][00034] Avg episode reward: [(0, '26.478')]
[2024-05-17 07:59:31,113][00152] Updated weights for policy 0, policy_version 1320 (0.0025)
[2024-05-17 07:59:35,689][00034] Fps is (10 sec: 9010.9, 60 sec: 8942.9, 300 sec: 8747.4). Total num frames: 5443584. Throughput: 0: 2223.3. Samples: 1359952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 07:59:35,691][00034] Avg episode reward: [(0, '26.523')]
[2024-05-17 07:59:36,186][00152] Updated weights for policy 0, policy_version 1330 (0.0016)
[2024-05-17 07:59:40,661][00152] Updated weights for policy 0, policy_version 1340 (0.0015)
[2024-05-17 07:59:40,689][00034] Fps is (10 sec: 9010.9, 60 sec: 8942.9, 300 sec: 8775.2). Total num frames: 5488640. Throughput: 0: 2228.7. Samples: 1366848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:59:40,691][00034] Avg episode reward: [(0, '26.583')]
[2024-05-17 07:59:45,046][00152] Updated weights for policy 0, policy_version 1350 (0.0019)
[2024-05-17 07:59:45,689][00034] Fps is (10 sec: 9011.5, 60 sec: 8942.9, 300 sec: 8789.0). Total num frames: 5533696. Throughput: 0: 2227.1. Samples: 1380448. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-05-17 07:59:45,691][00034] Avg episode reward: [(0, '27.875')]
[2024-05-17 07:59:49,587][00152] Updated weights for policy 0, policy_version 1360 (0.0026)
[2024-05-17 07:59:50,689][00034] Fps is (10 sec: 9011.4, 60 sec: 8942.9, 300 sec: 8789.0). Total num frames: 5578752. Throughput: 0: 2256.6. Samples: 1394164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:59:50,691][00034] Avg episode reward: [(0, '29.628')]
[2024-05-17 07:59:54,101][00152] Updated weights for policy 0, policy_version 1370 (0.0017)
[2024-05-17 07:59:55,689][00034] Fps is (10 sec: 9011.1, 60 sec: 8942.9, 300 sec: 8802.9). Total num frames: 5623808. Throughput: 0: 2254.6. Samples: 1400896. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 07:59:55,691][00034] Avg episode reward: [(0, '29.230')]
[2024-05-17 07:59:58,597][00152] Updated weights for policy 0, policy_version 1380 (0.0016)
[2024-05-17 08:00:00,689][00034] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 8816.8). Total num frames: 5668864. Throughput: 0: 2253.2. Samples: 1414580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:00:00,691][00034] Avg episode reward: [(0, '30.017')]
[2024-05-17 08:00:00,698][00138] Saving new best policy, reward=30.017!
[2024-05-17 08:00:03,214][00152] Updated weights for policy 0, policy_version 1390 (0.0017)
[2024-05-17 08:00:05,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8943.0, 300 sec: 8816.8). Total num frames: 5709824. Throughput: 0: 2233.5. Samples: 1427600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 08:00:05,693][00034] Avg episode reward: [(0, '26.135')]
[2024-05-17 08:00:08,449][00152] Updated weights for policy 0, policy_version 1400 (0.0020)
[2024-05-17 08:00:10,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8942.9, 300 sec: 8816.8). Total num frames: 5754880. Throughput: 0: 2207.0. Samples: 1433188. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 08:00:10,693][00034] Avg episode reward: [(0, '25.799')]
[2024-05-17 08:00:13,049][00152] Updated weights for policy 0, policy_version 1410 (0.0024)
[2024-05-17 08:00:15,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8874.7, 300 sec: 8830.7). Total num frames: 5795840. Throughput: 0: 2198.1. Samples: 1446632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 08:00:15,691][00034] Avg episode reward: [(0, '25.767')]
[2024-05-17 08:00:17,573][00152] Updated weights for policy 0, policy_version 1420 (0.0017)
[2024-05-17 08:00:20,689][00034] Fps is (10 sec: 8601.5, 60 sec: 8874.7, 300 sec: 8844.6). Total num frames: 5840896. Throughput: 0: 2218.1. Samples: 1459764. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 08:00:20,691][00034] Avg episode reward: [(0, '26.049')]
[2024-05-17 08:00:22,312][00152] Updated weights for policy 0, policy_version 1430 (0.0022)
[2024-05-17 08:00:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 8844.6). Total num frames: 5885952. Throughput: 0: 2206.6. Samples: 1466144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:00:25,692][00034] Avg episode reward: [(0, '28.780')]
[2024-05-17 08:00:26,997][00152] Updated weights for policy 0, policy_version 1440 (0.0017)
[2024-05-17 08:00:30,689][00034] Fps is (10 sec: 8601.2, 60 sec: 8806.3, 300 sec: 8830.7). Total num frames: 5926912. Throughput: 0: 2198.0. Samples: 1479360. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 08:00:30,691][00034] Avg episode reward: [(0, '28.778')]
[2024-05-17 08:00:31,669][00152] Updated weights for policy 0, policy_version 1450 (0.0022)
[2024-05-17 08:00:35,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 8830.7). Total num frames: 5971968. Throughput: 0: 2177.2. Samples: 1492140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 08:00:35,690][00034] Avg episode reward: [(0, '30.809')]
[2024-05-17 08:00:35,692][00138] Saving new best policy, reward=30.809!
[2024-05-17 08:00:36,605][00152] Updated weights for policy 0, policy_version 1460 (0.0025)
[2024-05-17 08:00:40,689][00034] Fps is (10 sec: 8192.4, 60 sec: 8669.9, 300 sec: 8802.9). Total num frames: 6008832. Throughput: 0: 2159.1. Samples: 1498056. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 08:00:40,691][00034] Avg episode reward: [(0, '28.495')]
[2024-05-17 08:00:40,699][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth...
[2024-05-17 08:00:40,803][00138] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000950_3891200.pth
[2024-05-17 08:00:41,821][00152] Updated weights for policy 0, policy_version 1470 (0.0019)
[2024-05-17 08:00:45,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8816.8). Total num frames: 6053888. Throughput: 0: 2137.7. Samples: 1510776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 08:00:45,691][00034] Avg episode reward: [(0, '28.879')]
[2024-05-17 08:00:46,517][00152] Updated weights for policy 0, policy_version 1480 (0.0017)
[2024-05-17 08:00:50,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8816.8). Total num frames: 6094848. Throughput: 0: 2142.8. Samples: 1524028. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:00:50,690][00034] Avg episode reward: [(0, '28.911')]
[2024-05-17 08:00:51,097][00152] Updated weights for policy 0, policy_version 1490 (0.0023)
[2024-05-17 08:00:55,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8816.8). Total num frames: 6139904. Throughput: 0: 2158.5. Samples: 1530320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 08:00:55,691][00034] Avg episode reward: [(0, '27.169')]
[2024-05-17 08:00:55,942][00152] Updated weights for policy 0, policy_version 1500 (0.0029)
[2024-05-17 08:01:00,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8802.9). Total num frames: 6180864. Throughput: 0: 2142.7. Samples: 1543052. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2024-05-17 08:01:00,691][00034] Avg episode reward: [(0, '26.885')]
[2024-05-17 08:01:00,928][00152] Updated weights for policy 0, policy_version 1510 (0.0023)
[2024-05-17 08:01:05,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 8789.0). Total num frames: 6221824. Throughput: 0: 2127.0. Samples: 1555480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 08:01:05,692][00034] Avg episode reward: [(0, '28.733')]
[2024-05-17 08:01:05,773][00152] Updated weights for policy 0, policy_version 1520 (0.0016)
[2024-05-17 08:01:10,664][00152] Updated weights for policy 0, policy_version 1530 (0.0021)
[2024-05-17 08:01:10,689][00034] Fps is (10 sec: 8601.7, 60 sec: 8533.3, 300 sec: 8789.0). Total num frames: 6266880. Throughput: 0: 2133.7. Samples: 1562160. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:01:10,691][00034] Avg episode reward: [(0, '30.975')]
[2024-05-17 08:01:10,703][00138] Saving new best policy, reward=30.975!
[2024-05-17 08:01:15,551][00152] Updated weights for policy 0, policy_version 1540 (0.0017)
[2024-05-17 08:01:15,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8789.0). Total num frames: 6307840. Throughput: 0: 2102.4. Samples: 1573968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:01:15,691][00034] Avg episode reward: [(0, '31.391')]
[2024-05-17 08:01:15,692][00138] Saving new best policy, reward=31.391!
[2024-05-17 08:01:20,217][00152] Updated weights for policy 0, policy_version 1550 (0.0022)
[2024-05-17 08:01:20,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8465.1, 300 sec: 8775.2). Total num frames: 6348800. Throughput: 0: 2112.4. Samples: 1587200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 08:01:20,690][00034] Avg episode reward: [(0, '29.620')]
[2024-05-17 08:01:24,744][00152] Updated weights for policy 0, policy_version 1560 (0.0022)
[2024-05-17 08:01:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8533.3, 300 sec: 8789.0). Total num frames: 6397952. Throughput: 0: 2128.0. Samples: 1593816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 08:01:25,690][00034] Avg episode reward: [(0, '30.596')]
[2024-05-17 08:01:29,474][00152] Updated weights for policy 0, policy_version 1570 (0.0022)
[2024-05-17 08:01:30,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8533.4, 300 sec: 8775.2). Total num frames: 6438912. Throughput: 0: 2148.4. Samples: 1607456. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 08:01:30,691][00034] Avg episode reward: [(0, '29.943')]
[2024-05-17 08:01:33,890][00152] Updated weights for policy 0, policy_version 1580 (0.0017)
[2024-05-17 08:01:35,689][00034] Fps is (10 sec: 9011.3, 60 sec: 8601.6, 300 sec: 8802.9). Total num frames: 6488064. Throughput: 0: 2150.4. Samples: 1620796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 08:01:35,691][00034] Avg episode reward: [(0, '28.181')]
[2024-05-17 08:01:38,326][00152] Updated weights for policy 0, policy_version 1590 (0.0017)
[2024-05-17 08:01:40,689][00034] Fps is (10 sec: 9420.6, 60 sec: 8738.1, 300 sec: 8789.0). Total num frames: 6533120. Throughput: 0: 2165.3. Samples: 1627760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-05-17 08:01:40,691][00034] Avg episode reward: [(0, '27.831')]
[2024-05-17 08:01:43,603][00152] Updated weights for policy 0, policy_version 1600 (0.0017)
[2024-05-17 08:01:45,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8775.2). Total num frames: 6569984. Throughput: 0: 2156.7. Samples: 1640104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-05-17 08:01:45,693][00034] Avg episode reward: [(0, '27.194')]
[2024-05-17 08:01:48,148][00152] Updated weights for policy 0, policy_version 1610 (0.0017)
[2024-05-17 08:01:50,689][00034] Fps is (10 sec: 8191.8, 60 sec: 8669.8, 300 sec: 8789.0). Total num frames: 6615040. Throughput: 0: 2176.8. Samples: 1653436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 08:01:50,691][00034] Avg episode reward: [(0, '28.021')]
[2024-05-17 08:01:52,733][00152] Updated weights for policy 0, policy_version 1620 (0.0022)
[2024-05-17 08:01:55,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8789.0). Total num frames: 6660096. Throughput: 0: 2172.2. Samples: 1659908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-05-17 08:01:55,691][00034] Avg episode reward: [(0, '27.233')]
[2024-05-17 08:01:57,615][00152] Updated weights for policy 0, policy_version 1630 (0.0017)
[2024-05-17 08:02:00,689][00034] Fps is (10 sec: 8602.0, 60 sec: 8669.9, 300 sec: 8775.2). Total num frames: 6701056. Throughput: 0: 2200.0. Samples: 1672968. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 08:02:00,691][00034] Avg episode reward: [(0, '25.505')]
[2024-05-17 08:02:02,338][00152] Updated weights for policy 0, policy_version 1640 (0.0017)
[2024-05-17 08:02:05,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8789.0). Total num frames: 6746112. Throughput: 0: 2196.9. Samples: 1686060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 08:02:05,691][00034] Avg episode reward: [(0, '24.926')]
[2024-05-17 08:02:06,873][00152] Updated weights for policy 0, policy_version 1650 (0.0016)
[2024-05-17 08:02:10,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8775.2). Total num frames: 6791168. Throughput: 0: 2197.0. Samples: 1692680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:02:10,694][00034] Avg episode reward: [(0, '25.038')]
[2024-05-17 08:02:11,346][00152] Updated weights for policy 0, policy_version 1660 (0.0017)
[2024-05-17 08:02:15,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8747.4). Total num frames: 6828032. Throughput: 0: 2182.0. Samples: 1705644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:02:15,693][00034] Avg episode reward: [(0, '26.361')]
[2024-05-17 08:02:16,689][00152] Updated weights for policy 0, policy_version 1670 (0.0017)
[2024-05-17 08:02:20,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8738.1, 300 sec: 8761.3). Total num frames: 6873088. Throughput: 0: 2162.6. Samples: 1718112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 08:02:20,691][00034] Avg episode reward: [(0, '27.903')]
[2024-05-17 08:02:21,456][00152] Updated weights for policy 0, policy_version 1680 (0.0017)
[2024-05-17 08:02:25,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8761.3). Total num frames: 6918144. Throughput: 0: 2146.5. Samples: 1724352. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:02:25,691][00034] Avg episode reward: [(0, '31.411')]
[2024-05-17 08:02:25,695][00138] Saving new best policy, reward=31.411!
[2024-05-17 08:02:26,164][00152] Updated weights for policy 0, policy_version 1690 (0.0025)
[2024-05-17 08:02:30,689][00034] Fps is (10 sec: 8601.7, 60 sec: 8669.9, 300 sec: 8747.4). Total num frames: 6959104. Throughput: 0: 2162.0. Samples: 1737392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:02:30,690][00034] Avg episode reward: [(0, '29.759')]
[2024-05-17 08:02:30,886][00152] Updated weights for policy 0, policy_version 1700 (0.0024)
[2024-05-17 08:02:35,502][00152] Updated weights for policy 0, policy_version 1710 (0.0018)
[2024-05-17 08:02:35,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8747.4). Total num frames: 7004160. Throughput: 0: 2156.0. Samples: 1750456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:02:35,693][00034] Avg episode reward: [(0, '26.940')]
[2024-05-17 08:02:40,171][00152] Updated weights for policy 0, policy_version 1720 (0.0027)
[2024-05-17 08:02:40,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8601.6, 300 sec: 8747.4). Total num frames: 7049216. Throughput: 0: 2161.4. Samples: 1757172. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:02:40,691][00034] Avg episode reward: [(0, '24.672')]
[2024-05-17 08:02:40,700][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001721_7049216.pth...
[2024-05-17 08:02:40,803][00138] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001209_4952064.pth
[2024-05-17 08:02:44,756][00152] Updated weights for policy 0, policy_version 1730 (0.0019)
[2024-05-17 08:02:45,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8733.5). Total num frames: 7090176. Throughput: 0: 2164.5. Samples: 1770372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:02:45,694][00034] Avg episode reward: [(0, '24.847')]
[2024-05-17 08:02:50,059][00152] Updated weights for policy 0, policy_version 1740 (0.0016)
[2024-05-17 08:02:50,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8601.7, 300 sec: 8719.6). Total num frames: 7131136. Throughput: 0: 2140.9. Samples: 1782400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 08:02:50,693][00034] Avg episode reward: [(0, '25.950')]
[2024-05-17 08:02:54,714][00152] Updated weights for policy 0, policy_version 1750 (0.0025)
[2024-05-17 08:02:55,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8733.5). Total num frames: 7176192. Throughput: 0: 2137.3. Samples: 1788860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 08:02:55,691][00034] Avg episode reward: [(0, '26.782')]
[2024-05-17 08:02:59,299][00152] Updated weights for policy 0, policy_version 1760 (0.0016)
[2024-05-17 08:03:00,689][00034] Fps is (10 sec: 8601.2, 60 sec: 8601.5, 300 sec: 8719.6). Total num frames: 7217152. Throughput: 0: 2144.7. Samples: 1802156. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 08:03:00,693][00034] Avg episode reward: [(0, '28.157')]
[2024-05-17 08:03:04,148][00152] Updated weights for policy 0, policy_version 1770 (0.0023)
[2024-05-17 08:03:05,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8719.6). Total num frames: 7262208. Throughput: 0: 2156.8. Samples: 1815168. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 08:03:05,691][00034] Avg episode reward: [(0, '28.326')]
[2024-05-17 08:03:08,735][00152] Updated weights for policy 0, policy_version 1780 (0.0022)
[2024-05-17 08:03:10,689][00034] Fps is (10 sec: 9011.6, 60 sec: 8601.6, 300 sec: 8719.6). Total num frames: 7307264. Throughput: 0: 2164.9. Samples: 1821772. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:03:10,691][00034] Avg episode reward: [(0, '30.040')]
[2024-05-17 08:03:13,440][00152] Updated weights for policy 0, policy_version 1790 (0.0027)
[2024-05-17 08:03:15,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 7348224. Throughput: 0: 2165.3. Samples: 1834832. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 08:03:15,694][00034] Avg episode reward: [(0, '30.263')]
[2024-05-17 08:03:18,178][00152] Updated weights for policy 0, policy_version 1800 (0.0023)
[2024-05-17 08:03:20,689][00034] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8691.8). Total num frames: 7389184. Throughput: 0: 2146.0. Samples: 1847028. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 08:03:20,691][00034] Avg episode reward: [(0, '31.222')]
[2024-05-17 08:03:23,196][00152] Updated weights for policy 0, policy_version 1810 (0.0022)
[2024-05-17 08:03:25,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8719.6). Total num frames: 7434240. Throughput: 0: 2139.8. Samples: 1853464. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 08:03:25,691][00034] Avg episode reward: [(0, '32.035')]
[2024-05-17 08:03:25,693][00138] Saving new best policy, reward=32.035!
[2024-05-17 08:03:27,882][00152] Updated weights for policy 0, policy_version 1820 (0.0019)
[2024-05-17 08:03:30,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 7479296. Throughput: 0: 2142.8. Samples: 1866800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 08:03:30,690][00034] Avg episode reward: [(0, '31.902')]
[2024-05-17 08:03:32,363][00152] Updated weights for policy 0, policy_version 1830 (0.0016)
[2024-05-17 08:03:35,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 7524352. Throughput: 0: 2167.6. Samples: 1879944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-05-17 08:03:35,691][00034] Avg episode reward: [(0, '30.681')]
[2024-05-17 08:03:37,182][00152] Updated weights for policy 0, policy_version 1840 (0.0024)
[2024-05-17 08:03:40,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8705.7). Total num frames: 7565312. Throughput: 0: 2173.0. Samples: 1886644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-05-17 08:03:40,690][00034] Avg episode reward: [(0, '29.943')]
[2024-05-17 08:03:41,871][00152] Updated weights for policy 0, policy_version 1850 (0.0022)
[2024-05-17 08:03:45,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 7610368. Throughput: 0: 2170.1. Samples: 1899808. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-05-17 08:03:45,691][00034] Avg episode reward: [(0, '29.418')]
[2024-05-17 08:03:46,393][00152] Updated weights for policy 0, policy_version 1860 (0.0026)
[2024-05-17 08:03:50,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8691.9). Total num frames: 7651328. Throughput: 0: 2175.6. Samples: 1913068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-05-17 08:03:50,691][00034] Avg episode reward: [(0, '27.806')]
[2024-05-17 08:03:51,695][00152] Updated weights for policy 0, policy_version 1870 (0.0019)
[2024-05-17 08:03:55,689][00034] Fps is (10 sec: 8601.3, 60 sec: 8669.8, 300 sec: 8705.7). Total num frames: 7696384. Throughput: 0: 2153.4. Samples: 1918676. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:03:55,694][00034] Avg episode reward: [(0, '27.532')]
[2024-05-17 08:03:56,253][00152] Updated weights for policy 0, policy_version 1880 (0.0017)
[2024-05-17 08:04:00,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8691.9). Total num frames: 7737344. Throughput: 0: 2156.6. Samples: 1931880. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-05-17 08:04:00,691][00034] Avg episode reward: [(0, '26.566')]
[2024-05-17 08:04:01,050][00152] Updated weights for policy 0, policy_version 1890 (0.0016)
[2024-05-17 08:04:05,655][00152] Updated weights for policy 0, policy_version 1900 (0.0017)
[2024-05-17 08:04:05,689][00034] Fps is (10 sec: 8601.9, 60 sec: 8669.9, 300 sec: 8691.9). Total num frames: 7782400. Throughput: 0: 2178.7. Samples: 1945068. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0)
[2024-05-17 08:04:05,690][00034] Avg episode reward: [(0, '24.649')]
[2024-05-17 08:04:10,308][00152] Updated weights for policy 0, policy_version 1910 (0.0027)
[2024-05-17 08:04:10,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8678.0). Total num frames: 7823360. Throughput: 0: 2181.4. Samples: 1951628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0)
[2024-05-17 08:04:10,690][00034] Avg episode reward: [(0, '26.513')]
[2024-05-17 08:04:14,690][00152] Updated weights for policy 0, policy_version 1920 (0.0016)
[2024-05-17 08:04:15,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8678.0). Total num frames: 7868416. Throughput: 0: 2178.4. Samples: 1964828. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-05-17 08:04:15,691][00034] Avg episode reward: [(0, '26.229')]
[2024-05-17 08:04:19,314][00152] Updated weights for policy 0, policy_version 1930 (0.0016)
[2024-05-17 08:04:20,689][00034] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8678.0). Total num frames: 7913472. Throughput: 0: 2185.4. Samples: 1978288. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-05-17 08:04:20,694][00034] Avg episode reward: [(0, '26.098')]
[2024-05-17 08:04:24,619][00152] Updated weights for policy 0, policy_version 1940 (0.0017)
[2024-05-17 08:04:25,689][00034] Fps is (10 sec: 8601.3, 60 sec: 8669.8, 300 sec: 8664.1). Total num frames: 7954432. Throughput: 0: 2165.8. Samples: 1984104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-05-17 08:04:25,691][00034] Avg episode reward: [(0, '26.085')]
[2024-05-17 08:04:29,325][00152] Updated weights for policy 0, policy_version 1950 (0.0017)
[2024-05-17 08:04:30,689][00034] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8664.1). Total num frames: 7999488. Throughput: 0: 2154.0. Samples: 1996740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-05-17 08:04:30,691][00034] Avg episode reward: [(0, '26.789')]
[2024-05-17 08:04:31,603][00138] Stopping Batcher_0...
[2024-05-17 08:04:31,605][00138] Loop batcher_evt_loop terminating...
[2024-05-17 08:04:31,606][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-05-17 08:04:31,603][00034] Component Batcher_0 stopped!
[2024-05-17 08:04:31,618][00153] Stopping RolloutWorker_w2...
[2024-05-17 08:04:31,619][00034] Component RolloutWorker_w2 stopped!
[2024-05-17 08:04:31,619][00153] Loop rollout_proc2_evt_loop terminating...
[2024-05-17 08:04:31,635][00155] Stopping RolloutWorker_w3...
[2024-05-17 08:04:31,636][00034] Component RolloutWorker_w3 stopped!
[2024-05-17 08:04:31,640][00159] Stopping RolloutWorker_w7...
[2024-05-17 08:04:31,637][00155] Loop rollout_proc3_evt_loop terminating...
[2024-05-17 08:04:31,640][00034] Component RolloutWorker_w7 stopped!
[2024-05-17 08:04:31,643][00159] Loop rollout_proc7_evt_loop terminating...
[2024-05-17 08:04:31,644][00034] Component RolloutWorker_w0 stopped!
[2024-05-17 08:04:31,648][00034] Component RolloutWorker_w4 stopped!
[2024-05-17 08:04:31,642][00151] Stopping RolloutWorker_w0...
[2024-05-17 08:04:31,644][00156] Stopping RolloutWorker_w4...
[2024-05-17 08:04:31,655][00151] Loop rollout_proc0_evt_loop terminating...
[2024-05-17 08:04:31,655][00156] Loop rollout_proc4_evt_loop terminating...
[2024-05-17 08:04:31,665][00154] Stopping RolloutWorker_w1...
[2024-05-17 08:04:31,663][00034] Component RolloutWorker_w6 stopped!
[2024-05-17 08:04:31,668][00158] Stopping RolloutWorker_w6...
[2024-05-17 08:04:31,667][00034] Component RolloutWorker_w1 stopped!
[2024-05-17 08:04:31,672][00158] Loop rollout_proc6_evt_loop terminating...
[2024-05-17 08:04:31,669][00154] Loop rollout_proc1_evt_loop terminating...
[2024-05-17 08:04:31,681][00157] Stopping RolloutWorker_w5...
[2024-05-17 08:04:31,681][00157] Loop rollout_proc5_evt_loop terminating...
[2024-05-17 08:04:31,681][00034] Component RolloutWorker_w5 stopped!
[2024-05-17 08:04:31,678][00152] Weights refcount: 2 0
[2024-05-17 08:04:31,685][00152] Stopping InferenceWorker_p0-w0...
[2024-05-17 08:04:31,685][00152] Loop inference_proc0-0_evt_loop terminating...
[2024-05-17 08:04:31,685][00034] Component InferenceWorker_p0-w0 stopped!
[2024-05-17 08:04:31,738][00138] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth
[2024-05-17 08:04:31,748][00138] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-05-17 08:04:31,878][00138] Stopping LearnerWorker_p0...
[2024-05-17 08:04:31,881][00138] Loop learner_proc0_evt_loop terminating...
[2024-05-17 08:04:31,878][00034] Component LearnerWorker_p0 stopped!
[2024-05-17 08:04:31,883][00034] Waiting for process learner_proc0 to stop...
[2024-05-17 08:04:33,319][00034] Waiting for process inference_proc0-0 to join...
[2024-05-17 08:04:33,324][00034] Waiting for process rollout_proc0 to join...
[2024-05-17 08:04:33,739][00034] Waiting for process rollout_proc1 to join...
[2024-05-17 08:04:33,740][00034] Waiting for process rollout_proc2 to join...
[2024-05-17 08:04:33,742][00034] Waiting for process rollout_proc3 to join...
[2024-05-17 08:04:33,743][00034] Waiting for process rollout_proc4 to join...
[2024-05-17 08:04:33,744][00034] Waiting for process rollout_proc5 to join...
[2024-05-17 08:04:33,745][00034] Waiting for process rollout_proc6 to join...
[2024-05-17 08:04:33,746][00034] Waiting for process rollout_proc7 to join...
[2024-05-17 08:04:33,747][00034] Batcher 0 profile tree view:
batching: 43.9480, releasing_batches: 0.0631
[2024-05-17 08:04:33,748][00034] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 274.7907
update_model: 10.3614
  weight_update: 0.0018
one_step: 0.0029
  handle_policy_step: 606.2618
    deserialize: 21.6837, stack: 3.4947, obs_to_device_normalize: 133.6802, forward: 310.9525, send_messages: 24.9377
    prepare_outputs: 81.5748
      to_cpu: 49.2411
[2024-05-17 08:04:33,749][00034] Learner 0 profile tree view:
misc: 0.0118, prepare_batch: 22.7175
train: 120.7019
  epoch_init: 0.0158, minibatch_init: 0.0142, losses_postprocess: 1.0961, kl_divergence: 0.6949, after_optimizer: 58.3429
  calculate_losses: 40.3449
    losses_init: 0.0103, forward_head: 1.6195, bptt_initial: 26.4830, tail: 1.7402, advantages_returns: 0.4923, losses: 6.2020
    bptt: 3.2489
      bptt_forward_core: 3.0791
  update: 19.1653
    clip: 1.8770
[2024-05-17 08:04:33,750][00034] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4658, enqueue_policy_requests: 26.4736, env_step: 783.4645, overhead: 17.5642, complete_rollouts: 3.5811
save_policy_outputs: 31.1710
  split_output_tensors: 11.2507
[2024-05-17 08:04:33,751][00034] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4440, enqueue_policy_requests: 26.2389, env_step: 780.6597, overhead: 17.8043, complete_rollouts: 3.6695
save_policy_outputs: 31.4153
  split_output_tensors: 11.5644
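Note: the profile tree views above are accumulated wall-clock timings, nested by section, gathered over the whole run. A toy profiler that produces this kind of indented output (names are illustrative, not the actual implementation):

    import time
    from contextlib import contextmanager

    class Profiler:
        def __init__(self):
            self.totals = {}  # tuple path -> accumulated seconds
            self.stack = []

        @contextmanager
        def timeit(self, name):
            self.stack.append(name)
            path, start = tuple(self.stack), time.perf_counter()
            try:
                yield
            finally:
                elapsed = time.perf_counter() - start
                self.totals[path] = self.totals.get(path, 0.0) + elapsed
                self.stack.pop()

        def tree_view(self):
            for path in sorted(self.totals):
                print(f"{'  ' * (len(path) - 1)}{path[-1]}: {self.totals[path]:.4f}")

Nesting with-blocks (e.g. timeit("train") around timeit("calculate_losses")) yields the indented children seen above.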
[2024-05-17 08:04:33,753][00034] Loop Runner_EvtLoop terminating...
[2024-05-17 08:04:33,754][00034] Runner profile tree view:
main_loop: 947.5639
[2024-05-17 08:04:33,755][00034] Collected {0: 8007680}, FPS: 8450.8
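Note: the headline figure is simply total collected frames divided by the runner's main-loop time:

    print(8007680 / 947.5639)  # ~8450.8, matching the reported FPS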
[2024-05-17 08:04:58,205][00034] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
[2024-05-17 08:04:58,206][00034] Overriding arg 'num_workers' with value 1 passed from command line
[2024-05-17 08:04:58,207][00034] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-05-17 08:04:58,208][00034] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-05-17 08:04:58,209][00034] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-05-17 08:04:58,210][00034] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-05-17 08:04:58,211][00034] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-05-17 08:04:58,213][00034] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-05-17 08:04:58,214][00034] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-05-17 08:04:58,215][00034] Adding new argument 'hf_repository'='jaymanvirk/ppo_sample_factory_doom_health_gathering_supreme' that is not in the saved config file!
[2024-05-17 08:04:58,216][00034] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-05-17 08:04:58,217][00034] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-05-17 08:04:58,218][00034] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-05-17 08:04:58,219][00034] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-05-17 08:04:58,220][00034] Using frameskip 1 and render_action_repeat=4 for evaluation
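Note: the block above shows enjoy-time config resolution: the saved config.json is loaded, then command-line values either override existing keys or are added as new ones. A hedged sketch of that merge (the helper is illustrative, not the actual Sample Factory API):

    import json

    def load_config_with_overrides(config_path, cli_args):
        with open(config_path) as f:
            cfg = json.load(f)
        for key, value in cli_args.items():
            if key in cfg:
                print(f"Overriding arg '{key}' with value {value!r} passed from command line")
            else:
                print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
            cfg[key] = value
        return cfg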
[2024-05-17 08:04:58,251][00034] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-05-17 08:04:58,254][00034] RunningMeanStd input shape: (3, 72, 128)
[2024-05-17 08:04:58,256][00034] RunningMeanStd input shape: (1,)
[2024-05-17 08:04:58,275][00034] ConvEncoder: input_channels=3
[2024-05-17 08:04:58,397][00034] Conv encoder output size: 512
[2024-05-17 08:04:58,399][00034] Policy head output size: 512
[2024-05-17 08:04:58,618][00034] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
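Note: evaluation restores the newest checkpoint; because the policy version in the filename is zero-padded, a plain sort of the filenames picks it out. A sketch, assuming the network weights live under a "model" key in the checkpoint dict (an assumption about the layout):

    import glob
    import os

    import torch

    def load_latest_checkpoint(ckpt_dir, actor_critic):
        ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
        checkpoint = torch.load(ckpts[-1], map_location="cpu")
        actor_critic.load_state_dict(checkpoint["model"])  # assumed key
        return ckpts[-1]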
[2024-05-17 08:04:59,478][00034] Num frames 100...
[2024-05-17 08:04:59,617][00034] Num frames 200...
[2024-05-17 08:04:59,753][00034] Num frames 300...
[2024-05-17 08:04:59,914][00034] Num frames 400...
[2024-05-17 08:05:00,079][00034] Num frames 500...
[2024-05-17 08:05:00,217][00034] Num frames 600...
[2024-05-17 08:05:00,356][00034] Num frames 700...
[2024-05-17 08:05:00,503][00034] Num frames 800...
[2024-05-17 08:05:00,641][00034] Num frames 900...
[2024-05-17 08:05:00,782][00034] Num frames 1000...
[2024-05-17 08:05:00,934][00034] Num frames 1100...
[2024-05-17 08:05:01,091][00034] Avg episode rewards: #0: 27.680, true rewards: #0: 11.680
[2024-05-17 08:05:01,093][00034] Avg episode reward: 27.680, avg true_objective: 11.680
[2024-05-17 08:05:01,137][00034] Num frames 1200...
[2024-05-17 08:05:01,278][00034] Num frames 1300...
[2024-05-17 08:05:01,424][00034] Num frames 1400...
[2024-05-17 08:05:01,567][00034] Num frames 1500...
[2024-05-17 08:05:01,714][00034] Num frames 1600...
[2024-05-17 08:05:01,857][00034] Num frames 1700...
[2024-05-17 08:05:01,999][00034] Num frames 1800...
[2024-05-17 08:05:02,150][00034] Num frames 1900...
[2024-05-17 08:05:02,303][00034] Num frames 2000...
[2024-05-17 08:05:02,449][00034] Num frames 2100...
[2024-05-17 08:05:02,588][00034] Num frames 2200...
[2024-05-17 08:05:02,731][00034] Num frames 2300...
[2024-05-17 08:05:02,872][00034] Num frames 2400...
[2024-05-17 08:05:03,046][00034] Avg episode rewards: #0: 29.425, true rewards: #0: 12.425
[2024-05-17 08:05:03,047][00034] Avg episode reward: 29.425, avg true_objective: 12.425
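Note: each "Avg episode rewards" line reports the running mean over all evaluation episodes finished so far, for both the shaped reward and the raw ("true") environment reward. Consecutive averages let you recover per-episode values: episode 2's shaped reward here is 2 * 29.425 - 27.680 = 31.170. A sketch of the bookkeeping:

    shaped, true_r = [], []

    def report(episode_shaped, episode_true):
        shaped.append(episode_shaped)
        true_r.append(episode_true)
        n = len(shaped)
        print(f"Avg episode rewards: #0: {sum(shaped) / n:.3f}, "
              f"true rewards: #0: {sum(true_r) / n:.3f}")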
[2024-05-17 08:05:03,068][00034] Num frames 2500...
[2024-05-17 08:05:03,205][00034] Num frames 2600...
[2024-05-17 08:05:03,349][00034] Num frames 2700...
[2024-05-17 08:05:03,495][00034] Num frames 2800...
[2024-05-17 08:05:03,647][00034] Num frames 2900...
[2024-05-17 08:05:03,793][00034] Num frames 3000...
[2024-05-17 08:05:03,936][00034] Num frames 3100...
[2024-05-17 08:05:04,075][00034] Num frames 3200...
[2024-05-17 08:05:04,219][00034] Num frames 3300...
[2024-05-17 08:05:04,359][00034] Num frames 3400...
[2024-05-17 08:05:04,512][00034] Num frames 3500...
[2024-05-17 08:05:04,660][00034] Num frames 3600...
[2024-05-17 08:05:04,808][00034] Num frames 3700...
[2024-05-17 08:05:04,951][00034] Num frames 3800...
[2024-05-17 08:05:05,092][00034] Num frames 3900...
[2024-05-17 08:05:05,231][00034] Num frames 4000...
[2024-05-17 08:05:05,429][00034] Avg episode rewards: #0: 32.660, true rewards: #0: 13.660
[2024-05-17 08:05:05,431][00034] Avg episode reward: 32.660, avg true_objective: 13.660
[2024-05-17 08:05:05,434][00034] Num frames 4100...
[2024-05-17 08:05:05,581][00034] Num frames 4200...
[2024-05-17 08:05:05,726][00034] Num frames 4300...
[2024-05-17 08:05:05,871][00034] Num frames 4400...
[2024-05-17 08:05:06,018][00034] Num frames 4500...
[2024-05-17 08:05:06,164][00034] Num frames 4600...
[2024-05-17 08:05:06,308][00034] Num frames 4700...
[2024-05-17 08:05:06,447][00034] Num frames 4800...
[2024-05-17 08:05:06,588][00034] Num frames 4900...
[2024-05-17 08:05:06,722][00034] Num frames 5000...
[2024-05-17 08:05:06,900][00034] Avg episode rewards: #0: 29.975, true rewards: #0: 12.725
[2024-05-17 08:05:06,901][00034] Avg episode reward: 29.975, avg true_objective: 12.725
[2024-05-17 08:05:06,915][00034] Num frames 5100...
[2024-05-17 08:05:07,050][00034] Num frames 5200...
[2024-05-17 08:05:07,191][00034] Num frames 5300...
[2024-05-17 08:05:07,332][00034] Num frames 5400...
[2024-05-17 08:05:07,478][00034] Num frames 5500...
[2024-05-17 08:05:07,625][00034] Num frames 5600...
[2024-05-17 08:05:07,766][00034] Num frames 5700...
[2024-05-17 08:05:07,913][00034] Num frames 5800...
[2024-05-17 08:05:08,062][00034] Num frames 5900...
[2024-05-17 08:05:08,207][00034] Num frames 6000...
[2024-05-17 08:05:08,376][00034] Avg episode rewards: #0: 29.164, true rewards: #0: 12.164
[2024-05-17 08:05:08,378][00034] Avg episode reward: 29.164, avg true_objective: 12.164
[2024-05-17 08:05:08,404][00034] Num frames 6100...
[2024-05-17 08:05:08,544][00034] Num frames 6200...
[2024-05-17 08:05:08,693][00034] Num frames 6300...
[2024-05-17 08:05:08,835][00034] Num frames 6400...
[2024-05-17 08:05:08,970][00034] Num frames 6500...
[2024-05-17 08:05:09,109][00034] Num frames 6600...
[2024-05-17 08:05:09,250][00034] Num frames 6700...
[2024-05-17 08:05:09,387][00034] Num frames 6800...
[2024-05-17 08:05:09,525][00034] Num frames 6900...
[2024-05-17 08:05:09,668][00034] Num frames 7000...
[2024-05-17 08:05:09,823][00034] Avg episode rewards: #0: 27.457, true rewards: #0: 11.790
[2024-05-17 08:05:09,824][00034] Avg episode reward: 27.457, avg true_objective: 11.790
[2024-05-17 08:05:09,860][00034] Num frames 7100...
[2024-05-17 08:05:10,009][00034] Num frames 7200...
[2024-05-17 08:05:10,159][00034] Num frames 7300...
[2024-05-17 08:05:10,301][00034] Num frames 7400...
[2024-05-17 08:05:10,441][00034] Num frames 7500...
[2024-05-17 08:05:10,585][00034] Num frames 7600...
[2024-05-17 08:05:10,727][00034] Num frames 7700...
[2024-05-17 08:05:10,864][00034] Num frames 7800...
[2024-05-17 08:05:10,999][00034] Num frames 7900...
[2024-05-17 08:05:11,140][00034] Num frames 8000...
[2024-05-17 08:05:11,284][00034] Num frames 8100...
[2024-05-17 08:05:11,429][00034] Num frames 8200...
[2024-05-17 08:05:11,565][00034] Num frames 8300...
[2024-05-17 08:05:11,742][00034] Avg episode rewards: #0: 27.980, true rewards: #0: 11.980
[2024-05-17 08:05:11,744][00034] Avg episode reward: 27.980, avg true_objective: 11.980
[2024-05-17 08:05:11,764][00034] Num frames 8400...
[2024-05-17 08:05:11,903][00034] Num frames 8500...
[2024-05-17 08:05:12,042][00034] Num frames 8600...
[2024-05-17 08:05:12,183][00034] Num frames 8700...
[2024-05-17 08:05:12,323][00034] Num frames 8800...
[2024-05-17 08:05:12,464][00034] Num frames 8900...
[2024-05-17 08:05:12,602][00034] Num frames 9000...
[2024-05-17 08:05:12,739][00034] Num frames 9100...
[2024-05-17 08:05:12,886][00034] Avg episode rewards: #0: 26.333, true rewards: #0: 11.457
[2024-05-17 08:05:12,887][00034] Avg episode reward: 26.333, avg true_objective: 11.457
[2024-05-17 08:05:12,935][00034] Num frames 9200...
[2024-05-17 08:05:13,077][00034] Num frames 9300...
[2024-05-17 08:05:13,216][00034] Num frames 9400...
[2024-05-17 08:05:13,357][00034] Num frames 9500...
[2024-05-17 08:05:13,500][00034] Num frames 9600...
[2024-05-17 08:05:13,643][00034] Num frames 9700...
[2024-05-17 08:05:13,783][00034] Num frames 9800...
[2024-05-17 08:05:13,884][00034] Avg episode rewards: #0: 24.926, true rewards: #0: 10.926
[2024-05-17 08:05:13,885][00034] Avg episode reward: 24.926, avg true_objective: 10.926
[2024-05-17 08:05:13,977][00034] Num frames 9900...
[2024-05-17 08:05:14,117][00034] Num frames 10000...
[2024-05-17 08:05:14,256][00034] Num frames 10100...
[2024-05-17 08:05:14,403][00034] Num frames 10200...
[2024-05-17 08:05:14,552][00034] Num frames 10300...
[2024-05-17 08:05:14,695][00034] Num frames 10400...
[2024-05-17 08:05:14,809][00034] Avg episode rewards: #0: 23.641, true rewards: #0: 10.441
[2024-05-17 08:05:14,810][00034] Avg episode reward: 23.641, avg true_objective: 10.441
[2024-05-17 08:05:51,871][00034] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!
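Note: the final step stitches the rendered frames into replay.mp4. One plausible way to write such a file (a sketch using imageio with the imageio-ffmpeg backend; Sample Factory's actual writer may differ, and 35 fps is Doom's native tick rate, assumed here):

    import imageio
    import numpy as np

    # Stand-in frames; in the real run these are the rendered RGB observations.
    frames = [np.zeros((72, 128, 3), dtype=np.uint8) for _ in range(10)]
    imageio.mimwrite("replay.mp4", frames, fps=35)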