diff --git "a/sf_log.txt" "b/sf_log.txt"
--- "a/sf_log.txt"
+++ "b/sf_log.txt"
@@ -1,50 +1,49 @@
-[2024-07-04 18:09:12,563][02159] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2024-07-04 18:09:12,565][02159] Rollout worker 0 uses device cpu
-[2024-07-04 18:09:12,566][02159] Rollout worker 1 uses device cpu
-[2024-07-04 18:09:12,567][02159] Rollout worker 2 uses device cpu
-[2024-07-04 18:09:12,568][02159] Rollout worker 3 uses device cpu
-[2024-07-04 18:09:12,571][02159] Rollout worker 4 uses device cpu
-[2024-07-04 18:09:12,571][02159] Rollout worker 5 uses device cpu
-[2024-07-04 18:09:12,573][02159] Rollout worker 6 uses device cpu
-[2024-07-04 18:09:12,574][02159] Rollout worker 7 uses device cpu
-[2024-07-04 18:09:12,672][02159] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-07-04 18:09:12,673][02159] InferenceWorker_p0-w0: min num requests: 2
-[2024-07-04 18:09:12,706][02159] Starting all processes...
-[2024-07-04 18:09:12,707][02159] Starting process learner_proc0
-[2024-07-04 18:09:14,405][02159] Starting all processes...
-[2024-07-04 18:09:14,411][02159] Starting process inference_proc0-0
-[2024-07-04 18:09:14,411][02159] Starting process rollout_proc0
-[2024-07-04 18:09:14,412][02159] Starting process rollout_proc1
-[2024-07-04 18:09:14,413][02159] Starting process rollout_proc2
-[2024-07-04 18:09:14,413][02159] Starting process rollout_proc3
-[2024-07-04 18:09:14,414][02159] Starting process rollout_proc4
-[2024-07-04 18:09:14,414][02159] Starting process rollout_proc5
-[2024-07-04 18:09:14,416][02159] Starting process rollout_proc6
-[2024-07-04 18:09:14,420][02159] Starting process rollout_proc7
-[2024-07-04 18:09:17,137][04783] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-07-04 18:09:17,169][04785] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-07-04 18:09:17,196][04768] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-07-04 18:09:17,197][04768] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2024-07-04 18:09:17,215][04768] Num visible devices: 1
-[2024-07-04 18:09:17,250][04768] Starting seed is not provided
-[2024-07-04 18:09:17,251][04768] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-07-04 18:09:17,251][04768] Initializing actor-critic model on device cuda:0
-[2024-07-04 18:09:17,252][04768] RunningMeanStd input shape: (3, 72, 128)
-[2024-07-04 18:09:17,254][04768] RunningMeanStd input shape: (1,)
-[2024-07-04 18:09:17,275][04768] ConvEncoder: input_channels=3
-[2024-07-04 18:09:17,375][04786] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-07-04 18:09:17,477][04782] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-07-04 18:09:17,483][04784] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-07-04 18:09:17,532][04789] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-07-04 18:09:17,536][04781] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-07-04 18:09:17,536][04781] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2024-07-04 18:09:17,538][04788] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-07-04 18:09:17,552][04781] Num visible devices: 1
-[2024-07-04 18:09:17,563][04768] Conv encoder output size: 512
-[2024-07-04 18:09:17,564][04768] Policy head output size: 512
-[2024-07-04 18:09:17,614][04787] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-07-04 18:09:17,618][04768] Created Actor Critic model with architecture:
-[2024-07-04 18:09:17,618][04768] ActorCriticSharedWeights(
+[2024-07-04 19:21:09,875][02883] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-07-04 19:21:09,878][02883] Rollout worker 0 uses device cpu
+[2024-07-04 19:21:09,879][02883] Rollout worker 1 uses device cpu
+[2024-07-04 19:21:09,881][02883] Rollout worker 2 uses device cpu
+[2024-07-04 19:21:09,882][02883] Rollout worker 3 uses device cpu
+[2024-07-04 19:21:09,883][02883] Rollout worker 4 uses device cpu
+[2024-07-04 19:21:09,885][02883] Rollout worker 5 uses device cpu
+[2024-07-04 19:21:09,886][02883] Rollout worker 6 uses device cpu
+[2024-07-04 19:21:09,887][02883] Rollout worker 7 uses device cpu
+[2024-07-04 19:21:09,990][02883] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-04 19:21:09,991][02883] InferenceWorker_p0-w0: min num requests: 2
+[2024-07-04 19:21:10,025][02883] Starting all processes...
+[2024-07-04 19:21:10,026][02883] Starting process learner_proc0
+[2024-07-04 19:21:11,743][02883] Starting all processes...
+[2024-07-04 19:21:11,749][02883] Starting process inference_proc0-0
+[2024-07-04 19:21:11,749][02883] Starting process rollout_proc0
+[2024-07-04 19:21:11,750][02883] Starting process rollout_proc1
+[2024-07-04 19:21:11,751][02883] Starting process rollout_proc2
+[2024-07-04 19:21:11,751][02883] Starting process rollout_proc3
+[2024-07-04 19:21:11,752][02883] Starting process rollout_proc4
+[2024-07-04 19:21:11,754][02883] Starting process rollout_proc5
+[2024-07-04 19:21:11,758][02883] Starting process rollout_proc6
+[2024-07-04 19:21:11,758][02883] Starting process rollout_proc7
+[2024-07-04 19:21:14,347][04982] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-07-04 19:21:14,562][04967] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-04 19:21:14,563][04967] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-07-04 19:21:14,581][04967] Num visible devices: 1
+[2024-07-04 19:21:14,609][04967] Starting seed is not provided
+[2024-07-04 19:21:14,611][04967] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-04 19:21:14,611][04967] Initializing actor-critic model on device cuda:0
+[2024-07-04 19:21:14,612][04967] RunningMeanStd input shape: (3, 72, 128)
+[2024-07-04 19:21:14,615][04967] RunningMeanStd input shape: (1,)
+[2024-07-04 19:21:14,635][04967] ConvEncoder: input_channels=3
+[2024-07-04 19:21:14,652][04988] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-07-04 19:21:14,771][04981] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-07-04 19:21:14,783][04987] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-07-04 19:21:14,816][04984] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-07-04 19:21:14,860][04983] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-07-04 19:21:14,892][04967] Conv encoder output size: 512
+[2024-07-04 19:21:14,892][04967] Policy head output size: 512
+[2024-07-04 19:21:14,897][04986] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-07-04 19:21:14,914][04980] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-04 19:21:14,915][04980] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-07-04 19:21:14,929][04980] Num visible devices: 1
+[2024-07-04 19:21:14,947][04967] Created Actor Critic model with architecture:
+[2024-07-04 19:21:14,947][04967] ActorCriticSharedWeights(
 (obs_normalizer): ObservationNormalizer(
 (running_mean_std): RunningMeanStdDictInPlace(
 (running_mean_std): ModuleDict(
@@ -85,597 +84,1529 @@
 (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
 )
 )
-[2024-07-04 18:09:17,843][04768] Using optimizer
-[2024-07-04 18:09:18,814][04768] No checkpoints found
-[2024-07-04 18:09:18,815][04768] Did not load from checkpoint, starting from scratch!
-[2024-07-04 18:09:18,815][04768] Initialized policy 0 weights for model version 0
-[2024-07-04 18:09:18,817][04768] LearnerWorker_p0 finished initialization!
-[2024-07-04 18:09:18,817][04768] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-07-04 18:09:18,904][04781] RunningMeanStd input shape: (3, 72, 128)
-[2024-07-04 18:09:18,905][04781] RunningMeanStd input shape: (1,)
-[2024-07-04 18:09:18,917][04781] ConvEncoder: input_channels=3
-[2024-07-04 18:09:19,026][04781] Conv encoder output size: 512
-[2024-07-04 18:09:19,027][04781] Policy head output size: 512
-[2024-07-04 18:09:19,085][02159] Inference worker 0-0 is ready!
-[2024-07-04 18:09:19,087][02159] All inference workers are ready! Signal rollout workers to start!
-[2024-07-04 18:09:19,129][04786] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:09:19,134][04789] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:09:19,137][04784] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:09:19,138][04785] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:09:19,141][04783] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:09:19,143][04782] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:09:19,144][04788] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:09:19,148][04787] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:09:19,456][04785] Decorrelating experience for 0 frames...
-[2024-07-04 18:09:19,456][04788] Decorrelating experience for 0 frames...
-[2024-07-04 18:09:19,456][04789] Decorrelating experience for 0 frames...
-[2024-07-04 18:09:19,456][04786] Decorrelating experience for 0 frames...
-[2024-07-04 18:09:19,703][04783] Decorrelating experience for 0 frames...
-[2024-07-04 18:09:19,710][04787] Decorrelating experience for 0 frames...
-[2024-07-04 18:09:19,731][04788] Decorrelating experience for 32 frames...
-[2024-07-04 18:09:19,733][04785] Decorrelating experience for 32 frames...
-[2024-07-04 18:09:19,733][04789] Decorrelating experience for 32 frames...
-[2024-07-04 18:09:19,740][04784] Decorrelating experience for 0 frames...
-[2024-07-04 18:09:19,994][04784] Decorrelating experience for 32 frames...
-[2024-07-04 18:09:20,029][04783] Decorrelating experience for 32 frames...
-[2024-07-04 18:09:20,036][04782] Decorrelating experience for 0 frames...
-[2024-07-04 18:09:20,038][04787] Decorrelating experience for 32 frames...
-[2024-07-04 18:09:20,081][04785] Decorrelating experience for 64 frames...
-[2024-07-04 18:09:20,106][04788] Decorrelating experience for 64 frames...
-[2024-07-04 18:09:20,271][04786] Decorrelating experience for 32 frames...
-[2024-07-04 18:09:20,282][04782] Decorrelating experience for 32 frames...
-[2024-07-04 18:09:20,335][04789] Decorrelating experience for 64 frames...
-[2024-07-04 18:09:20,377][04787] Decorrelating experience for 64 frames...
-[2024-07-04 18:09:20,395][04784] Decorrelating experience for 64 frames...
-[2024-07-04 18:09:20,407][04785] Decorrelating experience for 96 frames...
-[2024-07-04 18:09:20,548][04783] Decorrelating experience for 64 frames...
-[2024-07-04 18:09:20,627][04786] Decorrelating experience for 64 frames...
-[2024-07-04 18:09:20,699][04787] Decorrelating experience for 96 frames...
-[2024-07-04 18:09:20,704][04782] Decorrelating experience for 64 frames...
-[2024-07-04 18:09:20,818][04784] Decorrelating experience for 96 frames...
-[2024-07-04 18:09:20,866][04783] Decorrelating experience for 96 frames...
-[2024-07-04 18:09:20,895][04789] Decorrelating experience for 96 frames...
-[2024-07-04 18:09:20,979][04788] Decorrelating experience for 96 frames...
-[2024-07-04 18:09:21,033][04782] Decorrelating experience for 96 frames...
-[2024-07-04 18:09:21,119][04786] Decorrelating experience for 96 frames...
-[2024-07-04 18:09:22,008][04768] Signal inference workers to stop experience collection...
-[2024-07-04 18:09:22,013][04781] InferenceWorker_p0-w0: stopping experience collection
-[2024-07-04 18:09:23,523][02159] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 2216. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-07-04 18:09:23,525][02159] Avg episode reward: [(0, '1.747')]
-[2024-07-04 18:09:23,826][04768] Signal inference workers to resume experience collection...
-[2024-07-04 18:09:23,827][04781] InferenceWorker_p0-w0: resuming experience collection
-[2024-07-04 18:09:25,945][04781] Updated weights for policy 0, policy_version 10 (0.0194)
-[2024-07-04 18:09:28,220][04781] Updated weights for policy 0, policy_version 20 (0.0013)
-[2024-07-04 18:09:28,524][02159] Fps is (10 sec: 17202.3, 60 sec: 17202.3, 300 sec: 17202.3). Total num frames: 86016. Throughput: 0: 3623.8. Samples: 20336. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-07-04 18:09:28,526][02159] Avg episode reward: [(0, '4.635')]
-[2024-07-04 18:09:30,328][04781] Updated weights for policy 0, policy_version 30 (0.0013)
-[2024-07-04 18:09:32,417][04781] Updated weights for policy 0, policy_version 40 (0.0013)
-[2024-07-04 18:09:32,664][02159] Heartbeat connected on Batcher_0
-[2024-07-04 18:09:32,668][02159] Heartbeat connected on LearnerWorker_p0
-[2024-07-04 18:09:32,677][02159] Heartbeat connected on InferenceWorker_p0-w0
-[2024-07-04 18:09:32,680][02159] Heartbeat connected on RolloutWorker_w0
-[2024-07-04 18:09:32,683][02159] Heartbeat connected on RolloutWorker_w1
-[2024-07-04 18:09:32,688][02159] Heartbeat connected on RolloutWorker_w2
-[2024-07-04 18:09:32,691][02159] Heartbeat connected on RolloutWorker_w3
-[2024-07-04 18:09:32,695][02159] Heartbeat connected on RolloutWorker_w4
-[2024-07-04 18:09:32,698][02159] Heartbeat connected on RolloutWorker_w5
-[2024-07-04 18:09:32,704][02159] Heartbeat connected on RolloutWorker_w6
-[2024-07-04 18:09:32,711][02159] Heartbeat connected on RolloutWorker_w7
-[2024-07-04 18:09:33,523][02159] Fps is (10 sec: 18432.0, 60 sec: 18432.0, 300 sec: 18432.0). Total num frames: 184320. Throughput: 0: 3273.0. Samples: 34946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-07-04 18:09:33,526][02159] Avg episode reward: [(0, '4.434')]
-[2024-07-04 18:09:33,528][04768] Saving new best policy, reward=4.434!
-[2024-07-04 18:09:34,517][04781] Updated weights for policy 0, policy_version 50 (0.0012)
-[2024-07-04 18:09:36,609][04781] Updated weights for policy 0, policy_version 60 (0.0012)
-[2024-07-04 18:09:38,523][02159] Fps is (10 sec: 19661.1, 60 sec: 18841.4, 300 sec: 18841.4). Total num frames: 282624. Throughput: 0: 4144.0. Samples: 64376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-07-04 18:09:38,526][02159] Avg episode reward: [(0, '4.774')]
-[2024-07-04 18:09:38,533][04768] Saving new best policy, reward=4.774!
-[2024-07-04 18:09:38,692][04781] Updated weights for policy 0, policy_version 70 (0.0013)
-[2024-07-04 18:09:41,053][04781] Updated weights for policy 0, policy_version 80 (0.0013)
-[2024-07-04 18:09:43,239][04781] Updated weights for policy 0, policy_version 90 (0.0013)
-[2024-07-04 18:09:43,523][02159] Fps is (10 sec: 18841.6, 60 sec: 18636.8, 300 sec: 18636.8). Total num frames: 372736. Throughput: 0: 4497.4. Samples: 92164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2024-07-04 18:09:43,525][02159] Avg episode reward: [(0, '4.653')]
-[2024-07-04 18:09:45,347][04781] Updated weights for policy 0, policy_version 100 (0.0013)
-[2024-07-04 18:09:47,462][04781] Updated weights for policy 0, policy_version 110 (0.0012)
-[2024-07-04 18:09:48,524][02159] Fps is (10 sec: 18841.6, 60 sec: 18841.5, 300 sec: 18841.5). Total num frames: 471040. Throughput: 0: 4185.4. Samples: 106852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2024-07-04 18:09:48,525][02159] Avg episode reward: [(0, '4.497')]
-[2024-07-04 18:09:49,551][04781] Updated weights for policy 0, policy_version 120 (0.0013)
-[2024-07-04 18:09:51,638][04781] Updated weights for policy 0, policy_version 130 (0.0013)
-[2024-07-04 18:09:53,523][02159] Fps is (10 sec: 19251.0, 60 sec: 18841.5, 300 sec: 18841.5). Total num frames: 565248. Throughput: 0: 4462.7. Samples: 136096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
-[2024-07-04 18:09:53,526][02159] Avg episode reward: [(0, '4.762')]
-[2024-07-04 18:09:53,791][04781] Updated weights for policy 0, policy_version 140 (0.0012)
-[2024-07-04 18:09:56,018][04781] Updated weights for policy 0, policy_version 150 (0.0013)
-[2024-07-04 18:09:58,139][04781] Updated weights for policy 0, policy_version 160 (0.0013)
-[2024-07-04 18:09:58,523][02159] Fps is (10 sec: 18841.7, 60 sec: 18841.5, 300 sec: 18841.5). Total num frames: 659456. Throughput: 0: 4634.4. Samples: 164422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:09:58,526][02159] Avg episode reward: [(0, '4.845')]
-[2024-07-04 18:09:58,556][04768] Saving new best policy, reward=4.845!
-[2024-07-04 18:10:00,250][04781] Updated weights for policy 0, policy_version 170 (0.0012)
-[2024-07-04 18:10:02,341][04781] Updated weights for policy 0, policy_version 180 (0.0013)
-[2024-07-04 18:10:03,523][02159] Fps is (10 sec: 19251.2, 60 sec: 18943.9, 300 sec: 18943.9). Total num frames: 757760. Throughput: 0: 4418.0. Samples: 178936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:10:03,525][02159] Avg episode reward: [(0, '5.081')]
-[2024-07-04 18:10:03,528][04768] Saving new best policy, reward=5.081!
-[2024-07-04 18:10:04,433][04781] Updated weights for policy 0, policy_version 190 (0.0012)
-[2024-07-04 18:10:06,526][04781] Updated weights for policy 0, policy_version 200 (0.0012)
-[2024-07-04 18:10:08,523][02159] Fps is (10 sec: 19660.8, 60 sec: 19023.6, 300 sec: 19023.6). Total num frames: 856064. Throughput: 0: 4576.0. Samples: 208138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:10:08,525][02159] Avg episode reward: [(0, '5.253')]
-[2024-07-04 18:10:08,533][04768] Saving new best policy, reward=5.253!
-[2024-07-04 18:10:08,725][04781] Updated weights for policy 0, policy_version 210 (0.0012)
-[2024-07-04 18:10:10,926][04781] Updated weights for policy 0, policy_version 220 (0.0012)
-[2024-07-04 18:10:13,010][04781] Updated weights for policy 0, policy_version 230 (0.0012)
-[2024-07-04 18:10:13,523][02159] Fps is (10 sec: 19251.2, 60 sec: 19005.4, 300 sec: 19005.4). Total num frames: 950272. Throughput: 0: 4805.5. Samples: 236584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:10:13,525][02159] Avg episode reward: [(0, '5.371')]
-[2024-07-04 18:10:13,528][04768] Saving new best policy, reward=5.371!
-[2024-07-04 18:10:15,118][04781] Updated weights for policy 0, policy_version 240 (0.0012)
-[2024-07-04 18:10:17,219][04781] Updated weights for policy 0, policy_version 250 (0.0012)
-[2024-07-04 18:10:18,523][02159] Fps is (10 sec: 18841.6, 60 sec: 18990.5, 300 sec: 18990.5). Total num frames: 1044480. Throughput: 0: 4807.7. Samples: 251294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2024-07-04 18:10:18,526][02159] Avg episode reward: [(0, '5.741')]
-[2024-07-04 18:10:18,535][04768] Saving new best policy, reward=5.741!
-[2024-07-04 18:10:19,379][04781] Updated weights for policy 0, policy_version 260 (0.0012)
-[2024-07-04 18:10:21,511][04781] Updated weights for policy 0, policy_version 270 (0.0013)
-[2024-07-04 18:10:23,523][02159] Fps is (10 sec: 19251.2, 60 sec: 19046.4, 300 sec: 19046.4). Total num frames: 1142784. Throughput: 0: 4790.5. Samples: 279948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:10:23,525][02159] Avg episode reward: [(0, '5.533')]
-[2024-07-04 18:10:23,692][04781] Updated weights for policy 0, policy_version 280 (0.0012)
-[2024-07-04 18:10:25,797][04781] Updated weights for policy 0, policy_version 290 (0.0012)
-[2024-07-04 18:10:27,852][04781] Updated weights for policy 0, policy_version 300 (0.0013)
-[2024-07-04 18:10:28,523][02159] Fps is (10 sec: 19660.8, 60 sec: 19251.3, 300 sec: 19093.6). Total num frames: 1241088. Throughput: 0: 4816.7. Samples: 308914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-07-04 18:10:28,526][02159] Avg episode reward: [(0, '6.158')]
-[2024-07-04 18:10:28,533][04768] Saving new best policy, reward=6.158!
-[2024-07-04 18:10:29,948][04781] Updated weights for policy 0, policy_version 310 (0.0013)
-[2024-07-04 18:10:32,065][04781] Updated weights for policy 0, policy_version 320 (0.0012)
-[2024-07-04 18:10:33,523][02159] Fps is (10 sec: 19660.9, 60 sec: 19251.2, 300 sec: 19134.1). Total num frames: 1339392. Throughput: 0: 4816.4. Samples: 323590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-07-04 18:10:33,526][02159] Avg episode reward: [(0, '6.725')]
-[2024-07-04 18:10:33,528][04768] Saving new best policy, reward=6.725!
-[2024-07-04 18:10:34,147][04781] Updated weights for policy 0, policy_version 330 (0.0012)
-[2024-07-04 18:10:36,343][04781] Updated weights for policy 0, policy_version 340 (0.0012)
-[2024-07-04 18:10:38,512][04781] Updated weights for policy 0, policy_version 350 (0.0012)
-[2024-07-04 18:10:38,523][02159] Fps is (10 sec: 19251.2, 60 sec: 19182.9, 300 sec: 19114.6). Total num frames: 1433600. Throughput: 0: 4803.1. Samples: 352236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-07-04 18:10:38,526][02159] Avg episode reward: [(0, '7.359')]
-[2024-07-04 18:10:38,532][04768] Saving new best policy, reward=7.359!
-[2024-07-04 18:10:40,619][04781] Updated weights for policy 0, policy_version 360 (0.0012)
-[2024-07-04 18:10:42,724][04781] Updated weights for policy 0, policy_version 370 (0.0012)
-[2024-07-04 18:10:43,523][02159] Fps is (10 sec: 18841.5, 60 sec: 19251.2, 300 sec: 19097.6). Total num frames: 1527808. Throughput: 0: 4823.8. Samples: 381494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:10:43,526][02159] Avg episode reward: [(0, '7.402')]
-[2024-07-04 18:10:43,529][04768] Saving new best policy, reward=7.402!
-[2024-07-04 18:10:44,833][04781] Updated weights for policy 0, policy_version 380 (0.0013)
-[2024-07-04 18:10:46,918][04781] Updated weights for policy 0, policy_version 390 (0.0012)
-[2024-07-04 18:10:48,524][02159] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19130.7). Total num frames: 1626112. Throughput: 0: 4825.1. Samples: 396064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2024-07-04 18:10:48,526][02159] Avg episode reward: [(0, '8.141')]
-[2024-07-04 18:10:48,533][04768] Saving new best policy, reward=8.141!
-[2024-07-04 18:10:49,038][04781] Updated weights for policy 0, policy_version 400 (0.0012)
-[2024-07-04 18:10:51,201][04781] Updated weights for policy 0, policy_version 410 (0.0012)
-[2024-07-04 18:10:53,334][04781] Updated weights for policy 0, policy_version 420 (0.0013)
-[2024-07-04 18:10:53,523][02159] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19114.6). Total num frames: 1720320. Throughput: 0: 4812.4. Samples: 424696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:10:53,525][02159] Avg episode reward: [(0, '9.569')]
-[2024-07-04 18:10:53,538][04768] Saving new best policy, reward=9.569!
-[2024-07-04 18:10:55,428][04781] Updated weights for policy 0, policy_version 430 (0.0012)
-[2024-07-04 18:10:57,514][04781] Updated weights for policy 0, policy_version 440 (0.0012)
-[2024-07-04 18:10:58,523][02159] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19143.4). Total num frames: 1818624. Throughput: 0: 4832.3. Samples: 454038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-07-04 18:10:58,526][02159] Avg episode reward: [(0, '8.960')]
-[2024-07-04 18:10:59,608][04781] Updated weights for policy 0, policy_version 450 (0.0012)
-[2024-07-04 18:11:01,664][04781] Updated weights for policy 0, policy_version 460 (0.0012)
-[2024-07-04 18:11:03,523][02159] Fps is (10 sec: 19660.7, 60 sec: 19319.5, 300 sec: 19169.3). Total num frames: 1916928. Throughput: 0: 4832.0. Samples: 468736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-07-04 18:11:03,525][02159] Avg episode reward: [(0, '10.328')]
-[2024-07-04 18:11:03,528][04768] Saving new best policy, reward=10.328!
-[2024-07-04 18:11:03,833][04781] Updated weights for policy 0, policy_version 470 (0.0012)
-[2024-07-04 18:11:06,003][04781] Updated weights for policy 0, policy_version 480 (0.0013)
-[2024-07-04 18:11:08,155][04781] Updated weights for policy 0, policy_version 490 (0.0012)
-[2024-07-04 18:11:08,523][02159] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19153.6). Total num frames: 2011136. Throughput: 0: 4829.3. Samples: 497268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-07-04 18:11:08,526][02159] Avg episode reward: [(0, '13.755')]
-[2024-07-04 18:11:08,534][04768] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000491_2011136.pth...
-[2024-07-04 18:11:08,612][04768] Saving new best policy, reward=13.755!
-[2024-07-04 18:11:10,263][04781] Updated weights for policy 0, policy_version 500 (0.0012)
-[2024-07-04 18:11:12,331][04781] Updated weights for policy 0, policy_version 510 (0.0013)
-[2024-07-04 18:11:13,523][02159] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19176.7). Total num frames: 2109440. Throughput: 0: 4839.0. Samples: 526670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-07-04 18:11:13,526][02159] Avg episode reward: [(0, '14.175')]
-[2024-07-04 18:11:13,528][04768] Saving new best policy, reward=14.175!
-[2024-07-04 18:11:14,398][04781] Updated weights for policy 0, policy_version 520 (0.0013)
-[2024-07-04 18:11:16,508][04781] Updated weights for policy 0, policy_version 530 (0.0012)
-[2024-07-04 18:11:18,523][02159] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19197.8). Total num frames: 2207744. Throughput: 0: 4839.9. Samples: 541384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-07-04 18:11:18,526][02159] Avg episode reward: [(0, '14.465')]
-[2024-07-04 18:11:18,533][04768] Saving new best policy, reward=14.465!
-[2024-07-04 18:11:18,732][04781] Updated weights for policy 0, policy_version 540 (0.0013)
-[2024-07-04 18:11:20,903][04781] Updated weights for policy 0, policy_version 550 (0.0012)
-[2024-07-04 18:11:23,008][04781] Updated weights for policy 0, policy_version 560 (0.0013)
-[2024-07-04 18:11:23,523][02159] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19182.9). Total num frames: 2301952. Throughput: 0: 4828.5. Samples: 569520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-07-04 18:11:23,527][02159] Avg episode reward: [(0, '18.393')]
-[2024-07-04 18:11:23,529][04768] Saving new best policy, reward=18.393!
-[2024-07-04 18:11:25,085][04781] Updated weights for policy 0, policy_version 570 (0.0012)
-[2024-07-04 18:11:27,159][04781] Updated weights for policy 0, policy_version 580 (0.0013)
-[2024-07-04 18:11:28,523][02159] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19202.0). Total num frames: 2400256. Throughput: 0: 4835.6. Samples: 599096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-07-04 18:11:28,525][02159] Avg episode reward: [(0, '18.739')]
-[2024-07-04 18:11:28,532][04768] Saving new best policy, reward=18.739!
-[2024-07-04 18:11:29,245][04781] Updated weights for policy 0, policy_version 590 (0.0012)
-[2024-07-04 18:11:31,371][04781] Updated weights for policy 0, policy_version 600 (0.0012)
-[2024-07-04 18:11:33,523][02159] Fps is (10 sec: 19251.0, 60 sec: 19251.2, 300 sec: 19188.2). Total num frames: 2494464. Throughput: 0: 4837.0. Samples: 613728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-07-04 18:11:33,526][02159] Avg episode reward: [(0, '18.011')]
-[2024-07-04 18:11:33,578][04781] Updated weights for policy 0, policy_version 610 (0.0013)
-[2024-07-04 18:11:35,675][04781] Updated weights for policy 0, policy_version 620 (0.0012)
-[2024-07-04 18:11:37,718][04781] Updated weights for policy 0, policy_version 630 (0.0012)
-[2024-07-04 18:11:38,524][02159] Fps is (10 sec: 19250.8, 60 sec: 19319.4, 300 sec: 19205.7). Total num frames: 2592768. Throughput: 0: 4840.0. Samples: 642498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-07-04 18:11:38,526][02159] Avg episode reward: [(0, '17.751')]
-[2024-07-04 18:11:39,798][04781] Updated weights for policy 0, policy_version 640 (0.0012)
-[2024-07-04 18:11:41,916][04781] Updated weights for policy 0, policy_version 650 (0.0012)
-[2024-07-04 18:11:43,523][02159] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19221.9). Total num frames: 2691072. Throughput: 0: 4841.8. Samples: 671918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-07-04 18:11:43,526][02159] Avg episode reward: [(0, '19.244')]
-[2024-07-04 18:11:43,528][04768] Saving new best policy, reward=19.244!
-[2024-07-04 18:11:44,011][04781] Updated weights for policy 0, policy_version 660 (0.0013)
-[2024-07-04 18:11:46,231][04781] Updated weights for policy 0, policy_version 670 (0.0012)
-[2024-07-04 18:11:48,382][04781] Updated weights for policy 0, policy_version 680 (0.0012)
-[2024-07-04 18:11:48,523][02159] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19208.8). Total num frames: 2785280. Throughput: 0: 4827.6. Samples: 685980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-07-04 18:11:48,526][02159] Avg episode reward: [(0, '18.019')]
-[2024-07-04 18:11:50,460][04781] Updated weights for policy 0, policy_version 690 (0.0012)
-[2024-07-04 18:11:52,522][04781] Updated weights for policy 0, policy_version 700 (0.0012)
-[2024-07-04 18:11:53,523][02159] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19223.9). Total num frames: 2883584. Throughput: 0: 4844.4. Samples: 715264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-07-04 18:11:53,526][02159] Avg episode reward: [(0, '22.378')]
-[2024-07-04 18:11:53,529][04768] Saving new best policy, reward=22.378!
-[2024-07-04 18:11:54,613][04781] Updated weights for policy 0, policy_version 710 (0.0012)
-[2024-07-04 18:11:56,698][04781] Updated weights for policy 0, policy_version 720 (0.0012)
-[2024-07-04 18:11:58,523][02159] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19238.0). Total num frames: 2981888. Throughput: 0: 4848.9. Samples: 744872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:11:58,525][02159] Avg episode reward: [(0, '22.491')]
-[2024-07-04 18:11:58,533][04768] Saving new best policy, reward=22.491!
-[2024-07-04 18:11:58,838][04781] Updated weights for policy 0, policy_version 730 (0.0013)
-[2024-07-04 18:12:01,053][04781] Updated weights for policy 0, policy_version 740 (0.0012)
-[2024-07-04 18:12:03,191][04781] Updated weights for policy 0, policy_version 750 (0.0012)
-[2024-07-04 18:12:03,523][02159] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19225.6). Total num frames: 3076096. Throughput: 0: 4830.9. Samples: 758774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2024-07-04 18:12:03,526][02159] Avg episode reward: [(0, '24.514')]
-[2024-07-04 18:12:03,528][04768] Saving new best policy, reward=24.514!
-[2024-07-04 18:12:05,289][04781] Updated weights for policy 0, policy_version 760 (0.0013)
-[2024-07-04 18:12:07,391][04781] Updated weights for policy 0, policy_version 770 (0.0013)
-[2024-07-04 18:12:08,523][02159] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19238.8). Total num frames: 3174400. Throughput: 0: 4854.3. Samples: 787962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2024-07-04 18:12:08,526][02159] Avg episode reward: [(0, '21.793')]
-[2024-07-04 18:12:09,440][04781] Updated weights for policy 0, policy_version 780 (0.0011)
-[2024-07-04 18:12:11,521][04781] Updated weights for policy 0, policy_version 790 (0.0013)
-[2024-07-04 18:12:13,523][02159] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19251.2). Total num frames: 3272704. Throughput: 0: 4844.8. Samples: 817114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:12:13,526][02159] Avg episode reward: [(0, '21.078')]
-[2024-07-04 18:12:13,714][04781] Updated weights for policy 0, policy_version 800 (0.0013)
-[2024-07-04 18:12:15,906][04781] Updated weights for policy 0, policy_version 810 (0.0012)
-[2024-07-04 18:12:17,984][04781] Updated weights for policy 0, policy_version 820 (0.0012)
-[2024-07-04 18:12:18,523][02159] Fps is (10 sec: 19251.1, 60 sec: 19319.5, 300 sec: 19239.5). Total num frames: 3366912. Throughput: 0: 4830.8. Samples: 831114. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2024-07-04 18:12:18,526][02159] Avg episode reward: [(0, '23.839')]
-[2024-07-04 18:12:20,055][04781] Updated weights for policy 0, policy_version 830 (0.0012)
-[2024-07-04 18:12:22,196][04781] Updated weights for policy 0, policy_version 840 (0.0012)
-[2024-07-04 18:12:23,523][02159] Fps is (10 sec: 19251.3, 60 sec: 19387.7, 300 sec: 19251.2). Total num frames: 3465216. Throughput: 0: 4841.9. Samples: 860384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:12:23,525][02159] Avg episode reward: [(0, '21.011')]
-[2024-07-04 18:12:24,272][04781] Updated weights for policy 0, policy_version 850 (0.0012)
-[2024-07-04 18:12:26,415][04781] Updated weights for policy 0, policy_version 860 (0.0012)
-[2024-07-04 18:12:28,523][02159] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19240.1). Total num frames: 3559424. Throughput: 0: 4825.3. Samples: 889056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:12:28,525][02159] Avg episode reward: [(0, '20.468')]
-[2024-07-04 18:12:28,642][04781] Updated weights for policy 0, policy_version 870 (0.0012)
-[2024-07-04 18:12:30,835][04781] Updated weights for policy 0, policy_version 880 (0.0012)
-[2024-07-04 18:12:32,945][04781] Updated weights for policy 0, policy_version 890 (0.0012)
-[2024-07-04 18:12:33,523][02159] Fps is (10 sec: 18841.5, 60 sec: 19319.5, 300 sec: 19229.6). Total num frames: 3653632. Throughput: 0: 4824.1. Samples: 903066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:12:33,525][02159] Avg episode reward: [(0, '21.472')]
-[2024-07-04 18:12:35,094][04781] Updated weights for policy 0, policy_version 900 (0.0013)
-[2024-07-04 18:12:37,247][04781] Updated weights for policy 0, policy_version 910 (0.0013)
-[2024-07-04 18:12:38,531][02159] Fps is (10 sec: 19236.8, 60 sec: 19317.1, 300 sec: 19240.0). Total num frames: 3751936. Throughput: 0: 4812.4. Samples: 931858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:12:38,536][02159] Avg episode reward: [(0, '21.460')]
-[2024-07-04 18:12:39,358][04781] Updated weights for policy 0, policy_version 920 (0.0012)
-[2024-07-04 18:12:41,534][04781] Updated weights for policy 0, policy_version 930 (0.0013)
-[2024-07-04 18:12:43,523][02159] Fps is (10 sec: 18841.6, 60 sec: 19182.9, 300 sec: 19210.2). Total num frames: 3842048. Throughput: 0: 4783.5. Samples: 960130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-07-04 18:12:43,526][02159] Avg episode reward: [(0, '22.188')]
-[2024-07-04 18:12:43,788][04781] Updated weights for policy 0, policy_version 940 (0.0013)
-[2024-07-04 18:12:45,848][04781] Updated weights for policy 0, policy_version 950 (0.0012)
-[2024-07-04 18:12:47,986][04781] Updated weights for policy 0, policy_version 960 (0.0012)
-[2024-07-04 18:12:48,524][02159] Fps is (10 sec: 18855.5, 60 sec: 19251.2, 300 sec: 19221.2). Total num frames: 3940352. Throughput: 0: 4797.9. Samples: 974678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-07-04 18:12:48,526][02159] Avg episode reward: [(0, '19.636')]
-[2024-07-04 18:12:50,105][04781] Updated weights for policy 0, policy_version 970 (0.0013)
-[2024-07-04 18:12:51,811][04768] Stopping Batcher_0...
-[2024-07-04 18:12:51,811][02159] Component Batcher_0 stopped!
-[2024-07-04 18:12:51,811][04768] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-07-04 18:12:51,813][04768] Loop batcher_evt_loop terminating...
-[2024-07-04 18:12:51,829][04785] Stopping RolloutWorker_w2...
-[2024-07-04 18:12:51,829][04785] Loop rollout_proc2_evt_loop terminating...
-[2024-07-04 18:12:51,830][04789] Stopping RolloutWorker_w7...
-[2024-07-04 18:12:51,830][04786] Stopping RolloutWorker_w3...
-[2024-07-04 18:12:51,830][04789] Loop rollout_proc7_evt_loop terminating...
-[2024-07-04 18:12:51,830][04782] Stopping RolloutWorker_w0...
-[2024-07-04 18:12:51,830][04786] Loop rollout_proc3_evt_loop terminating...
-[2024-07-04 18:12:51,831][04787] Stopping RolloutWorker_w5...
-[2024-07-04 18:12:51,829][02159] Component RolloutWorker_w2 stopped!
-[2024-07-04 18:12:51,831][04787] Loop rollout_proc5_evt_loop terminating...
-[2024-07-04 18:12:51,832][04784] Stopping RolloutWorker_w4...
-[2024-07-04 18:12:51,831][04782] Loop rollout_proc0_evt_loop terminating...
-[2024-07-04 18:12:51,832][04784] Loop rollout_proc4_evt_loop terminating...
-[2024-07-04 18:12:51,832][04781] Weights refcount: 2 0
-[2024-07-04 18:12:51,832][04783] Stopping RolloutWorker_w1...
-[2024-07-04 18:12:51,833][04788] Stopping RolloutWorker_w6...
-[2024-07-04 18:12:51,833][04783] Loop rollout_proc1_evt_loop terminating...
-[2024-07-04 18:12:51,833][04788] Loop rollout_proc6_evt_loop terminating...
-[2024-07-04 18:12:51,832][02159] Component RolloutWorker_w7 stopped!
-[2024-07-04 18:12:51,834][04781] Stopping InferenceWorker_p0-w0...
-[2024-07-04 18:12:51,834][04781] Loop inference_proc0-0_evt_loop terminating...
-[2024-07-04 18:12:51,834][02159] Component RolloutWorker_w3 stopped!
-[2024-07-04 18:12:51,835][02159] Component RolloutWorker_w0 stopped!
-[2024-07-04 18:12:51,837][02159] Component RolloutWorker_w5 stopped!
-[2024-07-04 18:12:51,839][02159] Component RolloutWorker_w4 stopped!
-[2024-07-04 18:12:51,840][02159] Component RolloutWorker_w1 stopped!
-[2024-07-04 18:12:51,842][02159] Component RolloutWorker_w6 stopped!
-[2024-07-04 18:12:51,844][02159] Component InferenceWorker_p0-w0 stopped!
-[2024-07-04 18:12:51,891][04768] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-07-04 18:12:52,016][04768] Stopping LearnerWorker_p0...
-[2024-07-04 18:12:52,017][04768] Loop learner_proc0_evt_loop terminating...
-[2024-07-04 18:12:52,017][02159] Component LearnerWorker_p0 stopped!
-[2024-07-04 18:12:52,019][02159] Waiting for process learner_proc0 to stop...
-[2024-07-04 18:12:52,805][02159] Waiting for process inference_proc0-0 to join...
-[2024-07-04 18:12:52,807][02159] Waiting for process rollout_proc0 to join...
-[2024-07-04 18:12:52,809][02159] Waiting for process rollout_proc1 to join...
-[2024-07-04 18:12:52,811][02159] Waiting for process rollout_proc2 to join...
-[2024-07-04 18:12:52,813][02159] Waiting for process rollout_proc3 to join...
-[2024-07-04 18:12:52,815][02159] Waiting for process rollout_proc4 to join...
-[2024-07-04 18:12:52,817][02159] Waiting for process rollout_proc5 to join...
-[2024-07-04 18:12:52,818][02159] Waiting for process rollout_proc6 to join...
-[2024-07-04 18:12:52,820][02159] Waiting for process rollout_proc7 to join...
-[2024-07-04 18:12:52,822][02159] Batcher 0 profile tree view:
-batching: 16.0773, releasing_batches: 0.0238
-[2024-07-04 18:12:52,823][02159] InferenceWorker_p0-w0 profile tree view:
-wait_policy: 0.0001
-  wait_policy_total: 3.8893
-update_model: 3.4935
-  weight_update: 0.0013
-one_step: 0.0030
-  handle_policy_step: 194.0125
-    deserialize: 7.8810, stack: 1.2906, obs_to_device_normalize: 45.2995, forward: 95.6950, send_messages: 13.1087
-    prepare_outputs: 22.0335
-      to_cpu: 13.1896
-[2024-07-04 18:12:52,824][02159] Learner 0 profile tree view:
-misc: 0.0049, prepare_batch: 6.6491
-train: 18.5372
-  epoch_init: 0.0056, minibatch_init: 0.0063, losses_postprocess: 0.4912, kl_divergence: 0.3722, after_optimizer: 2.0643
-  calculate_losses: 8.6103
-    losses_init: 0.0036, forward_head: 0.6840, bptt_initial: 4.5650, tail: 0.6411, advantages_returns: 0.1587, losses: 1.1995
-    bptt: 1.1848
-      bptt_forward_core: 1.1282
-  update: 6.6508
-    clip: 0.7339
-[2024-07-04 18:12:52,826][02159] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.1520, enqueue_policy_requests: 7.1947, env_step: 135.8228, overhead: 6.3251, complete_rollouts: 0.2324
-save_policy_outputs: 8.8245
-  split_output_tensors: 3.5413
-[2024-07-04 18:12:52,828][02159] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.1523, enqueue_policy_requests: 7.1059, env_step: 135.8799, overhead: 6.2706, complete_rollouts: 0.2327
-save_policy_outputs: 8.8091
-  split_output_tensors: 3.5261
-[2024-07-04 18:12:52,829][02159] Loop Runner_EvtLoop terminating...
-[2024-07-04 18:12:52,831][02159] Runner profile tree view:
-main_loop: 220.1254
-[2024-07-04 18:12:52,832][02159] Collected {0: 4005888}, FPS: 18198.2
-[2024-07-04 18:15:16,827][02159] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-07-04 18:15:16,829][02159] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-07-04 18:15:16,830][02159] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-07-04 18:15:16,831][02159] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-07-04 18:15:16,832][02159] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-07-04 18:15:16,835][02159] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-07-04 18:15:16,836][02159] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2024-07-04 18:15:16,837][02159] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-07-04 18:15:16,838][02159] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2024-07-04 18:15:16,839][02159] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2024-07-04 18:15:16,841][02159] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-07-04 18:15:16,842][02159] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-07-04 18:15:16,843][02159] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-07-04 18:15:16,845][02159] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-07-04 18:15:16,845][02159] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-07-04 18:15:16,874][02159] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-07-04 18:15:16,877][02159] RunningMeanStd input shape: (3, 72, 128)
-[2024-07-04 18:15:16,880][02159] RunningMeanStd input shape: (1,)
-[2024-07-04 18:15:16,895][02159] ConvEncoder: input_channels=3
-[2024-07-04 18:15:17,010][02159] Conv encoder output size: 512
-[2024-07-04 18:15:17,013][02159] Policy head output size: 512
-[2024-07-04 18:15:17,168][02159] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-07-04 18:15:17,926][02159] Num frames 100...
-[2024-07-04 18:15:18,058][02159] Num frames 200...
-[2024-07-04 18:15:18,211][02159] Num frames 300...
-[2024-07-04 18:15:18,341][02159] Num frames 400...
-[2024-07-04 18:15:18,418][02159] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
-[2024-07-04 18:15:18,420][02159] Avg episode reward: 5.160, avg true_objective: 4.160
-[2024-07-04 18:15:18,528][02159] Num frames 500...
-[2024-07-04 18:15:18,655][02159] Num frames 600...
-[2024-07-04 18:15:18,779][02159] Num frames 700...
-[2024-07-04 18:15:18,907][02159] Num frames 800...
-[2024-07-04 18:15:19,032][02159] Num frames 900...
-[2024-07-04 18:15:19,161][02159] Avg episode rewards: #0: 7.300, true rewards: #0: 4.800
-[2024-07-04 18:15:19,162][02159] Avg episode reward: 7.300, avg true_objective: 4.800
-[2024-07-04 18:15:19,216][02159] Num frames 1000...
-[2024-07-04 18:15:19,342][02159] Num frames 1100...
-[2024-07-04 18:15:19,470][02159] Num frames 1200...
-[2024-07-04 18:15:19,599][02159] Num frames 1300...
-[2024-07-04 18:15:19,728][02159] Num frames 1400...
-[2024-07-04 18:15:19,857][02159] Num frames 1500...
-[2024-07-04 18:15:19,984][02159] Num frames 1600...
-[2024-07-04 18:15:20,112][02159] Num frames 1700...
-[2024-07-04 18:15:20,238][02159] Num frames 1800...
-[2024-07-04 18:15:20,365][02159] Num frames 1900...
-[2024-07-04 18:15:20,493][02159] Num frames 2000...
-[2024-07-04 18:15:20,622][02159] Num frames 2100...
-[2024-07-04 18:15:20,751][02159] Num frames 2200...
-[2024-07-04 18:15:20,879][02159] Num frames 2300...
-[2024-07-04 18:15:20,939][02159] Avg episode rewards: #0: 14.347, true rewards: #0: 7.680
-[2024-07-04 18:15:20,940][02159] Avg episode reward: 14.347, avg true_objective: 7.680
-[2024-07-04 18:15:21,061][02159] Num frames 2400...
-[2024-07-04 18:15:21,188][02159] Num frames 2500...
-[2024-07-04 18:15:21,316][02159] Num frames 2600...
-[2024-07-04 18:15:21,442][02159] Num frames 2700...
-[2024-07-04 18:15:21,571][02159] Num frames 2800...
-[2024-07-04 18:15:21,701][02159] Num frames 2900...
-[2024-07-04 18:15:21,828][02159] Num frames 3000...
-[2024-07-04 18:15:21,954][02159] Num frames 3100...
-[2024-07-04 18:15:22,089][02159] Num frames 3200...
-[2024-07-04 18:15:22,189][02159] Avg episode rewards: #0: 14.830, true rewards: #0: 8.080
-[2024-07-04 18:15:22,190][02159] Avg episode reward: 14.830, avg true_objective: 8.080
-[2024-07-04 18:15:22,281][02159] Num frames 3300...
-[2024-07-04 18:15:22,410][02159] Num frames 3400...
-[2024-07-04 18:15:22,538][02159] Num frames 3500...
-[2024-07-04 18:15:22,666][02159] Num frames 3600...
-[2024-07-04 18:15:22,795][02159] Num frames 3700...
-[2024-07-04 18:15:22,927][02159] Num frames 3800...
-[2024-07-04 18:15:23,057][02159] Num frames 3900...
-[2024-07-04 18:15:23,190][02159] Num frames 4000...
-[2024-07-04 18:15:23,313][02159] Avg episode rewards: #0: 15.902, true rewards: #0: 8.102
-[2024-07-04 18:15:23,314][02159] Avg episode reward: 15.902, avg true_objective: 8.102
-[2024-07-04 18:15:23,385][02159] Num frames 4100...
-[2024-07-04 18:15:23,521][02159] Num frames 4200...
-[2024-07-04 18:15:23,657][02159] Num frames 4300...
-[2024-07-04 18:15:23,791][02159] Num frames 4400...
-[2024-07-04 18:15:23,927][02159] Num frames 4500...
-[2024-07-04 18:15:24,061][02159] Num frames 4600...
-[2024-07-04 18:15:24,195][02159] Num frames 4700...
-[2024-07-04 18:15:24,328][02159] Num frames 4800...
-[2024-07-04 18:15:24,461][02159] Num frames 4900...
-[2024-07-04 18:15:24,595][02159] Num frames 5000...
-[2024-07-04 18:15:24,783][02159] Avg episode rewards: #0: 16.792, true rewards: #0: 8.458
-[2024-07-04 18:15:24,785][02159] Avg episode reward: 16.792, avg true_objective: 8.458
-[2024-07-04 18:15:24,823][02159] Num frames 5100...
-[2024-07-04 18:15:24,948][02159] Num frames 5200...
-[2024-07-04 18:15:25,074][02159] Num frames 5300...
-[2024-07-04 18:15:25,200][02159] Num frames 5400...
-[2024-07-04 18:15:25,325][02159] Num frames 5500...
-[2024-07-04 18:15:25,455][02159] Num frames 5600...
-[2024-07-04 18:15:25,583][02159] Num frames 5700...
-[2024-07-04 18:15:25,710][02159] Num frames 5800...
-[2024-07-04 18:15:25,860][02159] Avg episode rewards: #0: 16.393, true rewards: #0: 8.393
-[2024-07-04 18:15:25,862][02159] Avg episode reward: 16.393, avg true_objective: 8.393
-[2024-07-04 18:15:25,896][02159] Num frames 5900...
-[2024-07-04 18:15:26,024][02159] Num frames 6000...
-[2024-07-04 18:15:26,152][02159] Num frames 6100...
-[2024-07-04 18:15:26,281][02159] Num frames 6200...
-[2024-07-04 18:15:26,408][02159] Num frames 6300...
-[2024-07-04 18:15:26,533][02159] Num frames 6400...
-[2024-07-04 18:15:26,659][02159] Num frames 6500...
-[2024-07-04 18:15:26,786][02159] Num frames 6600...
-[2024-07-04 18:15:26,879][02159] Avg episode rewards: #0: 16.161, true rewards: #0: 8.286
-[2024-07-04 18:15:26,880][02159] Avg episode reward: 16.161, avg true_objective: 8.286
-[2024-07-04 18:15:26,969][02159] Num frames 6700...
-[2024-07-04 18:15:27,096][02159] Num frames 6800...
-[2024-07-04 18:15:27,227][02159] Num frames 6900...
-[2024-07-04 18:15:27,355][02159] Num frames 7000...
-[2024-07-04 18:15:27,480][02159] Num frames 7100...
-[2024-07-04 18:15:27,607][02159] Num frames 7200...
-[2024-07-04 18:15:27,732][02159] Num frames 7300...
-[2024-07-04 18:15:27,790][02159] Avg episode rewards: #0: 15.779, true rewards: #0: 8.112
-[2024-07-04 18:15:27,791][02159] Avg episode reward: 15.779, avg true_objective: 8.112
-[2024-07-04 18:15:27,916][02159] Num frames 7400...
-[2024-07-04 18:15:28,044][02159] Num frames 7500...
-[2024-07-04 18:15:28,168][02159] Num frames 7600...
-[2024-07-04 18:15:28,298][02159] Num frames 7700...
-[2024-07-04 18:15:28,427][02159] Num frames 7800...
-[2024-07-04 18:15:28,559][02159] Num frames 7900...
-[2024-07-04 18:15:28,686][02159] Num frames 8000...
-[2024-07-04 18:15:28,816][02159] Num frames 8100...
-[2024-07-04 18:15:28,946][02159] Num frames 8200...
-[2024-07-04 18:15:29,073][02159] Num frames 8300...
-[2024-07-04 18:15:29,201][02159] Avg episode rewards: #0: 16.757, true rewards: #0: 8.357
-[2024-07-04 18:15:29,202][02159] Avg episode reward: 16.757, avg true_objective: 8.357
-[2024-07-04 18:15:49,147][02159] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2024-07-04 18:23:54,057][02159] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-07-04 18:23:54,059][02159] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-07-04 18:23:54,059][02159] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-07-04 18:23:54,061][02159] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-07-04 18:23:54,062][02159] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-07-04 18:23:54,064][02159] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-07-04 18:23:54,066][02159] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2024-07-04 18:23:54,067][02159] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-07-04 18:23:54,069][02159] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2024-07-04 18:23:54,070][02159] Adding new argument 'hf_repository'='Hamze-Hammami/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2024-07-04 18:23:54,071][02159] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-07-04 18:23:54,073][02159] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-07-04 18:23:54,075][02159] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-07-04 18:23:54,076][02159] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-07-04 18:23:54,078][02159] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-07-04 18:23:54,103][02159] RunningMeanStd input shape: (3, 72, 128)
-[2024-07-04 18:23:54,105][02159] RunningMeanStd input shape: (1,)
-[2024-07-04 18:23:54,117][02159] ConvEncoder: input_channels=3
-[2024-07-04 18:23:54,156][02159] Conv encoder output size: 512
-[2024-07-04 18:23:54,157][02159] Policy head output size: 512
-[2024-07-04 18:23:54,176][02159] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-07-04 18:23:54,591][02159] Num frames 100...
-[2024-07-04 18:23:54,722][02159] Num frames 200...
-[2024-07-04 18:23:54,849][02159] Num frames 300...
-[2024-07-04 18:23:55,017][02159] Avg episode rewards: #0: 8.900, true rewards: #0: 3.900
-[2024-07-04 18:23:55,018][02159] Avg episode reward: 8.900, avg true_objective: 3.900
-[2024-07-04 18:23:55,033][02159] Num frames 400...
-[2024-07-04 18:23:55,163][02159] Num frames 500...
-[2024-07-04 18:23:55,291][02159] Num frames 600...
-[2024-07-04 18:23:55,420][02159] Num frames 700...
-[2024-07-04 18:23:55,556][02159] Num frames 800...
-[2024-07-04 18:23:55,619][02159] Avg episode rewards: #0: 10.530, true rewards: #0: 4.030
-[2024-07-04 18:23:55,621][02159] Avg episode reward: 10.530, avg true_objective: 4.030
-[2024-07-04 18:23:55,742][02159] Num frames 900...
-[2024-07-04 18:23:55,873][02159] Num frames 1000...
-[2024-07-04 18:23:56,002][02159] Num frames 1100...
-[2024-07-04 18:23:56,175][02159] Avg episode rewards: #0: 9.300, true rewards: #0: 3.967
-[2024-07-04 18:23:56,177][02159] Avg episode reward: 9.300, avg true_objective: 3.967
-[2024-07-04 18:23:56,193][02159] Num frames 1200...
-[2024-07-04 18:23:56,328][02159] Num frames 1300...
-[2024-07-04 18:23:56,463][02159] Num frames 1400...
-[2024-07-04 18:23:56,596][02159] Num frames 1500...
-[2024-07-04 18:23:56,711][02159] Avg episode rewards: #0: 8.868, true rewards: #0: 3.867
-[2024-07-04 18:23:56,712][02159] Avg episode reward: 8.868, avg true_objective: 3.867
-[2024-07-04 18:23:56,787][02159] Num frames 1600...
-[2024-07-04 18:23:56,924][02159] Num frames 1700...
-[2024-07-04 18:23:57,059][02159] Num frames 1800...
-[2024-07-04 18:23:57,194][02159] Num frames 1900...
-[2024-07-04 18:23:57,329][02159] Num frames 2000...
-[2024-07-04 18:23:57,458][02159] Avg episode rewards: #0: 8.910, true rewards: #0: 4.110
-[2024-07-04 18:23:57,460][02159] Avg episode reward: 8.910, avg true_objective: 4.110
-[2024-07-04 18:23:57,521][02159] Num frames 2100...
-[2024-07-04 18:23:57,657][02159] Num frames 2200...
-[2024-07-04 18:23:57,793][02159] Num frames 2300...
-[2024-07-04 18:23:57,923][02159] Num frames 2400...
-[2024-07-04 18:23:58,052][02159] Num frames 2500...
-[2024-07-04 18:23:58,178][02159] Num frames 2600...
-[2024-07-04 18:23:58,305][02159] Num frames 2700...
-[2024-07-04 18:23:58,424][02159] Avg episode rewards: #0: 9.418, true rewards: #0: 4.585
-[2024-07-04 18:23:58,426][02159] Avg episode reward: 9.418, avg true_objective: 4.585
-[2024-07-04 18:23:58,489][02159] Num frames 2800...
-[2024-07-04 18:23:58,616][02159] Num frames 2900...
-[2024-07-04 18:23:58,743][02159] Num frames 3000...
-[2024-07-04 18:23:58,879][02159] Num frames 3100...
-[2024-07-04 18:23:59,008][02159] Num frames 3200...
-[2024-07-04 18:23:59,139][02159] Num frames 3300...
-[2024-07-04 18:23:59,266][02159] Num frames 3400...
-[2024-07-04 18:23:59,393][02159] Num frames 3500...
-[2024-07-04 18:23:59,452][02159] Avg episode rewards: #0: 10.576, true rewards: #0: 5.004
-[2024-07-04 18:23:59,454][02159] Avg episode reward: 10.576, avg true_objective: 5.004
-[2024-07-04 18:23:59,580][02159] Num frames 3600...
-[2024-07-04 18:23:59,706][02159] Num frames 3700...
-[2024-07-04 18:23:59,834][02159] Num frames 3800...
-[2024-07-04 18:23:59,966][02159] Avg episode rewards: #0: 9.825, true rewards: #0: 4.825
-[2024-07-04 18:23:59,968][02159] Avg episode reward: 9.825, avg true_objective: 4.825
-[2024-07-04 18:24:00,022][02159] Num frames 3900...
-[2024-07-04 18:24:00,147][02159] Num frames 4000...
-[2024-07-04 18:24:00,274][02159] Num frames 4100...
-[2024-07-04 18:24:00,404][02159] Num frames 4200...
-[2024-07-04 18:24:00,533][02159] Num frames 4300...
-[2024-07-04 18:24:00,659][02159] Num frames 4400...
-[2024-07-04 18:24:00,786][02159] Num frames 4500...
-[2024-07-04 18:24:00,912][02159] Num frames 4600...
-[2024-07-04 18:24:01,041][02159] Num frames 4700...
-[2024-07-04 18:24:01,168][02159] Num frames 4800...
-[2024-07-04 18:24:01,296][02159] Num frames 4900...
-[2024-07-04 18:24:01,425][02159] Num frames 5000...
-[2024-07-04 18:24:01,550][02159] Num frames 5100...
-[2024-07-04 18:24:01,679][02159] Num frames 5200...
-[2024-07-04 18:24:01,806][02159] Num frames 5300...
-[2024-07-04 18:24:01,935][02159] Num frames 5400...
-[2024-07-04 18:24:02,005][02159] Avg episode rewards: #0: 13.123, true rewards: #0: 6.012
-[2024-07-04 18:24:02,007][02159] Avg episode reward: 13.123, avg true_objective: 6.012
-[2024-07-04 18:24:02,124][02159] Num frames 5500...
-[2024-07-04 18:24:02,255][02159] Num frames 5600...
-[2024-07-04 18:24:02,385][02159] Num frames 5700...
-[2024-07-04 18:24:02,515][02159] Num frames 5800...
-[2024-07-04 18:24:02,641][02159] Num frames 5900...
-[2024-07-04 18:24:02,768][02159] Num frames 6000...
-[2024-07-04 18:24:02,898][02159] Num frames 6100...
-[2024-07-04 18:24:03,030][02159] Num frames 6200...
-[2024-07-04 18:24:03,158][02159] Num frames 6300...
-[2024-07-04 18:24:03,287][02159] Num frames 6400...
-[2024-07-04 18:24:03,413][02159] Num frames 6500...
-[2024-07-04 18:24:03,544][02159] Num frames 6600...
-[2024-07-04 18:24:03,672][02159] Num frames 6700...
-[2024-07-04 18:24:03,799][02159] Num frames 6800...
-[2024-07-04 18:24:03,927][02159] Num frames 6900...
-[2024-07-04 18:24:04,056][02159] Num frames 7000...
-[2024-07-04 18:24:04,183][02159] Num frames 7100...
-[2024-07-04 18:24:04,316][02159] Num frames 7200...
-[2024-07-04 18:24:04,448][02159] Num frames 7300...
-[2024-07-04 18:24:04,587][02159] Avg episode rewards: #0: 16.463, true rewards: #0: 7.363
-[2024-07-04 18:24:04,588][02159] Avg episode reward: 16.463, avg true_objective: 7.363
-[2024-07-04 18:24:22,135][02159] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-07-04 19:21:14,976][04985] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-07-04 19:21:15,157][04967] Using optimizer
+[2024-07-04 19:21:16,143][04967] No checkpoints found
+[2024-07-04 19:21:16,143][04967] Did not load from checkpoint, starting from scratch!
+[2024-07-04 19:21:16,143][04967] Initialized policy 0 weights for model version 0
+[2024-07-04 19:21:16,145][04967] LearnerWorker_p0 finished initialization!
+[2024-07-04 19:21:16,145][04967] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-04 19:21:16,230][04980] RunningMeanStd input shape: (3, 72, 128)
+[2024-07-04 19:21:16,231][04980] RunningMeanStd input shape: (1,)
+[2024-07-04 19:21:16,244][04980] ConvEncoder: input_channels=3
+[2024-07-04 19:21:16,352][04980] Conv encoder output size: 512
+[2024-07-04 19:21:16,353][04980] Policy head output size: 512
+[2024-07-04 19:21:16,405][02883] Inference worker 0-0 is ready!
+[2024-07-04 19:21:16,406][02883] All inference workers are ready! Signal rollout workers to start!
+[2024-07-04 19:21:16,440][04983] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:21:16,440][04981] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:21:16,440][04982] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:21:16,440][04988] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:21:16,461][04987] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:21:16,461][04985] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:21:16,461][04984] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:21:16,461][04986] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:21:16,499][04982] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
+[2024-07-04 19:21:16,500][04982] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
+    self.game.init()
+vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
+    slot_callable(*args)
+  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
+    env_runner.init(self.timing)
+  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
+    self._reset()
+  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
+    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
+  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 467, in reset
+    return self.env.reset(seed=seed, options=options)
+  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 515, in reset
+    obs, info = self.env.reset(seed=seed, options=options)
+  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 82, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 467, in reset
+    return self.env.reset(seed=seed, options=options)
+  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
+    self._ensure_initialized()
+  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
+    self.initialize()
+  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
+    self._game_init()
+  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
+    raise EnvCriticalError()
+sample_factory.envs.env_utils.EnvCriticalError
+[2024-07-04 19:21:16,501][04982] Unhandled exception in evt loop rollout_proc1_evt_loop
+[2024-07-04 19:21:16,754][04981] Decorrelating experience for 0 frames...
+[2024-07-04 19:21:16,754][04983] Decorrelating experience for 0 frames...
+[2024-07-04 19:21:16,754][04984] Decorrelating experience for 0 frames...
+[2024-07-04 19:21:16,754][04988] Decorrelating experience for 0 frames...
+[2024-07-04 19:21:16,997][04981] Decorrelating experience for 32 frames...
+[2024-07-04 19:21:17,012][04986] Decorrelating experience for 0 frames...
+[2024-07-04 19:21:17,019][04984] Decorrelating experience for 32 frames...
+[2024-07-04 19:21:17,058][04988] Decorrelating experience for 32 frames...
+[2024-07-04 19:21:17,088][04987] Decorrelating experience for 0 frames...
+[2024-07-04 19:21:17,135][04983] Decorrelating experience for 32 frames...
+[2024-07-04 19:21:17,150][04985] Decorrelating experience for 0 frames...
+[2024-07-04 19:21:17,301][04986] Decorrelating experience for 32 frames...
+[2024-07-04 19:21:17,354][04981] Decorrelating experience for 64 frames...
+[2024-07-04 19:21:17,380][04988] Decorrelating experience for 64 frames...
+[2024-07-04 19:21:17,387][04984] Decorrelating experience for 64 frames...
+[2024-07-04 19:21:17,399][04987] Decorrelating experience for 32 frames...
+[2024-07-04 19:21:17,632][04983] Decorrelating experience for 64 frames...
+[2024-07-04 19:21:17,644][04986] Decorrelating experience for 64 frames...
+[2024-07-04 19:21:17,645][04981] Decorrelating experience for 96 frames...
+[2024-07-04 19:21:17,690][04984] Decorrelating experience for 96 frames...
+[2024-07-04 19:21:17,753][04987] Decorrelating experience for 64 frames...
+[2024-07-04 19:21:17,910][04985] Decorrelating experience for 32 frames...
+[2024-07-04 19:21:17,928][04988] Decorrelating experience for 96 frames...
+[2024-07-04 19:21:17,946][04986] Decorrelating experience for 96 frames...
+[2024-07-04 19:21:18,019][04983] Decorrelating experience for 96 frames...
+[2024-07-04 19:21:18,239][04985] Decorrelating experience for 64 frames...
+[2024-07-04 19:21:18,294][04987] Decorrelating experience for 96 frames...
+[2024-07-04 19:21:18,534][04985] Decorrelating experience for 96 frames...
+[2024-07-04 19:21:19,406][04967] Signal inference workers to stop experience collection...
+[2024-07-04 19:21:19,412][04980] InferenceWorker_p0-w0: stopping experience collection
+[2024-07-04 19:21:20,543][02883] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 2976. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-07-04 19:21:20,544][02883] Avg episode reward: [(0, '3.120')]
+[2024-07-04 19:21:21,202][04967] Signal inference workers to resume experience collection...
+[2024-07-04 19:21:21,202][04980] InferenceWorker_p0-w0: resuming experience collection
+[2024-07-04 19:21:23,370][04980] Updated weights for policy 0, policy_version 10 (0.0193)
+[2024-07-04 19:21:25,543][02883] Fps is (10 sec: 15564.5, 60 sec: 15564.5, 300 sec: 15564.5). Total num frames: 77824. Throughput: 0: 3312.3. Samples: 19538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:21:25,546][02883] Avg episode reward: [(0, '4.339')]
+[2024-07-04 19:21:25,768][04980] Updated weights for policy 0, policy_version 20 (0.0013)
+[2024-07-04 19:21:28,207][04980] Updated weights for policy 0, policy_version 30 (0.0013)
+[2024-07-04 19:21:29,982][02883] Heartbeat connected on Batcher_0
+[2024-07-04 19:21:29,985][02883] Heartbeat connected on LearnerWorker_p0
+[2024-07-04 19:21:29,993][02883] Heartbeat connected on InferenceWorker_p0-w0
+[2024-07-04 19:21:29,998][02883] Heartbeat connected on RolloutWorker_w0
+[2024-07-04 19:21:30,007][02883] Heartbeat connected on RolloutWorker_w2
+[2024-07-04 19:21:30,010][02883] Heartbeat connected on RolloutWorker_w3
+[2024-07-04 19:21:30,014][02883] Heartbeat connected on RolloutWorker_w4
+[2024-07-04 19:21:30,018][02883] Heartbeat connected on RolloutWorker_w5
+[2024-07-04 19:21:30,021][02883] Heartbeat connected on RolloutWorker_w6
+[2024-07-04 19:21:30,024][02883] Heartbeat connected on RolloutWorker_w7
+[2024-07-04 19:21:30,543][02883] Fps is (10 sec: 15974.2, 60 sec: 15974.2, 300 sec: 15974.2). Total num frames: 159744. Throughput: 0: 2915.0. Samples: 32126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:21:30,546][04980] Updated weights for policy 0, policy_version 40 (0.0012)
+[2024-07-04 19:21:30,545][02883] Avg episode reward: [(0, '4.317')]
+[2024-07-04 19:21:30,548][04967] Saving new best policy, reward=4.317!
+[2024-07-04 19:21:32,876][04980] Updated weights for policy 0, policy_version 50 (0.0012)
+[2024-07-04 19:21:35,218][04980] Updated weights for policy 0, policy_version 60 (0.0012)
+[2024-07-04 19:21:35,543][02883] Fps is (10 sec: 17203.2, 60 sec: 16656.9, 300 sec: 16656.9). Total num frames: 249856. Throughput: 0: 3703.6. Samples: 58530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:21:35,546][02883] Avg episode reward: [(0, '4.504')]
+[2024-07-04 19:21:35,554][04967] Saving new best policy, reward=4.504!
+[2024-07-04 19:21:37,472][04980] Updated weights for policy 0, policy_version 70 (0.0012)
+[2024-07-04 19:21:39,918][04980] Updated weights for policy 0, policy_version 80 (0.0012)
+[2024-07-04 19:21:40,543][02883] Fps is (10 sec: 17612.9, 60 sec: 16793.6, 300 sec: 16793.6). Total num frames: 335872. Throughput: 0: 4076.2. Samples: 84500. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-07-04 19:21:40,545][02883] Avg episode reward: [(0, '4.423')]
+[2024-07-04 19:21:42,343][04980] Updated weights for policy 0, policy_version 90 (0.0012)
+[2024-07-04 19:21:44,615][04980] Updated weights for policy 0, policy_version 100 (0.0012)
+[2024-07-04 19:21:45,543][02883] Fps is (10 sec: 17203.0, 60 sec: 16875.4, 300 sec: 16875.4). Total num frames: 421888. Throughput: 0: 3784.4. Samples: 97586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:21:45,546][02883] Avg episode reward: [(0, '4.423')]
+[2024-07-04 19:21:46,936][04980] Updated weights for policy 0, policy_version 110 (0.0012)
+[2024-07-04 19:21:49,229][04980] Updated weights for policy 0, policy_version 120 (0.0013)
+[2024-07-04 19:21:50,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17066.6, 300 sec: 17066.6). Total num frames: 512000. Throughput: 0: 4044.1. Samples: 124300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:21:50,545][02883] Avg episode reward: [(0, '4.361')]
+[2024-07-04 19:21:51,526][04980] Updated weights for policy 0, policy_version 130 (0.0012)
+[2024-07-04 19:21:53,949][04980] Updated weights for policy 0, policy_version 140 (0.0013)
+[2024-07-04 19:21:55,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17086.1, 300 sec: 17086.1). Total num frames: 598016. Throughput: 0: 4208.6. Samples: 150276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:21:55,546][02883] Avg episode reward: [(0, '4.633')]
+[2024-07-04 19:21:55,553][04967] Saving new best policy, reward=4.633!
+[2024-07-04 19:21:56,265][04980] Updated weights for policy 0, policy_version 150 (0.0012)
+[2024-07-04 19:21:58,570][04980] Updated weights for policy 0, policy_version 160 (0.0013)
+[2024-07-04 19:22:00,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17203.2, 300 sec: 17203.2). Total num frames: 688128. Throughput: 0: 4016.2. Samples: 163626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:22:00,546][02883] Avg episode reward: [(0, '4.899')]
+[2024-07-04 19:22:00,549][04967] Saving new best policy, reward=4.899!
+[2024-07-04 19:22:00,868][04980] Updated weights for policy 0, policy_version 170 (0.0012)
+[2024-07-04 19:22:03,185][04980] Updated weights for policy 0, policy_version 180 (0.0012)
+[2024-07-04 19:22:05,501][04980] Updated weights for policy 0, policy_version 190 (0.0012)
+[2024-07-04 19:22:05,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17294.2, 300 sec: 17294.2). Total num frames: 778240. Throughput: 0: 4163.7. Samples: 190342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:22:05,545][02883] Avg episode reward: [(0, '5.167')]
+[2024-07-04 19:22:05,554][04967] Saving new best policy, reward=5.167!
+[2024-07-04 19:22:07,898][04980] Updated weights for policy 0, policy_version 200 (0.0012)
+[2024-07-04 19:22:10,226][04980] Updated weights for policy 0, policy_version 210 (0.0012)
+[2024-07-04 19:22:10,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17285.1, 300 sec: 17285.1). Total num frames: 864256. Throughput: 0: 4376.9. Samples: 216498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:22:10,546][02883] Avg episode reward: [(0, '5.121')]
+[2024-07-04 19:22:12,493][04980] Updated weights for policy 0, policy_version 220 (0.0012)
+[2024-07-04 19:22:14,885][04980] Updated weights for policy 0, policy_version 230 (0.0012)
+[2024-07-04 19:22:15,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17352.1, 300 sec: 17352.1). Total num frames: 954368. Throughput: 0: 4386.0. Samples: 229498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:22:15,545][02883] Avg episode reward: [(0, '5.374')]
+[2024-07-04 19:22:15,554][04967] Saving new best policy, reward=5.374!
+[2024-07-04 19:22:17,165][04980] Updated weights for policy 0, policy_version 240 (0.0012)
+[2024-07-04 19:22:19,440][04980] Updated weights for policy 0, policy_version 250 (0.0012)
+[2024-07-04 19:22:20,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17339.7, 300 sec: 17339.7). Total num frames: 1040384. Throughput: 0: 4397.4. Samples: 256414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:22:20,545][02883] Avg episode reward: [(0, '5.983')]
+[2024-07-04 19:22:20,550][04967] Saving new best policy, reward=5.983!
+[2024-07-04 19:22:21,833][04980] Updated weights for policy 0, policy_version 260 (0.0012)
+[2024-07-04 19:22:24,142][04980] Updated weights for policy 0, policy_version 270 (0.0012)
+[2024-07-04 19:22:25,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17544.5, 300 sec: 17392.2). Total num frames: 1130496. Throughput: 0: 4406.1. Samples: 282774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:22:25,546][02883] Avg episode reward: [(0, '6.748')]
+[2024-07-04 19:22:25,553][04967] Saving new best policy, reward=6.748!
+[2024-07-04 19:22:26,412][04980] Updated weights for policy 0, policy_version 280 (0.0012)
+[2024-07-04 19:22:28,721][04980] Updated weights for policy 0, policy_version 290 (0.0012)
+[2024-07-04 19:22:30,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17612.8, 300 sec: 17378.7). Total num frames: 1216512. Throughput: 0: 4414.6. Samples: 296244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:22:30,546][02883] Avg episode reward: [(0, '7.833')]
+[2024-07-04 19:22:30,548][04967] Saving new best policy, reward=7.833!
+[2024-07-04 19:22:31,010][04980] Updated weights for policy 0, policy_version 300 (0.0012)
+[2024-07-04 19:22:33,347][04980] Updated weights for policy 0, policy_version 310 (0.0012)
+[2024-07-04 19:22:35,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17612.8, 300 sec: 17421.6). Total num frames: 1306624. Throughput: 0: 4408.8. Samples: 322698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:22:35,546][02883] Avg episode reward: [(0, '9.941')]
+[2024-07-04 19:22:35,553][04967] Saving new best policy, reward=9.941!
+[2024-07-04 19:22:35,694][04980] Updated weights for policy 0, policy_version 320 (0.0012)
+[2024-07-04 19:22:37,997][04980] Updated weights for policy 0, policy_version 330 (0.0013)
+[2024-07-04 19:22:40,301][04980] Updated weights for policy 0, policy_version 340 (0.0012)
+[2024-07-04 19:22:40,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17681.0, 300 sec: 17459.2). Total num frames: 1396736. Throughput: 0: 4422.3. Samples: 349280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:22:40,546][02883] Avg episode reward: [(0, '8.639')]
+[2024-07-04 19:22:42,612][04980] Updated weights for policy 0, policy_version 350 (0.0012)
+[2024-07-04 19:22:44,897][04980] Updated weights for policy 0, policy_version 360 (0.0012)
+[2024-07-04 19:22:45,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17681.1, 300 sec: 17444.1). Total num frames: 1482752. Throughput: 0: 4420.7. Samples: 362556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:22:45,546][02883] Avg episode reward: [(0, '12.328')]
+[2024-07-04 19:22:45,554][04967] Saving new best policy, reward=12.328!
+[2024-07-04 19:22:47,247][04980] Updated weights for policy 0, policy_version 370 (0.0012)
+[2024-07-04 19:22:49,663][04980] Updated weights for policy 0, policy_version 380 (0.0012)
+[2024-07-04 19:22:50,543][02883] Fps is (10 sec: 17203.1, 60 sec: 17612.8, 300 sec: 17430.7). Total num frames: 1568768. Throughput: 0: 4407.4. Samples: 388674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:22:50,545][02883] Avg episode reward: [(0, '13.001')]
+[2024-07-04 19:22:50,556][04967] Saving new best policy, reward=13.001!
+[2024-07-04 19:22:51,907][04980] Updated weights for policy 0, policy_version 390 (0.0012)
+[2024-07-04 19:22:54,270][04980] Updated weights for policy 0, policy_version 400 (0.0012)
+[2024-07-04 19:22:55,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17681.1, 300 sec: 17461.9). Total num frames: 1658880. Throughput: 0: 4419.1. Samples: 415356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:22:55,546][02883] Avg episode reward: [(0, '10.161')]
+[2024-07-04 19:22:56,568][04980] Updated weights for policy 0, policy_version 410 (0.0012)
+[2024-07-04 19:22:58,836][04980] Updated weights for policy 0, policy_version 420 (0.0012)
+[2024-07-04 19:23:00,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17681.0, 300 sec: 17489.9). Total num frames: 1748992. Throughput: 0: 4427.6. Samples: 428738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:23:00,546][02883] Avg episode reward: [(0, '13.949')]
+[2024-07-04 19:23:00,548][04967] Saving new best policy, reward=13.949!
+[2024-07-04 19:23:01,181][04980] Updated weights for policy 0, policy_version 430 (0.0012)
+[2024-07-04 19:23:03,560][04980] Updated weights for policy 0, policy_version 440 (0.0012)
+[2024-07-04 19:23:05,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17612.8, 300 sec: 17476.2). Total num frames: 1835008. Throughput: 0: 4413.9. Samples: 455038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:23:05,545][02883] Avg episode reward: [(0, '14.033')]
+[2024-07-04 19:23:05,554][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000448_1835008.pth...
+[2024-07-04 19:23:05,626][04967] Saving new best policy, reward=14.033!
+[2024-07-04 19:23:05,810][04980] Updated weights for policy 0, policy_version 450 (0.0012)
+[2024-07-04 19:23:08,064][04980] Updated weights for policy 0, policy_version 460 (0.0012)
+[2024-07-04 19:23:10,338][04980] Updated weights for policy 0, policy_version 470 (0.0012)
+[2024-07-04 19:23:10,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.1, 300 sec: 17501.1). Total num frames: 1925120. Throughput: 0: 4431.3. Samples: 482184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:23:10,545][02883] Avg episode reward: [(0, '18.178')]
+[2024-07-04 19:23:10,557][04967] Saving new best policy, reward=18.178!
+[2024-07-04 19:23:12,595][04980] Updated weights for policy 0, policy_version 480 (0.0012)
+[2024-07-04 19:23:15,019][04980] Updated weights for policy 0, policy_version 490 (0.0012)
+[2024-07-04 19:23:15,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17681.1, 300 sec: 17523.7). Total num frames: 2015232. Throughput: 0: 4433.2. Samples: 495740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:23:15,545][02883] Avg episode reward: [(0, '15.720')]
+[2024-07-04 19:23:17,345][04980] Updated weights for policy 0, policy_version 500 (0.0012)
+[2024-07-04 19:23:19,628][04980] Updated weights for policy 0, policy_version 510 (0.0012)
+[2024-07-04 19:23:20,543][02883] Fps is (10 sec: 18022.6, 60 sec: 17749.4, 300 sec: 17544.5). Total num frames: 2105344. Throughput: 0: 4426.4. Samples: 521886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:23:20,546][02883] Avg episode reward: [(0, '18.289')]
+[2024-07-04 19:23:20,549][04967] Saving new best policy, reward=18.289!
+[2024-07-04 19:23:21,891][04980] Updated weights for policy 0, policy_version 520 (0.0012)
+[2024-07-04 19:23:24,122][04980] Updated weights for policy 0, policy_version 530 (0.0012)
+[2024-07-04 19:23:25,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17749.3, 300 sec: 17563.6). Total num frames: 2195456. Throughput: 0: 4438.9. Samples: 549030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:23:25,545][02883] Avg episode reward: [(0, '17.467')]
+[2024-07-04 19:23:26,421][04980] Updated weights for policy 0, policy_version 540 (0.0012)
+[2024-07-04 19:23:28,732][04980] Updated weights for policy 0, policy_version 550 (0.0012)
+[2024-07-04 19:23:30,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17549.8). Total num frames: 2281472. Throughput: 0: 4437.7. Samples: 562254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:23:30,546][02883] Avg episode reward: [(0, '18.734')]
+[2024-07-04 19:23:30,549][04967] Saving new best policy, reward=18.734!
+[2024-07-04 19:23:31,140][04980] Updated weights for policy 0, policy_version 560 (0.0013)
+[2024-07-04 19:23:33,364][04980] Updated weights for policy 0, policy_version 570 (0.0012)
+[2024-07-04 19:23:35,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17567.3). Total num frames: 2371584. Throughput: 0: 4449.1. Samples: 588884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:23:35,546][02883] Avg episode reward: [(0, '20.791')]
+[2024-07-04 19:23:35,553][04967] Saving new best policy, reward=20.791!
+[2024-07-04 19:23:35,677][04980] Updated weights for policy 0, policy_version 580 (0.0012)
+[2024-07-04 19:23:37,921][04980] Updated weights for policy 0, policy_version 590 (0.0012)
+[2024-07-04 19:23:40,172][04980] Updated weights for policy 0, policy_version 600 (0.0012)
+[2024-07-04 19:23:40,543][02883] Fps is (10 sec: 18022.2, 60 sec: 17749.3, 300 sec: 17583.5). Total num frames: 2461696. Throughput: 0: 4457.8. Samples: 615956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:23:40,546][02883] Avg episode reward: [(0, '17.888')]
+[2024-07-04 19:23:42,601][04980] Updated weights for policy 0, policy_version 610 (0.0012)
+[2024-07-04 19:23:44,888][04980] Updated weights for policy 0, policy_version 620 (0.0013)
+[2024-07-04 19:23:45,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17749.3, 300 sec: 17570.4). Total num frames: 2547712. Throughput: 0: 4445.4. Samples: 628780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:23:45,546][02883] Avg episode reward: [(0, '20.992')]
+[2024-07-04 19:23:45,553][04967] Saving new best policy, reward=20.992!
+[2024-07-04 19:23:47,210][04980] Updated weights for policy 0, policy_version 630 (0.0012)
+[2024-07-04 19:23:49,402][04980] Updated weights for policy 0, policy_version 640 (0.0011)
+[2024-07-04 19:23:50,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17817.6, 300 sec: 17585.5). Total num frames: 2637824. Throughput: 0: 4460.7. Samples: 655768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-07-04 19:23:50,546][02883] Avg episode reward: [(0, '19.676')]
+[2024-07-04 19:23:51,774][04980] Updated weights for policy 0, policy_version 650 (0.0012)
+[2024-07-04 19:23:54,069][04980] Updated weights for policy 0, policy_version 660 (0.0012)
+[2024-07-04 19:23:55,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17817.6, 300 sec: 17599.6). Total num frames: 2727936. Throughput: 0: 4443.1. Samples: 682122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:23:55,545][02883] Avg episode reward: [(0, '21.335')]
+[2024-07-04 19:23:55,554][04967] Saving new best policy, reward=21.335!
+[2024-07-04 19:23:56,470][04980] Updated weights for policy 0, policy_version 670 (0.0013)
+[2024-07-04 19:23:58,852][04980] Updated weights for policy 0, policy_version 680 (0.0012)
+[2024-07-04 19:24:00,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.3, 300 sec: 17587.2). Total num frames: 2813952. Throughput: 0: 4427.7. Samples: 694986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:24:00,545][02883] Avg episode reward: [(0, '23.293')]
+[2024-07-04 19:24:00,548][04967] Saving new best policy, reward=23.293!
+[2024-07-04 19:24:01,081][04980] Updated weights for policy 0, policy_version 690 (0.0012)
+[2024-07-04 19:24:03,420][04980] Updated weights for policy 0, policy_version 700 (0.0012)
+[2024-07-04 19:24:05,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17817.6, 300 sec: 17600.4). Total num frames: 2904064. Throughput: 0: 4446.2. Samples: 721964. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-07-04 19:24:05,546][02883] Avg episode reward: [(0, '24.192')]
+[2024-07-04 19:24:05,554][04967] Saving new best policy, reward=24.192!
+[2024-07-04 19:24:05,657][04980] Updated weights for policy 0, policy_version 710 (0.0012)
+[2024-07-04 19:24:07,991][04980] Updated weights for policy 0, policy_version 720 (0.0012)
+[2024-07-04 19:24:10,333][04980] Updated weights for policy 0, policy_version 730 (0.0013)
+[2024-07-04 19:24:10,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17817.6, 300 sec: 17612.8). Total num frames: 2994176. Throughput: 0: 4430.8. Samples: 748414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-04 19:24:10,546][02883] Avg episode reward: [(0, '22.708')]
+[2024-07-04 19:24:12,623][04980] Updated weights for policy 0, policy_version 740 (0.0013)
+[2024-07-04 19:24:14,986][04980] Updated weights for policy 0, policy_version 750 (0.0012)
+[2024-07-04 19:24:15,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17601.1). Total num frames: 3080192. Throughput: 0: 4431.3. Samples: 761662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:24:15,545][02883] Avg episode reward: [(0, '26.208')]
+[2024-07-04 19:24:15,554][04967] Saving new best policy, reward=26.208!
+[2024-07-04 19:24:17,281][04980] Updated weights for policy 0, policy_version 760 (0.0012)
+[2024-07-04 19:24:19,595][04980] Updated weights for policy 0, policy_version 770 (0.0012)
+[2024-07-04 19:24:20,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17749.3, 300 sec: 17612.8). Total num frames: 3170304. Throughput: 0: 4431.1. Samples: 788282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:24:20,546][02883] Avg episode reward: [(0, '25.238')]
+[2024-07-04 19:24:21,860][04980] Updated weights for policy 0, policy_version 780 (0.0012)
+[2024-07-04 19:24:24,268][04980] Updated weights for policy 0, policy_version 790 (0.0012)
+[2024-07-04 19:24:25,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17681.0, 300 sec: 17601.7). Total num frames: 3256320. Throughput: 0: 4414.2. Samples: 814594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:24:25,546][02883] Avg episode reward: [(0, '23.316')]
+[2024-07-04 19:24:26,561][04980] Updated weights for policy 0, policy_version 800 (0.0012)
+[2024-07-04 19:24:28,772][04980] Updated weights for policy 0, policy_version 810 (0.0012)
+[2024-07-04 19:24:30,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17749.3, 300 sec: 17612.8). Total num frames: 3346432. Throughput: 0: 4431.6. Samples: 828204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-07-04 19:24:30,545][02883] Avg episode reward: [(0, '23.710')]
+[2024-07-04 19:24:31,088][04980] Updated weights for policy 0, policy_version 820 (0.0012)
+[2024-07-04 19:24:33,319][04980] Updated weights for policy 0, policy_version 830 (0.0012)
+[2024-07-04 19:24:35,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17749.3, 300 sec: 17623.3). Total num frames: 3436544. Throughput: 0: 4434.7. Samples: 855330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:24:35,546][02883] Avg episode reward: [(0, '28.501')]
+[2024-07-04 19:24:35,554][04967] Saving new best policy, reward=28.501!
+[2024-07-04 19:24:35,658][04980] Updated weights for policy 0, policy_version 840 (0.0013)
+[2024-07-04 19:24:38,023][04980] Updated weights for policy 0, policy_version 850 (0.0013)
+[2024-07-04 19:24:40,328][04980] Updated weights for policy 0, policy_version 860 (0.0012)
+[2024-07-04 19:24:40,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17681.1, 300 sec: 17612.8). Total num frames: 3522560. Throughput: 0: 4433.3. Samples: 881622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:24:40,546][02883] Avg episode reward: [(0, '26.818')]
+[2024-07-04 19:24:42,593][04980] Updated weights for policy 0, policy_version 870 (0.0012)
+[2024-07-04 19:24:44,876][04980] Updated weights for policy 0, policy_version 880 (0.0012)
+[2024-07-04 19:24:45,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17749.4, 300 sec: 17622.8). Total num frames: 3612672. Throughput: 0: 4446.8. Samples: 895090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:24:45,546][02883] Avg episode reward: [(0, '25.579')]
+[2024-07-04 19:24:47,164][04980] Updated weights for policy 0, policy_version 890 (0.0012)
+[2024-07-04 19:24:49,449][04980] Updated weights for policy 0, policy_version 900 (0.0012)
+[2024-07-04 19:24:50,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17749.3, 300 sec: 17632.3). Total num frames: 3702784. Throughput: 0: 4444.0. Samples: 921942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:24:50,545][02883] Avg episode reward: [(0, '24.590')]
+[2024-07-04 19:24:51,854][04980] Updated weights for policy 0, policy_version 910 (0.0012)
+[2024-07-04 19:24:54,155][04980] Updated weights for policy 0, policy_version 920 (0.0012)
+[2024-07-04 19:24:55,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17749.4, 300 sec: 17641.4). Total num frames: 3792896. Throughput: 0: 4441.7. Samples: 948292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:24:55,546][02883] Avg episode reward: [(0, '25.844')]
+[2024-07-04 19:24:56,454][04980] Updated weights for policy 0, policy_version 930 (0.0011)
+[2024-07-04 19:24:58,708][04980] Updated weights for policy 0, policy_version 940 (0.0012)
+[2024-07-04 19:25:00,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.3, 300 sec: 17631.4). Total num frames: 3878912. Throughput: 0: 4445.8. Samples: 961724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:25:00,546][02883] Avg episode reward: [(0, '25.994')]
+[2024-07-04 19:25:01,034][04980] Updated weights for policy 0, policy_version 950 (0.0012)
+[2024-07-04 19:25:03,355][04980] Updated weights for policy 0, policy_version 960 (0.0012)
+[2024-07-04 19:25:05,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17749.3, 300 sec: 17640.1). Total num frames: 3969024. Throughput: 0: 4438.2. Samples: 988000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:25:05,546][02883] Avg episode reward: [(0, '24.448')]
+[2024-07-04 19:25:05,556][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000969_3969024.pth...
+[2024-07-04 19:25:05,772][04980] Updated weights for policy 0, policy_version 970 (0.0013)
+[2024-07-04 19:25:08,050][04980] Updated weights for policy 0, policy_version 980 (0.0012)
+[2024-07-04 19:25:10,326][04980] Updated weights for policy 0, policy_version 990 (0.0012)
+[2024-07-04 19:25:10,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17681.1, 300 sec: 17630.6). Total num frames: 4055040. Throughput: 0: 4447.0. Samples: 1014710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-04 19:25:10,546][02883] Avg episode reward: [(0, '28.043')]
+[2024-07-04 19:25:12,638][04980] Updated weights for policy 0, policy_version 1000 (0.0013)
+[2024-07-04 19:25:14,856][04980] Updated weights for policy 0, policy_version 1010 (0.0012)
+[2024-07-04 19:25:15,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17749.4, 300 sec: 17638.9). Total num frames: 4145152. Throughput: 0: 4443.6. Samples: 1028164. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-07-04 19:25:15,546][02883] Avg episode reward: [(0, '26.635')]
+[2024-07-04 19:25:17,235][04980] Updated weights for policy 0, policy_version 1020 (0.0012)
+[2024-07-04 19:25:19,652][04980] Updated weights for policy 0, policy_version 1030 (0.0013)
+[2024-07-04 19:25:20,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17681.0, 300 sec: 17629.9). Total num frames: 4231168. Throughput: 0: 4420.0. Samples: 1054228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:25:20,546][02883] Avg episode reward: [(0, '23.336')]
+[2024-07-04 19:25:21,942][04980] Updated weights for policy 0, policy_version 1040 (0.0012)
+[2024-07-04 19:25:24,320][04980] Updated weights for policy 0, policy_version 1050 (0.0012)
+[2024-07-04 19:25:25,543][02883] Fps is (10 sec: 17612.5, 60 sec: 17749.3, 300 sec: 17637.9). Total num frames: 4321280. Throughput: 0: 4422.8. Samples: 1080650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:25:25,546][02883] Avg episode reward: [(0, '24.703')]
+[2024-07-04 19:25:26,622][04980] Updated weights for policy 0, policy_version 1060 (0.0012)
+[2024-07-04 19:25:28,980][04980] Updated weights for policy 0, policy_version 1070 (0.0012)
+[2024-07-04 19:25:30,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17681.0, 300 sec: 17629.2). Total num frames: 4407296. Throughput: 0: 4416.7. Samples: 1093840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:25:30,545][02883] Avg episode reward: [(0, '25.958')]
+[2024-07-04 19:25:31,346][04980] Updated weights for policy 0, policy_version 1080 (0.0012)
+[2024-07-04 19:25:33,711][04980] Updated weights for policy 0, policy_version 1090 (0.0013)
+[2024-07-04 19:25:35,543][02883] Fps is (10 sec: 17203.4, 60 sec: 17612.8, 300 sec: 17620.8). Total num frames: 4493312. Throughput: 0: 4396.6. Samples: 1119790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:25:35,546][02883] Avg episode reward: [(0, '22.350')]
+[2024-07-04 19:25:35,992][04980] Updated weights for policy 0, policy_version 1100 (0.0011)
+[2024-07-04 19:25:38,281][04980] Updated weights for policy 0, policy_version 1110 (0.0013)
+[2024-07-04 19:25:40,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17681.0, 300 sec: 17628.5). Total num frames: 4583424. Throughput: 0: 4412.4. Samples: 1146852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:25:40,546][02883] Avg episode reward: [(0, '24.901')]
+[2024-07-04 19:25:40,552][04980] Updated weights for policy 0, policy_version 1120 (0.0012)
+[2024-07-04 19:25:42,835][04980] Updated weights for policy 0, policy_version 1130 (0.0012)
+[2024-07-04 19:25:45,191][04980] Updated weights for policy 0, policy_version 1140 (0.0012)
+[2024-07-04 19:25:45,543][02883] Fps is (10 sec: 18022.2, 60 sec: 17681.0, 300 sec: 17636.0). Total num frames: 4673536. Throughput: 0: 4410.3. Samples: 1160186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:25:45,545][02883] Avg episode reward: [(0, '22.057')]
+[2024-07-04 19:25:47,587][04980] Updated weights for policy 0, policy_version 1150 (0.0012)
+[2024-07-04 19:25:49,827][04980] Updated weights for policy 0, policy_version 1160 (0.0012)
+[2024-07-04 19:25:50,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17681.0, 300 sec: 17643.1). Total num frames: 4763648. Throughput: 0: 4411.6. Samples: 1186522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:25:50,546][02883] Avg episode reward: [(0, '24.245')]
+[2024-07-04 19:25:52,101][04980] Updated weights for policy 0, policy_version 1170 (0.0012)
+[2024-07-04 19:25:54,394][04980] Updated weights for policy 0, policy_version 1180 (0.0012)
+[2024-07-04 19:25:55,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17681.0, 300 sec: 17650.0). Total num frames: 4853760. Throughput: 0: 4417.2. Samples: 1213486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:25:55,546][02883] Avg episode reward: [(0, '24.830')]
+[2024-07-04 19:25:56,661][04980] Updated weights for policy 0, policy_version 1190 (0.0013)
+[2024-07-04 19:25:59,062][04980] Updated weights for policy 0, policy_version 1200 (0.0012)
+[2024-07-04 19:26:00,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17681.1, 300 sec: 17642.1). Total num frames: 4939776. Throughput: 0: 4409.8. Samples: 1226606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:26:00,547][02883] Avg episode reward: [(0, '25.866')]
+[2024-07-04 19:26:01,383][04980] Updated weights for policy 0, policy_version 1210 (0.0012)
+[2024-07-04 19:26:03,648][04980] Updated weights for policy 0, policy_version 1220 (0.0012)
+[2024-07-04 19:26:05,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.1, 300 sec: 17648.7). Total num frames: 5029888. Throughput: 0: 4421.0. Samples: 1253172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:26:05,546][02883] Avg episode reward: [(0, '27.411')]
+[2024-07-04 19:26:05,947][04980] Updated weights for policy 0, policy_version 1230 (0.0012)
+[2024-07-04 19:26:08,177][04980] Updated weights for policy 0, policy_version 1240 (0.0012)
+[2024-07-04 19:26:10,481][04980] Updated weights for policy 0, policy_version 1250 (0.0012)
+[2024-07-04 19:26:10,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17749.3, 300 sec: 17655.2). Total num frames: 5120000. Throughput: 0: 4437.7. Samples: 1280346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:26:10,546][02883] Avg episode reward: [(0, '26.485')]
+[2024-07-04 19:26:12,860][04980] Updated weights for policy 0, policy_version 1260 (0.0013)
+[2024-07-04 19:26:15,150][04980] Updated weights for policy 0, policy_version 1270 (0.0012)
+[2024-07-04 19:26:15,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.0, 300 sec: 17647.5). Total num frames: 5206016. Throughput: 0: 4431.7. Samples: 1293266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:26:15,546][02883] Avg episode reward: [(0, '24.642')]
+[2024-07-04 19:26:17,490][04980] Updated weights for policy 0, policy_version 1280 (0.0012)
+[2024-07-04 19:26:19,795][04980] Updated weights for policy 0, policy_version 1290 (0.0012)
+[2024-07-04 19:26:20,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17689.2). Total num frames: 5296128. Throughput: 0: 4443.0. Samples: 1319724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:26:20,545][02883] Avg episode reward: [(0, '25.744')]
+[2024-07-04 19:26:22,104][04980] Updated weights for policy 0, policy_version 1300 (0.0012)
+[2024-07-04 19:26:24,373][04980] Updated weights for policy 0, policy_version 1310 (0.0012)
+[2024-07-04 19:26:25,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.1, 300 sec: 17703.0). Total num frames: 5382144. Throughput: 0: 4437.0. Samples: 1346516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:26:25,548][02883] Avg episode reward: [(0, '22.798')]
+[2024-07-04 19:26:26,737][04980] Updated weights for policy 0, policy_version 1320 (0.0012)
+[2024-07-04 19:26:29,028][04980] Updated weights for policy 0, policy_version 1330 (0.0012)
+[2024-07-04 19:26:30,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17749.4, 300 sec: 17703.1). Total num frames: 5472256. Throughput: 0: 4434.8. Samples: 1359750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:26:30,546][02883] Avg episode reward: [(0, '25.382')]
+[2024-07-04 19:26:31,296][04980] Updated weights for policy 0, policy_version 1340 (0.0012)
+[2024-07-04 19:26:33,532][04980] Updated weights for policy 0, policy_version 1350 (0.0012)
+[2024-07-04 19:26:35,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17817.6, 300 sec: 17716.9). Total num frames: 5562368. Throughput: 0: 4451.9. Samples: 1386856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:26:35,545][02883] Avg episode reward: [(0, '26.943')]
+[2024-07-04 19:26:35,836][04980] Updated weights for policy 0, policy_version 1360 (0.0012)
+[2024-07-04 19:26:38,105][04980] Updated weights for policy 0, policy_version 1370 (0.0012)
+[2024-07-04 19:26:40,492][04980] Updated weights for policy 0, policy_version 1380 (0.0013)
+[2024-07-04 19:26:40,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17730.8). Total num frames: 5652480. Throughput: 0: 4442.3. Samples: 1413388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:26:40,547][02883] Avg episode reward: [(0, '26.319')]
+[2024-07-04 19:26:42,782][04980] Updated weights for policy 0, policy_version 1390 (0.0012)
+[2024-07-04 19:26:45,085][04980] Updated weights for policy 0, policy_version 1400 (0.0012)
+[2024-07-04 19:26:45,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17730.8). Total num frames: 5742592. Throughput: 0: 4449.4. Samples: 1426828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:26:45,546][02883] Avg episode reward: [(0, '25.411')]
+[2024-07-04 19:26:47,311][04980] Updated weights for policy 0, policy_version 1410 (0.0012)
+[2024-07-04 19:26:49,570][04980] Updated weights for policy 0, policy_version 1420 (0.0012)
+[2024-07-04 19:26:50,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 5832704. Throughput: 0: 4459.3. Samples: 1453840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:26:50,546][02883] Avg episode reward: [(0, '29.233')]
+[2024-07-04 19:26:50,548][04967] Saving new best policy, reward=29.233!
+[2024-07-04 19:26:51,876][04980] Updated weights for policy 0, policy_version 1430 (0.0012)
+[2024-07-04 19:26:54,258][04980] Updated weights for policy 0, policy_version 1440 (0.0013)
+[2024-07-04 19:26:55,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.4, 300 sec: 17730.8). Total num frames: 5918720. Throughput: 0: 4440.9. Samples: 1480186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:26:55,545][02883] Avg episode reward: [(0, '29.607')]
+[2024-07-04 19:26:55,555][04967] Saving new best policy, reward=29.607!
+[2024-07-04 19:26:56,580][04980] Updated weights for policy 0, policy_version 1450 (0.0012)
+[2024-07-04 19:26:58,832][04980] Updated weights for policy 0, policy_version 1460 (0.0013)
+[2024-07-04 19:27:00,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17817.6, 300 sec: 17730.8). Total num frames: 6008832. Throughput: 0: 4454.0. Samples: 1493694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:27:00,546][02883] Avg episode reward: [(0, '26.102')]
+[2024-07-04 19:27:01,128][04980] Updated weights for policy 0, policy_version 1470 (0.0012)
+[2024-07-04 19:27:03,424][04980] Updated weights for policy 0, policy_version 1480 (0.0012)
+[2024-07-04 19:27:05,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 6098944. Throughput: 0: 4462.0. Samples: 1520512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:27:05,545][02883] Avg episode reward: [(0, '27.708')]
+[2024-07-04 19:27:05,554][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001489_6098944.pth...
+[2024-07-04 19:27:05,621][04967] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000448_1835008.pth +[2024-07-04 19:27:05,763][04980] Updated weights for policy 0, policy_version 1490 (0.0012) +[2024-07-04 19:27:08,135][04980] Updated weights for policy 0, policy_version 1500 (0.0012) +[2024-07-04 19:27:10,420][04980] Updated weights for policy 0, policy_version 1510 (0.0012) +[2024-07-04 19:27:10,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17749.3, 300 sec: 17730.8). Total num frames: 6184960. Throughput: 0: 4450.7. Samples: 1546798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:27:10,546][02883] Avg episode reward: [(0, '27.378')] +[2024-07-04 19:27:12,685][04980] Updated weights for policy 0, policy_version 1520 (0.0012) +[2024-07-04 19:27:14,972][04980] Updated weights for policy 0, policy_version 1530 (0.0012) +[2024-07-04 19:27:15,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 6275072. Throughput: 0: 4456.0. Samples: 1560272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:27:15,546][02883] Avg episode reward: [(0, '27.549')] +[2024-07-04 19:27:17,245][04980] Updated weights for policy 0, policy_version 1540 (0.0012) +[2024-07-04 19:27:19,563][04980] Updated weights for policy 0, policy_version 1550 (0.0013) +[2024-07-04 19:27:20,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17749.3, 300 sec: 17730.8). Total num frames: 6361088. Throughput: 0: 4449.5. Samples: 1587082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:27:20,545][02883] Avg episode reward: [(0, '23.405')] +[2024-07-04 19:27:22,024][04980] Updated weights for policy 0, policy_version 1560 (0.0013) +[2024-07-04 19:27:24,290][04980] Updated weights for policy 0, policy_version 1570 (0.0012) +[2024-07-04 19:27:25,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 6451200. Throughput: 0: 4442.8. Samples: 1613312. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:27:25,546][02883] Avg episode reward: [(0, '28.550')] +[2024-07-04 19:27:26,587][04980] Updated weights for policy 0, policy_version 1580 (0.0012) +[2024-07-04 19:27:28,864][04980] Updated weights for policy 0, policy_version 1590 (0.0012) +[2024-07-04 19:27:30,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 6541312. Throughput: 0: 4441.2. Samples: 1626684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:27:30,546][02883] Avg episode reward: [(0, '28.869')] +[2024-07-04 19:27:31,148][04980] Updated weights for policy 0, policy_version 1600 (0.0012) +[2024-07-04 19:27:33,489][04980] Updated weights for policy 0, policy_version 1610 (0.0012) +[2024-07-04 19:27:35,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.3, 300 sec: 17730.8). Total num frames: 6627328. Throughput: 0: 4430.7. Samples: 1653220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:27:35,545][02883] Avg episode reward: [(0, '25.829')] +[2024-07-04 19:27:35,853][04980] Updated weights for policy 0, policy_version 1620 (0.0013) +[2024-07-04 19:27:38,128][04980] Updated weights for policy 0, policy_version 1630 (0.0012) +[2024-07-04 19:27:40,394][04980] Updated weights for policy 0, policy_version 1640 (0.0012) +[2024-07-04 19:27:40,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 6717440. Throughput: 0: 4439.4. Samples: 1679960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:27:40,545][02883] Avg episode reward: [(0, '25.066')] +[2024-07-04 19:27:42,680][04980] Updated weights for policy 0, policy_version 1650 (0.0012) +[2024-07-04 19:27:44,948][04980] Updated weights for policy 0, policy_version 1660 (0.0012) +[2024-07-04 19:27:45,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17749.3, 300 sec: 17758.6). Total num frames: 6807552. Throughput: 0: 4438.8. Samples: 1693440. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:27:45,545][02883] Avg episode reward: [(0, '27.818')] +[2024-07-04 19:27:47,327][04980] Updated weights for policy 0, policy_version 1670 (0.0012) +[2024-07-04 19:27:49,631][04980] Updated weights for policy 0, policy_version 1680 (0.0012) +[2024-07-04 19:27:50,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.0, 300 sec: 17744.7). Total num frames: 6893568. Throughput: 0: 4429.2. Samples: 1719826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-07-04 19:27:50,545][02883] Avg episode reward: [(0, '27.653')] +[2024-07-04 19:27:51,893][04980] Updated weights for policy 0, policy_version 1690 (0.0012) +[2024-07-04 19:27:54,162][04980] Updated weights for policy 0, policy_version 1700 (0.0012) +[2024-07-04 19:27:55,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 6987776. Throughput: 0: 4450.2. Samples: 1747058. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-07-04 19:27:55,546][02883] Avg episode reward: [(0, '28.670')] +[2024-07-04 19:27:56,406][04980] Updated weights for policy 0, policy_version 1710 (0.0012) +[2024-07-04 19:27:58,684][04980] Updated weights for policy 0, policy_version 1720 (0.0012) +[2024-07-04 19:28:00,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17749.3, 300 sec: 17758.6). Total num frames: 7073792. Throughput: 0: 4453.1. Samples: 1760660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:28:00,545][02883] Avg episode reward: [(0, '26.984')] +[2024-07-04 19:28:01,063][04980] Updated weights for policy 0, policy_version 1730 (0.0013) +[2024-07-04 19:28:03,445][04980] Updated weights for policy 0, policy_version 1740 (0.0012) +[2024-07-04 19:28:05,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17749.3, 300 sec: 17758.6). Total num frames: 7163904. Throughput: 0: 4437.3. Samples: 1786762. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:28:05,545][02883] Avg episode reward: [(0, '28.869')] +[2024-07-04 19:28:05,716][04980] Updated weights for policy 0, policy_version 1750 (0.0012) +[2024-07-04 19:28:08,019][04980] Updated weights for policy 0, policy_version 1760 (0.0012) +[2024-07-04 19:28:10,312][04980] Updated weights for policy 0, policy_version 1770 (0.0012) +[2024-07-04 19:28:10,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17749.4, 300 sec: 17744.7). Total num frames: 7249920. Throughput: 0: 4448.1. Samples: 1813476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:28:10,545][02883] Avg episode reward: [(0, '30.814')] +[2024-07-04 19:28:10,548][04967] Saving new best policy, reward=30.814! +[2024-07-04 19:28:12,579][04980] Updated weights for policy 0, policy_version 1780 (0.0012) +[2024-07-04 19:28:14,955][04980] Updated weights for policy 0, policy_version 1790 (0.0012) +[2024-07-04 19:28:15,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 7340032. Throughput: 0: 4450.4. Samples: 1826952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:28:15,546][02883] Avg episode reward: [(0, '28.418')] +[2024-07-04 19:28:17,290][04980] Updated weights for policy 0, policy_version 1800 (0.0012) +[2024-07-04 19:28:19,577][04980] Updated weights for policy 0, policy_version 1810 (0.0012) +[2024-07-04 19:28:20,543][02883] Fps is (10 sec: 18022.2, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 7430144. Throughput: 0: 4446.3. Samples: 1853304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:28:20,546][02883] Avg episode reward: [(0, '31.809')] +[2024-07-04 19:28:20,548][04967] Saving new best policy, reward=31.809! 
+[2024-07-04 19:28:21,824][04980] Updated weights for policy 0, policy_version 1820 (0.0012) +[2024-07-04 19:28:24,132][04980] Updated weights for policy 0, policy_version 1830 (0.0012) +[2024-07-04 19:28:25,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 7520256. Throughput: 0: 4448.9. Samples: 1880160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:28:25,545][02883] Avg episode reward: [(0, '28.574')] +[2024-07-04 19:28:26,438][04980] Updated weights for policy 0, policy_version 1840 (0.0012) +[2024-07-04 19:28:28,796][04980] Updated weights for policy 0, policy_version 1850 (0.0012) +[2024-07-04 19:28:30,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 7606272. Throughput: 0: 4439.6. Samples: 1893222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:28:30,545][02883] Avg episode reward: [(0, '27.688')] +[2024-07-04 19:28:31,128][04980] Updated weights for policy 0, policy_version 1860 (0.0012) +[2024-07-04 19:28:33,398][04980] Updated weights for policy 0, policy_version 1870 (0.0012) +[2024-07-04 19:28:35,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 7696384. Throughput: 0: 4448.7. Samples: 1920016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:28:35,545][02883] Avg episode reward: [(0, '26.603')] +[2024-07-04 19:28:35,656][04980] Updated weights for policy 0, policy_version 1880 (0.0013) +[2024-07-04 19:28:37,927][04980] Updated weights for policy 0, policy_version 1890 (0.0012) +[2024-07-04 19:28:40,210][04980] Updated weights for policy 0, policy_version 1900 (0.0012) +[2024-07-04 19:28:40,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 7786496. Throughput: 0: 4441.6. Samples: 1946932. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:28:40,545][02883] Avg episode reward: [(0, '30.755')] +[2024-07-04 19:28:42,705][04980] Updated weights for policy 0, policy_version 1910 (0.0012) +[2024-07-04 19:28:44,978][04980] Updated weights for policy 0, policy_version 1920 (0.0012) +[2024-07-04 19:28:45,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 7872512. Throughput: 0: 4418.8. Samples: 1959508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:28:45,546][02883] Avg episode reward: [(0, '29.889')] +[2024-07-04 19:28:47,240][04980] Updated weights for policy 0, policy_version 1930 (0.0012) +[2024-07-04 19:28:49,497][04980] Updated weights for policy 0, policy_version 1940 (0.0011) +[2024-07-04 19:28:50,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 7962624. Throughput: 0: 4444.0. Samples: 1986740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-07-04 19:28:50,545][02883] Avg episode reward: [(0, '28.193')] +[2024-07-04 19:28:51,762][04980] Updated weights for policy 0, policy_version 1950 (0.0012) +[2024-07-04 19:28:54,103][04980] Updated weights for policy 0, policy_version 1960 (0.0012) +[2024-07-04 19:28:55,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17749.3, 300 sec: 17758.6). Total num frames: 8052736. Throughput: 0: 4441.2. Samples: 2013330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:28:55,545][02883] Avg episode reward: [(0, '30.850')] +[2024-07-04 19:28:56,436][04980] Updated weights for policy 0, policy_version 1970 (0.0013) +[2024-07-04 19:28:58,765][04980] Updated weights for policy 0, policy_version 1980 (0.0012) +[2024-07-04 19:29:00,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.4, 300 sec: 17744.7). Total num frames: 8138752. Throughput: 0: 4435.6. Samples: 2026554. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:00,546][02883] Avg episode reward: [(0, '30.598')] +[2024-07-04 19:29:01,000][04980] Updated weights for policy 0, policy_version 1990 (0.0012) +[2024-07-04 19:29:03,271][04980] Updated weights for policy 0, policy_version 2000 (0.0012) +[2024-07-04 19:29:05,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 8228864. Throughput: 0: 4450.8. Samples: 2053592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:05,545][02883] Avg episode reward: [(0, '29.701')] +[2024-07-04 19:29:05,555][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002009_8228864.pth... +[2024-07-04 19:29:05,622][04967] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000969_3969024.pth +[2024-07-04 19:29:05,655][04980] Updated weights for policy 0, policy_version 2010 (0.0012) +[2024-07-04 19:29:07,857][04980] Updated weights for policy 0, policy_version 2020 (0.0012) +[2024-07-04 19:29:10,238][04980] Updated weights for policy 0, policy_version 2030 (0.0012) +[2024-07-04 19:29:10,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 8318976. Throughput: 0: 4440.6. Samples: 2079988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:10,546][02883] Avg episode reward: [(0, '27.050')] +[2024-07-04 19:29:12,539][04980] Updated weights for policy 0, policy_version 2040 (0.0012) +[2024-07-04 19:29:14,804][04980] Updated weights for policy 0, policy_version 2050 (0.0012) +[2024-07-04 19:29:15,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 8409088. Throughput: 0: 4450.6. Samples: 2093498. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:15,545][02883] Avg episode reward: [(0, '25.568')] +[2024-07-04 19:29:17,086][04980] Updated weights for policy 0, policy_version 2060 (0.0012) +[2024-07-04 19:29:19,355][04980] Updated weights for policy 0, policy_version 2070 (0.0012) +[2024-07-04 19:29:20,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17772.5). Total num frames: 8499200. Throughput: 0: 4455.5. Samples: 2120516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:20,545][02883] Avg episode reward: [(0, '28.999')] +[2024-07-04 19:29:21,667][04980] Updated weights for policy 0, policy_version 2080 (0.0013) +[2024-07-04 19:29:24,114][04980] Updated weights for policy 0, policy_version 2090 (0.0013) +[2024-07-04 19:29:25,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17749.4, 300 sec: 17758.6). Total num frames: 8585216. Throughput: 0: 4439.7. Samples: 2146718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-07-04 19:29:25,546][02883] Avg episode reward: [(0, '29.381')] +[2024-07-04 19:29:26,356][04980] Updated weights for policy 0, policy_version 2100 (0.0011) +[2024-07-04 19:29:28,635][04980] Updated weights for policy 0, policy_version 2110 (0.0012) +[2024-07-04 19:29:30,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 8675328. Throughput: 0: 4459.6. Samples: 2160190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-07-04 19:29:30,545][02883] Avg episode reward: [(0, '26.977')] +[2024-07-04 19:29:30,913][04980] Updated weights for policy 0, policy_version 2120 (0.0012) +[2024-07-04 19:29:33,167][04980] Updated weights for policy 0, policy_version 2130 (0.0012) +[2024-07-04 19:29:35,451][04980] Updated weights for policy 0, policy_version 2140 (0.0012) +[2024-07-04 19:29:35,543][02883] Fps is (10 sec: 18021.8, 60 sec: 17817.5, 300 sec: 17772.5). Total num frames: 8765440. Throughput: 0: 4459.9. Samples: 2187436. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:35,547][02883] Avg episode reward: [(0, '28.896')] +[2024-07-04 19:29:37,812][04980] Updated weights for policy 0, policy_version 2150 (0.0012) +[2024-07-04 19:29:40,136][04980] Updated weights for policy 0, policy_version 2160 (0.0012) +[2024-07-04 19:29:40,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17749.4, 300 sec: 17758.6). Total num frames: 8851456. Throughput: 0: 4453.0. Samples: 2213716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:40,544][02883] Avg episode reward: [(0, '29.518')] +[2024-07-04 19:29:42,395][04980] Updated weights for policy 0, policy_version 2170 (0.0011) +[2024-07-04 19:29:44,711][04980] Updated weights for policy 0, policy_version 2180 (0.0012) +[2024-07-04 19:29:45,543][02883] Fps is (10 sec: 17613.2, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 8941568. Throughput: 0: 4456.5. Samples: 2227096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:45,546][02883] Avg episode reward: [(0, '29.056')] +[2024-07-04 19:29:46,960][04980] Updated weights for policy 0, policy_version 2190 (0.0012) +[2024-07-04 19:29:49,284][04980] Updated weights for policy 0, policy_version 2200 (0.0012) +[2024-07-04 19:29:50,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 9031680. Throughput: 0: 4453.3. Samples: 2253992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:29:50,546][02883] Avg episode reward: [(0, '28.957')] +[2024-07-04 19:29:51,685][04980] Updated weights for policy 0, policy_version 2210 (0.0013) +[2024-07-04 19:29:53,937][04980] Updated weights for policy 0, policy_version 2220 (0.0012) +[2024-07-04 19:29:55,543][02883] Fps is (10 sec: 18022.6, 60 sec: 17817.6, 300 sec: 17772.5). Total num frames: 9121792. Throughput: 0: 4455.2. Samples: 2280470. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:29:55,545][02883] Avg episode reward: [(0, '24.898')] +[2024-07-04 19:29:56,247][04980] Updated weights for policy 0, policy_version 2230 (0.0012) +[2024-07-04 19:29:58,497][04980] Updated weights for policy 0, policy_version 2240 (0.0012) +[2024-07-04 19:30:00,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17885.8, 300 sec: 17772.5). Total num frames: 9211904. Throughput: 0: 4454.4. Samples: 2293946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:30:00,545][02883] Avg episode reward: [(0, '25.474')] +[2024-07-04 19:30:00,775][04980] Updated weights for policy 0, policy_version 2250 (0.0012) +[2024-07-04 19:30:03,156][04980] Updated weights for policy 0, policy_version 2260 (0.0013) +[2024-07-04 19:30:05,517][04980] Updated weights for policy 0, policy_version 2270 (0.0012) +[2024-07-04 19:30:05,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17817.6, 300 sec: 17772.5). Total num frames: 9297920. Throughput: 0: 4437.6. Samples: 2320208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:30:05,545][02883] Avg episode reward: [(0, '27.818')] +[2024-07-04 19:30:07,771][04980] Updated weights for policy 0, policy_version 2280 (0.0012) +[2024-07-04 19:30:10,049][04980] Updated weights for policy 0, policy_version 2290 (0.0012) +[2024-07-04 19:30:10,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17817.6, 300 sec: 17772.5). Total num frames: 9388032. Throughput: 0: 4454.4. Samples: 2347164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:30:10,545][02883] Avg episode reward: [(0, '26.486')] +[2024-07-04 19:30:12,314][04980] Updated weights for policy 0, policy_version 2300 (0.0012) +[2024-07-04 19:30:14,595][04980] Updated weights for policy 0, policy_version 2310 (0.0012) +[2024-07-04 19:30:15,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17786.4). Total num frames: 9478144. Throughput: 0: 4455.5. Samples: 2360686. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:30:15,546][02883] Avg episode reward: [(0, '29.395')] +[2024-07-04 19:30:16,950][04980] Updated weights for policy 0, policy_version 2320 (0.0012) +[2024-07-04 19:30:19,306][04980] Updated weights for policy 0, policy_version 2330 (0.0012) +[2024-07-04 19:30:20,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17772.5). Total num frames: 9564160. Throughput: 0: 4433.2. Samples: 2386928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:30:20,545][02883] Avg episode reward: [(0, '29.341')] +[2024-07-04 19:30:21,577][04980] Updated weights for policy 0, policy_version 2340 (0.0012) +[2024-07-04 19:30:23,861][04980] Updated weights for policy 0, policy_version 2350 (0.0012) +[2024-07-04 19:30:25,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17817.6, 300 sec: 17786.4). Total num frames: 9654272. Throughput: 0: 4445.2. Samples: 2413750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:30:25,546][02883] Avg episode reward: [(0, '29.687')] +[2024-07-04 19:30:26,191][04980] Updated weights for policy 0, policy_version 2360 (0.0013) +[2024-07-04 19:30:28,461][04980] Updated weights for policy 0, policy_version 2370 (0.0011) +[2024-07-04 19:30:30,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17786.4). Total num frames: 9740288. Throughput: 0: 4449.0. Samples: 2427300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:30:30,545][02883] Avg episode reward: [(0, '33.211')] +[2024-07-04 19:30:30,550][04967] Saving new best policy, reward=33.211! +[2024-07-04 19:30:30,803][04980] Updated weights for policy 0, policy_version 2380 (0.0012) +[2024-07-04 19:30:33,180][04980] Updated weights for policy 0, policy_version 2390 (0.0012) +[2024-07-04 19:30:35,498][04980] Updated weights for policy 0, policy_version 2400 (0.0012) +[2024-07-04 19:30:35,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.4, 300 sec: 17786.4). Total num frames: 9830400. 
Throughput: 0: 4431.2. Samples: 2453398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:30:35,545][02883] Avg episode reward: [(0, '30.511')] +[2024-07-04 19:30:37,810][04980] Updated weights for policy 0, policy_version 2410 (0.0012) +[2024-07-04 19:30:40,139][04980] Updated weights for policy 0, policy_version 2420 (0.0012) +[2024-07-04 19:30:40,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17772.5). Total num frames: 9916416. Throughput: 0: 4431.8. Samples: 2479900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:30:40,545][02883] Avg episode reward: [(0, '31.443')] +[2024-07-04 19:30:42,416][04980] Updated weights for policy 0, policy_version 2430 (0.0012) +[2024-07-04 19:30:44,819][04980] Updated weights for policy 0, policy_version 2440 (0.0012) +[2024-07-04 19:30:45,543][02883] Fps is (10 sec: 17203.2, 60 sec: 17681.1, 300 sec: 17758.6). Total num frames: 10002432. Throughput: 0: 4425.8. Samples: 2493106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-07-04 19:30:45,545][02883] Avg episode reward: [(0, '28.561')] +[2024-07-04 19:30:47,188][04980] Updated weights for policy 0, policy_version 2450 (0.0012) +[2024-07-04 19:30:49,443][04980] Updated weights for policy 0, policy_version 2460 (0.0011) +[2024-07-04 19:30:50,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17681.0, 300 sec: 17758.6). Total num frames: 10092544. Throughput: 0: 4428.2. Samples: 2519480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-04 19:30:50,547][02883] Avg episode reward: [(0, '27.790')] +[2024-07-04 19:30:51,726][04980] Updated weights for policy 0, policy_version 2470 (0.0012) +[2024-07-04 19:30:53,970][04980] Updated weights for policy 0, policy_version 2480 (0.0012) +[2024-07-04 19:30:55,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17681.1, 300 sec: 17772.5). Total num frames: 10182656. Throughput: 0: 4430.5. Samples: 2546536. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:30:55,545][02883] Avg episode reward: [(0, '31.433')] +[2024-07-04 19:30:56,266][04980] Updated weights for policy 0, policy_version 2490 (0.0012) +[2024-07-04 19:30:58,682][04980] Updated weights for policy 0, policy_version 2500 (0.0012) +[2024-07-04 19:31:00,543][02883] Fps is (10 sec: 18022.7, 60 sec: 17681.1, 300 sec: 17772.5). Total num frames: 10272768. Throughput: 0: 4418.9. Samples: 2559536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-07-04 19:31:00,546][02883] Avg episode reward: [(0, '30.532')] +[2024-07-04 19:31:00,958][04980] Updated weights for policy 0, policy_version 2510 (0.0012) +[2024-07-04 19:31:03,286][04980] Updated weights for policy 0, policy_version 2520 (0.0012) +[2024-07-04 19:31:05,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17681.0, 300 sec: 17758.6). Total num frames: 10358784. Throughput: 0: 4426.6. Samples: 2586124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-04 19:31:05,545][02883] Avg episode reward: [(0, '32.758')] +[2024-07-04 19:31:05,553][02883] Components not started: RolloutWorker_w1, wait_time=600.0 seconds +[2024-07-04 19:31:05,566][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002530_10362880.pth... +[2024-07-04 19:31:05,568][04980] Updated weights for policy 0, policy_version 2530 (0.0012) +[2024-07-04 19:31:05,634][04967] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001489_6098944.pth +[2024-07-04 19:31:07,930][04980] Updated weights for policy 0, policy_version 2540 (0.0012) +[2024-07-04 19:31:10,266][04980] Updated weights for policy 0, policy_version 2550 (0.0012) +[2024-07-04 19:31:10,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.1, 300 sec: 17772.5). Total num frames: 10448896. Throughput: 0: 4416.3. Samples: 2612484. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-07-04 19:31:10,545][02883] Avg episode reward: [(0, '29.839')] +[2024-07-04 19:31:12,625][04980] Updated weights for policy 0, policy_version 2560 (0.0013) +[2024-07-04 19:31:14,944][04980] Updated weights for policy 0, policy_version 2570 (0.0012) +[2024-07-04 19:31:15,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17612.8, 300 sec: 17758.6). Total num frames: 10534912. Throughput: 0: 4400.6. Samples: 2625326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:31:15,546][02883] Avg episode reward: [(0, '29.145')] +[2024-07-04 19:31:17,226][04980] Updated weights for policy 0, policy_version 2580 (0.0012) +[2024-07-04 19:31:19,519][04980] Updated weights for policy 0, policy_version 2590 (0.0012) +[2024-07-04 19:31:20,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.1, 300 sec: 17772.5). Total num frames: 10625024. Throughput: 0: 4418.7. Samples: 2652238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:31:20,545][02883] Avg episode reward: [(0, '30.518')] +[2024-07-04 19:31:21,833][04980] Updated weights for policy 0, policy_version 2600 (0.0012) +[2024-07-04 19:31:24,146][04980] Updated weights for policy 0, policy_version 2610 (0.0012) +[2024-07-04 19:31:25,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17612.8, 300 sec: 17758.6). Total num frames: 10711040. Throughput: 0: 4415.2. Samples: 2678584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:31:25,546][02883] Avg episode reward: [(0, '28.924')] +[2024-07-04 19:31:26,608][04980] Updated weights for policy 0, policy_version 2620 (0.0012) +[2024-07-04 19:31:28,899][04980] Updated weights for policy 0, policy_version 2630 (0.0012) +[2024-07-04 19:31:30,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.1, 300 sec: 17758.6). Total num frames: 10801152. Throughput: 0: 4410.4. Samples: 2691572. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:31:30,547][02883] Avg episode reward: [(0, '30.559')] +[2024-07-04 19:31:31,185][04980] Updated weights for policy 0, policy_version 2640 (0.0012) +[2024-07-04 19:31:33,478][04980] Updated weights for policy 0, policy_version 2650 (0.0012) +[2024-07-04 19:31:35,543][02883] Fps is (10 sec: 18022.6, 60 sec: 17681.1, 300 sec: 17758.6). Total num frames: 10891264. Throughput: 0: 4419.4. Samples: 2718350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:31:35,545][02883] Avg episode reward: [(0, '31.578')] +[2024-07-04 19:31:35,737][04980] Updated weights for policy 0, policy_version 2660 (0.0012) +[2024-07-04 19:31:38,121][04980] Updated weights for policy 0, policy_version 2670 (0.0012) +[2024-07-04 19:31:40,493][04980] Updated weights for policy 0, policy_version 2680 (0.0012) +[2024-07-04 19:31:40,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17681.1, 300 sec: 17744.7). Total num frames: 10977280. Throughput: 0: 4401.0. Samples: 2744582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-07-04 19:31:40,545][02883] Avg episode reward: [(0, '30.109')] +[2024-07-04 19:31:42,772][04980] Updated weights for policy 0, policy_version 2690 (0.0012) +[2024-07-04 19:31:45,091][04980] Updated weights for policy 0, policy_version 2700 (0.0012) +[2024-07-04 19:31:45,543][02883] Fps is (10 sec: 17202.9, 60 sec: 17681.0, 300 sec: 17730.8). Total num frames: 11063296. Throughput: 0: 4408.1. Samples: 2757900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-07-04 19:31:45,545][02883] Avg episode reward: [(0, '28.572')] +[2024-07-04 19:31:47,387][04980] Updated weights for policy 0, policy_version 2710 (0.0012) +[2024-07-04 19:31:49,725][04980] Updated weights for policy 0, policy_version 2720 (0.0012) +[2024-07-04 19:31:50,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.1, 300 sec: 17744.7). Total num frames: 11153408. Throughput: 0: 4408.1. Samples: 2784488. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-07-04 19:31:50,546][02883] Avg episode reward: [(0, '30.109')] +[2024-07-04 19:31:52,077][04980] Updated weights for policy 0, policy_version 2730 (0.0012) +[2024-07-04 19:31:54,466][04980] Updated weights for policy 0, policy_version 2740 (0.0013) +[2024-07-04 19:31:55,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17612.8, 300 sec: 17730.8). Total num frames: 11239424. Throughput: 0: 4402.7. Samples: 2810606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:31:55,546][02883] Avg episode reward: [(0, '28.260')] +[2024-07-04 19:31:56,738][04980] Updated weights for policy 0, policy_version 2750 (0.0012) +[2024-07-04 19:31:59,039][04980] Updated weights for policy 0, policy_version 2760 (0.0012) +[2024-07-04 19:32:00,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17612.8, 300 sec: 17730.8). Total num frames: 11329536. Throughput: 0: 4415.0. Samples: 2824002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:32:00,545][02883] Avg episode reward: [(0, '29.736')] +[2024-07-04 19:32:01,297][04980] Updated weights for policy 0, policy_version 2770 (0.0012) +[2024-07-04 19:32:03,592][04980] Updated weights for policy 0, policy_version 2780 (0.0012) +[2024-07-04 19:32:05,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17681.1, 300 sec: 17744.7). Total num frames: 11419648. Throughput: 0: 4417.6. Samples: 2851030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-07-04 19:32:05,546][02883] Avg episode reward: [(0, '31.729')] +[2024-07-04 19:32:05,901][04980] Updated weights for policy 0, policy_version 2790 (0.0013) +[2024-07-04 19:32:08,222][04980] Updated weights for policy 0, policy_version 2800 (0.0012) +[2024-07-04 19:32:10,511][04980] Updated weights for policy 0, policy_version 2810 (0.0013) +[2024-07-04 19:32:10,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17681.1, 300 sec: 17744.7). Total num frames: 11509760. Throughput: 0: 4422.4. Samples: 2877590. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:32:10,545][02883] Avg episode reward: [(0, '36.507')]
+[2024-07-04 19:32:10,547][04967] Saving new best policy, reward=36.507!
+[2024-07-04 19:32:12,790][04980] Updated weights for policy 0, policy_version 2820 (0.0012)
+[2024-07-04 19:32:15,027][04980] Updated weights for policy 0, policy_version 2830 (0.0012)
+[2024-07-04 19:32:15,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17749.3, 300 sec: 17758.6). Total num frames: 11599872. Throughput: 0: 4434.6. Samples: 2891130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:32:15,545][02883] Avg episode reward: [(0, '30.448')]
+[2024-07-04 19:32:17,307][04980] Updated weights for policy 0, policy_version 2840 (0.0012)
+[2024-07-04 19:32:19,640][04980] Updated weights for policy 0, policy_version 2850 (0.0012)
+[2024-07-04 19:32:20,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.1, 300 sec: 17744.7). Total num frames: 11685888. Throughput: 0: 4433.9. Samples: 2917876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:32:20,545][02883] Avg episode reward: [(0, '29.541')]
+[2024-07-04 19:32:22,058][04980] Updated weights for policy 0, policy_version 2860 (0.0013)
+[2024-07-04 19:32:24,290][04980] Updated weights for policy 0, policy_version 2870 (0.0012)
+[2024-07-04 19:32:25,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 11776000. Throughput: 0: 4442.0. Samples: 2944474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:32:25,545][02883] Avg episode reward: [(0, '30.828')]
+[2024-07-04 19:32:26,574][04980] Updated weights for policy 0, policy_version 2880 (0.0012)
+[2024-07-04 19:32:28,899][04980] Updated weights for policy 0, policy_version 2890 (0.0012)
+[2024-07-04 19:32:30,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17749.4, 300 sec: 17758.6). Total num frames: 11866112. Throughput: 0: 4442.9. Samples: 2957828.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:32:30,545][02883] Avg episode reward: [(0, '33.197')]
+[2024-07-04 19:32:31,142][04980] Updated weights for policy 0, policy_version 2900 (0.0012)
+[2024-07-04 19:32:33,549][04980] Updated weights for policy 0, policy_version 2910 (0.0012)
+[2024-07-04 19:32:35,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17681.0, 300 sec: 17744.7). Total num frames: 11952128. Throughput: 0: 4438.5. Samples: 2984220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:32:35,545][02883] Avg episode reward: [(0, '30.649')]
+[2024-07-04 19:32:35,830][04980] Updated weights for policy 0, policy_version 2920 (0.0012)
+[2024-07-04 19:32:38,137][04980] Updated weights for policy 0, policy_version 2930 (0.0012)
+[2024-07-04 19:32:40,347][04980] Updated weights for policy 0, policy_version 2940 (0.0012)
+[2024-07-04 19:32:40,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 12042240. Throughput: 0: 4460.1. Samples: 3011312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:32:40,546][02883] Avg episode reward: [(0, '29.331')]
+[2024-07-04 19:32:42,617][04980] Updated weights for policy 0, policy_version 2950 (0.0012)
+[2024-07-04 19:32:44,883][04980] Updated weights for policy 0, policy_version 2960 (0.0011)
+[2024-07-04 19:32:45,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 12132352. Throughput: 0: 4465.1. Samples: 3024932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:32:45,546][02883] Avg episode reward: [(0, '28.285')]
+[2024-07-04 19:32:47,238][04980] Updated weights for policy 0, policy_version 2970 (0.0012)
+[2024-07-04 19:32:49,601][04980] Updated weights for policy 0, policy_version 2980 (0.0012)
+[2024-07-04 19:32:50,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 12222464. Throughput: 0: 4448.6. Samples: 3051218.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:32:50,546][02883] Avg episode reward: [(0, '30.072')]
+[2024-07-04 19:32:51,842][04980] Updated weights for policy 0, policy_version 2990 (0.0012)
+[2024-07-04 19:32:54,131][04980] Updated weights for policy 0, policy_version 3000 (0.0012)
+[2024-07-04 19:32:55,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17885.9, 300 sec: 17758.6). Total num frames: 12312576. Throughput: 0: 4461.8. Samples: 3078372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:32:55,546][02883] Avg episode reward: [(0, '29.753')]
+[2024-07-04 19:32:56,410][04980] Updated weights for policy 0, policy_version 3010 (0.0012)
+[2024-07-04 19:32:58,672][04980] Updated weights for policy 0, policy_version 3020 (0.0012)
+[2024-07-04 19:33:00,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 12398592. Throughput: 0: 4460.7. Samples: 3091860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:33:00,545][02883] Avg episode reward: [(0, '31.608')]
+[2024-07-04 19:33:01,089][04980] Updated weights for policy 0, policy_version 3030 (0.0013)
+[2024-07-04 19:33:03,379][04980] Updated weights for policy 0, policy_version 3040 (0.0012)
+[2024-07-04 19:33:05,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 12488704. Throughput: 0: 4451.1. Samples: 3118176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:33:05,546][02883] Avg episode reward: [(0, '28.051')]
+[2024-07-04 19:33:05,555][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003049_12488704.pth...
+[2024-07-04 19:33:05,622][04967] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002009_8228864.pth
+[2024-07-04 19:33:05,678][04980] Updated weights for policy 0, policy_version 3050 (0.0012)
+[2024-07-04 19:33:07,923][04980] Updated weights for policy 0, policy_version 3060 (0.0012)
+[2024-07-04 19:33:10,190][04980] Updated weights for policy 0, policy_version 3070 (0.0012)
+[2024-07-04 19:33:10,543][02883] Fps is (10 sec: 18022.2, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 12578816. Throughput: 0: 4459.3. Samples: 3145142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:33:10,546][02883] Avg episode reward: [(0, '29.568')]
+[2024-07-04 19:33:12,529][04980] Updated weights for policy 0, policy_version 3080 (0.0012)
+[2024-07-04 19:33:14,855][04980] Updated weights for policy 0, policy_version 3090 (0.0012)
+[2024-07-04 19:33:15,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17749.4, 300 sec: 17744.7). Total num frames: 12664832. Throughput: 0: 4456.6. Samples: 3158374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:33:15,546][02883] Avg episode reward: [(0, '31.251')]
+[2024-07-04 19:33:17,208][04980] Updated weights for policy 0, policy_version 3100 (0.0012)
+[2024-07-04 19:33:19,441][04980] Updated weights for policy 0, policy_version 3110 (0.0012)
+[2024-07-04 19:33:20,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 12754944. Throughput: 0: 4461.4. Samples: 3184984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:33:20,545][02883] Avg episode reward: [(0, '31.321')]
+[2024-07-04 19:33:21,750][04980] Updated weights for policy 0, policy_version 3120 (0.0013)
+[2024-07-04 19:33:23,996][04980] Updated weights for policy 0, policy_version 3130 (0.0012)
+[2024-07-04 19:33:25,543][02883] Fps is (10 sec: 18022.2, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 12845056. Throughput: 0: 4462.1. Samples: 3212106.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-07-04 19:33:25,545][02883] Avg episode reward: [(0, '32.423')]
+[2024-07-04 19:33:26,261][04980] Updated weights for policy 0, policy_version 3140 (0.0012)
+[2024-07-04 19:33:28,665][04980] Updated weights for policy 0, policy_version 3150 (0.0012)
+[2024-07-04 19:33:30,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 12931072. Throughput: 0: 4448.9. Samples: 3225132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-04 19:33:30,545][02883] Avg episode reward: [(0, '32.747')]
+[2024-07-04 19:33:31,078][04980] Updated weights for policy 0, policy_version 3160 (0.0013)
+[2024-07-04 19:33:33,361][04980] Updated weights for policy 0, policy_version 3170 (0.0012)
+[2024-07-04 19:33:35,543][02883] Fps is (10 sec: 17613.0, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 13021184. Throughput: 0: 4448.7. Samples: 3251410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:33:35,545][02883] Avg episode reward: [(0, '28.791')]
+[2024-07-04 19:33:35,600][04980] Updated weights for policy 0, policy_version 3180 (0.0012)
+[2024-07-04 19:33:37,910][04980] Updated weights for policy 0, policy_version 3190 (0.0012)
+[2024-07-04 19:33:40,190][04980] Updated weights for policy 0, policy_version 3200 (0.0012)
+[2024-07-04 19:33:40,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 13111296. Throughput: 0: 4442.9. Samples: 3278304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:33:40,545][02883] Avg episode reward: [(0, '32.401')]
+[2024-07-04 19:33:42,577][04980] Updated weights for policy 0, policy_version 3210 (0.0013)
+[2024-07-04 19:33:44,836][04980] Updated weights for policy 0, policy_version 3220 (0.0012)
+[2024-07-04 19:33:45,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 13201408. Throughput: 0: 4432.3. Samples: 3291312.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:33:45,545][02883] Avg episode reward: [(0, '27.927')]
+[2024-07-04 19:33:47,103][04980] Updated weights for policy 0, policy_version 3230 (0.0012)
+[2024-07-04 19:33:49,392][04980] Updated weights for policy 0, policy_version 3240 (0.0012)
+[2024-07-04 19:33:50,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 13291520. Throughput: 0: 4449.8. Samples: 3318416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:33:50,545][02883] Avg episode reward: [(0, '27.281')]
+[2024-07-04 19:33:51,642][04980] Updated weights for policy 0, policy_version 3250 (0.0012)
+[2024-07-04 19:33:53,935][04980] Updated weights for policy 0, policy_version 3260 (0.0012)
+[2024-07-04 19:33:55,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17749.3, 300 sec: 17758.6). Total num frames: 13377536. Throughput: 0: 4441.2. Samples: 3344996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:33:55,547][02883] Avg episode reward: [(0, '31.154')]
+[2024-07-04 19:33:56,355][04980] Updated weights for policy 0, policy_version 3270 (0.0012)
+[2024-07-04 19:33:58,619][04980] Updated weights for policy 0, policy_version 3280 (0.0012)
+[2024-07-04 19:34:00,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 13467648. Throughput: 0: 4442.1. Samples: 3358268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:34:00,544][02883] Avg episode reward: [(0, '32.263')]
+[2024-07-04 19:34:00,915][04980] Updated weights for policy 0, policy_version 3290 (0.0012)
+[2024-07-04 19:34:03,197][04980] Updated weights for policy 0, policy_version 3300 (0.0012)
+[2024-07-04 19:34:05,476][04980] Updated weights for policy 0, policy_version 3310 (0.0012)
+[2024-07-04 19:34:05,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 13557760. Throughput: 0: 4449.9. Samples: 3385230.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:34:05,546][02883] Avg episode reward: [(0, '30.611')]
+[2024-07-04 19:34:07,787][04980] Updated weights for policy 0, policy_version 3320 (0.0012)
+[2024-07-04 19:34:10,173][04980] Updated weights for policy 0, policy_version 3330 (0.0012)
+[2024-07-04 19:34:10,543][02883] Fps is (10 sec: 17612.6, 60 sec: 17749.3, 300 sec: 17744.7). Total num frames: 13643776. Throughput: 0: 4433.2. Samples: 3411600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:34:10,545][02883] Avg episode reward: [(0, '33.214')]
+[2024-07-04 19:34:12,420][04980] Updated weights for policy 0, policy_version 3340 (0.0012)
+[2024-07-04 19:34:14,667][04980] Updated weights for policy 0, policy_version 3350 (0.0012)
+[2024-07-04 19:34:15,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17817.6, 300 sec: 17744.7). Total num frames: 13733888. Throughput: 0: 4445.0. Samples: 3425156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:34:15,545][02883] Avg episode reward: [(0, '36.389')]
+[2024-07-04 19:34:16,969][04980] Updated weights for policy 0, policy_version 3360 (0.0012)
+[2024-07-04 19:34:19,203][04980] Updated weights for policy 0, policy_version 3370 (0.0012)
+[2024-07-04 19:34:20,543][02883] Fps is (10 sec: 18022.2, 60 sec: 17817.5, 300 sec: 17758.6). Total num frames: 13824000. Throughput: 0: 4464.5. Samples: 3452312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:34:20,545][02883] Avg episode reward: [(0, '31.376')]
+[2024-07-04 19:34:21,529][04980] Updated weights for policy 0, policy_version 3380 (0.0012)
+[2024-07-04 19:34:23,885][04980] Updated weights for policy 0, policy_version 3390 (0.0012)
+[2024-07-04 19:34:25,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 13914112. Throughput: 0: 4455.0. Samples: 3478778.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-04 19:34:25,546][02883] Avg episode reward: [(0, '30.042')]
+[2024-07-04 19:34:26,197][04980] Updated weights for policy 0, policy_version 3400 (0.0012)
+[2024-07-04 19:34:28,437][04980] Updated weights for policy 0, policy_version 3410 (0.0012)
+[2024-07-04 19:34:30,543][02883] Fps is (10 sec: 18022.8, 60 sec: 17885.9, 300 sec: 17758.6). Total num frames: 14004224. Throughput: 0: 4466.4. Samples: 3492302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:34:30,545][02883] Avg episode reward: [(0, '28.166')]
+[2024-07-04 19:34:30,736][04980] Updated weights for policy 0, policy_version 3420 (0.0012)
+[2024-07-04 19:34:33,019][04980] Updated weights for policy 0, policy_version 3430 (0.0012)
+[2024-07-04 19:34:35,323][04980] Updated weights for policy 0, policy_version 3440 (0.0012)
+[2024-07-04 19:34:35,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 14090240. Throughput: 0: 4460.4. Samples: 3519136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:34:35,546][02883] Avg episode reward: [(0, '30.302')]
+[2024-07-04 19:34:37,663][04980] Updated weights for policy 0, policy_version 3450 (0.0012)
+[2024-07-04 19:34:39,924][04980] Updated weights for policy 0, policy_version 3460 (0.0012)
+[2024-07-04 19:34:40,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 14180352. Throughput: 0: 4460.4. Samples: 3545712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:34:40,546][02883] Avg episode reward: [(0, '28.894')]
+[2024-07-04 19:34:42,214][04980] Updated weights for policy 0, policy_version 3470 (0.0012)
+[2024-07-04 19:34:44,450][04980] Updated weights for policy 0, policy_version 3480 (0.0012)
+[2024-07-04 19:34:45,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 14270464. Throughput: 0: 4468.7. Samples: 3559360.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:34:45,545][02883] Avg episode reward: [(0, '30.000')]
+[2024-07-04 19:34:46,728][04980] Updated weights for policy 0, policy_version 3490 (0.0012)
+[2024-07-04 19:34:49,067][04980] Updated weights for policy 0, policy_version 3500 (0.0012)
+[2024-07-04 19:34:50,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 14360576. Throughput: 0: 4463.0. Samples: 3586066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:34:50,546][02883] Avg episode reward: [(0, '30.929')]
+[2024-07-04 19:34:51,444][04980] Updated weights for policy 0, policy_version 3510 (0.0012)
+[2024-07-04 19:34:53,707][04980] Updated weights for policy 0, policy_version 3520 (0.0012)
+[2024-07-04 19:34:55,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17885.9, 300 sec: 17758.6). Total num frames: 14450688. Throughput: 0: 4472.0. Samples: 3612842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:34:55,545][02883] Avg episode reward: [(0, '30.885')]
+[2024-07-04 19:34:55,949][04980] Updated weights for policy 0, policy_version 3530 (0.0012)
+[2024-07-04 19:34:58,207][04980] Updated weights for policy 0, policy_version 3540 (0.0012)
+[2024-07-04 19:35:00,461][04980] Updated weights for policy 0, policy_version 3550 (0.0012)
+[2024-07-04 19:35:00,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17885.8, 300 sec: 17772.5). Total num frames: 14540800. Throughput: 0: 4473.6. Samples: 3626470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:35:00,545][02883] Avg episode reward: [(0, '33.169')]
+[2024-07-04 19:35:02,851][04980] Updated weights for policy 0, policy_version 3560 (0.0013)
+[2024-07-04 19:35:05,210][04980] Updated weights for policy 0, policy_version 3570 (0.0013)
+[2024-07-04 19:35:05,543][02883] Fps is (10 sec: 17612.7, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 14626816. Throughput: 0: 4450.9. Samples: 3652600.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:35:05,546][02883] Avg episode reward: [(0, '30.350')]
+[2024-07-04 19:35:05,555][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003571_14626816.pth...
+[2024-07-04 19:35:05,622][04967] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002530_10362880.pth
+[2024-07-04 19:35:07,504][04980] Updated weights for policy 0, policy_version 3580 (0.0012)
+[2024-07-04 19:35:09,758][04980] Updated weights for policy 0, policy_version 3590 (0.0012)
+[2024-07-04 19:35:10,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17885.9, 300 sec: 17758.6). Total num frames: 14716928. Throughput: 0: 4463.8. Samples: 3679648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:35:10,545][02883] Avg episode reward: [(0, '30.239')]
+[2024-07-04 19:35:12,018][04980] Updated weights for policy 0, policy_version 3600 (0.0012)
+[2024-07-04 19:35:14,298][04980] Updated weights for policy 0, policy_version 3610 (0.0012)
+[2024-07-04 19:35:15,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17885.8, 300 sec: 17772.5). Total num frames: 14807040. Throughput: 0: 4464.7. Samples: 3693214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:35:15,545][02883] Avg episode reward: [(0, '29.670')]
+[2024-07-04 19:35:16,637][04980] Updated weights for policy 0, policy_version 3620 (0.0012)
+[2024-07-04 19:35:18,969][04980] Updated weights for policy 0, policy_version 3630 (0.0012)
+[2024-07-04 19:35:20,543][02883] Fps is (10 sec: 17612.9, 60 sec: 17817.6, 300 sec: 17758.6). Total num frames: 14893056. Throughput: 0: 4457.5. Samples: 3719724.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:35:20,545][02883] Avg episode reward: [(0, '29.672')]
+[2024-07-04 19:35:21,230][04980] Updated weights for policy 0, policy_version 3640 (0.0012)
+[2024-07-04 19:35:23,469][04980] Updated weights for policy 0, policy_version 3650 (0.0012)
+[2024-07-04 19:35:25,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17885.9, 300 sec: 17786.4). Total num frames: 14987264. Throughput: 0: 4475.4. Samples: 3747104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:35:25,547][02883] Avg episode reward: [(0, '29.958')]
+[2024-07-04 19:35:25,697][04980] Updated weights for policy 0, policy_version 3660 (0.0011)
+[2024-07-04 19:35:27,946][04980] Updated weights for policy 0, policy_version 3670 (0.0012)
+[2024-07-04 19:35:30,320][04980] Updated weights for policy 0, policy_version 3680 (0.0012)
+[2024-07-04 19:35:30,543][02883] Fps is (10 sec: 18432.0, 60 sec: 17885.9, 300 sec: 17786.4). Total num frames: 15077376. Throughput: 0: 4473.3. Samples: 3760656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:35:30,545][02883] Avg episode reward: [(0, '32.096')]
+[2024-07-04 19:35:32,643][04980] Updated weights for policy 0, policy_version 3690 (0.0012)
+[2024-07-04 19:35:34,951][04980] Updated weights for policy 0, policy_version 3700 (0.0012)
+[2024-07-04 19:35:35,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17885.9, 300 sec: 17786.4). Total num frames: 15163392. Throughput: 0: 4467.8. Samples: 3787116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:35:35,546][02883] Avg episode reward: [(0, '31.601')]
+[2024-07-04 19:35:37,169][04980] Updated weights for policy 0, policy_version 3710 (0.0013)
+[2024-07-04 19:35:39,420][04980] Updated weights for policy 0, policy_version 3720 (0.0012)
+[2024-07-04 19:35:40,544][02883] Fps is (10 sec: 18020.8, 60 sec: 17953.9, 300 sec: 17814.1). Total num frames: 15257600. Throughput: 0: 4477.9. Samples: 3814350.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-07-04 19:35:40,547][02883] Avg episode reward: [(0, '32.127')]
+[2024-07-04 19:35:41,694][04980] Updated weights for policy 0, policy_version 3730 (0.0012)
+[2024-07-04 19:35:44,041][04980] Updated weights for policy 0, policy_version 3740 (0.0013)
+[2024-07-04 19:35:45,543][02883] Fps is (10 sec: 18022.5, 60 sec: 17885.9, 300 sec: 17800.3). Total num frames: 15343616. Throughput: 0: 4466.7. Samples: 3827470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-04 19:35:45,546][02883] Avg episode reward: [(0, '29.134')]
+[2024-07-04 19:35:46,445][04980] Updated weights for policy 0, policy_version 3750 (0.0012)
+[2024-07-04 19:35:48,721][04980] Updated weights for policy 0, policy_version 3760 (0.0012)
+[2024-07-04 19:35:50,543][02883] Fps is (10 sec: 17204.7, 60 sec: 17817.6, 300 sec: 17786.4). Total num frames: 15429632. Throughput: 0: 4470.9. Samples: 3853790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:35:50,546][02883] Avg episode reward: [(0, '26.973')]
+[2024-07-04 19:35:51,100][04980] Updated weights for policy 0, policy_version 3770 (0.0013)
+[2024-07-04 19:35:53,372][04980] Updated weights for policy 0, policy_version 3780 (0.0012)
+[2024-07-04 19:35:55,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17817.6, 300 sec: 17786.4). Total num frames: 15519744. Throughput: 0: 4457.4. Samples: 3880232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:35:55,545][02883] Avg episode reward: [(0, '30.778')]
+[2024-07-04 19:35:55,699][04980] Updated weights for policy 0, policy_version 3790 (0.0012)
+[2024-07-04 19:35:58,111][04980] Updated weights for policy 0, policy_version 3800 (0.0012)
+[2024-07-04 19:36:00,358][04980] Updated weights for policy 0, policy_version 3810 (0.0012)
+[2024-07-04 19:36:00,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17749.3, 300 sec: 17786.4). Total num frames: 15605760. Throughput: 0: 4442.6. Samples: 3893130.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-04 19:36:00,545][02883] Avg episode reward: [(0, '30.990')]
+[2024-07-04 19:36:02,669][04980] Updated weights for policy 0, policy_version 3820 (0.0012)
+[2024-07-04 19:36:04,916][04980] Updated weights for policy 0, policy_version 3830 (0.0012)
+[2024-07-04 19:36:05,543][02883] Fps is (10 sec: 17612.8, 60 sec: 17817.6, 300 sec: 17786.4). Total num frames: 15695872. Throughput: 0: 4455.1. Samples: 3920204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:36:05,546][02883] Avg episode reward: [(0, '31.781')]
+[2024-07-04 19:36:07,216][04980] Updated weights for policy 0, policy_version 3840 (0.0012)
+[2024-07-04 19:36:09,463][04980] Updated weights for policy 0, policy_version 3850 (0.0011)
+[2024-07-04 19:36:10,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17817.6, 300 sec: 17800.2). Total num frames: 15785984. Throughput: 0: 4443.7. Samples: 3947070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:36:10,545][02883] Avg episode reward: [(0, '32.003')]
+[2024-07-04 19:36:11,836][04980] Updated weights for policy 0, policy_version 3860 (0.0012)
+[2024-07-04 19:36:14,152][04980] Updated weights for policy 0, policy_version 3870 (0.0011)
+[2024-07-04 19:36:15,543][02883] Fps is (10 sec: 18022.3, 60 sec: 17817.6, 300 sec: 17800.2). Total num frames: 15876096. Throughput: 0: 4433.9. Samples: 3960180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-07-04 19:36:15,546][02883] Avg episode reward: [(0, '29.995')]
+[2024-07-04 19:36:16,375][04980] Updated weights for policy 0, policy_version 3880 (0.0012)
+[2024-07-04 19:36:18,636][04980] Updated weights for policy 0, policy_version 3890 (0.0012)
+[2024-07-04 19:36:20,543][02883] Fps is (10 sec: 18022.4, 60 sec: 17885.9, 300 sec: 17814.1). Total num frames: 15966208. Throughput: 0: 4450.9. Samples: 3987406.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-07-04 19:36:20,546][02883] Avg episode reward: [(0, '28.421')]
+[2024-07-04 19:36:20,918][04980] Updated weights for policy 0, policy_version 3900 (0.0012)
+[2024-07-04 19:36:22,715][04967] Stopping Batcher_0...
+[2024-07-04 19:36:22,715][02883] Component Batcher_0 stopped!
+[2024-07-04 19:36:22,720][04967] Loop batcher_evt_loop terminating...
+[2024-07-04 19:36:22,716][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
+[2024-07-04 19:36:22,718][02883] Component RolloutWorker_w1 process died already! Don't wait for it.
+[2024-07-04 19:36:22,733][04987] Stopping RolloutWorker_w6...
+[2024-07-04 19:36:22,734][04987] Loop rollout_proc6_evt_loop terminating...
+[2024-07-04 19:36:22,733][02883] Component RolloutWorker_w6 stopped!
+[2024-07-04 19:36:22,735][04980] Weights refcount: 2 0
+[2024-07-04 19:36:22,737][04985] Stopping RolloutWorker_w4...
+[2024-07-04 19:36:22,737][04980] Stopping InferenceWorker_p0-w0...
+[2024-07-04 19:36:22,737][04980] Loop inference_proc0-0_evt_loop terminating...
+[2024-07-04 19:36:22,737][04985] Loop rollout_proc4_evt_loop terminating...
+[2024-07-04 19:36:22,738][04983] Stopping RolloutWorker_w2...
+[2024-07-04 19:36:22,738][04988] Stopping RolloutWorker_w7...
+[2024-07-04 19:36:22,738][04986] Stopping RolloutWorker_w5...
+[2024-07-04 19:36:22,739][04983] Loop rollout_proc2_evt_loop terminating...
+[2024-07-04 19:36:22,739][04988] Loop rollout_proc7_evt_loop terminating...
+[2024-07-04 19:36:22,737][02883] Component RolloutWorker_w4 stopped!
+[2024-07-04 19:36:22,739][04986] Loop rollout_proc5_evt_loop terminating...
+[2024-07-04 19:36:22,740][04984] Stopping RolloutWorker_w3...
+[2024-07-04 19:36:22,739][02883] Component InferenceWorker_p0-w0 stopped!
+[2024-07-04 19:36:22,741][04984] Loop rollout_proc3_evt_loop terminating...
+[2024-07-04 19:36:22,743][04981] Stopping RolloutWorker_w0...
+[2024-07-04 19:36:22,743][02883] Component RolloutWorker_w2 stopped!
+[2024-07-04 19:36:22,744][04981] Loop rollout_proc0_evt_loop terminating...
+[2024-07-04 19:36:22,745][02883] Component RolloutWorker_w7 stopped!
+[2024-07-04 19:36:22,746][02883] Component RolloutWorker_w5 stopped!
+[2024-07-04 19:36:22,748][02883] Component RolloutWorker_w3 stopped!
+[2024-07-04 19:36:22,749][02883] Component RolloutWorker_w0 stopped!
+[2024-07-04 19:36:22,804][04967] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003049_12488704.pth
+[2024-07-04 19:36:22,814][04967] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
+[2024-07-04 19:36:22,939][04967] Stopping LearnerWorker_p0...
+[2024-07-04 19:36:22,939][04967] Loop learner_proc0_evt_loop terminating...
+[2024-07-04 19:36:22,939][02883] Component LearnerWorker_p0 stopped!
+[2024-07-04 19:36:22,942][02883] Waiting for process learner_proc0 to stop...
+[2024-07-04 19:36:23,695][02883] Waiting for process inference_proc0-0 to join...
+[2024-07-04 19:36:23,698][02883] Waiting for process rollout_proc0 to join...
+[2024-07-04 19:36:23,700][02883] Waiting for process rollout_proc1 to join...
+[2024-07-04 19:36:23,701][02883] Waiting for process rollout_proc2 to join...
+[2024-07-04 19:36:23,704][02883] Waiting for process rollout_proc3 to join...
+[2024-07-04 19:36:23,707][02883] Waiting for process rollout_proc4 to join...
+[2024-07-04 19:36:23,709][02883] Waiting for process rollout_proc5 to join...
+[2024-07-04 19:36:23,711][02883] Waiting for process rollout_proc6 to join...
+[2024-07-04 19:36:23,714][02883] Waiting for process rollout_proc7 to join...
+[2024-07-04 19:36:23,716][02883] Batcher 0 profile tree view:
+batching: 67.6331, releasing_batches: 0.0949
+[2024-07-04 19:36:23,717][02883] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+  wait_policy_total: 11.6512
+update_model: 14.0208
+  weight_update: 0.0012
+one_step: 0.0027
+  handle_policy_step: 837.8563
+    deserialize: 33.5072, stack: 5.5337, obs_to_device_normalize: 195.7815, forward: 416.3114, send_messages: 52.4694
+    prepare_outputs: 96.2869
+      to_cpu: 57.8186
+[2024-07-04 19:36:23,719][02883] Learner 0 profile tree view:
+misc: 0.0198, prepare_batch: 21.5928
+train: 67.2959
+  epoch_init: 0.0230, minibatch_init: 0.0232, losses_postprocess: 1.9503, kl_divergence: 1.3760, after_optimizer: 8.0212
+  calculate_losses: 30.6346
+    losses_init: 0.0145, forward_head: 2.4632, bptt_initial: 15.4775, tail: 2.3389, advantages_returns: 0.6024, losses: 4.4506
+    bptt: 4.6135
+      bptt_forward_core: 4.3970
+  update: 23.9488
+    clip: 2.6209
+[2024-07-04 19:36:23,720][02883] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.6723, enqueue_policy_requests: 31.0428, env_step: 592.4949, overhead: 27.9295, complete_rollouts: 1.0218
+save_policy_outputs: 38.8391
+  split_output_tensors: 15.8166
+[2024-07-04 19:36:23,721][02883] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.6753, enqueue_policy_requests: 30.9580, env_step: 592.8407, overhead: 27.4947, complete_rollouts: 1.0352
+save_policy_outputs: 38.7635
+  split_output_tensors: 15.6677
+[2024-07-04 19:36:23,726][02883] Loop Runner_EvtLoop terminating...
+[2024-07-04 19:36:23,727][02883] Runner profile tree view:
+main_loop: 913.7020
+[2024-07-04 19:36:23,728][02883] Collected {0: 16007168}, FPS: 17519.0
+[2024-07-04 19:36:23,748][02883] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-07-04 19:36:23,749][02883] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-07-04 19:36:23,751][02883] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-07-04 19:36:23,752][02883] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-07-04 19:36:23,753][02883] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-07-04 19:36:23,755][02883] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-07-04 19:36:23,756][02883] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-07-04 19:36:23,757][02883] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-07-04 19:36:23,758][02883] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-07-04 19:36:23,760][02883] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-07-04 19:36:23,761][02883] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-07-04 19:36:23,762][02883] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-07-04 19:36:23,763][02883] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-07-04 19:36:23,764][02883] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-07-04 19:36:23,766][02883] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-07-04 19:36:23,795][02883] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-04 19:36:23,799][02883] RunningMeanStd input shape: (3, 72, 128)
+[2024-07-04 19:36:23,801][02883] RunningMeanStd input shape: (1,)
+[2024-07-04 19:36:23,816][02883] ConvEncoder: input_channels=3
+[2024-07-04 19:36:23,930][02883] Conv encoder output size: 512
+[2024-07-04 19:36:23,932][02883] Policy head output size: 512
+[2024-07-04 19:36:24,086][02883] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
+[2024-07-04 19:36:24,863][02883] Num frames 100...
+[2024-07-04 19:36:24,989][02883] Num frames 200...
+[2024-07-04 19:36:25,113][02883] Num frames 300...
+[2024-07-04 19:36:25,242][02883] Num frames 400...
+[2024-07-04 19:36:25,372][02883] Num frames 500...
+[2024-07-04 19:36:25,497][02883] Num frames 600...
+[2024-07-04 19:36:25,625][02883] Num frames 700...
+[2024-07-04 19:36:25,754][02883] Num frames 800...
+[2024-07-04 19:36:25,881][02883] Num frames 900...
+[2024-07-04 19:36:26,005][02883] Num frames 1000...
+[2024-07-04 19:36:26,129][02883] Num frames 1100...
+[2024-07-04 19:36:26,258][02883] Num frames 1200...
+[2024-07-04 19:36:26,383][02883] Num frames 1300...
+[2024-07-04 19:36:26,510][02883] Num frames 1400...
+[2024-07-04 19:36:26,636][02883] Num frames 1500...
+[2024-07-04 19:36:26,762][02883] Num frames 1600...
+[2024-07-04 19:36:26,888][02883] Num frames 1700...
+[2024-07-04 19:36:27,010][02883] Num frames 1800...
+[2024-07-04 19:36:27,137][02883] Num frames 1900...
+[2024-07-04 19:36:27,262][02883] Num frames 2000...
+[2024-07-04 19:36:27,343][02883] Avg episode rewards: #0: 52.199, true rewards: #0: 20.200
+[2024-07-04 19:36:27,344][02883] Avg episode reward: 52.199, avg true_objective: 20.200
+[2024-07-04 19:36:27,449][02883] Num frames 2100...
+[2024-07-04 19:36:27,576][02883] Num frames 2200...
+[2024-07-04 19:36:27,706][02883] Num frames 2300...
+[2024-07-04 19:36:27,834][02883] Num frames 2400...
+[2024-07-04 19:36:27,959][02883] Num frames 2500...
+[2024-07-04 19:36:28,087][02883] Num frames 2600...
+[2024-07-04 19:36:28,216][02883] Num frames 2700...
+[2024-07-04 19:36:28,344][02883] Num frames 2800...
+[2024-07-04 19:36:28,474][02883] Num frames 2900...
+[2024-07-04 19:36:28,600][02883] Num frames 3000...
+[2024-07-04 19:36:28,729][02883] Num frames 3100...
+[2024-07-04 19:36:28,858][02883] Num frames 3200...
+[2024-07-04 19:36:28,987][02883] Num frames 3300...
+[2024-07-04 19:36:29,114][02883] Num frames 3400...
+[2024-07-04 19:36:29,243][02883] Num frames 3500...
+[2024-07-04 19:36:29,324][02883] Avg episode rewards: #0: 47.099, true rewards: #0: 17.600
+[2024-07-04 19:36:29,326][02883] Avg episode reward: 47.099, avg true_objective: 17.600
+[2024-07-04 19:36:29,429][02883] Num frames 3600...
+[2024-07-04 19:36:29,556][02883] Num frames 3700...
+[2024-07-04 19:36:29,683][02883] Num frames 3800...
+[2024-07-04 19:36:29,809][02883] Num frames 3900...
+[2024-07-04 19:36:29,936][02883] Num frames 4000...
+[2024-07-04 19:36:30,061][02883] Num frames 4100...
+[2024-07-04 19:36:30,189][02883] Num frames 4200...
+[2024-07-04 19:36:30,319][02883] Num frames 4300...
+[2024-07-04 19:36:30,445][02883] Num frames 4400...
+[2024-07-04 19:36:30,572][02883] Num frames 4500...
+[2024-07-04 19:36:30,696][02883] Num frames 4600...
+[2024-07-04 19:36:30,820][02883] Num frames 4700...
+[2024-07-04 19:36:30,946][02883] Num frames 4800...
+[2024-07-04 19:36:31,078][02883] Avg episode rewards: #0: 42.879, true rewards: #0: 16.213
+[2024-07-04 19:36:31,080][02883] Avg episode reward: 42.879, avg true_objective: 16.213
+[2024-07-04 19:36:31,128][02883] Num frames 4900...
+[2024-07-04 19:36:31,257][02883] Num frames 5000...
+[2024-07-04 19:36:31,383][02883] Num frames 5100...
+[2024-07-04 19:36:31,508][02883] Num frames 5200...
+[2024-07-04 19:36:31,635][02883] Num frames 5300...
+[2024-07-04 19:36:31,760][02883] Num frames 5400...
+[2024-07-04 19:36:31,889][02883] Num frames 5500...
+[2024-07-04 19:36:32,014][02883] Num frames 5600...
+[2024-07-04 19:36:32,140][02883] Num frames 5700...
+[2024-07-04 19:36:32,270][02883] Num frames 5800...
+[2024-07-04 19:36:32,398][02883] Num frames 5900...
+[2024-07-04 19:36:32,523][02883] Num frames 6000...
+[2024-07-04 19:36:32,648][02883] Num frames 6100...
+[2024-07-04 19:36:32,775][02883] Num frames 6200...
+[2024-07-04 19:36:32,906][02883] Num frames 6300...
+[2024-07-04 19:36:33,031][02883] Avg episode rewards: #0: 40.885, true rewards: #0: 15.885
+[2024-07-04 19:36:33,033][02883] Avg episode reward: 40.885, avg true_objective: 15.885
+[2024-07-04 19:36:33,092][02883] Num frames 6400...
+[2024-07-04 19:36:33,224][02883] Num frames 6500...
+[2024-07-04 19:36:33,350][02883] Num frames 6600...
+[2024-07-04 19:36:33,558][02883] Num frames 6700...
+[2024-07-04 19:36:33,694][02883] Num frames 6800...
+[2024-07-04 19:36:33,831][02883] Num frames 6900...
+[2024-07-04 19:36:33,967][02883] Num frames 7000...
+[2024-07-04 19:36:34,100][02883] Num frames 7100...
+[2024-07-04 19:36:34,227][02883] Num frames 7200...
+[2024-07-04 19:36:34,354][02883] Num frames 7300...
+[2024-07-04 19:36:34,485][02883] Num frames 7400...
+[2024-07-04 19:36:34,619][02883] Num frames 7500...
+[2024-07-04 19:36:34,756][02883] Num frames 7600...
+[2024-07-04 19:36:34,889][02883] Num frames 7700...
+[2024-07-04 19:36:35,025][02883] Num frames 7800...
+[2024-07-04 19:36:35,165][02883] Num frames 7900...
+[2024-07-04 19:36:35,299][02883] Num frames 8000...
+[2024-07-04 19:36:35,437][02883] Num frames 8100...
+[2024-07-04 19:36:35,577][02883] Num frames 8200...
+[2024-07-04 19:36:35,715][02883] Num frames 8300...
+[2024-07-04 19:36:35,850][02883] Num frames 8400...
+[2024-07-04 19:36:35,909][02883] Avg episode rewards: #0: 43.603, true rewards: #0: 16.804
+[2024-07-04 19:36:35,910][02883] Avg episode reward: 43.603, avg true_objective: 16.804
+[2024-07-04 19:36:36,036][02883] Num frames 8500...
+[2024-07-04 19:36:36,168][02883] Num frames 8600...
+[2024-07-04 19:36:36,302][02883] Num frames 8700...
+[2024-07-04 19:36:36,431][02883] Num frames 8800...
+[2024-07-04 19:36:36,557][02883] Num frames 8900...
+[2024-07-04 19:36:36,681][02883] Num frames 9000...
+[2024-07-04 19:36:36,816][02883] Num frames 9100...
+[2024-07-04 19:36:36,949][02883] Num frames 9200...
+[2024-07-04 19:36:37,083][02883] Num frames 9300...
+[2024-07-04 19:36:37,223][02883] Num frames 9400...
+[2024-07-04 19:36:37,303][02883] Avg episode rewards: #0: 40.363, true rewards: #0: 15.697
+[2024-07-04 19:36:37,304][02883] Avg episode reward: 40.363, avg true_objective: 15.697
+[2024-07-04 19:36:37,412][02883] Num frames 9500...
+[2024-07-04 19:36:37,538][02883] Num frames 9600...
+[2024-07-04 19:36:37,666][02883] Num frames 9700...
+[2024-07-04 19:36:37,789][02883] Num frames 9800...
+[2024-07-04 19:36:37,917][02883] Num frames 9900...
+[2024-07-04 19:36:38,041][02883] Num frames 10000...
+[2024-07-04 19:36:38,170][02883] Num frames 10100...
+[2024-07-04 19:36:38,294][02883] Num frames 10200...
+[2024-07-04 19:36:38,420][02883] Num frames 10300...
+[2024-07-04 19:36:38,544][02883] Num frames 10400...
+[2024-07-04 19:36:38,672][02883] Num frames 10500...
+[2024-07-04 19:36:38,798][02883] Num frames 10600...
+[2024-07-04 19:36:38,929][02883] Num frames 10700...
+[2024-07-04 19:36:39,057][02883] Num frames 10800...
+[2024-07-04 19:36:39,181][02883] Num frames 10900...
+[2024-07-04 19:36:39,306][02883] Num frames 11000...
+[2024-07-04 19:36:39,434][02883] Num frames 11100...
+[2024-07-04 19:36:39,561][02883] Num frames 11200...
+[2024-07-04 19:36:39,686][02883] Num frames 11300...
+[2024-07-04 19:36:39,790][02883] Avg episode rewards: #0: 41.197, true rewards: #0: 16.197
+[2024-07-04 19:36:39,791][02883] Avg episode reward: 41.197, avg true_objective: 16.197
+[2024-07-04 19:36:39,873][02883] Num frames 11400...
+[2024-07-04 19:36:39,996][02883] Num frames 11500...
+[2024-07-04 19:36:40,122][02883] Num frames 11600...
+[2024-07-04 19:36:40,247][02883] Num frames 11700...
+[2024-07-04 19:36:40,371][02883] Num frames 11800...
+[2024-07-04 19:36:40,495][02883] Num frames 11900...
+[2024-07-04 19:36:40,622][02883] Num frames 12000...
+[2024-07-04 19:36:40,750][02883] Num frames 12100...
+[2024-07-04 19:36:40,879][02883] Num frames 12200...
+[2024-07-04 19:36:41,012][02883] Num frames 12300...
+[2024-07-04 19:36:41,139][02883] Num frames 12400...
+[2024-07-04 19:36:41,263][02883] Num frames 12500...
+[2024-07-04 19:36:41,390][02883] Num frames 12600...
+[2024-07-04 19:36:41,518][02883] Num frames 12700...
+[2024-07-04 19:36:41,642][02883] Num frames 12800...
+[2024-07-04 19:36:41,770][02883] Avg episode rewards: #0: 41.072, true rewards: #0: 16.072
+[2024-07-04 19:36:41,772][02883] Avg episode reward: 41.072, avg true_objective: 16.072
+[2024-07-04 19:36:41,828][02883] Num frames 12900...
+[2024-07-04 19:36:41,954][02883] Num frames 13000...
+[2024-07-04 19:36:42,078][02883] Num frames 13100...
+[2024-07-04 19:36:42,207][02883] Num frames 13200...
+[2024-07-04 19:36:42,335][02883] Num frames 13300...
+[2024-07-04 19:36:42,469][02883] Num frames 13400...
+[2024-07-04 19:36:42,595][02883] Num frames 13500...
+[2024-07-04 19:36:42,724][02883] Num frames 13600...
+[2024-07-04 19:36:42,855][02883] Num frames 13700...
+[2024-07-04 19:36:42,981][02883] Num frames 13800...
+[2024-07-04 19:36:43,110][02883] Num frames 13900...
+[2024-07-04 19:36:43,239][02883] Num frames 14000...
+[2024-07-04 19:36:43,365][02883] Num frames 14100...
+[2024-07-04 19:36:43,454][02883] Avg episode rewards: #0: 40.029, true rewards: #0: 15.696
+[2024-07-04 19:36:43,455][02883] Avg episode reward: 40.029, avg true_objective: 15.696
+[2024-07-04 19:36:43,549][02883] Num frames 14200...
+[2024-07-04 19:36:43,675][02883] Num frames 14300...
+[2024-07-04 19:36:43,803][02883] Num frames 14400...
+[2024-07-04 19:36:43,936][02883] Num frames 14500...
+[2024-07-04 19:36:44,070][02883] Num frames 14600...
+[2024-07-04 19:36:44,205][02883] Num frames 14700...
+[2024-07-04 19:36:44,338][02883] Num frames 14800...
+[2024-07-04 19:36:44,472][02883] Num frames 14900...
+[2024-07-04 19:36:44,605][02883] Avg episode rewards: #0: 37.958, true rewards: #0: 14.958
+[2024-07-04 19:36:44,607][02883] Avg episode reward: 37.958, avg true_objective: 14.958
+[2024-07-04 19:37:20,099][02883] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-07-04 19:37:20,252][02883] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-07-04 19:37:20,253][02883] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-07-04 19:37:20,255][02883] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-07-04 19:37:20,256][02883] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-07-04 19:37:20,258][02883] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-07-04 19:37:20,259][02883] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-07-04 19:37:20,261][02883] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-07-04 19:37:20,263][02883] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-07-04 19:37:20,263][02883] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-07-04 19:37:20,264][02883] Adding new argument 'hf_repository'='Hamze-Hammami/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-07-04 19:37:20,266][02883] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-07-04 19:37:20,268][02883] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-07-04 19:37:20,269][02883] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-07-04 19:37:20,270][02883] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-07-04 19:37:20,270][02883] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-07-04 19:37:20,295][02883] RunningMeanStd input shape: (3, 72, 128)
+[2024-07-04 19:37:20,297][02883] RunningMeanStd input shape: (1,)
+[2024-07-04 19:37:20,309][02883] ConvEncoder: input_channels=3
+[2024-07-04 19:37:20,347][02883] Conv encoder output size: 512
+[2024-07-04 19:37:20,349][02883] Policy head output size: 512
+[2024-07-04 19:37:20,367][02883] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
+[2024-07-04 19:37:20,795][02883] Num frames 100...
+[2024-07-04 19:37:20,946][02883] Num frames 200...
+[2024-07-04 19:37:21,093][02883] Num frames 300...
+[2024-07-04 19:37:21,224][02883] Num frames 400...
+[2024-07-04 19:37:21,361][02883] Num frames 500...
+[2024-07-04 19:37:21,493][02883] Num frames 600...
+[2024-07-04 19:37:21,626][02883] Num frames 700...
+[2024-07-04 19:37:21,763][02883] Num frames 800...
+[2024-07-04 19:37:21,900][02883] Num frames 900...
+[2024-07-04 19:37:22,039][02883] Num frames 1000...
+[2024-07-04 19:37:22,171][02883] Num frames 1100...
+[2024-07-04 19:37:22,307][02883] Num frames 1200...
+[2024-07-04 19:37:22,443][02883] Num frames 1300...
+[2024-07-04 19:37:22,574][02883] Num frames 1400...
+[2024-07-04 19:37:22,709][02883] Num frames 1500...
+[2024-07-04 19:37:22,844][02883] Num frames 1600...
+[2024-07-04 19:37:22,980][02883] Num frames 1700...
+[2024-07-04 19:37:23,113][02883] Num frames 1800...
+[2024-07-04 19:37:23,246][02883] Num frames 1900...
+[2024-07-04 19:37:23,381][02883] Num frames 2000...
+[2024-07-04 19:37:23,517][02883] Num frames 2100...
+[2024-07-04 19:37:23,569][02883] Avg episode rewards: #0: 54.999, true rewards: #0: 21.000
+[2024-07-04 19:37:23,570][02883] Avg episode reward: 54.999, avg true_objective: 21.000
+[2024-07-04 19:37:23,704][02883] Num frames 2200...
+[2024-07-04 19:37:23,836][02883] Num frames 2300...
+[2024-07-04 19:37:23,973][02883] Num frames 2400...
+[2024-07-04 19:37:24,100][02883] Num frames 2500...
+[2024-07-04 19:37:24,226][02883] Num frames 2600...
+[2024-07-04 19:37:24,349][02883] Num frames 2700...
+[2024-07-04 19:37:24,473][02883] Num frames 2800...
+[2024-07-04 19:37:24,572][02883] Avg episode rewards: #0: 35.679, true rewards: #0: 14.180
+[2024-07-04 19:37:24,574][02883] Avg episode reward: 35.679, avg true_objective: 14.180
+[2024-07-04 19:37:24,655][02883] Num frames 2900...
+[2024-07-04 19:37:24,778][02883] Num frames 3000...
+[2024-07-04 19:37:24,904][02883] Num frames 3100...
+[2024-07-04 19:37:25,029][02883] Num frames 3200...
+[2024-07-04 19:37:25,155][02883] Num frames 3300...
+[2024-07-04 19:37:25,278][02883] Num frames 3400...
+[2024-07-04 19:37:25,402][02883] Num frames 3500...
+[2024-07-04 19:37:25,526][02883] Num frames 3600...
+[2024-07-04 19:37:25,651][02883] Num frames 3700...
+[2024-07-04 19:37:25,775][02883] Num frames 3800...
+[2024-07-04 19:37:25,900][02883] Num frames 3900...
+[2024-07-04 19:37:26,027][02883] Num frames 4000...
+[2024-07-04 19:37:26,152][02883] Num frames 4100...
+[2024-07-04 19:37:26,277][02883] Num frames 4200...
+[2024-07-04 19:37:26,403][02883] Num frames 4300...
+[2024-07-04 19:37:26,527][02883] Num frames 4400...
+[2024-07-04 19:37:26,651][02883] Num frames 4500...
+[2024-07-04 19:37:26,776][02883] Num frames 4600...
+[2024-07-04 19:37:26,945][02883] Avg episode rewards: #0: 39.303, true rewards: #0: 15.637
+[2024-07-04 19:37:26,947][02883] Avg episode reward: 39.303, avg true_objective: 15.637
+[2024-07-04 19:37:26,961][02883] Num frames 4700...
+[2024-07-04 19:37:27,085][02883] Num frames 4800...
+[2024-07-04 19:37:27,218][02883] Num frames 4900...
+[2024-07-04 19:37:27,351][02883] Num frames 5000...
+[2024-07-04 19:37:27,481][02883] Num frames 5100...
+[2024-07-04 19:37:27,614][02883] Num frames 5200...
+[2024-07-04 19:37:27,743][02883] Num frames 5300...
+[2024-07-04 19:37:27,882][02883] Num frames 5400...
+[2024-07-04 19:37:28,016][02883] Num frames 5500...
+[2024-07-04 19:37:28,142][02883] Num frames 5600...
+[2024-07-04 19:37:28,266][02883] Num frames 5700...
+[2024-07-04 19:37:28,392][02883] Num frames 5800...
+[2024-07-04 19:37:28,515][02883] Num frames 5900...
+[2024-07-04 19:37:28,640][02883] Num frames 6000...
+[2024-07-04 19:37:28,763][02883] Num frames 6100...
+[2024-07-04 19:37:28,889][02883] Num frames 6200...
+[2024-07-04 19:37:29,018][02883] Num frames 6300...
+[2024-07-04 19:37:29,156][02883] Num frames 6400...
+[2024-07-04 19:37:29,321][02883] Avg episode rewards: #0: 40.957, true rewards: #0: 16.208
+[2024-07-04 19:37:29,323][02883] Avg episode reward: 40.957, avg true_objective: 16.208
+[2024-07-04 19:37:29,346][02883] Num frames 6500...
+[2024-07-04 19:37:29,466][02883] Num frames 6600...
+[2024-07-04 19:37:29,592][02883] Num frames 6700...
+[2024-07-04 19:37:29,717][02883] Num frames 6800...
+[2024-07-04 19:37:29,843][02883] Num frames 6900...
+[2024-07-04 19:37:29,970][02883] Num frames 7000...
+[2024-07-04 19:37:30,095][02883] Num frames 7100...
+[2024-07-04 19:37:30,220][02883] Num frames 7200...
+[2024-07-04 19:37:30,345][02883] Num frames 7300...
+[2024-07-04 19:37:30,470][02883] Num frames 7400...
+[2024-07-04 19:37:30,597][02883] Num frames 7500...
+[2024-07-04 19:37:30,722][02883] Num frames 7600...
+[2024-07-04 19:37:30,821][02883] Avg episode rewards: #0: 38.470, true rewards: #0: 15.270
+[2024-07-04 19:37:30,822][02883] Avg episode reward: 38.470, avg true_objective: 15.270
+[2024-07-04 19:37:30,904][02883] Num frames 7700...
+[2024-07-04 19:37:31,029][02883] Num frames 7800...
+[2024-07-04 19:37:31,157][02883] Num frames 7900...
+[2024-07-04 19:37:31,286][02883] Num frames 8000...
+[2024-07-04 19:37:31,413][02883] Num frames 8100...
+[2024-07-04 19:37:31,538][02883] Num frames 8200...
+[2024-07-04 19:37:31,663][02883] Num frames 8300...
+[2024-07-04 19:37:31,735][02883] Avg episode rewards: #0: 34.355, true rewards: #0: 13.855
+[2024-07-04 19:37:31,737][02883] Avg episode reward: 34.355, avg true_objective: 13.855
+[2024-07-04 19:37:31,852][02883] Num frames 8400...
+[2024-07-04 19:37:31,977][02883] Num frames 8500...
+[2024-07-04 19:37:32,102][02883] Num frames 8600...
+[2024-07-04 19:37:32,224][02883] Num frames 8700...
+[2024-07-04 19:37:32,351][02883] Num frames 8800...
+[2024-07-04 19:37:32,479][02883] Num frames 8900...
+[2024-07-04 19:37:32,608][02883] Num frames 9000...
+[2024-07-04 19:37:32,736][02883] Num frames 9100...
+[2024-07-04 19:37:32,861][02883] Num frames 9200...
+[2024-07-04 19:37:32,986][02883] Num frames 9300...
+[2024-07-04 19:37:33,110][02883] Num frames 9400...
+[2024-07-04 19:37:33,235][02883] Num frames 9500...
+[2024-07-04 19:37:33,362][02883] Num frames 9600...
+[2024-07-04 19:37:33,489][02883] Num frames 9700...
+[2024-07-04 19:37:33,628][02883] Num frames 9800...
+[2024-07-04 19:37:33,706][02883] Avg episode rewards: #0: 34.024, true rewards: #0: 14.024
+[2024-07-04 19:37:33,708][02883] Avg episode reward: 34.024, avg true_objective: 14.024
+[2024-07-04 19:37:33,825][02883] Num frames 9900...
+[2024-07-04 19:37:33,958][02883] Num frames 10000...
+[2024-07-04 19:37:34,085][02883] Num frames 10100...
+[2024-07-04 19:37:34,223][02883] Num frames 10200...
+[2024-07-04 19:37:34,359][02883] Num frames 10300...
+[2024-07-04 19:37:34,487][02883] Num frames 10400...
+[2024-07-04 19:37:34,613][02883] Num frames 10500...
+[2024-07-04 19:37:34,744][02883] Num frames 10600...
+[2024-07-04 19:37:34,942][02883] Num frames 10700...
+[2024-07-04 19:37:35,021][02883] Avg episode rewards: #0: 31.891, true rewards: #0: 13.391
+[2024-07-04 19:37:35,023][02883] Avg episode reward: 31.891, avg true_objective: 13.391
+[2024-07-04 19:37:35,131][02883] Num frames 10800...
+[2024-07-04 19:37:35,265][02883] Num frames 10900...
+[2024-07-04 19:37:35,399][02883] Num frames 11000...
+[2024-07-04 19:37:35,532][02883] Num frames 11100...
+[2024-07-04 19:37:35,666][02883] Num frames 11200...
+[2024-07-04 19:37:35,838][02883] Avg episode rewards: #0: 29.432, true rewards: #0: 12.543
+[2024-07-04 19:37:35,840][02883] Avg episode reward: 29.432, avg true_objective: 12.543
+[2024-07-04 19:37:35,856][02883] Num frames 11300...
+[2024-07-04 19:37:35,983][02883] Num frames 11400...
+[2024-07-04 19:37:36,109][02883] Num frames 11500...
+[2024-07-04 19:37:36,240][02883] Num frames 11600...
+[2024-07-04 19:37:36,367][02883] Num frames 11700...
+[2024-07-04 19:37:36,492][02883] Num frames 11800...
+[2024-07-04 19:37:36,550][02883] Avg episode rewards: #0: 27.401, true rewards: #0: 11.801
+[2024-07-04 19:37:36,551][02883] Avg episode reward: 27.401, avg true_objective: 11.801
+[2024-07-04 19:38:04,539][02883] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-07-04 19:38:35,488][02883] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-07-04 19:38:35,489][02883] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-07-04 19:38:35,491][02883] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-07-04 19:38:35,493][02883] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-07-04 19:38:35,494][02883] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-07-04 19:38:35,496][02883] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-07-04 19:38:35,497][02883] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-07-04 19:38:35,499][02883] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-07-04 19:38:35,500][02883] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-07-04 19:38:35,502][02883] Adding new argument 'hf_repository'='Hamze-Hammami/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-07-04 19:38:35,503][02883] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-07-04 19:38:35,504][02883] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-07-04 19:38:35,505][02883] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-07-04 19:38:35,506][02883] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-07-04 19:38:35,507][02883] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-07-04 19:38:35,539][02883] RunningMeanStd input shape: (3, 72, 128)
+[2024-07-04 19:38:35,541][02883] RunningMeanStd input shape: (1,)
+[2024-07-04 19:38:35,555][02883] ConvEncoder: input_channels=3
+[2024-07-04 19:38:35,594][02883] Conv encoder output size: 512
+[2024-07-04 19:38:35,597][02883] Policy head output size: 512
+[2024-07-04 19:38:35,616][02883] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
+[2024-07-04 19:38:36,050][02883] Num frames 100...
+[2024-07-04 19:38:36,177][02883] Num frames 200...
+[2024-07-04 19:38:36,303][02883] Num frames 300...
+[2024-07-04 19:38:36,502][02883] Num frames 400...
+[2024-07-04 19:38:36,628][02883] Num frames 500...
+[2024-07-04 19:38:36,759][02883] Num frames 600...
+[2024-07-04 19:38:36,894][02883] Num frames 700...
+[2024-07-04 19:38:37,028][02883] Num frames 800...
+[2024-07-04 19:38:37,161][02883] Num frames 900...
+[2024-07-04 19:38:37,294][02883] Num frames 1000...
+[2024-07-04 19:38:37,425][02883] Num frames 1100...
+[2024-07-04 19:38:37,558][02883] Num frames 1200...
+[2024-07-04 19:38:37,693][02883] Num frames 1300...
+[2024-07-04 19:38:37,827][02883] Num frames 1400...
+[2024-07-04 19:38:37,960][02883] Num frames 1500...
+[2024-07-04 19:38:38,098][02883] Num frames 1600...
+[2024-07-04 19:38:38,233][02883] Num frames 1700...
+[2024-07-04 19:38:38,367][02883] Num frames 1800...
+[2024-07-04 19:38:38,500][02883] Num frames 1900...
+[2024-07-04 19:38:38,624][02883] Avg episode rewards: #0: 53.519, true rewards: #0: 19.520
+[2024-07-04 19:38:38,625][02883] Avg episode reward: 53.519, avg true_objective: 19.520
+[2024-07-04 19:38:38,686][02883] Num frames 2000...
+[2024-07-04 19:38:38,812][02883] Num frames 2100...
+[2024-07-04 19:38:38,947][02883] Num frames 2200...
+[2024-07-04 19:38:39,073][02883] Num frames 2300...
+[2024-07-04 19:38:39,196][02883] Num frames 2400...
+[2024-07-04 19:38:39,321][02883] Num frames 2500...
+[2024-07-04 19:38:39,447][02883] Num frames 2600...
+[2024-07-04 19:38:39,572][02883] Num frames 2700...
+[2024-07-04 19:38:39,697][02883] Num frames 2800...
+[2024-07-04 19:38:39,825][02883] Num frames 2900...
+[2024-07-04 19:38:39,951][02883] Num frames 3000...
+[2024-07-04 19:38:40,078][02883] Num frames 3100...
+[2024-07-04 19:38:40,203][02883] Num frames 3200...
+[2024-07-04 19:38:40,330][02883] Num frames 3300...
+[2024-07-04 19:38:40,456][02883] Num frames 3400...
+[2024-07-04 19:38:40,583][02883] Num frames 3500...
+[2024-07-04 19:38:40,709][02883] Num frames 3600...
+[2024-07-04 19:38:40,866][02883] Avg episode rewards: #0: 48.399, true rewards: #0: 18.400
+[2024-07-04 19:38:40,868][02883] Avg episode reward: 48.399, avg true_objective: 18.400
+[2024-07-04 19:38:40,896][02883] Num frames 3700...
+[2024-07-04 19:38:41,024][02883] Num frames 3800...
+[2024-07-04 19:38:41,154][02883] Num frames 3900...
+[2024-07-04 19:38:41,280][02883] Num frames 4000...
+[2024-07-04 19:38:41,408][02883] Num frames 4100...
+[2024-07-04 19:38:41,538][02883] Num frames 4200...
+[2024-07-04 19:38:41,669][02883] Num frames 4300...
+[2024-07-04 19:38:41,797][02883] Num frames 4400...
+[2024-07-04 19:38:41,926][02883] Num frames 4500...
+[2024-07-04 19:38:42,058][02883] Num frames 4600...
+[2024-07-04 19:38:42,190][02883] Num frames 4700...
+[2024-07-04 19:38:42,327][02883] Num frames 4800...
+[2024-07-04 19:38:42,454][02883] Num frames 4900...
+[2024-07-04 19:38:42,583][02883] Num frames 5000...
+[2024-07-04 19:38:42,720][02883] Num frames 5100...
+[2024-07-04 19:38:42,845][02883] Avg episode rewards: #0: 43.839, true rewards: #0: 17.173
+[2024-07-04 19:38:42,846][02883] Avg episode reward: 43.839, avg true_objective: 17.173
+[2024-07-04 19:38:42,921][02883] Num frames 5200...
+[2024-07-04 19:38:43,051][02883] Num frames 5300...
+[2024-07-04 19:38:43,178][02883] Num frames 5400...
+[2024-07-04 19:38:43,307][02883] Num frames 5500...
+[2024-07-04 19:38:43,434][02883] Num frames 5600...
+[2024-07-04 19:38:43,563][02883] Num frames 5700...
+[2024-07-04 19:38:43,687][02883] Num frames 5800...
+[2024-07-04 19:38:43,815][02883] Num frames 5900...
+[2024-07-04 19:38:43,941][02883] Num frames 6000...
+[2024-07-04 19:38:44,069][02883] Num frames 6100...
+[2024-07-04 19:38:44,197][02883] Num frames 6200...
+[2024-07-04 19:38:44,324][02883] Num frames 6300...
+[2024-07-04 19:38:44,449][02883] Num frames 6400...
+[2024-07-04 19:38:44,578][02883] Num frames 6500...
+[2024-07-04 19:38:44,706][02883] Num frames 6600...
+[2024-07-04 19:38:44,827][02883] Avg episode rewards: #0: 42.877, true rewards: #0: 16.627
+[2024-07-04 19:38:44,828][02883] Avg episode reward: 42.877, avg true_objective: 16.627
+[2024-07-04 19:38:44,893][02883] Num frames 6700...
+[2024-07-04 19:38:45,018][02883] Num frames 6800...
+[2024-07-04 19:38:45,144][02883] Num frames 6900...
+[2024-07-04 19:38:45,272][02883] Num frames 7000...
+[2024-07-04 19:38:45,401][02883] Num frames 7100...
+[2024-07-04 19:38:45,529][02883] Num frames 7200...
+[2024-07-04 19:38:45,657][02883] Num frames 7300...
+[2024-07-04 19:38:45,784][02883] Num frames 7400...
+[2024-07-04 19:38:45,912][02883] Num frames 7500...
+[2024-07-04 19:38:46,038][02883] Num frames 7600...
+[2024-07-04 19:38:46,165][02883] Num frames 7700...
+[2024-07-04 19:38:46,290][02883] Num frames 7800...
+[2024-07-04 19:38:46,416][02883] Num frames 7900...
+[2024-07-04 19:38:46,543][02883] Num frames 8000...
+[2024-07-04 19:38:46,669][02883] Num frames 8100...
+[2024-07-04 19:38:46,798][02883] Num frames 8200...
+[2024-07-04 19:38:46,935][02883] Num frames 8300...
+[2024-07-04 19:38:47,062][02883] Num frames 8400...
+[2024-07-04 19:38:47,194][02883] Num frames 8500...
+[2024-07-04 19:38:47,323][02883] Num frames 8600...
+[2024-07-04 19:38:47,449][02883] Num frames 8700...
+[2024-07-04 19:38:47,570][02883] Avg episode rewards: #0: 44.901, true rewards: #0: 17.502
+[2024-07-04 19:38:47,571][02883] Avg episode reward: 44.901, avg true_objective: 17.502
+[2024-07-04 19:38:47,633][02883] Num frames 8800...
+[2024-07-04 19:38:47,757][02883] Num frames 8900...
+[2024-07-04 19:38:47,883][02883] Num frames 9000...
+[2024-07-04 19:38:48,010][02883] Num frames 9100...
+[2024-07-04 19:38:48,136][02883] Num frames 9200...
+[2024-07-04 19:38:48,262][02883] Num frames 9300...
+[2024-07-04 19:38:48,388][02883] Num frames 9400...
+[2024-07-04 19:38:48,515][02883] Num frames 9500...
+[2024-07-04 19:38:48,595][02883] Avg episode rewards: #0: 40.198, true rewards: #0: 15.865
+[2024-07-04 19:38:48,597][02883] Avg episode reward: 40.198, avg true_objective: 15.865
+[2024-07-04 19:38:48,700][02883] Num frames 9600...
+[2024-07-04 19:38:48,825][02883] Num frames 9700...
+[2024-07-04 19:38:48,948][02883] Num frames 9800...
+[2024-07-04 19:38:49,075][02883] Num frames 9900...
+[2024-07-04 19:38:49,168][02883] Avg episode rewards: #0: 35.470, true rewards: #0: 14.184
+[2024-07-04 19:38:49,169][02883] Avg episode reward: 35.470, avg true_objective: 14.184
+[2024-07-04 19:38:49,259][02883] Num frames 10000...
+[2024-07-04 19:38:49,384][02883] Num frames 10100...
+[2024-07-04 19:38:49,511][02883] Num frames 10200...
+[2024-07-04 19:38:49,636][02883] Num frames 10300...
+[2024-07-04 19:38:49,760][02883] Num frames 10400...
+[2024-07-04 19:38:49,887][02883] Num frames 10500...
+[2024-07-04 19:38:50,016][02883] Num frames 10600...
+[2024-07-04 19:38:50,140][02883] Num frames 10700...
+[2024-07-04 19:38:50,267][02883] Num frames 10800...
+[2024-07-04 19:38:50,392][02883] Num frames 10900...
+[2024-07-04 19:38:50,517][02883] Num frames 11000...
+[2024-07-04 19:38:50,646][02883] Num frames 11100...
+[2024-07-04 19:38:50,759][02883] Avg episode rewards: #0: 34.306, true rewards: #0: 13.931
+[2024-07-04 19:38:50,761][02883] Avg episode reward: 34.306, avg true_objective: 13.931
+[2024-07-04 19:38:50,832][02883] Num frames 11200...
+[2024-07-04 19:38:50,959][02883] Num frames 11300...
+[2024-07-04 19:38:51,086][02883] Num frames 11400...
+[2024-07-04 19:38:51,212][02883] Num frames 11500...
+[2024-07-04 19:38:51,340][02883] Num frames 11600...
+[2024-07-04 19:38:51,466][02883] Num frames 11700...
+[2024-07-04 19:38:51,591][02883] Num frames 11800...
+[2024-07-04 19:38:51,718][02883] Num frames 11900...
+[2024-07-04 19:38:51,844][02883] Num frames 12000...
+[2024-07-04 19:38:51,970][02883] Num frames 12100...
+[2024-07-04 19:38:52,097][02883] Num frames 12200...
+[2024-07-04 19:38:52,224][02883] Num frames 12300...
+[2024-07-04 19:38:52,349][02883] Num frames 12400...
+[2024-07-04 19:38:52,485][02883] Num frames 12500...
+[2024-07-04 19:38:52,612][02883] Num frames 12600...
+[2024-07-04 19:38:52,739][02883] Num frames 12700...
+[2024-07-04 19:38:52,871][02883] Num frames 12800...
+[2024-07-04 19:38:53,000][02883] Num frames 12900...
+[2024-07-04 19:38:53,136][02883] Num frames 13000...
+[2024-07-04 19:38:53,264][02883] Num frames 13100...
+[2024-07-04 19:38:53,394][02883] Num frames 13200...
+[2024-07-04 19:38:53,508][02883] Avg episode rewards: #0: 36.827, true rewards: #0: 14.717
+[2024-07-04 19:38:53,510][02883] Avg episode reward: 36.827, avg true_objective: 14.717
+[2024-07-04 19:38:53,584][02883] Num frames 13300...
+[2024-07-04 19:38:53,716][02883] Num frames 13400...
+[2024-07-04 19:38:53,850][02883] Num frames 13500...
+[2024-07-04 19:38:53,987][02883] Num frames 13600...
+[2024-07-04 19:38:54,126][02883] Num frames 13700...
+[2024-07-04 19:38:54,265][02883] Num frames 13800...
+[2024-07-04 19:38:54,394][02883] Num frames 13900...
+[2024-07-04 19:38:54,541][02883] Num frames 14000...
+[2024-07-04 19:38:54,671][02883] Num frames 14100...
+[2024-07-04 19:38:54,796][02883] Num frames 14200...
+[2024-07-04 19:38:54,925][02883] Num frames 14300...
+[2024-07-04 19:38:55,057][02883] Num frames 14400...
+[2024-07-04 19:38:55,184][02883] Num frames 14500...
+[2024-07-04 19:38:55,314][02883] Num frames 14600...
+[2024-07-04 19:38:55,442][02883] Num frames 14700...
+[2024-07-04 19:38:55,568][02883] Num frames 14800...
+[2024-07-04 19:38:55,697][02883] Num frames 14900...
+[2024-07-04 19:38:55,824][02883] Num frames 15000...
+[2024-07-04 19:38:55,976][02883] Avg episode rewards: #0: 37.769, true rewards: #0: 15.069
+[2024-07-04 19:38:55,978][02883] Avg episode reward: 37.769, avg true_objective: 15.069
+[2024-07-04 19:39:31,616][02883] Replay video saved to /content/train_dir/default_experiment/replay.mp4!