[2024-08-10 13:17:36,250][00331] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-08-10 13:17:36,253][00331] Rollout worker 0 uses device cpu
[2024-08-10 13:17:36,255][00331] Rollout worker 1 uses device cpu
[2024-08-10 13:17:36,256][00331] Rollout worker 2 uses device cpu
[2024-08-10 13:17:36,258][00331] Rollout worker 3 uses device cpu
[2024-08-10 13:17:36,259][00331] Rollout worker 4 uses device cpu
[2024-08-10 13:17:36,260][00331] Rollout worker 5 uses device cpu
[2024-08-10 13:17:36,261][00331] Rollout worker 6 uses device cpu
[2024-08-10 13:17:36,262][00331] Rollout worker 7 uses device cpu
[2024-08-10 13:17:36,405][00331] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-10 13:17:36,406][00331] InferenceWorker_p0-w0: min num requests: 2
[2024-08-10 13:17:36,439][00331] Starting all processes...
[2024-08-10 13:17:36,441][00331] Starting process learner_proc0
[2024-08-10 13:17:36,489][00331] Starting all processes...
[2024-08-10 13:17:36,498][00331] Starting process inference_proc0-0
[2024-08-10 13:17:36,498][00331] Starting process rollout_proc0
[2024-08-10 13:17:36,499][00331] Starting process rollout_proc1
[2024-08-10 13:17:36,499][00331] Starting process rollout_proc2
[2024-08-10 13:17:36,499][00331] Starting process rollout_proc3
[2024-08-10 13:17:36,499][00331] Starting process rollout_proc4
[2024-08-10 13:17:36,499][00331] Starting process rollout_proc5
[2024-08-10 13:17:36,499][00331] Starting process rollout_proc6
[2024-08-10 13:17:36,499][00331] Starting process rollout_proc7
[2024-08-10 13:17:47,763][04466] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-10 13:17:47,770][04466] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-08-10 13:17:47,842][04466] Num visible devices: 1
[2024-08-10 13:17:47,885][04466] Starting seed is not provided
[2024-08-10 13:17:47,886][04466] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-10 13:17:47,886][04466] Initializing actor-critic model on device cuda:0
[2024-08-10 13:17:47,887][04466] RunningMeanStd input shape: (3, 72, 128)
[2024-08-10 13:17:47,889][04466] RunningMeanStd input shape: (1,)
[2024-08-10 13:17:47,958][04484] Worker 5 uses CPU cores [1]
[2024-08-10 13:17:47,964][04466] ConvEncoder: input_channels=3
[2024-08-10 13:17:47,973][04482] Worker 2 uses CPU cores [0]
[2024-08-10 13:17:48,019][04483] Worker 3 uses CPU cores [1]
[2024-08-10 13:17:48,032][04487] Worker 7 uses CPU cores [1]
[2024-08-10 13:17:48,139][04479] Worker 0 uses CPU cores [0]
[2024-08-10 13:17:48,171][04485] Worker 4 uses CPU cores [0]
[2024-08-10 13:17:48,203][04481] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-10 13:17:48,204][04481] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-08-10 13:17:48,232][04481] Num visible devices: 1
[2024-08-10 13:17:48,237][04480] Worker 1 uses CPU cores [1]
[2024-08-10 13:17:48,246][04486] Worker 6 uses CPU cores [0]
[2024-08-10 13:17:48,333][04466] Conv encoder output size: 512
[2024-08-10 13:17:48,333][04466] Policy head output size: 512
[2024-08-10 13:17:48,348][04466] Created Actor Critic model with architecture:
[2024-08-10 13:17:48,349][04466] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-08-10 13:17:52,240][04466] Using optimizer
[2024-08-10 13:17:52,241][04466] No checkpoints found
[2024-08-10 13:17:52,242][04466] Did not load from checkpoint, starting from scratch!
[2024-08-10 13:17:52,242][04466] Initialized policy 0 weights for model version 0
[2024-08-10 13:17:52,246][04466] LearnerWorker_p0 finished initialization!
[2024-08-10 13:17:52,248][04466] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-10 13:17:52,349][04481] RunningMeanStd input shape: (3, 72, 128)
[2024-08-10 13:17:52,351][04481] RunningMeanStd input shape: (1,)
[2024-08-10 13:17:52,371][04481] ConvEncoder: input_channels=3
[2024-08-10 13:17:52,480][04481] Conv encoder output size: 512
[2024-08-10 13:17:52,480][04481] Policy head output size: 512
[2024-08-10 13:17:52,889][00331] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-10 13:17:54,121][00331] Inference worker 0-0 is ready!
[2024-08-10 13:17:54,126][00331] All inference workers are ready! Signal rollout workers to start!
[2024-08-10 13:17:54,290][04482] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:17:54,294][04479] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:17:54,305][04487] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:17:54,304][04484] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:17:54,320][04483] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:17:54,325][04480] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:17:54,334][04486] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:17:54,338][04485] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:17:55,790][04483] Decorrelating experience for 0 frames...
[2024-08-10 13:17:55,789][04482] Decorrelating experience for 0 frames...
[2024-08-10 13:17:55,792][04487] Decorrelating experience for 0 frames...
[2024-08-10 13:17:55,793][04484] Decorrelating experience for 0 frames...
[2024-08-10 13:17:56,398][00331] Heartbeat connected on Batcher_0
[2024-08-10 13:17:56,401][00331] Heartbeat connected on LearnerWorker_p0
[2024-08-10 13:17:56,431][00331] Heartbeat connected on InferenceWorker_p0-w0
[2024-08-10 13:17:56,729][04483] Decorrelating experience for 32 frames...
[2024-08-10 13:17:56,730][04487] Decorrelating experience for 32 frames...
[2024-08-10 13:17:57,008][04486] Decorrelating experience for 0 frames...
[2024-08-10 13:17:57,028][04482] Decorrelating experience for 32 frames...
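For readers who want to map the printed module tree onto plain PyTorch, here is a minimal sketch of the same network. The 512-unit encoder output, the GRU(512, 512) core, the 1-unit critic head, and the 5-unit action-logit head are read directly off the printout above; the conv channel counts, kernel sizes, and strides are assumptions (Sample Factory's usual VizDoom defaults), since the log only shows three Conv2d+ELU pairs.

```python
# Minimal PyTorch sketch of the actor-critic printed above.
# Assumption: conv layers follow Sample Factory's default "convnet_simple"
# (32 filters 8x8/4, 64 filters 4x4/2, 128 filters 3x3/2); the log does not
# state these. Dimensions 512/1/5 come straight from the architecture dump.
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        # (encoder): three Conv2d+ELU blocks, then Linear+ELU down to 512
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size from a dummy pass
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)                          # (core): ModelCoreRNN
        self.critic_linear = nn.Linear(hidden, 1)                   # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)   # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)         # sequence length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

# Shape check with a batch of 4 observations at the logged resize resolution (128, 72):
logits, value, state = ActorCriticSketch()(torch.zeros(4, 3, 72, 128))
```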
[2024-08-10 13:17:57,749][04486] Decorrelating experience for 32 frames...
[2024-08-10 13:17:57,889][00331] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-10 13:17:58,027][04480] Decorrelating experience for 0 frames...
[2024-08-10 13:17:58,041][04483] Decorrelating experience for 64 frames...
[2024-08-10 13:17:59,146][04480] Decorrelating experience for 32 frames...
[2024-08-10 13:17:59,230][04483] Decorrelating experience for 96 frames...
[2024-08-10 13:17:59,355][04486] Decorrelating experience for 64 frames...
[2024-08-10 13:17:59,375][00331] Heartbeat connected on RolloutWorker_w3
[2024-08-10 13:17:59,536][04485] Decorrelating experience for 0 frames...
[2024-08-10 13:17:59,545][04482] Decorrelating experience for 64 frames...
[2024-08-10 13:18:00,118][04485] Decorrelating experience for 32 frames...
[2024-08-10 13:18:00,290][04480] Decorrelating experience for 64 frames...
[2024-08-10 13:18:00,714][04487] Decorrelating experience for 64 frames...
[2024-08-10 13:18:00,712][04484] Decorrelating experience for 32 frames...
[2024-08-10 13:18:01,167][04485] Decorrelating experience for 64 frames...
[2024-08-10 13:18:01,608][04482] Decorrelating experience for 96 frames...
[2024-08-10 13:18:01,969][00331] Heartbeat connected on RolloutWorker_w2
[2024-08-10 13:18:02,028][04487] Decorrelating experience for 96 frames...
[2024-08-10 13:18:02,169][04484] Decorrelating experience for 64 frames...
[2024-08-10 13:18:02,327][00331] Heartbeat connected on RolloutWorker_w7
[2024-08-10 13:18:02,452][04480] Decorrelating experience for 96 frames...
[2024-08-10 13:18:02,476][04479] Decorrelating experience for 0 frames...
[2024-08-10 13:18:02,687][00331] Heartbeat connected on RolloutWorker_w1
[2024-08-10 13:18:02,889][00331] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1.6. Samples: 16. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-10 13:18:03,649][04485] Decorrelating experience for 96 frames...
[2024-08-10 13:18:03,915][00331] Heartbeat connected on RolloutWorker_w4
[2024-08-10 13:18:04,197][04486] Decorrelating experience for 96 frames...
[2024-08-10 13:18:04,796][00331] Heartbeat connected on RolloutWorker_w6
[2024-08-10 13:18:05,280][04484] Decorrelating experience for 96 frames...
[2024-08-10 13:18:05,893][04479] Decorrelating experience for 32 frames...
[2024-08-10 13:18:05,895][00331] Heartbeat connected on RolloutWorker_w5
[2024-08-10 13:18:06,331][04466] Signal inference workers to stop experience collection...
[2024-08-10 13:18:06,344][04481] InferenceWorker_p0-w0: stopping experience collection
[2024-08-10 13:18:06,710][04479] Decorrelating experience for 64 frames...
[2024-08-10 13:18:06,998][04479] Decorrelating experience for 96 frames...
[2024-08-10 13:18:07,059][00331] Heartbeat connected on RolloutWorker_w0
[2024-08-10 13:18:07,889][00331] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 152.5. Samples: 2288. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-10 13:18:07,891][00331] Avg episode reward: [(0, '2.805')]
[2024-08-10 13:18:08,178][04466] Signal inference workers to resume experience collection...
[2024-08-10 13:18:08,180][04481] InferenceWorker_p0-w0: resuming experience collection
[2024-08-10 13:18:12,895][00331] Fps is (10 sec: 1637.5, 60 sec: 819.0, 300 sec: 819.0). Total num frames: 16384. Throughput: 0: 234.9. Samples: 4700. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2024-08-10 13:18:12,897][00331] Avg episode reward: [(0, '3.392')]
[2024-08-10 13:18:17,889][00331] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 275.2. Samples: 6880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:18:17,892][00331] Avg episode reward: [(0, '3.910')]
[2024-08-10 13:18:19,583][04481] Updated weights for policy 0, policy_version 10 (0.0013)
[2024-08-10 13:18:22,889][00331] Fps is (10 sec: 3688.5, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 53248. Throughput: 0: 420.7. Samples: 12620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:18:22,891][00331] Avg episode reward: [(0, '4.490')]
[2024-08-10 13:18:27,893][00331] Fps is (10 sec: 4504.0, 60 sec: 2106.3, 300 sec: 2106.3). Total num frames: 73728. Throughput: 0: 554.3. Samples: 19402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:18:27,895][00331] Avg episode reward: [(0, '4.637')]
[2024-08-10 13:18:29,544][04481] Updated weights for policy 0, policy_version 20 (0.0023)
[2024-08-10 13:18:32,889][00331] Fps is (10 sec: 3686.3, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 90112. Throughput: 0: 538.7. Samples: 21548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:18:32,892][00331] Avg episode reward: [(0, '4.592')]
[2024-08-10 13:18:37,891][00331] Fps is (10 sec: 3687.0, 60 sec: 2457.5, 300 sec: 2457.5). Total num frames: 110592. Throughput: 0: 599.7. Samples: 26986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:18:37,898][00331] Avg episode reward: [(0, '4.304')]
[2024-08-10 13:18:37,902][04466] Saving new best policy, reward=4.304!
[2024-08-10 13:18:40,030][04481] Updated weights for policy 0, policy_version 30 (0.0015)
[2024-08-10 13:18:42,889][00331] Fps is (10 sec: 4505.7, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 754.0. Samples: 33928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:18:42,891][00331] Avg episode reward: [(0, '4.329')]
[2024-08-10 13:18:42,898][04466] Saving new best policy, reward=4.329!
[2024-08-10 13:18:47,889][00331] Fps is (10 sec: 4096.8, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 813.1. Samples: 36604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:18:47,893][00331] Avg episode reward: [(0, '4.316')]
[2024-08-10 13:18:51,611][04481] Updated weights for policy 0, policy_version 40 (0.0027)
[2024-08-10 13:18:52,889][00331] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. Throughput: 0: 859.9. Samples: 40982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:18:52,891][00331] Avg episode reward: [(0, '4.249')]
[2024-08-10 13:18:57,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 960.6. Samples: 47920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:18:57,892][00331] Avg episode reward: [(0, '4.225')]
[2024-08-10 13:19:00,287][04481] Updated weights for policy 0, policy_version 50 (0.0023)
[2024-08-10 13:19:02,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 990.3. Samples: 51444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:19:02,891][00331] Avg episode reward: [(0, '4.236')]
[2024-08-10 13:19:07,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3003.7). Total num frames: 225280. Throughput: 0: 969.0. Samples: 56224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:19:07,891][00331] Avg episode reward: [(0, '4.302')]
[2024-08-10 13:19:11,848][04481] Updated weights for policy 0, policy_version 60 (0.0023)
[2024-08-10 13:19:12,890][00331] Fps is (10 sec: 3686.0, 60 sec: 3891.5, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 953.8. Samples: 62322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:19:12,893][00331] Avg episode reward: [(0, '4.249')]
[2024-08-10 13:19:17,892][00331] Fps is (10 sec: 4504.6, 60 sec: 4027.6, 300 sec: 3180.3). Total num frames: 270336. Throughput: 0: 985.1. Samples: 65878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:19:17,895][00331] Avg episode reward: [(0, '4.590')]
[2024-08-10 13:19:17,900][04466] Saving new best policy, reward=4.590!
[2024-08-10 13:19:22,178][04481] Updated weights for policy 0, policy_version 70 (0.0020)
[2024-08-10 13:19:22,889][00331] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3185.8). Total num frames: 286720. Throughput: 0: 986.4. Samples: 71370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:19:22,892][00331] Avg episode reward: [(0, '4.670')]
[2024-08-10 13:19:22,902][04466] Saving new best policy, reward=4.670!
[2024-08-10 13:19:27,889][00331] Fps is (10 sec: 3277.6, 60 sec: 3823.2, 300 sec: 3190.6). Total num frames: 303104. Throughput: 0: 944.4. Samples: 76426. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-10 13:19:27,894][00331] Avg episode reward: [(0, '4.267')]
[2024-08-10 13:19:32,186][04481] Updated weights for policy 0, policy_version 80 (0.0018)
[2024-08-10 13:19:32,889][00331] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3276.8). Total num frames: 327680. Throughput: 0: 961.8. Samples: 79884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:19:32,892][00331] Avg episode reward: [(0, '4.412')]
[2024-08-10 13:19:32,900][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000080_327680.pth...
[2024-08-10 13:19:37,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3315.8). Total num frames: 348160. Throughput: 0: 1012.8. Samples: 86560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:19:37,892][00331] Avg episode reward: [(0, '4.499')]
[2024-08-10 13:19:42,889][00331] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 955.4. Samples: 90912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:19:42,892][00331] Avg episode reward: [(0, '4.598')]
[2024-08-10 13:19:44,034][04481] Updated weights for policy 0, policy_version 90 (0.0029)
[2024-08-10 13:19:47,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3348.0). Total num frames: 385024. Throughput: 0: 948.0. Samples: 94104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:19:47,892][00331] Avg episode reward: [(0, '4.595')]
[2024-08-10 13:19:52,848][04481] Updated weights for policy 0, policy_version 100 (0.0012)
[2024-08-10 13:19:52,889][00331] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3413.3). Total num frames: 409600. Throughput: 0: 995.0. Samples: 100998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:19:52,894][00331] Avg episode reward: [(0, '4.450')]
[2024-08-10 13:19:57,893][00331] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3375.0). Total num frames: 421888. Throughput: 0: 974.1. Samples: 106160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:19:57,895][00331] Avg episode reward: [(0, '4.486')]
[2024-08-10 13:20:02,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 943.3. Samples: 108324. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:20:02,892][00331] Avg episode reward: [(0, '4.415')]
[2024-08-10 13:20:04,145][04481] Updated weights for policy 0, policy_version 110 (0.0020)
[2024-08-10 13:20:07,889][00331] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 977.9. Samples: 115376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:20:07,892][00331] Avg episode reward: [(0, '4.499')]
[2024-08-10 13:20:12,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3452.3). Total num frames: 483328. Throughput: 0: 1001.1. Samples: 121476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:20:12,892][00331] Avg episode reward: [(0, '4.522')]
[2024-08-10 13:20:14,866][04481] Updated weights for policy 0, policy_version 120 (0.0022)
[2024-08-10 13:20:17,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3446.3). Total num frames: 499712. Throughput: 0: 972.5. Samples: 123646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:20:17,894][00331] Avg episode reward: [(0, '4.433')]
[2024-08-10 13:20:22,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 956.2. Samples: 129588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:20:22,891][00331] Avg episode reward: [(0, '4.583')]
[2024-08-10 13:20:24,768][04481] Updated weights for policy 0, policy_version 130 (0.0026)
[2024-08-10 13:20:27,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3514.6). Total num frames: 544768. Throughput: 0: 1017.2. Samples: 136686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-10 13:20:27,896][00331] Avg episode reward: [(0, '4.602')]
[2024-08-10 13:20:32,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3507.2). Total num frames: 561152. Throughput: 0: 993.6. Samples: 138816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:20:32,894][00331] Avg episode reward: [(0, '4.624')]
[2024-08-10 13:20:36,177][04481] Updated weights for policy 0, policy_version 140 (0.0015)
[2024-08-10 13:20:37,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3500.2). Total num frames: 577536. Throughput: 0: 951.8. Samples: 143828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:20:37,891][00331] Avg episode reward: [(0, '4.347')]
[2024-08-10 13:20:42,889][00331] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3541.8). Total num frames: 602112. Throughput: 0: 994.3. Samples: 150900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:20:42,892][00331] Avg episode reward: [(0, '4.168')]
[2024-08-10 13:20:45,374][04481] Updated weights for policy 0, policy_version 150 (0.0023)
[2024-08-10 13:20:47,891][00331] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3534.2). Total num frames: 618496. Throughput: 0: 1018.2. Samples: 154146. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-08-10 13:20:47,894][00331] Avg episode reward: [(0, '4.270')]
[2024-08-10 13:20:52,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3527.1). Total num frames: 634880. Throughput: 0: 957.6. Samples: 158468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:20:52,891][00331] Avg episode reward: [(0, '4.486')]
[2024-08-10 13:20:56,354][04481] Updated weights for policy 0, policy_version 160 (0.0037)
[2024-08-10 13:20:57,889][00331] Fps is (10 sec: 4096.7, 60 sec: 3959.7, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 972.1. Samples: 165220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:20:57,892][00331] Avg episode reward: [(0, '4.596')]
[2024-08-10 13:21:02,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3578.6). Total num frames: 679936. Throughput: 0: 1000.9. Samples: 168686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-10 13:21:02,893][00331] Avg episode reward: [(0, '4.478')]
[2024-08-10 13:21:07,421][04481] Updated weights for policy 0, policy_version 170 (0.0023)
[2024-08-10 13:21:07,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3570.9). Total num frames: 696320. Throughput: 0: 982.5. Samples: 173802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:21:07,892][00331] Avg episode reward: [(0, '4.554')]
[2024-08-10 13:21:12,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3584.0). Total num frames: 716800. Throughput: 0: 952.7. Samples: 179556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-10 13:21:12,896][00331] Avg episode reward: [(0, '4.553')]
[2024-08-10 13:21:16,786][04481] Updated weights for policy 0, policy_version 180 (0.0017)
[2024-08-10 13:21:17,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3616.5). Total num frames: 741376. Throughput: 0: 983.3. Samples: 183064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:21:17,891][00331] Avg episode reward: [(0, '4.440')]
[2024-08-10 13:21:22,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3608.4). Total num frames: 757760. Throughput: 0: 1005.4. Samples: 189070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:21:22,899][00331] Avg episode reward: [(0, '4.539')]
[2024-08-10 13:21:27,889][00331] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3600.7). Total num frames: 774144. Throughput: 0: 953.2. Samples: 193796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:21:27,893][00331] Avg episode reward: [(0, '4.631')]
[2024-08-10 13:21:28,260][04481] Updated weights for policy 0, policy_version 190 (0.0021)
[2024-08-10 13:21:32,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 957.3. Samples: 197222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:21:32,892][00331] Avg episode reward: [(0, '4.561')]
[2024-08-10 13:21:32,901][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth...
[2024-08-10 13:21:37,842][04481] Updated weights for policy 0, policy_version 200 (0.0029)
[2024-08-10 13:21:37,889][00331] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3640.9). Total num frames: 819200. Throughput: 0: 1014.9. Samples: 204140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:21:37,895][00331] Avg episode reward: [(0, '4.374')]
[2024-08-10 13:21:42,897][00331] Fps is (10 sec: 3274.2, 60 sec: 3822.4, 300 sec: 3615.0). Total num frames: 831488. Throughput: 0: 961.5. Samples: 208494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:21:42,903][00331] Avg episode reward: [(0, '4.390')]
[2024-08-10 13:21:47,889][00331] Fps is (10 sec: 3686.5, 60 sec: 3959.6, 300 sec: 3642.8). Total num frames: 856064. Throughput: 0: 948.7. Samples: 211378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:21:47,892][00331] Avg episode reward: [(0, '4.309')]
[2024-08-10 13:21:48,812][04481] Updated weights for policy 0, policy_version 210 (0.0026)
[2024-08-10 13:21:52,889][00331] Fps is (10 sec: 4509.2, 60 sec: 4027.7, 300 sec: 3652.3). Total num frames: 876544. Throughput: 0: 990.1. Samples: 218358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:21:52,892][00331] Avg episode reward: [(0, '4.369')]
[2024-08-10 13:21:57,891][00331] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3644.6). Total num frames: 892928. Throughput: 0: 982.9. Samples: 223788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:21:57,896][00331] Avg episode reward: [(0, '4.458')]
[2024-08-10 13:22:00,077][04481] Updated weights for policy 0, policy_version 220 (0.0032)
[2024-08-10 13:22:02,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3653.6). Total num frames: 913408. Throughput: 0: 952.4. Samples: 225920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:22:02,896][00331] Avg episode reward: [(0, '4.529')]
[2024-08-10 13:22:07,889][00331] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3662.3). Total num frames: 933888. Throughput: 0: 969.8. Samples: 232710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:22:07,895][00331] Avg episode reward: [(0, '4.629')]
[2024-08-10 13:22:09,030][04481] Updated weights for policy 0, policy_version 230 (0.0026)
[2024-08-10 13:22:12,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3670.6). Total num frames: 954368. Throughput: 0: 1005.8. Samples: 239058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:22:12,894][00331] Avg episode reward: [(0, '4.518')]
[2024-08-10 13:22:17,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3647.8). Total num frames: 966656. Throughput: 0: 976.9. Samples: 241184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:22:17,895][00331] Avg episode reward: [(0, '4.576')]
[2024-08-10 13:22:20,588][04481] Updated weights for policy 0, policy_version 240 (0.0013)
[2024-08-10 13:22:22,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3671.2). Total num frames: 991232. Throughput: 0: 949.6. Samples: 246872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:22:22,892][00331] Avg episode reward: [(0, '4.611')]
[2024-08-10 13:22:27,889][00331] Fps is (10 sec: 4915.2, 60 sec: 4027.8, 300 sec: 3693.8). Total num frames: 1015808. Throughput: 0: 1008.9. Samples: 253886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:22:27,893][00331] Avg episode reward: [(0, '4.675')]
[2024-08-10 13:22:27,896][04466] Saving new best policy, reward=4.675!
[2024-08-10 13:22:30,557][04481] Updated weights for policy 0, policy_version 250 (0.0016)
[2024-08-10 13:22:32,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3671.8). Total num frames: 1028096. Throughput: 0: 998.0. Samples: 256288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:22:32,896][00331] Avg episode reward: [(0, '4.555')]
[2024-08-10 13:22:37,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3679.2). Total num frames: 1048576. Throughput: 0: 948.9. Samples: 261060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:22:37,897][00331] Avg episode reward: [(0, '4.503')]
[2024-08-10 13:22:41,069][04481] Updated weights for policy 0, policy_version 260 (0.0016)
[2024-08-10 13:22:42,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4028.3, 300 sec: 3700.5). Total num frames: 1073152. Throughput: 0: 983.6. Samples: 268050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:22:42,892][00331] Avg episode reward: [(0, '4.534')]
[2024-08-10 13:22:47,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 1013.2. Samples: 271516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:22:47,892][00331] Avg episode reward: [(0, '4.620')]
[2024-08-10 13:22:52,674][04481] Updated weights for policy 0, policy_version 270 (0.0034)
[2024-08-10 13:22:52,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 956.6. Samples: 275758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:22:52,892][00331] Avg episode reward: [(0, '4.661')]
[2024-08-10 13:22:57,893][00331] Fps is (10 sec: 3685.0, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 960.9. Samples: 282300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:22:57,896][00331] Avg episode reward: [(0, '4.591')]
[2024-08-10 13:23:01,567][04481] Updated weights for policy 0, policy_version 280 (0.0022)
[2024-08-10 13:23:02,891][00331] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3901.6). Total num frames: 1150976. Throughput: 0: 990.3. Samples: 285748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:23:02,895][00331] Avg episode reward: [(0, '4.492')]
[2024-08-10 13:23:07,892][00331] Fps is (10 sec: 3686.8, 60 sec: 3822.8, 300 sec: 3887.8). Total num frames: 1163264. Throughput: 0: 981.3. Samples: 291032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:23:07,894][00331] Avg episode reward: [(0, '4.366')]
[2024-08-10 13:23:12,851][04481] Updated weights for policy 0, policy_version 290 (0.0018)
[2024-08-10 13:23:12,889][00331] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1187840. Throughput: 0: 947.5. Samples: 296522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:23:12,891][00331] Avg episode reward: [(0, '4.541')]
[2024-08-10 13:23:17,889][00331] Fps is (10 sec: 4506.8, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1208320. Throughput: 0: 971.9. Samples: 300024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:23:17,896][00331] Avg episode reward: [(0, '4.645')]
[2024-08-10 13:23:22,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.7). Total num frames: 1224704. Throughput: 0: 1006.0. Samples: 306328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:23:22,896][00331] Avg episode reward: [(0, '4.761')]
[2024-08-10 13:23:22,912][04466] Saving new best policy, reward=4.761!
[2024-08-10 13:23:23,209][04481] Updated weights for policy 0, policy_version 300 (0.0018)
[2024-08-10 13:23:27,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 1241088. Throughput: 0: 946.8. Samples: 310656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:23:27,892][00331] Avg episode reward: [(0, '4.702')]
[2024-08-10 13:23:32,891][00331] Fps is (10 sec: 4095.3, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 1265664. Throughput: 0: 946.7. Samples: 314120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:23:32,893][00331] Avg episode reward: [(0, '4.522')]
[2024-08-10 13:23:32,908][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth...
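Checkpoints like the one just saved are ordinary torch-serialized dictionaries, so they can be inspected offline. Below is a small sketch; the key layout ("model" holding the policy state dict, "train_step", and so on) is an assumption based on Sample Factory's usual format rather than something the log states, so print the top-level keys first if unsure.

```python
# Inspecting a saved Sample Factory checkpoint (a torch-serialized dict).
# Assumption: the key names below follow Sample Factory's usual layout.
import torch

ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth",
    map_location="cpu",
)
print(ckpt.keys())                      # e.g. dict_keys(['model', 'train_step', ...])
for name, tensor in ckpt["model"].items():
    print(name, tuple(tensor.shape))    # per-parameter shapes of policy #0
```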
[2024-08-10 13:23:33,039][04466] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000080_327680.pth
[2024-08-10 13:23:33,498][04481] Updated weights for policy 0, policy_version 310 (0.0023)
[2024-08-10 13:23:37,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1286144. Throughput: 0: 1006.3. Samples: 321042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:23:37,892][00331] Avg episode reward: [(0, '4.575')]
[2024-08-10 13:23:42,889][00331] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1302528. Throughput: 0: 968.6. Samples: 325882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:23:42,891][00331] Avg episode reward: [(0, '4.837')]
[2024-08-10 13:23:42,908][04466] Saving new best policy, reward=4.837!
[2024-08-10 13:23:44,942][04481] Updated weights for policy 0, policy_version 320 (0.0020)
[2024-08-10 13:23:47,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1323008. Throughput: 0: 947.3. Samples: 328374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:23:47,895][00331] Avg episode reward: [(0, '4.845')]
[2024-08-10 13:23:47,898][04466] Saving new best policy, reward=4.845!
[2024-08-10 13:23:52,892][00331] Fps is (10 sec: 4094.9, 60 sec: 3959.3, 300 sec: 3901.6). Total num frames: 1343488. Throughput: 0: 984.1. Samples: 335318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:23:52,894][00331] Avg episode reward: [(0, '4.608')]
[2024-08-10 13:23:53,938][04481] Updated weights for policy 0, policy_version 330 (0.0019)
[2024-08-10 13:23:57,890][00331] Fps is (10 sec: 3686.1, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 1359872. Throughput: 0: 986.3. Samples: 340906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:23:57,893][00331] Avg episode reward: [(0, '4.565')]
[2024-08-10 13:24:02,889][00331] Fps is (10 sec: 3687.4, 60 sec: 3823.1, 300 sec: 3915.5). Total num frames: 1380352. Throughput: 0: 956.1. Samples: 343048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:24:02,896][00331] Avg episode reward: [(0, '4.762')]
[2024-08-10 13:24:05,444][04481] Updated weights for policy 0, policy_version 340 (0.0019)
[2024-08-10 13:24:07,889][00331] Fps is (10 sec: 4096.3, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 1400832. Throughput: 0: 959.9. Samples: 349524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:24:07,892][00331] Avg episode reward: [(0, '4.351')]
[2024-08-10 13:24:12,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1421312. Throughput: 0: 1012.9. Samples: 356236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:24:12,892][00331] Avg episode reward: [(0, '4.533')]
[2024-08-10 13:24:16,252][04481] Updated weights for policy 0, policy_version 350 (0.0012)
[2024-08-10 13:24:17,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1437696. Throughput: 0: 982.9. Samples: 358348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:24:17,897][00331] Avg episode reward: [(0, '4.770')]
[2024-08-10 13:24:22,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1458176. Throughput: 0: 951.9. Samples: 363876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:24:22,892][00331] Avg episode reward: [(0, '4.761')]
[2024-08-10 13:24:25,889][04481] Updated weights for policy 0, policy_version 360 (0.0012)
[2024-08-10 13:24:27,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1482752. Throughput: 0: 995.9. Samples: 370698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:24:27,892][00331] Avg episode reward: [(0, '4.507')]
[2024-08-10 13:24:32,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 1499136. Throughput: 0: 1003.1. Samples: 373512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:24:32,896][00331] Avg episode reward: [(0, '4.384')]
[2024-08-10 13:24:37,299][04481] Updated weights for policy 0, policy_version 370 (0.0016)
[2024-08-10 13:24:37,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1515520. Throughput: 0: 949.0. Samples: 378022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:24:37,894][00331] Avg episode reward: [(0, '4.595')]
[2024-08-10 13:24:42,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1540096. Throughput: 0: 980.4. Samples: 385022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:24:42,891][00331] Avg episode reward: [(0, '4.539')]
[2024-08-10 13:24:45,973][04481] Updated weights for policy 0, policy_version 380 (0.0019)
[2024-08-10 13:24:47,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1560576. Throughput: 0: 1011.4. Samples: 388562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:24:47,898][00331] Avg episode reward: [(0, '4.599')]
[2024-08-10 13:24:52,889][00331] Fps is (10 sec: 3276.7, 60 sec: 3823.1, 300 sec: 3901.7). Total num frames: 1572864. Throughput: 0: 974.1. Samples: 393358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:24:52,898][00331] Avg episode reward: [(0, '4.739')]
[2024-08-10 13:24:57,675][04481] Updated weights for policy 0, policy_version 390 (0.0012)
[2024-08-10 13:24:57,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1597440. Throughput: 0: 958.4. Samples: 399364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:24:57,891][00331] Avg episode reward: [(0, '4.539')]
[2024-08-10 13:25:02,889][00331] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1617920. Throughput: 0: 988.7. Samples: 402840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:25:02,896][00331] Avg episode reward: [(0, '4.504')]
[2024-08-10 13:25:07,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1634304. Throughput: 0: 994.3. Samples: 408620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:25:07,895][00331] Avg episode reward: [(0, '4.724')]
[2024-08-10 13:25:08,286][04481] Updated weights for policy 0, policy_version 400 (0.0027)
[2024-08-10 13:25:12,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1654784. Throughput: 0: 956.3. Samples: 413732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:25:12,891][00331] Avg episode reward: [(0, '4.863')]
[2024-08-10 13:25:12,904][04466] Saving new best policy, reward=4.863!
[2024-08-10 13:25:17,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1679360. Throughput: 0: 970.9. Samples: 417202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:25:17,894][00331] Avg episode reward: [(0, '4.854')]
[2024-08-10 13:25:17,897][04481] Updated weights for policy 0, policy_version 410 (0.0017)
[2024-08-10 13:25:22,889][00331] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1695744. Throughput: 0: 1022.7. Samples: 424042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-10 13:25:22,893][00331] Avg episode reward: [(0, '4.833')]
[2024-08-10 13:25:27,889][00331] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1712128. Throughput: 0: 964.7. Samples: 428432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-10 13:25:27,893][00331] Avg episode reward: [(0, '4.787')]
[2024-08-10 13:25:29,429][04481] Updated weights for policy 0, policy_version 420 (0.0021)
[2024-08-10 13:25:32,889][00331] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1736704. Throughput: 0: 954.4. Samples: 431512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:25:32,895][00331] Avg episode reward: [(0, '4.793')]
[2024-08-10 13:25:32,904][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth...
[2024-08-10 13:25:33,041][04466] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth
[2024-08-10 13:25:37,889][00331] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1757184. Throughput: 0: 1004.4. Samples: 438554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:25:37,892][00331] Avg episode reward: [(0, '5.146')]
[2024-08-10 13:25:37,897][04466] Saving new best policy, reward=5.146!
[2024-08-10 13:25:38,363][04481] Updated weights for policy 0, policy_version 430 (0.0047)
[2024-08-10 13:25:42,894][00331] Fps is (10 sec: 3684.6, 60 sec: 3890.9, 300 sec: 3915.5). Total num frames: 1773568. Throughput: 0: 983.7. Samples: 443634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:25:42,896][00331] Avg episode reward: [(0, '5.129')]
[2024-08-10 13:25:47,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1789952. Throughput: 0: 955.9. Samples: 445856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:25:47,894][00331] Avg episode reward: [(0, '5.082')]
[2024-08-10 13:25:49,776][04481] Updated weights for policy 0, policy_version 440 (0.0018)
[2024-08-10 13:25:52,889][00331] Fps is (10 sec: 4098.0, 60 sec: 4027.8, 300 sec: 3915.5). Total num frames: 1814528. Throughput: 0: 982.3. Samples: 452824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:25:52,891][00331] Avg episode reward: [(0, '5.042')]
[2024-08-10 13:25:57,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1835008. Throughput: 0: 1007.6. Samples: 459076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:25:57,894][00331] Avg episode reward: [(0, '5.199')]
[2024-08-10 13:25:57,896][04466] Saving new best policy, reward=5.199!
[2024-08-10 13:26:00,347][04481] Updated weights for policy 0, policy_version 450 (0.0033)
[2024-08-10 13:26:02,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1847296. Throughput: 0: 975.5. Samples: 461098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:26:02,896][00331] Avg episode reward: [(0, '5.258')]
[2024-08-10 13:26:03,003][04466] Saving new best policy, reward=5.258!
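The "Saving new best policy" entries fire whenever the running average episode reward beats the previous best. The same bookkeeping is easy to replicate in a custom training loop; the sketch below is a generic illustration of the pattern, not Sample Factory's actual code, and maybe_save_best and best_policy.pth are hypothetical names.

```python
# Generic "save new best policy" pattern, mirroring the log entries above.
# Illustration only; not Sample Factory's implementation.
import torch

best_reward = float("-inf")

def maybe_save_best(model, avg_episode_reward, path="best_policy.pth"):
    """Checkpoint the model whenever the average episode reward improves."""
    global best_reward
    if avg_episode_reward > best_reward:
        best_reward = avg_episode_reward
        torch.save({"model": model.state_dict(), "reward": best_reward}, path)
        print(f"Saving new best policy, reward={best_reward:.3f}!")
```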
[2024-08-10 13:26:07,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1871872. Throughput: 0: 952.6. Samples: 466908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-10 13:26:07,894][00331] Avg episode reward: [(0, '5.223')] [2024-08-10 13:26:10,058][04481] Updated weights for policy 0, policy_version 460 (0.0021) [2024-08-10 13:26:12,890][00331] Fps is (10 sec: 4914.8, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1896448. Throughput: 0: 1013.1. Samples: 474024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:26:12,895][00331] Avg episode reward: [(0, '5.072')] [2024-08-10 13:26:17,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1908736. Throughput: 0: 998.8. Samples: 476460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:26:17,896][00331] Avg episode reward: [(0, '5.348')] [2024-08-10 13:26:17,902][04466] Saving new best policy, reward=5.348! [2024-08-10 13:26:21,587][04481] Updated weights for policy 0, policy_version 470 (0.0024) [2024-08-10 13:26:22,889][00331] Fps is (10 sec: 3277.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1929216. Throughput: 0: 951.7. Samples: 481382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:26:22,895][00331] Avg episode reward: [(0, '5.515')] [2024-08-10 13:26:22,911][04466] Saving new best policy, reward=5.515! [2024-08-10 13:26:27,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1953792. Throughput: 0: 990.6. Samples: 488208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:26:27,893][00331] Avg episode reward: [(0, '5.438')] [2024-08-10 13:26:30,765][04481] Updated weights for policy 0, policy_version 480 (0.0019) [2024-08-10 13:26:32,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1970176. Throughput: 0: 1016.1. Samples: 491580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:26:32,897][00331] Avg episode reward: [(0, '5.434')] [2024-08-10 13:26:37,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.6). Total num frames: 1986560. Throughput: 0: 958.5. Samples: 495958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:26:37,891][00331] Avg episode reward: [(0, '5.636')] [2024-08-10 13:26:37,899][04466] Saving new best policy, reward=5.636! [2024-08-10 13:26:42,128][04481] Updated weights for policy 0, policy_version 490 (0.0015) [2024-08-10 13:26:42,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3901.6). Total num frames: 2007040. Throughput: 0: 964.1. Samples: 502460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-10 13:26:42,891][00331] Avg episode reward: [(0, '5.456')] [2024-08-10 13:26:47,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2031616. Throughput: 0: 998.8. Samples: 506042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-10 13:26:47,895][00331] Avg episode reward: [(0, '5.741')] [2024-08-10 13:26:47,897][04466] Saving new best policy, reward=5.741! [2024-08-10 13:26:52,890][00331] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2043904. Throughput: 0: 985.5. Samples: 511256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:26:52,892][00331] Avg episode reward: [(0, '5.711')] [2024-08-10 13:26:53,112][04481] Updated weights for policy 0, policy_version 500 (0.0026) [2024-08-10 13:26:57,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). 
Total num frames: 2064384. Throughput: 0: 948.4. Samples: 516702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:26:57,892][00331] Avg episode reward: [(0, '5.151')] [2024-08-10 13:27:02,298][04481] Updated weights for policy 0, policy_version 510 (0.0012) [2024-08-10 13:27:02,889][00331] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2088960. Throughput: 0: 972.9. Samples: 520242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:27:02,897][00331] Avg episode reward: [(0, '5.142')] [2024-08-10 13:27:07,890][00331] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2105344. Throughput: 0: 1002.2. Samples: 526480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:27:07,895][00331] Avg episode reward: [(0, '5.456')] [2024-08-10 13:27:12,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2121728. Throughput: 0: 951.7. Samples: 531036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-10 13:27:12,891][00331] Avg episode reward: [(0, '5.470')] [2024-08-10 13:27:13,881][04481] Updated weights for policy 0, policy_version 520 (0.0022) [2024-08-10 13:27:17,889][00331] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2146304. Throughput: 0: 955.4. Samples: 534572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-10 13:27:17,892][00331] Avg episode reward: [(0, '5.323')] [2024-08-10 13:27:22,889][00331] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2170880. Throughput: 0: 1016.4. Samples: 541694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:27:22,896][00331] Avg episode reward: [(0, '5.559')] [2024-08-10 13:27:22,895][04481] Updated weights for policy 0, policy_version 530 (0.0019) [2024-08-10 13:27:27,889][00331] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2183168. Throughput: 0: 971.6. Samples: 546184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-10 13:27:27,898][00331] Avg episode reward: [(0, '5.417')] [2024-08-10 13:27:32,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2203648. Throughput: 0: 952.4. Samples: 548898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:27:32,891][00331] Avg episode reward: [(0, '5.398')] [2024-08-10 13:27:32,900][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000538_2203648.pth... [2024-08-10 13:27:33,052][04466] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth [2024-08-10 13:27:34,131][04481] Updated weights for policy 0, policy_version 540 (0.0044) [2024-08-10 13:27:37,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2228224. Throughput: 0: 989.8. Samples: 555796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-10 13:27:37,891][00331] Avg episode reward: [(0, '5.628')] [2024-08-10 13:27:42,889][00331] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 2244608. Throughput: 0: 991.7. Samples: 561330. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-10 13:27:42,892][00331] Avg episode reward: [(0, '5.427')] [2024-08-10 13:27:45,234][04481] Updated weights for policy 0, policy_version 550 (0.0048) [2024-08-10 13:27:47,889][00331] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2260992. Throughput: 0: 962.5. Samples: 563556. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-10 13:27:47,893][00331] Avg episode reward: [(0, '5.294')] [2024-08-10 13:27:52,889][00331] Fps is (10 sec: 4096.1, 60 sec: 4027.8, 300 sec: 3929.4). Total num frames: 2285568. Throughput: 0: 972.0. Samples: 570220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:27:52,896][00331] Avg episode reward: [(0, '5.723')] [2024-08-10 13:27:54,491][04481] Updated weights for policy 0, policy_version 560 (0.0015) [2024-08-10 13:27:57,895][00331] Fps is (10 sec: 4503.1, 60 sec: 4027.4, 300 sec: 3915.4). Total num frames: 2306048. Throughput: 0: 1017.5. Samples: 576828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-10 13:27:57,897][00331] Avg episode reward: [(0, '6.131')] [2024-08-10 13:27:57,903][04466] Saving new best policy, reward=6.131! [2024-08-10 13:28:02,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2318336. Throughput: 0: 985.8. Samples: 578934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-10 13:28:02,893][00331] Avg episode reward: [(0, '6.146')] [2024-08-10 13:28:02,975][04466] Saving new best policy, reward=6.146! [2024-08-10 13:28:05,971][04481] Updated weights for policy 0, policy_version 570 (0.0017) [2024-08-10 13:28:07,889][00331] Fps is (10 sec: 3688.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2342912. Throughput: 0: 947.3. Samples: 584322. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-10 13:28:07,893][00331] Avg episode reward: [(0, '5.984')] [2024-08-10 13:28:12,889][00331] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2363392. Throughput: 0: 1005.2. Samples: 591416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-10 13:28:12,901][00331] Avg episode reward: [(0, '6.269')] [2024-08-10 13:28:12,911][04466] Saving new best policy, reward=6.269! [2024-08-10 13:28:15,684][04481] Updated weights for policy 0, policy_version 580 (0.0012) [2024-08-10 13:28:17,889][00331] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2379776. Throughput: 0: 1005.4. Samples: 594140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:28:17,893][00331] Avg episode reward: [(0, '6.097')] [2024-08-10 13:28:22,889][00331] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2400256. Throughput: 0: 949.4. Samples: 598518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-10 13:28:22,891][00331] Avg episode reward: [(0, '6.212')] [2024-08-10 13:28:26,715][04481] Updated weights for policy 0, policy_version 590 (0.0015) [2024-08-10 13:28:27,890][00331] Fps is (10 sec: 4095.8, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2420736. Throughput: 0: 975.6. Samples: 605230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-10 13:28:27,899][00331] Avg episode reward: [(0, '6.445')] [2024-08-10 13:28:27,902][04466] Saving new best policy, reward=6.445! [2024-08-10 13:28:32,891][00331] Fps is (10 sec: 4095.3, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 2441216. Throughput: 0: 1001.3. Samples: 608616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-10 13:28:32,893][00331] Avg episode reward: [(0, '6.473')] [2024-08-10 13:28:32,906][04466] Saving new best policy, reward=6.473! [2024-08-10 13:28:37,889][00331] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 2453504. Throughput: 0: 955.9. Samples: 613236. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-10 13:28:37,891][00331] Avg episode reward: [(0, '6.850')] [2024-08-10 13:28:37,899][04466] Saving new best policy, reward=6.850! [2024-08-10 13:28:38,443][04481] Updated weights for policy 0, policy_version 600 (0.0016) [2024-08-10 13:28:42,889][00331] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2478080. Throughput: 0: 944.0. Samples: 619304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:28:42,896][00331] Avg episode reward: [(0, '7.007')] [2024-08-10 13:28:42,907][04466] Saving new best policy, reward=7.007! [2024-08-10 13:28:47,195][04481] Updated weights for policy 0, policy_version 610 (0.0025) [2024-08-10 13:28:47,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2498560. Throughput: 0: 973.4. Samples: 622738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-10 13:28:47,891][00331] Avg episode reward: [(0, '6.617')] [2024-08-10 13:28:52,891][00331] Fps is (10 sec: 3685.8, 60 sec: 3822.8, 300 sec: 3915.5). Total num frames: 2514944. Throughput: 0: 977.0. Samples: 628290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:28:52,897][00331] Avg episode reward: [(0, '6.394')] [2024-08-10 13:28:57,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3901.6). Total num frames: 2531328. Throughput: 0: 933.2. Samples: 633410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:28:57,892][00331] Avg episode reward: [(0, '6.064')] [2024-08-10 13:28:58,982][04481] Updated weights for policy 0, policy_version 620 (0.0017) [2024-08-10 13:29:02,889][00331] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2555904. Throughput: 0: 949.6. Samples: 636872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:29:02,897][00331] Avg episode reward: [(0, '6.344')] [2024-08-10 13:29:07,890][00331] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2576384. Throughput: 0: 1001.4. Samples: 643582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-10 13:29:07,895][00331] Avg episode reward: [(0, '6.487')] [2024-08-10 13:29:09,082][04481] Updated weights for policy 0, policy_version 630 (0.0017) [2024-08-10 13:29:12,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 2588672. Throughput: 0: 951.0. Samples: 648026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-10 13:29:12,895][00331] Avg episode reward: [(0, '6.512')] [2024-08-10 13:29:17,889][00331] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2613248. Throughput: 0: 946.5. Samples: 651208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-10 13:29:17,894][00331] Avg episode reward: [(0, '6.416')] [2024-08-10 13:29:19,061][04481] Updated weights for policy 0, policy_version 640 (0.0023) [2024-08-10 13:29:22,889][00331] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2637824. Throughput: 0: 1003.5. Samples: 658394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:29:22,896][00331] Avg episode reward: [(0, '6.728')] [2024-08-10 13:29:27,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3901.6). Total num frames: 2650112. Throughput: 0: 980.0. Samples: 663406. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:29:27,891][00331] Avg episode reward: [(0, '6.682')] [2024-08-10 13:29:30,525][04481] Updated weights for policy 0, policy_version 650 (0.0024) [2024-08-10 13:29:32,891][00331] Fps is (10 sec: 3276.2, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2670592. Throughput: 0: 953.9. Samples: 665664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:29:32,893][00331] Avg episode reward: [(0, '6.507')] [2024-08-10 13:29:32,905][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000652_2670592.pth... [2024-08-10 13:29:33,039][04466] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth [2024-08-10 13:29:37,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2695168. Throughput: 0: 985.0. Samples: 672612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:29:37,893][00331] Avg episode reward: [(0, '6.212')] [2024-08-10 13:29:39,335][04481] Updated weights for policy 0, policy_version 660 (0.0012) [2024-08-10 13:29:42,892][00331] Fps is (10 sec: 4095.6, 60 sec: 3891.0, 300 sec: 3901.6). Total num frames: 2711552. Throughput: 0: 1006.9. Samples: 678724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:29:42,894][00331] Avg episode reward: [(0, '6.629')] [2024-08-10 13:29:47,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2727936. Throughput: 0: 977.6. Samples: 680864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:29:47,896][00331] Avg episode reward: [(0, '6.352')] [2024-08-10 13:29:50,830][04481] Updated weights for policy 0, policy_version 670 (0.0026) [2024-08-10 13:29:52,889][00331] Fps is (10 sec: 4097.2, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 2752512. Throughput: 0: 963.3. Samples: 686932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-10 13:29:52,895][00331] Avg episode reward: [(0, '6.436')] [2024-08-10 13:29:57,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2772992. Throughput: 0: 1018.8. Samples: 693870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:29:57,895][00331] Avg episode reward: [(0, '7.561')] [2024-08-10 13:29:57,899][04466] Saving new best policy, reward=7.561! [2024-08-10 13:30:01,245][04481] Updated weights for policy 0, policy_version 680 (0.0016) [2024-08-10 13:30:02,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2789376. Throughput: 0: 996.3. Samples: 696040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:30:02,895][00331] Avg episode reward: [(0, '8.301')] [2024-08-10 13:30:02,905][04466] Saving new best policy, reward=8.301! [2024-08-10 13:30:07,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2809856. Throughput: 0: 950.0. Samples: 701144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-10 13:30:07,894][00331] Avg episode reward: [(0, '8.657')] [2024-08-10 13:30:07,898][04466] Saving new best policy, reward=8.657! [2024-08-10 13:30:11,253][04481] Updated weights for policy 0, policy_version 690 (0.0017) [2024-08-10 13:30:12,889][00331] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2830336. Throughput: 0: 995.0. Samples: 708180. 
[2024-08-10 13:30:12,889][00331] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2830336. Throughput: 0: 995.0. Samples: 708180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:30:12,897][00331] Avg episode reward: [(0, '7.717')]
[2024-08-10 13:30:17,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2850816. Throughput: 0: 1015.2. Samples: 711348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:30:17,893][00331] Avg episode reward: [(0, '7.977')]
[2024-08-10 13:30:22,660][04481] Updated weights for policy 0, policy_version 700 (0.0024)
[2024-08-10 13:30:22,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2867200. Throughput: 0: 958.0. Samples: 715724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:30:22,894][00331] Avg episode reward: [(0, '7.949')]
[2024-08-10 13:30:27,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2887680. Throughput: 0: 972.0. Samples: 722460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:30:27,892][00331] Avg episode reward: [(0, '8.807')]
[2024-08-10 13:30:27,896][04466] Saving new best policy, reward=8.807!
[2024-08-10 13:30:31,477][04481] Updated weights for policy 0, policy_version 710 (0.0027)
[2024-08-10 13:30:32,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3915.5). Total num frames: 2912256. Throughput: 0: 1001.6. Samples: 725938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:30:32,896][00331] Avg episode reward: [(0, '9.138')]
[2024-08-10 13:30:32,907][04466] Saving new best policy, reward=9.138!
[2024-08-10 13:30:37,894][00331] Fps is (10 sec: 3684.8, 60 sec: 3822.6, 300 sec: 3901.6). Total num frames: 2924544. Throughput: 0: 976.5. Samples: 730880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:30:37,896][00331] Avg episode reward: [(0, '9.288')]
[2024-08-10 13:30:37,902][04466] Saving new best policy, reward=9.288!
[2024-08-10 13:30:42,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3915.5). Total num frames: 2945024. Throughput: 0: 951.3. Samples: 736678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:30:42,896][00331] Avg episode reward: [(0, '9.705')]
[2024-08-10 13:30:42,912][04466] Saving new best policy, reward=9.705!
[2024-08-10 13:30:43,278][04481] Updated weights for policy 0, policy_version 720 (0.0023)
[2024-08-10 13:30:47,889][00331] Fps is (10 sec: 4507.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2969600. Throughput: 0: 979.2. Samples: 740106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:30:47,896][00331] Avg episode reward: [(0, '10.039')]
[2024-08-10 13:30:47,899][04466] Saving new best policy, reward=10.039!
[2024-08-10 13:30:52,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2985984. Throughput: 0: 998.3. Samples: 746068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:30:52,897][00331] Avg episode reward: [(0, '10.138')]
[2024-08-10 13:30:52,909][04466] Saving new best policy, reward=10.138!
[2024-08-10 13:30:54,284][04481] Updated weights for policy 0, policy_version 730 (0.0022)
[2024-08-10 13:30:57,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3002368. Throughput: 0: 941.8. Samples: 750560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:30:57,891][00331] Avg episode reward: [(0, '10.335')]
[2024-08-10 13:30:57,897][04466] Saving new best policy, reward=10.335!
[2024-08-10 13:31:02,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3022848. Throughput: 0: 945.8. Samples: 753910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:31:02,892][00331] Avg episode reward: [(0, '11.019')]
[2024-08-10 13:31:02,901][04466] Saving new best policy, reward=11.019!
[2024-08-10 13:31:04,202][04481] Updated weights for policy 0, policy_version 740 (0.0023)
[2024-08-10 13:31:07,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3043328. Throughput: 0: 996.3. Samples: 760558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:31:07,896][00331] Avg episode reward: [(0, '11.032')]
[2024-08-10 13:31:07,899][04466] Saving new best policy, reward=11.032!
[2024-08-10 13:31:12,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 3055616. Throughput: 0: 938.7. Samples: 764700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:31:12,894][00331] Avg episode reward: [(0, '10.603')]
[2024-08-10 13:31:16,000][04481] Updated weights for policy 0, policy_version 750 (0.0027)
[2024-08-10 13:31:17,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3080192. Throughput: 0: 924.4. Samples: 767538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:31:17,891][00331] Avg episode reward: [(0, '10.718')]
[2024-08-10 13:31:22,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3100672. Throughput: 0: 968.8. Samples: 774470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:31:22,892][00331] Avg episode reward: [(0, '10.620')]
[2024-08-10 13:31:25,921][04481] Updated weights for policy 0, policy_version 760 (0.0012)
[2024-08-10 13:31:27,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3117056. Throughput: 0: 950.0. Samples: 779426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:31:27,894][00331] Avg episode reward: [(0, '11.069')]
[2024-08-10 13:31:27,898][04466] Saving new best policy, reward=11.069!
[2024-08-10 13:31:32,889][00331] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 3129344. Throughput: 0: 913.3. Samples: 781206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:31:32,894][00331] Avg episode reward: [(0, '11.521')]
[2024-08-10 13:31:32,907][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000764_3129344.pth...
[2024-08-10 13:31:33,030][04466] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000538_2203648.pth
[2024-08-10 13:31:33,048][04466] Saving new best policy, reward=11.521!
[2024-08-10 13:31:37,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3873.8). Total num frames: 3149824. Throughput: 0: 909.2. Samples: 786980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:31:37,893][00331] Avg episode reward: [(0, '11.556')]
[2024-08-10 13:31:37,898][04466] Saving new best policy, reward=11.556!
[2024-08-10 13:31:38,137][04481] Updated weights for policy 0, policy_version 770 (0.0024)
[2024-08-10 13:31:42,890][00331] Fps is (10 sec: 4095.8, 60 sec: 3754.6, 300 sec: 3860.0). Total num frames: 3170304. Throughput: 0: 939.9. Samples: 792858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:31:42,892][00331] Avg episode reward: [(0, '12.091')]
[2024-08-10 13:31:42,911][04466] Saving new best policy, reward=12.091!
[2024-08-10 13:31:47,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3860.0). Total num frames: 3182592. Throughput: 0: 910.6. Samples: 794886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:31:47,896][00331] Avg episode reward: [(0, '11.791')]
[2024-08-10 13:31:50,160][04481] Updated weights for policy 0, policy_version 780 (0.0012)
[2024-08-10 13:31:52,889][00331] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 3203072. Throughput: 0: 886.1. Samples: 800434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:31:52,895][00331] Avg episode reward: [(0, '12.588')]
[2024-08-10 13:31:52,942][04466] Saving new best policy, reward=12.588!
[2024-08-10 13:31:57,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3227648. Throughput: 0: 942.0. Samples: 807090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:31:57,892][00331] Avg episode reward: [(0, '13.697')]
[2024-08-10 13:31:57,895][04466] Saving new best policy, reward=13.697!
[2024-08-10 13:32:00,150][04481] Updated weights for policy 0, policy_version 790 (0.0028)
[2024-08-10 13:32:02,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 3239936. Throughput: 0: 933.2. Samples: 809534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:32:02,894][00331] Avg episode reward: [(0, '14.024')]
[2024-08-10 13:32:02,972][04466] Saving new best policy, reward=14.024!
[2024-08-10 13:32:07,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 3260416. Throughput: 0: 879.1. Samples: 814030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:32:07,893][00331] Avg episode reward: [(0, '15.318')]
[2024-08-10 13:32:07,896][04466] Saving new best policy, reward=15.318!
[2024-08-10 13:32:11,016][04481] Updated weights for policy 0, policy_version 800 (0.0037)
[2024-08-10 13:32:12,890][00331] Fps is (10 sec: 4505.1, 60 sec: 3822.9, 300 sec: 3859.9). Total num frames: 3284992. Throughput: 0: 925.0. Samples: 821052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:32:12,895][00331] Avg episode reward: [(0, '14.542')]
[2024-08-10 13:32:17,892][00331] Fps is (10 sec: 4094.8, 60 sec: 3686.2, 300 sec: 3832.2). Total num frames: 3301376. Throughput: 0: 963.6. Samples: 824572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:32:17,898][00331] Avg episode reward: [(0, '14.324')]
[2024-08-10 13:32:22,376][04481] Updated weights for policy 0, policy_version 810 (0.0029)
[2024-08-10 13:32:22,889][00331] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 3317760. Throughput: 0: 933.2. Samples: 828974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:32:22,892][00331] Avg episode reward: [(0, '13.718')]
[2024-08-10 13:32:27,889][00331] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3338240. Throughput: 0: 943.7. Samples: 835324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:32:27,892][00331] Avg episode reward: [(0, '12.771')]
[2024-08-10 13:32:31,367][04481] Updated weights for policy 0, policy_version 820 (0.0029)
[2024-08-10 13:32:32,890][00331] Fps is (10 sec: 4505.2, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 3362816. Throughput: 0: 974.9. Samples: 838758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:32:32,893][00331] Avg episode reward: [(0, '13.408')]
[2024-08-10 13:32:37,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3379200. Throughput: 0: 973.3. Samples: 844234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:32:37,899][00331] Avg episode reward: [(0, '12.963')]
[2024-08-10 13:32:42,864][04481] Updated weights for policy 0, policy_version 830 (0.0019)
[2024-08-10 13:32:42,889][00331] Fps is (10 sec: 3686.7, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3399680. Throughput: 0: 944.3. Samples: 849584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:32:42,892][00331] Avg episode reward: [(0, '13.494')]
[2024-08-10 13:32:47,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3420160. Throughput: 0: 969.6. Samples: 853168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:32:47,892][00331] Avg episode reward: [(0, '12.530')]
[2024-08-10 13:32:52,591][04481] Updated weights for policy 0, policy_version 840 (0.0024)
[2024-08-10 13:32:52,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3440640. Throughput: 0: 1015.2. Samples: 859716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:32:52,893][00331] Avg episode reward: [(0, '13.636')]
[2024-08-10 13:32:57,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3457024. Throughput: 0: 957.3. Samples: 864128. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:32:57,892][00331] Avg episode reward: [(0, '13.747')]
[2024-08-10 13:33:02,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3477504. Throughput: 0: 954.9. Samples: 867540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:33:02,892][00331] Avg episode reward: [(0, '14.240')]
[2024-08-10 13:33:03,148][04481] Updated weights for policy 0, policy_version 850 (0.0021)
[2024-08-10 13:33:07,896][00331] Fps is (10 sec: 4502.6, 60 sec: 4027.3, 300 sec: 3859.9). Total num frames: 3502080. Throughput: 0: 1013.0. Samples: 874564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:33:07,903][00331] Avg episode reward: [(0, '16.102')]
[2024-08-10 13:33:07,914][04466] Saving new best policy, reward=16.102!
[2024-08-10 13:33:12,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 3514368. Throughput: 0: 978.9. Samples: 879376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:33:12,895][00331] Avg episode reward: [(0, '14.986')]
[2024-08-10 13:33:14,391][04481] Updated weights for policy 0, policy_version 860 (0.0025)
[2024-08-10 13:33:17,889][00331] Fps is (10 sec: 3279.0, 60 sec: 3891.4, 300 sec: 3846.1). Total num frames: 3534848. Throughput: 0: 958.4. Samples: 881884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:33:17,892][00331] Avg episode reward: [(0, '15.079')]
[2024-08-10 13:33:22,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3559424. Throughput: 0: 994.5. Samples: 888988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:33:22,892][00331] Avg episode reward: [(0, '15.179')]
[2024-08-10 13:33:23,229][04481] Updated weights for policy 0, policy_version 870 (0.0015)
[2024-08-10 13:33:27,890][00331] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 3575808. Throughput: 0: 1006.5. Samples: 894878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:33:27,896][00331] Avg episode reward: [(0, '15.963')]
[2024-08-10 13:33:32,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3592192. Throughput: 0: 972.3. Samples: 896922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:33:32,892][00331] Avg episode reward: [(0, '15.599')]
[2024-08-10 13:33:32,902][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth...
[2024-08-10 13:33:33,026][04466] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000652_2670592.pth
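The two entries above show the periodic checkpoint rotation: a new .pth file is written (here at policy version 877, env step 3592192) and the oldest regular checkpoint is removed, alongside the separate "Saving new best policy" entries. A minimal sketch of inspecting such a file offline, assuming standard PyTorch serialization; the dict keys queried below are assumptions to confirm by printing:

import torch

# Path copied from the log above; any of the saved checkpoints works the same way.
ckpt_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth"

# Sample Factory checkpoints are torch-serialized dicts; the exact keys
# ("model", "env_steps", ...) are assumptions here - print them to confirm.
ckpt = torch.load(ckpt_path, map_location="cpu")
print(sorted(ckpt.keys()))
print("env steps:", ckpt.get("env_steps"))  # assumed key; the filename says 3592192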
[2024-08-10 13:33:34,870][04481] Updated weights for policy 0, policy_version 880 (0.0030)
[2024-08-10 13:33:37,889][00331] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3616768. Throughput: 0: 964.3. Samples: 903110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:33:37,898][00331] Avg episode reward: [(0, '16.342')]
[2024-08-10 13:33:37,901][04466] Saving new best policy, reward=16.342!
[2024-08-10 13:33:42,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3637248. Throughput: 0: 1017.6. Samples: 909922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:33:42,891][00331] Avg episode reward: [(0, '15.399')]
[2024-08-10 13:33:44,847][04481] Updated weights for policy 0, policy_version 890 (0.0033)
[2024-08-10 13:33:47,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3653632. Throughput: 0: 991.3. Samples: 912148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:33:47,892][00331] Avg episode reward: [(0, '14.590')]
[2024-08-10 13:33:52,891][00331] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 3674112. Throughput: 0: 949.4. Samples: 917282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:33:52,893][00331] Avg episode reward: [(0, '14.549')]
[2024-08-10 13:33:55,293][04481] Updated weights for policy 0, policy_version 900 (0.0018)
[2024-08-10 13:33:57,889][00331] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 3698688. Throughput: 0: 999.6. Samples: 924360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:33:57,892][00331] Avg episode reward: [(0, '15.164')]
[2024-08-10 13:34:02,889][00331] Fps is (10 sec: 4096.5, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3715072. Throughput: 0: 1011.6. Samples: 927406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:34:02,896][00331] Avg episode reward: [(0, '15.601')]
[2024-08-10 13:34:06,711][04481] Updated weights for policy 0, policy_version 910 (0.0012)
[2024-08-10 13:34:07,889][00331] Fps is (10 sec: 3276.9, 60 sec: 3823.4, 300 sec: 3873.8). Total num frames: 3731456. Throughput: 0: 951.2. Samples: 931794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:34:07,893][00331] Avg episode reward: [(0, '15.972')]
[2024-08-10 13:34:12,889][00331] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 3756032. Throughput: 0: 975.8. Samples: 938790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:34:12,898][00331] Avg episode reward: [(0, '16.994')]
[2024-08-10 13:34:12,908][04466] Saving new best policy, reward=16.994!
[2024-08-10 13:34:15,527][04481] Updated weights for policy 0, policy_version 920 (0.0015)
[2024-08-10 13:34:17,889][00331] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3776512. Throughput: 0: 1008.4. Samples: 942298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:34:17,898][00331] Avg episode reward: [(0, '16.613')]
[2024-08-10 13:34:22,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3788800. Throughput: 0: 981.1. Samples: 947260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:34:22,894][00331] Avg episode reward: [(0, '16.656')]
[2024-08-10 13:34:26,717][04481] Updated weights for policy 0, policy_version 930 (0.0022)
[2024-08-10 13:34:27,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3813376. Throughput: 0: 966.2. Samples: 953400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:34:27,891][00331] Avg episode reward: [(0, '16.594')]
[2024-08-10 13:34:32,889][00331] Fps is (10 sec: 4915.1, 60 sec: 4096.0, 300 sec: 3873.8). Total num frames: 3837952. Throughput: 0: 990.4. Samples: 956716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:34:32,895][00331] Avg episode reward: [(0, '16.885')]
[2024-08-10 13:34:36,812][04481] Updated weights for policy 0, policy_version 940 (0.0015)
[2024-08-10 13:34:37,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3850240. Throughput: 0: 1007.1. Samples: 962598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-10 13:34:37,896][00331] Avg episode reward: [(0, '16.332')]
[2024-08-10 13:34:42,889][00331] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3870720. Throughput: 0: 955.6. Samples: 967360. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-10 13:34:42,891][00331] Avg episode reward: [(0, '15.618')]
[2024-08-10 13:34:47,189][04481] Updated weights for policy 0, policy_version 950 (0.0032)
[2024-08-10 13:34:47,889][00331] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3891200. Throughput: 0: 966.3. Samples: 970888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:34:47,892][00331] Avg episode reward: [(0, '16.104')]
[2024-08-10 13:34:52,889][00331] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3911680. Throughput: 0: 1017.6. Samples: 977586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:34:52,892][00331] Avg episode reward: [(0, '16.910')]
[2024-08-10 13:34:57,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3923968. Throughput: 0: 955.2. Samples: 981772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:34:57,892][00331] Avg episode reward: [(0, '16.612')]
[2024-08-10 13:34:59,120][04481] Updated weights for policy 0, policy_version 960 (0.0052)
[2024-08-10 13:35:02,889][00331] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3948544. Throughput: 0: 940.0. Samples: 984596. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-10 13:35:02,892][00331] Avg episode reward: [(0, '17.346')]
[2024-08-10 13:35:02,900][04466] Saving new best policy, reward=17.346!
[2024-08-10 13:35:07,889][00331] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3969024. Throughput: 0: 984.6. Samples: 991568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-10 13:35:07,893][00331] Avg episode reward: [(0, '18.111')]
[2024-08-10 13:35:07,895][04466] Saving new best policy, reward=18.111!
[2024-08-10 13:35:08,333][04481] Updated weights for policy 0, policy_version 970 (0.0025)
[2024-08-10 13:35:12,889][00331] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3985408. Throughput: 0: 962.5. Samples: 996714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-10 13:35:12,891][00331] Avg episode reward: [(0, '17.865')]
[2024-08-10 13:35:17,889][00331] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 4001792. Throughput: 0: 937.8. Samples: 998918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-10 13:35:17,894][00331] Avg episode reward: [(0, '18.682')]
[2024-08-10 13:35:17,973][04466] Stopping Batcher_0...
[2024-08-10 13:35:17,974][04466] Loop batcher_evt_loop terminating...
[2024-08-10 13:35:17,974][00331] Component Batcher_0 stopped!
[2024-08-10 13:35:17,979][04466] Saving new best policy, reward=18.682!
[2024-08-10 13:35:18,035][00331] Component RolloutWorker_w2 stopped!
[2024-08-10 13:35:18,041][04480] Stopping RolloutWorker_w1...
[2024-08-10 13:35:18,041][00331] Component RolloutWorker_w1 stopped!
[2024-08-10 13:35:18,047][04483] Stopping RolloutWorker_w3...
[2024-08-10 13:35:18,047][00331] Component RolloutWorker_w3 stopped!
[2024-08-10 13:35:18,045][04481] Weights refcount: 2 0
[2024-08-10 13:35:18,042][04480] Loop rollout_proc1_evt_loop terminating...
[2024-08-10 13:35:18,054][00331] Component InferenceWorker_p0-w0 stopped!
[2024-08-10 13:35:18,055][04484] Stopping RolloutWorker_w5...
[2024-08-10 13:35:18,060][04484] Loop rollout_proc5_evt_loop terminating...
[2024-08-10 13:35:18,054][04481] Stopping InferenceWorker_p0-w0...
[2024-08-10 13:35:18,063][04481] Loop inference_proc0-0_evt_loop terminating...
[2024-08-10 13:35:18,064][04479] Stopping RolloutWorker_w0...
[2024-08-10 13:35:18,064][04479] Loop rollout_proc0_evt_loop terminating...
[2024-08-10 13:35:18,048][04483] Loop rollout_proc3_evt_loop terminating...
[2024-08-10 13:35:18,059][00331] Component RolloutWorker_w5 stopped!
[2024-08-10 13:35:18,069][00331] Component RolloutWorker_w0 stopped!
[2024-08-10 13:35:18,039][04482] Stopping RolloutWorker_w2...
[2024-08-10 13:35:18,082][04482] Loop rollout_proc2_evt_loop terminating...
[2024-08-10 13:35:18,109][00331] Component RolloutWorker_w7 stopped!
[2024-08-10 13:35:18,105][04487] Stopping RolloutWorker_w7...
[2024-08-10 13:35:18,115][04487] Loop rollout_proc7_evt_loop terminating...
[2024-08-10 13:35:18,122][00331] Component RolloutWorker_w4 stopped!
[2024-08-10 13:35:18,125][04485] Stopping RolloutWorker_w4...
[2024-08-10 13:35:18,129][00331] Component RolloutWorker_w6 stopped!
[2024-08-10 13:35:18,134][04486] Stopping RolloutWorker_w6...
[2024-08-10 13:35:18,128][04485] Loop rollout_proc4_evt_loop terminating...
[2024-08-10 13:35:18,134][04486] Loop rollout_proc6_evt_loop terminating...
[2024-08-10 13:35:18,276][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-10 13:35:18,438][04466] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000764_3129344.pth
[2024-08-10 13:35:18,453][04466] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-10 13:35:18,643][00331] Component LearnerWorker_p0 stopped!
[2024-08-10 13:35:18,649][00331] Waiting for process learner_proc0 to stop...
[2024-08-10 13:35:18,653][04466] Stopping LearnerWorker_p0...
[2024-08-10 13:35:18,654][04466] Loop learner_proc0_evt_loop terminating...
[2024-08-10 13:35:20,102][00331] Waiting for process inference_proc0-0 to join...
[2024-08-10 13:35:20,285][00331] Waiting for process rollout_proc0 to join...
[2024-08-10 13:35:21,380][00331] Waiting for process rollout_proc1 to join...
[2024-08-10 13:35:21,384][00331] Waiting for process rollout_proc2 to join...
[2024-08-10 13:35:21,389][00331] Waiting for process rollout_proc3 to join...
[2024-08-10 13:35:21,392][00331] Waiting for process rollout_proc4 to join...
[2024-08-10 13:35:21,396][00331] Waiting for process rollout_proc5 to join...
[2024-08-10 13:35:21,400][00331] Waiting for process rollout_proc6 to join...
[2024-08-10 13:35:21,402][00331] Waiting for process rollout_proc7 to join...
[2024-08-10 13:35:21,406][00331] Batcher 0 profile tree view:
batching: 25.5072, releasing_batches: 0.0241
[2024-08-10 13:35:21,408][00331] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 444.6725
update_model: 7.7459
  weight_update: 0.0031
one_step: 0.0094
  handle_policy_step: 548.2633
    deserialize: 14.3180, stack: 2.8494, obs_to_device_normalize: 113.8058, forward: 274.9022, send_messages: 27.8485
    prepare_outputs: 86.7237
      to_cpu: 54.7446
[2024-08-10 13:35:21,412][00331] Learner 0 profile tree view:
misc: 0.0056, prepare_batch: 16.2213
train: 74.8939
  epoch_init: 0.0053, minibatch_init: 0.0097, losses_postprocess: 0.6037, kl_divergence: 0.5492, after_optimizer: 33.7517
  calculate_losses: 25.0819
    losses_init: 0.0098, forward_head: 1.8574, bptt_initial: 16.0841, tail: 1.1635, advantages_returns: 0.3123, losses: 2.9329
    bptt: 2.3582
      bptt_forward_core: 2.2738
  update: 14.2340
    clip: 1.4255
[2024-08-10 13:35:21,414][00331] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3571, enqueue_policy_requests: 106.2809, env_step: 810.8044, overhead: 13.0861, complete_rollouts: 7.4737
save_policy_outputs: 23.8561
  split_output_tensors: 8.4929
[2024-08-10 13:35:21,415][00331] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3311, enqueue_policy_requests: 110.5014, env_step: 810.6402, overhead: 13.1210, complete_rollouts: 6.4376
save_policy_outputs: 24.2107
  split_output_tensors: 8.0945
[2024-08-10 13:35:21,417][00331] Loop Runner_EvtLoop terminating...
[2024-08-10 13:35:21,418][00331] Runner profile tree view:
main_loop: 1064.9790
[2024-08-10 13:35:21,419][00331] Collected {0: 4005888}, FPS: 3761.5
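The run above ends having collected 4,005,888 frames for policy 0 at 3761.5 FPS overall, consistent with a 4M-env-step training target rounded up to whole batches. A minimal sketch of how a run like this is typically launched with Sample Factory's VizDoom example (the Hugging Face Deep RL course setup); register_vizdoom_components and parse_vizdoom_cfg are assumed helpers from that setup, and the env name is inferred from the hf_repository used later in this log:

from sample_factory.train import run_rl
# Assumed helpers from the sf_examples VizDoom setup: they register the Doom
# envs/models and build a config from CLI-style args.
from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components, parse_vizdoom_cfg

register_vizdoom_components()

cfg = parse_vizdoom_cfg(argv=[
    "--env=doom_health_gathering_supreme",  # inferred from the hf_repository name below
    "--num_workers=8",                      # matches the 8 rollout workers in this log
    "--num_envs_per_worker=4",              # assumption: not visible in the log
    "--train_for_env_steps=4000000",        # matches the ~4.0M frames collected above
])
status = run_rl(cfg)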
[2024-08-10 13:41:15,864][00331] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-08-10 13:41:15,868][00331] Overriding arg 'num_workers' with value 1 passed from command line
[2024-08-10 13:41:15,870][00331] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-08-10 13:41:15,871][00331] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-08-10 13:41:15,873][00331] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-08-10 13:41:15,877][00331] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-08-10 13:41:15,878][00331] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-08-10 13:41:15,880][00331] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-08-10 13:41:15,881][00331] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-08-10 13:41:15,885][00331] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-08-10 13:41:15,886][00331] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-08-10 13:41:15,887][00331] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-08-10 13:41:15,888][00331] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-08-10 13:41:15,889][00331] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-08-10 13:41:15,890][00331] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-08-10 13:41:15,908][00331] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-10 13:41:15,911][00331] RunningMeanStd input shape: (3, 72, 128)
[2024-08-10 13:41:15,914][00331] RunningMeanStd input shape: (1,)
[2024-08-10 13:41:15,935][00331] ConvEncoder: input_channels=3
[2024-08-10 13:41:16,059][00331] Conv encoder output size: 512
[2024-08-10 13:41:16,061][00331] Policy head output size: 512
[2024-08-10 13:41:17,613][00331] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-10 13:41:18,457][00331] Num frames 100...
[2024-08-10 13:41:18,578][00331] Num frames 200...
[2024-08-10 13:41:18,691][00331] Num frames 300...
[2024-08-10 13:41:18,811][00331] Num frames 400...
[2024-08-10 13:41:18,927][00331] Num frames 500...
[2024-08-10 13:41:19,049][00331] Num frames 600...
[2024-08-10 13:41:19,165][00331] Num frames 700...
[2024-08-10 13:41:19,226][00331] Avg episode rewards: #0: 12.040, true rewards: #0: 7.040
[2024-08-10 13:41:19,228][00331] Avg episode reward: 12.040, avg true_objective: 7.040
[2024-08-10 13:41:19,348][00331] Num frames 800...
[2024-08-10 13:41:19,460][00331] Num frames 900...
[2024-08-10 13:41:19,627][00331] Num frames 1000...
[2024-08-10 13:41:19,791][00331] Num frames 1100...
[2024-08-10 13:41:19,950][00331] Num frames 1200...
[2024-08-10 13:41:20,115][00331] Num frames 1300...
[2024-08-10 13:41:20,274][00331] Num frames 1400...
[2024-08-10 13:41:20,429][00331] Num frames 1500...
[2024-08-10 13:41:20,593][00331] Num frames 1600...
[2024-08-10 13:41:20,752][00331] Num frames 1700...
[2024-08-10 13:41:20,905][00331] Avg episode rewards: #0: 16.300, true rewards: #0: 8.800
[2024-08-10 13:41:20,907][00331] Avg episode reward: 16.300, avg true_objective: 8.800
[2024-08-10 13:41:20,974][00331] Num frames 1800...
[2024-08-10 13:41:21,139][00331] Num frames 1900...
[2024-08-10 13:41:21,303][00331] Num frames 2000...
[2024-08-10 13:41:21,474][00331] Num frames 2100...
[2024-08-10 13:41:21,654][00331] Num frames 2200...
[2024-08-10 13:41:21,825][00331] Num frames 2300...
[2024-08-10 13:41:22,001][00331] Num frames 2400...
[2024-08-10 13:41:22,119][00331] Num frames 2500...
[2024-08-10 13:41:22,248][00331] Num frames 2600...
[2024-08-10 13:41:22,369][00331] Num frames 2700...
[2024-08-10 13:41:22,488][00331] Num frames 2800...
[2024-08-10 13:41:22,612][00331] Num frames 2900...
[2024-08-10 13:41:22,730][00331] Num frames 3000...
[2024-08-10 13:41:22,849][00331] Num frames 3100...
[2024-08-10 13:41:22,946][00331] Avg episode rewards: #0: 22.120, true rewards: #0: 10.453
[2024-08-10 13:41:22,947][00331] Avg episode reward: 22.120, avg true_objective: 10.453
[2024-08-10 13:41:23,024][00331] Num frames 3200...
[2024-08-10 13:41:23,152][00331] Num frames 3300...
[2024-08-10 13:41:23,272][00331] Num frames 3400...
[2024-08-10 13:41:23,388][00331] Num frames 3500...
[2024-08-10 13:41:23,511][00331] Num frames 3600...
[2024-08-10 13:41:23,587][00331] Avg episode rewards: #0: 18.290, true rewards: #0: 9.040
[2024-08-10 13:41:23,588][00331] Avg episode reward: 18.290, avg true_objective: 9.040
[2024-08-10 13:41:23,703][00331] Num frames 3700...
[2024-08-10 13:41:23,823][00331] Num frames 3800...
[2024-08-10 13:41:23,941][00331] Num frames 3900...
[2024-08-10 13:41:24,056][00331] Num frames 4000...
[2024-08-10 13:41:24,174][00331] Avg episode rewards: #0: 15.906, true rewards: #0: 8.106
[2024-08-10 13:41:24,176][00331] Avg episode reward: 15.906, avg true_objective: 8.106
[2024-08-10 13:41:24,235][00331] Num frames 4100...
[2024-08-10 13:41:24,350][00331] Num frames 4200...
[2024-08-10 13:41:24,469][00331] Num frames 4300...
[2024-08-10 13:41:24,594][00331] Num frames 4400...
[2024-08-10 13:41:24,708][00331] Num frames 4500...
[2024-08-10 13:41:24,827][00331] Num frames 4600...
[2024-08-10 13:41:24,942][00331] Num frames 4700...
[2024-08-10 13:41:25,063][00331] Avg episode rewards: #0: 15.595, true rewards: #0: 7.928
[2024-08-10 13:41:25,066][00331] Avg episode reward: 15.595, avg true_objective: 7.928
[2024-08-10 13:41:25,118][00331] Num frames 4800...
[2024-08-10 13:41:25,242][00331] Num frames 4900...
[2024-08-10 13:41:25,354][00331] Num frames 5000...
[2024-08-10 13:41:25,466][00331] Num frames 5100...
[2024-08-10 13:41:25,587][00331] Num frames 5200...
[2024-08-10 13:41:25,703][00331] Num frames 5300...
[2024-08-10 13:41:25,835][00331] Avg episode rewards: #0: 14.807, true rewards: #0: 7.664
[2024-08-10 13:41:25,838][00331] Avg episode reward: 14.807, avg true_objective: 7.664
[2024-08-10 13:41:25,880][00331] Num frames 5400...
[2024-08-10 13:41:25,993][00331] Num frames 5500...
[2024-08-10 13:41:26,113][00331] Num frames 5600...
[2024-08-10 13:41:26,236][00331] Num frames 5700...
[2024-08-10 13:41:26,351][00331] Num frames 5800...
[2024-08-10 13:41:26,469][00331] Num frames 5900...
[2024-08-10 13:41:26,598][00331] Num frames 6000...
[2024-08-10 13:41:26,672][00331] Avg episode rewards: #0: 14.644, true rewards: #0: 7.519
[2024-08-10 13:41:26,673][00331] Avg episode reward: 14.644, avg true_objective: 7.519
[2024-08-10 13:41:26,776][00331] Num frames 6100...
[2024-08-10 13:41:26,892][00331] Num frames 6200...
[2024-08-10 13:41:27,003][00331] Num frames 6300...
[2024-08-10 13:41:27,124][00331] Num frames 6400...
[2024-08-10 13:41:27,236][00331] Num frames 6500...
[2024-08-10 13:41:27,358][00331] Num frames 6600...
[2024-08-10 13:41:27,474][00331] Num frames 6700...
[2024-08-10 13:41:27,599][00331] Num frames 6800...
[2024-08-10 13:41:27,731][00331] Avg episode rewards: #0: 14.851, true rewards: #0: 7.629
[2024-08-10 13:41:27,732][00331] Avg episode reward: 14.851, avg true_objective: 7.629
[2024-08-10 13:41:27,773][00331] Num frames 6900...
[2024-08-10 13:41:27,889][00331] Num frames 7000...
[2024-08-10 13:41:28,005][00331] Num frames 7100...
[2024-08-10 13:41:28,119][00331] Num frames 7200...
[2024-08-10 13:41:28,232][00331] Num frames 7300...
[2024-08-10 13:41:28,355][00331] Num frames 7400...
[2024-08-10 13:41:28,474][00331] Num frames 7500...
[2024-08-10 13:41:28,605][00331] Num frames 7600...
[2024-08-10 13:41:28,724][00331] Num frames 7700...
[2024-08-10 13:41:28,848][00331] Num frames 7800...
[2024-08-10 13:41:28,966][00331] Num frames 7900...
[2024-08-10 13:41:29,094][00331] Num frames 8000...
[2024-08-10 13:41:29,215][00331] Num frames 8100...
[2024-08-10 13:41:29,337][00331] Num frames 8200...
[2024-08-10 13:41:29,457][00331] Num frames 8300...
[2024-08-10 13:41:29,565][00331] Avg episode rewards: #0: 16.738, true rewards: #0: 8.338
[2024-08-10 13:41:29,567][00331] Avg episode reward: 16.738, avg true_objective: 8.338
[2024-08-10 13:42:19,516][00331] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
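The block above is an evaluation pass: the saved config is reloaded, eval-only arguments are added (no_render, save_video, max_num_episodes=10), the final checkpoint is restored, ten episodes are played, and replay.mp4 is written. A minimal sketch of the equivalent call, under the same assumptions about the VizDoom helpers as above:

from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg  # assumed helper, as above

# evaluation=True reloads the saved config and applies the eval-only overrides
# mirrored by the "Overriding arg ..." / "Adding new argument ..." lines above.
cfg = parse_vizdoom_cfg(argv=[
    "--env=doom_health_gathering_supreme",  # inferred, as above
    "--num_workers=1",
    "--save_video",
    "--no_render",
    "--max_num_episodes=10",
], evaluation=True)
status = enjoy(cfg)  # plays 10 episodes and writes replay.mp4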
[2024-08-10 13:43:43,489][00331] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-08-10 13:43:43,490][00331] Overriding arg 'num_workers' with value 1 passed from command line
[2024-08-10 13:43:43,493][00331] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-08-10 13:43:43,495][00331] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-08-10 13:43:43,497][00331] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-08-10 13:43:43,499][00331] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-08-10 13:43:43,501][00331] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-08-10 13:43:43,503][00331] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-08-10 13:43:43,505][00331] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-08-10 13:43:43,506][00331] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-08-10 13:43:43,507][00331] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-08-10 13:43:43,508][00331] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-08-10 13:43:43,509][00331] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-08-10 13:43:43,511][00331] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-08-10 13:43:43,512][00331] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-08-10 13:43:43,520][00331] RunningMeanStd input shape: (3, 72, 128)
[2024-08-10 13:43:43,523][00331] RunningMeanStd input shape: (1,)
[2024-08-10 13:43:43,541][00331] ConvEncoder: input_channels=3
[2024-08-10 13:43:43,586][00331] Conv encoder output size: 512
[2024-08-10 13:43:43,588][00331] Policy head output size: 512
[2024-08-10 13:43:43,606][00331] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-10 13:43:44,070][00331] Num frames 100...
[2024-08-10 13:43:44,191][00331] Num frames 200...
[2024-08-10 13:43:44,306][00331] Num frames 300...
[2024-08-10 13:43:44,425][00331] Num frames 400...
[2024-08-10 13:43:44,556][00331] Num frames 500...
[2024-08-10 13:43:44,687][00331] Num frames 600...
[2024-08-10 13:43:44,834][00331] Num frames 700...
[2024-08-10 13:43:44,997][00331] Num frames 800...
[2024-08-10 13:43:45,155][00331] Num frames 900...
[2024-08-10 13:43:45,331][00331] Avg episode rewards: #0: 20.750, true rewards: #0: 9.750
[2024-08-10 13:43:45,334][00331] Avg episode reward: 20.750, avg true_objective: 9.750
[2024-08-10 13:43:45,377][00331] Num frames 1000...
[2024-08-10 13:43:45,541][00331] Num frames 1100...
[2024-08-10 13:43:45,703][00331] Num frames 1200...
[2024-08-10 13:43:45,865][00331] Num frames 1300...
[2024-08-10 13:43:46,025][00331] Num frames 1400...
[2024-08-10 13:43:46,185][00331] Num frames 1500...
[2024-08-10 13:43:46,329][00331] Avg episode rewards: #0: 15.255, true rewards: #0: 7.755
[2024-08-10 13:43:46,331][00331] Avg episode reward: 15.255, avg true_objective: 7.755
[2024-08-10 13:43:46,414][00331] Num frames 1600...
[2024-08-10 13:43:46,578][00331] Num frames 1700...
[2024-08-10 13:43:46,753][00331] Num frames 1800...
[2024-08-10 13:43:46,920][00331] Num frames 1900...
[2024-08-10 13:43:47,079][00331] Num frames 2000...
[2024-08-10 13:43:47,250][00331] Num frames 2100...
[2024-08-10 13:43:47,385][00331] Num frames 2200...
[2024-08-10 13:43:47,545][00331] Avg episode rewards: #0: 14.623, true rewards: #0: 7.623
[2024-08-10 13:43:47,547][00331] Avg episode reward: 14.623, avg true_objective: 7.623
[2024-08-10 13:43:47,565][00331] Num frames 2300...
[2024-08-10 13:43:47,684][00331] Num frames 2400...
[2024-08-10 13:43:47,812][00331] Num frames 2500...
[2024-08-10 13:43:47,881][00331] Avg episode rewards: #0: 11.778, true rewards: #0: 6.277
[2024-08-10 13:43:47,883][00331] Avg episode reward: 11.778, avg true_objective: 6.277
[2024-08-10 13:43:47,988][00331] Num frames 2600...
[2024-08-10 13:43:48,105][00331] Num frames 2700...
[2024-08-10 13:43:48,227][00331] Num frames 2800...
[2024-08-10 13:43:48,345][00331] Num frames 2900...
[2024-08-10 13:43:48,462][00331] Num frames 3000...
[2024-08-10 13:43:48,593][00331] Num frames 3100...
[2024-08-10 13:43:48,711][00331] Num frames 3200...
[2024-08-10 13:43:48,838][00331] Num frames 3300...
[2024-08-10 13:43:48,955][00331] Num frames 3400...
[2024-08-10 13:43:54,579][00331] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-08-10 13:43:54,581][00331] Overriding arg 'num_workers' with value 1 passed from command line
[2024-08-10 13:43:54,583][00331] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-08-10 13:43:54,585][00331] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-08-10 13:43:54,587][00331] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-08-10 13:43:54,589][00331] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-08-10 13:43:54,591][00331] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-08-10 13:43:54,592][00331] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-08-10 13:43:54,593][00331] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-08-10 13:43:54,594][00331] Adding new argument 'hf_repository'='mliubimov/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-08-10 13:43:54,595][00331] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-08-10 13:43:54,596][00331] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-08-10 13:43:54,597][00331] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-08-10 13:43:54,598][00331] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-08-10 13:43:54,599][00331] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-08-10 13:43:54,619][00331] RunningMeanStd input shape: (3, 72, 128)
[2024-08-10 13:43:54,621][00331] RunningMeanStd input shape: (1,)
[2024-08-10 13:43:54,633][00331] ConvEncoder: input_channels=3
[2024-08-10 13:43:54,668][00331] Conv encoder output size: 512
[2024-08-10 13:43:54,670][00331] Policy head output size: 512
[2024-08-10 13:43:54,688][00331] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-10 13:43:55,157][00331] Num frames 100...
[2024-08-10 13:43:55,279][00331] Num frames 200...
[2024-08-10 13:43:55,401][00331] Num frames 300...
[2024-08-10 13:43:55,544][00331] Num frames 400...
[2024-08-10 13:43:55,674][00331] Num frames 500...
[2024-08-10 13:43:55,757][00331] Avg episode rewards: #0: 10.230, true rewards: #0: 5.230
[2024-08-10 13:43:55,759][00331] Avg episode reward: 10.230, avg true_objective: 5.230
[2024-08-10 13:43:55,861][00331] Num frames 600...
[2024-08-10 13:43:55,999][00331] Num frames 700...
[2024-08-10 13:43:56,122][00331] Num frames 800...
[2024-08-10 13:43:56,243][00331] Num frames 900...
[2024-08-10 13:43:56,367][00331] Num frames 1000...
[2024-08-10 13:43:56,490][00331] Num frames 1100...
[2024-08-10 13:43:56,614][00331] Num frames 1200...
[2024-08-10 13:43:56,734][00331] Num frames 1300...
[2024-08-10 13:43:56,852][00331] Num frames 1400...
[2024-08-10 13:43:56,931][00331] Avg episode rewards: #0: 13.095, true rewards: #0: 7.095
[2024-08-10 13:43:56,933][00331] Avg episode reward: 13.095, avg true_objective: 7.095
[2024-08-10 13:43:57,035][00331] Num frames 1500...
[2024-08-10 13:43:57,153][00331] Num frames 1600...
[2024-08-10 13:43:57,277][00331] Num frames 1700...
[2024-08-10 13:43:57,441][00331] Num frames 1800...
[2024-08-10 13:43:57,612][00331] Avg episode rewards: #0: 10.557, true rewards: #0: 6.223
[2024-08-10 13:43:57,615][00331] Avg episode reward: 10.557, avg true_objective: 6.223
[2024-08-10 13:43:57,667][00331] Num frames 1900...
[2024-08-10 13:43:57,832][00331] Num frames 2000...
[2024-08-10 13:43:58,000][00331] Num frames 2100...
[2024-08-10 13:43:58,165][00331] Num frames 2200...
[2024-08-10 13:43:58,321][00331] Num frames 2300...
[2024-08-10 13:43:58,402][00331] Avg episode rewards: #0: 10.038, true rewards: #0: 5.787
[2024-08-10 13:43:58,406][00331] Avg episode reward: 10.038, avg true_objective: 5.787
[2024-08-10 13:43:58,563][00331] Num frames 2400...
[2024-08-10 13:43:58,729][00331] Num frames 2500...
[2024-08-10 13:43:58,898][00331] Num frames 2600...
[2024-08-10 13:43:59,076][00331] Num frames 2700...
[2024-08-10 13:43:59,241][00331] Num frames 2800...
[2024-08-10 13:43:59,415][00331] Num frames 2900...
[2024-08-10 13:43:59,584][00331] Num frames 3000...
[2024-08-10 13:43:59,748][00331] Num frames 3100...
[2024-08-10 13:43:59,908][00331] Num frames 3200...
[2024-08-10 13:44:00,032][00331] Num frames 3300...
[2024-08-10 13:44:00,107][00331] Avg episode rewards: #0: 12.224, true rewards: #0: 6.624
[2024-08-10 13:44:00,110][00331] Avg episode reward: 12.224, avg true_objective: 6.624
[2024-08-10 13:44:00,219][00331] Num frames 3400...
[2024-08-10 13:44:00,336][00331] Num frames 3500...
[2024-08-10 13:44:00,454][00331] Num frames 3600...
[2024-08-10 13:44:00,615][00331] Num frames 3700...
[2024-08-10 13:44:00,729][00331] Num frames 3800...
[2024-08-10 13:44:00,812][00331] Avg episode rewards: #0: 11.707, true rewards: #0: 6.373
[2024-08-10 13:44:00,813][00331] Avg episode reward: 11.707, avg true_objective: 6.373
[2024-08-10 13:44:00,900][00331] Num frames 3900...
[2024-08-10 13:44:01,022][00331] Num frames 4000...
[2024-08-10 13:44:01,152][00331] Num frames 4100...
[2024-08-10 13:44:01,285][00331] Num frames 4200...
[2024-08-10 13:44:01,401][00331] Num frames 4300...
[2024-08-10 13:44:01,526][00331] Num frames 4400...
[2024-08-10 13:44:01,644][00331] Num frames 4500...
[2024-08-10 13:44:01,769][00331] Num frames 4600...
[2024-08-10 13:44:01,891][00331] Avg episode rewards: #0: 13.080, true rewards: #0: 6.651
[2024-08-10 13:44:01,892][00331] Avg episode reward: 13.080, avg true_objective: 6.651
[2024-08-10 13:44:01,946][00331] Num frames 4700...
[2024-08-10 13:44:02,060][00331] Num frames 4800...
[2024-08-10 13:44:02,181][00331] Num frames 4900...
[2024-08-10 13:44:02,301][00331] Num frames 5000...
[2024-08-10 13:44:02,421][00331] Num frames 5100...
[2024-08-10 13:44:02,482][00331] Avg episode rewards: #0: 12.505, true rewards: #0: 6.380
[2024-08-10 13:44:02,484][00331] Avg episode reward: 12.505, avg true_objective: 6.380
[2024-08-10 13:44:02,604][00331] Num frames 5200...
[2024-08-10 13:44:02,721][00331] Num frames 5300...
[2024-08-10 13:44:02,839][00331] Num frames 5400...
[2024-08-10 13:44:02,956][00331] Num frames 5500...
[2024-08-10 13:44:03,076][00331] Num frames 5600...
[2024-08-10 13:44:03,203][00331] Num frames 5700...
[2024-08-10 13:44:03,273][00331] Avg episode rewards: #0: 12.236, true rewards: #0: 6.347
[2024-08-10 13:44:03,274][00331] Avg episode reward: 12.236, avg true_objective: 6.347
[2024-08-10 13:44:03,380][00331] Num frames 5800...
[2024-08-10 13:44:03,502][00331] Num frames 5900...
[2024-08-10 13:44:03,627][00331] Num frames 6000...
[2024-08-10 13:44:03,742][00331] Num frames 6100...
[2024-08-10 13:44:03,861][00331] Num frames 6200...
[2024-08-10 13:44:03,945][00331] Avg episode rewards: #0: 11.824, true rewards: #0: 6.224
[2024-08-10 13:44:03,947][00331] Avg episode reward: 11.824, avg true_objective: 6.224
[2024-08-10 13:44:39,421][00331] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
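The last two runs repeat evaluation with push_to_hub=True and an hf_repository target (note max_num_frames is now 100000), so after the ten episodes the experiment directory and replay video are uploaded to the Hugging Face Hub. A minimal sketch, same assumptions about the VizDoom helpers as above; this also presumes a prior huggingface-cli login so the upload is authorized:

from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg  # assumed helper, as above

cfg = parse_vizdoom_cfg(argv=[
    "--env=doom_health_gathering_supreme",  # inferred, as above
    "--num_workers=1",
    "--save_video",
    "--no_render",
    "--max_num_episodes=10",
    "--max_num_frames=100000",
    "--push_to_hub",
    "--hf_repository=mliubimov/rl_course_vizdoom_health_gathering_supreme",
], evaluation=True)
status = enjoy(cfg)  # evaluates, writes replay.mp4, then uploads to the Hub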