[2024-09-22 15:26:35,389][00338] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-22 15:26:35,392][00338] Rollout worker 0 uses device cpu
[2024-09-22 15:26:35,393][00338] Rollout worker 1 uses device cpu
[2024-09-22 15:26:35,396][00338] Rollout worker 2 uses device cpu
[2024-09-22 15:26:35,400][00338] Rollout worker 3 uses device cpu
[2024-09-22 15:26:35,401][00338] Rollout worker 4 uses device cpu
[2024-09-22 15:26:35,402][00338] Rollout worker 5 uses device cpu
[2024-09-22 15:26:35,404][00338] Rollout worker 6 uses device cpu
[2024-09-22 15:26:35,408][00338] Rollout worker 7 uses device cpu
[2024-09-22 15:26:35,574][00338] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 15:26:35,577][00338] InferenceWorker_p0-w0: min num requests: 2
[2024-09-22 15:26:35,610][00338] Starting all processes...
[2024-09-22 15:26:35,611][00338] Starting process learner_proc0
[2024-09-22 15:26:36,309][00338] Starting all processes...
[2024-09-22 15:26:36,320][00338] Starting process inference_proc0-0
[2024-09-22 15:26:36,321][00338] Starting process rollout_proc0
[2024-09-22 15:26:36,326][00338] Starting process rollout_proc1
[2024-09-22 15:26:36,326][00338] Starting process rollout_proc2
[2024-09-22 15:26:36,326][00338] Starting process rollout_proc3
[2024-09-22 15:26:36,326][00338] Starting process rollout_proc4
[2024-09-22 15:26:36,326][00338] Starting process rollout_proc5
[2024-09-22 15:26:36,326][00338] Starting process rollout_proc6
[2024-09-22 15:26:36,326][00338] Starting process rollout_proc7
[2024-09-22 15:26:51,985][02371] Worker 4 uses CPU cores [0]
[2024-09-22 15:26:52,373][02352] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 15:26:52,373][02352] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-22 15:26:52,387][02367] Worker 1 uses CPU cores [1]
[2024-09-22 15:26:52,435][02352] Num visible devices: 1
[2024-09-22 15:26:52,471][02352] Starting seed is not provided
[2024-09-22 15:26:52,471][02352] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 15:26:52,471][02352] Initializing actor-critic model on device cuda:0
[2024-09-22 15:26:52,471][02352] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 15:26:52,474][02352] RunningMeanStd input shape: (1,)
[2024-09-22 15:26:52,554][02352] ConvEncoder: input_channels=3
[2024-09-22 15:26:52,634][02368] Worker 2 uses CPU cores [0]
[2024-09-22 15:26:52,673][02372] Worker 6 uses CPU cores [0]
[2024-09-22 15:26:52,686][02369] Worker 3 uses CPU cores [1]
[2024-09-22 15:26:52,711][02366] Worker 0 uses CPU cores [0]
[2024-09-22 15:26:52,768][02370] Worker 5 uses CPU cores [1]
[2024-09-22 15:26:52,782][02373] Worker 7 uses CPU cores [1]
[2024-09-22 15:26:52,808][02365] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 15:26:52,809][02365] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-22 15:26:52,827][02365] Num visible devices: 1
[2024-09-22 15:26:52,895][02352] Conv encoder output size: 512
[2024-09-22 15:26:52,895][02352] Policy head output size: 512
[2024-09-22 15:26:52,956][02352] Created Actor Critic model with architecture:
[2024-09-22 15:26:52,956][02352] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-22 15:26:53,257][02352] Using optimizer
[2024-09-22 15:26:53,954][02352] No checkpoints found
[2024-09-22 15:26:53,954][02352] Did not load from checkpoint, starting from scratch!
[2024-09-22 15:26:53,954][02352] Initialized policy 0 weights for model version 0
[2024-09-22 15:26:53,958][02352] LearnerWorker_p0 finished initialization!
[2024-09-22 15:26:53,959][02352] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 15:26:54,056][02365] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 15:26:54,058][02365] RunningMeanStd input shape: (1,)
[2024-09-22 15:26:54,069][02365] ConvEncoder: input_channels=3
[2024-09-22 15:26:54,171][02365] Conv encoder output size: 512
[2024-09-22 15:26:54,171][02365] Policy head output size: 512
[2024-09-22 15:26:54,222][00338] Inference worker 0-0 is ready!
[2024-09-22 15:26:54,223][00338] All inference workers are ready! Signal rollout workers to start!
[2024-09-22 15:26:54,427][02367] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 15:26:54,426][02369] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 15:26:54,431][02366] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 15:26:54,429][02370] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 15:26:54,425][02373] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 15:26:54,432][02371] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 15:26:54,434][02372] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 15:26:54,433][02368] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 15:26:55,567][00338] Heartbeat connected on Batcher_0
[2024-09-22 15:26:55,574][00338] Heartbeat connected on LearnerWorker_p0
[2024-09-22 15:26:55,626][00338] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-22 15:26:55,888][02369] Decorrelating experience for 0 frames...
[2024-09-22 15:26:55,892][02367] Decorrelating experience for 0 frames...
[2024-09-22 15:26:55,902][02370] Decorrelating experience for 0 frames...
[2024-09-22 15:26:56,161][02372] Decorrelating experience for 0 frames...
[2024-09-22 15:26:56,169][02368] Decorrelating experience for 0 frames...
[2024-09-22 15:26:56,167][02366] Decorrelating experience for 0 frames...
[2024-09-22 15:26:56,171][02371] Decorrelating experience for 0 frames...
[2024-09-22 15:26:57,620][02371] Decorrelating experience for 32 frames...
[2024-09-22 15:26:57,625][02366] Decorrelating experience for 32 frames...
[2024-09-22 15:26:57,629][02372] Decorrelating experience for 32 frames...
[2024-09-22 15:26:57,908][02370] Decorrelating experience for 32 frames...
[2024-09-22 15:26:57,926][02367] Decorrelating experience for 32 frames...
[2024-09-22 15:26:58,334][02369] Decorrelating experience for 32 frames...
[2024-09-22 15:26:58,839][00338] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-22 15:26:59,809][02368] Decorrelating experience for 32 frames...
[2024-09-22 15:26:59,870][02373] Decorrelating experience for 0 frames...
[2024-09-22 15:27:00,049][02371] Decorrelating experience for 64 frames...
[2024-09-22 15:27:00,055][02372] Decorrelating experience for 64 frames...
[2024-09-22 15:27:00,078][02366] Decorrelating experience for 64 frames...
[2024-09-22 15:27:00,667][02369] Decorrelating experience for 64 frames...
[2024-09-22 15:27:01,506][02367] Decorrelating experience for 64 frames...
[2024-09-22 15:27:01,545][02372] Decorrelating experience for 96 frames...
[2024-09-22 15:27:01,547][02371] Decorrelating experience for 96 frames...
[2024-09-22 15:27:01,795][02373] Decorrelating experience for 32 frames...
[2024-09-22 15:27:01,863][00338] Heartbeat connected on RolloutWorker_w4
[2024-09-22 15:27:01,868][00338] Heartbeat connected on RolloutWorker_w6
[2024-09-22 15:27:02,542][02369] Decorrelating experience for 96 frames...
[2024-09-22 15:27:02,589][02368] Decorrelating experience for 64 frames...
[2024-09-22 15:27:02,778][00338] Heartbeat connected on RolloutWorker_w3
[2024-09-22 15:27:03,194][02370] Decorrelating experience for 64 frames...
[2024-09-22 15:27:03,378][02367] Decorrelating experience for 96 frames...
[2024-09-22 15:27:03,635][00338] Heartbeat connected on RolloutWorker_w1
[2024-09-22 15:27:03,757][02366] Decorrelating experience for 96 frames...
[2024-09-22 15:27:03,841][00338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-22 15:27:03,860][02368] Decorrelating experience for 96 frames...
[2024-09-22 15:27:03,961][02373] Decorrelating experience for 64 frames...
[2024-09-22 15:27:04,057][00338] Heartbeat connected on RolloutWorker_w0
[2024-09-22 15:27:04,284][00338] Heartbeat connected on RolloutWorker_w2
[2024-09-22 15:27:04,450][02370] Decorrelating experience for 96 frames...
[2024-09-22 15:27:04,759][00338] Heartbeat connected on RolloutWorker_w5
[2024-09-22 15:27:05,630][02373] Decorrelating experience for 96 frames...
[2024-09-22 15:27:05,863][00338] Heartbeat connected on RolloutWorker_w7
[2024-09-22 15:27:07,287][02352] Signal inference workers to stop experience collection...
[2024-09-22 15:27:07,299][02365] InferenceWorker_p0-w0: stopping experience collection
[2024-09-22 15:27:08,839][00338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 170.6. Samples: 1706. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-22 15:27:08,844][00338] Avg episode reward: [(0, '2.022')]
[2024-09-22 15:27:10,629][02352] Signal inference workers to resume experience collection...
[2024-09-22 15:27:10,630][02365] InferenceWorker_p0-w0: resuming experience collection
[2024-09-22 15:27:13,841][00338] Fps is (10 sec: 1638.5, 60 sec: 1092.2, 300 sec: 1092.2). Total num frames: 16384. Throughput: 0: 254.9. Samples: 3824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0)
[2024-09-22 15:27:13,843][00338] Avg episode reward: [(0, '3.218')]
[2024-09-22 15:27:18,842][00338] Fps is (10 sec: 2866.6, 60 sec: 1433.4, 300 sec: 1433.4). Total num frames: 28672. Throughput: 0: 378.6. Samples: 7572. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-22 15:27:18,850][00338] Avg episode reward: [(0, '3.953')]
[2024-09-22 15:27:21,698][02365] Updated weights for policy 0, policy_version 10 (0.0027)
[2024-09-22 15:27:23,839][00338] Fps is (10 sec: 3277.3, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 404.2. Samples: 10106. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-22 15:27:23,848][00338] Avg episode reward: [(0, '4.469')]
[2024-09-22 15:27:28,839][00338] Fps is (10 sec: 4096.9, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 565.0. Samples: 16950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 15:27:28,846][00338] Avg episode reward: [(0, '4.260')]
[2024-09-22 15:27:31,243][02365] Updated weights for policy 0, policy_version 20 (0.0035)
[2024-09-22 15:27:33,839][00338] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 86016. Throughput: 0: 637.4. Samples: 22308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:27:33,844][00338] Avg episode reward: [(0, '4.321')]
[2024-09-22 15:27:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 608.7. Samples: 24348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-22 15:27:38,845][00338] Avg episode reward: [(0, '4.430')]
[2024-09-22 15:27:38,853][02352] Saving new best policy, reward=4.430!
[2024-09-22 15:27:43,386][02365] Updated weights for policy 0, policy_version 30 (0.0020)
[2024-09-22 15:27:43,839][00338] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 665.6. Samples: 29954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 15:27:43,847][00338] Avg episode reward: [(0, '4.459')]
[2024-09-22 15:27:43,849][02352] Saving new best policy, reward=4.459!
[2024-09-22 15:27:48,845][00338] Fps is (10 sec: 4093.8, 60 sec: 2866.9, 300 sec: 2866.9). Total num frames: 143360. Throughput: 0: 802.4. Samples: 36112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:27:48,849][00338] Avg episode reward: [(0, '4.504')]
[2024-09-22 15:27:48,859][02352] Saving new best policy, reward=4.504!
[2024-09-22 15:27:53,847][00338] Fps is (10 sec: 3274.3, 60 sec: 2829.6, 300 sec: 2829.6). Total num frames: 155648. Throughput: 0: 808.7. Samples: 38102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 15:27:53,849][00338] Avg episode reward: [(0, '4.535')]
[2024-09-22 15:27:53,854][02352] Saving new best policy, reward=4.535!
[2024-09-22 15:27:55,649][02365] Updated weights for policy 0, policy_version 40 (0.0019)
[2024-09-22 15:27:58,839][00338] Fps is (10 sec: 3278.6, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 176128. Throughput: 0: 879.5. Samples: 43402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 15:27:58,844][00338] Avg episode reward: [(0, '4.471')]
[2024-09-22 15:28:03,839][00338] Fps is (10 sec: 4509.0, 60 sec: 3345.2, 300 sec: 3087.8). Total num frames: 200704. Throughput: 0: 946.9. Samples: 50182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:28:03,844][00338] Avg episode reward: [(0, '4.358')]
[2024-09-22 15:28:05,066][02365] Updated weights for policy 0, policy_version 50 (0.0020)
[2024-09-22 15:28:08,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 945.0. Samples: 52630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:28:08,845][00338] Avg episode reward: [(0, '4.332')]
[2024-09-22 15:28:13,839][00338] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3058.3). Total num frames: 229376. Throughput: 0: 886.8. Samples: 56854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:28:13,842][00338] Avg episode reward: [(0, '4.330')]
[2024-09-22 15:28:17,173][02365] Updated weights for policy 0, policy_version 60 (0.0032)
[2024-09-22 15:28:18,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 908.9. Samples: 63210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:28:18,847][00338] Avg episode reward: [(0, '4.348')]
[2024-09-22 15:28:23,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3180.4). Total num frames: 270336. Throughput: 0: 936.1. Samples: 66474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:28:23,843][00338] Avg episode reward: [(0, '4.526')]
[2024-09-22 15:28:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 904.6. Samples: 70662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 15:28:28,844][00338] Avg episode reward: [(0, '4.573')]
[2024-09-22 15:28:28,863][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth...
[2024-09-22 15:28:29,009][02352] Saving new best policy, reward=4.573!
[2024-09-22 15:28:29,342][02365] Updated weights for policy 0, policy_version 70 (0.0042)
[2024-09-22 15:28:33,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3233.7). Total num frames: 307200. Throughput: 0: 900.7. Samples: 76638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:28:33,842][00338] Avg episode reward: [(0, '4.466')]
[2024-09-22 15:28:38,297][02365] Updated weights for policy 0, policy_version 80 (0.0021)
[2024-09-22 15:28:38,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 327680. Throughput: 0: 930.8. Samples: 79980. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-22 15:28:38,842][00338] Avg episode reward: [(0, '4.498')]
[2024-09-22 15:28:43,844][00338] Fps is (10 sec: 3275.4, 60 sec: 3617.9, 300 sec: 3237.7). Total num frames: 339968. Throughput: 0: 929.0. Samples: 85210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-22 15:28:43,846][00338] Avg episode reward: [(0, '4.571')]
[2024-09-22 15:28:48,840][00338] Fps is (10 sec: 3276.7, 60 sec: 3618.4, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 892.9. Samples: 90364. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 15:28:48,842][00338] Avg episode reward: [(0, '4.528')]
[2024-09-22 15:28:50,373][02365] Updated weights for policy 0, policy_version 90 (0.0043)
[2024-09-22 15:28:53,839][00338] Fps is (10 sec: 4097.8, 60 sec: 3755.1, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 913.7. Samples: 93748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 15:28:53,846][00338] Avg episode reward: [(0, '4.674')]
[2024-09-22 15:28:53,848][02352] Saving new best policy, reward=4.674!
[2024-09-22 15:28:58,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 952.0. Samples: 99692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 15:28:58,842][00338] Avg episode reward: [(0, '4.670')]
[2024-09-22 15:29:01,797][02365] Updated weights for policy 0, policy_version 100 (0.0041)
[2024-09-22 15:29:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3309.6). Total num frames: 413696. Throughput: 0: 907.0. Samples: 104024. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-22 15:29:03,842][00338] Avg episode reward: [(0, '4.638')]
[2024-09-22 15:29:08,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 907.1. Samples: 107292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-22 15:29:08,845][00338] Avg episode reward: [(0, '4.721')]
[2024-09-22 15:29:08,855][02352] Saving new best policy, reward=4.721!
[2024-09-22 15:29:11,504][02365] Updated weights for policy 0, policy_version 110 (0.0024)
[2024-09-22 15:29:13,839][00338] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3367.8). Total num frames: 454656. Throughput: 0: 962.0. Samples: 113950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 15:29:13,842][00338] Avg episode reward: [(0, '4.593')]
[2024-09-22 15:29:18,841][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3364.6). Total num frames: 471040. Throughput: 0: 921.3. Samples: 118096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:29:18,844][00338] Avg episode reward: [(0, '4.543')]
[2024-09-22 15:29:23,642][02365] Updated weights for policy 0, policy_version 120 (0.0039)
[2024-09-22 15:29:23,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 906.2. Samples: 120760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:29:23,844][00338] Avg episode reward: [(0, '4.507')]
[2024-09-22 15:29:28,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3440.6). Total num frames: 516096. Throughput: 0: 946.3. Samples: 127790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:29:28,847][00338] Avg episode reward: [(0, '4.756')]
[2024-09-22 15:29:28,854][02352] Saving new best policy, reward=4.756!
[2024-09-22 15:29:33,842][00338] Fps is (10 sec: 3685.2, 60 sec: 3686.2, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 949.5. Samples: 133096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:29:33,848][00338] Avg episode reward: [(0, '4.737')]
[2024-09-22 15:29:34,058][02365] Updated weights for policy 0, policy_version 130 (0.0043)
[2024-09-22 15:29:38,840][00338] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 921.0. Samples: 135194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:29:38,845][00338] Avg episode reward: [(0, '4.793')]
[2024-09-22 15:29:38,855][02352] Saving new best policy, reward=4.793!
[2024-09-22 15:29:43,839][00338] Fps is (10 sec: 4097.3, 60 sec: 3823.2, 300 sec: 3450.6). Total num frames: 569344. Throughput: 0: 937.6. Samples: 141886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:29:43,846][00338] Avg episode reward: [(0, '4.871')]
[2024-09-22 15:29:43,848][02352] Saving new best policy, reward=4.871!
[2024-09-22 15:29:44,104][02365] Updated weights for policy 0, policy_version 140 (0.0031)
[2024-09-22 15:29:48,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3469.5). Total num frames: 589824. Throughput: 0: 976.1. Samples: 147950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:29:48,841][00338] Avg episode reward: [(0, '4.581')]
[2024-09-22 15:29:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3440.6). Total num frames: 602112. Throughput: 0: 947.5. Samples: 149930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:29:53,845][00338] Avg episode reward: [(0, '4.601')]
[2024-09-22 15:29:56,038][02365] Updated weights for policy 0, policy_version 150 (0.0024)
[2024-09-22 15:29:58,840][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 626688. Throughput: 0: 927.3. Samples: 155680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 15:29:58,847][00338] Avg episode reward: [(0, '4.761')]
[2024-09-22 15:30:03,840][00338] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3498.2). Total num frames: 647168. Throughput: 0: 987.8. Samples: 162546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 15:30:03,847][00338] Avg episode reward: [(0, '4.622')]
[2024-09-22 15:30:05,394][02365] Updated weights for policy 0, policy_version 160 (0.0023)
[2024-09-22 15:30:08,842][00338] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3492.3). Total num frames: 663552. Throughput: 0: 983.6. Samples: 165026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:30:08,845][00338] Avg episode reward: [(0, '4.622')]
[2024-09-22 15:30:13,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3507.9). Total num frames: 684032. Throughput: 0: 933.8. Samples: 169810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:30:13,846][00338] Avg episode reward: [(0, '4.563')]
[2024-09-22 15:30:16,482][02365] Updated weights for policy 0, policy_version 170 (0.0015)
[2024-09-22 15:30:18,839][00338] Fps is (10 sec: 4097.2, 60 sec: 3891.2, 300 sec: 3522.6). Total num frames: 704512. Throughput: 0: 969.0. Samples: 176698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:30:18,846][00338] Avg episode reward: [(0, '4.883')]
[2024-09-22 15:30:18,861][02352] Saving new best policy, reward=4.883!
[2024-09-22 15:30:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3516.6). Total num frames: 720896. Throughput: 0: 994.9. Samples: 179962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:30:23,845][00338] Avg episode reward: [(0, '5.263')]
[2024-09-22 15:30:23,851][02352] Saving new best policy, reward=5.263!
[2024-09-22 15:30:28,190][02365] Updated weights for policy 0, policy_version 180 (0.0025)
[2024-09-22 15:30:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3510.9). Total num frames: 737280. Throughput: 0: 934.7. Samples: 183948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:30:28,842][00338] Avg episode reward: [(0, '5.269')]
[2024-09-22 15:30:28,854][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth...
[2024-09-22 15:30:28,974][02352] Saving new best policy, reward=5.269!
[2024-09-22 15:30:33,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3543.5). Total num frames: 761856. Throughput: 0: 944.9. Samples: 190472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:30:33,842][00338] Avg episode reward: [(0, '5.052')]
[2024-09-22 15:30:37,155][02365] Updated weights for policy 0, policy_version 190 (0.0032)
[2024-09-22 15:30:38,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3556.1). Total num frames: 782336. Throughput: 0: 978.0. Samples: 193942. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 15:30:38,851][00338] Avg episode reward: [(0, '5.141')]
[2024-09-22 15:30:43,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3531.7). Total num frames: 794624. Throughput: 0: 956.9. Samples: 198738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 15:30:43,847][00338] Avg episode reward: [(0, '5.166')]
[2024-09-22 15:30:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3543.9). Total num frames: 815104. Throughput: 0: 932.5. Samples: 204506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:30:48,841][00338] Avg episode reward: [(0, '5.383')]
[2024-09-22 15:30:48,854][02352] Saving new best policy, reward=5.383!
[2024-09-22 15:30:49,154][02365] Updated weights for policy 0, policy_version 200 (0.0025)
[2024-09-22 15:30:53,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3573.1). Total num frames: 839680. Throughput: 0: 949.0. Samples: 207726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:30:53,843][00338] Avg episode reward: [(0, '5.660')]
[2024-09-22 15:30:53,847][02352] Saving new best policy, reward=5.660!
[2024-09-22 15:30:58,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 851968. Throughput: 0: 969.7. Samples: 213446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 15:30:58,843][00338] Avg episode reward: [(0, '5.635')]
[2024-09-22 15:31:00,644][02365] Updated weights for policy 0, policy_version 210 (0.0042)
[2024-09-22 15:31:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 923.2. Samples: 218242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:31:03,842][00338] Avg episode reward: [(0, '5.562')]
[2024-09-22 15:31:08,840][00338] Fps is (10 sec: 4095.7, 60 sec: 3823.1, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 928.9. Samples: 221762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 15:31:08,844][00338] Avg episode reward: [(0, '5.271')]
[2024-09-22 15:31:10,003][02365] Updated weights for policy 0, policy_version 220 (0.0027)
[2024-09-22 15:31:13,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3582.0). Total num frames: 913408. Throughput: 0: 985.6. Samples: 228298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:31:13,843][00338] Avg episode reward: [(0, '5.234')]
[2024-09-22 15:31:18,841][00338] Fps is (10 sec: 3276.6, 60 sec: 3686.3, 300 sec: 3560.4). Total num frames: 925696. Throughput: 0: 933.1. Samples: 232464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:31:18,845][00338] Avg episode reward: [(0, '5.586')]
[2024-09-22 15:31:21,668][02365] Updated weights for policy 0, policy_version 230 (0.0013)
[2024-09-22 15:31:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 925.8. Samples: 235604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:31:23,847][00338] Avg episode reward: [(0, '5.597')]
[2024-09-22 15:31:28,839][00338] Fps is (10 sec: 4915.8, 60 sec: 3959.5, 300 sec: 3610.5). Total num frames: 974848. Throughput: 0: 970.6. Samples: 242414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:31:28,842][00338] Avg episode reward: [(0, '5.848')]
[2024-09-22 15:31:28,853][02352] Saving new best policy, reward=5.848!
[2024-09-22 15:31:31,812][02365] Updated weights for policy 0, policy_version 240 (0.0033)
[2024-09-22 15:31:33,842][00338] Fps is (10 sec: 3685.3, 60 sec: 3754.5, 300 sec: 3589.5). Total num frames: 987136. Throughput: 0: 950.2. Samples: 247268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:31:33,848][00338] Avg episode reward: [(0, '5.782')]
[2024-09-22 15:31:38,839][00338] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3584.0). Total num frames: 1003520. Throughput: 0: 925.8. Samples: 249388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:31:38,841][00338] Avg episode reward: [(0, '5.718')]
[2024-09-22 15:31:42,606][02365] Updated weights for policy 0, policy_version 250 (0.0021)
[2024-09-22 15:31:43,839][00338] Fps is (10 sec: 4097.2, 60 sec: 3891.2, 300 sec: 3607.4). Total num frames: 1028096. Throughput: 0: 949.8. Samples: 256188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:31:43,842][00338] Avg episode reward: [(0, '6.157')]
[2024-09-22 15:31:43,846][02352] Saving new best policy, reward=6.157!
[2024-09-22 15:31:48,840][00338] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3601.6). Total num frames: 1044480. Throughput: 0: 967.5. Samples: 261780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:31:48,845][00338] Avg episode reward: [(0, '6.049')]
[2024-09-22 15:31:53,840][00338] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 933.1. Samples: 263752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:31:53,847][00338] Avg episode reward: [(0, '6.390')]
[2024-09-22 15:31:53,849][02352] Saving new best policy, reward=6.390!
[2024-09-22 15:31:54,609][02365] Updated weights for policy 0, policy_version 260 (0.0035)
[2024-09-22 15:31:58,839][00338] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 921.7. Samples: 269774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:31:58,842][00338] Avg episode reward: [(0, '6.089')]
[2024-09-22 15:32:03,839][00338] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 974.4. Samples: 276312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:32:03,843][00338] Avg episode reward: [(0, '6.191')]
[2024-09-22 15:32:04,167][02365] Updated weights for policy 0, policy_version 270 (0.0018)
[2024-09-22 15:32:08,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1118208. Throughput: 0: 949.4. Samples: 278328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 15:32:08,843][00338] Avg episode reward: [(0, '6.406')]
[2024-09-22 15:32:08,856][02352] Saving new best policy, reward=6.406!
[2024-09-22 15:32:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1138688. Throughput: 0: 912.9. Samples: 283496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:32:13,842][00338] Avg episode reward: [(0, '6.470')]
[2024-09-22 15:32:13,845][02352] Saving new best policy, reward=6.470!
[2024-09-22 15:32:15,595][02365] Updated weights for policy 0, policy_version 280 (0.0031)
[2024-09-22 15:32:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3762.8). Total num frames: 1159168. Throughput: 0: 957.0. Samples: 290332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 15:32:18,845][00338] Avg episode reward: [(0, '7.265')]
[2024-09-22 15:32:18,855][02352] Saving new best policy, reward=7.265!
[2024-09-22 15:32:23,843][00338] Fps is (10 sec: 3684.9, 60 sec: 3754.4, 300 sec: 3748.8). Total num frames: 1175552. Throughput: 0: 970.5. Samples: 293066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 15:32:23,847][00338] Avg episode reward: [(0, '7.586')]
[2024-09-22 15:32:23,853][02352] Saving new best policy, reward=7.586!
[2024-09-22 15:32:27,249][02365] Updated weights for policy 0, policy_version 290 (0.0032)
[2024-09-22 15:32:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 1191936. Throughput: 0: 916.2. Samples: 297416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-22 15:32:28,842][00338] Avg episode reward: [(0, '7.498')]
[2024-09-22 15:32:28,850][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth...
[2024-09-22 15:32:28,990][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth [2024-09-22 15:32:33,839][00338] Fps is (10 sec: 4097.7, 60 sec: 3823.1, 300 sec: 3776.7). Total num frames: 1216512. Throughput: 0: 946.3. Samples: 304364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:32:33,847][00338] Avg episode reward: [(0, '7.244')] [2024-09-22 15:32:36,196][02365] Updated weights for policy 0, policy_version 300 (0.0019) [2024-09-22 15:32:38,840][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1236992. Throughput: 0: 978.3. Samples: 307776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:32:38,844][00338] Avg episode reward: [(0, '7.357')] [2024-09-22 15:32:43,842][00338] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3748.9). Total num frames: 1249280. Throughput: 0: 945.5. Samples: 312324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:32:43,844][00338] Avg episode reward: [(0, '7.439')] [2024-09-22 15:32:47,796][02365] Updated weights for policy 0, policy_version 310 (0.0048) [2024-09-22 15:32:48,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3790.6). Total num frames: 1273856. Throughput: 0: 937.4. Samples: 318494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:32:48,842][00338] Avg episode reward: [(0, '7.960')] [2024-09-22 15:32:48,853][02352] Saving new best policy, reward=7.960! [2024-09-22 15:32:53,839][00338] Fps is (10 sec: 4506.7, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1294336. Throughput: 0: 963.5. Samples: 321686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:32:53,845][00338] Avg episode reward: [(0, '8.157')] [2024-09-22 15:32:53,850][02352] Saving new best policy, reward=8.157! [2024-09-22 15:32:58,847][00338] Fps is (10 sec: 3274.4, 60 sec: 3754.2, 300 sec: 3748.8). Total num frames: 1306624. Throughput: 0: 967.2. Samples: 327028. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:32:58,853][00338] Avg episode reward: [(0, '8.346')] [2024-09-22 15:32:58,869][02352] Saving new best policy, reward=8.346! [2024-09-22 15:32:59,206][02365] Updated weights for policy 0, policy_version 320 (0.0039) [2024-09-22 15:33:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1327104. Throughput: 0: 928.3. Samples: 332104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:33:03,845][00338] Avg episode reward: [(0, '8.598')] [2024-09-22 15:33:03,849][02352] Saving new best policy, reward=8.598! [2024-09-22 15:33:08,839][00338] Fps is (10 sec: 4099.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1347584. Throughput: 0: 941.2. Samples: 335414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:33:08,849][00338] Avg episode reward: [(0, '8.822')] [2024-09-22 15:33:08,860][02352] Saving new best policy, reward=8.822! [2024-09-22 15:33:09,074][02365] Updated weights for policy 0, policy_version 330 (0.0034) [2024-09-22 15:33:13,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1368064. Throughput: 0: 983.4. Samples: 341670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:33:13,844][00338] Avg episode reward: [(0, '9.428')] [2024-09-22 15:33:13,846][02352] Saving new best policy, reward=9.428! [2024-09-22 15:33:18,842][00338] Fps is (10 sec: 3275.9, 60 sec: 3686.2, 300 sec: 3762.7). Total num frames: 1380352. Throughput: 0: 918.6. Samples: 345704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:33:18,847][00338] Avg episode reward: [(0, '9.738')] [2024-09-22 15:33:18,860][02352] Saving new best policy, reward=9.738! [2024-09-22 15:33:21,135][02365] Updated weights for policy 0, policy_version 340 (0.0033) [2024-09-22 15:33:23,840][00338] Fps is (10 sec: 3686.3, 60 sec: 3823.2, 300 sec: 3804.4). Total num frames: 1404928. Throughput: 0: 913.4. Samples: 348880. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:33:23,846][00338] Avg episode reward: [(0, '10.152')] [2024-09-22 15:33:23,850][02352] Saving new best policy, reward=10.152! [2024-09-22 15:33:28,839][00338] Fps is (10 sec: 4506.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1425408. Throughput: 0: 963.6. Samples: 355682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:33:28,849][00338] Avg episode reward: [(0, '11.169')] [2024-09-22 15:33:28,869][02352] Saving new best policy, reward=11.169! [2024-09-22 15:33:31,534][02365] Updated weights for policy 0, policy_version 350 (0.0029) [2024-09-22 15:33:33,842][00338] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1437696. Throughput: 0: 924.7. Samples: 360104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:33:33,844][00338] Avg episode reward: [(0, '12.188')] [2024-09-22 15:33:33,849][02352] Saving new best policy, reward=12.188! [2024-09-22 15:33:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1458176. Throughput: 0: 911.5. Samples: 362704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:33:38,844][00338] Avg episode reward: [(0, '12.282')] [2024-09-22 15:33:38,856][02352] Saving new best policy, reward=12.282! [2024-09-22 15:33:42,017][02365] Updated weights for policy 0, policy_version 360 (0.0020) [2024-09-22 15:33:43,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.4, 300 sec: 3804.4). Total num frames: 1482752. Throughput: 0: 943.4. Samples: 369472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:33:43,846][00338] Avg episode reward: [(0, '12.300')] [2024-09-22 15:33:43,852][02352] Saving new best policy, reward=12.300! [2024-09-22 15:33:48,844][00338] Fps is (10 sec: 3684.8, 60 sec: 3686.1, 300 sec: 3776.6). Total num frames: 1495040. Throughput: 0: 946.7. Samples: 374710. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:33:48,846][00338] Avg episode reward: [(0, '13.072')] [2024-09-22 15:33:48,864][02352] Saving new best policy, reward=13.072! [2024-09-22 15:33:53,839][00338] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1511424. Throughput: 0: 916.9. Samples: 376676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:33:53,842][00338] Avg episode reward: [(0, '12.152')] [2024-09-22 15:33:54,065][02365] Updated weights for policy 0, policy_version 370 (0.0026) [2024-09-22 15:33:58,839][00338] Fps is (10 sec: 4097.7, 60 sec: 3823.4, 300 sec: 3804.4). Total num frames: 1536000. Throughput: 0: 920.0. Samples: 383068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:33:58,842][00338] Avg episode reward: [(0, '13.438')] [2024-09-22 15:33:58,850][02352] Saving new best policy, reward=13.438! [2024-09-22 15:34:03,824][02365] Updated weights for policy 0, policy_version 380 (0.0036) [2024-09-22 15:34:03,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1556480. Throughput: 0: 969.8. Samples: 389342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:34:03,844][00338] Avg episode reward: [(0, '14.617')] [2024-09-22 15:34:03,847][02352] Saving new best policy, reward=14.617! [2024-09-22 15:34:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1568768. Throughput: 0: 941.9. Samples: 391266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:34:08,846][00338] Avg episode reward: [(0, '14.618')] [2024-09-22 15:34:08,856][02352] Saving new best policy, reward=14.618! [2024-09-22 15:34:13,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1589248. Throughput: 0: 911.6. Samples: 396704. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:34:13,841][00338] Avg episode reward: [(0, '15.683')] [2024-09-22 15:34:13,844][02352] Saving new best policy, reward=15.683! [2024-09-22 15:34:15,197][02365] Updated weights for policy 0, policy_version 390 (0.0028) [2024-09-22 15:34:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 1609728. Throughput: 0: 959.4. Samples: 403278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:34:18,843][00338] Avg episode reward: [(0, '15.263')] [2024-09-22 15:34:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1626112. Throughput: 0: 954.7. Samples: 405664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:34:23,842][00338] Avg episode reward: [(0, '13.975')] [2024-09-22 15:34:27,142][02365] Updated weights for policy 0, policy_version 400 (0.0038) [2024-09-22 15:34:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 1642496. Throughput: 0: 905.6. Samples: 410224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:34:28,846][00338] Avg episode reward: [(0, '14.143')] [2024-09-22 15:34:28,856][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000401_1642496.pth... [2024-09-22 15:34:28,973][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth [2024-09-22 15:34:33,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1667072. Throughput: 0: 938.4. Samples: 416934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:34:33,846][00338] Avg episode reward: [(0, '13.346')] [2024-09-22 15:34:36,573][02365] Updated weights for policy 0, policy_version 410 (0.0026) [2024-09-22 15:34:38,840][00338] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3776.6). Total num frames: 1683456. Throughput: 0: 967.7. Samples: 420222. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:34:38,842][00338] Avg episode reward: [(0, '14.623')] [2024-09-22 15:34:43,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1699840. Throughput: 0: 915.5. Samples: 424266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:34:43,847][00338] Avg episode reward: [(0, '14.998')] [2024-09-22 15:34:48,096][02365] Updated weights for policy 0, policy_version 420 (0.0043) [2024-09-22 15:34:48,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3754.9, 300 sec: 3790.5). Total num frames: 1720320. Throughput: 0: 920.8. Samples: 430778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:34:48,842][00338] Avg episode reward: [(0, '16.677')] [2024-09-22 15:34:48,851][02352] Saving new best policy, reward=16.677! [2024-09-22 15:34:53,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1740800. Throughput: 0: 951.7. Samples: 434092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:34:53,846][00338] Avg episode reward: [(0, '16.541')] [2024-09-22 15:34:58,844][00338] Fps is (10 sec: 3684.8, 60 sec: 3686.1, 300 sec: 3762.7). Total num frames: 1757184. Throughput: 0: 934.1. Samples: 438744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:34:58,852][00338] Avg episode reward: [(0, '16.378')] [2024-09-22 15:35:00,171][02365] Updated weights for policy 0, policy_version 430 (0.0027) [2024-09-22 15:35:03,850][00338] Fps is (10 sec: 3273.2, 60 sec: 3617.5, 300 sec: 3762.7). Total num frames: 1773568. Throughput: 0: 912.6. Samples: 444356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:35:03,856][00338] Avg episode reward: [(0, '16.382')] [2024-09-22 15:35:08,839][00338] Fps is (10 sec: 4097.7, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1798144. Throughput: 0: 934.8. Samples: 447728. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:35:08,841][00338] Avg episode reward: [(0, '15.820')] [2024-09-22 15:35:09,291][02365] Updated weights for policy 0, policy_version 440 (0.0023) [2024-09-22 15:35:13,841][00338] Fps is (10 sec: 4099.7, 60 sec: 3754.6, 300 sec: 3762.7). Total num frames: 1814528. Throughput: 0: 963.9. Samples: 453602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-22 15:35:13,844][00338] Avg episode reward: [(0, '15.270')] [2024-09-22 15:35:18,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1830912. Throughput: 0: 918.5. Samples: 458268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:35:18,842][00338] Avg episode reward: [(0, '16.650')] [2024-09-22 15:35:21,010][02365] Updated weights for policy 0, policy_version 450 (0.0051) [2024-09-22 15:35:23,839][00338] Fps is (10 sec: 4096.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1855488. Throughput: 0: 919.3. Samples: 461588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:35:23,842][00338] Avg episode reward: [(0, '17.366')] [2024-09-22 15:35:23,847][02352] Saving new best policy, reward=17.366! [2024-09-22 15:35:28,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1871872. Throughput: 0: 978.2. Samples: 468286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:35:28,843][00338] Avg episode reward: [(0, '18.678')] [2024-09-22 15:35:28,855][02352] Saving new best policy, reward=18.678! [2024-09-22 15:35:32,397][02365] Updated weights for policy 0, policy_version 460 (0.0037) [2024-09-22 15:35:33,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1888256. Throughput: 0: 920.7. Samples: 472210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:35:33,844][00338] Avg episode reward: [(0, '19.187')] [2024-09-22 15:35:33,850][02352] Saving new best policy, reward=19.187! 
[2024-09-22 15:35:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1908736. Throughput: 0: 909.2. Samples: 475006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:35:38,842][00338] Avg episode reward: [(0, '20.260')] [2024-09-22 15:35:38,854][02352] Saving new best policy, reward=20.260! [2024-09-22 15:35:42,329][02365] Updated weights for policy 0, policy_version 470 (0.0021) [2024-09-22 15:35:43,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1929216. Throughput: 0: 956.1. Samples: 481766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:35:43,846][00338] Avg episode reward: [(0, '20.296')] [2024-09-22 15:35:43,850][02352] Saving new best policy, reward=20.296! [2024-09-22 15:35:48,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1945600. Throughput: 0: 940.0. Samples: 486648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:35:48,844][00338] Avg episode reward: [(0, '20.982')] [2024-09-22 15:35:48,853][02352] Saving new best policy, reward=20.982! [2024-09-22 15:35:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1961984. Throughput: 0: 909.6. Samples: 488658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:35:53,842][00338] Avg episode reward: [(0, '19.365')] [2024-09-22 15:35:54,484][02365] Updated weights for policy 0, policy_version 480 (0.0031) [2024-09-22 15:35:58,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3762.8). Total num frames: 1982464. Throughput: 0: 922.9. Samples: 495132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:35:58,842][00338] Avg episode reward: [(0, '19.093')] [2024-09-22 15:36:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3823.6, 300 sec: 3762.8). Total num frames: 2002944. Throughput: 0: 951.8. Samples: 501100. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:36:03,845][00338] Avg episode reward: [(0, '18.858')] [2024-09-22 15:36:04,921][02365] Updated weights for policy 0, policy_version 490 (0.0029) [2024-09-22 15:36:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2015232. Throughput: 0: 922.5. Samples: 503100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:36:08,845][00338] Avg episode reward: [(0, '19.638')] [2024-09-22 15:36:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 2039808. Throughput: 0: 901.3. Samples: 508846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:36:13,846][00338] Avg episode reward: [(0, '19.198')] [2024-09-22 15:36:15,539][02365] Updated weights for policy 0, policy_version 500 (0.0031) [2024-09-22 15:36:18,841][00338] Fps is (10 sec: 4504.7, 60 sec: 3822.8, 300 sec: 3762.7). Total num frames: 2060288. Throughput: 0: 964.1. Samples: 515596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:36:18,844][00338] Avg episode reward: [(0, '18.836')] [2024-09-22 15:36:23,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 2072576. Throughput: 0: 949.1. Samples: 517716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:36:23,842][00338] Avg episode reward: [(0, '20.531')] [2024-09-22 15:36:27,654][02365] Updated weights for policy 0, policy_version 510 (0.0042) [2024-09-22 15:36:28,839][00338] Fps is (10 sec: 3277.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2093056. Throughput: 0: 903.8. Samples: 522436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:36:28,842][00338] Avg episode reward: [(0, '20.714')] [2024-09-22 15:36:28,857][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000511_2093056.pth... 
[2024-09-22 15:36:28,976][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth [2024-09-22 15:36:33,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2113536. Throughput: 0: 942.9. Samples: 529080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:36:33,847][00338] Avg episode reward: [(0, '20.633')] [2024-09-22 15:36:37,242][02365] Updated weights for policy 0, policy_version 520 (0.0022) [2024-09-22 15:36:38,843][00338] Fps is (10 sec: 4094.4, 60 sec: 3754.4, 300 sec: 3748.8). Total num frames: 2134016. Throughput: 0: 967.1. Samples: 532180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:36:38,848][00338] Avg episode reward: [(0, '21.002')] [2024-09-22 15:36:38,867][02352] Saving new best policy, reward=21.002! [2024-09-22 15:36:43,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2146304. Throughput: 0: 912.9. Samples: 536212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:36:43,847][00338] Avg episode reward: [(0, '22.038')] [2024-09-22 15:36:43,850][02352] Saving new best policy, reward=22.038! [2024-09-22 15:36:48,796][02365] Updated weights for policy 0, policy_version 530 (0.0029) [2024-09-22 15:36:48,840][00338] Fps is (10 sec: 3687.8, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 2170880. Throughput: 0: 924.2. Samples: 542690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:36:48,842][00338] Avg episode reward: [(0, '21.295')] [2024-09-22 15:36:53,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2191360. Throughput: 0: 955.7. Samples: 546106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:36:53,843][00338] Avg episode reward: [(0, '21.918')] [2024-09-22 15:36:58,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2203648. Throughput: 0: 927.2. Samples: 550568. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:36:58,845][00338] Avg episode reward: [(0, '22.319')] [2024-09-22 15:36:58,857][02352] Saving new best policy, reward=22.319! [2024-09-22 15:37:00,954][02365] Updated weights for policy 0, policy_version 540 (0.0014) [2024-09-22 15:37:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2224128. Throughput: 0: 902.2. Samples: 556192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:37:03,849][00338] Avg episode reward: [(0, '20.965')] [2024-09-22 15:37:08,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2244608. Throughput: 0: 929.7. Samples: 559554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:37:08,846][00338] Avg episode reward: [(0, '21.731')] [2024-09-22 15:37:10,136][02365] Updated weights for policy 0, policy_version 550 (0.0030) [2024-09-22 15:37:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2260992. Throughput: 0: 949.8. Samples: 565176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:37:13,842][00338] Avg episode reward: [(0, '21.048')] [2024-09-22 15:37:18,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3735.0). Total num frames: 2277376. Throughput: 0: 907.6. Samples: 569922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:37:18,842][00338] Avg episode reward: [(0, '20.025')] [2024-09-22 15:37:21,838][02365] Updated weights for policy 0, policy_version 560 (0.0031) [2024-09-22 15:37:23,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2301952. Throughput: 0: 915.8. Samples: 573388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:37:23,842][00338] Avg episode reward: [(0, '19.551')] [2024-09-22 15:37:28,841][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2318336. Throughput: 0: 967.1. Samples: 579732. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:37:28,846][00338] Avg episode reward: [(0, '18.000')] [2024-09-22 15:37:33,839][00338] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2330624. Throughput: 0: 912.5. Samples: 583752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:37:33,843][00338] Avg episode reward: [(0, '17.322')] [2024-09-22 15:37:34,104][02365] Updated weights for policy 0, policy_version 570 (0.0016) [2024-09-22 15:37:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3748.9). Total num frames: 2355200. Throughput: 0: 905.0. Samples: 586832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:37:38,847][00338] Avg episode reward: [(0, '17.904')] [2024-09-22 15:37:42,926][02365] Updated weights for policy 0, policy_version 580 (0.0019) [2024-09-22 15:37:43,839][00338] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2379776. Throughput: 0: 959.6. Samples: 593752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:37:43,847][00338] Avg episode reward: [(0, '17.665')] [2024-09-22 15:37:48,841][00338] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3721.1). Total num frames: 2392064. Throughput: 0: 941.7. Samples: 598572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:37:48,844][00338] Avg episode reward: [(0, '18.937')] [2024-09-22 15:37:53,839][00338] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3735.1). Total num frames: 2408448. Throughput: 0: 916.2. Samples: 600784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:37:53,842][00338] Avg episode reward: [(0, '20.292')] [2024-09-22 15:37:54,825][02365] Updated weights for policy 0, policy_version 590 (0.0025) [2024-09-22 15:37:58,839][00338] Fps is (10 sec: 4096.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2433024. Throughput: 0: 940.6. Samples: 607502. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:37:58,841][00338] Avg episode reward: [(0, '19.998')] [2024-09-22 15:38:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2449408. Throughput: 0: 962.6. Samples: 613240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:38:03,842][00338] Avg episode reward: [(0, '20.432')] [2024-09-22 15:38:05,706][02365] Updated weights for policy 0, policy_version 600 (0.0046) [2024-09-22 15:38:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2465792. Throughput: 0: 930.7. Samples: 615268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:38:08,842][00338] Avg episode reward: [(0, '21.815')] [2024-09-22 15:38:13,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2490368. Throughput: 0: 925.4. Samples: 621374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:38:13,841][00338] Avg episode reward: [(0, '21.220')] [2024-09-22 15:38:15,684][02365] Updated weights for policy 0, policy_version 610 (0.0029) [2024-09-22 15:38:18,840][00338] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2510848. Throughput: 0: 984.4. Samples: 628052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:38:18,846][00338] Avg episode reward: [(0, '20.758')] [2024-09-22 15:38:23,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2523136. Throughput: 0: 959.5. Samples: 630010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:38:23,849][00338] Avg episode reward: [(0, '21.828')] [2024-09-22 15:38:27,491][02365] Updated weights for policy 0, policy_version 620 (0.0048) [2024-09-22 15:38:28,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2543616. Throughput: 0: 918.5. Samples: 635086. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:38:28,842][00338] Avg episode reward: [(0, '21.377')] [2024-09-22 15:38:28,850][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth... [2024-09-22 15:38:28,999][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000401_1642496.pth [2024-09-22 15:38:33,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2564096. Throughput: 0: 960.5. Samples: 641792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:38:33,841][00338] Avg episode reward: [(0, '21.466')] [2024-09-22 15:38:37,669][02365] Updated weights for policy 0, policy_version 630 (0.0016) [2024-09-22 15:38:38,840][00338] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 2580480. Throughput: 0: 973.8. Samples: 644608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:38:38,843][00338] Avg episode reward: [(0, '21.019')] [2024-09-22 15:38:43,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.1). Total num frames: 2596864. Throughput: 0: 915.6. Samples: 648704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:38:43,845][00338] Avg episode reward: [(0, '21.530')] [2024-09-22 15:38:48,657][02365] Updated weights for policy 0, policy_version 640 (0.0024) [2024-09-22 15:38:48,839][00338] Fps is (10 sec: 4096.5, 60 sec: 3823.1, 300 sec: 3762.8). Total num frames: 2621440. Throughput: 0: 938.4. Samples: 655466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:38:48,842][00338] Avg episode reward: [(0, '21.919')] [2024-09-22 15:38:53,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2641920. Throughput: 0: 968.4. Samples: 658844. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:38:53,841][00338] Avg episode reward: [(0, '21.526')] [2024-09-22 15:38:58,840][00338] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2654208. Throughput: 0: 931.2. Samples: 663278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:38:58,848][00338] Avg episode reward: [(0, '22.322')] [2024-09-22 15:38:58,859][02352] Saving new best policy, reward=22.322! [2024-09-22 15:39:00,588][02365] Updated weights for policy 0, policy_version 650 (0.0038) [2024-09-22 15:39:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2674688. Throughput: 0: 911.3. Samples: 669062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:39:03,845][00338] Avg episode reward: [(0, '21.743')] [2024-09-22 15:39:08,839][00338] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2699264. Throughput: 0: 943.7. Samples: 672476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:39:08,845][00338] Avg episode reward: [(0, '21.966')] [2024-09-22 15:39:09,622][02365] Updated weights for policy 0, policy_version 660 (0.0034) [2024-09-22 15:39:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2711552. Throughput: 0: 954.7. Samples: 678048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:39:13,844][00338] Avg episode reward: [(0, '22.638')] [2024-09-22 15:39:13,849][02352] Saving new best policy, reward=22.638! [2024-09-22 15:39:18,839][00338] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2727936. Throughput: 0: 911.4. Samples: 682806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:39:18,846][00338] Avg episode reward: [(0, '23.253')] [2024-09-22 15:39:18,856][02352] Saving new best policy, reward=23.253! 
[2024-09-22 15:39:21,797][02365] Updated weights for policy 0, policy_version 670 (0.0030) [2024-09-22 15:39:23,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2752512. Throughput: 0: 922.0. Samples: 686096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:39:23,845][00338] Avg episode reward: [(0, '22.961')] [2024-09-22 15:39:28,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2768896. Throughput: 0: 967.2. Samples: 692230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:39:28,842][00338] Avg episode reward: [(0, '22.834')] [2024-09-22 15:39:33,772][02365] Updated weights for policy 0, policy_version 680 (0.0016) [2024-09-22 15:39:33,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2785280. Throughput: 0: 908.2. Samples: 696334. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:39:33,841][00338] Avg episode reward: [(0, '22.203')] [2024-09-22 15:39:38,839][00338] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2805760. Throughput: 0: 905.8. Samples: 699606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:39:38,842][00338] Avg episode reward: [(0, '20.710')] [2024-09-22 15:39:42,756][02365] Updated weights for policy 0, policy_version 690 (0.0023) [2024-09-22 15:39:43,840][00338] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2830336. Throughput: 0: 959.0. Samples: 706432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:39:43,845][00338] Avg episode reward: [(0, '20.960')] [2024-09-22 15:39:48,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2842624. Throughput: 0: 933.5. Samples: 711068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:39:48,842][00338] Avg episode reward: [(0, '20.446')] [2024-09-22 15:39:53,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3748.9). 
Total num frames: 2863104. Throughput: 0: 912.8. Samples: 713550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:39:53,845][00338] Avg episode reward: [(0, '20.966')] [2024-09-22 15:39:54,674][02365] Updated weights for policy 0, policy_version 700 (0.0038) [2024-09-22 15:39:58,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3762.9). Total num frames: 2883584. Throughput: 0: 938.8. Samples: 720292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:39:58,847][00338] Avg episode reward: [(0, '22.894')] [2024-09-22 15:40:03,843][00338] Fps is (10 sec: 3685.1, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 2899968. Throughput: 0: 953.7. Samples: 725724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:40:03,845][00338] Avg episode reward: [(0, '23.483')] [2024-09-22 15:40:03,853][02352] Saving new best policy, reward=23.483! [2024-09-22 15:40:05,930][02365] Updated weights for policy 0, policy_version 710 (0.0024) [2024-09-22 15:40:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2916352. Throughput: 0: 926.0. Samples: 727764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:40:08,844][00338] Avg episode reward: [(0, '23.948')] [2024-09-22 15:40:08,859][02352] Saving new best policy, reward=23.948! [2024-09-22 15:40:13,839][00338] Fps is (10 sec: 4097.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2940928. Throughput: 0: 929.6. Samples: 734064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:40:13,846][00338] Avg episode reward: [(0, '24.074')] [2024-09-22 15:40:13,849][02352] Saving new best policy, reward=24.074! [2024-09-22 15:40:15,722][02365] Updated weights for policy 0, policy_version 720 (0.0022) [2024-09-22 15:40:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2957312. Throughput: 0: 978.5. Samples: 740368. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:40:18,844][00338] Avg episode reward: [(0, '24.752')] [2024-09-22 15:40:18,930][02352] Saving new best policy, reward=24.752! [2024-09-22 15:40:23,841][00338] Fps is (10 sec: 3276.2, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 2973696. Throughput: 0: 948.9. Samples: 742310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:40:23,847][00338] Avg episode reward: [(0, '25.952')] [2024-09-22 15:40:23,849][02352] Saving new best policy, reward=25.952! [2024-09-22 15:40:27,688][02365] Updated weights for policy 0, policy_version 730 (0.0026) [2024-09-22 15:40:28,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2994176. Throughput: 0: 916.4. Samples: 747672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:40:28,846][00338] Avg episode reward: [(0, '24.923')] [2024-09-22 15:40:28,859][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000731_2994176.pth... [2024-09-22 15:40:28,976][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000511_2093056.pth [2024-09-22 15:40:33,839][00338] Fps is (10 sec: 4096.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3014656. Throughput: 0: 963.5. Samples: 754426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:40:33,844][00338] Avg episode reward: [(0, '25.106')] [2024-09-22 15:40:38,188][02365] Updated weights for policy 0, policy_version 740 (0.0030) [2024-09-22 15:40:38,840][00338] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 3031040. Throughput: 0: 962.6. Samples: 756868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:40:38,843][00338] Avg episode reward: [(0, '24.266')] [2024-09-22 15:40:43,839][00338] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3051520. Throughput: 0: 917.7. Samples: 761588. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:40:43,841][00338] Avg episode reward: [(0, '23.866')] [2024-09-22 15:40:48,147][02365] Updated weights for policy 0, policy_version 750 (0.0023) [2024-09-22 15:40:48,839][00338] Fps is (10 sec: 4096.3, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3072000. Throughput: 0: 949.9. Samples: 768464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:40:48,846][00338] Avg episode reward: [(0, '23.736')] [2024-09-22 15:40:53,842][00338] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3092480. Throughput: 0: 979.8. Samples: 771856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:40:53,845][00338] Avg episode reward: [(0, '24.382')] [2024-09-22 15:40:58,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3104768. Throughput: 0: 931.3. Samples: 775974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:40:58,842][00338] Avg episode reward: [(0, '25.049')] [2024-09-22 15:41:00,073][02365] Updated weights for policy 0, policy_version 760 (0.0013) [2024-09-22 15:41:03,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3776.7). Total num frames: 3129344. Throughput: 0: 936.8. Samples: 782522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:41:03,842][00338] Avg episode reward: [(0, '25.633')] [2024-09-22 15:41:08,840][00338] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3149824. Throughput: 0: 968.9. Samples: 785908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:41:08,842][00338] Avg episode reward: [(0, '27.327')] [2024-09-22 15:41:08,854][02352] Saving new best policy, reward=27.327! [2024-09-22 15:41:09,750][02365] Updated weights for policy 0, policy_version 770 (0.0033) [2024-09-22 15:41:13,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3162112. Throughput: 0: 959.1. Samples: 790830. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:41:13,843][00338] Avg episode reward: [(0, '25.621')] [2024-09-22 15:41:18,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3182592. Throughput: 0: 927.6. Samples: 796170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:41:18,844][00338] Avg episode reward: [(0, '25.792')] [2024-09-22 15:41:20,973][02365] Updated weights for policy 0, policy_version 780 (0.0038) [2024-09-22 15:41:23,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3776.7). Total num frames: 3207168. Throughput: 0: 949.3. Samples: 799588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:41:23,844][00338] Avg episode reward: [(0, '26.433')] [2024-09-22 15:41:28,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3223552. Throughput: 0: 975.7. Samples: 805494. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:41:28,847][00338] Avg episode reward: [(0, '26.497')] [2024-09-22 15:41:32,972][02365] Updated weights for policy 0, policy_version 790 (0.0017) [2024-09-22 15:41:33,839][00338] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3235840. Throughput: 0: 918.4. Samples: 809794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:41:33,845][00338] Avg episode reward: [(0, '25.442')] [2024-09-22 15:41:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3776.6). Total num frames: 3260416. Throughput: 0: 920.0. Samples: 813256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:41:38,842][00338] Avg episode reward: [(0, '24.948')] [2024-09-22 15:41:41,958][02365] Updated weights for policy 0, policy_version 800 (0.0032) [2024-09-22 15:41:43,840][00338] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3280896. Throughput: 0: 981.8. Samples: 820156. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:41:43,844][00338] Avg episode reward: [(0, '26.335')] [2024-09-22 15:41:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3293184. Throughput: 0: 930.4. Samples: 824388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:41:48,842][00338] Avg episode reward: [(0, '26.305')] [2024-09-22 15:41:53,805][02365] Updated weights for policy 0, policy_version 810 (0.0031) [2024-09-22 15:41:53,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3317760. Throughput: 0: 918.3. Samples: 827230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:41:53,842][00338] Avg episode reward: [(0, '24.515')] [2024-09-22 15:41:58,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3338240. Throughput: 0: 956.4. Samples: 833866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:41:58,845][00338] Avg episode reward: [(0, '24.959')] [2024-09-22 15:42:03,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3354624. Throughput: 0: 951.7. Samples: 838996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:42:03,842][00338] Avg episode reward: [(0, '25.417')] [2024-09-22 15:42:04,990][02365] Updated weights for policy 0, policy_version 820 (0.0039) [2024-09-22 15:42:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3371008. Throughput: 0: 922.9. Samples: 841118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:42:08,842][00338] Avg episode reward: [(0, '25.946')] [2024-09-22 15:42:13,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3395584. Throughput: 0: 944.5. Samples: 847998. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:42:13,847][00338] Avg episode reward: [(0, '24.203')] [2024-09-22 15:42:14,484][02365] Updated weights for policy 0, policy_version 830 (0.0037) [2024-09-22 15:42:18,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3416064. Throughput: 0: 984.8. Samples: 854108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:42:18,849][00338] Avg episode reward: [(0, '24.179')] [2024-09-22 15:42:23,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3428352. Throughput: 0: 954.7. Samples: 856218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:42:23,841][00338] Avg episode reward: [(0, '23.670')] [2024-09-22 15:42:26,027][02365] Updated weights for policy 0, policy_version 840 (0.0038) [2024-09-22 15:42:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3448832. Throughput: 0: 929.8. Samples: 861996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:42:28,842][00338] Avg episode reward: [(0, '22.334')] [2024-09-22 15:42:28,852][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000843_3452928.pth... [2024-09-22 15:42:28,991][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth [2024-09-22 15:42:33,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 3473408. Throughput: 0: 989.2. Samples: 868904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:42:33,843][00338] Avg episode reward: [(0, '22.276')] [2024-09-22 15:42:36,245][02365] Updated weights for policy 0, policy_version 850 (0.0021) [2024-09-22 15:42:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3485696. Throughput: 0: 972.8. Samples: 871004. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:42:38,842][00338] Avg episode reward: [(0, '22.951')] [2024-09-22 15:42:43,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3506176. Throughput: 0: 937.9. Samples: 876072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:42:43,842][00338] Avg episode reward: [(0, '22.936')] [2024-09-22 15:42:46,780][02365] Updated weights for policy 0, policy_version 860 (0.0023) [2024-09-22 15:42:48,840][00338] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3530752. Throughput: 0: 978.4. Samples: 883024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:42:48,843][00338] Avg episode reward: [(0, '23.697')] [2024-09-22 15:42:53,841][00338] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 3547136. Throughput: 0: 997.9. Samples: 886024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:42:53,846][00338] Avg episode reward: [(0, '24.406')] [2024-09-22 15:42:58,483][02365] Updated weights for policy 0, policy_version 870 (0.0016) [2024-09-22 15:42:58,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3563520. Throughput: 0: 938.3. Samples: 890222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:42:58,847][00338] Avg episode reward: [(0, '25.191')] [2024-09-22 15:43:03,839][00338] Fps is (10 sec: 4096.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3588096. Throughput: 0: 951.5. Samples: 896926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:43:03,843][00338] Avg episode reward: [(0, '25.345')] [2024-09-22 15:43:07,275][02365] Updated weights for policy 0, policy_version 880 (0.0027) [2024-09-22 15:43:08,843][00338] Fps is (10 sec: 4504.0, 60 sec: 3959.2, 300 sec: 3790.5). Total num frames: 3608576. Throughput: 0: 982.6. Samples: 900438. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:43:08,846][00338] Avg episode reward: [(0, '24.217')] [2024-09-22 15:43:13,845][00338] Fps is (10 sec: 3275.0, 60 sec: 3754.3, 300 sec: 3762.7). Total num frames: 3620864. Throughput: 0: 957.3. Samples: 905080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:43:13,847][00338] Avg episode reward: [(0, '23.009')] [2024-09-22 15:43:18,839][00338] Fps is (10 sec: 3277.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3641344. Throughput: 0: 936.8. Samples: 911062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:43:18,842][00338] Avg episode reward: [(0, '22.656')] [2024-09-22 15:43:19,095][02365] Updated weights for policy 0, policy_version 890 (0.0037) [2024-09-22 15:43:23,847][00338] Fps is (10 sec: 4504.5, 60 sec: 3958.9, 300 sec: 3804.3). Total num frames: 3665920. Throughput: 0: 967.0. Samples: 914528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:43:23,850][00338] Avg episode reward: [(0, '22.201')] [2024-09-22 15:43:28,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3682304. Throughput: 0: 978.1. Samples: 920086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:43:28,848][00338] Avg episode reward: [(0, '21.341')] [2024-09-22 15:43:30,165][02365] Updated weights for policy 0, policy_version 900 (0.0026) [2024-09-22 15:43:33,839][00338] Fps is (10 sec: 3279.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3698688. Throughput: 0: 936.8. Samples: 925182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:43:33,846][00338] Avg episode reward: [(0, '21.223')] [2024-09-22 15:43:38,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3723264. Throughput: 0: 947.9. Samples: 928680. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:43:38,846][00338] Avg episode reward: [(0, '22.092')] [2024-09-22 15:43:39,483][02365] Updated weights for policy 0, policy_version 910 (0.0025) [2024-09-22 15:43:43,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3743744. Throughput: 0: 1004.8. Samples: 935438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:43:43,841][00338] Avg episode reward: [(0, '22.678')] [2024-09-22 15:43:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3756032. Throughput: 0: 949.7. Samples: 939662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:43:48,842][00338] Avg episode reward: [(0, '23.779')] [2024-09-22 15:43:50,925][02365] Updated weights for policy 0, policy_version 920 (0.0030) [2024-09-22 15:43:53,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 3780608. Throughput: 0: 947.6. Samples: 943076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:43:53,842][00338] Avg episode reward: [(0, '24.754')] [2024-09-22 15:43:58,841][00338] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3818.3). Total num frames: 3801088. Throughput: 0: 998.5. Samples: 950010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:43:58,847][00338] Avg episode reward: [(0, '24.423')] [2024-09-22 15:44:00,828][02365] Updated weights for policy 0, policy_version 930 (0.0022) [2024-09-22 15:44:03,843][00338] Fps is (10 sec: 3275.7, 60 sec: 3754.4, 300 sec: 3776.6). Total num frames: 3813376. Throughput: 0: 965.7. Samples: 954520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:44:03,845][00338] Avg episode reward: [(0, '23.929')] [2024-09-22 15:44:08,839][00338] Fps is (10 sec: 3277.4, 60 sec: 3754.9, 300 sec: 3804.4). Total num frames: 3833856. Throughput: 0: 944.5. Samples: 957022. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:44:08,842][00338] Avg episode reward: [(0, '23.941')] [2024-09-22 15:44:11,453][02365] Updated weights for policy 0, policy_version 940 (0.0027) [2024-09-22 15:44:13,839][00338] Fps is (10 sec: 4507.2, 60 sec: 3959.8, 300 sec: 3832.2). Total num frames: 3858432. Throughput: 0: 977.3. Samples: 964066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:44:13,842][00338] Avg episode reward: [(0, '25.289')] [2024-09-22 15:44:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3874816. Throughput: 0: 985.7. Samples: 969538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:44:18,844][00338] Avg episode reward: [(0, '25.561')] [2024-09-22 15:44:23,105][02365] Updated weights for policy 0, policy_version 950 (0.0049) [2024-09-22 15:44:23,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3755.2, 300 sec: 3804.4). Total num frames: 3891200. Throughput: 0: 954.7. Samples: 971640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:44:23,843][00338] Avg episode reward: [(0, '25.334')] [2024-09-22 15:44:28,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3915776. Throughput: 0: 951.8. Samples: 978268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:44:28,842][00338] Avg episode reward: [(0, '24.556')] [2024-09-22 15:44:28,850][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000956_3915776.pth... [2024-09-22 15:44:28,971][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000731_2994176.pth [2024-09-22 15:44:32,084][02365] Updated weights for policy 0, policy_version 960 (0.0036) [2024-09-22 15:44:33,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 3936256. Throughput: 0: 995.4. Samples: 984454. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:44:33,844][00338] Avg episode reward: [(0, '25.367')] [2024-09-22 15:44:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3948544. Throughput: 0: 964.6. Samples: 986482. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:44:38,844][00338] Avg episode reward: [(0, '25.374')] [2024-09-22 15:44:43,721][02365] Updated weights for policy 0, policy_version 970 (0.0021) [2024-09-22 15:44:43,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3973120. Throughput: 0: 933.9. Samples: 992036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:44:43,842][00338] Avg episode reward: [(0, '23.943')] [2024-09-22 15:44:48,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 3993600. Throughput: 0: 989.5. Samples: 999044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:44:48,847][00338] Avg episode reward: [(0, '24.991')] [2024-09-22 15:44:53,842][00338] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 4009984. Throughput: 0: 989.5. Samples: 1001554. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:44:53,851][00338] Avg episode reward: [(0, '26.841')] [2024-09-22 15:44:54,901][02365] Updated weights for policy 0, policy_version 980 (0.0021) [2024-09-22 15:44:58,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3818.3). Total num frames: 4026368. Throughput: 0: 935.6. Samples: 1006168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:44:58,843][00338] Avg episode reward: [(0, '26.905')] [2024-09-22 15:45:03,840][00338] Fps is (10 sec: 4096.9, 60 sec: 3959.7, 300 sec: 3846.1). Total num frames: 4050944. Throughput: 0: 967.6. Samples: 1013080. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:45:03,847][00338] Avg episode reward: [(0, '26.659')] [2024-09-22 15:45:04,377][02365] Updated weights for policy 0, policy_version 990 (0.0025) [2024-09-22 15:45:08,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 4067328. Throughput: 0: 996.6. Samples: 1016488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:45:08,846][00338] Avg episode reward: [(0, '26.694')] [2024-09-22 15:45:13,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 4083712. Throughput: 0: 940.0. Samples: 1020566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:45:13,846][00338] Avg episode reward: [(0, '28.101')] [2024-09-22 15:45:13,849][02352] Saving new best policy, reward=28.101! [2024-09-22 15:45:16,112][02365] Updated weights for policy 0, policy_version 1000 (0.0029) [2024-09-22 15:45:18,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 4104192. Throughput: 0: 946.7. Samples: 1027054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-22 15:45:18,848][00338] Avg episode reward: [(0, '25.362')] [2024-09-22 15:45:23,839][00338] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 4128768. Throughput: 0: 978.8. Samples: 1030530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:45:23,842][00338] Avg episode reward: [(0, '24.645')] [2024-09-22 15:45:26,214][02365] Updated weights for policy 0, policy_version 1010 (0.0027) [2024-09-22 15:45:28,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 4141056. Throughput: 0: 963.5. Samples: 1035392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:45:28,842][00338] Avg episode reward: [(0, '24.622')] [2024-09-22 15:45:33,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4161536. Throughput: 0: 930.9. Samples: 1040936. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:45:33,846][00338] Avg episode reward: [(0, '24.693')] [2024-09-22 15:45:36,770][02365] Updated weights for policy 0, policy_version 1020 (0.0020) [2024-09-22 15:45:38,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 4186112. Throughput: 0: 954.5. Samples: 1044506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:45:38,842][00338] Avg episode reward: [(0, '21.831')] [2024-09-22 15:45:43,842][00338] Fps is (10 sec: 4094.9, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 4202496. Throughput: 0: 982.3. Samples: 1050376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:45:43,846][00338] Avg episode reward: [(0, '21.685')] [2024-09-22 15:45:48,316][02365] Updated weights for policy 0, policy_version 1030 (0.0018) [2024-09-22 15:45:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 4218880. Throughput: 0: 937.8. Samples: 1055280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:45:48,847][00338] Avg episode reward: [(0, '22.084')] [2024-09-22 15:45:53,839][00338] Fps is (10 sec: 4097.1, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 4243456. Throughput: 0: 939.5. Samples: 1058766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:45:53,842][00338] Avg episode reward: [(0, '23.411')] [2024-09-22 15:45:57,418][02365] Updated weights for policy 0, policy_version 1040 (0.0015) [2024-09-22 15:45:58,840][00338] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 4263936. Throughput: 0: 998.6. Samples: 1065504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:45:58,846][00338] Avg episode reward: [(0, '23.480')] [2024-09-22 15:46:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 4276224. Throughput: 0: 945.3. Samples: 1069594. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:46:03,842][00338] Avg episode reward: [(0, '23.285')] [2024-09-22 15:46:08,839][00338] Fps is (10 sec: 3277.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 4296704. Throughput: 0: 936.7. Samples: 1072682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:46:08,841][00338] Avg episode reward: [(0, '24.546')] [2024-09-22 15:46:08,944][02365] Updated weights for policy 0, policy_version 1050 (0.0025) [2024-09-22 15:46:13,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 4321280. Throughput: 0: 985.8. Samples: 1079752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:46:13,842][00338] Avg episode reward: [(0, '24.498')] [2024-09-22 15:46:18,845][00338] Fps is (10 sec: 4093.8, 60 sec: 3890.8, 300 sec: 3832.1). Total num frames: 4337664. Throughput: 0: 974.8. Samples: 1084808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:46:18,847][00338] Avg episode reward: [(0, '24.779')] [2024-09-22 15:46:19,926][02365] Updated weights for policy 0, policy_version 1060 (0.0018) [2024-09-22 15:46:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 4358144. Throughput: 0: 945.8. Samples: 1087068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:46:23,842][00338] Avg episode reward: [(0, '24.723')] [2024-09-22 15:46:28,839][00338] Fps is (10 sec: 4098.2, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4378624. Throughput: 0: 972.8. Samples: 1094150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:46:28,846][00338] Avg episode reward: [(0, '24.971')] [2024-09-22 15:46:28,856][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001069_4378624.pth... 
[2024-09-22 15:46:28,990][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000843_3452928.pth [2024-09-22 15:46:29,284][02365] Updated weights for policy 0, policy_version 1070 (0.0019) [2024-09-22 15:46:33,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 4395008. Throughput: 0: 989.0. Samples: 1099784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:46:33,844][00338] Avg episode reward: [(0, '24.165')] [2024-09-22 15:46:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4411392. Throughput: 0: 955.9. Samples: 1101782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:46:38,841][00338] Avg episode reward: [(0, '25.500')] [2024-09-22 15:46:41,088][02365] Updated weights for policy 0, policy_version 1080 (0.0029) [2024-09-22 15:46:43,840][00338] Fps is (10 sec: 4095.9, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 4435968. Throughput: 0: 941.2. Samples: 1107860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:46:43,843][00338] Avg episode reward: [(0, '25.466')] [2024-09-22 15:46:48,844][00338] Fps is (10 sec: 4503.6, 60 sec: 3959.2, 300 sec: 3859.9). Total num frames: 4456448. Throughput: 0: 1001.4. Samples: 1114662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:46:48,846][00338] Avg episode reward: [(0, '25.895')] [2024-09-22 15:46:51,279][02365] Updated weights for policy 0, policy_version 1090 (0.0041) [2024-09-22 15:46:53,844][00338] Fps is (10 sec: 3275.4, 60 sec: 3754.4, 300 sec: 3832.1). Total num frames: 4468736. Throughput: 0: 977.9. Samples: 1116690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:46:53,847][00338] Avg episode reward: [(0, '26.322')] [2024-09-22 15:46:58,839][00338] Fps is (10 sec: 3278.3, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 4489216. Throughput: 0: 936.8. Samples: 1121906. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:46:58,842][00338] Avg episode reward: [(0, '25.970')] [2024-09-22 15:47:01,736][02365] Updated weights for policy 0, policy_version 1100 (0.0022) [2024-09-22 15:47:03,839][00338] Fps is (10 sec: 4507.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4513792. Throughput: 0: 977.8. Samples: 1128804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:47:03,845][00338] Avg episode reward: [(0, '25.974')] [2024-09-22 15:47:08,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 4530176. Throughput: 0: 990.4. Samples: 1131634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:47:08,844][00338] Avg episode reward: [(0, '26.032')] [2024-09-22 15:47:13,316][02365] Updated weights for policy 0, policy_version 1110 (0.0050) [2024-09-22 15:47:13,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4546560. Throughput: 0: 930.5. Samples: 1136024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:47:13,841][00338] Avg episode reward: [(0, '24.220')] [2024-09-22 15:47:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.6, 300 sec: 3873.8). Total num frames: 4571136. Throughput: 0: 959.3. Samples: 1142952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:47:18,842][00338] Avg episode reward: [(0, '23.942')] [2024-09-22 15:47:22,326][02365] Updated weights for policy 0, policy_version 1120 (0.0033) [2024-09-22 15:47:23,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 4591616. Throughput: 0: 990.5. Samples: 1146356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:47:23,842][00338] Avg episode reward: [(0, '22.853')] [2024-09-22 15:47:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4603904. Throughput: 0: 952.6. Samples: 1150726. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:47:28,847][00338] Avg episode reward: [(0, '22.471')] [2024-09-22 15:47:33,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 4624384. Throughput: 0: 938.6. Samples: 1156894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:47:33,841][00338] Avg episode reward: [(0, '21.920')] [2024-09-22 15:47:33,904][02365] Updated weights for policy 0, policy_version 1130 (0.0034) [2024-09-22 15:47:38,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4648960. Throughput: 0: 971.6. Samples: 1160408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:47:38,842][00338] Avg episode reward: [(0, '22.636')] [2024-09-22 15:47:43,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4661248. Throughput: 0: 974.0. Samples: 1165734. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:47:43,841][00338] Avg episode reward: [(0, '22.844')] [2024-09-22 15:47:45,646][02365] Updated weights for policy 0, policy_version 1140 (0.0023) [2024-09-22 15:47:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3846.1). Total num frames: 4681728. Throughput: 0: 939.6. Samples: 1171084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:47:48,846][00338] Avg episode reward: [(0, '22.251')] [2024-09-22 15:47:53,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.8, 300 sec: 3873.8). Total num frames: 4706304. Throughput: 0: 956.5. Samples: 1174676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:47:53,845][00338] Avg episode reward: [(0, '24.751')] [2024-09-22 15:47:54,155][02365] Updated weights for policy 0, policy_version 1150 (0.0027) [2024-09-22 15:47:58,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 4722688. Throughput: 0: 996.8. Samples: 1180880. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:47:58,842][00338] Avg episode reward: [(0, '24.637')] [2024-09-22 15:48:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4739072. Throughput: 0: 940.3. Samples: 1185266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:48:03,847][00338] Avg episode reward: [(0, '25.934')] [2024-09-22 15:48:06,042][02365] Updated weights for policy 0, policy_version 1160 (0.0051) [2024-09-22 15:48:08,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 4763648. Throughput: 0: 941.5. Samples: 1188722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:48:08,847][00338] Avg episode reward: [(0, '26.302')] [2024-09-22 15:48:13,839][00338] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4784128. Throughput: 0: 999.6. Samples: 1195710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:48:13,841][00338] Avg episode reward: [(0, '26.414')] [2024-09-22 15:48:16,168][02365] Updated weights for policy 0, policy_version 1170 (0.0024) [2024-09-22 15:48:18,847][00338] Fps is (10 sec: 3274.4, 60 sec: 3754.2, 300 sec: 3832.2). Total num frames: 4796416. Throughput: 0: 957.1. Samples: 1199970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:48:18,853][00338] Avg episode reward: [(0, '27.706')] [2024-09-22 15:48:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 4820992. Throughput: 0: 944.0. Samples: 1202886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:48:23,842][00338] Avg episode reward: [(0, '26.611')] [2024-09-22 15:48:26,367][02365] Updated weights for policy 0, policy_version 1180 (0.0021) [2024-09-22 15:48:28,839][00338] Fps is (10 sec: 4508.9, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4841472. Throughput: 0: 980.8. Samples: 1209868. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:48:28,846][00338] Avg episode reward: [(0, '27.605')] [2024-09-22 15:48:28,916][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001183_4845568.pth... [2024-09-22 15:48:29,037][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000956_3915776.pth [2024-09-22 15:48:33,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 4857856. Throughput: 0: 973.7. Samples: 1214900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:48:33,842][00338] Avg episode reward: [(0, '27.598')] [2024-09-22 15:48:38,356][02365] Updated weights for policy 0, policy_version 1190 (0.0026) [2024-09-22 15:48:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4874240. Throughput: 0: 940.2. Samples: 1216984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:48:38,847][00338] Avg episode reward: [(0, '28.541')] [2024-09-22 15:48:38,855][02352] Saving new best policy, reward=28.541! [2024-09-22 15:48:43,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4898816. Throughput: 0: 952.3. Samples: 1223732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:48:43,846][00338] Avg episode reward: [(0, '27.429')] [2024-09-22 15:48:47,501][02365] Updated weights for policy 0, policy_version 1200 (0.0026) [2024-09-22 15:48:48,842][00338] Fps is (10 sec: 4504.6, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 4919296. Throughput: 0: 992.2. Samples: 1229916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:48:48,846][00338] Avg episode reward: [(0, '26.557')] [2024-09-22 15:48:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4931584. Throughput: 0: 960.7. Samples: 1231954. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:48:53,844][00338] Avg episode reward: [(0, '26.408')] [2024-09-22 15:48:58,751][02365] Updated weights for policy 0, policy_version 1210 (0.0032) [2024-09-22 15:48:58,839][00338] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 4956160. Throughput: 0: 938.0. Samples: 1237918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:48:58,843][00338] Avg episode reward: [(0, '25.862')] [2024-09-22 15:49:03,840][00338] Fps is (10 sec: 4505.1, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 4976640. Throughput: 0: 994.8. Samples: 1244730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:49:03,845][00338] Avg episode reward: [(0, '24.313')] [2024-09-22 15:49:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4988928. Throughput: 0: 979.5. Samples: 1246964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:49:08,845][00338] Avg episode reward: [(0, '25.503')] [2024-09-22 15:49:10,376][02365] Updated weights for policy 0, policy_version 1220 (0.0013) [2024-09-22 15:49:13,839][00338] Fps is (10 sec: 3277.1, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5009408. Throughput: 0: 935.3. Samples: 1251956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:49:13,847][00338] Avg episode reward: [(0, '25.778')] [2024-09-22 15:49:18,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3960.0, 300 sec: 3873.8). Total num frames: 5033984. Throughput: 0: 979.1. Samples: 1258960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:49:18,847][00338] Avg episode reward: [(0, '25.932')] [2024-09-22 15:49:19,321][02365] Updated weights for policy 0, policy_version 1230 (0.0032) [2024-09-22 15:49:23,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5050368. Throughput: 0: 1001.4. Samples: 1262046. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:49:23,844][00338] Avg episode reward: [(0, '24.164')] [2024-09-22 15:49:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5066752. Throughput: 0: 945.7. Samples: 1266290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:49:28,848][00338] Avg episode reward: [(0, '23.159')] [2024-09-22 15:49:30,779][02365] Updated weights for policy 0, policy_version 1240 (0.0026) [2024-09-22 15:49:33,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 5091328. Throughput: 0: 956.8. Samples: 1272972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:49:33,846][00338] Avg episode reward: [(0, '22.880')] [2024-09-22 15:49:38,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5111808. Throughput: 0: 988.9. Samples: 1276454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:49:38,842][00338] Avg episode reward: [(0, '21.987')] [2024-09-22 15:49:41,452][02365] Updated weights for policy 0, policy_version 1250 (0.0031) [2024-09-22 15:49:43,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5124096. Throughput: 0: 959.0. Samples: 1281072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:49:43,846][00338] Avg episode reward: [(0, '22.300')] [2024-09-22 15:49:48,840][00338] Fps is (10 sec: 3686.3, 60 sec: 3823.1, 300 sec: 3860.0). Total num frames: 5148672. Throughput: 0: 943.7. Samples: 1287196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:49:48,848][00338] Avg episode reward: [(0, '22.491')] [2024-09-22 15:49:51,375][02365] Updated weights for policy 0, policy_version 1260 (0.0017) [2024-09-22 15:49:53,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5169152. Throughput: 0: 972.1. Samples: 1290710. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:49:53,842][00338] Avg episode reward: [(0, '22.279')] [2024-09-22 15:49:58,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5185536. Throughput: 0: 983.6. Samples: 1296218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:49:58,843][00338] Avg episode reward: [(0, '22.594')] [2024-09-22 15:50:03,021][02365] Updated weights for policy 0, policy_version 1270 (0.0015) [2024-09-22 15:50:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5201920. Throughput: 0: 941.6. Samples: 1301334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:50:03,844][00338] Avg episode reward: [(0, '22.053')] [2024-09-22 15:50:08,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5226496. Throughput: 0: 950.0. Samples: 1304796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:50:08,846][00338] Avg episode reward: [(0, '24.005')] [2024-09-22 15:50:12,416][02365] Updated weights for policy 0, policy_version 1280 (0.0031) [2024-09-22 15:50:13,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5246976. Throughput: 0: 997.3. Samples: 1311168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:50:13,843][00338] Avg episode reward: [(0, '24.701')] [2024-09-22 15:50:18,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5259264. Throughput: 0: 945.6. Samples: 1315526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:50:18,844][00338] Avg episode reward: [(0, '25.818')] [2024-09-22 15:50:23,433][02365] Updated weights for policy 0, policy_version 1290 (0.0019) [2024-09-22 15:50:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 5283840. Throughput: 0: 946.9. Samples: 1319066. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:50:23,847][00338] Avg episode reward: [(0, '26.601')] [2024-09-22 15:50:28,845][00338] Fps is (10 sec: 4503.2, 60 sec: 3959.1, 300 sec: 3873.8). Total num frames: 5304320. Throughput: 0: 995.9. Samples: 1325892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:50:28,847][00338] Avg episode reward: [(0, '26.920')] [2024-09-22 15:50:28,855][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001295_5304320.pth... [2024-09-22 15:50:29,070][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001069_4378624.pth [2024-09-22 15:50:33,841][00338] Fps is (10 sec: 3276.3, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 5316608. Throughput: 0: 956.1. Samples: 1330220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:50:33,848][00338] Avg episode reward: [(0, '26.971')] [2024-09-22 15:50:35,233][02365] Updated weights for policy 0, policy_version 1300 (0.0036) [2024-09-22 15:50:38,839][00338] Fps is (10 sec: 3688.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5341184. Throughput: 0: 938.5. Samples: 1332942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:50:38,844][00338] Avg episode reward: [(0, '27.082')] [2024-09-22 15:50:43,839][00338] Fps is (10 sec: 4506.3, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5361664. Throughput: 0: 970.9. Samples: 1339908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:50:43,846][00338] Avg episode reward: [(0, '28.731')] [2024-09-22 15:50:43,851][02352] Saving new best policy, reward=28.731! [2024-09-22 15:50:44,266][02365] Updated weights for policy 0, policy_version 1310 (0.0029) [2024-09-22 15:50:48,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 5378048. Throughput: 0: 972.7. Samples: 1345106. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:50:48,845][00338] Avg episode reward: [(0, '27.914')] [2024-09-22 15:50:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5394432. Throughput: 0: 941.1. Samples: 1347146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:50:53,845][00338] Avg episode reward: [(0, '27.807')] [2024-09-22 15:50:55,777][02365] Updated weights for policy 0, policy_version 1320 (0.0039) [2024-09-22 15:50:58,841][00338] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 5419008. Throughput: 0: 948.5. Samples: 1353854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:50:58,846][00338] Avg episode reward: [(0, '27.102')] [2024-09-22 15:51:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 5435392. Throughput: 0: 989.8. Samples: 1360066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:51:03,843][00338] Avg episode reward: [(0, '27.408')] [2024-09-22 15:51:06,826][02365] Updated weights for policy 0, policy_version 1330 (0.0029) [2024-09-22 15:51:08,843][00338] Fps is (10 sec: 3276.2, 60 sec: 3754.5, 300 sec: 3832.1). Total num frames: 5451776. Throughput: 0: 956.1. Samples: 1362094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:51:08,851][00338] Avg episode reward: [(0, '26.276')] [2024-09-22 15:51:13,840][00338] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 5472256. Throughput: 0: 931.3. Samples: 1367798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:51:13,848][00338] Avg episode reward: [(0, '24.708')] [2024-09-22 15:51:16,529][02365] Updated weights for policy 0, policy_version 1340 (0.0021) [2024-09-22 15:51:18,839][00338] Fps is (10 sec: 4507.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5496832. Throughput: 0: 991.2. Samples: 1374822. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:51:18,842][00338] Avg episode reward: [(0, '25.876')] [2024-09-22 15:51:23,840][00338] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5513216. Throughput: 0: 984.3. Samples: 1377238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:51:23,848][00338] Avg episode reward: [(0, '25.140')] [2024-09-22 15:51:28,119][02365] Updated weights for policy 0, policy_version 1350 (0.0028) [2024-09-22 15:51:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3846.1). Total num frames: 5529600. Throughput: 0: 935.8. Samples: 1382018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:51:28,842][00338] Avg episode reward: [(0, '25.539')] [2024-09-22 15:51:33,839][00338] Fps is (10 sec: 4096.1, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 5554176. Throughput: 0: 974.2. Samples: 1388946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:51:33,841][00338] Avg episode reward: [(0, '25.036')] [2024-09-22 15:51:37,671][02365] Updated weights for policy 0, policy_version 1360 (0.0022) [2024-09-22 15:51:38,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5570560. Throughput: 0: 999.6. Samples: 1392130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:51:38,845][00338] Avg episode reward: [(0, '25.885')] [2024-09-22 15:51:43,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5586944. Throughput: 0: 943.2. Samples: 1396298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:51:43,843][00338] Avg episode reward: [(0, '25.270')] [2024-09-22 15:51:48,816][02365] Updated weights for policy 0, policy_version 1370 (0.0034) [2024-09-22 15:51:48,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 5611520. Throughput: 0: 952.8. Samples: 1402944. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:51:48,843][00338] Avg episode reward: [(0, '25.207')] [2024-09-22 15:51:53,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5632000. Throughput: 0: 984.8. Samples: 1406408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:51:53,844][00338] Avg episode reward: [(0, '26.145')] [2024-09-22 15:51:58,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 5644288. Throughput: 0: 967.6. Samples: 1411338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:51:58,841][00338] Avg episode reward: [(0, '26.708')] [2024-09-22 15:52:00,395][02365] Updated weights for policy 0, policy_version 1380 (0.0036) [2024-09-22 15:52:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5664768. Throughput: 0: 935.8. Samples: 1416934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:52:03,847][00338] Avg episode reward: [(0, '25.863')] [2024-09-22 15:52:08,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.7, 300 sec: 3873.8). Total num frames: 5689344. Throughput: 0: 959.6. Samples: 1420420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-22 15:52:08,842][00338] Avg episode reward: [(0, '26.162')] [2024-09-22 15:52:09,305][02365] Updated weights for policy 0, policy_version 1390 (0.0026) [2024-09-22 15:52:13,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5705728. Throughput: 0: 984.9. Samples: 1426338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:52:13,844][00338] Avg episode reward: [(0, '25.705')] [2024-09-22 15:52:18,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5722112. Throughput: 0: 937.4. Samples: 1431130. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:52:18,842][00338] Avg episode reward: [(0, '26.444')] [2024-09-22 15:52:20,961][02365] Updated weights for policy 0, policy_version 1400 (0.0031) [2024-09-22 15:52:23,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 5746688. Throughput: 0: 944.1. Samples: 1434616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:52:23,842][00338] Avg episode reward: [(0, '25.719')] [2024-09-22 15:52:28,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 5763072. Throughput: 0: 1001.2. Samples: 1441350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:52:28,844][00338] Avg episode reward: [(0, '26.889')] [2024-09-22 15:52:28,948][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001408_5767168.pth... [2024-09-22 15:52:29,086][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001183_4845568.pth [2024-09-22 15:52:31,989][02365] Updated weights for policy 0, policy_version 1410 (0.0034) [2024-09-22 15:52:33,839][00338] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5779456. Throughput: 0: 943.3. Samples: 1445392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:52:33,845][00338] Avg episode reward: [(0, '27.698')] [2024-09-22 15:52:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5799936. Throughput: 0: 934.2. Samples: 1448448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:52:38,847][00338] Avg episode reward: [(0, '28.958')] [2024-09-22 15:52:38,883][02352] Saving new best policy, reward=28.958! [2024-09-22 15:52:41,673][02365] Updated weights for policy 0, policy_version 1420 (0.0025) [2024-09-22 15:52:43,839][00338] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5824512. Throughput: 0: 977.3. Samples: 1455318. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:52:43,846][00338] Avg episode reward: [(0, '29.876')] [2024-09-22 15:52:43,852][02352] Saving new best policy, reward=29.876! [2024-09-22 15:52:48,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5836800. Throughput: 0: 960.4. Samples: 1460154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:52:48,841][00338] Avg episode reward: [(0, '30.492')] [2024-09-22 15:52:48,848][02352] Saving new best policy, reward=30.492! [2024-09-22 15:52:53,558][02365] Updated weights for policy 0, policy_version 1430 (0.0016) [2024-09-22 15:52:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5857280. Throughput: 0: 931.8. Samples: 1462350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:52:53,842][00338] Avg episode reward: [(0, '30.174')] [2024-09-22 15:52:58,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5881856. Throughput: 0: 952.7. Samples: 1469208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:52:58,846][00338] Avg episode reward: [(0, '29.841')] [2024-09-22 15:53:03,332][02365] Updated weights for policy 0, policy_version 1440 (0.0040) [2024-09-22 15:53:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5898240. Throughput: 0: 977.2. Samples: 1475106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:53:03,843][00338] Avg episode reward: [(0, '28.492')] [2024-09-22 15:53:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5914624. Throughput: 0: 943.7. Samples: 1477082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:53:08,841][00338] Avg episode reward: [(0, '28.378')] [2024-09-22 15:53:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.1). Total num frames: 5935104. Throughput: 0: 932.3. Samples: 1483302. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:53:13,847][00338] Avg episode reward: [(0, '26.891')] [2024-09-22 15:53:13,962][02365] Updated weights for policy 0, policy_version 1450 (0.0030) [2024-09-22 15:53:18,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5959680. Throughput: 0: 992.7. Samples: 1490064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:53:18,844][00338] Avg episode reward: [(0, '25.907')] [2024-09-22 15:53:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5971968. Throughput: 0: 970.9. Samples: 1492140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:53:23,844][00338] Avg episode reward: [(0, '25.435')] [2024-09-22 15:53:25,663][02365] Updated weights for policy 0, policy_version 1460 (0.0030) [2024-09-22 15:53:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5992448. Throughput: 0: 934.6. Samples: 1497374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:53:28,843][00338] Avg episode reward: [(0, '24.657')] [2024-09-22 15:53:33,840][00338] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 6017024. Throughput: 0: 981.7. Samples: 1504330. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:53:33,842][00338] Avg episode reward: [(0, '26.586')] [2024-09-22 15:53:34,705][02365] Updated weights for policy 0, policy_version 1470 (0.0013) [2024-09-22 15:53:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6029312. Throughput: 0: 994.8. Samples: 1507116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:53:38,843][00338] Avg episode reward: [(0, '26.017')] [2024-09-22 15:53:43,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6049792. Throughput: 0: 938.7. Samples: 1511450. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:53:43,844][00338] Avg episode reward: [(0, '26.007')] [2024-09-22 15:53:46,217][02365] Updated weights for policy 0, policy_version 1480 (0.0040) [2024-09-22 15:53:48,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6070272. Throughput: 0: 963.2. Samples: 1518450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:53:48,842][00338] Avg episode reward: [(0, '27.049')] [2024-09-22 15:53:53,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6090752. Throughput: 0: 996.7. Samples: 1521932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:53:53,842][00338] Avg episode reward: [(0, '27.243')] [2024-09-22 15:53:57,304][02365] Updated weights for policy 0, policy_version 1490 (0.0036) [2024-09-22 15:53:58,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6107136. Throughput: 0: 955.0. Samples: 1526278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:53:58,846][00338] Avg episode reward: [(0, '27.224')] [2024-09-22 15:54:03,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 6127616. Throughput: 0: 946.8. Samples: 1532672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:54:03,843][00338] Avg episode reward: [(0, '26.205')] [2024-09-22 15:54:06,747][02365] Updated weights for policy 0, policy_version 1500 (0.0033) [2024-09-22 15:54:08,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 6152192. Throughput: 0: 974.8. Samples: 1536008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:54:08,847][00338] Avg episode reward: [(0, '26.740')] [2024-09-22 15:54:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6164480. Throughput: 0: 975.1. Samples: 1541254. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:54:13,853][00338] Avg episode reward: [(0, '27.168')] [2024-09-22 15:54:18,575][02365] Updated weights for policy 0, policy_version 1510 (0.0021) [2024-09-22 15:54:18,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 6184960. Throughput: 0: 940.3. Samples: 1546642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:54:18,846][00338] Avg episode reward: [(0, '26.135')] [2024-09-22 15:54:23,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 6209536. Throughput: 0: 957.0. Samples: 1550180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:54:23,842][00338] Avg episode reward: [(0, '26.723')] [2024-09-22 15:54:28,323][02365] Updated weights for policy 0, policy_version 1520 (0.0030) [2024-09-22 15:54:28,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6225920. Throughput: 0: 999.8. Samples: 1556440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:54:28,843][00338] Avg episode reward: [(0, '27.461')] [2024-09-22 15:54:28,852][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001520_6225920.pth... [2024-09-22 15:54:29,014][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001295_5304320.pth [2024-09-22 15:54:33,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6242304. Throughput: 0: 940.4. Samples: 1560766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:54:33,846][00338] Avg episode reward: [(0, '29.442')] [2024-09-22 15:54:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6262784. Throughput: 0: 938.0. Samples: 1564144. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:54:38,844][00338] Avg episode reward: [(0, '29.477')] [2024-09-22 15:54:38,953][02365] Updated weights for policy 0, policy_version 1530 (0.0040) [2024-09-22 15:54:43,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 6287360. Throughput: 0: 996.8. Samples: 1571134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:54:43,843][00338] Avg episode reward: [(0, '29.971')] [2024-09-22 15:54:48,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6299648. Throughput: 0: 951.6. Samples: 1575494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:54:48,843][00338] Avg episode reward: [(0, '30.781')] [2024-09-22 15:54:48,865][02352] Saving new best policy, reward=30.781! [2024-09-22 15:54:50,536][02365] Updated weights for policy 0, policy_version 1540 (0.0028) [2024-09-22 15:54:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 6320128. Throughput: 0: 941.4. Samples: 1578372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:54:53,845][00338] Avg episode reward: [(0, '31.036')] [2024-09-22 15:54:53,848][02352] Saving new best policy, reward=31.036! [2024-09-22 15:54:58,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 6344704. Throughput: 0: 977.6. Samples: 1585248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:54:58,849][00338] Avg episode reward: [(0, '31.565')] [2024-09-22 15:54:58,860][02352] Saving new best policy, reward=31.565! [2024-09-22 15:54:59,461][02365] Updated weights for policy 0, policy_version 1550 (0.0020) [2024-09-22 15:55:03,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6356992. Throughput: 0: 972.6. Samples: 1590408. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:55:03,848][00338] Avg episode reward: [(0, '32.025')] [2024-09-22 15:55:03,887][02352] Saving new best policy, reward=32.025! [2024-09-22 15:55:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6377472. Throughput: 0: 936.1. Samples: 1592304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:55:08,841][00338] Avg episode reward: [(0, '31.580')] [2024-09-22 15:55:11,391][02365] Updated weights for policy 0, policy_version 1560 (0.0017) [2024-09-22 15:55:13,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6397952. Throughput: 0: 946.0. Samples: 1599008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:55:13,842][00338] Avg episode reward: [(0, '32.826')] [2024-09-22 15:55:13,844][02352] Saving new best policy, reward=32.826! [2024-09-22 15:55:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6418432. Throughput: 0: 987.5. Samples: 1605204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:55:18,843][00338] Avg episode reward: [(0, '33.076')] [2024-09-22 15:55:18,853][02352] Saving new best policy, reward=33.076! [2024-09-22 15:55:22,920][02365] Updated weights for policy 0, policy_version 1570 (0.0041) [2024-09-22 15:55:23,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.4). Total num frames: 6430720. Throughput: 0: 955.4. Samples: 1607136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:55:23,846][00338] Avg episode reward: [(0, '33.056')] [2024-09-22 15:55:28,842][00338] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 6455296. Throughput: 0: 931.9. Samples: 1613074. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:55:28,844][00338] Avg episode reward: [(0, '31.093')] [2024-09-22 15:55:32,065][02365] Updated weights for policy 0, policy_version 1580 (0.0025) [2024-09-22 15:55:33,843][00338] Fps is (10 sec: 4504.1, 60 sec: 3891.0, 300 sec: 3846.0). Total num frames: 6475776. Throughput: 0: 986.7. Samples: 1619900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:55:33,851][00338] Avg episode reward: [(0, '29.015')] [2024-09-22 15:55:38,840][00338] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6492160. Throughput: 0: 971.0. Samples: 1622068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:55:38,843][00338] Avg episode reward: [(0, '28.312')] [2024-09-22 15:55:43,661][02365] Updated weights for policy 0, policy_version 1590 (0.0024) [2024-09-22 15:55:43,839][00338] Fps is (10 sec: 3687.7, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 6512640. Throughput: 0: 928.0. Samples: 1627010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:55:43,842][00338] Avg episode reward: [(0, '28.615')] [2024-09-22 15:55:48,840][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6533120. Throughput: 0: 967.9. Samples: 1633964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:55:48,847][00338] Avg episode reward: [(0, '28.279')] [2024-09-22 15:55:53,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6549504. Throughput: 0: 995.8. Samples: 1637116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:55:53,845][00338] Avg episode reward: [(0, '27.685')] [2024-09-22 15:55:54,026][02365] Updated weights for policy 0, policy_version 1600 (0.0020) [2024-09-22 15:55:58,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 6569984. Throughput: 0: 938.4. Samples: 1641234. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:55:58,842][00338] Avg episode reward: [(0, '29.294')] [2024-09-22 15:56:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6590464. Throughput: 0: 953.4. Samples: 1648106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:56:03,841][00338] Avg episode reward: [(0, '29.220')] [2024-09-22 15:56:04,102][02365] Updated weights for policy 0, policy_version 1610 (0.0024) [2024-09-22 15:56:08,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6610944. Throughput: 0: 984.1. Samples: 1651420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:56:08,845][00338] Avg episode reward: [(0, '28.204')] [2024-09-22 15:56:13,847][00338] Fps is (10 sec: 3274.2, 60 sec: 3754.2, 300 sec: 3818.2). Total num frames: 6623232. Throughput: 0: 954.4. Samples: 1656028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:56:13,850][00338] Avg episode reward: [(0, '27.243')] [2024-09-22 15:56:15,993][02365] Updated weights for policy 0, policy_version 1620 (0.0028) [2024-09-22 15:56:18,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 6647808. Throughput: 0: 937.0. Samples: 1662060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:56:18,843][00338] Avg episode reward: [(0, '26.851')] [2024-09-22 15:56:23,839][00338] Fps is (10 sec: 4509.2, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 6668288. Throughput: 0: 964.8. Samples: 1665484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:56:23,842][00338] Avg episode reward: [(0, '26.709')] [2024-09-22 15:56:25,173][02365] Updated weights for policy 0, policy_version 1630 (0.0028) [2024-09-22 15:56:28,840][00338] Fps is (10 sec: 3686.3, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 6684672. Throughput: 0: 977.5. Samples: 1670998. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:56:28,844][00338] Avg episode reward: [(0, '26.404')] [2024-09-22 15:56:28,855][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001632_6684672.pth... [2024-09-22 15:56:29,005][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001408_5767168.pth [2024-09-22 15:56:33,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3846.1). Total num frames: 6705152. Throughput: 0: 936.1. Samples: 1676090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:56:33,844][00338] Avg episode reward: [(0, '25.980')] [2024-09-22 15:56:36,564][02365] Updated weights for policy 0, policy_version 1640 (0.0032) [2024-09-22 15:56:38,839][00338] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6725632. Throughput: 0: 941.5. Samples: 1679484. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:56:38,842][00338] Avg episode reward: [(0, '27.544')] [2024-09-22 15:56:43,841][00338] Fps is (10 sec: 3685.9, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6742016. Throughput: 0: 991.5. Samples: 1685852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:56:43,844][00338] Avg episode reward: [(0, '27.599')] [2024-09-22 15:56:48,387][02365] Updated weights for policy 0, policy_version 1650 (0.0025) [2024-09-22 15:56:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 6758400. Throughput: 0: 932.4. Samples: 1690066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:56:48,848][00338] Avg episode reward: [(0, '28.497')] [2024-09-22 15:56:53,839][00338] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6782976. Throughput: 0: 936.5. Samples: 1693564. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:56:53,847][00338] Avg episode reward: [(0, '29.723')] [2024-09-22 15:56:57,229][02365] Updated weights for policy 0, policy_version 1660 (0.0014) [2024-09-22 15:56:58,847][00338] Fps is (10 sec: 4502.2, 60 sec: 3890.7, 300 sec: 3859.9). Total num frames: 6803456. Throughput: 0: 987.8. Samples: 1700480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:56:58,849][00338] Avg episode reward: [(0, '29.903')] [2024-09-22 15:57:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 6815744. Throughput: 0: 954.4. Samples: 1705008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:57:03,842][00338] Avg episode reward: [(0, '31.017')] [2024-09-22 15:57:08,839][00338] Fps is (10 sec: 3279.2, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6836224. Throughput: 0: 935.1. Samples: 1707564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:57:08,848][00338] Avg episode reward: [(0, '30.339')] [2024-09-22 15:57:09,001][02365] Updated weights for policy 0, policy_version 1670 (0.0038) [2024-09-22 15:57:13,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3960.0, 300 sec: 3860.0). Total num frames: 6860800. Throughput: 0: 965.9. Samples: 1714462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:57:13,842][00338] Avg episode reward: [(0, '29.405')] [2024-09-22 15:57:18,840][00338] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6877184. Throughput: 0: 972.9. Samples: 1719870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:57:18,842][00338] Avg episode reward: [(0, '28.626')] [2024-09-22 15:57:19,805][02365] Updated weights for policy 0, policy_version 1680 (0.0021) [2024-09-22 15:57:23,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6893568. Throughput: 0: 943.6. Samples: 1721948. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:57:23,842][00338] Avg episode reward: [(0, '26.692')] [2024-09-22 15:57:28,839][00338] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6918144. Throughput: 0: 949.6. Samples: 1728582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:57:28,842][00338] Avg episode reward: [(0, '27.476')] [2024-09-22 15:57:29,358][02365] Updated weights for policy 0, policy_version 1690 (0.0033) [2024-09-22 15:57:33,841][00338] Fps is (10 sec: 4504.9, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 6938624. Throughput: 0: 1000.2. Samples: 1735076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:57:33,845][00338] Avg episode reward: [(0, '25.619')] [2024-09-22 15:57:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 6950912. Throughput: 0: 966.4. Samples: 1737052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:57:38,841][00338] Avg episode reward: [(0, '25.420')] [2024-09-22 15:57:41,123][02365] Updated weights for policy 0, policy_version 1700 (0.0035) [2024-09-22 15:57:43,839][00338] Fps is (10 sec: 3686.9, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 6975488. Throughput: 0: 938.9. Samples: 1742724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:57:43,841][00338] Avg episode reward: [(0, '25.740')] [2024-09-22 15:57:48,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 6995968. Throughput: 0: 994.0. Samples: 1749736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 15:57:48,841][00338] Avg episode reward: [(0, '26.360')] [2024-09-22 15:57:50,416][02365] Updated weights for policy 0, policy_version 1710 (0.0028) [2024-09-22 15:57:53,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 7012352. Throughput: 0: 992.3. Samples: 1752216. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:57:53,842][00338] Avg episode reward: [(0, '28.036')] [2024-09-22 15:57:58,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.4, 300 sec: 3846.1). Total num frames: 7032832. Throughput: 0: 943.8. Samples: 1756932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:57:58,847][00338] Avg episode reward: [(0, '27.907')] [2024-09-22 15:58:01,522][02365] Updated weights for policy 0, policy_version 1720 (0.0039) [2024-09-22 15:58:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7053312. Throughput: 0: 979.3. Samples: 1763936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:58:03,846][00338] Avg episode reward: [(0, '28.365')] [2024-09-22 15:58:08,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 7069696. Throughput: 0: 1007.5. Samples: 1767286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:58:08,842][00338] Avg episode reward: [(0, '28.773')] [2024-09-22 15:58:13,320][02365] Updated weights for policy 0, policy_version 1730 (0.0021) [2024-09-22 15:58:13,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 7086080. Throughput: 0: 949.6. Samples: 1771314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:58:13,842][00338] Avg episode reward: [(0, '29.301')] [2024-09-22 15:58:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7110656. Throughput: 0: 951.1. Samples: 1777874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:58:18,842][00338] Avg episode reward: [(0, '28.024')] [2024-09-22 15:58:22,026][02365] Updated weights for policy 0, policy_version 1740 (0.0023) [2024-09-22 15:58:23,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7131136. Throughput: 0: 985.0. Samples: 1781376. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 15:58:23,847][00338] Avg episode reward: [(0, '26.641')] [2024-09-22 15:58:28,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 7147520. Throughput: 0: 971.2. Samples: 1786430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:58:28,849][00338] Avg episode reward: [(0, '25.863')] [2024-09-22 15:58:28,859][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001745_7147520.pth... [2024-09-22 15:58:29,014][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001520_6225920.pth [2024-09-22 15:58:33,763][02365] Updated weights for policy 0, policy_version 1750 (0.0050) [2024-09-22 15:58:33,843][00338] Fps is (10 sec: 3686.2, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 7168000. Throughput: 0: 940.3. Samples: 1792048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:58:33,846][00338] Avg episode reward: [(0, '26.187')] [2024-09-22 15:58:38,839][00338] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7188480. Throughput: 0: 962.1. Samples: 1795512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:58:38,843][00338] Avg episode reward: [(0, '26.532')] [2024-09-22 15:58:43,839][00338] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7204864. Throughput: 0: 986.6. Samples: 1801328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:58:43,843][00338] Avg episode reward: [(0, '26.097')] [2024-09-22 15:58:44,304][02365] Updated weights for policy 0, policy_version 1760 (0.0020) [2024-09-22 15:58:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 7221248. Throughput: 0: 935.2. Samples: 1806020. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:58:48,842][00338] Avg episode reward: [(0, '26.692')] [2024-09-22 15:58:53,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7245824. Throughput: 0: 938.9. Samples: 1809538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:58:53,842][00338] Avg episode reward: [(0, '26.199')] [2024-09-22 15:58:54,200][02365] Updated weights for policy 0, policy_version 1770 (0.0044) [2024-09-22 15:58:58,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7266304. Throughput: 0: 1001.7. Samples: 1816392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:58:58,842][00338] Avg episode reward: [(0, '27.868')] [2024-09-22 15:59:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 7278592. Throughput: 0: 949.2. Samples: 1820590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:59:03,846][00338] Avg episode reward: [(0, '26.893')] [2024-09-22 15:59:05,650][02365] Updated weights for policy 0, policy_version 1780 (0.0035) [2024-09-22 15:59:08,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7303168. Throughput: 0: 945.0. Samples: 1823902. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:59:08,847][00338] Avg episode reward: [(0, '26.844')] [2024-09-22 15:59:13,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7323648. Throughput: 0: 982.6. Samples: 1830646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:59:13,843][00338] Avg episode reward: [(0, '26.670')] [2024-09-22 15:59:15,383][02365] Updated weights for policy 0, policy_version 1790 (0.0037) [2024-09-22 15:59:18,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 7340032. Throughput: 0: 966.9. Samples: 1835558. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:59:18,846][00338] Avg episode reward: [(0, '26.069')] [2024-09-22 15:59:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7360512. Throughput: 0: 942.6. Samples: 1837930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:59:23,846][00338] Avg episode reward: [(0, '26.158')] [2024-09-22 15:59:26,456][02365] Updated weights for policy 0, policy_version 1800 (0.0024) [2024-09-22 15:59:28,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7380992. Throughput: 0: 965.0. Samples: 1844754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 15:59:28,847][00338] Avg episode reward: [(0, '24.958')] [2024-09-22 15:59:33,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 7397376. Throughput: 0: 985.5. Samples: 1850368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 15:59:33,845][00338] Avg episode reward: [(0, '25.940')] [2024-09-22 15:59:38,019][02365] Updated weights for policy 0, policy_version 1810 (0.0025) [2024-09-22 15:59:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 7413760. Throughput: 0: 952.3. Samples: 1852390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 15:59:38,844][00338] Avg episode reward: [(0, '26.396')] [2024-09-22 15:59:43,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7438336. Throughput: 0: 941.9. Samples: 1858776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 15:59:43,843][00338] Avg episode reward: [(0, '26.900')] [2024-09-22 15:59:46,917][02365] Updated weights for policy 0, policy_version 1820 (0.0031) [2024-09-22 15:59:48,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7458816. Throughput: 0: 994.1. Samples: 1865326. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:59:48,841][00338] Avg episode reward: [(0, '24.309')] [2024-09-22 15:59:53,841][00338] Fps is (10 sec: 3276.3, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 7471104. Throughput: 0: 965.3. Samples: 1867342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:59:53,843][00338] Avg episode reward: [(0, '25.954')] [2024-09-22 15:59:58,598][02365] Updated weights for policy 0, policy_version 1830 (0.0036) [2024-09-22 15:59:58,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7495680. Throughput: 0: 941.7. Samples: 1873022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 15:59:58,841][00338] Avg episode reward: [(0, '24.781')] [2024-09-22 16:00:03,839][00338] Fps is (10 sec: 4915.9, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 7520256. Throughput: 0: 986.9. Samples: 1879970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 16:00:03,848][00338] Avg episode reward: [(0, '23.748')] [2024-09-22 16:00:08,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7532544. Throughput: 0: 992.0. Samples: 1882568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:08,846][00338] Avg episode reward: [(0, '23.337')] [2024-09-22 16:00:09,330][02365] Updated weights for policy 0, policy_version 1840 (0.0026) [2024-09-22 16:00:13,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7553024. Throughput: 0: 940.8. Samples: 1887092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:13,849][00338] Avg episode reward: [(0, '24.276')] [2024-09-22 16:00:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 7573504. Throughput: 0: 971.2. Samples: 1894072. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:00:18,848][00338] Avg episode reward: [(0, '26.464')] [2024-09-22 16:00:19,012][02365] Updated weights for policy 0, policy_version 1850 (0.0035) [2024-09-22 16:00:23,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7593984. Throughput: 0: 1005.9. Samples: 1897656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:23,842][00338] Avg episode reward: [(0, '27.119')] [2024-09-22 16:00:28,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7610368. Throughput: 0: 958.0. Samples: 1901886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:28,842][00338] Avg episode reward: [(0, '28.117')] [2024-09-22 16:00:28,855][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001858_7610368.pth... [2024-09-22 16:00:28,972][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001632_6684672.pth [2024-09-22 16:00:30,716][02365] Updated weights for policy 0, policy_version 1860 (0.0022) [2024-09-22 16:00:33,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7630848. Throughput: 0: 948.9. Samples: 1908026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:33,841][00338] Avg episode reward: [(0, '29.046')] [2024-09-22 16:00:38,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7651328. Throughput: 0: 979.4. Samples: 1911414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:38,846][00338] Avg episode reward: [(0, '28.734')] [2024-09-22 16:00:40,911][02365] Updated weights for policy 0, policy_version 1870 (0.0042) [2024-09-22 16:00:43,842][00338] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3846.0). Total num frames: 7667712. Throughput: 0: 965.2. Samples: 1916456. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:00:43,844][00338] Avg episode reward: [(0, '27.756')] [2024-09-22 16:00:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 7684096. Throughput: 0: 930.1. Samples: 1921826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:48,842][00338] Avg episode reward: [(0, '27.341')] [2024-09-22 16:00:51,528][02365] Updated weights for policy 0, policy_version 1880 (0.0021) [2024-09-22 16:00:53,839][00338] Fps is (10 sec: 4097.0, 60 sec: 3959.6, 300 sec: 3860.0). Total num frames: 7708672. Throughput: 0: 950.7. Samples: 1925350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:53,842][00338] Avg episode reward: [(0, '27.338')] [2024-09-22 16:00:58,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7725056. Throughput: 0: 988.8. Samples: 1931590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:00:58,842][00338] Avg episode reward: [(0, '26.127')] [2024-09-22 16:01:02,986][02365] Updated weights for policy 0, policy_version 1890 (0.0040) [2024-09-22 16:01:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 7741440. Throughput: 0: 935.0. Samples: 1936146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:01:03,842][00338] Avg episode reward: [(0, '26.284')] [2024-09-22 16:01:08,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 7766016. Throughput: 0: 933.5. Samples: 1939662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:01:08,849][00338] Avg episode reward: [(0, '27.875')] [2024-09-22 16:01:12,077][02365] Updated weights for policy 0, policy_version 1900 (0.0020) [2024-09-22 16:01:13,843][00338] Fps is (10 sec: 4503.9, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 7786496. Throughput: 0: 987.9. Samples: 1946346. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:01:13,847][00338] Avg episode reward: [(0, '29.957')] [2024-09-22 16:01:18,841][00338] Fps is (10 sec: 3276.3, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 7798784. Throughput: 0: 940.9. Samples: 1950370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:01:18,843][00338] Avg episode reward: [(0, '29.873')] [2024-09-22 16:01:23,840][00338] Fps is (10 sec: 3277.9, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 7819264. Throughput: 0: 930.1. Samples: 1953270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:01:23,845][00338] Avg episode reward: [(0, '31.117')] [2024-09-22 16:01:24,101][02365] Updated weights for policy 0, policy_version 1910 (0.0034) [2024-09-22 16:01:28,839][00338] Fps is (10 sec: 4506.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7843840. Throughput: 0: 968.5. Samples: 1960036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:01:28,848][00338] Avg episode reward: [(0, '30.706')] [2024-09-22 16:01:33,839][00338] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7860224. Throughput: 0: 963.3. Samples: 1965174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:01:33,844][00338] Avg episode reward: [(0, '32.814')] [2024-09-22 16:01:35,411][02365] Updated weights for policy 0, policy_version 1920 (0.0031) [2024-09-22 16:01:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 7876608. Throughput: 0: 928.9. Samples: 1967152. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:01:38,842][00338] Avg episode reward: [(0, '32.829')] [2024-09-22 16:01:43,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3860.0). Total num frames: 7897088. Throughput: 0: 938.3. Samples: 1973812. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:01:43,842][00338] Avg episode reward: [(0, '32.362')] [2024-09-22 16:01:44,924][02365] Updated weights for policy 0, policy_version 1930 (0.0022) [2024-09-22 16:01:48,840][00338] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 7917568. Throughput: 0: 973.1. Samples: 1979936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-22 16:01:48,842][00338] Avg episode reward: [(0, '30.926')] [2024-09-22 16:01:53,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.3). Total num frames: 7933952. Throughput: 0: 938.8. Samples: 1981906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 16:01:53,842][00338] Avg episode reward: [(0, '31.178')] [2024-09-22 16:01:56,430][02365] Updated weights for policy 0, policy_version 1940 (0.0033) [2024-09-22 16:01:58,839][00338] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7954432. Throughput: 0: 924.6. Samples: 1987950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:01:58,842][00338] Avg episode reward: [(0, '31.557')] [2024-09-22 16:02:03,843][00338] Fps is (10 sec: 4094.6, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 7974912. Throughput: 0: 979.4. Samples: 1994444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:02:03,847][00338] Avg episode reward: [(0, '33.452')] [2024-09-22 16:02:03,852][02352] Saving new best policy, reward=33.452! [2024-09-22 16:02:07,560][02365] Updated weights for policy 0, policy_version 1950 (0.0021) [2024-09-22 16:02:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 7987200. Throughput: 0: 956.4. Samples: 1996308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:02:08,847][00338] Avg episode reward: [(0, '32.377')] [2024-09-22 16:02:13,840][00338] Fps is (10 sec: 3277.8, 60 sec: 3686.6, 300 sec: 3832.2). Total num frames: 8007680. Throughput: 0: 917.0. Samples: 2001300. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:02:13,842][00338] Avg episode reward: [(0, '31.473')] [2024-09-22 16:02:17,914][02365] Updated weights for policy 0, policy_version 1960 (0.0031) [2024-09-22 16:02:18,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 8032256. Throughput: 0: 952.4. Samples: 2008032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:02:18,843][00338] Avg episode reward: [(0, '31.516')] [2024-09-22 16:02:23,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 8044544. Throughput: 0: 972.7. Samples: 2010924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:02:23,844][00338] Avg episode reward: [(0, '32.256')] [2024-09-22 16:02:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 8065024. Throughput: 0: 924.5. Samples: 2015414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 16:02:28,844][00338] Avg episode reward: [(0, '31.379')] [2024-09-22 16:02:28,853][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001969_8065024.pth... [2024-09-22 16:02:28,979][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001745_7147520.pth [2024-09-22 16:02:29,428][02365] Updated weights for policy 0, policy_version 1970 (0.0027) [2024-09-22 16:02:33,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 8089600. Throughput: 0: 940.5. Samples: 2022256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:02:33,844][00338] Avg episode reward: [(0, '29.941')] [2024-09-22 16:02:38,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8105984. Throughput: 0: 972.6. Samples: 2025672. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:02:38,847][00338] Avg episode reward: [(0, '29.904')] [2024-09-22 16:02:39,179][02365] Updated weights for policy 0, policy_version 1980 (0.0028) [2024-09-22 16:02:43,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 8122368. Throughput: 0: 936.2. Samples: 2030078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:02:43,845][00338] Avg episode reward: [(0, '30.342')] [2024-09-22 16:02:48,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 8142848. Throughput: 0: 930.4. Samples: 2036310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:02:48,848][00338] Avg episode reward: [(0, '29.656')] [2024-09-22 16:02:50,024][02365] Updated weights for policy 0, policy_version 1990 (0.0025) [2024-09-22 16:02:53,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 8167424. Throughput: 0: 966.8. Samples: 2039814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:02:53,848][00338] Avg episode reward: [(0, '28.966')] [2024-09-22 16:02:58,840][00338] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 8179712. Throughput: 0: 975.6. Samples: 2045202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:02:58,848][00338] Avg episode reward: [(0, '29.710')] [2024-09-22 16:03:01,537][02365] Updated weights for policy 0, policy_version 2000 (0.0026) [2024-09-22 16:03:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3832.2). Total num frames: 8200192. Throughput: 0: 947.2. Samples: 2050654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:03:03,842][00338] Avg episode reward: [(0, '28.883')] [2024-09-22 16:03:08,839][00338] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 8224768. Throughput: 0: 959.8. Samples: 2054116. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:03:08,842][00338] Avg episode reward: [(0, '27.991')] [2024-09-22 16:03:10,311][02365] Updated weights for policy 0, policy_version 2010 (0.0032) [2024-09-22 16:03:13,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8241152. Throughput: 0: 995.7. Samples: 2060222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:03:13,842][00338] Avg episode reward: [(0, '26.356')] [2024-09-22 16:03:18,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 8257536. Throughput: 0: 944.9. Samples: 2064778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:03:18,842][00338] Avg episode reward: [(0, '28.131')] [2024-09-22 16:03:21,931][02365] Updated weights for policy 0, policy_version 2020 (0.0044) [2024-09-22 16:03:23,840][00338] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 8282112. Throughput: 0: 946.6. Samples: 2068268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:03:23,841][00338] Avg episode reward: [(0, '24.775')] [2024-09-22 16:03:28,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 8302592. Throughput: 0: 1005.2. Samples: 2075314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 16:03:28,846][00338] Avg episode reward: [(0, '24.982')] [2024-09-22 16:03:32,489][02365] Updated weights for policy 0, policy_version 2030 (0.0013) [2024-09-22 16:03:33,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 8314880. Throughput: 0: 964.2. Samples: 2079698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:03:33,844][00338] Avg episode reward: [(0, '25.855')] [2024-09-22 16:03:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 8339456. Throughput: 0: 950.8. Samples: 2082600. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 16:03:38,850][00338] Avg episode reward: [(0, '26.736')] [2024-09-22 16:03:42,296][02365] Updated weights for policy 0, policy_version 2040 (0.0021) [2024-09-22 16:03:43,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 8359936. Throughput: 0: 985.4. Samples: 2089546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 16:03:43,842][00338] Avg episode reward: [(0, '26.007')] [2024-09-22 16:03:48,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8376320. Throughput: 0: 978.4. Samples: 2094680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-22 16:03:48,842][00338] Avg episode reward: [(0, '26.375')] [2024-09-22 16:03:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 8392704. Throughput: 0: 948.8. Samples: 2096814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:03:53,842][00338] Avg episode reward: [(0, '26.601')] [2024-09-22 16:03:53,862][02365] Updated weights for policy 0, policy_version 2050 (0.0022) [2024-09-22 16:03:58,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 8417280. Throughput: 0: 966.2. Samples: 2103702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-22 16:03:58,842][00338] Avg episode reward: [(0, '27.059')] [2024-09-22 16:04:03,396][02365] Updated weights for policy 0, policy_version 2060 (0.0040) [2024-09-22 16:04:03,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 8437760. Throughput: 0: 1002.0. Samples: 2109866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-22 16:04:03,842][00338] Avg episode reward: [(0, '26.839')] [2024-09-22 16:04:08,840][00338] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8454144. Throughput: 0: 970.0. Samples: 2111916. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:04:08,846][00338] Avg episode reward: [(0, '28.296')] [2024-09-22 16:04:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 8474624. Throughput: 0: 944.4. Samples: 2117810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:04:13,843][00338] Avg episode reward: [(0, '29.946')] [2024-09-22 16:04:14,383][02365] Updated weights for policy 0, policy_version 2070 (0.0017) [2024-09-22 16:04:18,841][00338] Fps is (10 sec: 4505.2, 60 sec: 4027.6, 300 sec: 3859.9). Total num frames: 8499200. Throughput: 0: 1003.0. Samples: 2124836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:04:18,850][00338] Avg episode reward: [(0, '30.634')] [2024-09-22 16:04:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 8511488. Throughput: 0: 985.2. Samples: 2126934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:04:23,842][00338] Avg episode reward: [(0, '31.482')] [2024-09-22 16:04:25,912][02365] Updated weights for policy 0, policy_version 2080 (0.0035) [2024-09-22 16:04:28,839][00338] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 8531968. Throughput: 0: 946.5. Samples: 2132138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:04:28,842][00338] Avg episode reward: [(0, '31.025')] [2024-09-22 16:04:28,858][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002083_8531968.pth... [2024-09-22 16:04:29,005][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001858_7610368.pth [2024-09-22 16:04:33,840][00338] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 8552448. Throughput: 0: 985.5. Samples: 2139030. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:04:33,842][00338] Avg episode reward: [(0, '29.630')] [2024-09-22 16:04:34,840][02365] Updated weights for policy 0, policy_version 2090 (0.0028) [2024-09-22 16:04:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8568832. Throughput: 0: 1002.4. Samples: 2141924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-22 16:04:38,842][00338] Avg episode reward: [(0, '27.281')] [2024-09-22 16:04:43,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 8585216. Throughput: 0: 939.2. Samples: 2145966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:04:43,841][00338] Avg episode reward: [(0, '25.791')] [2024-09-22 16:04:46,755][02365] Updated weights for policy 0, policy_version 2100 (0.0040) [2024-09-22 16:04:48,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 8609792. Throughput: 0: 953.7. Samples: 2152782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:04:48,848][00338] Avg episode reward: [(0, '25.506')] [2024-09-22 16:04:53,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 8630272. Throughput: 0: 983.5. Samples: 2156172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:04:53,846][00338] Avg episode reward: [(0, '25.359')] [2024-09-22 16:04:57,980][02365] Updated weights for policy 0, policy_version 2110 (0.0019) [2024-09-22 16:04:58,843][00338] Fps is (10 sec: 3275.6, 60 sec: 3754.4, 300 sec: 3804.4). Total num frames: 8642560. Throughput: 0: 953.3. Samples: 2160714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:04:58,845][00338] Avg episode reward: [(0, '25.176')] [2024-09-22 16:05:03,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 8663040. Throughput: 0: 929.1. Samples: 2166644. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:05:03,848][00338] Avg episode reward: [(0, '24.972')] [2024-09-22 16:05:07,348][02365] Updated weights for policy 0, policy_version 2120 (0.0013) [2024-09-22 16:05:08,839][00338] Fps is (10 sec: 4507.2, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 8687616. Throughput: 0: 960.3. Samples: 2170146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:05:08,846][00338] Avg episode reward: [(0, '26.652')] [2024-09-22 16:05:13,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8704000. Throughput: 0: 962.7. Samples: 2175460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:05:13,844][00338] Avg episode reward: [(0, '27.498')] [2024-09-22 16:05:18,840][00338] Fps is (10 sec: 3276.7, 60 sec: 3686.5, 300 sec: 3818.3). Total num frames: 8720384. Throughput: 0: 919.7. Samples: 2180416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:05:18,841][00338] Avg episode reward: [(0, '27.897')] [2024-09-22 16:05:19,389][02365] Updated weights for policy 0, policy_version 2130 (0.0022) [2024-09-22 16:05:23,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8740864. Throughput: 0: 928.8. Samples: 2183720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:05:23,841][00338] Avg episode reward: [(0, '27.871')] [2024-09-22 16:05:28,843][00338] Fps is (10 sec: 4094.4, 60 sec: 3822.7, 300 sec: 3832.1). Total num frames: 8761344. Throughput: 0: 976.3. Samples: 2189904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:05:28,846][00338] Avg episode reward: [(0, '30.535')] [2024-09-22 16:05:30,287][02365] Updated weights for policy 0, policy_version 2140 (0.0022) [2024-09-22 16:05:33,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 8773632. Throughput: 0: 916.9. Samples: 2194042. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:05:33,849][00338] Avg episode reward: [(0, '31.388')] [2024-09-22 16:05:38,839][00338] Fps is (10 sec: 3687.9, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8798208. Throughput: 0: 916.9. Samples: 2197432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 16:05:38,844][00338] Avg episode reward: [(0, '30.914')] [2024-09-22 16:05:40,492][02365] Updated weights for policy 0, policy_version 2150 (0.0027) [2024-09-22 16:05:43,839][00338] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 8818688. Throughput: 0: 961.0. Samples: 2203954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:05:43,849][00338] Avg episode reward: [(0, '29.576')] [2024-09-22 16:05:48,839][00338] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 8830976. Throughput: 0: 924.8. Samples: 2208262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:05:48,842][00338] Avg episode reward: [(0, '29.672')] [2024-09-22 16:05:52,737][02365] Updated weights for policy 0, policy_version 2160 (0.0047) [2024-09-22 16:05:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 8851456. Throughput: 0: 904.8. Samples: 2210860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:05:53,846][00338] Avg episode reward: [(0, '29.520')] [2024-09-22 16:05:58,839][00338] Fps is (10 sec: 4096.1, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 8871936. Throughput: 0: 932.7. Samples: 2217430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:05:58,842][00338] Avg episode reward: [(0, '29.407')] [2024-09-22 16:06:03,385][02365] Updated weights for policy 0, policy_version 2170 (0.0025) [2024-09-22 16:06:03,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 8888320. Throughput: 0: 936.4. Samples: 2222556. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:06:03,847][00338] Avg episode reward: [(0, '28.083')] [2024-09-22 16:06:08,840][00338] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 8904704. Throughput: 0: 908.6. Samples: 2224606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 16:06:08,848][00338] Avg episode reward: [(0, '28.735')] [2024-09-22 16:06:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 8925184. Throughput: 0: 912.3. Samples: 2230954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:06:13,841][00338] Avg episode reward: [(0, '30.509')] [2024-09-22 16:06:14,081][02365] Updated weights for policy 0, policy_version 2180 (0.0021) [2024-09-22 16:06:18,839][00338] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 8945664. Throughput: 0: 948.4. Samples: 2236718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:06:18,848][00338] Avg episode reward: [(0, '31.683')] [2024-09-22 16:06:23,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 8957952. Throughput: 0: 917.2. Samples: 2238708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:06:23,842][00338] Avg episode reward: [(0, '31.514')] [2024-09-22 16:06:26,197][02365] Updated weights for policy 0, policy_version 2190 (0.0024) [2024-09-22 16:06:28,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3790.5). Total num frames: 8978432. Throughput: 0: 898.8. Samples: 2244398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:06:28,843][00338] Avg episode reward: [(0, '31.397')] [2024-09-22 16:06:28,856][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002193_8982528.pth... 
[2024-09-22 16:06:28,981][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001969_8065024.pth [2024-09-22 16:06:33,841][00338] Fps is (10 sec: 4505.0, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 9003008. Throughput: 0: 952.2. Samples: 2251110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:06:33,845][00338] Avg episode reward: [(0, '30.621')] [2024-09-22 16:06:36,287][02365] Updated weights for policy 0, policy_version 2200 (0.0031) [2024-09-22 16:06:38,840][00338] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 9015296. Throughput: 0: 944.1. Samples: 2253344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:06:38,847][00338] Avg episode reward: [(0, '31.297')] [2024-09-22 16:06:43,839][00338] Fps is (10 sec: 3277.3, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 9035776. Throughput: 0: 901.3. Samples: 2257990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:06:43,842][00338] Avg episode reward: [(0, '31.713')] [2024-09-22 16:06:47,439][02365] Updated weights for policy 0, policy_version 2210 (0.0020) [2024-09-22 16:06:48,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 9056256. Throughput: 0: 932.0. Samples: 2264496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:06:48,847][00338] Avg episode reward: [(0, '31.115')] [2024-09-22 16:06:53,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 9072640. Throughput: 0: 955.9. Samples: 2267622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:06:53,844][00338] Avg episode reward: [(0, '31.876')] [2024-09-22 16:06:58,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 9089024. Throughput: 0: 903.8. Samples: 2271624. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:06:58,842][00338] Avg episode reward: [(0, '31.079')] [2024-09-22 16:06:59,598][02365] Updated weights for policy 0, policy_version 2220 (0.0033) [2024-09-22 16:07:03,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 9109504. Throughput: 0: 916.0. Samples: 2277940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:07:03,847][00338] Avg episode reward: [(0, '29.828')] [2024-09-22 16:07:08,841][00338] Fps is (10 sec: 4095.3, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 9129984. Throughput: 0: 942.8. Samples: 2281134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:07:08,844][00338] Avg episode reward: [(0, '28.222')] [2024-09-22 16:07:09,524][02365] Updated weights for policy 0, policy_version 2230 (0.0021) [2024-09-22 16:07:13,846][00338] Fps is (10 sec: 3274.7, 60 sec: 3617.7, 300 sec: 3762.7). Total num frames: 9142272. Throughput: 0: 920.0. Samples: 2285806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:07:13,861][00338] Avg episode reward: [(0, '27.128')] [2024-09-22 16:07:18,839][00338] Fps is (10 sec: 3277.4, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 9162752. Throughput: 0: 886.2. Samples: 2290986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:07:18,847][00338] Avg episode reward: [(0, '25.770')] [2024-09-22 16:07:21,320][02365] Updated weights for policy 0, policy_version 2240 (0.0021) [2024-09-22 16:07:23,839][00338] Fps is (10 sec: 4098.6, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 9183232. Throughput: 0: 907.7. Samples: 2294192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:07:23,842][00338] Avg episode reward: [(0, '26.569')] [2024-09-22 16:07:28,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 9199616. Throughput: 0: 928.7. Samples: 2299780. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:07:28,842][00338] Avg episode reward: [(0, '27.256')] [2024-09-22 16:07:33,670][02365] Updated weights for policy 0, policy_version 2250 (0.0038) [2024-09-22 16:07:33,840][00338] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 9216000. Throughput: 0: 877.8. Samples: 2303996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 16:07:33,848][00338] Avg episode reward: [(0, '26.881')] [2024-09-22 16:07:38,840][00338] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 9236480. Throughput: 0: 879.3. Samples: 2307192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:07:38,848][00338] Avg episode reward: [(0, '28.506')] [2024-09-22 16:07:43,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 9252864. Throughput: 0: 936.6. Samples: 2313772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:07:43,843][00338] Avg episode reward: [(0, '28.727')] [2024-09-22 16:07:43,855][02365] Updated weights for policy 0, policy_version 2260 (0.0020) [2024-09-22 16:07:48,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 9269248. Throughput: 0: 881.6. Samples: 2317612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:07:48,843][00338] Avg episode reward: [(0, '29.467')] [2024-09-22 16:07:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3748.9). Total num frames: 9285632. Throughput: 0: 868.5. Samples: 2320214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:07:53,842][00338] Avg episode reward: [(0, '29.190')] [2024-09-22 16:07:55,712][02365] Updated weights for policy 0, policy_version 2270 (0.0027) [2024-09-22 16:07:58,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 9310208. Throughput: 0: 913.0. Samples: 2326884. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:07:58,847][00338] Avg episode reward: [(0, '28.779')] [2024-09-22 16:08:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 9326592. Throughput: 0: 911.6. Samples: 2332008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:08:03,845][00338] Avg episode reward: [(0, '27.635')] [2024-09-22 16:08:07,591][02365] Updated weights for policy 0, policy_version 2280 (0.0017) [2024-09-22 16:08:08,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3735.0). Total num frames: 9342976. Throughput: 0: 884.4. Samples: 2333992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 16:08:08,847][00338] Avg episode reward: [(0, '28.898')] [2024-09-22 16:08:13,839][00338] Fps is (10 sec: 3686.3, 60 sec: 3686.8, 300 sec: 3748.9). Total num frames: 9363456. Throughput: 0: 904.8. Samples: 2340498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:08:13,842][00338] Avg episode reward: [(0, '28.098')] [2024-09-22 16:08:16,761][02365] Updated weights for policy 0, policy_version 2290 (0.0048) [2024-09-22 16:08:18,840][00338] Fps is (10 sec: 4095.7, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 9383936. Throughput: 0: 946.6. Samples: 2346594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:08:18,846][00338] Avg episode reward: [(0, '28.231')] [2024-09-22 16:08:23,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 9396224. Throughput: 0: 920.4. Samples: 2348608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:08:23,843][00338] Avg episode reward: [(0, '27.925')] [2024-09-22 16:08:28,571][02365] Updated weights for policy 0, policy_version 2300 (0.0030) [2024-09-22 16:08:28,840][00338] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 9420800. Throughput: 0: 899.7. Samples: 2354258. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:08:28,843][00338] Avg episode reward: [(0, '28.819')] [2024-09-22 16:08:28,851][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002300_9420800.pth... [2024-09-22 16:08:28,967][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002083_8531968.pth [2024-09-22 16:08:33,840][00338] Fps is (10 sec: 4505.3, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 9441280. Throughput: 0: 967.0. Samples: 2361128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:08:33,842][00338] Avg episode reward: [(0, '30.432')] [2024-09-22 16:08:38,840][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 9457664. Throughput: 0: 959.9. Samples: 2363410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:08:38,841][00338] Avg episode reward: [(0, '30.146')] [2024-09-22 16:08:40,193][02365] Updated weights for policy 0, policy_version 2310 (0.0043) [2024-09-22 16:08:43,839][00338] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 9474048. Throughput: 0: 913.4. Samples: 2367986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:08:43,843][00338] Avg episode reward: [(0, '29.738')] [2024-09-22 16:08:48,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 9494528. Throughput: 0: 939.6. Samples: 2374290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:08:48,842][00338] Avg episode reward: [(0, '28.896')] [2024-09-22 16:08:50,113][02365] Updated weights for policy 0, policy_version 2320 (0.0018) [2024-09-22 16:08:53,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 9510912. Throughput: 0: 963.8. Samples: 2377364. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:08:53,843][00338] Avg episode reward: [(0, '29.110')] [2024-09-22 16:08:58,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 9527296. Throughput: 0: 906.8. Samples: 2381306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:08:58,842][00338] Avg episode reward: [(0, '28.651')] [2024-09-22 16:09:02,405][02365] Updated weights for policy 0, policy_version 2330 (0.0022) [2024-09-22 16:09:03,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 9547776. Throughput: 0: 906.5. Samples: 2387388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:09:03,842][00338] Avg episode reward: [(0, '28.253')] [2024-09-22 16:09:08,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 9568256. Throughput: 0: 932.6. Samples: 2390574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:09:08,842][00338] Avg episode reward: [(0, '29.159')] [2024-09-22 16:09:13,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 9580544. Throughput: 0: 909.6. Samples: 2395192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:09:13,842][00338] Avg episode reward: [(0, '29.965')] [2024-09-22 16:09:14,846][02365] Updated weights for policy 0, policy_version 2340 (0.0013) [2024-09-22 16:09:18,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3693.3). Total num frames: 9601024. Throughput: 0: 869.6. Samples: 2400258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:09:18,841][00338] Avg episode reward: [(0, '29.912')] [2024-09-22 16:09:23,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 9621504. Throughput: 0: 892.4. Samples: 2403566. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:09:23,846][00338] Avg episode reward: [(0, '29.932')] [2024-09-22 16:09:24,308][02365] Updated weights for policy 0, policy_version 2350 (0.0018) [2024-09-22 16:09:28,839][00338] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 9637888. Throughput: 0: 915.9. Samples: 2409202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:09:28,847][00338] Avg episode reward: [(0, '30.072')] [2024-09-22 16:09:33,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 9654272. Throughput: 0: 876.9. Samples: 2413750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:09:33,841][00338] Avg episode reward: [(0, '29.140')] [2024-09-22 16:09:36,416][02365] Updated weights for policy 0, policy_version 2360 (0.0023) [2024-09-22 16:09:38,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3693.3). Total num frames: 9674752. Throughput: 0: 882.6. Samples: 2417082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:09:38,842][00338] Avg episode reward: [(0, '29.453')] [2024-09-22 16:09:43,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 9695232. Throughput: 0: 942.7. Samples: 2423726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:09:43,846][00338] Avg episode reward: [(0, '29.201')] [2024-09-22 16:09:47,760][02365] Updated weights for policy 0, policy_version 2370 (0.0024) [2024-09-22 16:09:48,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 9707520. Throughput: 0: 896.6. Samples: 2427736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:09:48,847][00338] Avg episode reward: [(0, '28.028')] [2024-09-22 16:09:53,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 9728000. Throughput: 0: 887.6. Samples: 2430514. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:09:53,846][00338] Avg episode reward: [(0, '28.608')] [2024-09-22 16:09:57,547][02365] Updated weights for policy 0, policy_version 2380 (0.0013) [2024-09-22 16:09:58,840][00338] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 9752576. Throughput: 0: 937.1. Samples: 2437360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:09:58,842][00338] Avg episode reward: [(0, '29.160')] [2024-09-22 16:10:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 9768960. Throughput: 0: 937.6. Samples: 2442450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:10:03,846][00338] Avg episode reward: [(0, '29.427')] [2024-09-22 16:10:08,839][00338] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 9785344. Throughput: 0: 910.3. Samples: 2444530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:10:08,845][00338] Avg episode reward: [(0, '29.770')] [2024-09-22 16:10:09,350][02365] Updated weights for policy 0, policy_version 2390 (0.0015) [2024-09-22 16:10:13,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 9805824. Throughput: 0: 934.4. Samples: 2451252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:10:13,847][00338] Avg episode reward: [(0, '29.038')] [2024-09-22 16:10:18,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 9826304. Throughput: 0: 962.8. Samples: 2457078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-22 16:10:18,842][00338] Avg episode reward: [(0, '29.128')] [2024-09-22 16:10:20,068][02365] Updated weights for policy 0, policy_version 2400 (0.0036) [2024-09-22 16:10:23,840][00338] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 9838592. Throughput: 0: 932.2. Samples: 2459032. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-22 16:10:23,846][00338] Avg episode reward: [(0, '29.717')] [2024-09-22 16:10:28,839][00338] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 9863168. Throughput: 0: 908.7. Samples: 2464616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 16:10:28,842][00338] Avg episode reward: [(0, '30.508')] [2024-09-22 16:10:28,855][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002408_9863168.pth... [2024-09-22 16:10:29,010][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002193_8982528.pth [2024-09-22 16:10:30,745][02365] Updated weights for policy 0, policy_version 2410 (0.0034) [2024-09-22 16:10:33,842][00338] Fps is (10 sec: 4504.7, 60 sec: 3822.8, 300 sec: 3679.4). Total num frames: 9883648. Throughput: 0: 963.3. Samples: 2471086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:10:33,844][00338] Avg episode reward: [(0, '30.941')] [2024-09-22 16:10:38,839][00338] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 9895936. Throughput: 0: 946.9. Samples: 2473124. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-22 16:10:38,842][00338] Avg episode reward: [(0, '31.595')] [2024-09-22 16:10:42,852][02365] Updated weights for policy 0, policy_version 2420 (0.0032) [2024-09-22 16:10:43,840][00338] Fps is (10 sec: 2867.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 9912320. Throughput: 0: 898.5. Samples: 2477792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-22 16:10:43,847][00338] Avg episode reward: [(0, '31.141')] [2024-09-22 16:10:48,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 9936896. Throughput: 0: 927.9. Samples: 2484204. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-22 16:10:48,843][00338] Avg episode reward: [(0, '31.017')] [2024-09-22 16:10:53,839][00338] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 9949184. Throughput: 0: 944.1. Samples: 2487016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:10:53,842][00338] Avg episode reward: [(0, '31.196')] [2024-09-22 16:10:53,964][02365] Updated weights for policy 0, policy_version 2430 (0.0034) [2024-09-22 16:10:58,839][00338] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 9965568. Throughput: 0: 881.8. Samples: 2490934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-22 16:10:58,848][00338] Avg episode reward: [(0, '31.272')] [2024-09-22 16:11:03,839][00338] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 9990144. Throughput: 0: 898.2. Samples: 2497498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-22 16:11:03,844][00338] Avg episode reward: [(0, '30.257')] [2024-09-22 16:11:04,697][02365] Updated weights for policy 0, policy_version 2440 (0.0024) [2024-09-22 16:11:07,458][02352] Stopping Batcher_0... [2024-09-22 16:11:07,459][02352] Loop batcher_evt_loop terminating... [2024-09-22 16:11:07,460][00338] Component Batcher_0 stopped! [2024-09-22 16:11:07,465][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-09-22 16:11:07,535][02365] Weights refcount: 2 0 [2024-09-22 16:11:07,537][02365] Stopping InferenceWorker_p0-w0... [2024-09-22 16:11:07,538][02365] Loop inference_proc0-0_evt_loop terminating... [2024-09-22 16:11:07,538][00338] Component InferenceWorker_p0-w0 stopped! [2024-09-22 16:11:07,682][02352] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002300_9420800.pth [2024-09-22 16:11:07,709][02352] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... 
[2024-09-22 16:11:07,952][02352] Stopping LearnerWorker_p0... [2024-09-22 16:11:07,955][02352] Loop learner_proc0_evt_loop terminating... [2024-09-22 16:11:07,954][00338] Component LearnerWorker_p0 stopped! [2024-09-22 16:11:08,198][02373] Stopping RolloutWorker_w7... [2024-09-22 16:11:08,198][02373] Loop rollout_proc7_evt_loop terminating... [2024-09-22 16:11:08,193][00338] Component RolloutWorker_w7 stopped! [2024-09-22 16:11:08,220][02367] Stopping RolloutWorker_w1... [2024-09-22 16:11:08,220][00338] Component RolloutWorker_w1 stopped! [2024-09-22 16:11:08,221][02367] Loop rollout_proc1_evt_loop terminating... [2024-09-22 16:11:08,234][02370] Stopping RolloutWorker_w5... [2024-09-22 16:11:08,243][00338] Component RolloutWorker_w5 stopped! [2024-09-22 16:11:08,252][02369] Stopping RolloutWorker_w3... [2024-09-22 16:11:08,252][00338] Component RolloutWorker_w3 stopped! [2024-09-22 16:11:08,235][02370] Loop rollout_proc5_evt_loop terminating... [2024-09-22 16:11:08,253][02369] Loop rollout_proc3_evt_loop terminating... [2024-09-22 16:11:08,349][00338] Component RolloutWorker_w4 stopped! [2024-09-22 16:11:08,352][02371] Stopping RolloutWorker_w4... [2024-09-22 16:11:08,352][02371] Loop rollout_proc4_evt_loop terminating... [2024-09-22 16:11:08,433][02372] Stopping RolloutWorker_w6... [2024-09-22 16:11:08,434][02372] Loop rollout_proc6_evt_loop terminating... [2024-09-22 16:11:08,433][00338] Component RolloutWorker_w6 stopped! [2024-09-22 16:11:08,445][02368] Stopping RolloutWorker_w2... [2024-09-22 16:11:08,445][00338] Component RolloutWorker_w2 stopped! [2024-09-22 16:11:08,450][02368] Loop rollout_proc2_evt_loop terminating... [2024-09-22 16:11:08,468][00338] Component RolloutWorker_w0 stopped! [2024-09-22 16:11:08,471][00338] Waiting for process learner_proc0 to stop... [2024-09-22 16:11:08,468][02366] Stopping RolloutWorker_w0... [2024-09-22 16:11:08,476][02366] Loop rollout_proc0_evt_loop terminating... 
[2024-09-22 16:11:10,013][00338] Waiting for process inference_proc0-0 to join...
[2024-09-22 16:11:10,276][00338] Waiting for process rollout_proc0 to join...
[2024-09-22 16:11:12,827][00338] Waiting for process rollout_proc1 to join...
[2024-09-22 16:11:12,831][00338] Waiting for process rollout_proc2 to join...
[2024-09-22 16:11:12,834][00338] Waiting for process rollout_proc3 to join...
[2024-09-22 16:11:12,839][00338] Waiting for process rollout_proc4 to join...
[2024-09-22 16:11:12,843][00338] Waiting for process rollout_proc5 to join...
[2024-09-22 16:11:12,847][00338] Waiting for process rollout_proc6 to join...
[2024-09-22 16:11:12,851][00338] Waiting for process rollout_proc7 to join...
[2024-09-22 16:11:12,855][00338] Batcher 0 profile tree view:
batching: 66.6346, releasing_batches: 0.0745
[2024-09-22 16:11:12,857][00338] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 1016.4153
update_model: 22.9892
weight_update: 0.0029
one_step: 0.0116
handle_policy_step: 1505.1943
deserialize: 36.7217, stack: 7.8323, obs_to_device_normalize: 306.3891, forward: 803.5875, send_messages: 72.8581
prepare_outputs: 204.0244
to_cpu: 117.1595
[2024-09-22 16:11:12,859][00338] Learner 0 profile tree view:
misc: 0.0127, prepare_batch: 29.6141
train: 178.8906
epoch_init: 0.0190, minibatch_init: 0.0305, losses_postprocess: 1.6527, kl_divergence: 1.5693, after_optimizer: 85.3035
calculate_losses: 61.8876
losses_init: 0.0208, forward_head: 2.7074, bptt_initial: 41.4459, tail: 2.5972, advantages_returns: 0.7053, losses: 8.8913
bptt: 4.7191
bptt_forward_core: 4.4880
update: 26.7608
clip: 2.0785
[2024-09-22 16:11:12,861][00338] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.8633, enqueue_policy_requests: 245.6178, env_step: 2076.8457, overhead: 34.0852, complete_rollouts: 17.8207
save_policy_outputs: 53.1403
split_output_tensors: 21.0932
[2024-09-22 16:11:12,862][00338] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.8021, enqueue_policy_requests: 249.0436, env_step: 2074.5814, overhead: 34.2225, complete_rollouts: 17.7478
save_policy_outputs: 52.5804
split_output_tensors: 20.9781
[2024-09-22 16:11:12,864][00338] Loop Runner_EvtLoop terminating...
[2024-09-22 16:11:12,865][00338] Runner profile tree view:
main_loop: 2677.2555
[2024-09-22 16:11:12,866][00338] Collected {0: 10006528}, FPS: 3737.6
[2024-09-22 16:11:21,064][00338] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-22 16:11:21,066][00338] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-22 16:11:21,069][00338] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-22 16:11:21,071][00338] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-22 16:11:21,073][00338] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 16:11:21,075][00338] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-22 16:11:21,076][00338] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 16:11:21,078][00338] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-22 16:11:21,080][00338] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-22 16:11:21,081][00338] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-22 16:11:21,082][00338] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-22 16:11:21,083][00338] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-22 16:11:21,085][00338] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-22 16:11:21,086][00338] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-22 16:11:21,087][00338] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-22 16:11:21,121][00338] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-22 16:11:21,124][00338] RunningMeanStd input shape: (3, 72, 128) [2024-09-22 16:11:21,127][00338] RunningMeanStd input shape: (1,) [2024-09-22 16:11:21,143][00338] ConvEncoder: input_channels=3 [2024-09-22 16:11:21,254][00338] Conv encoder output size: 512 [2024-09-22 16:11:21,256][00338] Policy head output size: 512 [2024-09-22 16:11:21,447][00338] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-09-22 16:11:22,599][00338] Num frames 100... [2024-09-22 16:11:22,770][00338] Num frames 200... [2024-09-22 16:11:22,952][00338] Num frames 300... [2024-09-22 16:11:23,124][00338] Num frames 400... [2024-09-22 16:11:23,298][00338] Num frames 500... [2024-09-22 16:11:23,485][00338] Num frames 600... [2024-09-22 16:11:23,678][00338] Num frames 700... [2024-09-22 16:11:23,855][00338] Num frames 800... [2024-09-22 16:11:24,029][00338] Num frames 900... [2024-09-22 16:11:24,156][00338] Num frames 1000... [2024-09-22 16:11:24,284][00338] Num frames 1100... [2024-09-22 16:11:24,409][00338] Num frames 1200... [2024-09-22 16:11:24,543][00338] Num frames 1300... [2024-09-22 16:11:24,694][00338] Avg episode rewards: #0: 35.760, true rewards: #0: 13.760 [2024-09-22 16:11:24,696][00338] Avg episode reward: 35.760, avg true_objective: 13.760 [2024-09-22 16:11:24,731][00338] Num frames 1400... [2024-09-22 16:11:24,852][00338] Num frames 1500... [2024-09-22 16:11:24,976][00338] Num frames 1600... [2024-09-22 16:11:25,097][00338] Num frames 1700... [2024-09-22 16:11:25,220][00338] Num frames 1800... [2024-09-22 16:11:25,345][00338] Num frames 1900... [2024-09-22 16:11:25,468][00338] Num frames 2000... [2024-09-22 16:11:25,610][00338] Num frames 2100... [2024-09-22 16:11:25,733][00338] Num frames 2200... 
[2024-09-22 16:11:25,852][00338] Num frames 2300... [2024-09-22 16:11:25,976][00338] Num frames 2400... [2024-09-22 16:11:26,102][00338] Num frames 2500... [2024-09-22 16:11:26,230][00338] Num frames 2600... [2024-09-22 16:11:26,356][00338] Num frames 2700... [2024-09-22 16:11:26,483][00338] Num frames 2800... [2024-09-22 16:11:26,634][00338] Num frames 2900... [2024-09-22 16:11:26,761][00338] Num frames 3000... [2024-09-22 16:11:26,886][00338] Num frames 3100... [2024-09-22 16:11:27,009][00338] Num frames 3200... [2024-09-22 16:11:27,135][00338] Num frames 3300... [2024-09-22 16:11:27,266][00338] Num frames 3400... [2024-09-22 16:11:27,418][00338] Avg episode rewards: #0: 47.879, true rewards: #0: 17.380 [2024-09-22 16:11:27,419][00338] Avg episode reward: 47.879, avg true_objective: 17.380 [2024-09-22 16:11:27,453][00338] Num frames 3500... [2024-09-22 16:11:27,591][00338] Num frames 3600... [2024-09-22 16:11:27,713][00338] Num frames 3700... [2024-09-22 16:11:27,831][00338] Num frames 3800... [2024-09-22 16:11:27,955][00338] Num frames 3900... [2024-09-22 16:11:28,039][00338] Avg episode rewards: #0: 34.413, true rewards: #0: 13.080 [2024-09-22 16:11:28,041][00338] Avg episode reward: 34.413, avg true_objective: 13.080 [2024-09-22 16:11:28,132][00338] Num frames 4000... [2024-09-22 16:11:28,258][00338] Num frames 4100... [2024-09-22 16:11:28,379][00338] Num frames 4200... [2024-09-22 16:11:28,510][00338] Num frames 4300... [2024-09-22 16:11:28,652][00338] Num frames 4400... [2024-09-22 16:11:28,776][00338] Num frames 4500... [2024-09-22 16:11:28,901][00338] Num frames 4600... [2024-09-22 16:11:29,024][00338] Num frames 4700... [2024-09-22 16:11:29,109][00338] Avg episode rewards: #0: 30.060, true rewards: #0: 11.810 [2024-09-22 16:11:29,111][00338] Avg episode reward: 30.060, avg true_objective: 11.810 [2024-09-22 16:11:29,208][00338] Num frames 4800... [2024-09-22 16:11:29,332][00338] Num frames 4900... [2024-09-22 16:11:29,456][00338] Num frames 5000... 
[2024-09-22 16:11:29,589][00338] Num frames 5100... [2024-09-22 16:11:29,721][00338] Num frames 5200... [2024-09-22 16:11:29,846][00338] Num frames 5300... [2024-09-22 16:11:29,966][00338] Num frames 5400... [2024-09-22 16:11:30,066][00338] Avg episode rewards: #0: 26.672, true rewards: #0: 10.872 [2024-09-22 16:11:30,067][00338] Avg episode reward: 26.672, avg true_objective: 10.872 [2024-09-22 16:11:30,150][00338] Num frames 5500... [2024-09-22 16:11:30,273][00338] Num frames 5600... [2024-09-22 16:11:30,397][00338] Num frames 5700... [2024-09-22 16:11:30,524][00338] Num frames 5800... [2024-09-22 16:11:30,648][00338] Num frames 5900... [2024-09-22 16:11:30,784][00338] Num frames 6000... [2024-09-22 16:11:30,906][00338] Num frames 6100... [2024-09-22 16:11:31,032][00338] Num frames 6200... [2024-09-22 16:11:31,155][00338] Num frames 6300... [2024-09-22 16:11:31,282][00338] Num frames 6400... [2024-09-22 16:11:31,403][00338] Num frames 6500... [2024-09-22 16:11:31,573][00338] Avg episode rewards: #0: 27.147, true rewards: #0: 10.980 [2024-09-22 16:11:31,576][00338] Avg episode reward: 27.147, avg true_objective: 10.980 [2024-09-22 16:11:31,594][00338] Num frames 6600... [2024-09-22 16:11:31,734][00338] Num frames 6700... [2024-09-22 16:11:31,853][00338] Num frames 6800... [2024-09-22 16:11:31,973][00338] Num frames 6900... [2024-09-22 16:11:32,095][00338] Num frames 7000... [2024-09-22 16:11:32,214][00338] Num frames 7100... [2024-09-22 16:11:32,339][00338] Num frames 7200... [2024-09-22 16:11:32,460][00338] Num frames 7300... [2024-09-22 16:11:32,596][00338] Num frames 7400... [2024-09-22 16:11:32,724][00338] Num frames 7500... [2024-09-22 16:11:32,851][00338] Num frames 7600... [2024-09-22 16:11:32,976][00338] Num frames 7700... [2024-09-22 16:11:33,119][00338] Avg episode rewards: #0: 27.531, true rewards: #0: 11.103 [2024-09-22 16:11:33,121][00338] Avg episode reward: 27.531, avg true_objective: 11.103 [2024-09-22 16:11:33,159][00338] Num frames 7800... 
[2024-09-22 16:11:33,285][00338] Num frames 7900... [2024-09-22 16:11:33,407][00338] Num frames 8000... [2024-09-22 16:11:33,539][00338] Num frames 8100... [2024-09-22 16:11:33,661][00338] Num frames 8200... [2024-09-22 16:11:33,791][00338] Num frames 8300... [2024-09-22 16:11:33,917][00338] Num frames 8400... [2024-09-22 16:11:34,056][00338] Num frames 8500... [2024-09-22 16:11:34,229][00338] Num frames 8600... [2024-09-22 16:11:34,395][00338] Num frames 8700... [2024-09-22 16:11:34,578][00338] Num frames 8800... [2024-09-22 16:11:34,752][00338] Num frames 8900... [2024-09-22 16:11:34,929][00338] Num frames 9000... [2024-09-22 16:11:35,096][00338] Num frames 9100... [2024-09-22 16:11:35,266][00338] Num frames 9200... [2024-09-22 16:11:35,444][00338] Num frames 9300... [2024-09-22 16:11:35,617][00338] Num frames 9400... [2024-09-22 16:11:35,798][00338] Num frames 9500... [2024-09-22 16:11:35,983][00338] Num frames 9600... [2024-09-22 16:11:36,161][00338] Num frames 9700... [2024-09-22 16:11:36,350][00338] Num frames 9800... [2024-09-22 16:11:36,527][00338] Avg episode rewards: #0: 31.590, true rewards: #0: 12.340 [2024-09-22 16:11:36,529][00338] Avg episode reward: 31.590, avg true_objective: 12.340 [2024-09-22 16:11:36,571][00338] Num frames 9900... [2024-09-22 16:11:36,693][00338] Num frames 10000... [2024-09-22 16:11:36,813][00338] Num frames 10100... [2024-09-22 16:11:36,941][00338] Num frames 10200... [2024-09-22 16:11:37,060][00338] Num frames 10300... [2024-09-22 16:11:37,179][00338] Num frames 10400... [2024-09-22 16:11:37,301][00338] Num frames 10500... [2024-09-22 16:11:37,424][00338] Num frames 10600... [2024-09-22 16:11:37,532][00338] Avg episode rewards: #0: 29.489, true rewards: #0: 11.822 [2024-09-22 16:11:37,533][00338] Avg episode reward: 29.489, avg true_objective: 11.822 [2024-09-22 16:11:37,608][00338] Num frames 10700... [2024-09-22 16:11:37,728][00338] Num frames 10800... [2024-09-22 16:11:37,851][00338] Num frames 10900... 
[2024-09-22 16:11:37,980][00338] Num frames 11000... [2024-09-22 16:11:38,100][00338] Num frames 11100... [2024-09-22 16:11:38,223][00338] Num frames 11200... [2024-09-22 16:11:38,344][00338] Num frames 11300... [2024-09-22 16:11:38,467][00338] Num frames 11400... [2024-09-22 16:11:38,600][00338] Num frames 11500... [2024-09-22 16:11:38,727][00338] Num frames 11600... [2024-09-22 16:11:38,851][00338] Num frames 11700... [2024-09-22 16:11:38,982][00338] Num frames 11800... [2024-09-22 16:11:39,104][00338] Num frames 11900... [2024-09-22 16:11:39,230][00338] Num frames 12000... [2024-09-22 16:11:39,359][00338] Num frames 12100... [2024-09-22 16:11:39,483][00338] Num frames 12200... [2024-09-22 16:11:39,611][00338] Num frames 12300... [2024-09-22 16:11:39,739][00338] Num frames 12400... [2024-09-22 16:11:39,864][00338] Num frames 12500... [2024-09-22 16:11:39,993][00338] Num frames 12600... [2024-09-22 16:11:40,122][00338] Avg episode rewards: #0: 31.858, true rewards: #0: 12.658 [2024-09-22 16:11:40,124][00338] Avg episode reward: 31.858, avg true_objective: 12.658 [2024-09-22 16:12:57,362][00338] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-22 16:14:37,852][00338] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-22 16:14:37,854][00338] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-22 16:14:37,856][00338] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-22 16:14:37,858][00338] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-22 16:14:37,860][00338] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-22 16:14:37,862][00338] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-22 16:14:37,864][00338] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2024-09-22 16:14:37,865][00338] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-22 16:14:37,866][00338] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-22 16:14:37,867][00338] Adding new argument 'hf_repository'='kalmi901/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-22 16:14:37,868][00338] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-22 16:14:37,869][00338] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-22 16:14:37,870][00338] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-22 16:14:37,871][00338] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-22 16:14:37,872][00338] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-22 16:14:37,900][00338] RunningMeanStd input shape: (3, 72, 128) [2024-09-22 16:14:37,902][00338] RunningMeanStd input shape: (1,) [2024-09-22 16:14:37,914][00338] ConvEncoder: input_channels=3 [2024-09-22 16:14:37,952][00338] Conv encoder output size: 512 [2024-09-22 16:14:37,954][00338] Policy head output size: 512 [2024-09-22 16:14:37,973][00338] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-09-22 16:14:38,403][00338] Num frames 100... [2024-09-22 16:14:38,537][00338] Num frames 200... [2024-09-22 16:14:38,681][00338] Num frames 300... [2024-09-22 16:14:38,803][00338] Num frames 400... [2024-09-22 16:14:38,924][00338] Num frames 500... [2024-09-22 16:14:39,049][00338] Num frames 600... [2024-09-22 16:14:39,173][00338] Num frames 700... [2024-09-22 16:14:39,294][00338] Num frames 800... [2024-09-22 16:14:39,418][00338] Num frames 900... [2024-09-22 16:14:39,546][00338] Num frames 1000... [2024-09-22 16:14:39,677][00338] Num frames 1100... 
[2024-09-22 16:14:39,801][00338] Num frames 1200... [2024-09-22 16:14:39,952][00338] Avg episode rewards: #0: 30.800, true rewards: #0: 12.800 [2024-09-22 16:14:39,954][00338] Avg episode reward: 30.800, avg true_objective: 12.800 [2024-09-22 16:14:39,983][00338] Num frames 1300... [2024-09-22 16:14:40,114][00338] Num frames 1400... [2024-09-22 16:14:40,240][00338] Num frames 1500... [2024-09-22 16:14:40,365][00338] Num frames 1600... [2024-09-22 16:14:40,486][00338] Num frames 1700... [2024-09-22 16:14:40,620][00338] Num frames 1800... [2024-09-22 16:14:40,761][00338] Num frames 1900... [2024-09-22 16:14:40,884][00338] Num frames 2000... [2024-09-22 16:14:41,010][00338] Num frames 2100... [2024-09-22 16:14:41,135][00338] Num frames 2200... [2024-09-22 16:14:41,260][00338] Num frames 2300... [2024-09-22 16:14:41,382][00338] Num frames 2400... [2024-09-22 16:14:41,490][00338] Avg episode rewards: #0: 31.715, true rewards: #0: 12.215 [2024-09-22 16:14:41,493][00338] Avg episode reward: 31.715, avg true_objective: 12.215 [2024-09-22 16:14:41,575][00338] Num frames 2500... [2024-09-22 16:14:41,706][00338] Num frames 2600... [2024-09-22 16:14:41,829][00338] Num frames 2700... [2024-09-22 16:14:41,952][00338] Num frames 2800... [2024-09-22 16:14:42,077][00338] Num frames 2900... [2024-09-22 16:14:42,203][00338] Num frames 3000... [2024-09-22 16:14:42,360][00338] Avg episode rewards: #0: 25.613, true rewards: #0: 10.280 [2024-09-22 16:14:42,361][00338] Avg episode reward: 25.613, avg true_objective: 10.280 [2024-09-22 16:14:42,385][00338] Num frames 3100... [2024-09-22 16:14:42,514][00338] Num frames 3200... [2024-09-22 16:14:42,637][00338] Num frames 3300... [2024-09-22 16:14:42,771][00338] Num frames 3400... [2024-09-22 16:14:42,894][00338] Num frames 3500... [2024-09-22 16:14:43,021][00338] Num frames 3600... [2024-09-22 16:14:43,145][00338] Num frames 3700... [2024-09-22 16:14:43,269][00338] Num frames 3800... [2024-09-22 16:14:43,392][00338] Num frames 3900... 
[2024-09-22 16:14:43,521][00338] Num frames 4000... [2024-09-22 16:14:43,647][00338] Num frames 4100... [2024-09-22 16:14:43,779][00338] Num frames 4200... [2024-09-22 16:14:43,902][00338] Num frames 4300... [2024-09-22 16:14:44,025][00338] Num frames 4400... [2024-09-22 16:14:44,115][00338] Avg episode rewards: #0: 27.070, true rewards: #0: 11.070 [2024-09-22 16:14:44,117][00338] Avg episode reward: 27.070, avg true_objective: 11.070 [2024-09-22 16:14:44,211][00338] Num frames 4500... [2024-09-22 16:14:44,336][00338] Num frames 4600... [2024-09-22 16:14:44,462][00338] Num frames 4700... [2024-09-22 16:14:44,595][00338] Num frames 4800... [2024-09-22 16:14:44,717][00338] Num frames 4900... [2024-09-22 16:14:44,849][00338] Num frames 5000... [2024-09-22 16:14:44,974][00338] Num frames 5100... [2024-09-22 16:14:45,079][00338] Avg episode rewards: #0: 24.680, true rewards: #0: 10.280 [2024-09-22 16:14:45,081][00338] Avg episode reward: 24.680, avg true_objective: 10.280 [2024-09-22 16:14:45,159][00338] Num frames 5200... [2024-09-22 16:14:45,281][00338] Num frames 5300... [2024-09-22 16:14:45,402][00338] Num frames 5400... [2024-09-22 16:14:45,493][00338] Avg episode rewards: #0: 21.213, true rewards: #0: 9.047 [2024-09-22 16:14:45,494][00338] Avg episode reward: 21.213, avg true_objective: 9.047 [2024-09-22 16:14:45,590][00338] Num frames 5500... [2024-09-22 16:14:45,716][00338] Num frames 5600... [2024-09-22 16:14:45,848][00338] Num frames 5700... [2024-09-22 16:14:45,966][00338] Num frames 5800... [2024-09-22 16:14:46,084][00338] Num frames 5900... [2024-09-22 16:14:46,206][00338] Num frames 6000... [2024-09-22 16:14:46,323][00338] Num frames 6100... [2024-09-22 16:14:46,450][00338] Num frames 6200... [2024-09-22 16:14:46,580][00338] Num frames 6300... [2024-09-22 16:14:46,701][00338] Num frames 6400... [2024-09-22 16:14:46,850][00338] Num frames 6500... [2024-09-22 16:14:46,989][00338] Num frames 6600... 
[2024-09-22 16:14:47,060][00338] Avg episode rewards: #0: 22.017, true rewards: #0: 9.446 [2024-09-22 16:14:47,064][00338] Avg episode reward: 22.017, avg true_objective: 9.446 [2024-09-22 16:14:47,217][00338] Num frames 6700... [2024-09-22 16:14:47,384][00338] Num frames 6800... [2024-09-22 16:14:47,566][00338] Num frames 6900... [2024-09-22 16:14:47,738][00338] Num frames 7000... [2024-09-22 16:14:47,902][00338] Num frames 7100... [2024-09-22 16:14:48,078][00338] Num frames 7200... [2024-09-22 16:14:48,248][00338] Num frames 7300... [2024-09-22 16:14:48,429][00338] Num frames 7400... [2024-09-22 16:14:48,609][00338] Num frames 7500... [2024-09-22 16:14:48,781][00338] Num frames 7600... [2024-09-22 16:14:48,955][00338] Num frames 7700... [2024-09-22 16:14:49,137][00338] Num frames 7800... [2024-09-22 16:14:49,314][00338] Num frames 7900... [2024-09-22 16:14:49,492][00338] Num frames 8000... [2024-09-22 16:14:49,656][00338] Num frames 8100... [2024-09-22 16:14:49,779][00338] Num frames 8200... [2024-09-22 16:14:49,899][00338] Num frames 8300... [2024-09-22 16:14:50,018][00338] Num frames 8400... [2024-09-22 16:14:50,142][00338] Num frames 8500... [2024-09-22 16:14:50,276][00338] Num frames 8600... [2024-09-22 16:14:50,401][00338] Num frames 8700... [2024-09-22 16:14:50,472][00338] Avg episode rewards: #0: 26.515, true rewards: #0: 10.890 [2024-09-22 16:14:50,474][00338] Avg episode reward: 26.515, avg true_objective: 10.890 [2024-09-22 16:14:50,594][00338] Num frames 8800... [2024-09-22 16:14:50,716][00338] Num frames 8900... [2024-09-22 16:14:50,842][00338] Num frames 9000... [2024-09-22 16:14:50,964][00338] Num frames 9100... [2024-09-22 16:14:51,090][00338] Num frames 9200... [2024-09-22 16:14:51,217][00338] Num frames 9300... [2024-09-22 16:14:51,350][00338] Num frames 9400... [2024-09-22 16:14:51,474][00338] Num frames 9500... [2024-09-22 16:14:51,606][00338] Num frames 9600... 
[2024-09-22 16:14:51,708][00338] Avg episode rewards: #0: 25.487, true rewards: #0: 10.709 [2024-09-22 16:14:51,711][00338] Avg episode reward: 25.487, avg true_objective: 10.709 [2024-09-22 16:14:51,789][00338] Num frames 9700... [2024-09-22 16:14:51,915][00338] Num frames 9800... [2024-09-22 16:14:52,039][00338] Num frames 9900... [2024-09-22 16:14:52,164][00338] Num frames 10000... [2024-09-22 16:14:52,295][00338] Num frames 10100... [2024-09-22 16:14:52,419][00338] Num frames 10200... [2024-09-22 16:14:52,552][00338] Num frames 10300... [2024-09-22 16:14:52,676][00338] Num frames 10400... [2024-09-22 16:14:52,798][00338] Num frames 10500... [2024-09-22 16:14:52,919][00338] Num frames 10600... [2024-09-22 16:14:53,038][00338] Num frames 10700... [2024-09-22 16:14:53,164][00338] Num frames 10800... [2024-09-22 16:14:53,288][00338] Num frames 10900... [2024-09-22 16:14:53,377][00338] Avg episode rewards: #0: 25.923, true rewards: #0: 10.923 [2024-09-22 16:14:53,379][00338] Avg episode reward: 25.923, avg true_objective: 10.923 [2024-09-22 16:15:59,559][00338] Replay video saved to /content/train_dir/default_experiment/replay.mp4!