[2023-02-27 11:28:58,437][00107] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-27 11:28:58,442][00107] Rollout worker 0 uses device cpu
[2023-02-27 11:28:58,443][00107] Rollout worker 1 uses device cpu
[2023-02-27 11:28:58,445][00107] Rollout worker 2 uses device cpu
[2023-02-27 11:28:58,446][00107] Rollout worker 3 uses device cpu
[2023-02-27 11:28:58,448][00107] Rollout worker 4 uses device cpu
[2023-02-27 11:28:58,449][00107] Rollout worker 5 uses device cpu
[2023-02-27 11:28:58,450][00107] Rollout worker 6 uses device cpu
[2023-02-27 11:28:58,451][00107] Rollout worker 7 uses device cpu
[2023-02-27 11:28:58,642][00107] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 11:28:58,644][00107] InferenceWorker_p0-w0: min num requests: 2
[2023-02-27 11:28:58,675][00107] Starting all processes...
[2023-02-27 11:28:58,676][00107] Starting process learner_proc0
[2023-02-27 11:28:58,727][00107] Starting all processes...
[2023-02-27 11:28:58,738][00107] Starting process inference_proc0-0
[2023-02-27 11:28:58,738][00107] Starting process rollout_proc0
[2023-02-27 11:28:58,743][00107] Starting process rollout_proc1
[2023-02-27 11:28:58,743][00107] Starting process rollout_proc2
[2023-02-27 11:28:58,743][00107] Starting process rollout_proc3
[2023-02-27 11:28:58,743][00107] Starting process rollout_proc4
[2023-02-27 11:28:58,743][00107] Starting process rollout_proc5
[2023-02-27 11:28:58,743][00107] Starting process rollout_proc6
[2023-02-27 11:28:58,743][00107] Starting process rollout_proc7
[2023-02-27 11:29:08,913][20173] Worker 1 uses CPU cores [1]
[2023-02-27 11:29:09,326][20174] Worker 2 uses CPU cores [0]
[2023-02-27 11:29:09,663][20157] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 11:29:09,668][20157] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-27 11:29:09,741][20171] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 11:29:09,742][20171] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-27 11:29:09,813][20178] Worker 5 uses CPU cores [1]
[2023-02-27 11:29:09,823][20176] Worker 6 uses CPU cores [0]
[2023-02-27 11:29:09,841][20175] Worker 3 uses CPU cores [1]
[2023-02-27 11:29:09,883][20172] Worker 0 uses CPU cores [0]
[2023-02-27 11:29:09,899][20177] Worker 4 uses CPU cores [0]
[2023-02-27 11:29:09,928][20179] Worker 7 uses CPU cores [1]
[2023-02-27 11:29:10,366][20157] Num visible devices: 1
[2023-02-27 11:29:10,367][20171] Num visible devices: 1
[2023-02-27 11:29:10,378][20157] Starting seed is not provided
[2023-02-27 11:29:10,378][20157] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 11:29:10,379][20157] Initializing actor-critic model on device cuda:0
[2023-02-27 11:29:10,380][20157] RunningMeanStd input shape: (3, 72, 128)
[2023-02-27 11:29:10,382][20157] RunningMeanStd input shape: (1,)
[2023-02-27 11:29:10,394][20157] ConvEncoder: input_channels=3
[2023-02-27 11:29:10,688][20157] Conv encoder output size: 512
[2023-02-27 11:29:10,689][20157] Policy head output size: 512
[2023-02-27 11:29:10,741][20157] Created Actor Critic model with architecture:
[2023-02-27 11:29:10,741][20157] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-27 11:29:18,021][20157] Using optimizer
[2023-02-27 11:29:18,022][20157] No checkpoints found
[2023-02-27 11:29:18,022][20157] Did not load from checkpoint, starting from scratch!
[2023-02-27 11:29:18,022][20157] Initialized policy 0 weights for model version 0
[2023-02-27 11:29:18,026][20157] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 11:29:18,034][20157] LearnerWorker_p0 finished initialization!
[2023-02-27 11:29:18,221][20171] RunningMeanStd input shape: (3, 72, 128)
[2023-02-27 11:29:18,222][20171] RunningMeanStd input shape: (1,)
[2023-02-27 11:29:18,234][20171] ConvEncoder: input_channels=3
[2023-02-27 11:29:18,331][20171] Conv encoder output size: 512
[2023-02-27 11:29:18,331][20171] Policy head output size: 512
[2023-02-27 11:29:18,634][00107] Heartbeat connected on Batcher_0
[2023-02-27 11:29:18,641][00107] Heartbeat connected on LearnerWorker_p0
[2023-02-27 11:29:18,652][00107] Heartbeat connected on RolloutWorker_w0
[2023-02-27 11:29:18,657][00107] Heartbeat connected on RolloutWorker_w1
[2023-02-27 11:29:18,659][00107] Heartbeat connected on RolloutWorker_w2
[2023-02-27 11:29:18,663][00107] Heartbeat connected on RolloutWorker_w3
[2023-02-27 11:29:18,668][00107] Heartbeat connected on RolloutWorker_w4
[2023-02-27 11:29:18,673][00107] Heartbeat connected on RolloutWorker_w5
[2023-02-27 11:29:18,677][00107] Heartbeat connected on RolloutWorker_w6
[2023-02-27 11:29:18,681][00107] Heartbeat connected on RolloutWorker_w7
[2023-02-27 11:29:19,228][00107] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-27 11:29:20,599][00107] Inference worker 0-0 is ready!
[2023-02-27 11:29:20,601][00107] All inference workers are ready! Signal rollout workers to start!
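The module tree printed above is easier to follow as code. Below is a minimal PyTorch sketch of the same shared-weights actor-critic; the kernel sizes, strides, and the 128*3*6 flattened size are assumptions chosen so that a (3, 72, 128) observation yields the logged 512-dim encoder output, since the log itself only names the layer types and the 512/1/5 head sizes.

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Rough reconstruction of the printed ActorCriticSharedWeights tree."""

    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        # conv_head: three Conv2d+ELU stages, as in the printed ConvEncoderImpl
        # (filter shapes are assumed, not shown in the log)
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # mlp_layers: Linear + ELU projecting flattened features to 512
        self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, hidden), nn.ELU())
        # core: single-layer GRU(512, 512), as printed
        self.core = nn.GRU(hidden, hidden)
        # heads: value and action logits, matching out_features=1 and 5
        self.critic_linear = nn.Linear(hidden, 1)
        self.distribution_linear = nn.Linear(hidden, num_actions)

    def forward(self, obs: torch.Tensor, rnn_state: torch.Tensor):
        x = self.conv_head(obs)            # (B, 128, 3, 6) for a 72x128 input
        x = self.mlp_layers(x.flatten(1))  # (B, 512) -- "Conv encoder output size: 512"
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

# smoke test: batch of 4 observations, fresh recurrent state
model = ActorCriticSketch()
logits, value, h = model(torch.zeros(4, 3, 72, 128), torch.zeros(1, 4, 512))
```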
[2023-02-27 11:29:20,606][00107] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-27 11:29:20,726][20178] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 11:29:20,742][20173] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 11:29:20,755][20172] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 11:29:20,764][20179] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 11:29:20,770][20176] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 11:29:20,767][20174] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 11:29:20,776][20177] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 11:29:20,775][20175] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 11:29:21,292][20176] Decorrelating experience for 0 frames...
[2023-02-27 11:29:21,631][20176] Decorrelating experience for 32 frames...
[2023-02-27 11:29:22,057][20176] Decorrelating experience for 64 frames...
[2023-02-27 11:29:22,175][20179] Decorrelating experience for 0 frames...
[2023-02-27 11:29:22,179][20178] Decorrelating experience for 0 frames...
[2023-02-27 11:29:22,180][20173] Decorrelating experience for 0 frames...
[2023-02-27 11:29:22,185][20175] Decorrelating experience for 0 frames...
[2023-02-27 11:29:22,954][20176] Decorrelating experience for 96 frames...
[2023-02-27 11:29:23,190][20173] Decorrelating experience for 32 frames...
[2023-02-27 11:29:23,192][20179] Decorrelating experience for 32 frames...
[2023-02-27 11:29:23,201][20178] Decorrelating experience for 32 frames...
[2023-02-27 11:29:23,266][20172] Decorrelating experience for 0 frames...
[2023-02-27 11:29:23,286][20177] Decorrelating experience for 0 frames...
[2023-02-27 11:29:24,228][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-27 11:29:24,321][20172] Decorrelating experience for 32 frames...
[2023-02-27 11:29:24,387][20177] Decorrelating experience for 32 frames...
[2023-02-27 11:29:24,429][20175] Decorrelating experience for 32 frames...
[2023-02-27 11:29:24,675][20173] Decorrelating experience for 64 frames...
[2023-02-27 11:29:24,686][20178] Decorrelating experience for 64 frames...
[2023-02-27 11:29:25,263][20179] Decorrelating experience for 64 frames...
[2023-02-27 11:29:25,658][20174] Decorrelating experience for 0 frames...
[2023-02-27 11:29:25,760][20172] Decorrelating experience for 64 frames...
[2023-02-27 11:29:25,871][20177] Decorrelating experience for 64 frames...
[2023-02-27 11:29:26,522][20174] Decorrelating experience for 32 frames...
[2023-02-27 11:29:26,652][20172] Decorrelating experience for 96 frames...
[2023-02-27 11:29:27,093][20174] Decorrelating experience for 64 frames...
[2023-02-27 11:29:27,201][20173] Decorrelating experience for 96 frames...
[2023-02-27 11:29:27,325][20179] Decorrelating experience for 96 frames...
[2023-02-27 11:29:28,105][20178] Decorrelating experience for 96 frames...
[2023-02-27 11:29:28,678][20175] Decorrelating experience for 64 frames...
[2023-02-27 11:29:28,834][20177] Decorrelating experience for 96 frames...
[2023-02-27 11:29:29,228][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1.6. Samples: 16. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-27 11:29:29,231][00107] Avg episode reward: [(0, '1.216')]
[2023-02-27 11:29:29,601][20174] Decorrelating experience for 96 frames...
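The "Decorrelating experience" messages show each rollout worker warming up in 32-frame chunks (0, 32, 64, 96) before real collection begins, so episodes on different workers end up out of phase and a training batch is not dominated by same-age, correlated frames. A hedged sketch of what such a warm-up could look like; the random-action policy and the chunk loop are assumptions, since the log only shows the frame counts:

```python
def decorrelate_experience(env, max_decorrelate: int = 96, chunk: int = 32):
    """Hypothetical warm-up loop matching the messages above: step the
    environment with random actions in chunks of `chunk` frames before
    handing it over to the real policy for collection."""
    obs = env.reset()
    for frames_done in range(0, max_decorrelate + 1, chunk):
        print(f"Decorrelating experience for {frames_done} frames...")
        for _ in range(chunk):
            obs, reward, done, info = env.step(env.action_space.sample())
            if done:
                obs = env.reset()  # restart mid-warm-up, further desyncing workers
    return obs
```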
[2023-02-27 11:29:32,953][20157] Signal inference workers to stop experience collection...
[2023-02-27 11:29:32,980][20171] InferenceWorker_p0-w0: stopping experience collection
[2023-02-27 11:29:32,995][20175] Decorrelating experience for 96 frames...
[2023-02-27 11:29:34,228][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 161.2. Samples: 2418. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-27 11:29:34,231][00107] Avg episode reward: [(0, '2.637')]
[2023-02-27 11:29:35,455][20157] Signal inference workers to resume experience collection...
[2023-02-27 11:29:35,457][20171] InferenceWorker_p0-w0: resuming experience collection
[2023-02-27 11:29:39,228][00107] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 266.6. Samples: 5332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 11:29:39,230][00107] Avg episode reward: [(0, '3.396')]
[2023-02-27 11:29:43,148][20171] Updated weights for policy 0, policy_version 10 (0.0358)
[2023-02-27 11:29:44,228][00107] Fps is (10 sec: 4505.6, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 351.3. Samples: 8782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:29:44,230][00107] Avg episode reward: [(0, '4.097')]
[2023-02-27 11:29:49,228][00107] Fps is (10 sec: 4095.9, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 506.5. Samples: 15196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:29:49,231][00107] Avg episode reward: [(0, '4.517')]
[2023-02-27 11:29:54,229][00107] Fps is (10 sec: 3276.6, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 567.6. Samples: 19866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:29:54,234][00107] Avg episode reward: [(0, '4.523')]
[2023-02-27 11:29:55,100][20171] Updated weights for policy 0, policy_version 20 (0.0037)
[2023-02-27 11:29:59,228][00107] Fps is (10 sec: 4096.1, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 576.7. Samples: 23070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:29:59,230][00107] Avg episode reward: [(0, '4.281')]
[2023-02-27 11:29:59,237][20157] Saving new best policy, reward=4.281!
[2023-02-27 11:30:03,427][20171] Updated weights for policy 0, policy_version 30 (0.0015)
[2023-02-27 11:30:04,228][00107] Fps is (10 sec: 4505.9, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 676.8. Samples: 30456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:04,230][00107] Avg episode reward: [(0, '4.317')]
[2023-02-27 11:30:04,317][20157] Saving new best policy, reward=4.317!
[2023-02-27 11:30:09,228][00107] Fps is (10 sec: 3686.4, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 796.8. Samples: 35856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:30:09,236][00107] Avg episode reward: [(0, '4.425')]
[2023-02-27 11:30:09,241][20157] Saving new best policy, reward=4.425!
[2023-02-27 11:30:14,229][00107] Fps is (10 sec: 3276.4, 60 sec: 2829.9, 300 sec: 2829.9). Total num frames: 155648. Throughput: 0: 845.4. Samples: 38058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:14,232][00107] Avg episode reward: [(0, '4.430')]
[2023-02-27 11:30:14,268][20157] Saving new best policy, reward=4.430!
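From here on the report lines repeat every five seconds: "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" followed by the running average episode reward. A small sketch of how such multi-window throughput could be computed; the exact averaging scheme is an assumption, but note that the very first report above was nan, consistent with no two samples existing inside any window yet:

```python
import time
from collections import deque

class WindowedFps:
    """Sketch of multi-window FPS reporting: frames collected inside each
    trailing window divided by the elapsed time between the oldest and
    newest sample in that window (assumed logic)."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs

    def record(self, total_frames: int):
        now = time.time()
        self.samples.append((now, total_frames))
        # keep only what the largest window needs
        while self.samples and now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self):
        now = time.time()
        out = {}
        for w in self.windows:
            past = [(t, f) for t, f in self.samples if now - t <= w]
            if len(past) < 2:
                out[w] = float("nan")  # matches the first 'nan' report
            else:
                (t0, f0), (t1, f1) = past[0], past[-1]
                out[w] = (f1 - f0) / max(t1 - t0, 1e-9)
        return out
```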
[2023-02-27 11:30:15,130][20171] Updated weights for policy 0, policy_version 40 (0.0024)
[2023-02-27 11:30:19,228][00107] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 180224. Throughput: 0: 937.6. Samples: 44612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:19,229][00107] Avg episode reward: [(0, '4.392')]
[2023-02-27 11:30:23,484][20171] Updated weights for policy 0, policy_version 50 (0.0013)
[2023-02-27 11:30:24,228][00107] Fps is (10 sec: 4915.7, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 204800. Throughput: 0: 1035.9. Samples: 51946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:24,235][00107] Avg episode reward: [(0, '4.502')]
[2023-02-27 11:30:24,257][20157] Saving new best policy, reward=4.502!
[2023-02-27 11:30:29,228][00107] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3159.8). Total num frames: 221184. Throughput: 0: 1009.4. Samples: 54204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:30:29,235][00107] Avg episode reward: [(0, '4.469')]
[2023-02-27 11:30:34,228][00107] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 973.0. Samples: 58982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:30:34,235][00107] Avg episode reward: [(0, '4.244')]
[2023-02-27 11:30:34,967][20171] Updated weights for policy 0, policy_version 60 (0.0028)
[2023-02-27 11:30:39,228][00107] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 1033.4. Samples: 66370. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:30:39,233][00107] Avg episode reward: [(0, '4.426')]
[2023-02-27 11:30:44,220][20171] Updated weights for policy 0, policy_version 70 (0.0023)
[2023-02-27 11:30:44,228][00107] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 1042.7. Samples: 69992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:44,230][00107] Avg episode reward: [(0, '4.514')]
[2023-02-27 11:30:44,242][20157] Saving new best policy, reward=4.514!
[2023-02-27 11:30:49,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 986.7. Samples: 74858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:49,231][00107] Avg episode reward: [(0, '4.438')]
[2023-02-27 11:30:54,228][00107] Fps is (10 sec: 3276.8, 60 sec: 4027.8, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 995.0. Samples: 80630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:30:54,234][00107] Avg episode reward: [(0, '4.558')]
[2023-02-27 11:30:54,242][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth...
[2023-02-27 11:30:54,365][20157] Saving new best policy, reward=4.558!
[2023-02-27 11:30:55,319][20171] Updated weights for policy 0, policy_version 80 (0.0020)
[2023-02-27 11:30:59,228][00107] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3440.6). Total num frames: 344064. Throughput: 0: 1022.6. Samples: 84074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:59,230][00107] Avg episode reward: [(0, '4.303')]
[2023-02-27 11:31:04,228][00107] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3471.9). Total num frames: 364544. Throughput: 0: 1023.3. Samples: 90662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:31:04,232][00107] Avg episode reward: [(0, '4.342')]
[2023-02-27 11:31:05,363][20171] Updated weights for policy 0, policy_version 90 (0.0024)
[2023-02-27 11:31:09,231][00107] Fps is (10 sec: 3685.2, 60 sec: 4027.5, 300 sec: 3462.9). Total num frames: 380928. Throughput: 0: 965.2. Samples: 95382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:31:09,236][00107] Avg episode reward: [(0, '4.420')]
[2023-02-27 11:31:14,228][00107] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3490.5). Total num frames: 401408. Throughput: 0: 985.0. Samples: 98528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:31:14,236][00107] Avg episode reward: [(0, '4.564')]
[2023-02-27 11:31:14,251][20157] Saving new best policy, reward=4.564!
[2023-02-27 11:31:15,498][20171] Updated weights for policy 0, policy_version 100 (0.0037)
[2023-02-27 11:31:19,228][00107] Fps is (10 sec: 4507.1, 60 sec: 4096.0, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 1037.5. Samples: 105668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:31:19,235][00107] Avg episode reward: [(0, '4.645')]
[2023-02-27 11:31:19,238][20157] Saving new best policy, reward=4.645!
[2023-02-27 11:31:24,228][00107] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3538.9). Total num frames: 442368. Throughput: 0: 997.2. Samples: 111244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:31:24,238][00107] Avg episode reward: [(0, '4.526')]
[2023-02-27 11:31:26,261][20171] Updated weights for policy 0, policy_version 110 (0.0024)
[2023-02-27 11:31:29,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3528.9). Total num frames: 458752. Throughput: 0: 967.9. Samples: 113546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:31:29,234][00107] Avg episode reward: [(0, '4.519')]
[2023-02-27 11:31:34,228][00107] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 1002.5. Samples: 119972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:31:34,236][00107] Avg episode reward: [(0, '4.550')]
[2023-02-27 11:31:35,544][20171] Updated weights for policy 0, policy_version 120 (0.0037)
[2023-02-27 11:31:39,228][00107] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3627.9). Total num frames: 507904. Throughput: 0: 1038.0. Samples: 127340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:31:39,234][00107] Avg episode reward: [(0, '4.727')]
[2023-02-27 11:31:39,237][20157] Saving new best policy, reward=4.727!
[2023-02-27 11:31:44,228][00107] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3587.5). Total num frames: 520192. Throughput: 0: 1012.7. Samples: 129648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:31:44,230][00107] Avg episode reward: [(0, '4.917')]
[2023-02-27 11:31:44,241][20157] Saving new best policy, reward=4.917!
[2023-02-27 11:31:46,974][20171] Updated weights for policy 0, policy_version 130 (0.0011)
[2023-02-27 11:31:49,228][00107] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3604.5). Total num frames: 540672. Throughput: 0: 969.9. Samples: 134308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:31:49,230][00107] Avg episode reward: [(0, '5.034')]
[2023-02-27 11:31:49,237][20157] Saving new best policy, reward=5.034!
[2023-02-27 11:31:54,228][00107] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 1026.1. Samples: 141554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:31:54,231][00107] Avg episode reward: [(0, '4.929')]
[2023-02-27 11:31:55,461][20171] Updated weights for policy 0, policy_version 140 (0.0026)
[2023-02-27 11:31:59,228][00107] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3660.8). Total num frames: 585728. Throughput: 0: 1034.6. Samples: 145086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:31:59,231][00107] Avg episode reward: [(0, '4.837')]
[2023-02-27 11:32:04,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3649.2). Total num frames: 602112. Throughput: 0: 991.6. Samples: 150290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:32:04,230][00107] Avg episode reward: [(0, '4.588')]
[2023-02-27 11:32:07,119][20171] Updated weights for policy 0, policy_version 150 (0.0016)
[2023-02-27 11:32:09,228][00107] Fps is (10 sec: 3686.5, 60 sec: 4028.0, 300 sec: 3662.3). Total num frames: 622592. Throughput: 0: 992.2. Samples: 155892. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:32:09,234][00107] Avg episode reward: [(0, '4.564')]
[2023-02-27 11:32:14,228][00107] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3698.1). Total num frames: 647168. Throughput: 0: 1020.2. Samples: 159456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:32:14,233][00107] Avg episode reward: [(0, '4.757')]
[2023-02-27 11:32:15,648][20171] Updated weights for policy 0, policy_version 160 (0.0013)
[2023-02-27 11:32:19,234][00107] Fps is (10 sec: 4502.6, 60 sec: 4027.3, 300 sec: 3709.0). Total num frames: 667648. Throughput: 0: 1027.0. Samples: 166192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:19,241][00107] Avg episode reward: [(0, '4.886')]
[2023-02-27 11:32:24,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3675.3). Total num frames: 679936. Throughput: 0: 963.5. Samples: 170698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:32:24,230][00107] Avg episode reward: [(0, '4.722')]
[2023-02-27 11:32:27,440][20171] Updated weights for policy 0, policy_version 170 (0.0069)
[2023-02-27 11:32:29,228][00107] Fps is (10 sec: 3688.8, 60 sec: 4096.0, 300 sec: 3708.0). Total num frames: 704512. Throughput: 0: 979.6. Samples: 173730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:32:29,230][00107] Avg episode reward: [(0, '4.685')]
[2023-02-27 11:32:34,228][00107] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3738.9). Total num frames: 729088. Throughput: 0: 1037.4. Samples: 180992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:34,233][00107] Avg episode reward: [(0, '4.644')]
[2023-02-27 11:32:36,158][20171] Updated weights for policy 0, policy_version 180 (0.0014)
[2023-02-27 11:32:39,228][00107] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3727.4). Total num frames: 745472. Throughput: 0: 1003.5. Samples: 186710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:39,231][00107] Avg episode reward: [(0, '4.746')]
[2023-02-27 11:32:44,228][00107] Fps is (10 sec: 3276.8, 60 sec: 4027.8, 300 sec: 3716.4). Total num frames: 761856. Throughput: 0: 975.1. Samples: 188964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:44,233][00107] Avg episode reward: [(0, '4.730')]
[2023-02-27 11:32:47,465][20171] Updated weights for policy 0, policy_version 190 (0.0017)
[2023-02-27 11:32:49,228][00107] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 997.9. Samples: 195196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:32:49,235][00107] Avg episode reward: [(0, '4.642')]
[2023-02-27 11:32:54,229][00107] Fps is (10 sec: 4505.0, 60 sec: 4027.6, 300 sec: 3753.1). Total num frames: 806912. Throughput: 0: 1031.8. Samples: 202324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:54,234][00107] Avg episode reward: [(0, '4.838')]
[2023-02-27 11:32:54,244][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000197_806912.pth...
[2023-02-27 11:32:57,347][20171] Updated weights for policy 0, policy_version 200 (0.0028)
[2023-02-27 11:32:59,232][00107] Fps is (10 sec: 3684.8, 60 sec: 3959.2, 300 sec: 3742.2). Total num frames: 823296. Throughput: 0: 1006.8. Samples: 204764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:59,236][00107] Avg episode reward: [(0, '4.815')]
[2023-02-27 11:33:04,228][00107] Fps is (10 sec: 3277.2, 60 sec: 3959.5, 300 sec: 3731.9). Total num frames: 839680. Throughput: 0: 959.9. Samples: 209380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:04,236][00107] Avg episode reward: [(0, '4.510')]
[2023-02-27 11:33:07,686][20171] Updated weights for policy 0, policy_version 210 (0.0015)
[2023-02-27 11:33:09,228][00107] Fps is (10 sec: 4097.7, 60 sec: 4027.7, 300 sec: 3757.6). Total num frames: 864256. Throughput: 0: 1020.2. Samples: 216606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:33:09,231][00107] Avg episode reward: [(0, '4.476')]
[2023-02-27 11:33:14,228][00107] Fps is (10 sec: 4915.1, 60 sec: 4027.7, 300 sec: 3782.3). Total num frames: 888832. Throughput: 0: 1035.0. Samples: 220306. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:33:14,233][00107] Avg episode reward: [(0, '4.711')]
[2023-02-27 11:33:18,026][20171] Updated weights for policy 0, policy_version 220 (0.0020)
[2023-02-27 11:33:19,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3891.6, 300 sec: 3754.7). Total num frames: 901120. Throughput: 0: 986.9. Samples: 225402. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:33:19,231][00107] Avg episode reward: [(0, '4.794')]
[2023-02-27 11:33:24,234][00107] Fps is (10 sec: 3274.9, 60 sec: 4027.3, 300 sec: 3761.5). Total num frames: 921600. Throughput: 0: 981.5. Samples: 230884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:24,236][00107] Avg episode reward: [(0, '4.688')]
[2023-02-27 11:33:28,202][20171] Updated weights for policy 0, policy_version 230 (0.0033)
[2023-02-27 11:33:29,228][00107] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3784.7). Total num frames: 946176. Throughput: 0: 1009.6. Samples: 234394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:29,230][00107] Avg episode reward: [(0, '4.772')]
[2023-02-27 11:33:34,228][00107] Fps is (10 sec: 4098.5, 60 sec: 3891.2, 300 sec: 3774.7). Total num frames: 962560. Throughput: 0: 1013.8. Samples: 240818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:33:34,234][00107] Avg episode reward: [(0, '4.842')]
[2023-02-27 11:33:39,229][00107] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 3765.2). Total num frames: 978944. Throughput: 0: 956.1. Samples: 245348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:39,233][00107] Avg episode reward: [(0, '4.561')]
[2023-02-27 11:33:39,794][20171] Updated weights for policy 0, policy_version 240 (0.0012)
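Each report also carries a "Policy #0 lag: (min, avg, max)" triple. A plausible reading, sketched below under the assumption that lag counts how many learner updates old the acting policy was for each transition in the current batch; the -1.0 triples at the very start would then just be the empty-batch sentinel:

```python
def policy_lag_stats(sample_versions, learner_version):
    """Assumed meaning of 'Policy #0 lag': per-transition difference
    between the learner's current policy_version and the version that
    produced the transition. Returns (min, avg, max)."""
    if not sample_versions:
        return -1.0, -1.0, -1.0  # no samples collected yet
    lags = [learner_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# e.g. learner at version 240, batch collected by versions 238-240:
print(policy_lag_stats([238, 239, 240, 240], 240))  # (0, 0.75, 2)
```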
[2023-02-27 11:33:44,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3771.4). Total num frames: 999424. Throughput: 0: 967.3. Samples: 248290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:33:44,230][00107] Avg episode reward: [(0, '4.736')]
[2023-02-27 11:33:48,610][20171] Updated weights for policy 0, policy_version 250 (0.0018)
[2023-02-27 11:33:49,228][00107] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3792.6). Total num frames: 1024000. Throughput: 0: 1021.6. Samples: 255350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:33:49,236][00107] Avg episode reward: [(0, '4.644')]
[2023-02-27 11:33:54,228][00107] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3783.2). Total num frames: 1040384. Throughput: 0: 988.9. Samples: 261106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:33:54,230][00107] Avg episode reward: [(0, '4.643')]
[2023-02-27 11:33:59,228][00107] Fps is (10 sec: 3276.7, 60 sec: 3891.5, 300 sec: 3774.2). Total num frames: 1056768. Throughput: 0: 957.6. Samples: 263400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:33:59,236][00107] Avg episode reward: [(0, '4.691')]
[2023-02-27 11:34:00,569][20171] Updated weights for policy 0, policy_version 260 (0.0030)
[2023-02-27 11:34:04,228][00107] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3794.2). Total num frames: 1081344. Throughput: 0: 974.3. Samples: 269244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:34:04,231][00107] Avg episode reward: [(0, '4.633')]
[2023-02-27 11:34:09,168][20171] Updated weights for policy 0, policy_version 270 (0.0023)
[2023-02-27 11:34:09,228][00107] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3813.5). Total num frames: 1105920. Throughput: 0: 1014.5. Samples: 276532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:34:09,230][00107] Avg episode reward: [(0, '4.689')]
[2023-02-27 11:34:14,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 995.0. Samples: 279170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:34:14,230][00107] Avg episode reward: [(0, '4.692')]
[2023-02-27 11:34:19,228][00107] Fps is (10 sec: 3276.7, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1138688. Throughput: 0: 954.7. Samples: 283780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:34:19,231][00107] Avg episode reward: [(0, '4.671')]
[2023-02-27 11:34:20,750][20171] Updated weights for policy 0, policy_version 280 (0.0014)
[2023-02-27 11:34:24,228][00107] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3929.4). Total num frames: 1159168. Throughput: 0: 1004.5. Samples: 290548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:34:24,230][00107] Avg episode reward: [(0, '4.650')]
[2023-02-27 11:34:29,228][00107] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1183744. Throughput: 0: 1020.5. Samples: 294214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:34:29,230][00107] Avg episode reward: [(0, '4.674')]
[2023-02-27 11:34:29,957][20171] Updated weights for policy 0, policy_version 290 (0.0025)
[2023-02-27 11:34:34,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 1196032. Throughput: 0: 979.8. Samples: 299442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:34:34,230][00107] Avg episode reward: [(0, '4.587')]
[2023-02-27 11:34:39,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1216512. Throughput: 0: 966.7. Samples: 304608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:34:39,236][00107] Avg episode reward: [(0, '4.783')]
[2023-02-27 11:34:41,496][20171] Updated weights for policy 0, policy_version 300 (0.0035)
[2023-02-27 11:34:44,228][00107] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1241088. Throughput: 0: 993.4. Samples: 308104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:34:44,231][00107] Avg episode reward: [(0, '4.958')]
[2023-02-27 11:34:49,228][00107] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1261568. Throughput: 0: 1016.6. Samples: 314992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:34:49,232][00107] Avg episode reward: [(0, '4.823')]
[2023-02-27 11:34:51,743][20171] Updated weights for policy 0, policy_version 310 (0.0013)
[2023-02-27 11:34:54,228][00107] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1273856. Throughput: 0: 951.2. Samples: 319338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:34:54,237][00107] Avg episode reward: [(0, '4.804')]
[2023-02-27 11:34:54,251][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth...
[2023-02-27 11:34:54,399][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth
[2023-02-27 11:34:59,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1294336. Throughput: 0: 949.8. Samples: 321912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:34:59,231][00107] Avg episode reward: [(0, '4.857')]
[2023-02-27 11:35:02,121][20171] Updated weights for policy 0, policy_version 320 (0.0022)
[2023-02-27 11:35:04,228][00107] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1318912. Throughput: 0: 1003.6. Samples: 328940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:04,233][00107] Avg episode reward: [(0, '4.684')]
[2023-02-27 11:35:09,232][00107] Fps is (10 sec: 4503.7, 60 sec: 3890.9, 300 sec: 4012.7). Total num frames: 1339392. Throughput: 0: 985.9. Samples: 334916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:09,237][00107] Avg episode reward: [(0, '4.540')]
[2023-02-27 11:35:13,346][20171] Updated weights for policy 0, policy_version 330 (0.0028)
[2023-02-27 11:35:14,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1351680. Throughput: 0: 953.2. Samples: 337110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:35:14,234][00107] Avg episode reward: [(0, '4.626')]
[2023-02-27 11:35:19,228][00107] Fps is (10 sec: 3688.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1376256. Throughput: 0: 967.9. Samples: 342998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:35:19,230][00107] Avg episode reward: [(0, '4.471')]
[2023-02-27 11:35:22,520][20171] Updated weights for policy 0, policy_version 340 (0.0012)
[2023-02-27 11:35:24,228][00107] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1396736. Throughput: 0: 1009.6. Samples: 350038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:35:24,231][00107] Avg episode reward: [(0, '4.464')]
[2023-02-27 11:35:29,230][00107] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3971.0). Total num frames: 1413120. Throughput: 0: 993.4. Samples: 352810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:29,232][00107] Avg episode reward: [(0, '4.501')]
[2023-02-27 11:35:34,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1429504. Throughput: 0: 941.5. Samples: 357358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:35:34,230][00107] Avg episode reward: [(0, '4.582')]
[2023-02-27 11:35:34,303][20171] Updated weights for policy 0, policy_version 350 (0.0030)
[2023-02-27 11:35:39,228][00107] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1454080. Throughput: 0: 998.0. Samples: 364250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:35:39,230][00107] Avg episode reward: [(0, '4.870')]
[2023-02-27 11:35:42,768][20171] Updated weights for policy 0, policy_version 360 (0.0023)
[2023-02-27 11:35:44,229][00107] Fps is (10 sec: 4914.6, 60 sec: 3959.4, 300 sec: 3998.8). Total num frames: 1478656. Throughput: 0: 1019.0. Samples: 367768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:35:44,233][00107] Avg episode reward: [(0, '4.758')]
[2023-02-27 11:35:49,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 1490944. Throughput: 0: 984.2. Samples: 373228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:35:49,237][00107] Avg episode reward: [(0, '4.610')]
[2023-02-27 11:35:54,228][00107] Fps is (10 sec: 3277.2, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1511424. Throughput: 0: 964.8. Samples: 378328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:54,233][00107] Avg episode reward: [(0, '4.513')]
[2023-02-27 11:35:54,587][20171] Updated weights for policy 0, policy_version 370 (0.0026)
[2023-02-27 11:35:59,229][00107] Fps is (10 sec: 4505.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1536000. Throughput: 0: 994.4. Samples: 381860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:59,234][00107] Avg episode reward: [(0, '4.589')]
[2023-02-27 11:36:03,648][20171] Updated weights for policy 0, policy_version 380 (0.0019)
[2023-02-27 11:36:04,228][00107] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3985.0). Total num frames: 1556480. Throughput: 0: 1019.6. Samples: 388880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:36:04,236][00107] Avg episode reward: [(0, '4.620')]
[2023-02-27 11:36:09,228][00107] Fps is (10 sec: 3686.9, 60 sec: 3891.5, 300 sec: 3971.0). Total num frames: 1572864. Throughput: 0: 965.9. Samples: 393502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:36:09,230][00107] Avg episode reward: [(0, '4.591')]
[2023-02-27 11:36:14,228][00107] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1593344. Throughput: 0: 957.1. Samples: 395876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:14,231][00107] Avg episode reward: [(0, '4.578')]
[2023-02-27 11:36:15,211][20171] Updated weights for policy 0, policy_version 390 (0.0024)
[2023-02-27 11:36:19,228][00107] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1613824. Throughput: 0: 1010.9. Samples: 402850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:36:19,234][00107] Avg episode reward: [(0, '4.400')]
[2023-02-27 11:36:24,228][00107] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 1634304. Throughput: 0: 992.9. Samples: 408932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:24,235][00107] Avg episode reward: [(0, '4.330')]
[2023-02-27 11:36:25,416][20171] Updated weights for policy 0, policy_version 400 (0.0015)
[2023-02-27 11:36:29,228][00107] Fps is (10 sec: 3276.9, 60 sec: 3891.3, 300 sec: 3943.3). Total num frames: 1646592. Throughput: 0: 964.3. Samples: 411160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:29,234][00107] Avg episode reward: [(0, '4.424')]
[2023-02-27 11:36:34,228][00107] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1671168. Throughput: 0: 966.5. Samples: 416722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:34,234][00107] Avg episode reward: [(0, '4.633')]
[2023-02-27 11:36:35,753][20171] Updated weights for policy 0, policy_version 410 (0.0013)
[2023-02-27 11:36:39,228][00107] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1695744. Throughput: 0: 1011.9. Samples: 423862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:39,233][00107] Avg episode reward: [(0, '4.783')]
[2023-02-27 11:36:44,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3957.2). Total num frames: 1708032. Throughput: 0: 994.9. Samples: 426630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:44,232][00107] Avg episode reward: [(0, '4.564')]
[2023-02-27 11:36:48,002][20171] Updated weights for policy 0, policy_version 420 (0.0015)
[2023-02-27 11:36:49,228][00107] Fps is (10 sec: 2457.5, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1720320. Throughput: 0: 922.4. Samples: 430390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:49,237][00107] Avg episode reward: [(0, '4.606')]
[2023-02-27 11:36:54,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1740800. Throughput: 0: 938.0. Samples: 435710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:54,231][00107] Avg episode reward: [(0, '4.694')]
[2023-02-27 11:36:54,241][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000425_1740800.pth...
[2023-02-27 11:36:54,357][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000197_806912.pth
[2023-02-27 11:36:58,893][20171] Updated weights for policy 0, policy_version 430 (0.0031)
[2023-02-27 11:36:59,228][00107] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 1761280. Throughput: 0: 950.9. Samples: 438668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:59,230][00107] Avg episode reward: [(0, '4.887')]
[2023-02-27 11:37:04,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3901.6). Total num frames: 1773568. Throughput: 0: 904.7. Samples: 443560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:37:04,231][00107] Avg episode reward: [(0, '4.990')]
[2023-02-27 11:37:09,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 1789952. Throughput: 0: 869.2. Samples: 448048. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-27 11:37:09,230][00107] Avg episode reward: [(0, '5.034')]
[2023-02-27 11:37:11,335][20171] Updated weights for policy 0, policy_version 440 (0.0014)
[2023-02-27 11:37:14,228][00107] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3887.8). Total num frames: 1814528. Throughput: 0: 898.8. Samples: 451606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:37:14,235][00107] Avg episode reward: [(0, '4.827')]
[2023-02-27 11:37:19,228][00107] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 1835008. Throughput: 0: 933.0. Samples: 458706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:37:19,235][00107] Avg episode reward: [(0, '4.677')]
[2023-02-27 11:37:20,690][20171] Updated weights for policy 0, policy_version 450 (0.0019)
[2023-02-27 11:37:24,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3887.7). Total num frames: 1851392. Throughput: 0: 880.5. Samples: 463486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:37:24,238][00107] Avg episode reward: [(0, '4.651')]
[2023-02-27 11:37:29,229][00107] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1871872. Throughput: 0: 870.3. Samples: 465792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:37:29,232][00107] Avg episode reward: [(0, '4.571')]
[2023-02-27 11:37:31,841][20171] Updated weights for policy 0, policy_version 460 (0.0020)
[2023-02-27 11:37:34,228][00107] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 1892352. Throughput: 0: 937.8. Samples: 472592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:37:34,233][00107] Avg episode reward: [(0, '4.738')]
[2023-02-27 11:37:39,228][00107] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3901.6). Total num frames: 1912832. Throughput: 0: 968.3. Samples: 479282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:37:39,234][00107] Avg episode reward: [(0, '4.861')]
[2023-02-27 11:37:42,096][20171] Updated weights for policy 0, policy_version 470 (0.0018)
[2023-02-27 11:37:44,228][00107] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 1929216. Throughput: 0: 953.0. Samples: 481552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:37:44,235][00107] Avg episode reward: [(0, '4.635')]
[2023-02-27 11:37:49,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3873.9). Total num frames: 1949696. Throughput: 0: 951.0. Samples: 486354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:37:49,233][00107] Avg episode reward: [(0, '4.587')]
[2023-02-27 11:37:52,942][20171] Updated weights for policy 0, policy_version 480 (0.0019)
[2023-02-27 11:37:54,228][00107] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3887.8). Total num frames: 1970176. Throughput: 0: 998.0. Samples: 492956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:37:54,234][00107] Avg episode reward: [(0, '4.774')]
[2023-02-27 11:37:59,229][00107] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3887.7). Total num frames: 1986560. Throughput: 0: 988.2. Samples: 496074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-27 11:37:59,237][00107] Avg episode reward: [(0, '4.704')]
[2023-02-27 11:38:04,228][00107] Fps is (10 sec: 2867.1, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1998848. Throughput: 0: 919.4. Samples: 500080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:38:04,235][00107] Avg episode reward: [(0, '4.625')]
[2023-02-27 11:38:05,666][20171] Updated weights for policy 0, policy_version 490 (0.0020)
[2023-02-27 11:38:09,228][00107] Fps is (10 sec: 3277.2, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2019328. Throughput: 0: 932.2. Samples: 505436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:38:09,230][00107] Avg episode reward: [(0, '4.698')]
[2023-02-27 11:38:14,228][00107] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2043904. Throughput: 0: 951.7. Samples: 508620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:38:14,233][00107] Avg episode reward: [(0, '4.694')]
[2023-02-27 11:38:15,157][20171] Updated weights for policy 0, policy_version 500 (0.0035)
[2023-02-27 11:38:19,228][00107] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3846.2). Total num frames: 2056192. Throughput: 0: 925.5. Samples: 514242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:38:19,234][00107] Avg episode reward: [(0, '4.745')]
[2023-02-27 11:38:24,232][00107] Fps is (10 sec: 2456.7, 60 sec: 3617.9, 300 sec: 3804.4). Total num frames: 2068480. Throughput: 0: 858.7. Samples: 517926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:38:24,234][00107] Avg episode reward: [(0, '4.648')]
[2023-02-27 11:38:28,991][20171] Updated weights for policy 0, policy_version 510 (0.0012)
[2023-02-27 11:38:29,228][00107] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 2088960. Throughput: 0: 862.5. Samples: 520364. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:38:29,230][00107] Avg episode reward: [(0, '4.744')]
[2023-02-27 11:38:34,228][00107] Fps is (10 sec: 4097.5, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 2109440. Throughput: 0: 888.4. Samples: 526330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:38:34,230][00107] Avg episode reward: [(0, '4.513')]
[2023-02-27 11:38:39,231][00107] Fps is (10 sec: 3275.8, 60 sec: 3481.4, 300 sec: 3804.4). Total num frames: 2121728. Throughput: 0: 847.3. Samples: 531088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:38:39,234][00107] Avg episode reward: [(0, '4.467')]
[2023-02-27 11:38:41,737][20171] Updated weights for policy 0, policy_version 520 (0.0036)
[2023-02-27 11:38:44,228][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3762.8). Total num frames: 2134016. Throughput: 0: 819.2. Samples: 532936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:38:44,235][00107] Avg episode reward: [(0, '4.506')]
[2023-02-27 11:38:49,228][00107] Fps is (10 sec: 3277.8, 60 sec: 3413.3, 300 sec: 3776.7). Total num frames: 2154496. Throughput: 0: 837.0. Samples: 537744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:38:49,234][00107] Avg episode reward: [(0, '4.559')]
[2023-02-27 11:38:53,491][20171] Updated weights for policy 0, policy_version 530 (0.0013)
[2023-02-27 11:38:54,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3776.7). Total num frames: 2170880. Throughput: 0: 847.4. Samples: 543570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:38:54,234][00107] Avg episode reward: [(0, '4.729')]
[2023-02-27 11:38:54,253][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth...
[2023-02-27 11:38:54,383][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth
[2023-02-27 11:38:59,229][00107] Fps is (10 sec: 2866.9, 60 sec: 3276.8, 300 sec: 3735.0). Total num frames: 2183168. Throughput: 0: 827.4. Samples: 545854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:38:59,235][00107] Avg episode reward: [(0, '4.726')]
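The Saving/Removing pairs above (e.g. checkpoint_000000530_2170880.pth saved, checkpoint_000000311_1273856.pth removed) suggest rolling checkpoint retention: write checkpoint_<version>_<frames>.pth, then drop the oldest. A sketch assuming a retention count of two, which matches the pairs visible in this log; the separate "best policy" snapshots are handled elsewhere:

```python
from pathlib import Path
import torch

def save_checkpoint_with_rotation(state: dict, ckpt_dir: str,
                                  policy_version: int, env_frames: int,
                                  keep_last: int = 2):
    """Hypothetical save-then-prune helper mirroring the log's pattern.
    The keep_last=2 retention count is an assumption inferred from the
    log, which deletes exactly one old file per save."""
    d = Path(ckpt_dir)
    d.mkdir(parents=True, exist_ok=True)
    # zero-padded version keeps lexicographic order == chronological order
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    torch.save(state, d / name)
    for old in sorted(d.glob("checkpoint_*.pth"))[:-keep_last]:
        print(f"Removing {old}")
        old.unlink()
```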
Total num frames: 2199552. Throughput: 0: 787.0. Samples: 549658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:39:04,238][00107] Avg episode reward: [(0, '4.895')] [2023-02-27 11:39:07,159][20171] Updated weights for policy 0, policy_version 540 (0.0018) [2023-02-27 11:39:09,228][00107] Fps is (10 sec: 3277.2, 60 sec: 3276.8, 300 sec: 3721.1). Total num frames: 2215936. Throughput: 0: 823.6. Samples: 554986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:39:09,231][00107] Avg episode reward: [(0, '4.948')] [2023-02-27 11:39:14,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3721.1). Total num frames: 2236416. Throughput: 0: 832.2. Samples: 557812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:39:14,231][00107] Avg episode reward: [(0, '4.681')] [2023-02-27 11:39:19,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3693.3). Total num frames: 2248704. Throughput: 0: 805.3. Samples: 562570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:39:19,233][00107] Avg episode reward: [(0, '4.733')] [2023-02-27 11:39:19,837][20171] Updated weights for policy 0, policy_version 550 (0.0013) [2023-02-27 11:39:24,228][00107] Fps is (10 sec: 2457.6, 60 sec: 3208.7, 300 sec: 3651.7). Total num frames: 2260992. Throughput: 0: 779.2. Samples: 566148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:39:24,233][00107] Avg episode reward: [(0, '4.671')] [2023-02-27 11:39:29,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3665.6). Total num frames: 2277376. Throughput: 0: 791.2. Samples: 568540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:39:29,230][00107] Avg episode reward: [(0, '4.807')] [2023-02-27 11:39:32,368][20171] Updated weights for policy 0, policy_version 560 (0.0036) [2023-02-27 11:39:34,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3665.6). Total num frames: 2297856. Throughput: 0: 815.0. Samples: 574420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:39:34,230][00107] Avg episode reward: [(0, '4.614')] [2023-02-27 11:39:39,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3208.7, 300 sec: 3637.8). Total num frames: 2314240. Throughput: 0: 790.5. Samples: 579142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:39:39,234][00107] Avg episode reward: [(0, '4.696')] [2023-02-27 11:39:44,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3610.0). Total num frames: 2326528. Throughput: 0: 779.2. Samples: 580918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:39:44,235][00107] Avg episode reward: [(0, '4.866')] [2023-02-27 11:39:46,088][20171] Updated weights for policy 0, policy_version 570 (0.0027) [2023-02-27 11:39:49,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3637.8). Total num frames: 2347008. Throughput: 0: 800.4. Samples: 585674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:39:49,231][00107] Avg episode reward: [(0, '4.905')] [2023-02-27 11:39:54,228][00107] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3623.9). Total num frames: 2363392. Throughput: 0: 812.4. Samples: 591542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:39:54,231][00107] Avg episode reward: [(0, '4.740')] [2023-02-27 11:39:57,980][20171] Updated weights for policy 0, policy_version 580 (0.0021) [2023-02-27 11:39:59,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3582.3). Total num frames: 2375680. Throughput: 0: 797.7. Samples: 593708. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:39:59,231][00107] Avg episode reward: [(0, '4.677')] [2023-02-27 11:40:04,228][00107] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3554.5). Total num frames: 2387968. Throughput: 0: 775.7. Samples: 597478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:40:04,234][00107] Avg episode reward: [(0, '4.674')] [2023-02-27 11:40:09,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3596.2). Total num frames: 2412544. Throughput: 0: 819.8. Samples: 603038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:40:09,236][00107] Avg episode reward: [(0, '4.719')] [2023-02-27 11:40:10,291][20171] Updated weights for policy 0, policy_version 590 (0.0029) [2023-02-27 11:40:14,228][00107] Fps is (10 sec: 4505.6, 60 sec: 3276.8, 300 sec: 3582.3). Total num frames: 2433024. Throughput: 0: 836.4. Samples: 606176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:40:14,231][00107] Avg episode reward: [(0, '4.879')] [2023-02-27 11:40:19,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3554.5). Total num frames: 2445312. Throughput: 0: 815.2. Samples: 611104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:40:19,231][00107] Avg episode reward: [(0, '4.941')] [2023-02-27 11:40:23,646][20171] Updated weights for policy 0, policy_version 600 (0.0026) [2023-02-27 11:40:24,228][00107] Fps is (10 sec: 2457.5, 60 sec: 3276.8, 300 sec: 3540.6). Total num frames: 2457600. Throughput: 0: 797.1. Samples: 615010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:40:24,237][00107] Avg episode reward: [(0, '4.995')] [2023-02-27 11:40:29,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3554.5). Total num frames: 2478080. Throughput: 0: 824.9. Samples: 618040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:40:29,230][00107] Avg episode reward: [(0, '5.163')] [2023-02-27 11:40:29,237][20157] Saving new best policy, reward=5.163! [2023-02-27 11:40:33,775][20171] Updated weights for policy 0, policy_version 610 (0.0021) [2023-02-27 11:40:34,228][00107] Fps is (10 sec: 4095.9, 60 sec: 3345.0, 300 sec: 3540.6). Total num frames: 2498560. Throughput: 0: 855.1. Samples: 624152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:40:34,235][00107] Avg episode reward: [(0, '5.328')] [2023-02-27 11:40:34,246][20157] Saving new best policy, reward=5.328! [2023-02-27 11:40:39,232][00107] Fps is (10 sec: 3275.3, 60 sec: 3276.6, 300 sec: 3498.9). Total num frames: 2510848. Throughput: 0: 815.1. Samples: 628224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:40:39,235][00107] Avg episode reward: [(0, '5.403')] [2023-02-27 11:40:39,244][20157] Saving new best policy, reward=5.403! [2023-02-27 11:40:44,228][00107] Fps is (10 sec: 2457.7, 60 sec: 3276.8, 300 sec: 3499.0). Total num frames: 2523136. Throughput: 0: 806.3. Samples: 629992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:40:44,238][00107] Avg episode reward: [(0, '4.901')] [2023-02-27 11:40:47,790][20171] Updated weights for policy 0, policy_version 620 (0.0014) [2023-02-27 11:40:49,228][00107] Fps is (10 sec: 3278.1, 60 sec: 3276.8, 300 sec: 3499.0). Total num frames: 2543616. Throughput: 0: 840.1. Samples: 635284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:40:49,236][00107] Avg episode reward: [(0, '4.795')] [2023-02-27 11:40:54,228][00107] Fps is (10 sec: 4095.8, 60 sec: 3345.0, 300 sec: 3485.1). Total num frames: 2564096. Throughput: 0: 848.3. 
Samples: 641214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:40:54,232][00107] Avg episode reward: [(0, '4.856')] [2023-02-27 11:40:54,252][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000626_2564096.pth... [2023-02-27 11:40:54,385][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000425_1740800.pth [2023-02-27 11:40:59,228][00107] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3457.3). Total num frames: 2576384. Throughput: 0: 821.3. Samples: 643136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:40:59,233][00107] Avg episode reward: [(0, '4.940')] [2023-02-27 11:41:00,266][20171] Updated weights for policy 0, policy_version 630 (0.0026) [2023-02-27 11:41:04,228][00107] Fps is (10 sec: 2457.7, 60 sec: 3345.1, 300 sec: 3443.4). Total num frames: 2588672. Throughput: 0: 797.9. Samples: 647008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:41:04,233][00107] Avg episode reward: [(0, '4.868')] [2023-02-27 11:41:09,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3443.4). Total num frames: 2609152. Throughput: 0: 840.8. Samples: 652846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:41:09,234][00107] Avg episode reward: [(0, '4.728')] [2023-02-27 11:41:11,662][20171] Updated weights for policy 0, policy_version 640 (0.0023) [2023-02-27 11:41:14,228][00107] Fps is (10 sec: 4095.9, 60 sec: 3276.8, 300 sec: 3443.4). Total num frames: 2629632. Throughput: 0: 842.8. Samples: 655968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:41:14,238][00107] Avg episode reward: [(0, '4.613')] [2023-02-27 11:41:19,230][00107] Fps is (10 sec: 3276.2, 60 sec: 3276.7, 300 sec: 3415.6). Total num frames: 2641920. Throughput: 0: 801.9. Samples: 660238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:41:19,235][00107] Avg episode reward: [(0, '4.720')] [2023-02-27 11:41:24,228][00107] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3415.6). Total num frames: 2654208. Throughput: 0: 801.1. Samples: 664270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:41:24,231][00107] Avg episode reward: [(0, '4.558')] [2023-02-27 11:41:25,562][20171] Updated weights for policy 0, policy_version 650 (0.0031) [2023-02-27 11:41:29,230][00107] Fps is (10 sec: 3276.6, 60 sec: 3276.7, 300 sec: 3401.7). Total num frames: 2674688. Throughput: 0: 827.3. Samples: 667222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:41:29,236][00107] Avg episode reward: [(0, '4.668')] [2023-02-27 11:41:34,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 3374.0). Total num frames: 2691072. Throughput: 0: 838.9. Samples: 673032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:41:34,232][00107] Avg episode reward: [(0, '4.888')] [2023-02-27 11:41:37,869][20171] Updated weights for policy 0, policy_version 660 (0.0013) [2023-02-27 11:41:39,230][00107] Fps is (10 sec: 2867.2, 60 sec: 3208.7, 300 sec: 3374.0). Total num frames: 2703360. Throughput: 0: 792.9. Samples: 676896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:41:39,233][00107] Avg episode reward: [(0, '4.777')] [2023-02-27 11:41:44,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 2719744. Throughput: 0: 792.4. Samples: 678792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:41:44,235][00107] Avg episode reward: [(0, '4.806')] [2023-02-27 11:41:49,228][00107] Fps is (10 sec: 3687.2, 60 sec: 3276.8, 300 sec: 3387.9). 
[2023-02-27 11:41:49,228][00107] Fps is (10 sec: 3687.2, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 2740224. Throughput: 0: 828.5. Samples: 684290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:41:49,230][00107] Avg episode reward: [(0, '4.985')] [2023-02-27 11:41:49,978][20171] Updated weights for policy 0, policy_version 670 (0.0013) [2023-02-27 11:41:54,235][00107] Fps is (10 sec: 3683.8, 60 sec: 3208.2, 300 sec: 3373.9). Total num frames: 2756608. Throughput: 0: 819.8. Samples: 689742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:41:54,239][00107] Avg episode reward: [(0, '5.034')] [2023-02-27 11:41:59,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3374.0). Total num frames: 2768896. Throughput: 0: 791.4. Samples: 691580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:41:59,232][00107] Avg episode reward: [(0, '4.948')] [2023-02-27 11:42:04,209][20171] Updated weights for policy 0, policy_version 680 (0.0028) [2023-02-27 11:42:04,228][00107] Fps is (10 sec: 2869.2, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 2785280. Throughput: 0: 778.4. Samples: 695264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:42:04,231][00107] Avg episode reward: [(0, '4.890')] [2023-02-27 11:42:09,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3346.2). Total num frames: 2801664. Throughput: 0: 821.3. Samples: 701230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:42:09,236][00107] Avg episode reward: [(0, '4.760')] [2023-02-27 11:42:14,228][00107] Fps is (10 sec: 3686.2, 60 sec: 3208.5, 300 sec: 3346.2). Total num frames: 2822144. Throughput: 0: 822.1. Samples: 704216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:42:14,232][00107] Avg episode reward: [(0, '4.722')] [2023-02-27 11:42:15,869][20171] Updated weights for policy 0, policy_version 690 (0.0043) [2023-02-27 11:42:19,232][00107] Fps is (10 sec: 3275.4, 60 sec: 3208.4, 300 sec: 3332.3). Total num frames: 2834432. Throughput: 0: 785.3. Samples: 708376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:42:19,235][00107] Avg episode reward: [(0, '4.562')] [2023-02-27 11:42:24,228][00107] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2846720. Throughput: 0: 795.7. Samples: 712700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:42:24,231][00107] Avg episode reward: [(0, '4.731')] [2023-02-27 11:42:28,322][20171] Updated weights for policy 0, policy_version 700 (0.0021) [2023-02-27 11:42:29,228][00107] Fps is (10 sec: 3278.3, 60 sec: 3208.7, 300 sec: 3304.6). Total num frames: 2867200. Throughput: 0: 819.9. Samples: 715686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:42:29,233][00107] Avg episode reward: [(0, '4.781')] [2023-02-27 11:42:34,228][00107] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2883584. Throughput: 0: 828.2. Samples: 721560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:42:34,231][00107] Avg episode reward: [(0, '4.869')] [2023-02-27 11:42:39,228][00107] Fps is (10 sec: 3276.6, 60 sec: 3276.9, 300 sec: 3290.7). Total num frames: 2899968. Throughput: 0: 794.9. Samples: 725508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:42:39,238][00107] Avg episode reward: [(0, '4.738')] [2023-02-27 11:42:41,926][20171] Updated weights for policy 0, policy_version 710 (0.0025) [2023-02-27 11:42:44,228][00107] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2916352. Throughput: 0: 797.6. Samples: 727472.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:42:44,230][00107] Avg episode reward: [(0, '4.746')] [2023-02-27 11:42:49,228][00107] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2936832. Throughput: 0: 848.4. Samples: 733440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:42:49,238][00107] Avg episode reward: [(0, '4.772')] [2023-02-27 11:42:51,815][20171] Updated weights for policy 0, policy_version 720 (0.0014) [2023-02-27 11:42:54,228][00107] Fps is (10 sec: 3686.5, 60 sec: 3277.2, 300 sec: 3276.8). Total num frames: 2953216. Throughput: 0: 837.8. Samples: 738932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:42:54,235][00107] Avg episode reward: [(0, '4.840')] [2023-02-27 11:42:54,249][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000721_2953216.pth... [2023-02-27 11:42:54,405][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth [2023-02-27 11:42:59,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2965504. Throughput: 0: 812.9. Samples: 740794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:42:59,234][00107] Avg episode reward: [(0, '4.839')] [2023-02-27 11:43:04,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 2981888. Throughput: 0: 814.8. Samples: 745038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:43:04,230][00107] Avg episode reward: [(0, '4.811')] [2023-02-27 11:43:05,612][20171] Updated weights for policy 0, policy_version 730 (0.0013) [2023-02-27 11:43:09,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 3002368. Throughput: 0: 856.5. Samples: 751244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:43:09,231][00107] Avg episode reward: [(0, '4.680')] [2023-02-27 11:43:14,228][00107] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3018752. Throughput: 0: 856.3. Samples: 754220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:43:14,231][00107] Avg episode reward: [(0, '4.518')] [2023-02-27 11:43:17,949][20171] Updated weights for policy 0, policy_version 740 (0.0029) [2023-02-27 11:43:19,228][00107] Fps is (10 sec: 2867.1, 60 sec: 3277.0, 300 sec: 3263.0). Total num frames: 3031040. Throughput: 0: 813.3. Samples: 758160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:43:19,232][00107] Avg episode reward: [(0, '4.558')] [2023-02-27 11:43:24,228][00107] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 3051520. Throughput: 0: 839.5. Samples: 763284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:43:24,231][00107] Avg episode reward: [(0, '4.751')] [2023-02-27 11:43:28,876][20171] Updated weights for policy 0, policy_version 750 (0.0019) [2023-02-27 11:43:29,228][00107] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 3072000. Throughput: 0: 863.2. Samples: 766316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:43:29,230][00107] Avg episode reward: [(0, '4.705')] [2023-02-27 11:43:34,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 3088384. Throughput: 0: 850.6. Samples: 771718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:43:34,232][00107] Avg episode reward: [(0, '4.545')] [2023-02-27 11:43:39,228][00107] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3276.8). 
Total num frames: 3100672. Throughput: 0: 814.9. Samples: 775602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:43:39,235][00107] Avg episode reward: [(0, '4.567')] [2023-02-27 11:43:42,314][20171] Updated weights for policy 0, policy_version 760 (0.0014) [2023-02-27 11:43:44,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3117056. Throughput: 0: 827.5. Samples: 778030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:43:44,237][00107] Avg episode reward: [(0, '4.761')] [2023-02-27 11:43:49,228][00107] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3137536. Throughput: 0: 869.3. Samples: 784158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:43:49,234][00107] Avg episode reward: [(0, '4.970')] [2023-02-27 11:43:53,670][20171] Updated weights for policy 0, policy_version 770 (0.0033) [2023-02-27 11:43:54,228][00107] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3153920. Throughput: 0: 841.4. Samples: 789106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 11:43:54,237][00107] Avg episode reward: [(0, '5.021')] [2023-02-27 11:43:59,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3166208. Throughput: 0: 818.4. Samples: 791046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:43:59,230][00107] Avg episode reward: [(0, '4.923')] [2023-02-27 11:44:04,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3186688. Throughput: 0: 840.3. Samples: 795974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 11:44:04,237][00107] Avg episode reward: [(0, '4.679')] [2023-02-27 11:44:06,033][20171] Updated weights for policy 0, policy_version 780 (0.0019) [2023-02-27 11:44:09,228][00107] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3207168. Throughput: 0: 863.6. Samples: 802146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:44:09,231][00107] Avg episode reward: [(0, '4.830')] [2023-02-27 11:44:14,230][00107] Fps is (10 sec: 3276.1, 60 sec: 3344.9, 300 sec: 3290.7). Total num frames: 3219456. Throughput: 0: 851.0. Samples: 804614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:44:14,236][00107] Avg episode reward: [(0, '4.651')] [2023-02-27 11:44:18,967][20171] Updated weights for policy 0, policy_version 790 (0.0014) [2023-02-27 11:44:19,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 3235840. Throughput: 0: 818.4. Samples: 808544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:44:19,236][00107] Avg episode reward: [(0, '4.629')] [2023-02-27 11:44:24,228][00107] Fps is (10 sec: 3277.6, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3252224. Throughput: 0: 854.8. Samples: 814070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:44:24,237][00107] Avg episode reward: [(0, '4.881')] [2023-02-27 11:44:29,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3272704. Throughput: 0: 868.8. Samples: 817126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:44:29,234][00107] Avg episode reward: [(0, '4.851')] [2023-02-27 11:44:29,506][20171] Updated weights for policy 0, policy_version 800 (0.0020) [2023-02-27 11:44:34,231][00107] Fps is (10 sec: 3276.0, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 3284992. Throughput: 0: 836.7. Samples: 821810. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:44:34,236][00107] Avg episode reward: [(0, '4.515')] [2023-02-27 11:44:39,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3301376. Throughput: 0: 812.4. Samples: 825666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 11:44:39,231][00107] Avg episode reward: [(0, '4.578')] [2023-02-27 11:44:43,111][20171] Updated weights for policy 0, policy_version 810 (0.0017) [2023-02-27 11:44:44,228][00107] Fps is (10 sec: 3687.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 3321856. Throughput: 0: 833.1. Samples: 828536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:44:44,231][00107] Avg episode reward: [(0, '4.483')] [2023-02-27 11:44:49,228][00107] Fps is (10 sec: 3686.3, 60 sec: 3345.0, 300 sec: 3304.6). Total num frames: 3338240. Throughput: 0: 853.9. Samples: 834400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:44:49,235][00107] Avg episode reward: [(0, '4.653')] [2023-02-27 11:44:54,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3354624. Throughput: 0: 815.3. Samples: 838836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:44:54,230][00107] Avg episode reward: [(0, '4.662')] [2023-02-27 11:44:54,251][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000819_3354624.pth... [2023-02-27 11:44:54,412][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000626_2564096.pth [2023-02-27 11:44:55,723][20171] Updated weights for policy 0, policy_version 820 (0.0019) [2023-02-27 11:44:59,228][00107] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3366912. Throughput: 0: 802.5. Samples: 840726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:44:59,230][00107] Avg episode reward: [(0, '4.817')] [2023-02-27 11:45:04,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3387392. Throughput: 0: 835.1. Samples: 846122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:45:04,231][00107] Avg episode reward: [(0, '5.066')] [2023-02-27 11:45:07,051][20171] Updated weights for policy 0, policy_version 830 (0.0023) [2023-02-27 11:45:09,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 3403776. Throughput: 0: 845.7. Samples: 852128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:45:09,232][00107] Avg episode reward: [(0, '4.968')] [2023-02-27 11:45:14,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.2, 300 sec: 3304.6). Total num frames: 3420160. Throughput: 0: 820.6. Samples: 854054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:45:14,236][00107] Avg episode reward: [(0, '4.836')] [2023-02-27 11:45:19,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3432448. Throughput: 0: 803.5. Samples: 857966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:45:19,236][00107] Avg episode reward: [(0, '4.581')] [2023-02-27 11:45:20,461][20171] Updated weights for policy 0, policy_version 840 (0.0012) [2023-02-27 11:45:24,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3452928. Throughput: 0: 852.4. Samples: 864024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:45:24,232][00107] Avg episode reward: [(0, '4.706')] [2023-02-27 11:45:29,230][00107] Fps is (10 sec: 4095.0, 60 sec: 3344.9, 300 sec: 3304.6). 
Total num frames: 3473408. Throughput: 0: 855.8. Samples: 867050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:45:29,233][00107] Avg episode reward: [(0, '4.814')] [2023-02-27 11:45:31,993][20171] Updated weights for policy 0, policy_version 850 (0.0014) [2023-02-27 11:45:34,231][00107] Fps is (10 sec: 3275.7, 60 sec: 3345.0, 300 sec: 3304.6). Total num frames: 3485696. Throughput: 0: 820.2. Samples: 871310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:45:34,234][00107] Avg episode reward: [(0, '4.877')] [2023-02-27 11:45:39,228][00107] Fps is (10 sec: 2867.9, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3502080. Throughput: 0: 820.7. Samples: 875768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:45:39,235][00107] Avg episode reward: [(0, '4.698')] [2023-02-27 11:45:44,026][20171] Updated weights for policy 0, policy_version 860 (0.0018) [2023-02-27 11:45:44,228][00107] Fps is (10 sec: 3687.6, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3522560. Throughput: 0: 846.2. Samples: 878804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:45:44,231][00107] Avg episode reward: [(0, '4.568')] [2023-02-27 11:45:49,229][00107] Fps is (10 sec: 3685.9, 60 sec: 3345.0, 300 sec: 3304.6). Total num frames: 3538944. Throughput: 0: 859.7. Samples: 884810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:45:49,236][00107] Avg episode reward: [(0, '4.699')] [2023-02-27 11:45:54,228][00107] Fps is (10 sec: 2867.0, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3551232. Throughput: 0: 812.7. Samples: 888700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:45:54,234][00107] Avg episode reward: [(0, '4.612')] [2023-02-27 11:45:57,502][20171] Updated weights for policy 0, policy_version 870 (0.0014) [2023-02-27 11:45:59,228][00107] Fps is (10 sec: 2867.6, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3567616. Throughput: 0: 814.7. Samples: 890714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:45:59,236][00107] Avg episode reward: [(0, '4.638')] [2023-02-27 11:46:04,228][00107] Fps is (10 sec: 3686.6, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3588096. Throughput: 0: 861.5. Samples: 896734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:46:04,235][00107] Avg episode reward: [(0, '4.775')] [2023-02-27 11:46:07,936][20171] Updated weights for policy 0, policy_version 880 (0.0029) [2023-02-27 11:46:09,230][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3604480. Throughput: 0: 847.0. Samples: 902138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:46:09,235][00107] Avg episode reward: [(0, '4.814')] [2023-02-27 11:46:14,228][00107] Fps is (10 sec: 3276.7, 60 sec: 3345.0, 300 sec: 3318.5). Total num frames: 3620864. Throughput: 0: 822.4. Samples: 904056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:46:14,236][00107] Avg episode reward: [(0, '4.721')] [2023-02-27 11:46:19,228][00107] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 3637248. Throughput: 0: 829.7. Samples: 908642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:46:19,230][00107] Avg episode reward: [(0, '4.727')] [2023-02-27 11:46:20,955][20171] Updated weights for policy 0, policy_version 890 (0.0029) [2023-02-27 11:46:24,228][00107] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3332.4). Total num frames: 3657728. Throughput: 0: 867.4. Samples: 914800. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:46:24,231][00107] Avg episode reward: [(0, '4.755')] [2023-02-27 11:46:29,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3332.3). Total num frames: 3674112. Throughput: 0: 863.2. Samples: 917650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:46:29,234][00107] Avg episode reward: [(0, '4.487')] [2023-02-27 11:46:33,708][20171] Updated weights for policy 0, policy_version 900 (0.0015) [2023-02-27 11:46:34,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.3, 300 sec: 3332.4). Total num frames: 3686400. Throughput: 0: 811.8. Samples: 921342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:46:34,231][00107] Avg episode reward: [(0, '4.531')] [2023-02-27 11:46:39,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3702784. Throughput: 0: 838.1. Samples: 926412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:46:39,231][00107] Avg episode reward: [(0, '4.382')] [2023-02-27 11:46:44,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3723264. Throughput: 0: 861.2. Samples: 929470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:46:44,231][00107] Avg episode reward: [(0, '4.490')] [2023-02-27 11:46:44,548][20171] Updated weights for policy 0, policy_version 910 (0.0014) [2023-02-27 11:46:49,234][00107] Fps is (10 sec: 3684.1, 60 sec: 3344.8, 300 sec: 3332.3). Total num frames: 3739648. Throughput: 0: 845.3. Samples: 934776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:46:49,237][00107] Avg episode reward: [(0, '4.709')] [2023-02-27 11:46:54,228][00107] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3751936. Throughput: 0: 811.8. Samples: 938668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:46:54,231][00107] Avg episode reward: [(0, '4.710')] [2023-02-27 11:46:54,250][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000916_3751936.pth... [2023-02-27 11:46:54,425][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000721_2953216.pth [2023-02-27 11:46:58,123][20171] Updated weights for policy 0, policy_version 920 (0.0021) [2023-02-27 11:46:59,228][00107] Fps is (10 sec: 3278.9, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 3772416. Throughput: 0: 826.9. Samples: 941266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:46:59,231][00107] Avg episode reward: [(0, '4.699')] [2023-02-27 11:47:04,228][00107] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3792896. Throughput: 0: 859.8. Samples: 947334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:47:04,236][00107] Avg episode reward: [(0, '4.495')] [2023-02-27 11:47:09,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3805184. Throughput: 0: 829.6. Samples: 952134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:47:09,231][00107] Avg episode reward: [(0, '4.757')] [2023-02-27 11:47:09,697][20171] Updated weights for policy 0, policy_version 930 (0.0018) [2023-02-27 11:47:14,228][00107] Fps is (10 sec: 2457.5, 60 sec: 3276.8, 300 sec: 3332.4). Total num frames: 3817472. Throughput: 0: 809.5. Samples: 954078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:47:14,234][00107] Avg episode reward: [(0, '4.677')] [2023-02-27 11:47:19,228][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). 
Total num frames: 3842048. Throughput: 0: 844.4. Samples: 959338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:47:19,238][00107] Avg episode reward: [(0, '4.591')] [2023-02-27 11:47:21,265][20171] Updated weights for policy 0, policy_version 940 (0.0025) [2023-02-27 11:47:24,228][00107] Fps is (10 sec: 4096.2, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 3858432. Throughput: 0: 869.1. Samples: 965522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:47:24,230][00107] Avg episode reward: [(0, '4.483')] [2023-02-27 11:47:29,228][00107] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 3874816. Throughput: 0: 850.7. Samples: 967752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:47:29,231][00107] Avg episode reward: [(0, '4.558')] [2023-02-27 11:47:34,228][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 3887104. Throughput: 0: 816.6. Samples: 971520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:47:34,236][00107] Avg episode reward: [(0, '4.764')] [2023-02-27 11:47:34,925][20171] Updated weights for policy 0, policy_version 950 (0.0034) [2023-02-27 11:47:39,230][00107] Fps is (10 sec: 3276.2, 60 sec: 3413.2, 300 sec: 3360.1). Total num frames: 3907584. Throughput: 0: 856.7. Samples: 977220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:47:39,235][00107] Avg episode reward: [(0, '4.707')] [2023-02-27 11:47:44,229][00107] Fps is (10 sec: 4095.5, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3928064. Throughput: 0: 866.2. Samples: 980248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:47:44,237][00107] Avg episode reward: [(0, '4.581')] [2023-02-27 11:47:45,663][20171] Updated weights for policy 0, policy_version 960 (0.0015) [2023-02-27 11:47:49,228][00107] Fps is (10 sec: 3277.5, 60 sec: 3345.4, 300 sec: 3346.2). Total num frames: 3940352. Throughput: 0: 834.4. Samples: 984884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:47:49,232][00107] Avg episode reward: [(0, '4.657')] [2023-02-27 11:47:54,228][00107] Fps is (10 sec: 2457.9, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 3952640. Throughput: 0: 817.9. Samples: 988938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:47:54,235][00107] Avg episode reward: [(0, '4.588')] [2023-02-27 11:47:58,492][20171] Updated weights for policy 0, policy_version 970 (0.0012) [2023-02-27 11:47:59,228][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 3973120. Throughput: 0: 843.0. Samples: 992014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:47:59,237][00107] Avg episode reward: [(0, '4.560')] [2023-02-27 11:48:04,228][00107] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 3993600. Throughput: 0: 861.9. Samples: 998122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:48:04,237][00107] Avg episode reward: [(0, '4.636')] [2023-02-27 11:48:08,267][20157] Stopping Batcher_0... [2023-02-27 11:48:08,269][20157] Loop batcher_evt_loop terminating... [2023-02-27 11:48:08,270][00107] Component Batcher_0 stopped! [2023-02-27 11:48:08,289][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:48:08,365][20171] Weights refcount: 2 0 [2023-02-27 11:48:08,367][00107] Component InferenceWorker_p0-w0 stopped! [2023-02-27 11:48:08,370][20171] Stopping InferenceWorker_p0-w0... 
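
Two details of the shutdown above are worth a quick check. The checkpoint filename encodes the policy version and the environment-frame count, and that ratio is a constant 4096 frames per policy update throughout this run; training stops here because the counter has crossed the train_for_env_steps=4000000 target recorded as cli_args in the config dump later in this log:

    # (policy_version, env_frames) pairs from checkpoint names in this log.
    for version, frames in [(626, 2_564_096), (721, 2_953_216),
                            (819, 3_354_624), (978, 4_005_888)]:
        assert frames == version * 4096
    assert 978 * 4096 >= 4_000_000   # stop condition satisfied at version 978
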
[2023-02-27 11:48:08,370][20171] Loop inference_proc0-0_evt_loop terminating... [2023-02-27 11:48:08,430][00107] Component RolloutWorker_w6 stopped! [2023-02-27 11:48:08,433][20176] Stopping RolloutWorker_w6... [2023-02-27 11:48:08,439][00107] Component RolloutWorker_w7 stopped! [2023-02-27 11:48:08,443][00107] Component RolloutWorker_w0 stopped! [2023-02-27 11:48:08,445][20172] Stopping RolloutWorker_w0... [2023-02-27 11:48:08,439][20179] Stopping RolloutWorker_w7... [2023-02-27 11:48:08,445][20172] Loop rollout_proc0_evt_loop terminating... [2023-02-27 11:48:08,446][20179] Loop rollout_proc7_evt_loop terminating... [2023-02-27 11:48:08,434][20176] Loop rollout_proc6_evt_loop terminating... [2023-02-27 11:48:08,459][00107] Component RolloutWorker_w4 stopped! [2023-02-27 11:48:08,461][20177] Stopping RolloutWorker_w4... [2023-02-27 11:48:08,464][20177] Loop rollout_proc4_evt_loop terminating... [2023-02-27 11:48:08,474][20173] Stopping RolloutWorker_w1... [2023-02-27 11:48:08,474][20173] Loop rollout_proc1_evt_loop terminating... [2023-02-27 11:48:08,473][00107] Component RolloutWorker_w1 stopped! [2023-02-27 11:48:08,488][00107] Component RolloutWorker_w2 stopped! [2023-02-27 11:48:08,491][20174] Stopping RolloutWorker_w2... [2023-02-27 11:48:08,491][20174] Loop rollout_proc2_evt_loop terminating... [2023-02-27 11:48:08,520][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000819_3354624.pth [2023-02-27 11:48:08,537][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:48:08,567][00107] Component RolloutWorker_w3 stopped! [2023-02-27 11:48:08,573][20175] Stopping RolloutWorker_w3... [2023-02-27 11:48:08,574][20175] Loop rollout_proc3_evt_loop terminating... [2023-02-27 11:48:08,613][00107] Component RolloutWorker_w5 stopped! [2023-02-27 11:48:08,619][20178] Stopping RolloutWorker_w5... [2023-02-27 11:48:08,620][20178] Loop rollout_proc5_evt_loop terminating... [2023-02-27 11:48:08,808][00107] Component LearnerWorker_p0 stopped! [2023-02-27 11:48:08,813][00107] Waiting for process learner_proc0 to stop... [2023-02-27 11:48:08,818][20157] Stopping LearnerWorker_p0... [2023-02-27 11:48:08,819][20157] Loop learner_proc0_evt_loop terminating... [2023-02-27 11:48:11,632][00107] Waiting for process inference_proc0-0 to join... [2023-02-27 11:48:12,110][00107] Waiting for process rollout_proc0 to join... [2023-02-27 11:48:12,773][00107] Waiting for process rollout_proc1 to join... [2023-02-27 11:48:12,776][00107] Waiting for process rollout_proc2 to join... [2023-02-27 11:48:12,778][00107] Waiting for process rollout_proc3 to join... [2023-02-27 11:48:12,780][00107] Waiting for process rollout_proc4 to join... [2023-02-27 11:48:12,781][00107] Waiting for process rollout_proc5 to join... [2023-02-27 11:48:12,784][00107] Waiting for process rollout_proc6 to join... [2023-02-27 11:48:12,785][00107] Waiting for process rollout_proc7 to join... 
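
The teardown above follows a common two-phase pattern: every component is first signalled to stop its event loop ("Stopping ..." / "Loop ... terminating..."), and the runner then joins each worker process ("Waiting for process ... to join..."). A generic sketch of that pattern using plain multiprocessing, not Sample Factory's actual code:

    import multiprocessing as mp
    import time

    def event_loop(stop):
        while not stop.is_set():
            time.sleep(0.01)  # placeholder for rollout/inference work

    if __name__ == "__main__":
        stop = mp.Event()
        workers = [mp.Process(target=event_loop, args=(stop,)) for _ in range(8)]
        for w in workers:
            w.start()
        stop.set()        # phase 1: ask every loop to terminate
        for w in workers:
            w.join()      # phase 2: wait for each process to exit
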
[2023-02-27 11:48:12,786][00107] Batcher 0 profile tree view:
batching: 26.8972, releasing_batches: 0.0254
[2023-02-27 11:48:12,788][00107] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0101
  wait_policy_total: 527.4529
update_model: 8.5874
  weight_update: 0.0036
one_step: 0.0026
  handle_policy_step: 546.2621
    deserialize: 15.8587, stack: 3.1238, obs_to_device_normalize: 120.9882, forward: 263.7075, send_messages: 28.1778
    prepare_outputs: 87.1048
      to_cpu: 53.7354
[2023-02-27 11:48:12,789][00107] Learner 0 profile tree view:
misc: 0.0060, prepare_batch: 17.0107
train: 76.7285
  epoch_init: 0.0125, minibatch_init: 0.0064, losses_postprocess: 0.5033, kl_divergence: 0.6250, after_optimizer: 32.4975
  calculate_losses: 27.7643
    losses_init: 0.0204, forward_head: 1.8180, bptt_initial: 18.3318, tail: 1.1289, advantages_returns: 0.2516, losses: 3.4604
    bptt: 2.3867
      bptt_forward_core: 2.2594
  update: 14.6643
    clip: 1.4372
[2023-02-27 11:48:12,791][00107] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3484, enqueue_policy_requests: 145.5039, env_step: 856.1247, overhead: 22.8647, complete_rollouts: 7.1126
save_policy_outputs: 20.5167
  split_output_tensors: 10.0193
[2023-02-27 11:48:12,794][00107] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3580, enqueue_policy_requests: 140.6013, env_step: 858.5151, overhead: 22.8853, complete_rollouts: 6.9748
save_policy_outputs: 21.2746
  split_output_tensors: 10.6132
[2023-02-27 11:48:12,796][00107] Loop Runner_EvtLoop terminating... [2023-02-27 11:48:12,799][00107] Runner profile tree view:
main_loop: 1154.1242
[2023-02-27 11:48:12,804][00107] Collected {0: 4005888}, FPS: 3470.9 [2023-02-27 11:48:47,205][00107] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 11:48:47,211][00107] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 11:48:47,214][00107] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 11:48:47,218][00107] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 11:48:47,222][00107] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:48:47,224][00107] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 11:48:47,226][00107] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:48:47,227][00107] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 11:48:47,230][00107] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-27 11:48:47,231][00107] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-27 11:48:47,234][00107] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 11:48:47,235][00107] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 11:48:47,242][00107] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 11:48:47,249][00107] Adding new argument 'enjoy_script'=None that is not in the saved config file!
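
The run of "Overriding arg" and "Adding new argument" messages above reflects a simple reconciliation rule: the saved training config is loaded, explicitly passed command-line args replace saved values, and arguments that did not exist when the config was written are back-filled with their new defaults. A minimal sketch of that merge (illustrative only, not Sample Factory's actual code):

    def merge_eval_config(saved_cfg, cli_overrides, new_defaults):
        # Hypothetical helper mirroring the messages above.
        cfg = dict(saved_cfg)
        cfg.update(cli_overrides)          # "Overriding arg 'num_workers' ..."
        for key, value in new_defaults.items():
            cfg.setdefault(key, value)     # "Adding new argument 'save_video' ..."
        return cfg
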
[2023-02-27 11:48:47,257][00107] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:48:47,293][00107] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:48:47,300][00107] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:48:47,303][00107] RunningMeanStd input shape: (1,) [2023-02-27 11:48:47,329][00107] ConvEncoder: input_channels=3 [2023-02-27 11:48:48,131][00107] Conv encoder output size: 512 [2023-02-27 11:48:48,133][00107] Policy head output size: 512 [2023-02-27 11:48:51,087][00107] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:48:52,438][00107] Num frames 100... [2023-02-27 11:48:52,574][00107] Num frames 200... [2023-02-27 11:48:52,710][00107] Num frames 300... [2023-02-27 11:48:52,842][00107] Num frames 400... [2023-02-27 11:48:52,974][00107] Num frames 500... [2023-02-27 11:48:53,048][00107] Avg episode rewards: #0: 7.120, true rewards: #0: 5.120 [2023-02-27 11:48:53,050][00107] Avg episode reward: 7.120, avg true_objective: 5.120 [2023-02-27 11:48:53,174][00107] Num frames 600... [2023-02-27 11:48:53,314][00107] Num frames 700... [2023-02-27 11:48:53,443][00107] Num frames 800... [2023-02-27 11:48:53,572][00107] Num frames 900... [2023-02-27 11:48:53,713][00107] Avg episode rewards: #0: 6.300, true rewards: #0: 4.800 [2023-02-27 11:48:53,714][00107] Avg episode reward: 6.300, avg true_objective: 4.800 [2023-02-27 11:48:53,771][00107] Num frames 1000... [2023-02-27 11:48:53,906][00107] Num frames 1100... [2023-02-27 11:48:54,047][00107] Num frames 1200... [2023-02-27 11:48:54,174][00107] Num frames 1300... [2023-02-27 11:48:54,330][00107] Avg episode rewards: #0: 5.920, true rewards: #0: 4.587 [2023-02-27 11:48:54,332][00107] Avg episode reward: 5.920, avg true_objective: 4.587 [2023-02-27 11:48:54,367][00107] Num frames 1400... [2023-02-27 11:48:54,489][00107] Num frames 1500... [2023-02-27 11:48:54,614][00107] Num frames 1600... [2023-02-27 11:48:54,711][00107] Avg episode rewards: #0: 5.080, true rewards: #0: 4.080 [2023-02-27 11:48:54,713][00107] Avg episode reward: 5.080, avg true_objective: 4.080 [2023-02-27 11:48:54,817][00107] Num frames 1700... [2023-02-27 11:48:54,955][00107] Num frames 1800... [2023-02-27 11:48:55,085][00107] Num frames 1900... [2023-02-27 11:48:55,214][00107] Num frames 2000... [2023-02-27 11:48:55,373][00107] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 [2023-02-27 11:48:55,374][00107] Avg episode reward: 5.160, avg true_objective: 4.160 [2023-02-27 11:48:55,403][00107] Num frames 2100... [2023-02-27 11:48:55,535][00107] Num frames 2200... [2023-02-27 11:48:55,661][00107] Num frames 2300... [2023-02-27 11:48:55,789][00107] Num frames 2400... [2023-02-27 11:48:55,922][00107] Num frames 2500... [2023-02-27 11:48:56,020][00107] Avg episode rewards: #0: 5.213, true rewards: #0: 4.213 [2023-02-27 11:48:56,022][00107] Avg episode reward: 5.213, avg true_objective: 4.213 [2023-02-27 11:48:56,124][00107] Num frames 2600... [2023-02-27 11:48:56,252][00107] Num frames 2700... [2023-02-27 11:48:56,381][00107] Num frames 2800... [2023-02-27 11:48:56,511][00107] Num frames 2900... [2023-02-27 11:48:56,642][00107] Num frames 3000... [2023-02-27 11:48:56,798][00107] Avg episode rewards: #0: 5.531, true rewards: #0: 4.389 [2023-02-27 11:48:56,800][00107] Avg episode reward: 5.531, avg true_objective: 4.389 [2023-02-27 11:48:56,840][00107] Num frames 3100... [2023-02-27 11:48:56,968][00107] Num frames 3200... 
[2023-02-27 11:48:57,110][00107] Num frames 3300... [2023-02-27 11:48:57,246][00107] Num frames 3400... [2023-02-27 11:48:57,383][00107] Num frames 3500... [2023-02-27 11:48:57,510][00107] Num frames 3600... [2023-02-27 11:48:57,588][00107] Avg episode rewards: #0: 5.770, true rewards: #0: 4.520 [2023-02-27 11:48:57,593][00107] Avg episode reward: 5.770, avg true_objective: 4.520 [2023-02-27 11:48:57,712][00107] Num frames 3700... [2023-02-27 11:48:57,843][00107] Num frames 3800... [2023-02-27 11:48:57,977][00107] Num frames 3900... [2023-02-27 11:48:58,116][00107] Num frames 4000... [2023-02-27 11:48:58,214][00107] Avg episode rewards: #0: 5.702, true rewards: #0: 4.480 [2023-02-27 11:48:58,216][00107] Avg episode reward: 5.702, avg true_objective: 4.480 [2023-02-27 11:48:58,307][00107] Num frames 4100... [2023-02-27 11:48:58,444][00107] Num frames 4200... [2023-02-27 11:48:58,574][00107] Num frames 4300... [2023-02-27 11:48:58,699][00107] Num frames 4400... [2023-02-27 11:48:58,778][00107] Avg episode rewards: #0: 5.516, true rewards: #0: 4.416 [2023-02-27 11:48:58,781][00107] Avg episode reward: 5.516, avg true_objective: 4.416 [2023-02-27 11:49:22,050][00107] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 11:51:32,107][00107] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 11:51:32,109][00107] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 11:51:32,112][00107] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 11:51:32,115][00107] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 11:51:32,117][00107] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:51:32,120][00107] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 11:51:32,123][00107] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-27 11:51:32,125][00107] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 11:51:32,127][00107] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-27 11:51:32,128][00107] Adding new argument 'hf_repository'='KoRiF/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-27 11:51:32,129][00107] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 11:51:32,131][00107] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 11:51:32,133][00107] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 11:51:32,134][00107] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 11:51:32,135][00107] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:51:32,163][00107] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:51:32,166][00107] RunningMeanStd input shape: (1,) [2023-02-27 11:51:32,181][00107] ConvEncoder: input_channels=3 [2023-02-27 11:51:32,223][00107] Conv encoder output size: 512 [2023-02-27 11:51:32,228][00107] Policy head output size: 512 [2023-02-27 11:51:32,250][00107] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:51:32,711][00107] Num frames 100... [2023-02-27 11:51:32,839][00107] Num frames 200... 
[2023-02-27 11:51:32,963][00107] Num frames 300... [2023-02-27 11:51:33,137][00107] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-27 11:51:33,139][00107] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-27 11:51:33,163][00107] Num frames 400... [2023-02-27 11:51:33,287][00107] Num frames 500... [2023-02-27 11:51:33,412][00107] Num frames 600... [2023-02-27 11:51:33,546][00107] Num frames 700... [2023-02-27 11:51:33,678][00107] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-27 11:51:33,680][00107] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-27 11:51:33,721][00107] Num frames 800... [2023-02-27 11:51:33,847][00107] Num frames 900... [2023-02-27 11:51:33,983][00107] Num frames 1000... [2023-02-27 11:51:34,102][00107] Num frames 1100... [2023-02-27 11:51:34,225][00107] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-27 11:51:34,226][00107] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-27 11:51:34,294][00107] Num frames 1200... [2023-02-27 11:51:34,421][00107] Num frames 1300... [2023-02-27 11:51:34,567][00107] Num frames 1400... [2023-02-27 11:51:34,692][00107] Num frames 1500... [2023-02-27 11:51:34,831][00107] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920 [2023-02-27 11:51:34,833][00107] Avg episode reward: 4.170, avg true_objective: 3.920 [2023-02-27 11:51:34,879][00107] Num frames 1600... [2023-02-27 11:51:34,998][00107] Num frames 1700... [2023-02-27 11:51:35,126][00107] Num frames 1800... [2023-02-27 11:51:35,253][00107] Num frames 1900... [2023-02-27 11:51:35,377][00107] Num frames 2000... [2023-02-27 11:51:35,487][00107] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096 [2023-02-27 11:51:35,490][00107] Avg episode reward: 4.496, avg true_objective: 4.096 [2023-02-27 11:51:35,568][00107] Num frames 2100... [2023-02-27 11:51:35,699][00107] Num frames 2200... [2023-02-27 11:51:35,820][00107] Num frames 2300... [2023-02-27 11:51:35,958][00107] Num frames 2400... [2023-02-27 11:51:36,093][00107] Avg episode rewards: #0: 4.440, true rewards: #0: 4.107 [2023-02-27 11:51:36,095][00107] Avg episode reward: 4.440, avg true_objective: 4.107 [2023-02-27 11:51:36,140][00107] Num frames 2500... [2023-02-27 11:51:36,268][00107] Num frames 2600... [2023-02-27 11:51:36,385][00107] Num frames 2700... [2023-02-27 11:51:36,500][00107] Num frames 2800... [2023-02-27 11:51:36,623][00107] Num frames 2900... [2023-02-27 11:51:36,767][00107] Avg episode rewards: #0: 4.823, true rewards: #0: 4.251 [2023-02-27 11:51:36,771][00107] Avg episode reward: 4.823, avg true_objective: 4.251 [2023-02-27 11:51:36,811][00107] Num frames 3000... [2023-02-27 11:51:36,935][00107] Num frames 3100... [2023-02-27 11:51:37,056][00107] Num frames 3200... [2023-02-27 11:51:37,181][00107] Num frames 3300... [2023-02-27 11:51:37,318][00107] Avg episode rewards: #0: 4.700, true rewards: #0: 4.200 [2023-02-27 11:51:37,321][00107] Avg episode reward: 4.700, avg true_objective: 4.200 [2023-02-27 11:51:37,374][00107] Num frames 3400... [2023-02-27 11:51:37,503][00107] Num frames 3500... [2023-02-27 11:51:37,633][00107] Num frames 3600... [2023-02-27 11:51:37,763][00107] Num frames 3700... [2023-02-27 11:51:37,881][00107] Avg episode rewards: #0: 4.604, true rewards: #0: 4.160 [2023-02-27 11:51:37,882][00107] Avg episode reward: 4.604, avg true_objective: 4.160 [2023-02-27 11:51:37,954][00107] Num frames 3800... [2023-02-27 11:51:38,075][00107] Num frames 3900... [2023-02-27 11:51:38,196][00107] Num frames 4000... 
[2023-02-27 11:51:38,325][00107] Num frames 4100... [2023-02-27 11:51:38,412][00107] Avg episode rewards: #0: 4.528, true rewards: #0: 4.128 [2023-02-27 11:51:38,414][00107] Avg episode reward: 4.528, avg true_objective: 4.128 [2023-02-27 11:51:59,229][00107] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 11:52:02,875][00107] The model has been pushed to https://huggingface.co/KoRiF/rl_course_vizdoom_health_gathering_supreme [2023-02-27 11:53:19,430][00107] Environment doom_basic already registered, overwriting... [2023-02-27 11:53:19,433][00107] Environment doom_two_colors_easy already registered, overwriting... [2023-02-27 11:53:19,435][00107] Environment doom_two_colors_hard already registered, overwriting... [2023-02-27 11:53:19,436][00107] Environment doom_dm already registered, overwriting... [2023-02-27 11:53:19,438][00107] Environment doom_dwango5 already registered, overwriting... [2023-02-27 11:53:19,440][00107] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-27 11:53:19,441][00107] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-27 11:53:19,443][00107] Environment doom_my_way_home already registered, overwriting... [2023-02-27 11:53:19,445][00107] Environment doom_deadly_corridor already registered, overwriting... [2023-02-27 11:53:19,451][00107] Environment doom_defend_the_center already registered, overwriting... [2023-02-27 11:53:19,452][00107] Environment doom_defend_the_line already registered, overwriting... [2023-02-27 11:53:19,454][00107] Environment doom_health_gathering already registered, overwriting... [2023-02-27 11:53:19,456][00107] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-27 11:53:19,458][00107] Environment doom_battle already registered, overwriting... [2023-02-27 11:53:19,459][00107] Environment doom_battle2 already registered, overwriting... [2023-02-27 11:53:19,460][00107] Environment doom_duel_bots already registered, overwriting... [2023-02-27 11:53:19,462][00107] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-27 11:53:19,463][00107] Environment doom_duel already registered, overwriting... [2023-02-27 11:53:19,464][00107] Environment doom_deathmatch_full already registered, overwriting... [2023-02-27 11:53:19,466][00107] Environment doom_benchmark already registered, overwriting... [2023-02-27 11:53:19,467][00107] register_encoder_factory: [2023-02-27 11:53:19,500][00107] Loading legacy config file train_dir/doom_deathmatch_bots_2222/cfg.json instead of train_dir/doom_deathmatch_bots_2222/config.json [2023-02-27 11:53:19,502][00107] Loading existing experiment configuration from train_dir/doom_deathmatch_bots_2222/config.json [2023-02-27 11:53:19,503][00107] Overriding arg 'experiment' with value 'doom_deathmatch_bots_2222' passed from command line [2023-02-27 11:53:19,505][00107] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2023-02-27 11:53:19,506][00107] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 11:53:19,508][00107] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2023-02-27 11:53:19,509][00107] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2023-02-27 11:53:19,510][00107] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! 
[2023-02-27 11:53:19,512][00107] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 11:53:19,513][00107] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 11:53:19,514][00107] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:53:19,516][00107] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 11:53:19,517][00107] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:53:19,518][00107] Adding new argument 'max_num_episodes'=1 that is not in the saved config file! [2023-02-27 11:53:19,520][00107] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-27 11:53:19,521][00107] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-27 11:53:19,522][00107] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 11:53:19,524][00107] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 11:53:19,525][00107] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 11:53:19,526][00107] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 11:53:19,528][00107] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:53:19,571][00107] Port 40300 is available [2023-02-27 11:53:19,574][00107] Using port 40300 [2023-02-27 11:53:19,576][00107] RunningMeanStd input shape: (23,) [2023-02-27 11:53:19,580][00107] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:53:19,581][00107] RunningMeanStd input shape: (1,) [2023-02-27 11:53:19,598][00107] ConvEncoder: input_channels=3 [2023-02-27 11:53:19,643][00107] Conv encoder output size: 512 [2023-02-27 11:53:19,646][00107] Policy head output size: 512 [2023-02-27 11:53:19,693][00107] Loading state from checkpoint train_dir/doom_deathmatch_bots_2222/checkpoint_p0/checkpoint_000282220_2311946240.pth... [2023-02-27 11:53:19,728][00107] Using port 40300 on host... [2023-02-27 11:53:20,084][00107] Initialized w:0 v:0 player:0 [2023-02-27 11:53:20,374][00107] Num frames 100... [2023-02-27 11:53:20,638][00107] Num frames 200... [2023-02-27 11:53:20,888][00107] Num frames 300... [2023-02-27 11:53:21,139][00107] Num frames 400... [2023-02-27 11:53:21,416][00107] Num frames 500... [2023-02-27 11:53:21,671][00107] Num frames 600... [2023-02-27 11:53:21,928][00107] Num frames 700... [2023-02-27 11:53:22,186][00107] Num frames 800... [2023-02-27 11:53:22,442][00107] Num frames 900... [2023-02-27 11:53:22,706][00107] Num frames 1000... [2023-02-27 11:53:22,966][00107] Num frames 1100... [2023-02-27 11:53:23,172][00107] Num frames 1200... [2023-02-27 11:53:23,367][00107] Num frames 1300... [2023-02-27 11:53:23,549][00107] Num frames 1400... [2023-02-27 11:53:23,740][00107] Num frames 1500... [2023-02-27 11:53:23,930][00107] Num frames 1600... [2023-02-27 11:53:24,118][00107] Num frames 1700... [2023-02-27 11:53:24,294][00107] Num frames 1800... [2023-02-27 11:53:24,484][00107] Num frames 1900... [2023-02-27 11:53:24,679][00107] Num frames 2000... [2023-02-27 11:53:24,868][00107] Num frames 2100... [2023-02-27 11:53:25,049][00107] Num frames 2200... [2023-02-27 11:53:25,236][00107] Num frames 2300... [2023-02-27 11:53:25,411][00107] Num frames 2400... [2023-02-27 11:53:25,592][00107] Num frames 2500... 
[2023-02-27 11:53:25,767][00107] Num frames 2600... [2023-02-27 11:53:25,946][00107] Num frames 2700... [2023-02-27 11:53:26,126][00107] Num frames 2800... [2023-02-27 11:53:26,307][00107] Num frames 2900... [2023-02-27 11:53:26,495][00107] Num frames 3000... [2023-02-27 11:53:26,678][00107] Num frames 3100... [2023-02-27 11:53:26,862][00107] Num frames 3200... [2023-02-27 11:53:27,051][00107] Num frames 3300... [2023-02-27 11:53:27,230][00107] Num frames 3400... [2023-02-27 11:53:27,406][00107] Num frames 3500... [2023-02-27 11:53:27,600][00107] Num frames 3600... [2023-02-27 11:53:27,780][00107] Num frames 3700... [2023-02-27 11:53:27,954][00107] Num frames 3800... [2023-02-27 11:53:28,131][00107] Num frames 3900... [2023-02-27 11:53:28,312][00107] Num frames 4000... [2023-02-27 11:53:28,494][00107] Num frames 4100... [2023-02-27 11:53:28,691][00107] Num frames 4200... [2023-02-27 11:53:28,869][00107] Num frames 4300... [2023-02-27 11:53:29,046][00107] Num frames 4400... [2023-02-27 11:53:29,228][00107] Num frames 4500... [2023-02-27 11:53:29,411][00107] Num frames 4600... [2023-02-27 11:53:29,583][00107] Num frames 4700... [2023-02-27 11:53:29,776][00107] Num frames 4800... [2023-02-27 11:53:29,972][00107] Num frames 4900... [2023-02-27 11:53:30,156][00107] Num frames 5000... [2023-02-27 11:53:30,377][00107] Num frames 5100... [2023-02-27 11:53:30,567][00107] Num frames 5200... [2023-02-27 11:53:30,761][00107] Num frames 5300... [2023-02-27 11:53:30,951][00107] Num frames 5400... [2023-02-27 11:53:31,126][00107] Num frames 5500... [2023-02-27 11:53:31,315][00107] Num frames 5600... [2023-02-27 11:53:31,488][00107] Num frames 5700... [2023-02-27 11:53:31,708][00107] Num frames 5800... [2023-02-27 11:53:31,885][00107] Num frames 5900... [2023-02-27 11:53:32,067][00107] Num frames 6000... [2023-02-27 11:53:32,241][00107] Num frames 6100... [2023-02-27 11:53:32,423][00107] Num frames 6200... [2023-02-27 11:53:32,608][00107] Num frames 6300... [2023-02-27 11:53:32,791][00107] Num frames 6400... [2023-02-27 11:53:32,983][00107] Num frames 6500... [2023-02-27 11:53:33,228][00107] Num frames 6600... [2023-02-27 11:53:33,485][00107] Num frames 6700... [2023-02-27 11:53:33,749][00107] Num frames 6800... [2023-02-27 11:53:34,011][00107] Num frames 6900... [2023-02-27 11:53:34,262][00107] Num frames 7000... [2023-02-27 11:53:34,511][00107] Num frames 7100... [2023-02-27 11:53:34,771][00107] Num frames 7200... [2023-02-27 11:53:35,021][00107] Num frames 7300... [2023-02-27 11:53:35,274][00107] Num frames 7400... [2023-02-27 11:53:35,526][00107] Num frames 7500... [2023-02-27 11:53:35,798][00107] Num frames 7600... [2023-02-27 11:53:36,071][00107] Num frames 7700... [2023-02-27 11:53:36,260][00107] Num frames 7800... [2023-02-27 11:53:36,445][00107] Num frames 7900... [2023-02-27 11:53:36,629][00107] Num frames 8000... [2023-02-27 11:53:36,810][00107] Num frames 8100... [2023-02-27 11:53:36,990][00107] Num frames 8200... [2023-02-27 11:53:37,180][00107] Num frames 8300... 
[2023-02-27 11:53:37,365][00107] DAMAGECOUNT value on done: 7437.0 [2023-02-27 11:53:37,368][00107] Sum rewards: 97.978, reward structure: {'DEATHCOUNT': '-15.000', 'HEALTH': '-6.285', 'AMMO5': '0.008', 'AMMO2': '0.027', 'AMMO4': '0.135', 'AMMO3': '0.252', 'WEAPON4': '0.300', 'WEAPON5': '0.300', 'weapon4': '0.398', 'weapon5': '0.542', 'weapon2': '1.376', 'WEAPON3': '1.800', 'HITCOUNT': '3.990', 'weapon3': '13.824', 'DAMAGECOUNT': '22.311', 'FRAGCOUNT': '74.000'} [2023-02-27 11:53:37,434][00107] Avg episode rewards: #0: 97.973, true rewards: #0: 74.000 [2023-02-27 11:53:37,436][00107] Avg episode reward: 97.973, avg true_objective: 74.000 [2023-02-27 11:53:37,445][00107] Num frames 8400... [2023-02-27 11:54:27,397][00107] Replay video saved to train_dir/doom_deathmatch_bots_2222/replay.mp4! [2023-02-27 12:19:45,716][00107] Environment doom_basic already registered, overwriting... [2023-02-27 12:19:45,719][00107] Environment doom_two_colors_easy already registered, overwriting... [2023-02-27 12:19:45,724][00107] Environment doom_two_colors_hard already registered, overwriting... [2023-02-27 12:19:45,726][00107] Environment doom_dm already registered, overwriting... [2023-02-27 12:19:45,733][00107] Environment doom_dwango5 already registered, overwriting... [2023-02-27 12:19:45,735][00107] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-27 12:19:45,737][00107] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-27 12:19:45,739][00107] Environment doom_my_way_home already registered, overwriting... [2023-02-27 12:19:45,741][00107] Environment doom_deadly_corridor already registered, overwriting... [2023-02-27 12:19:45,743][00107] Environment doom_defend_the_center already registered, overwriting... [2023-02-27 12:19:45,745][00107] Environment doom_defend_the_line already registered, overwriting... [2023-02-27 12:19:45,747][00107] Environment doom_health_gathering already registered, overwriting... [2023-02-27 12:19:45,748][00107] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-27 12:19:45,750][00107] Environment doom_battle already registered, overwriting... [2023-02-27 12:19:45,753][00107] Environment doom_battle2 already registered, overwriting... [2023-02-27 12:19:45,755][00107] Environment doom_duel_bots already registered, overwriting... [2023-02-27 12:19:45,757][00107] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-27 12:19:45,760][00107] Environment doom_duel already registered, overwriting... [2023-02-27 12:19:45,761][00107] Environment doom_deathmatch_full already registered, overwriting... [2023-02-27 12:19:45,763][00107] Environment doom_benchmark already registered, overwriting... [2023-02-27 12:19:45,764][00107] register_encoder_factory: [2023-02-27 12:19:45,809][00107] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 12:19:45,812][00107] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line [2023-02-27 12:19:45,819][00107] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-27 12:19:45,821][00107] Resuming existing experiment from /content/train_dir/default_experiment... 
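
Resuming reuses the saved config.json and applies only the new command-line overrides; here train_for_env_steps is raised to 10000000 while restart_behavior=resume picks up the existing experiment dir. A hedged sketch of the kind of relaunch that produces the messages above; sf_examples.vizdoom.train_vizdoom is Sample Factory's standard VizDoom training entry point, but treat the exact module path and argument handling as assumptions:

    import sys

    # Hypothetical relaunch mirroring the overrides logged above.
    from sf_examples.vizdoom.train_vizdoom import main

    sys.argv = [
        "train_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=10000000",  # the override in the log above
    ]
    main()
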
[2023-02-27 12:19:45,824][00107] Weights and Biases integration disabled [2023-02-27 12:19:45,829][00107] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-27 12:19:47,689][00107] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2023-02-27 12:19:47,692][00107] Saving configuration to /content/train_dir/default_experiment/config.json...
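
Two parameters in the dump above tie together the frame accounting seen throughout this log: with summaries_use_frameskip=True, "Total num frames" counts environment frames including the 4x frameskip, so one policy update (batch_size=1024 samples) advances the counter by 4096 frames, exactly matching the checkpoint names earlier:

    batch_size = 1024     # samples consumed per policy update (config above)
    env_frameskip = 4     # each sample spans 4 environment frames
    frames_per_update = batch_size * env_frameskip
    print(frames_per_update)          # 4096
    print(978 * frames_per_update)    # 4005888, the final checkpoint's frame count
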
[2023-02-27 12:19:47,697][00107] Rollout worker 0 uses device cpu [2023-02-27 12:19:47,699][00107] Rollout worker 1 uses device cpu [2023-02-27 12:19:47,700][00107] Rollout worker 2 uses device cpu [2023-02-27 12:19:47,702][00107] Rollout worker 3 uses device cpu [2023-02-27 12:19:47,704][00107] Rollout worker 4 uses device cpu [2023-02-27 12:19:47,705][00107] Rollout worker 5 uses device cpu [2023-02-27 12:19:47,707][00107] Rollout worker 6 uses device cpu [2023-02-27 12:19:47,708][00107] Rollout worker 7 uses device cpu [2023-02-27 12:19:47,832][00107] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:19:47,834][00107] InferenceWorker_p0-w0: min num requests: 2 [2023-02-27 12:19:47,867][00107] Starting all processes... [2023-02-27 12:19:47,869][00107] Starting process learner_proc0 [2023-02-27 12:19:48,003][00107] Starting all processes... [2023-02-27 12:19:48,013][00107] Starting process inference_proc0-0 [2023-02-27 12:19:48,013][00107] Starting process rollout_proc0 [2023-02-27 12:19:48,014][00107] Starting process rollout_proc1 [2023-02-27 12:19:48,014][00107] Starting process rollout_proc2 [2023-02-27 12:19:48,018][00107] Starting process rollout_proc4 [2023-02-27 12:19:48,018][00107] Starting process rollout_proc5 [2023-02-27 12:19:48,018][00107] Starting process rollout_proc6 [2023-02-27 12:19:48,018][00107] Starting process rollout_proc7 [2023-02-27 12:19:48,018][00107] Starting process rollout_proc3 [2023-02-27 12:19:56,416][36588] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:19:56,416][36588] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-27 12:19:56,463][36588] Num visible devices: 1 [2023-02-27 12:19:56,501][36588] Starting seed is not provided [2023-02-27 12:19:56,501][36588] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:19:56,502][36588] Initializing actor-critic model on device cuda:0 [2023-02-27 12:19:56,503][36588] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:19:56,504][36588] RunningMeanStd input shape: (1,) [2023-02-27 12:19:56,587][36588] ConvEncoder: input_channels=3 [2023-02-27 12:19:57,850][36602] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:19:57,850][36602] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-27 12:19:57,858][36588] Conv encoder output size: 512 [2023-02-27 12:19:57,864][36588] Policy head output size: 512 [2023-02-27 12:19:57,951][36602] Num visible devices: 1 [2023-02-27 12:19:58,081][36588] Created Actor Critic model with architecture: [2023-02-27 12:19:58,087][36588] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) 
) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-27 12:19:58,217][36605] Worker 2 uses CPU cores [0] [2023-02-27 12:19:58,453][36603] Worker 1 uses CPU cores [1] [2023-02-27 12:19:59,488][36611] Worker 0 uses CPU cores [0] [2023-02-27 12:19:59,706][36613] Worker 5 uses CPU cores [1] [2023-02-27 12:19:59,717][36615] Worker 4 uses CPU cores [0] [2023-02-27 12:19:59,902][36619] Worker 3 uses CPU cores [1] [2023-02-27 12:19:59,910][36617] Worker 6 uses CPU cores [0] [2023-02-27 12:19:59,993][36625] Worker 7 uses CPU cores [1] [2023-02-27 12:20:02,601][36588] Using optimizer [2023-02-27 12:20:02,602][36588] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 12:20:02,638][36588] Loading model from checkpoint [2023-02-27 12:20:02,643][36588] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2023-02-27 12:20:02,644][36588] Initialized policy 0 weights for model version 978 [2023-02-27 12:20:02,647][36588] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:20:02,655][36588] LearnerWorker_p0 finished initialization! [2023-02-27 12:20:02,894][36602] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:20:02,895][36602] RunningMeanStd input shape: (1,) [2023-02-27 12:20:02,909][36602] ConvEncoder: input_channels=3 [2023-02-27 12:20:03,016][36602] Conv encoder output size: 512 [2023-02-27 12:20:03,016][36602] Policy head output size: 512 [2023-02-27 12:20:05,511][00107] Inference worker 0-0 is ready! [2023-02-27 12:20:05,513][00107] All inference workers are ready! Signal rollout workers to start! [2023-02-27 12:20:05,652][36603] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:20:05,655][36625] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:20:05,672][36613] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:20:05,681][36619] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:20:05,690][36617] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:20:05,689][36615] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:20:05,701][36605] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:20:05,703][36611] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:20:05,830][00107] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:20:06,948][36611] Decorrelating experience for 0 frames... [2023-02-27 12:20:06,945][36617] Decorrelating experience for 0 frames... [2023-02-27 12:20:06,952][36615] Decorrelating experience for 0 frames... [2023-02-27 12:20:07,284][36625] Decorrelating experience for 0 frames... [2023-02-27 12:20:07,293][36603] Decorrelating experience for 0 frames... [2023-02-27 12:20:07,306][36619] Decorrelating experience for 0 frames... [2023-02-27 12:20:07,326][36613] Decorrelating experience for 0 frames... [2023-02-27 12:20:07,691][36625] Decorrelating experience for 32 frames... [2023-02-27 12:20:07,804][36611] Decorrelating experience for 32 frames... [2023-02-27 12:20:07,808][36617] Decorrelating experience for 32 frames... 
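Two details are worth noting in this startup block. First, checkpoint filenames appear to encode both counters as checkpoint_{train_step:09d}_{env_steps}.pth, and with batch_size=1024 and env_frameskip=4 each train step accounts for 1024 × 4 = 4096 environment frames, which is exactly the ratio in the loaded file (978 × 4096 = 4005888). Second, the "Decorrelating experience for 0/32/64/96 frames" lines stagger the four envs on each worker by multiples of the rollout length (rollout=32). A small check of the naming convention, inferred from the filenames rather than taken from Sample Factory source:

```python
def checkpoint_name(train_step: int, env_steps: int) -> str:
    # Pattern inferred from this log: 9-digit zero-padded train step,
    # followed by the raw env-frame counter.
    return f"checkpoint_{train_step:09d}_{env_steps}.pth"

assert checkpoint_name(978, 978 * 4096) == "checkpoint_000000978_4005888.pth"
```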
[2023-02-27 12:20:07,824][00107] Heartbeat connected on Batcher_0 [2023-02-27 12:20:07,830][00107] Heartbeat connected on LearnerWorker_p0 [2023-02-27 12:20:07,861][00107] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-27 12:20:08,282][36615] Decorrelating experience for 32 frames... [2023-02-27 12:20:08,807][36611] Decorrelating experience for 64 frames... [2023-02-27 12:20:08,866][36613] Decorrelating experience for 32 frames... [2023-02-27 12:20:09,097][36603] Decorrelating experience for 32 frames... [2023-02-27 12:20:09,102][36619] Decorrelating experience for 32 frames... [2023-02-27 12:20:09,239][36615] Decorrelating experience for 64 frames... [2023-02-27 12:20:09,784][36611] Decorrelating experience for 96 frames... [2023-02-27 12:20:09,783][36625] Decorrelating experience for 64 frames... [2023-02-27 12:20:10,022][00107] Heartbeat connected on RolloutWorker_w0 [2023-02-27 12:20:10,234][36617] Decorrelating experience for 64 frames... [2023-02-27 12:20:10,777][36613] Decorrelating experience for 64 frames... [2023-02-27 12:20:10,830][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:20:11,325][36615] Decorrelating experience for 96 frames... [2023-02-27 12:20:11,346][36603] Decorrelating experience for 64 frames... [2023-02-27 12:20:11,379][36605] Decorrelating experience for 0 frames... [2023-02-27 12:20:11,419][36619] Decorrelating experience for 64 frames... [2023-02-27 12:20:11,742][00107] Heartbeat connected on RolloutWorker_w4 [2023-02-27 12:20:12,196][36617] Decorrelating experience for 96 frames... [2023-02-27 12:20:12,858][00107] Heartbeat connected on RolloutWorker_w6 [2023-02-27 12:20:13,551][36625] Decorrelating experience for 96 frames... [2023-02-27 12:20:13,738][36603] Decorrelating experience for 96 frames... [2023-02-27 12:20:13,823][36619] Decorrelating experience for 96 frames... [2023-02-27 12:20:14,181][00107] Heartbeat connected on RolloutWorker_w7 [2023-02-27 12:20:14,427][36605] Decorrelating experience for 32 frames... [2023-02-27 12:20:14,541][00107] Heartbeat connected on RolloutWorker_w1 [2023-02-27 12:20:14,586][00107] Heartbeat connected on RolloutWorker_w3 [2023-02-27 12:20:15,830][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 60.8. Samples: 608. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:20:15,832][00107] Avg episode reward: [(0, '2.176')] [2023-02-27 12:20:17,828][36613] Decorrelating experience for 96 frames... [2023-02-27 12:20:18,416][00107] Heartbeat connected on RolloutWorker_w5 [2023-02-27 12:20:19,243][36588] Signal inference workers to stop experience collection... [2023-02-27 12:20:19,261][36602] InferenceWorker_p0-w0: stopping experience collection [2023-02-27 12:20:19,488][36605] Decorrelating experience for 64 frames... [2023-02-27 12:20:20,052][36605] Decorrelating experience for 96 frames... [2023-02-27 12:20:20,119][00107] Heartbeat connected on RolloutWorker_w2 [2023-02-27 12:20:20,830][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 166.7. Samples: 2500. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:20:20,832][00107] Avg episode reward: [(0, '2.882')] [2023-02-27 12:20:22,231][36588] Signal inference workers to resume experience collection... 
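The recurring "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" lines that follow report throughput over trailing windows clipped to the time elapsed since this resumed run started, counted in environment frames with frameskip included. A hedged sketch of that computation, checked against the values logged at 12:20:25 (16384 frames gained over the first 20 seconds):

```python
from collections import deque

class FpsTracker:
    """Trailing-window FPS; each window clips to total elapsed time."""

    def __init__(self):
        self.history = deque()  # (seconds, total_env_frames)

    def record(self, t: float, frames: int) -> None:
        self.history.append((t, frames))

    def fps(self, window: float) -> float:
        now, frames = self.history[-1]
        # Earliest sample still inside the window; falls back to the oldest
        # sample, which clips the window to the elapsed time.
        past_t, past_frames = next(
            (t, f) for t, f in self.history if t >= now - window
        )
        dt = now - past_t
        return (frames - past_frames) / dt if dt > 0 else 0.0

tracker = FpsTracker()
tracker.record(0.0, 4005888)    # 12:20:05
tracker.record(10.0, 4005888)   # 12:20:15
tracker.record(20.0, 4022272)   # 12:20:25
print(tracker.fps(10))   # 1638.4 -> the "10 sec" column
print(tracker.fps(60))   # 819.2  -> "60 sec"/"300 sec", clipped to 20 s
```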
[2023-02-27 12:20:22,233][36602] InferenceWorker_p0-w0: resuming experience collection [2023-02-27 12:20:25,830][00107] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4022272. Throughput: 0: 170.4. Samples: 3408. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-02-27 12:20:25,838][00107] Avg episode reward: [(0, '3.743')] [2023-02-27 12:20:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 4042752. Throughput: 0: 364.3. Samples: 9108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:20:30,833][00107] Avg episode reward: [(0, '4.363')] [2023-02-27 12:20:31,173][36602] Updated weights for policy 0, policy_version 988 (0.0021) [2023-02-27 12:20:35,831][00107] Fps is (10 sec: 3685.9, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 4059136. Throughput: 0: 465.8. Samples: 13976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 12:20:35,842][00107] Avg episode reward: [(0, '4.626')] [2023-02-27 12:20:40,830][00107] Fps is (10 sec: 2867.2, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 4071424. Throughput: 0: 455.8. Samples: 15952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:20:40,832][00107] Avg episode reward: [(0, '4.666')] [2023-02-27 12:20:44,462][36602] Updated weights for policy 0, policy_version 998 (0.0017) [2023-02-27 12:20:45,830][00107] Fps is (10 sec: 3277.3, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 4091904. Throughput: 0: 526.3. Samples: 21052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:20:45,832][00107] Avg episode reward: [(0, '4.639')] [2023-02-27 12:20:50,830][00107] Fps is (10 sec: 4096.0, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 4112384. Throughput: 0: 608.6. Samples: 27388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:20:50,832][00107] Avg episode reward: [(0, '4.710')] [2023-02-27 12:20:55,830][00107] Fps is (10 sec: 3276.8, 60 sec: 2375.7, 300 sec: 2375.7). Total num frames: 4124672. Throughput: 0: 658.4. Samples: 29628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:20:55,837][00107] Avg episode reward: [(0, '4.563')] [2023-02-27 12:20:56,202][36602] Updated weights for policy 0, policy_version 1008 (0.0023) [2023-02-27 12:21:00,830][00107] Fps is (10 sec: 2867.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 4141056. Throughput: 0: 732.8. Samples: 33584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:21:00,832][00107] Avg episode reward: [(0, '4.694')] [2023-02-27 12:21:05,829][00107] Fps is (10 sec: 3686.4, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 4161536. Throughput: 0: 825.5. Samples: 39648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 12:21:05,836][00107] Avg episode reward: [(0, '4.608')] [2023-02-27 12:21:07,502][36602] Updated weights for policy 0, policy_version 1018 (0.0014) [2023-02-27 12:21:10,831][00107] Fps is (10 sec: 4095.4, 60 sec: 2935.4, 300 sec: 2709.6). Total num frames: 4182016. Throughput: 0: 875.4. Samples: 42804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:21:10,833][00107] Avg episode reward: [(0, '4.884')] [2023-02-27 12:21:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2691.7). Total num frames: 4194304. Throughput: 0: 854.9. Samples: 47580. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:21:15,834][00107] Avg episode reward: [(0, '4.874')] [2023-02-27 12:21:20,434][36602] Updated weights for policy 0, policy_version 1028 (0.0026) [2023-02-27 12:21:20,830][00107] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 2730.7). Total num frames: 4210688. Throughput: 0: 845.2. Samples: 52008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 12:21:20,832][00107] Avg episode reward: [(0, '4.864')] [2023-02-27 12:21:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2816.0). Total num frames: 4231168. Throughput: 0: 871.8. Samples: 55182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:21:25,837][00107] Avg episode reward: [(0, '4.847')] [2023-02-27 12:21:30,175][36602] Updated weights for policy 0, policy_version 1038 (0.0014) [2023-02-27 12:21:30,836][00107] Fps is (10 sec: 4093.3, 60 sec: 3481.2, 300 sec: 2891.1). Total num frames: 4251648. Throughput: 0: 901.1. Samples: 61606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:21:30,846][00107] Avg episode reward: [(0, '4.728')] [2023-02-27 12:21:35,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 2867.2). Total num frames: 4263936. Throughput: 0: 850.3. Samples: 65650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:21:35,836][00107] Avg episode reward: [(0, '4.721')] [2023-02-27 12:21:40,829][00107] Fps is (10 sec: 2869.1, 60 sec: 3481.6, 300 sec: 2888.8). Total num frames: 4280320. Throughput: 0: 845.2. Samples: 67664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:21:40,831][00107] Avg episode reward: [(0, '4.730')] [2023-02-27 12:21:42,990][36602] Updated weights for policy 0, policy_version 1048 (0.0019) [2023-02-27 12:21:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2949.1). Total num frames: 4300800. Throughput: 0: 894.3. Samples: 73828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:21:45,832][00107] Avg episode reward: [(0, '4.769')] [2023-02-27 12:21:45,845][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001050_4300800.pth... [2023-02-27 12:21:46,004][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000916_3751936.pth [2023-02-27 12:21:50,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3003.7). Total num frames: 4321280. Throughput: 0: 885.9. Samples: 79514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:21:50,832][00107] Avg episode reward: [(0, '4.748')] [2023-02-27 12:21:54,951][36602] Updated weights for policy 0, policy_version 1058 (0.0025) [2023-02-27 12:21:55,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2978.9). Total num frames: 4333568. Throughput: 0: 860.1. Samples: 81506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:21:55,836][00107] Avg episode reward: [(0, '4.735')] [2023-02-27 12:22:00,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 2991.9). Total num frames: 4349952. Throughput: 0: 855.6. Samples: 86082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:22:00,832][00107] Avg episode reward: [(0, '5.041')] [2023-02-27 12:22:05,772][36602] Updated weights for policy 0, policy_version 1068 (0.0013) [2023-02-27 12:22:05,830][00107] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3072.0). Total num frames: 4374528. Throughput: 0: 899.0. Samples: 92462. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:22:05,837][00107] Avg episode reward: [(0, '5.104')] [2023-02-27 12:22:10,832][00107] Fps is (10 sec: 4094.8, 60 sec: 3481.5, 300 sec: 3080.1). Total num frames: 4390912. Throughput: 0: 895.5. Samples: 95480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:22:10,839][00107] Avg episode reward: [(0, '4.768')] [2023-02-27 12:22:15,830][00107] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3056.2). Total num frames: 4403200. Throughput: 0: 842.6. Samples: 99516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:22:15,834][00107] Avg episode reward: [(0, '4.697')] [2023-02-27 12:22:18,981][36602] Updated weights for policy 0, policy_version 1078 (0.0022) [2023-02-27 12:22:20,830][00107] Fps is (10 sec: 2868.0, 60 sec: 3481.6, 300 sec: 3064.4). Total num frames: 4419584. Throughput: 0: 868.0. Samples: 104708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:22:20,831][00107] Avg episode reward: [(0, '4.698')] [2023-02-27 12:22:25,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3130.5). Total num frames: 4444160. Throughput: 0: 893.1. Samples: 107852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:22:25,838][00107] Avg episode reward: [(0, '4.874')] [2023-02-27 12:22:29,293][36602] Updated weights for policy 0, policy_version 1088 (0.0013) [2023-02-27 12:22:30,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3482.0, 300 sec: 3135.6). Total num frames: 4460544. Throughput: 0: 880.1. Samples: 113434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:22:30,832][00107] Avg episode reward: [(0, '4.865')] [2023-02-27 12:22:35,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3113.0). Total num frames: 4472832. Throughput: 0: 844.0. Samples: 117492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:22:35,835][00107] Avg episode reward: [(0, '4.813')] [2023-02-27 12:22:40,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3144.7). Total num frames: 4493312. Throughput: 0: 859.4. Samples: 120180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:22:40,835][00107] Avg episode reward: [(0, '4.616')] [2023-02-27 12:22:41,692][36602] Updated weights for policy 0, policy_version 1098 (0.0014) [2023-02-27 12:22:45,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3174.4). Total num frames: 4513792. Throughput: 0: 899.4. Samples: 126556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:22:45,835][00107] Avg episode reward: [(0, '4.878')] [2023-02-27 12:22:50,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3177.5). Total num frames: 4530176. Throughput: 0: 867.6. Samples: 131502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:22:50,836][00107] Avg episode reward: [(0, '4.917')] [2023-02-27 12:22:53,679][36602] Updated weights for policy 0, policy_version 1108 (0.0013) [2023-02-27 12:22:55,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3156.3). Total num frames: 4542464. Throughput: 0: 845.3. Samples: 133514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:22:55,837][00107] Avg episode reward: [(0, '4.937')] [2023-02-27 12:23:00,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3183.2). Total num frames: 4562944. Throughput: 0: 876.8. Samples: 138974. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:23:00,832][00107] Avg episode reward: [(0, '4.930')] [2023-02-27 12:23:04,233][36602] Updated weights for policy 0, policy_version 1118 (0.0030) [2023-02-27 12:23:05,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3208.5). Total num frames: 4583424. Throughput: 0: 904.2. Samples: 145396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:23:05,837][00107] Avg episode reward: [(0, '4.867')] [2023-02-27 12:23:10,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3210.4). Total num frames: 4599808. Throughput: 0: 883.0. Samples: 147586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:23:10,833][00107] Avg episode reward: [(0, '5.048')] [2023-02-27 12:23:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3190.6). Total num frames: 4612096. Throughput: 0: 847.7. Samples: 151580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:23:15,833][00107] Avg episode reward: [(0, '4.911')] [2023-02-27 12:23:17,256][36602] Updated weights for policy 0, policy_version 1128 (0.0026) [2023-02-27 12:23:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3213.8). Total num frames: 4632576. Throughput: 0: 892.4. Samples: 157652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:23:20,836][00107] Avg episode reward: [(0, '4.753')] [2023-02-27 12:23:25,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3235.8). Total num frames: 4653056. Throughput: 0: 903.2. Samples: 160824. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:23:25,834][00107] Avg episode reward: [(0, '4.977')] [2023-02-27 12:23:27,537][36602] Updated weights for policy 0, policy_version 1138 (0.0014) [2023-02-27 12:23:30,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3236.8). Total num frames: 4669440. Throughput: 0: 869.2. Samples: 165672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:23:30,838][00107] Avg episode reward: [(0, '5.020')] [2023-02-27 12:23:35,830][00107] Fps is (10 sec: 2867.0, 60 sec: 3481.6, 300 sec: 3218.3). Total num frames: 4681728. Throughput: 0: 857.9. Samples: 170110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:23:35,835][00107] Avg episode reward: [(0, '4.918')] [2023-02-27 12:23:39,896][36602] Updated weights for policy 0, policy_version 1148 (0.0015) [2023-02-27 12:23:40,829][00107] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3257.8). Total num frames: 4706304. Throughput: 0: 883.8. Samples: 173284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:23:40,835][00107] Avg episode reward: [(0, '5.014')] [2023-02-27 12:23:45,830][00107] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3258.2). Total num frames: 4722688. Throughput: 0: 901.7. Samples: 179552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:23:45,834][00107] Avg episode reward: [(0, '4.948')] [2023-02-27 12:23:45,848][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001153_4722688.pth... [2023-02-27 12:23:46,082][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2023-02-27 12:23:50,832][00107] Fps is (10 sec: 2866.4, 60 sec: 3413.2, 300 sec: 3240.4). Total num frames: 4734976. Throughput: 0: 849.8. Samples: 183638. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:23:50,835][00107] Avg episode reward: [(0, '4.771')] [2023-02-27 12:23:52,456][36602] Updated weights for policy 0, policy_version 1158 (0.0013) [2023-02-27 12:23:55,834][00107] Fps is (10 sec: 3275.4, 60 sec: 3549.6, 300 sec: 3258.9). Total num frames: 4755456. Throughput: 0: 846.5. Samples: 185684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:23:55,842][00107] Avg episode reward: [(0, '4.555')] [2023-02-27 12:24:00,830][00107] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 4775936. Throughput: 0: 898.8. Samples: 192024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:24:00,832][00107] Avg episode reward: [(0, '4.586')] [2023-02-27 12:24:02,396][36602] Updated weights for policy 0, policy_version 1168 (0.0024) [2023-02-27 12:24:05,830][00107] Fps is (10 sec: 3688.0, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 4792320. Throughput: 0: 893.7. Samples: 197868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:05,836][00107] Avg episode reward: [(0, '4.651')] [2023-02-27 12:24:10,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 4808704. Throughput: 0: 867.4. Samples: 199858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:24:10,837][00107] Avg episode reward: [(0, '4.661')] [2023-02-27 12:24:15,363][36602] Updated weights for policy 0, policy_version 1178 (0.0036) [2023-02-27 12:24:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 4825088. Throughput: 0: 860.0. Samples: 204372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:24:15,833][00107] Avg episode reward: [(0, '4.863')] [2023-02-27 12:24:20,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3292.9). Total num frames: 4845568. Throughput: 0: 904.1. Samples: 210792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:20,835][00107] Avg episode reward: [(0, '4.967')] [2023-02-27 12:24:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3292.6). Total num frames: 4861952. Throughput: 0: 900.2. Samples: 213792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:25,831][00107] Avg episode reward: [(0, '4.663')] [2023-02-27 12:24:25,921][36602] Updated weights for policy 0, policy_version 1188 (0.0017) [2023-02-27 12:24:30,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3292.3). Total num frames: 4878336. Throughput: 0: 850.3. Samples: 217814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:30,834][00107] Avg episode reward: [(0, '4.690')] [2023-02-27 12:24:35,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3292.0). Total num frames: 4894720. Throughput: 0: 879.3. Samples: 223206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:24:35,837][00107] Avg episode reward: [(0, '4.712')] [2023-02-27 12:24:37,995][36602] Updated weights for policy 0, policy_version 1198 (0.0019) [2023-02-27 12:24:40,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3306.6). Total num frames: 4915200. Throughput: 0: 904.3. Samples: 226374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:24:40,832][00107] Avg episode reward: [(0, '4.512')] [2023-02-27 12:24:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3306.1). Total num frames: 4931584. Throughput: 0: 882.2. Samples: 231724. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:45,834][00107] Avg episode reward: [(0, '4.745')] [2023-02-27 12:24:50,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3291.2). Total num frames: 4943872. Throughput: 0: 839.6. Samples: 235650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:24:50,832][00107] Avg episode reward: [(0, '4.700')] [2023-02-27 12:24:51,261][36602] Updated weights for policy 0, policy_version 1208 (0.0015) [2023-02-27 12:24:55,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3305.0). Total num frames: 4964352. Throughput: 0: 860.2. Samples: 238566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:24:55,832][00107] Avg episode reward: [(0, '4.823')] [2023-02-27 12:25:00,810][36602] Updated weights for policy 0, policy_version 1218 (0.0014) [2023-02-27 12:25:00,829][00107] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3332.3). Total num frames: 4988928. Throughput: 0: 903.1. Samples: 245010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:25:00,835][00107] Avg episode reward: [(0, '4.704')] [2023-02-27 12:25:05,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 5001216. Throughput: 0: 863.3. Samples: 249642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:25:05,832][00107] Avg episode reward: [(0, '4.627')] [2023-02-27 12:25:10,829][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 5013504. Throughput: 0: 839.7. Samples: 251580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:25:10,832][00107] Avg episode reward: [(0, '4.634')] [2023-02-27 12:25:14,092][36602] Updated weights for policy 0, policy_version 1228 (0.0030) [2023-02-27 12:25:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 5033984. Throughput: 0: 872.8. Samples: 257090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:25:15,833][00107] Avg episode reward: [(0, '4.637')] [2023-02-27 12:25:20,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5054464. Throughput: 0: 894.4. Samples: 263452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:25:20,832][00107] Avg episode reward: [(0, '4.703')] [2023-02-27 12:25:25,550][36602] Updated weights for policy 0, policy_version 1238 (0.0014) [2023-02-27 12:25:25,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 5070848. Throughput: 0: 868.7. Samples: 265466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:25:25,835][00107] Avg episode reward: [(0, '4.851')] [2023-02-27 12:25:30,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 5083136. Throughput: 0: 839.7. Samples: 269510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:25:30,835][00107] Avg episode reward: [(0, '4.982')] [2023-02-27 12:25:35,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5103616. Throughput: 0: 891.3. Samples: 275760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:25:35,838][00107] Avg episode reward: [(0, '4.936')] [2023-02-27 12:25:36,810][36602] Updated weights for policy 0, policy_version 1248 (0.0015) [2023-02-27 12:25:40,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5124096. Throughput: 0: 896.1. Samples: 278892. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:25:40,835][00107] Avg episode reward: [(0, '4.644')] [2023-02-27 12:25:45,834][00107] Fps is (10 sec: 3684.7, 60 sec: 3481.3, 300 sec: 3485.0). Total num frames: 5140480. Throughput: 0: 851.8. Samples: 283346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:25:45,837][00107] Avg episode reward: [(0, '4.483')] [2023-02-27 12:25:45,856][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001255_5140480.pth... [2023-02-27 12:25:46,048][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001050_4300800.pth [2023-02-27 12:25:49,864][36602] Updated weights for policy 0, policy_version 1258 (0.0013) [2023-02-27 12:25:50,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 5152768. Throughput: 0: 849.9. Samples: 287888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:25:50,832][00107] Avg episode reward: [(0, '4.541')] [2023-02-27 12:25:55,829][00107] Fps is (10 sec: 3688.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5177344. Throughput: 0: 877.7. Samples: 291076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:25:55,836][00107] Avg episode reward: [(0, '4.807')] [2023-02-27 12:26:00,020][36602] Updated weights for policy 0, policy_version 1268 (0.0019) [2023-02-27 12:26:00,831][00107] Fps is (10 sec: 4095.3, 60 sec: 3413.2, 300 sec: 3498.9). Total num frames: 5193728. Throughput: 0: 894.5. Samples: 297346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:26:00,839][00107] Avg episode reward: [(0, '4.960')] [2023-02-27 12:26:05,831][00107] Fps is (10 sec: 2866.7, 60 sec: 3413.2, 300 sec: 3471.2). Total num frames: 5206016. Throughput: 0: 841.3. Samples: 301314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:26:05,837][00107] Avg episode reward: [(0, '4.858')] [2023-02-27 12:26:10,830][00107] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5226496. Throughput: 0: 841.4. Samples: 303328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:26:10,832][00107] Avg episode reward: [(0, '4.673')] [2023-02-27 12:26:12,661][36602] Updated weights for policy 0, policy_version 1278 (0.0022) [2023-02-27 12:26:15,830][00107] Fps is (10 sec: 4096.7, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5246976. Throughput: 0: 891.3. Samples: 309620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:26:15,831][00107] Avg episode reward: [(0, '4.717')] [2023-02-27 12:26:20,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5263360. Throughput: 0: 875.6. Samples: 315162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:26:20,837][00107] Avg episode reward: [(0, '4.912')] [2023-02-27 12:26:25,015][36602] Updated weights for policy 0, policy_version 1288 (0.0028) [2023-02-27 12:26:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.3). Total num frames: 5275648. Throughput: 0: 850.1. Samples: 317146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:26:25,837][00107] Avg episode reward: [(0, '5.133')] [2023-02-27 12:26:30,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5296128. Throughput: 0: 860.4. Samples: 322062. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:26:30,831][00107] Avg episode reward: [(0, '4.866')] [2023-02-27 12:26:35,081][36602] Updated weights for policy 0, policy_version 1298 (0.0012) [2023-02-27 12:26:35,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5316608. Throughput: 0: 903.7. Samples: 328554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:35,837][00107] Avg episode reward: [(0, '4.702')] [2023-02-27 12:26:40,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5332992. Throughput: 0: 895.5. Samples: 331374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:40,832][00107] Avg episode reward: [(0, '4.650')] [2023-02-27 12:26:45,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.6, 300 sec: 3471.2). Total num frames: 5345280. Throughput: 0: 844.2. Samples: 335332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:45,834][00107] Avg episode reward: [(0, '4.826')] [2023-02-27 12:26:48,138][36602] Updated weights for policy 0, policy_version 1308 (0.0017) [2023-02-27 12:26:50,830][00107] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 5365760. Throughput: 0: 882.1. Samples: 341008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:50,838][00107] Avg episode reward: [(0, '4.826')] [2023-02-27 12:26:55,829][00107] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 5390336. Throughput: 0: 908.9. Samples: 344230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:26:55,831][00107] Avg episode reward: [(0, '4.777')] [2023-02-27 12:26:58,250][36602] Updated weights for policy 0, policy_version 1318 (0.0012) [2023-02-27 12:27:00,832][00107] Fps is (10 sec: 3685.9, 60 sec: 3481.6, 300 sec: 3485.0). Total num frames: 5402624. Throughput: 0: 887.4. Samples: 349554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:00,834][00107] Avg episode reward: [(0, '4.742')] [2023-02-27 12:27:05,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3485.1). Total num frames: 5419008. Throughput: 0: 856.8. Samples: 353716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:05,836][00107] Avg episode reward: [(0, '4.604')] [2023-02-27 12:27:10,542][36602] Updated weights for policy 0, policy_version 1328 (0.0018) [2023-02-27 12:27:10,830][00107] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5439488. Throughput: 0: 883.1. Samples: 356886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:27:10,836][00107] Avg episode reward: [(0, '4.620')] [2023-02-27 12:27:15,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 5459968. Throughput: 0: 912.2. Samples: 363112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:27:15,834][00107] Avg episode reward: [(0, '4.684')] [2023-02-27 12:27:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 5472256. Throughput: 0: 868.1. Samples: 367618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:27:20,832][00107] Avg episode reward: [(0, '4.719')] [2023-02-27 12:27:22,891][36602] Updated weights for policy 0, policy_version 1338 (0.0012) [2023-02-27 12:27:25,832][00107] Fps is (10 sec: 2866.4, 60 sec: 3549.7, 300 sec: 3485.0). Total num frames: 5488640. Throughput: 0: 850.7. Samples: 369656. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:25,838][00107] Avg episode reward: [(0, '4.979')] [2023-02-27 12:27:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5509120. Throughput: 0: 895.6. Samples: 375632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:27:30,839][00107] Avg episode reward: [(0, '4.626')] [2023-02-27 12:27:33,049][36602] Updated weights for policy 0, policy_version 1348 (0.0013) [2023-02-27 12:27:35,830][00107] Fps is (10 sec: 4097.1, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5529600. Throughput: 0: 908.3. Samples: 381882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:27:35,835][00107] Avg episode reward: [(0, '4.457')] [2023-02-27 12:27:40,830][00107] Fps is (10 sec: 3276.5, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 5541888. Throughput: 0: 881.0. Samples: 383874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:27:40,838][00107] Avg episode reward: [(0, '4.575')] [2023-02-27 12:27:45,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 5558272. Throughput: 0: 857.2. Samples: 388124. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:27:45,832][00107] Avg episode reward: [(0, '4.583')] [2023-02-27 12:27:45,900][36602] Updated weights for policy 0, policy_version 1358 (0.0015) [2023-02-27 12:27:45,900][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001358_5562368.pth... [2023-02-27 12:27:46,046][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001153_4722688.pth [2023-02-27 12:27:50,830][00107] Fps is (10 sec: 4096.4, 60 sec: 3618.2, 300 sec: 3526.7). Total num frames: 5582848. Throughput: 0: 906.8. Samples: 394520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:50,837][00107] Avg episode reward: [(0, '4.509')] [2023-02-27 12:27:55,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 5599232. Throughput: 0: 906.9. Samples: 397698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:27:55,833][00107] Avg episode reward: [(0, '4.634')] [2023-02-27 12:27:56,506][36602] Updated weights for policy 0, policy_version 1368 (0.0023) [2023-02-27 12:28:00,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 5611520. Throughput: 0: 861.0. Samples: 401858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:28:00,833][00107] Avg episode reward: [(0, '4.786')] [2023-02-27 12:28:05,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5632000. Throughput: 0: 881.6. Samples: 407290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:28:05,839][00107] Avg episode reward: [(0, '4.794')] [2023-02-27 12:28:08,173][36602] Updated weights for policy 0, policy_version 1378 (0.0012) [2023-02-27 12:28:10,830][00107] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 5652480. Throughput: 0: 908.9. Samples: 410552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:10,832][00107] Avg episode reward: [(0, '4.530')] [2023-02-27 12:28:15,830][00107] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 5668864. Throughput: 0: 904.9. Samples: 416352. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:15,833][00107] Avg episode reward: [(0, '4.632')] [2023-02-27 12:28:20,445][36602] Updated weights for policy 0, policy_version 1388 (0.0014) [2023-02-27 12:28:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5685248. Throughput: 0: 857.4. Samples: 420466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:20,839][00107] Avg episode reward: [(0, '4.766')] [2023-02-27 12:28:25,829][00107] Fps is (10 sec: 3686.6, 60 sec: 3618.3, 300 sec: 3512.8). Total num frames: 5705728. Throughput: 0: 872.0. Samples: 423112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:25,834][00107] Avg episode reward: [(0, '4.926')] [2023-02-27 12:28:30,479][36602] Updated weights for policy 0, policy_version 1398 (0.0016) [2023-02-27 12:28:30,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 5726208. Throughput: 0: 923.2. Samples: 429668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:28:30,831][00107] Avg episode reward: [(0, '5.133')] [2023-02-27 12:28:35,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5742592. Throughput: 0: 892.7. Samples: 434692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:28:35,834][00107] Avg episode reward: [(0, '5.053')] [2023-02-27 12:28:40,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5754880. Throughput: 0: 867.2. Samples: 436720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:28:40,839][00107] Avg episode reward: [(0, '5.075')] [2023-02-27 12:28:43,246][36602] Updated weights for policy 0, policy_version 1408 (0.0020) [2023-02-27 12:28:45,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.8). Total num frames: 5775360. Throughput: 0: 893.6. Samples: 442070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:28:45,836][00107] Avg episode reward: [(0, '4.934')] [2023-02-27 12:28:50,830][00107] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 5795840. Throughput: 0: 917.3. Samples: 448568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:28:50,838][00107] Avg episode reward: [(0, '5.085')] [2023-02-27 12:28:53,768][36602] Updated weights for policy 0, policy_version 1418 (0.0015) [2023-02-27 12:28:55,833][00107] Fps is (10 sec: 3685.0, 60 sec: 3549.6, 300 sec: 3512.8). Total num frames: 5812224. Throughput: 0: 893.5. Samples: 450764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:28:55,840][00107] Avg episode reward: [(0, '5.213')] [2023-02-27 12:29:00,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5824512. Throughput: 0: 856.5. Samples: 454896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:29:00,838][00107] Avg episode reward: [(0, '5.082')] [2023-02-27 12:29:05,475][36602] Updated weights for policy 0, policy_version 1428 (0.0019) [2023-02-27 12:29:05,830][00107] Fps is (10 sec: 3687.8, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 5849088. Throughput: 0: 905.2. Samples: 461202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:29:05,838][00107] Avg episode reward: [(0, '5.286')] [2023-02-27 12:29:10,834][00107] Fps is (10 sec: 4503.5, 60 sec: 3617.8, 300 sec: 3540.6). Total num frames: 5869568. Throughput: 0: 919.6. Samples: 464500. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:29:10,838][00107] Avg episode reward: [(0, '5.030')] [2023-02-27 12:29:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5881856. Throughput: 0: 880.6. Samples: 469296. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:29:15,832][00107] Avg episode reward: [(0, '5.182')] [2023-02-27 12:29:17,765][36602] Updated weights for policy 0, policy_version 1438 (0.0016) [2023-02-27 12:29:20,829][00107] Fps is (10 sec: 2868.6, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5898240. Throughput: 0: 871.2. Samples: 473894. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-02-27 12:29:20,837][00107] Avg episode reward: [(0, '5.186')] [2023-02-27 12:29:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 5918720. Throughput: 0: 897.8. Samples: 477122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:29:25,831][00107] Avg episode reward: [(0, '5.409')] [2023-02-27 12:29:25,851][36588] Saving new best policy, reward=5.409! [2023-02-27 12:29:28,099][36602] Updated weights for policy 0, policy_version 1448 (0.0030) [2023-02-27 12:29:30,832][00107] Fps is (10 sec: 4094.8, 60 sec: 3549.7, 300 sec: 3540.6). Total num frames: 5939200. Throughput: 0: 919.1. Samples: 483434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:29:30,840][00107] Avg episode reward: [(0, '5.245')] [2023-02-27 12:29:35,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 5951488. Throughput: 0: 865.6. Samples: 487518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:29:35,833][00107] Avg episode reward: [(0, '5.295')] [2023-02-27 12:29:40,689][36602] Updated weights for policy 0, policy_version 1458 (0.0034) [2023-02-27 12:29:40,830][00107] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 5971968. Throughput: 0: 864.2. Samples: 489650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:29:40,837][00107] Avg episode reward: [(0, '5.441')] [2023-02-27 12:29:40,842][36588] Saving new best policy, reward=5.441! [2023-02-27 12:29:45,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 5992448. Throughput: 0: 911.2. Samples: 495898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:29:45,837][00107] Avg episode reward: [(0, '5.075')] [2023-02-27 12:29:45,850][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001463_5992448.pth... [2023-02-27 12:29:45,982][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001255_5140480.pth [2023-02-27 12:29:50,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 6008832. Throughput: 0: 892.4. Samples: 501360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:29:50,835][00107] Avg episode reward: [(0, '5.114')] [2023-02-27 12:29:51,988][36602] Updated weights for policy 0, policy_version 1468 (0.0019) [2023-02-27 12:29:55,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3499.0). Total num frames: 6021120. Throughput: 0: 864.0. Samples: 503376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:29:55,835][00107] Avg episode reward: [(0, '5.221')] [2023-02-27 12:30:00,832][00107] Fps is (10 sec: 3275.9, 60 sec: 3618.0, 300 sec: 3526.7). Total num frames: 6041600. Throughput: 0: 870.4. Samples: 508466. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:00,837][00107] Avg episode reward: [(0, '5.041')] [2023-02-27 12:30:03,384][36602] Updated weights for policy 0, policy_version 1478 (0.0022) [2023-02-27 12:30:05,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 6062080. Throughput: 0: 913.9. Samples: 515018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:05,837][00107] Avg episode reward: [(0, '4.857')] [2023-02-27 12:30:10,830][00107] Fps is (10 sec: 3687.4, 60 sec: 3481.9, 300 sec: 3540.6). Total num frames: 6078464. Throughput: 0: 903.5. Samples: 517778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:10,836][00107] Avg episode reward: [(0, '5.128')] [2023-02-27 12:30:15,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 6090752. Throughput: 0: 851.4. Samples: 521746. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:30:15,832][00107] Avg episode reward: [(0, '5.219')] [2023-02-27 12:30:15,947][36602] Updated weights for policy 0, policy_version 1488 (0.0033) [2023-02-27 12:30:20,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 6111232. Throughput: 0: 887.5. Samples: 527456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:30:20,836][00107] Avg episode reward: [(0, '5.349')] [2023-02-27 12:30:25,740][36602] Updated weights for policy 0, policy_version 1498 (0.0013) [2023-02-27 12:30:25,830][00107] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 6135808. Throughput: 0: 911.3. Samples: 530660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:30:25,837][00107] Avg episode reward: [(0, '5.295')] [2023-02-27 12:30:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3540.6). Total num frames: 6148096. Throughput: 0: 887.2. Samples: 535822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:30,832][00107] Avg episode reward: [(0, '5.060')] [2023-02-27 12:30:35,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 6164480. Throughput: 0: 854.7. Samples: 539822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:30:35,836][00107] Avg episode reward: [(0, '5.154')] [2023-02-27 12:30:38,574][36602] Updated weights for policy 0, policy_version 1508 (0.0021) [2023-02-27 12:30:40,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.7). Total num frames: 6184960. Throughput: 0: 880.2. Samples: 542986. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:30:40,832][00107] Avg episode reward: [(0, '4.977')] [2023-02-27 12:30:45,830][00107] Fps is (10 sec: 4095.8, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 6205440. Throughput: 0: 908.9. Samples: 549364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:30:45,837][00107] Avg episode reward: [(0, '4.997')] [2023-02-27 12:30:50,037][36602] Updated weights for policy 0, policy_version 1518 (0.0023) [2023-02-27 12:30:50,831][00107] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3526.7). Total num frames: 6217728. Throughput: 0: 861.8. Samples: 553800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:30:50,835][00107] Avg episode reward: [(0, '4.919')] [2023-02-27 12:30:55,830][00107] Fps is (10 sec: 2867.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 6234112. Throughput: 0: 845.3. Samples: 555818. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:55,832][00107] Avg episode reward: [(0, '4.986')] [2023-02-27 12:31:00,829][00107] Fps is (10 sec: 3686.9, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 6254592. Throughput: 0: 891.6. Samples: 561866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:00,832][00107] Avg episode reward: [(0, '4.860')] [2023-02-27 12:31:01,274][36602] Updated weights for policy 0, policy_version 1528 (0.0017) [2023-02-27 12:31:05,831][00107] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 6275072. Throughput: 0: 904.6. Samples: 568166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:31:05,835][00107] Avg episode reward: [(0, '4.840')] [2023-02-27 12:31:10,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 6287360. Throughput: 0: 879.7. Samples: 570246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:31:10,834][00107] Avg episode reward: [(0, '4.726')] [2023-02-27 12:31:13,840][36602] Updated weights for policy 0, policy_version 1538 (0.0063) [2023-02-27 12:31:15,830][00107] Fps is (10 sec: 2867.7, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 6303744. Throughput: 0: 856.1. Samples: 574348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:31:15,836][00107] Avg episode reward: [(0, '4.693')] [2023-02-27 12:31:20,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 6324224. Throughput: 0: 908.1. Samples: 580688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:31:20,832][00107] Avg episode reward: [(0, '5.035')] [2023-02-27 12:31:24,084][36602] Updated weights for policy 0, policy_version 1548 (0.0014) [2023-02-27 12:31:25,830][00107] Fps is (10 sec: 4095.7, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 6344704. Throughput: 0: 907.5. Samples: 583822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:31:25,836][00107] Avg episode reward: [(0, '4.985')] [2023-02-27 12:31:30,830][00107] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 6356992. Throughput: 0: 857.9. Samples: 587970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:31:30,838][00107] Avg episode reward: [(0, '4.790')] [2023-02-27 12:31:35,830][00107] Fps is (10 sec: 2457.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 6369280. Throughput: 0: 849.0. Samples: 592002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:31:35,832][00107] Avg episode reward: [(0, '4.848')] [2023-02-27 12:31:38,204][36602] Updated weights for policy 0, policy_version 1558 (0.0045) [2023-02-27 12:31:40,830][00107] Fps is (10 sec: 3277.1, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 6389760. Throughput: 0: 871.4. Samples: 595030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:40,832][00107] Avg episode reward: [(0, '4.859')] [2023-02-27 12:31:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 6406144. Throughput: 0: 862.5. Samples: 600678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:45,833][00107] Avg episode reward: [(0, '4.858')] [2023-02-27 12:31:45,846][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001564_6406144.pth... [2023-02-27 12:31:46,007][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001358_5562368.pth [2023-02-27 12:31:50,833][00107] Fps is (10 sec: 2866.2, 60 sec: 3344.9, 300 sec: 3485.0). 
Total num frames: 6418432. Throughput: 0: 803.0. Samples: 604304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:50,836][00107] Avg episode reward: [(0, '4.735')] [2023-02-27 12:31:51,406][36602] Updated weights for policy 0, policy_version 1568 (0.0026) [2023-02-27 12:31:55,832][00107] Fps is (10 sec: 2866.5, 60 sec: 3344.9, 300 sec: 3499.0). Total num frames: 6434816. Throughput: 0: 797.6. Samples: 606142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:55,834][00107] Avg episode reward: [(0, '4.595')] [2023-02-27 12:32:00,830][00107] Fps is (10 sec: 3687.7, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 6455296. Throughput: 0: 834.4. Samples: 611896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:32:00,832][00107] Avg episode reward: [(0, '4.505')] [2023-02-27 12:32:02,816][36602] Updated weights for policy 0, policy_version 1578 (0.0016) [2023-02-27 12:32:05,830][00107] Fps is (10 sec: 3277.6, 60 sec: 3208.6, 300 sec: 3485.1). Total num frames: 6467584. Throughput: 0: 797.5. Samples: 616576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:32:05,835][00107] Avg episode reward: [(0, '4.626')] [2023-02-27 12:32:10,834][00107] Fps is (10 sec: 2456.4, 60 sec: 3208.3, 300 sec: 3457.2). Total num frames: 6479872. Throughput: 0: 765.3. Samples: 618264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:32:10,839][00107] Avg episode reward: [(0, '4.645')] [2023-02-27 12:32:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3471.2). Total num frames: 6496256. Throughput: 0: 769.5. Samples: 622598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:32:15,832][00107] Avg episode reward: [(0, '4.880')] [2023-02-27 12:32:17,210][36602] Updated weights for policy 0, policy_version 1588 (0.0024) [2023-02-27 12:32:20,830][00107] Fps is (10 sec: 3688.2, 60 sec: 3208.5, 300 sec: 3485.1). Total num frames: 6516736. Throughput: 0: 813.9. Samples: 628628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:32:20,837][00107] Avg episode reward: [(0, '4.527')] [2023-02-27 12:32:25,830][00107] Fps is (10 sec: 3686.2, 60 sec: 3140.3, 300 sec: 3471.2). Total num frames: 6533120. Throughput: 0: 815.6. Samples: 631732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:32:25,835][00107] Avg episode reward: [(0, '4.481')] [2023-02-27 12:32:29,190][36602] Updated weights for policy 0, policy_version 1598 (0.0025) [2023-02-27 12:32:30,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3457.3). Total num frames: 6549504. Throughput: 0: 779.5. Samples: 635756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:32:30,834][00107] Avg episode reward: [(0, '4.603')] [2023-02-27 12:32:35,830][00107] Fps is (10 sec: 3277.0, 60 sec: 3276.8, 300 sec: 3471.2). Total num frames: 6565888. Throughput: 0: 812.4. Samples: 640858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:32:35,832][00107] Avg episode reward: [(0, '4.776')] [2023-02-27 12:32:39,978][36602] Updated weights for policy 0, policy_version 1608 (0.0023) [2023-02-27 12:32:40,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3485.1). Total num frames: 6586368. Throughput: 0: 843.1. Samples: 644078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:32:40,838][00107] Avg episode reward: [(0, '4.943')] [2023-02-27 12:32:45,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3457.3). Total num frames: 6602752. Throughput: 0: 840.4. Samples: 649714. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:32:45,838][00107] Avg episode reward: [(0, '5.014')] [2023-02-27 12:32:50,831][00107] Fps is (10 sec: 2866.7, 60 sec: 3276.9, 300 sec: 3443.4). Total num frames: 6615040. Throughput: 0: 823.4. Samples: 653630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:32:50,834][00107] Avg episode reward: [(0, '4.758')] [2023-02-27 12:32:53,092][36602] Updated weights for policy 0, policy_version 1618 (0.0012) [2023-02-27 12:32:55,829][00107] Fps is (10 sec: 3276.9, 60 sec: 3345.2, 300 sec: 3471.2). Total num frames: 6635520. Throughput: 0: 844.6. Samples: 656268. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:32:55,832][00107] Avg episode reward: [(0, '4.775')] [2023-02-27 12:33:00,829][00107] Fps is (10 sec: 4096.7, 60 sec: 3345.1, 300 sec: 3471.2). Total num frames: 6656000. Throughput: 0: 887.8. Samples: 662548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:33:00,832][00107] Avg episode reward: [(0, '4.896')] [2023-02-27 12:33:04,109][36602] Updated weights for policy 0, policy_version 1628 (0.0014) [2023-02-27 12:33:05,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 6672384. Throughput: 0: 851.9. Samples: 666962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:33:05,833][00107] Avg episode reward: [(0, '4.745')] [2023-02-27 12:33:10,834][00107] Fps is (10 sec: 2456.7, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 6680576. Throughput: 0: 822.3. Samples: 668738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:33:10,836][00107] Avg episode reward: [(0, '4.802')] [2023-02-27 12:33:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6701056. Throughput: 0: 848.8. Samples: 673954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:33:15,833][00107] Avg episode reward: [(0, '4.613')] [2023-02-27 12:33:16,823][36602] Updated weights for policy 0, policy_version 1638 (0.0015) [2023-02-27 12:33:20,829][00107] Fps is (10 sec: 4507.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6725632. Throughput: 0: 878.4. Samples: 680384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:33:20,836][00107] Avg episode reward: [(0, '4.590')] [2023-02-27 12:33:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 6737920. Throughput: 0: 860.1. Samples: 682784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:33:25,837][00107] Avg episode reward: [(0, '4.760')] [2023-02-27 12:33:29,383][36602] Updated weights for policy 0, policy_version 1648 (0.0013) [2023-02-27 12:33:30,830][00107] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 6750208. Throughput: 0: 824.3. Samples: 686806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:33:30,837][00107] Avg episode reward: [(0, '4.871')] [2023-02-27 12:33:35,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6770688. Throughput: 0: 865.6. Samples: 692582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:33:35,831][00107] Avg episode reward: [(0, '4.956')] [2023-02-27 12:33:39,662][36602] Updated weights for policy 0, policy_version 1658 (0.0012) [2023-02-27 12:33:40,830][00107] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6795264. Throughput: 0: 878.2. Samples: 695788. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:33:40,835][00107] Avg episode reward: [(0, '4.974')] [2023-02-27 12:33:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6807552. Throughput: 0: 852.0. Samples: 700886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:33:45,831][00107] Avg episode reward: [(0, '5.053')] [2023-02-27 12:33:45,851][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001662_6807552.pth... [2023-02-27 12:33:46,061][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001463_5992448.pth [2023-02-27 12:33:50,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3429.6). Total num frames: 6823936. Throughput: 0: 842.7. Samples: 704884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:33:50,833][00107] Avg episode reward: [(0, '4.838')] [2023-02-27 12:33:52,631][36602] Updated weights for policy 0, policy_version 1668 (0.0023) [2023-02-27 12:33:55,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6844416. Throughput: 0: 873.2. Samples: 708028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:33:55,833][00107] Avg episode reward: [(0, '4.748')] [2023-02-27 12:34:00,830][00107] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 6864896. Throughput: 0: 899.3. Samples: 714424. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:34:00,834][00107] Avg episode reward: [(0, '4.679')] [2023-02-27 12:34:03,547][36602] Updated weights for policy 0, policy_version 1678 (0.0016) [2023-02-27 12:34:05,830][00107] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 6877184. Throughput: 0: 842.6. Samples: 718300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:34:05,836][00107] Avg episode reward: [(0, '4.834')] [2023-02-27 12:34:10,830][00107] Fps is (10 sec: 2457.7, 60 sec: 3481.8, 300 sec: 3415.6). Total num frames: 6889472. Throughput: 0: 829.0. Samples: 720090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:34:10,832][00107] Avg episode reward: [(0, '4.858')] [2023-02-27 12:34:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 6909952. Throughput: 0: 866.1. Samples: 725780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:15,832][00107] Avg episode reward: [(0, '5.005')] [2023-02-27 12:34:16,259][36602] Updated weights for policy 0, policy_version 1688 (0.0022) [2023-02-27 12:34:20,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6930432. Throughput: 0: 875.7. Samples: 731990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:34:20,832][00107] Avg episode reward: [(0, '4.911')] [2023-02-27 12:34:25,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 6942720. Throughput: 0: 848.2. Samples: 733956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:34:25,836][00107] Avg episode reward: [(0, '4.900')] [2023-02-27 12:34:29,226][36602] Updated weights for policy 0, policy_version 1698 (0.0014) [2023-02-27 12:34:30,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 6959104. Throughput: 0: 825.8. Samples: 738048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:34:30,831][00107] Avg episode reward: [(0, '4.972')] [2023-02-27 12:34:35,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). 
Total num frames: 6979584. Throughput: 0: 879.4. Samples: 744456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:35,832][00107] Avg episode reward: [(0, '4.925')] [2023-02-27 12:34:38,945][36602] Updated weights for policy 0, policy_version 1708 (0.0025) [2023-02-27 12:34:40,833][00107] Fps is (10 sec: 4094.5, 60 sec: 3413.1, 300 sec: 3415.6). Total num frames: 7000064. Throughput: 0: 880.9. Samples: 747672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:40,837][00107] Avg episode reward: [(0, '4.630')] [2023-02-27 12:34:45,830][00107] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7012352. Throughput: 0: 834.7. Samples: 751984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:45,832][00107] Avg episode reward: [(0, '4.436')] [2023-02-27 12:34:50,830][00107] Fps is (10 sec: 2868.2, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 7028736. Throughput: 0: 854.6. Samples: 756756. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:34:50,838][00107] Avg episode reward: [(0, '4.436')] [2023-02-27 12:34:52,039][36602] Updated weights for policy 0, policy_version 1718 (0.0017) [2023-02-27 12:34:55,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 7049216. Throughput: 0: 885.1. Samples: 759918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:55,837][00107] Avg episode reward: [(0, '4.437')] [2023-02-27 12:35:00,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3415.6). Total num frames: 7069696. Throughput: 0: 895.1. Samples: 766060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:00,839][00107] Avg episode reward: [(0, '4.667')] [2023-02-27 12:35:03,760][36602] Updated weights for policy 0, policy_version 1728 (0.0019) [2023-02-27 12:35:05,833][00107] Fps is (10 sec: 2866.2, 60 sec: 3344.9, 300 sec: 3387.8). Total num frames: 7077888. Throughput: 0: 833.8. Samples: 769514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:35:05,843][00107] Avg episode reward: [(0, '4.679')] [2023-02-27 12:35:10,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 7098368. Throughput: 0: 835.0. Samples: 771532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:10,832][00107] Avg episode reward: [(0, '4.649')] [2023-02-27 12:35:15,447][36602] Updated weights for policy 0, policy_version 1738 (0.0023) [2023-02-27 12:35:15,830][00107] Fps is (10 sec: 4097.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 7118848. Throughput: 0: 883.5. Samples: 777804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:35:15,838][00107] Avg episode reward: [(0, '4.744')] [2023-02-27 12:35:20,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7135232. Throughput: 0: 859.1. Samples: 783114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:35:20,833][00107] Avg episode reward: [(0, '4.802')] [2023-02-27 12:35:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7147520. Throughput: 0: 831.7. Samples: 785094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:35:25,832][00107] Avg episode reward: [(0, '4.865')] [2023-02-27 12:35:28,745][36602] Updated weights for policy 0, policy_version 1748 (0.0036) [2023-02-27 12:35:30,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 7168000. Throughput: 0: 843.8. Samples: 789954. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:30,832][00107] Avg episode reward: [(0, '5.099')] [2023-02-27 12:35:35,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 7188480. Throughput: 0: 878.2. Samples: 796274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:35,839][00107] Avg episode reward: [(0, '4.959')] [2023-02-27 12:35:39,082][36602] Updated weights for policy 0, policy_version 1758 (0.0013) [2023-02-27 12:35:40,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3387.9). Total num frames: 7204864. Throughput: 0: 868.4. Samples: 798998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:40,831][00107] Avg episode reward: [(0, '4.627')] [2023-02-27 12:35:45,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7217152. Throughput: 0: 819.9. Samples: 802954. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:35:45,836][00107] Avg episode reward: [(0, '4.540')] [2023-02-27 12:35:45,852][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001762_7217152.pth... [2023-02-27 12:35:46,077][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001564_6406144.pth [2023-02-27 12:35:50,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 7237632. Throughput: 0: 863.1. Samples: 808352. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:35:50,836][00107] Avg episode reward: [(0, '4.589')] [2023-02-27 12:35:51,445][36602] Updated weights for policy 0, policy_version 1768 (0.0030) [2023-02-27 12:35:55,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 7258112. Throughput: 0: 889.1. Samples: 811540. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:35:55,837][00107] Avg episode reward: [(0, '4.546')] [2023-02-27 12:36:00,832][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7274496. Throughput: 0: 869.1. Samples: 816914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:36:00,837][00107] Avg episode reward: [(0, '4.667')] [2023-02-27 12:36:04,197][36602] Updated weights for policy 0, policy_version 1778 (0.0029) [2023-02-27 12:36:05,830][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.5, 300 sec: 3374.0). Total num frames: 7282688. Throughput: 0: 829.2. Samples: 820426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:36:05,837][00107] Avg episode reward: [(0, '4.710')] [2023-02-27 12:36:10,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7303168. Throughput: 0: 842.7. Samples: 823016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:36:10,832][00107] Avg episode reward: [(0, '4.679')] [2023-02-27 12:36:15,055][36602] Updated weights for policy 0, policy_version 1788 (0.0023) [2023-02-27 12:36:15,830][00107] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7323648. Throughput: 0: 877.4. Samples: 829436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:15,836][00107] Avg episode reward: [(0, '4.521')] [2023-02-27 12:36:20,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 7340032. Throughput: 0: 841.9. Samples: 834158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:36:20,839][00107] Avg episode reward: [(0, '4.569')] [2023-02-27 12:36:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). 
Total num frames: 7352320. Throughput: 0: 826.2. Samples: 836176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:36:25,832][00107] Avg episode reward: [(0, '4.603')] [2023-02-27 12:36:28,126][36602] Updated weights for policy 0, policy_version 1798 (0.0013) [2023-02-27 12:36:30,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7372800. Throughput: 0: 865.5. Samples: 841902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:36:30,832][00107] Avg episode reward: [(0, '4.528')] [2023-02-27 12:36:35,830][00107] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 7397376. Throughput: 0: 888.1. Samples: 848316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:36:35,832][00107] Avg episode reward: [(0, '4.473')] [2023-02-27 12:36:38,909][36602] Updated weights for policy 0, policy_version 1808 (0.0024) [2023-02-27 12:36:40,837][00107] Fps is (10 sec: 3683.7, 60 sec: 3412.9, 300 sec: 3401.7). Total num frames: 7409664. Throughput: 0: 863.0. Samples: 850382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:40,840][00107] Avg episode reward: [(0, '4.607')] [2023-02-27 12:36:45,830][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7421952. Throughput: 0: 832.1. Samples: 854360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:36:45,837][00107] Avg episode reward: [(0, '4.878')] [2023-02-27 12:36:50,787][36602] Updated weights for policy 0, policy_version 1818 (0.0013) [2023-02-27 12:36:50,829][00107] Fps is (10 sec: 3689.1, 60 sec: 3481.6, 300 sec: 3429.6). Total num frames: 7446528. Throughput: 0: 892.9. Samples: 860608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 12:36:50,831][00107] Avg episode reward: [(0, '4.737')] [2023-02-27 12:36:55,833][00107] Fps is (10 sec: 4094.5, 60 sec: 3413.1, 300 sec: 3415.6). Total num frames: 7462912. Throughput: 0: 905.6. Samples: 863772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:36:55,835][00107] Avg episode reward: [(0, '4.504')] [2023-02-27 12:37:00,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7479296. Throughput: 0: 864.2. Samples: 868324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:37:00,834][00107] Avg episode reward: [(0, '4.648')] [2023-02-27 12:37:04,097][36602] Updated weights for policy 0, policy_version 1828 (0.0025) [2023-02-27 12:37:05,830][00107] Fps is (10 sec: 2868.2, 60 sec: 3481.6, 300 sec: 3429.6). Total num frames: 7491584. Throughput: 0: 844.7. Samples: 872170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:37:05,833][00107] Avg episode reward: [(0, '4.811')] [2023-02-27 12:37:10,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7512064. Throughput: 0: 866.8. Samples: 875184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:37:10,835][00107] Avg episode reward: [(0, '4.703')] [2023-02-27 12:37:14,192][36602] Updated weights for policy 0, policy_version 1838 (0.0020) [2023-02-27 12:37:15,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7532544. Throughput: 0: 879.0. Samples: 881456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:37:15,832][00107] Avg episode reward: [(0, '4.456')] [2023-02-27 12:37:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7544832. Throughput: 0: 825.3. Samples: 885454. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:37:20,838][00107] Avg episode reward: [(0, '4.511')] [2023-02-27 12:37:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 7561216. Throughput: 0: 823.7. Samples: 887442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:37:25,831][00107] Avg episode reward: [(0, '4.712')] [2023-02-27 12:37:27,250][36602] Updated weights for policy 0, policy_version 1848 (0.0020) [2023-02-27 12:37:30,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7581696. Throughput: 0: 878.3. Samples: 893882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:37:30,832][00107] Avg episode reward: [(0, '5.034')] [2023-02-27 12:37:35,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 7602176. Throughput: 0: 865.3. Samples: 899546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:37:35,834][00107] Avg episode reward: [(0, '4.924')] [2023-02-27 12:37:38,777][36602] Updated weights for policy 0, policy_version 1858 (0.0032) [2023-02-27 12:37:40,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.7, 300 sec: 3429.5). Total num frames: 7614464. Throughput: 0: 840.6. Samples: 901596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:37:40,837][00107] Avg episode reward: [(0, '4.816')] [2023-02-27 12:37:45,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7630848. Throughput: 0: 842.8. Samples: 906250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:37:45,832][00107] Avg episode reward: [(0, '4.745')] [2023-02-27 12:37:45,846][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001863_7630848.pth... [2023-02-27 12:37:46,042][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001662_6807552.pth [2023-02-27 12:37:49,924][36602] Updated weights for policy 0, policy_version 1868 (0.0021) [2023-02-27 12:37:50,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 7651328. Throughput: 0: 898.5. Samples: 912604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:37:50,835][00107] Avg episode reward: [(0, '4.633')] [2023-02-27 12:37:55,836][00107] Fps is (10 sec: 4093.3, 60 sec: 3481.4, 300 sec: 3443.3). Total num frames: 7671808. Throughput: 0: 898.4. Samples: 915616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:37:55,839][00107] Avg episode reward: [(0, '4.514')] [2023-02-27 12:38:00,830][00107] Fps is (10 sec: 3276.6, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7684096. Throughput: 0: 849.1. Samples: 919664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:00,833][00107] Avg episode reward: [(0, '4.449')] [2023-02-27 12:38:03,265][36602] Updated weights for policy 0, policy_version 1878 (0.0014) [2023-02-27 12:38:05,830][00107] Fps is (10 sec: 2869.1, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7700480. Throughput: 0: 860.8. Samples: 924190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:05,833][00107] Avg episode reward: [(0, '4.505')] [2023-02-27 12:38:10,829][00107] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7720960. Throughput: 0: 881.7. Samples: 927120. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:10,837][00107] Avg episode reward: [(0, '4.641')] [2023-02-27 12:38:14,228][36602] Updated weights for policy 0, policy_version 1888 (0.0020) [2023-02-27 12:38:15,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7737344. Throughput: 0: 859.3. Samples: 932552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:38:15,834][00107] Avg episode reward: [(0, '4.464')] [2023-02-27 12:38:20,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7749632. Throughput: 0: 822.4. Samples: 936556. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-02-27 12:38:20,831][00107] Avg episode reward: [(0, '4.636')] [2023-02-27 12:38:25,830][00107] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7770112. Throughput: 0: 839.6. Samples: 939380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:38:25,832][00107] Avg episode reward: [(0, '4.604')] [2023-02-27 12:38:26,617][36602] Updated weights for policy 0, policy_version 1898 (0.0023) [2023-02-27 12:38:30,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7790592. Throughput: 0: 879.4. Samples: 945824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:38:30,831][00107] Avg episode reward: [(0, '4.630')] [2023-02-27 12:38:35,832][00107] Fps is (10 sec: 3685.6, 60 sec: 3413.2, 300 sec: 3429.5). Total num frames: 7806976. Throughput: 0: 846.3. Samples: 950688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:38:35,837][00107] Avg episode reward: [(0, '4.551')] [2023-02-27 12:38:38,845][36602] Updated weights for policy 0, policy_version 1908 (0.0012) [2023-02-27 12:38:40,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7819264. Throughput: 0: 823.6. Samples: 952674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:40,832][00107] Avg episode reward: [(0, '4.519')] [2023-02-27 12:38:45,830][00107] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7839744. Throughput: 0: 856.6. Samples: 958210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:45,838][00107] Avg episode reward: [(0, '4.594')] [2023-02-27 12:38:49,211][36602] Updated weights for policy 0, policy_version 1918 (0.0028) [2023-02-27 12:38:50,830][00107] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7860224. Throughput: 0: 896.9. Samples: 964552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:38:50,832][00107] Avg episode reward: [(0, '4.826')] [2023-02-27 12:38:55,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3413.7, 300 sec: 3429.5). Total num frames: 7876608. Throughput: 0: 879.3. Samples: 966688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:55,832][00107] Avg episode reward: [(0, '4.944')] [2023-02-27 12:39:00,830][00107] Fps is (10 sec: 2867.3, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 7888896. Throughput: 0: 849.1. Samples: 970760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:39:00,832][00107] Avg episode reward: [(0, '4.666')] [2023-02-27 12:39:02,208][36602] Updated weights for policy 0, policy_version 1928 (0.0027) [2023-02-27 12:39:05,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7909376. Throughput: 0: 878.9. Samples: 976108. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:39:05,838][00107] Avg episode reward: [(0, '4.647')] [2023-02-27 12:39:10,830][00107] Fps is (10 sec: 3686.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 7925760. Throughput: 0: 879.3. Samples: 978950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:39:10,836][00107] Avg episode reward: [(0, '4.816')] [2023-02-27 12:39:14,501][36602] Updated weights for policy 0, policy_version 1938 (0.0014) [2023-02-27 12:39:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 7938048. Throughput: 0: 839.4. Samples: 983596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:39:15,833][00107] Avg episode reward: [(0, '4.815')] [2023-02-27 12:39:20,829][00107] Fps is (10 sec: 2867.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7954432. Throughput: 0: 827.4. Samples: 987920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:39:20,832][00107] Avg episode reward: [(0, '4.768')] [2023-02-27 12:39:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 7974912. Throughput: 0: 855.9. Samples: 991190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 12:39:25,837][00107] Avg episode reward: [(0, '4.724')] [2023-02-27 12:39:26,065][36602] Updated weights for policy 0, policy_version 1948 (0.0022) [2023-02-27 12:39:30,835][00107] Fps is (10 sec: 4093.7, 60 sec: 3413.0, 300 sec: 3443.4). Total num frames: 7995392. Throughput: 0: 876.0. Samples: 997636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:39:30,838][00107] Avg episode reward: [(0, '4.798')] [2023-02-27 12:39:35,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3429.6). Total num frames: 8011776. Throughput: 0: 826.8. Samples: 1001760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:39:35,832][00107] Avg episode reward: [(0, '4.800')] [2023-02-27 12:39:39,039][36602] Updated weights for policy 0, policy_version 1958 (0.0013) [2023-02-27 12:39:40,829][00107] Fps is (10 sec: 2868.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 8024064. Throughput: 0: 822.0. Samples: 1003676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:39:40,832][00107] Avg episode reward: [(0, '5.053')] [2023-02-27 12:39:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8048640. Throughput: 0: 870.4. Samples: 1009926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:39:45,838][00107] Avg episode reward: [(0, '5.117')] [2023-02-27 12:39:45,852][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001965_8048640.pth... [2023-02-27 12:39:46,003][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001762_7217152.pth [2023-02-27 12:39:48,706][36602] Updated weights for policy 0, policy_version 1968 (0.0012) [2023-02-27 12:39:50,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3443.4). Total num frames: 8065024. Throughput: 0: 874.4. Samples: 1015454. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:39:50,837][00107] Avg episode reward: [(0, '4.791')] [2023-02-27 12:39:55,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 8077312. Throughput: 0: 856.9. Samples: 1017510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:39:55,834][00107] Avg episode reward: [(0, '4.670')] [2023-02-27 12:40:00,833][00107] Fps is (10 sec: 2866.2, 60 sec: 3413.1, 300 sec: 3443.4). 
Total num frames: 8093696. Throughput: 0: 856.6. Samples: 1022148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:40:00,838][00107] Avg episode reward: [(0, '4.775')] [2023-02-27 12:40:01,891][36602] Updated weights for policy 0, policy_version 1978 (0.0032) [2023-02-27 12:40:05,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8114176. Throughput: 0: 883.3. Samples: 1027668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:40:05,837][00107] Avg episode reward: [(0, '4.987')] [2023-02-27 12:40:10,829][00107] Fps is (10 sec: 3687.7, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 8130560. Throughput: 0: 874.3. Samples: 1030534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:40:10,832][00107] Avg episode reward: [(0, '4.918')] [2023-02-27 12:40:15,121][36602] Updated weights for policy 0, policy_version 1988 (0.0017) [2023-02-27 12:40:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8142848. Throughput: 0: 820.2. Samples: 1034540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:40:15,833][00107] Avg episode reward: [(0, '4.800')] [2023-02-27 12:40:20,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8163328. Throughput: 0: 845.9. Samples: 1039824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 12:40:20,832][00107] Avg episode reward: [(0, '4.875')] [2023-02-27 12:40:25,391][36602] Updated weights for policy 0, policy_version 1998 (0.0012) [2023-02-27 12:40:25,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8183808. Throughput: 0: 873.9. Samples: 1043002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:40:25,833][00107] Avg episode reward: [(0, '5.114')] [2023-02-27 12:40:30,833][00107] Fps is (10 sec: 3685.1, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 8200192. Throughput: 0: 859.0. Samples: 1048582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:40:30,838][00107] Avg episode reward: [(0, '5.271')] [2023-02-27 12:40:35,831][00107] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 3415.6). Total num frames: 8212480. Throughput: 0: 825.7. Samples: 1052610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:40:35,840][00107] Avg episode reward: [(0, '5.108')] [2023-02-27 12:40:38,555][36602] Updated weights for policy 0, policy_version 2008 (0.0012) [2023-02-27 12:40:40,830][00107] Fps is (10 sec: 3278.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8232960. Throughput: 0: 841.4. Samples: 1055372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:40:40,831][00107] Avg episode reward: [(0, '5.257')] [2023-02-27 12:40:45,830][00107] Fps is (10 sec: 4096.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8253440. Throughput: 0: 879.3. Samples: 1061712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:40:45,832][00107] Avg episode reward: [(0, '5.593')] [2023-02-27 12:40:45,852][36588] Saving new best policy, reward=5.593! [2023-02-27 12:40:48,962][36602] Updated weights for policy 0, policy_version 2018 (0.0021) [2023-02-27 12:40:50,831][00107] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3429.5). Total num frames: 8269824. Throughput: 0: 865.6. Samples: 1066620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:40:50,836][00107] Avg episode reward: [(0, '5.561')] [2023-02-27 12:40:55,833][00107] Fps is (10 sec: 2866.3, 60 sec: 3413.1, 300 sec: 3415.6). Total num frames: 8282112. 
Throughput: 0: 846.6. Samples: 1068632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:40:55,836][00107] Avg episode reward: [(0, '5.377')] [2023-02-27 12:41:00,830][00107] Fps is (10 sec: 3277.3, 60 sec: 3481.8, 300 sec: 3457.3). Total num frames: 8302592. Throughput: 0: 879.8. Samples: 1074130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:41:00,832][00107] Avg episode reward: [(0, '5.250')] [2023-02-27 12:41:01,045][36602] Updated weights for policy 0, policy_version 2028 (0.0021) [2023-02-27 12:41:05,830][00107] Fps is (10 sec: 4097.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8323072. Throughput: 0: 886.9. Samples: 1079734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:41:05,832][00107] Avg episode reward: [(0, '5.119')] [2023-02-27 12:41:10,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 8335360. Throughput: 0: 861.5. Samples: 1081768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:41:10,832][00107] Avg episode reward: [(0, '5.063')] [2023-02-27 12:41:14,961][36602] Updated weights for policy 0, policy_version 2038 (0.0029) [2023-02-27 12:41:15,829][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8347648. Throughput: 0: 827.2. Samples: 1085804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:41:15,831][00107] Avg episode reward: [(0, '5.243')] [2023-02-27 12:41:20,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8372224. Throughput: 0: 876.4. Samples: 1092048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:41:20,832][00107] Avg episode reward: [(0, '4.937')] [2023-02-27 12:41:24,457][36602] Updated weights for policy 0, policy_version 2048 (0.0012) [2023-02-27 12:41:25,830][00107] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8392704. Throughput: 0: 885.8. Samples: 1095234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:41:25,836][00107] Avg episode reward: [(0, '4.705')] [2023-02-27 12:41:30,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.5, 300 sec: 3415.6). Total num frames: 8404992. Throughput: 0: 849.7. Samples: 1099948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:41:30,836][00107] Avg episode reward: [(0, '4.846')] [2023-02-27 12:41:35,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3429.6). Total num frames: 8421376. Throughput: 0: 840.2. Samples: 1104426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:41:35,841][00107] Avg episode reward: [(0, '4.806')] [2023-02-27 12:41:37,458][36602] Updated weights for policy 0, policy_version 2058 (0.0014) [2023-02-27 12:41:40,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8441856. Throughput: 0: 867.2. Samples: 1107654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:41:40,831][00107] Avg episode reward: [(0, '4.669')] [2023-02-27 12:41:45,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8462336. Throughput: 0: 887.3. Samples: 1114060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:41:45,832][00107] Avg episode reward: [(0, '4.507')] [2023-02-27 12:41:45,850][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002066_8462336.pth... 
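The "Saving checkpoint_000002066_8462336.pth" entry above, together with the "Removing checkpoint_000001863_7630848.pth" entry that follows, shows the learner's rolling checkpoint retention: every two minutes (12:31:45, 12:33:45, ..., 12:41:45) it writes checkpoint_{policy_version:09d}_{env_frames}.pth and deletes the oldest surviving file, so only the most recent few remain (the "Saving new best policy, reward=5.593!" event at 12:40:45 is a separate save). A minimal sketch of that keep-last-N rotation in plain PyTorch follows; the helper name, argument list, and keep_last default are assumptions for illustration, not Sample Factory's actual checkpointing code:

```python
import glob
import os

import torch


def save_with_rotation(model_state, ckpt_dir, policy_version, env_frames, keep_last=2):
    """Write checkpoint_{version:09d}_{frames}.pth, then prune older files.

    Illustrative sketch of the Saving/Removing pattern in the log above,
    not Sample Factory's own implementation.
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    # Filename pattern matches the log, e.g. checkpoint_000002066_8462336.pth
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_frames}.pth")
    torch.save(model_state, path)

    # Zero-padded version numbers make lexicographic order chronological.
    checkpoints = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in checkpoints[:-keep_last]:
        os.remove(old)
    return path
```

For orientation while reading the surrounding entries: the three Fps figures are rolling throughput averages over 10-, 60-, and 300-second windows (environment frames per second), and "Policy #0 lag" reports how many policy versions behind the learner's latest weights the sampled rollouts were collected; throughout this run it stays in the 0-2 range, consistent with Sample Factory's asynchronous rollout/inference/learner design.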
[2023-02-27 12:41:46,089][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001863_7630848.pth [2023-02-27 12:41:48,569][36602] Updated weights for policy 0, policy_version 2068 (0.0035) [2023-02-27 12:41:50,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3429.6). Total num frames: 8474624. Throughput: 0: 851.5. Samples: 1118050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:41:50,835][00107] Avg episode reward: [(0, '4.508')] [2023-02-27 12:41:55,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3429.5). Total num frames: 8491008. Throughput: 0: 851.2. Samples: 1120074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:41:55,832][00107] Avg episode reward: [(0, '4.786')] [2023-02-27 12:41:59,866][36602] Updated weights for policy 0, policy_version 2078 (0.0013) [2023-02-27 12:42:00,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8511488. Throughput: 0: 904.6. Samples: 1126512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:42:00,832][00107] Avg episode reward: [(0, '4.725')] [2023-02-27 12:42:05,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8527872. Throughput: 0: 875.3. Samples: 1131436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:42:05,832][00107] Avg episode reward: [(0, '4.734')] [2023-02-27 12:42:10,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8540160. Throughput: 0: 843.9. Samples: 1133212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:42:10,833][00107] Avg episode reward: [(0, '4.631')] [2023-02-27 12:42:13,990][36602] Updated weights for policy 0, policy_version 2088 (0.0025) [2023-02-27 12:42:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 8556544. Throughput: 0: 841.1. Samples: 1137798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:42:15,837][00107] Avg episode reward: [(0, '4.592')] [2023-02-27 12:42:20,829][00107] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8581120. Throughput: 0: 883.7. Samples: 1144192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:42:20,832][00107] Avg episode reward: [(0, '4.657')] [2023-02-27 12:42:23,950][36602] Updated weights for policy 0, policy_version 2098 (0.0020) [2023-02-27 12:42:25,835][00107] Fps is (10 sec: 4093.7, 60 sec: 3413.0, 300 sec: 3443.4). Total num frames: 8597504. Throughput: 0: 878.5. Samples: 1147192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:42:25,838][00107] Avg episode reward: [(0, '4.742')] [2023-02-27 12:42:30,831][00107] Fps is (10 sec: 2866.8, 60 sec: 3413.2, 300 sec: 3415.6). Total num frames: 8609792. Throughput: 0: 824.6. Samples: 1151166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:42:30,842][00107] Avg episode reward: [(0, '4.656')] [2023-02-27 12:42:35,830][00107] Fps is (10 sec: 3278.6, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8630272. Throughput: 0: 854.0. Samples: 1156482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:42:35,831][00107] Avg episode reward: [(0, '4.618')] [2023-02-27 12:42:36,656][36602] Updated weights for policy 0, policy_version 2108 (0.0012) [2023-02-27 12:42:40,829][00107] Fps is (10 sec: 4096.7, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8650752. Throughput: 0: 878.9. Samples: 1159624. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:42:40,832][00107] Avg episode reward: [(0, '4.713')] [2023-02-27 12:42:45,834][00107] Fps is (10 sec: 3684.7, 60 sec: 3413.1, 300 sec: 3443.4). Total num frames: 8667136. Throughput: 0: 857.1. Samples: 1165084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:42:45,837][00107] Avg episode reward: [(0, '4.819')] [2023-02-27 12:42:48,837][36602] Updated weights for policy 0, policy_version 2118 (0.0019) [2023-02-27 12:42:50,830][00107] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 8679424. Throughput: 0: 837.5. Samples: 1169124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:42:50,834][00107] Avg episode reward: [(0, '4.678')] [2023-02-27 12:42:55,830][00107] Fps is (10 sec: 3278.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8699904. Throughput: 0: 859.8. Samples: 1171902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:42:55,835][00107] Avg episode reward: [(0, '4.629')] [2023-02-27 12:42:59,393][36602] Updated weights for policy 0, policy_version 2128 (0.0015) [2023-02-27 12:43:00,829][00107] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8720384. Throughput: 0: 899.6. Samples: 1178280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:43:00,837][00107] Avg episode reward: [(0, '4.572')] [2023-02-27 12:43:05,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 8732672. Throughput: 0: 852.5. Samples: 1182556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:43:05,832][00107] Avg episode reward: [(0, '4.514')] [2023-02-27 12:43:10,830][00107] Fps is (10 sec: 2457.5, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8744960. Throughput: 0: 824.0. Samples: 1184266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:43:10,835][00107] Avg episode reward: [(0, '4.554')] [2023-02-27 12:43:13,432][36602] Updated weights for policy 0, policy_version 2138 (0.0023) [2023-02-27 12:43:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8765440. Throughput: 0: 855.6. Samples: 1189668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:43:15,838][00107] Avg episode reward: [(0, '4.656')] [2023-02-27 12:43:20,829][00107] Fps is (10 sec: 4096.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8785920. Throughput: 0: 878.5. Samples: 1196016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:43:20,831][00107] Avg episode reward: [(0, '4.816')] [2023-02-27 12:43:24,434][36602] Updated weights for policy 0, policy_version 2148 (0.0019) [2023-02-27 12:43:25,832][00107] Fps is (10 sec: 3685.6, 60 sec: 3413.5, 300 sec: 3429.5). Total num frames: 8802304. Throughput: 0: 856.4. Samples: 1198164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:43:25,835][00107] Avg episode reward: [(0, '4.860')] [2023-02-27 12:43:30,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 8814592. Throughput: 0: 825.6. Samples: 1202230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:43:30,832][00107] Avg episode reward: [(0, '4.908')] [2023-02-27 12:43:35,830][00107] Fps is (10 sec: 3277.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8835072. Throughput: 0: 871.3. Samples: 1208332. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:43:35,840][00107] Avg episode reward: [(0, '4.758')] [2023-02-27 12:43:36,171][36602] Updated weights for policy 0, policy_version 2158 (0.0015) [2023-02-27 12:43:40,834][00107] Fps is (10 sec: 4094.4, 60 sec: 3413.1, 300 sec: 3443.4). Total num frames: 8855552. Throughput: 0: 880.6. Samples: 1211532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:43:40,836][00107] Avg episode reward: [(0, '4.679')] [2023-02-27 12:43:45,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.3, 300 sec: 3415.6). Total num frames: 8867840. Throughput: 0: 841.4. Samples: 1216144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:43:45,837][00107] Avg episode reward: [(0, '4.646')] [2023-02-27 12:43:45,948][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002166_8871936.pth... [2023-02-27 12:43:46,124][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001965_8048640.pth [2023-02-27 12:43:49,192][36602] Updated weights for policy 0, policy_version 2168 (0.0020) [2023-02-27 12:43:50,830][00107] Fps is (10 sec: 2868.3, 60 sec: 3413.4, 300 sec: 3415.6). Total num frames: 8884224. Throughput: 0: 842.7. Samples: 1220478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:43:50,833][00107] Avg episode reward: [(0, '4.776')] [2023-02-27 12:43:55,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8904704. Throughput: 0: 875.1. Samples: 1223646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:43:55,833][00107] Avg episode reward: [(0, '4.777')] [2023-02-27 12:43:58,785][36602] Updated weights for policy 0, policy_version 2178 (0.0015) [2023-02-27 12:44:00,834][00107] Fps is (10 sec: 4094.1, 60 sec: 3413.1, 300 sec: 3443.4). Total num frames: 8925184. Throughput: 0: 898.7. Samples: 1230112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:44:00,837][00107] Avg episode reward: [(0, '4.554')] [2023-02-27 12:44:05,830][00107] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 8937472. Throughput: 0: 833.9. Samples: 1233540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:44:05,834][00107] Avg episode reward: [(0, '4.533')] [2023-02-27 12:44:10,829][00107] Fps is (10 sec: 2868.6, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8953856. Throughput: 0: 826.5. Samples: 1235354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:44:10,832][00107] Avg episode reward: [(0, '4.734')] [2023-02-27 12:44:12,676][36602] Updated weights for policy 0, policy_version 2188 (0.0020) [2023-02-27 12:44:15,830][00107] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8974336. Throughput: 0: 874.8. Samples: 1241598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:44:15,832][00107] Avg episode reward: [(0, '4.898')] [2023-02-27 12:44:20,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8990720. Throughput: 0: 867.2. Samples: 1247354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:44:20,832][00107] Avg episode reward: [(0, '4.708')] [2023-02-27 12:44:24,212][36602] Updated weights for policy 0, policy_version 2198 (0.0012) [2023-02-27 12:44:25,833][00107] Fps is (10 sec: 3275.6, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 9007104. Throughput: 0: 841.0. Samples: 1249378. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:44:25,840][00107] Avg episode reward: [(0, '4.593')] [2023-02-27 12:44:30,830][00107] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 9023488. Throughput: 0: 840.2. Samples: 1253954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:44:30,839][00107] Avg episode reward: [(0, '4.668')] [2023-02-27 12:44:35,287][36602] Updated weights for policy 0, policy_version 2208 (0.0033) [2023-02-27 12:44:35,829][00107] Fps is (10 sec: 3687.7, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9043968. Throughput: 0: 886.5. Samples: 1260370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:44:35,832][00107] Avg episode reward: [(0, '4.770')] [2023-02-27 12:44:40,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3429.5). Total num frames: 9060352. Throughput: 0: 884.9. Samples: 1263466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:44:40,834][00107] Avg episode reward: [(0, '4.587')] [2023-02-27 12:44:45,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 9076736. Throughput: 0: 831.2. Samples: 1267510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:44:45,843][00107] Avg episode reward: [(0, '4.547')] [2023-02-27 12:44:48,199][36602] Updated weights for policy 0, policy_version 2218 (0.0012) [2023-02-27 12:44:50,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 9093120. Throughput: 0: 875.3. Samples: 1272928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:44:50,832][00107] Avg episode reward: [(0, '4.650')] [2023-02-27 12:44:55,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 9117696. Throughput: 0: 905.2. Samples: 1276090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:44:55,831][00107] Avg episode reward: [(0, '4.614')] [2023-02-27 12:44:57,893][36602] Updated weights for policy 0, policy_version 2228 (0.0014) [2023-02-27 12:45:00,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3443.4). Total num frames: 9129984. Throughput: 0: 890.4. Samples: 1281664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:45:00,831][00107] Avg episode reward: [(0, '4.648')] [2023-02-27 12:45:05,830][00107] Fps is (10 sec: 2457.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9142272. Throughput: 0: 840.6. Samples: 1285180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:45:05,841][00107] Avg episode reward: [(0, '4.748')] [2023-02-27 12:45:10,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9162752. Throughput: 0: 845.7. Samples: 1287432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:45:10,832][00107] Avg episode reward: [(0, '4.993')] [2023-02-27 12:45:11,701][36602] Updated weights for policy 0, policy_version 2238 (0.0016) [2023-02-27 12:45:15,830][00107] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9183232. Throughput: 0: 885.8. Samples: 1293814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:45:15,832][00107] Avg episode reward: [(0, '4.918')] [2023-02-27 12:45:20,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9195520. Throughput: 0: 853.4. Samples: 1298774. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:45:20,835][00107] Avg episode reward: [(0, '4.560')] [2023-02-27 12:45:24,105][36602] Updated weights for policy 0, policy_version 2248 (0.0020) [2023-02-27 12:45:25,830][00107] Fps is (10 sec: 2867.0, 60 sec: 3413.5, 300 sec: 3429.6). Total num frames: 9211904. Throughput: 0: 827.4. Samples: 1300700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:45:25,839][00107] Avg episode reward: [(0, '4.520')] [2023-02-27 12:45:30,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9232384. Throughput: 0: 854.4. Samples: 1305956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:45:30,837][00107] Avg episode reward: [(0, '4.632')] [2023-02-27 12:45:34,709][36602] Updated weights for policy 0, policy_version 2258 (0.0014) [2023-02-27 12:45:35,829][00107] Fps is (10 sec: 4096.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9252864. Throughput: 0: 874.8. Samples: 1312296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:45:35,838][00107] Avg episode reward: [(0, '4.591')] [2023-02-27 12:45:40,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9265152. Throughput: 0: 857.5. Samples: 1314676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:45:40,832][00107] Avg episode reward: [(0, '4.636')] [2023-02-27 12:45:45,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9281536. Throughput: 0: 821.9. Samples: 1318650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:45:45,839][00107] Avg episode reward: [(0, '4.605')] [2023-02-27 12:45:45,857][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002266_9281536.pth... [2023-02-27 12:45:46,001][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002066_8462336.pth [2023-02-27 12:45:47,850][36602] Updated weights for policy 0, policy_version 2268 (0.0028) [2023-02-27 12:45:50,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.5). Total num frames: 9297920. Throughput: 0: 871.3. Samples: 1324386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:45:50,837][00107] Avg episode reward: [(0, '4.790')] [2023-02-27 12:45:55,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 9322496. Throughput: 0: 890.2. Samples: 1327492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:45:55,832][00107] Avg episode reward: [(0, '4.705')] [2023-02-27 12:45:58,747][36602] Updated weights for policy 0, policy_version 2278 (0.0017) [2023-02-27 12:46:00,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9334784. Throughput: 0: 859.2. Samples: 1332480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:46:00,836][00107] Avg episode reward: [(0, '4.654')] [2023-02-27 12:46:05,830][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 9347072. Throughput: 0: 830.4. Samples: 1336140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 12:46:05,832][00107] Avg episode reward: [(0, '4.775')] [2023-02-27 12:46:10,829][00107] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 9367552. Throughput: 0: 850.0. Samples: 1338950. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 12:46:10,831][00107] Avg episode reward: [(0, '4.834')] [2023-02-27 12:46:11,697][36602] Updated weights for policy 0, policy_version 2288 (0.0015) [2023-02-27 12:46:15,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 9388032. Throughput: 0: 872.9. Samples: 1345236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:46:15,836][00107] Avg episode reward: [(0, '4.801')] [2023-02-27 12:46:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 9400320. Throughput: 0: 825.0. Samples: 1349422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:46:20,838][00107] Avg episode reward: [(0, '4.701')] [2023-02-27 12:46:24,730][36602] Updated weights for policy 0, policy_version 2298 (0.0023) [2023-02-27 12:46:25,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 9416704. Throughput: 0: 817.7. Samples: 1351472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:46:25,833][00107] Avg episode reward: [(0, '4.823')] [2023-02-27 12:46:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 9437184. Throughput: 0: 865.8. Samples: 1357610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 12:46:30,832][00107] Avg episode reward: [(0, '4.681')] [2023-02-27 12:46:34,185][36602] Updated weights for policy 0, policy_version 2308 (0.0012) [2023-02-27 12:46:35,831][00107] Fps is (10 sec: 4095.4, 60 sec: 3413.2, 300 sec: 3443.4). Total num frames: 9457664. Throughput: 0: 872.2. Samples: 1363636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:46:35,834][00107] Avg episode reward: [(0, '4.869')] [2023-02-27 12:46:40,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 9469952. Throughput: 0: 847.6. Samples: 1365636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:46:40,836][00107] Avg episode reward: [(0, '5.108')] [2023-02-27 12:46:45,830][00107] Fps is (10 sec: 2867.7, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9486336. Throughput: 0: 837.2. Samples: 1370154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:46:45,831][00107] Avg episode reward: [(0, '4.948')] [2023-02-27 12:46:47,091][36602] Updated weights for policy 0, policy_version 2318 (0.0026) [2023-02-27 12:46:50,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 9510912. Throughput: 0: 897.9. Samples: 1376546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:46:50,837][00107] Avg episode reward: [(0, '4.925')] [2023-02-27 12:46:55,836][00107] Fps is (10 sec: 4093.4, 60 sec: 3413.0, 300 sec: 3443.3). Total num frames: 9527296. Throughput: 0: 907.8. Samples: 1379808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:46:55,838][00107] Avg episode reward: [(0, '4.732')] [2023-02-27 12:46:58,129][36602] Updated weights for policy 0, policy_version 2328 (0.0012) [2023-02-27 12:47:00,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9539584. Throughput: 0: 861.4. Samples: 1384000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:47:00,832][00107] Avg episode reward: [(0, '4.757')] [2023-02-27 12:47:05,830][00107] Fps is (10 sec: 2869.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 9555968. Throughput: 0: 865.2. Samples: 1388358. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:47:05,833][00107] Avg episode reward: [(0, '4.973')] [2023-02-27 12:47:10,230][36602] Updated weights for policy 0, policy_version 2338 (0.0013) [2023-02-27 12:47:10,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9576448. Throughput: 0: 883.7. Samples: 1391240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:47:10,832][00107] Avg episode reward: [(0, '5.040')] [2023-02-27 12:47:15,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9592832. Throughput: 0: 870.0. Samples: 1396762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:47:15,838][00107] Avg episode reward: [(0, '4.982')] [2023-02-27 12:47:20,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 9605120. Throughput: 0: 817.9. Samples: 1400440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:47:20,837][00107] Avg episode reward: [(0, '4.808')] [2023-02-27 12:47:24,611][36602] Updated weights for policy 0, policy_version 2348 (0.0034) [2023-02-27 12:47:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9621504. Throughput: 0: 821.4. Samples: 1402598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:47:25,831][00107] Avg episode reward: [(0, '4.679')] [2023-02-27 12:47:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9641984. Throughput: 0: 857.0. Samples: 1408718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:47:30,832][00107] Avg episode reward: [(0, '4.632')] [2023-02-27 12:47:35,292][36602] Updated weights for policy 0, policy_version 2358 (0.0012) [2023-02-27 12:47:35,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3415.6). Total num frames: 9658368. Throughput: 0: 830.1. Samples: 1413902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:47:35,836][00107] Avg episode reward: [(0, '4.591')] [2023-02-27 12:47:40,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9670656. Throughput: 0: 801.6. Samples: 1415874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:47:40,836][00107] Avg episode reward: [(0, '4.721')] [2023-02-27 12:47:45,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9691136. Throughput: 0: 820.4. Samples: 1420920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:47:45,832][00107] Avg episode reward: [(0, '5.031')] [2023-02-27 12:47:45,850][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002366_9691136.pth... [2023-02-27 12:47:45,993][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002166_8871936.pth [2023-02-27 12:47:47,481][36602] Updated weights for policy 0, policy_version 2368 (0.0021) [2023-02-27 12:47:50,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 9711616. Throughput: 0: 861.9. Samples: 1427144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:47:50,831][00107] Avg episode reward: [(0, '4.928')] [2023-02-27 12:47:55,830][00107] Fps is (10 sec: 3686.2, 60 sec: 3345.4, 300 sec: 3415.6). Total num frames: 9728000. Throughput: 0: 853.9. Samples: 1429668. 
[2023-02-27 12:48:00,258][36602] Updated weights for policy 0, policy_version 2378 (0.0018)
[2023-02-27 12:48:00,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 9740288. Throughput: 0: 820.8. Samples: 1433700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 12:48:00,835][00107] Avg episode reward: [(0, '4.628')]
[2023-02-27 12:48:05,830][00107] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 9756672. Throughput: 0: 850.2. Samples: 1438700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 12:48:05,838][00107] Avg episode reward: [(0, '4.842')]
[2023-02-27 12:48:10,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 9777152. Throughput: 0: 864.8. Samples: 1441516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 12:48:10,832][00107] Avg episode reward: [(0, '4.837')]
[2023-02-27 12:48:11,369][36602] Updated weights for policy 0, policy_version 2388 (0.0021)
[2023-02-27 12:48:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3401.8). Total num frames: 9789440. Throughput: 0: 840.1. Samples: 1446524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:48:15,836][00107] Avg episode reward: [(0, '4.750')]
[2023-02-27 12:48:20,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9805824. Throughput: 0: 818.5. Samples: 1450734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:48:20,832][00107] Avg episode reward: [(0, '4.594')]
[2023-02-27 12:48:24,474][36602] Updated weights for policy 0, policy_version 2398 (0.0012)
[2023-02-27 12:48:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9826304. Throughput: 0: 845.2. Samples: 1453910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-27 12:48:25,832][00107] Avg episode reward: [(0, '4.654')]
[2023-02-27 12:48:30,834][00107] Fps is (10 sec: 4094.1, 60 sec: 3413.1, 300 sec: 3429.5). Total num frames: 9846784. Throughput: 0: 874.8. Samples: 1460292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:48:30,840][00107] Avg episode reward: [(0, '4.669')]
[2023-02-27 12:48:35,830][00107] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9859072. Throughput: 0: 830.7. Samples: 1464524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 12:48:35,840][00107] Avg episode reward: [(0, '4.733')]
[2023-02-27 12:48:36,138][36602] Updated weights for policy 0, policy_version 2408 (0.0018)
[2023-02-27 12:48:40,830][00107] Fps is (10 sec: 2868.5, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 9875456. Throughput: 0: 819.1. Samples: 1466528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:48:40,832][00107] Avg episode reward: [(0, '4.784')]
[2023-02-27 12:48:45,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9895936. Throughput: 0: 863.8. Samples: 1472570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-27 12:48:45,838][00107] Avg episode reward: [(0, '4.677')]
[2023-02-27 12:48:47,098][36602] Updated weights for policy 0, policy_version 2418 (0.0021)
[2023-02-27 12:48:50,829][00107] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9916416. Throughput: 0: 883.0. Samples: 1478436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 12:48:50,831][00107] Avg episode reward: [(0, '4.656')]
[2023-02-27 12:48:55,832][00107] Fps is (10 sec: 3276.0, 60 sec: 3345.0, 300 sec: 3401.8). Total num frames: 9928704. Throughput: 0: 865.4. Samples: 1480462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:48:55,840][00107] Avg episode reward: [(0, '4.731')]
[2023-02-27 12:49:00,049][36602] Updated weights for policy 0, policy_version 2428 (0.0013)
[2023-02-27 12:49:00,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 9945088. Throughput: 0: 852.5. Samples: 1484886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:49:00,832][00107] Avg episode reward: [(0, '4.930')]
[2023-02-27 12:49:05,830][00107] Fps is (10 sec: 3687.3, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 9965568. Throughput: 0: 880.4. Samples: 1490352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 12:49:05,837][00107] Avg episode reward: [(0, '4.883')]
[2023-02-27 12:49:10,831][00107] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3415.6). Total num frames: 9981952. Throughput: 0: 874.6. Samples: 1493268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:49:10,834][00107] Avg episode reward: [(0, '4.749')]
[2023-02-27 12:49:12,188][36602] Updated weights for policy 0, policy_version 2438 (0.0013)
[2023-02-27 12:49:15,830][00107] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9994240. Throughput: 0: 823.0. Samples: 1497322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-27 12:49:15,833][00107] Avg episode reward: [(0, '4.705')]
[2023-02-27 12:49:19,273][36588] Stopping Batcher_0...
[2023-02-27 12:49:19,275][36588] Loop batcher_evt_loop terminating...
[2023-02-27 12:49:19,276][00107] Component Batcher_0 stopped!
[2023-02-27 12:49:19,284][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2023-02-27 12:49:19,320][36602] Weights refcount: 2 0
[2023-02-27 12:49:19,328][36602] Stopping InferenceWorker_p0-w0...
[2023-02-27 12:49:19,329][36602] Loop inference_proc0-0_evt_loop terminating...
[2023-02-27 12:49:19,329][00107] Component InferenceWorker_p0-w0 stopped!
[2023-02-27 12:49:19,428][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002266_9281536.pth
[2023-02-27 12:49:19,443][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2023-02-27 12:49:19,501][36619] Stopping RolloutWorker_w3...
[2023-02-27 12:49:19,501][00107] Component RolloutWorker_w3 stopped!
[2023-02-27 12:49:19,511][36605] Stopping RolloutWorker_w2...
[2023-02-27 12:49:19,511][36605] Loop rollout_proc2_evt_loop terminating...
[2023-02-27 12:49:19,513][36615] Stopping RolloutWorker_w4...
[2023-02-27 12:49:19,512][00107] Component RolloutWorker_w2 stopped!
[2023-02-27 12:49:19,516][36617] Stopping RolloutWorker_w6...
[2023-02-27 12:49:19,519][36617] Loop rollout_proc6_evt_loop terminating...
[2023-02-27 12:49:19,520][36615] Loop rollout_proc4_evt_loop terminating...
[2023-02-27 12:49:19,516][00107] Component RolloutWorker_w4 stopped!
[2023-02-27 12:49:19,521][00107] Component RolloutWorker_w6 stopped!
[2023-02-27 12:49:19,503][36619] Loop rollout_proc3_evt_loop terminating...
[2023-02-27 12:49:19,524][36611] Stopping RolloutWorker_w0...
[2023-02-27 12:49:19,524][00107] Component RolloutWorker_w0 stopped!
[2023-02-27 12:49:19,527][36611] Loop rollout_proc0_evt_loop terminating...
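The checkpoint filename encodes the policy version and the total environment frame count, and the two stay in a fixed ratio throughout this log: 2443 × 4096 = 10006528, consistent with 4096 frames consumed per policy update. A small sketch that parses the name (field layout inferred from the filenames above):

```python
import re

def parse_checkpoint_name(name):
    # checkpoint_000002443_10006528.pth -> (policy_version, env_frames)
    m = re.fullmatch(r"checkpoint_(\d+)_(\d+)\.pth", name)
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint_name("checkpoint_000002443_10006528.pth")
assert (version, frames) == (2443, 10006528)
assert frames == version * 4096  # consistent with a 4096-frame training batch
```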
[2023-02-27 12:49:19,541][00107] Component RolloutWorker_w7 stopped!
[2023-02-27 12:49:19,543][36625] Stopping RolloutWorker_w7...
[2023-02-27 12:49:19,545][36625] Loop rollout_proc7_evt_loop terminating...
[2023-02-27 12:49:19,553][36613] Stopping RolloutWorker_w5...
[2023-02-27 12:49:19,552][00107] Component RolloutWorker_w5 stopped!
[2023-02-27 12:49:19,563][36613] Loop rollout_proc5_evt_loop terminating...
[2023-02-27 12:49:19,611][36603] Stopping RolloutWorker_w1...
[2023-02-27 12:49:19,611][00107] Component RolloutWorker_w1 stopped!
[2023-02-27 12:49:19,619][36603] Loop rollout_proc1_evt_loop terminating...
[2023-02-27 12:49:19,695][00107] Component LearnerWorker_p0 stopped!
[2023-02-27 12:49:19,702][00107] Waiting for process learner_proc0 to stop...
[2023-02-27 12:49:19,695][36588] Stopping LearnerWorker_p0...
[2023-02-27 12:49:19,709][36588] Loop learner_proc0_evt_loop terminating...
[2023-02-27 12:49:22,321][00107] Waiting for process inference_proc0-0 to join...
[2023-02-27 12:49:22,405][00107] Waiting for process rollout_proc0 to join...
[2023-02-27 12:49:22,407][00107] Waiting for process rollout_proc1 to join...
[2023-02-27 12:49:22,500][00107] Waiting for process rollout_proc2 to join...
[2023-02-27 12:49:22,502][00107] Waiting for process rollout_proc3 to join...
[2023-02-27 12:49:22,509][00107] Waiting for process rollout_proc4 to join...
[2023-02-27 12:49:22,513][00107] Waiting for process rollout_proc5 to join...
[2023-02-27 12:49:22,515][00107] Waiting for process rollout_proc6 to join...
[2023-02-27 12:49:22,518][00107] Waiting for process rollout_proc7 to join...
[2023-02-27 12:49:22,519][00107] Batcher 0 profile tree view:
batching: 40.1217, releasing_batches: 0.0466
[2023-02-27 12:49:22,521][00107] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 806.4133
update_model: 12.7522
  weight_update: 0.0022
one_step: 0.0070
  handle_policy_step: 862.4692
    deserialize: 25.7586, stack: 5.0748, obs_to_device_normalize: 190.8770, forward: 415.7173, send_messages: 44.3556
    prepare_outputs: 136.8741
      to_cpu: 82.9091
[2023-02-27 12:49:22,523][00107] Learner 0 profile tree view:
misc: 0.0093, prepare_batch: 24.1627
train: 121.8860
  epoch_init: 0.0093, minibatch_init: 0.0207, losses_postprocess: 0.9822, kl_divergence: 0.9118, after_optimizer: 4.9356
  calculate_losses: 41.8561
    losses_init: 0.0051, forward_head: 3.0213, bptt_initial: 27.2348, tail: 1.8025, advantages_returns: 0.4406, losses: 5.3536
    bptt: 3.4238
      bptt_forward_core: 3.2784
  update: 71.9802
    clip: 2.2662
[2023-02-27 12:49:22,526][00107] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4486, enqueue_policy_requests: 224.9518, env_step: 1330.9066, overhead: 37.4617, complete_rollouts: 10.6690
save_policy_outputs: 32.9040
  split_output_tensors: 15.8306
[2023-02-27 12:49:22,528][00107] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.6556, enqueue_policy_requests: 222.0219, env_step: 1335.7480, overhead: 37.5753, complete_rollouts: 11.2510
save_policy_outputs: 31.9910
  split_output_tensors: 15.5536
[2023-02-27 12:49:22,531][00107] Loop Runner_EvtLoop terminating...
[2023-02-27 12:49:22,534][00107] Runner profile tree view:
main_loop: 1774.6673
[2023-02-27 12:49:22,537][00107] Collected {0: 10006528}, FPS: 3381.3
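The final FPS figure appears to be frames collected this session divided by main_loop wall time: 3381.3 × 1774.6673 s ≈ 6.00M frames, which against the 10006528-frame total suggests this run resumed from a checkpoint near the 4.0M-frame mark. A quick check of that reading (interpretation, not something the log states directly):

```python
main_loop_s = 1774.6673    # Runner profile: main_loop
session_fps = 3381.3       # reported: Collected {0: 10006528}, FPS: 3381.3
frames_this_session = session_fps * main_loop_s
print(f"{frames_this_session:,.0f}")               # ~6,000,683 frames this session
print(f"{10_006_528 - frames_this_session:,.0f}")  # ~4,005,845 -> resumed near 4.0M
```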
[2023-02-27 12:51:38,512][00107] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-27 12:51:38,515][00107] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-27 12:51:38,517][00107] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-27 12:51:38,519][00107] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-27 12:51:38,523][00107] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 12:51:38,524][00107] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-27 12:51:38,525][00107] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 12:51:38,530][00107] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-27 12:51:38,531][00107] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-27 12:51:38,532][00107] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-27 12:51:38,533][00107] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-27 12:51:38,534][00107] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-27 12:51:38,535][00107] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-27 12:51:38,538][00107] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-27 12:51:38,540][00107] Using frameskip 1 and render_action_repeat=4 for evaluation
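The evaluation run restores the saved training config and then layers command-line arguments on top, logging whether each one overrides an existing key or is new. A minimal sketch of that merge logic (the helper name is illustrative, not Sample Factory's actual implementation):

```python
import json

def load_cfg_with_overrides(cfg_path, overrides):
    """Load a saved experiment config and apply command-line overrides."""
    with open(cfg_path) as f:
        cfg = json.load(f)
    for key, value in overrides.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

cfg = load_cfg_with_overrides(
    "/content/train_dir/default_experiment/config.json",
    {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10},
)
```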
[2023-02-27 12:51:38,572][00107] RunningMeanStd input shape: (3, 72, 128)
[2023-02-27 12:51:38,575][00107] RunningMeanStd input shape: (1,)
[2023-02-27 12:51:38,598][00107] ConvEncoder: input_channels=3
[2023-02-27 12:51:38,673][00107] Conv encoder output size: 512
[2023-02-27 12:51:38,676][00107] Policy head output size: 512
[2023-02-27 12:51:38,724][00107] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2023-02-27 12:51:39,495][00107] Num frames 100...
[2023-02-27 12:51:39,673][00107] Num frames 200...
[2023-02-27 12:51:39,864][00107] Num frames 300...
[2023-02-27 12:51:40,047][00107] Num frames 400...
[2023-02-27 12:51:40,190][00107] Avg episode rewards: #0: 6.480, true rewards: #0: 4.480
[2023-02-27 12:51:40,192][00107] Avg episode reward: 6.480, avg true_objective: 4.480
[2023-02-27 12:51:40,295][00107] Num frames 500...
[2023-02-27 12:51:40,480][00107] Num frames 600...
[2023-02-27 12:51:40,662][00107] Num frames 700...
[2023-02-27 12:51:40,851][00107] Num frames 800...
[2023-02-27 12:51:40,965][00107] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
[2023-02-27 12:51:40,968][00107] Avg episode reward: 5.160, avg true_objective: 4.160
[2023-02-27 12:51:41,087][00107] Num frames 900...
[2023-02-27 12:51:41,283][00107] Num frames 1000...
[2023-02-27 12:51:41,465][00107] Num frames 1100...
[2023-02-27 12:51:41,645][00107] Num frames 1200...
[2023-02-27 12:51:41,768][00107] Num frames 1300...
[2023-02-27 12:51:41,843][00107] Avg episode rewards: #0: 5.707, true rewards: #0: 4.373
[2023-02-27 12:51:41,845][00107] Avg episode reward: 5.707, avg true_objective: 4.373
[2023-02-27 12:51:41,956][00107] Num frames 1400...
[2023-02-27 12:51:42,084][00107] Num frames 1500...
[2023-02-27 12:51:42,210][00107] Num frames 1600...
[2023-02-27 12:51:42,383][00107] Avg episode rewards: #0: 5.240, true rewards: #0: 4.240
[2023-02-27 12:51:42,385][00107] Avg episode reward: 5.240, avg true_objective: 4.240
[2023-02-27 12:51:42,395][00107] Num frames 1700...
[2023-02-27 12:51:42,523][00107] Num frames 1800...
[2023-02-27 12:51:42,643][00107] Num frames 1900...
[2023-02-27 12:51:42,770][00107] Num frames 2000...
[2023-02-27 12:51:42,923][00107] Avg episode rewards: #0: 4.960, true rewards: #0: 4.160
[2023-02-27 12:51:42,924][00107] Avg episode reward: 4.960, avg true_objective: 4.160
[2023-02-27 12:51:42,953][00107] Num frames 2100...
[2023-02-27 12:51:43,071][00107] Num frames 2200...
[2023-02-27 12:51:43,197][00107] Num frames 2300...
[2023-02-27 12:51:43,303][00107] Avg episode rewards: #0: 4.560, true rewards: #0: 3.893
[2023-02-27 12:51:43,306][00107] Avg episode reward: 4.560, avg true_objective: 3.893
[2023-02-27 12:51:43,387][00107] Num frames 2400...
[2023-02-27 12:51:43,508][00107] Num frames 2500...
[2023-02-27 12:51:43,635][00107] Num frames 2600...
[2023-02-27 12:51:43,756][00107] Num frames 2700...
[2023-02-27 12:51:43,881][00107] Num frames 2800...
[2023-02-27 12:51:44,035][00107] Avg episode rewards: #0: 4.829, true rewards: #0: 4.114
[2023-02-27 12:51:44,036][00107] Avg episode reward: 4.829, avg true_objective: 4.114
[2023-02-27 12:51:44,066][00107] Num frames 2900...
[2023-02-27 12:51:44,195][00107] Num frames 3000...
[2023-02-27 12:51:44,315][00107] Num frames 3100...
[2023-02-27 12:51:44,446][00107] Num frames 3200...
[2023-02-27 12:51:44,566][00107] Num frames 3300...
[2023-02-27 12:51:44,732][00107] Avg episode rewards: #0: 5.115, true rewards: #0: 4.240
[2023-02-27 12:51:44,734][00107] Avg episode reward: 5.115, avg true_objective: 4.240
[2023-02-27 12:51:44,747][00107] Num frames 3400...
[2023-02-27 12:51:44,873][00107] Num frames 3500...
[2023-02-27 12:51:45,005][00107] Num frames 3600...
[2023-02-27 12:51:45,126][00107] Num frames 3700...
[2023-02-27 12:51:45,257][00107] Num frames 3800...
[2023-02-27 12:51:45,391][00107] Num frames 3900...
[2023-02-27 12:51:45,527][00107] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373
[2023-02-27 12:51:45,529][00107] Avg episode reward: 5.373, avg true_objective: 4.373
[2023-02-27 12:51:45,692][00107] Num frames 4000...
[2023-02-27 12:51:45,888][00107] Num frames 4100...
[2023-02-27 12:51:46,156][00107] Num frames 4200...
[2023-02-27 12:51:46,368][00107] Num frames 4300...
[2023-02-27 12:51:46,558][00107] Avg episode rewards: #0: 5.252, true rewards: #0: 4.352
[2023-02-27 12:51:46,560][00107] Avg episode reward: 5.252, avg true_objective: 4.352
[2023-02-27 12:52:12,107][00107] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
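The "Avg episode rewards" figures above are running means over the episodes evaluated so far, so individual episode rewards can be backed out by differencing. A sketch using the first evaluation's numbers (per-episode values inferred from the log, exact only up to display rounding):

```python
def running_means(episode_rewards):
    """Yield the mean of the first n rewards, for n = 1, 2, ..."""
    total = 0.0
    for n, r in enumerate(episode_rewards, start=1):
        total += r
        yield total / n

# Episode rewards backed out of the first evaluation above.
rewards = [6.48, 3.84, 6.80, 3.84]
print([round(m, 3) for m in running_means(rewards)])
# -> [6.48, 5.16, 5.707, 5.24], matching the logged running averages
```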
[2023-02-27 12:54:07,980][00107] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-27 12:54:07,983][00107] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-27 12:54:07,990][00107] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-27 12:54:07,992][00107] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-27 12:54:07,994][00107] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 12:54:07,997][00107] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-27 12:54:08,005][00107] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-27 12:54:08,008][00107] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-27 12:54:08,009][00107] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-27 12:54:08,012][00107] Adding new argument 'hf_repository'='KoRiF/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-27 12:54:08,014][00107] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-27 12:54:08,016][00107] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-27 12:54:08,019][00107] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-27 12:54:08,021][00107] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-27 12:54:08,023][00107] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-27 12:54:08,049][00107] RunningMeanStd input shape: (3, 72, 128)
[2023-02-27 12:54:08,054][00107] RunningMeanStd input shape: (1,)
[2023-02-27 12:54:08,076][00107] ConvEncoder: input_channels=3
[2023-02-27 12:54:08,136][00107] Conv encoder output size: 512
[2023-02-27 12:54:08,142][00107] Policy head output size: 512
[2023-02-27 12:54:08,173][00107] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2023-02-27 12:54:08,879][00107] Num frames 100...
[2023-02-27 12:54:09,077][00107] Num frames 200...
[2023-02-27 12:54:09,244][00107] Num frames 300...
[2023-02-27 12:54:09,385][00107] Num frames 400...
[2023-02-27 12:54:09,504][00107] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2023-02-27 12:54:09,506][00107] Avg episode reward: 5.480, avg true_objective: 4.480
[2023-02-27 12:54:09,576][00107] Num frames 500...
[2023-02-27 12:54:09,698][00107] Num frames 600...
[2023-02-27 12:54:09,832][00107] Num frames 700...
[2023-02-27 12:54:09,967][00107] Num frames 800...
[2023-02-27 12:54:10,088][00107] Num frames 900...
[2023-02-27 12:54:10,185][00107] Avg episode rewards: #0: 6.140, true rewards: #0: 4.640
[2023-02-27 12:54:10,187][00107] Avg episode reward: 6.140, avg true_objective: 4.640
[2023-02-27 12:54:10,295][00107] Num frames 1000...
[2023-02-27 12:54:10,414][00107] Num frames 1100...
[2023-02-27 12:54:10,536][00107] Num frames 1200...
[2023-02-27 12:54:10,658][00107] Num frames 1300...
[2023-02-27 12:54:10,731][00107] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373
[2023-02-27 12:54:10,733][00107] Avg episode reward: 5.373, avg true_objective: 4.373
[2023-02-27 12:54:10,841][00107] Num frames 1400...
[2023-02-27 12:54:10,979][00107] Num frames 1500...
[2023-02-27 12:54:11,107][00107] Num frames 1600...
[2023-02-27 12:54:11,280][00107] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240
[2023-02-27 12:54:11,282][00107] Avg episode reward: 4.990, avg true_objective: 4.240
[2023-02-27 12:54:11,293][00107] Num frames 1700...
[2023-02-27 12:54:11,428][00107] Num frames 1800...
[2023-02-27 12:54:11,557][00107] Num frames 1900...
[2023-02-27 12:54:11,688][00107] Num frames 2000...
[2023-02-27 12:54:11,819][00107] Num frames 2100...
[2023-02-27 12:54:11,930][00107] Avg episode rewards: #0: 5.088, true rewards: #0: 4.288
[2023-02-27 12:54:11,932][00107] Avg episode reward: 5.088, avg true_objective: 4.288
[2023-02-27 12:54:12,019][00107] Num frames 2200...
[2023-02-27 12:54:12,152][00107] Num frames 2300...
[2023-02-27 12:54:12,286][00107] Num frames 2400...
[2023-02-27 12:54:12,429][00107] Num frames 2500...
[2023-02-27 12:54:12,521][00107] Avg episode rewards: #0: 4.880, true rewards: #0: 4.213
[2023-02-27 12:54:12,522][00107] Avg episode reward: 4.880, avg true_objective: 4.213
[2023-02-27 12:54:12,614][00107] Num frames 2600...
[2023-02-27 12:54:12,742][00107] Num frames 2700...
[2023-02-27 12:54:12,872][00107] Num frames 2800...
[2023-02-27 12:54:13,006][00107] Num frames 2900...
[2023-02-27 12:54:13,141][00107] Avg episode rewards: #0: 4.920, true rewards: #0: 4.206
[2023-02-27 12:54:13,142][00107] Avg episode reward: 4.920, avg true_objective: 4.206
[2023-02-27 12:54:13,218][00107] Num frames 3000...
[2023-02-27 12:54:13,345][00107] Num frames 3100...
[2023-02-27 12:54:13,469][00107] Num frames 3200...
[2023-02-27 12:54:13,598][00107] Num frames 3300...
[2023-02-27 12:54:13,725][00107] Avg episode rewards: #0: 5.075, true rewards: #0: 4.200
[2023-02-27 12:54:13,726][00107] Avg episode reward: 5.075, avg true_objective: 4.200
[2023-02-27 12:54:13,779][00107] Num frames 3400...
[2023-02-27 12:54:13,898][00107] Num frames 3500...
[2023-02-27 12:54:14,030][00107] Num frames 3600...
[2023-02-27 12:54:14,169][00107] Num frames 3700...
[2023-02-27 12:54:14,280][00107] Avg episode rewards: #0: 4.938, true rewards: #0: 4.160
[2023-02-27 12:54:14,283][00107] Avg episode reward: 4.938, avg true_objective: 4.160
[2023-02-27 12:54:14,357][00107] Num frames 3800...
[2023-02-27 12:54:14,485][00107] Num frames 3900...
[2023-02-27 12:54:14,612][00107] Num frames 4000...
[2023-02-27 12:54:14,731][00107] Num frames 4100...
[2023-02-27 12:54:14,818][00107] Avg episode rewards: #0: 4.828, true rewards: #0: 4.128
[2023-02-27 12:54:14,820][00107] Avg episode reward: 4.828, avg true_objective: 4.128
[2023-02-27 12:54:36,484][00107] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
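With 'push_to_hub'=True, this second run ends by uploading the experiment directory (checkpoint, config.json, replay.mp4) to the 'hf_repository' named above. A rough manual equivalent using huggingface_hub, shown as a sketch of the effect rather than Sample Factory's own upload code:

```python
from huggingface_hub import HfApi

api = HfApi()
# Create the target repo if it does not exist yet, then upload the run artifacts.
api.create_repo("KoRiF/rl_course_vizdoom_health_gathering_supreme", exist_ok=True)
api.upload_folder(
    repo_id="KoRiF/rl_course_vizdoom_health_gathering_supreme",
    folder_path="/content/train_dir/default_experiment",
)
```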