[2024-07-05 00:02:20,879][45457] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json...
[2024-07-05 00:02:20,880][45457] Rollout worker 0 uses device cpu
[2024-07-05 00:02:20,881][45457] Rollout worker 1 uses device cpu
[2024-07-05 00:02:20,881][45457] Rollout worker 2 uses device cpu
[2024-07-05 00:02:20,881][45457] Rollout worker 3 uses device cpu
[2024-07-05 00:02:20,881][45457] Rollout worker 4 uses device cpu
[2024-07-05 00:02:20,882][45457] Rollout worker 5 uses device cpu
[2024-07-05 00:02:20,882][45457] Rollout worker 6 uses device cpu
[2024-07-05 00:02:20,882][45457] Rollout worker 7 uses device cpu
[2024-07-05 00:02:20,916][45457] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 00:02:20,917][45457] InferenceWorker_p0-w0: min num requests: 2
[2024-07-05 00:02:20,945][45457] Starting all processes...
[2024-07-05 00:02:20,945][45457] Starting process learner_proc0
[2024-07-05 00:02:21,587][45457] Starting all processes...
[2024-07-05 00:02:21,602][45457] Starting process inference_proc0-0
[2024-07-05 00:02:21,602][45457] Starting process rollout_proc0
[2024-07-05 00:02:21,603][45457] Starting process rollout_proc1
[2024-07-05 00:02:21,603][45457] Starting process rollout_proc2
[2024-07-05 00:02:21,603][45457] Starting process rollout_proc3
[2024-07-05 00:02:21,604][45457] Starting process rollout_proc4
[2024-07-05 00:02:21,604][45457] Starting process rollout_proc5
[2024-07-05 00:02:21,605][45457] Starting process rollout_proc6
[2024-07-05 00:02:21,605][45457] Starting process rollout_proc7
[2024-07-05 00:02:24,190][45738] Worker 4 uses CPU cores [8, 9]
[2024-07-05 00:02:24,319][45734] Worker 0 uses CPU cores [0, 1]
[2024-07-05 00:02:24,435][45735] Worker 1 uses CPU cores [2, 3]
[2024-07-05 00:02:24,496][45736] Worker 2 uses CPU cores [4, 5]
[2024-07-05 00:02:24,533][45737] Worker 3 uses CPU cores [6, 7]
[2024-07-05 00:02:24,548][45740] Worker 6 uses CPU cores [12, 13]
[2024-07-05 00:02:24,665][45741] Worker 7 uses CPU cores [14, 15]
[2024-07-05 00:02:24,689][45720] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 00:02:24,689][45720] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-07-05 00:02:24,722][45733] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 00:02:24,722][45733] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-07-05 00:02:24,740][45720] Num visible devices: 1
[2024-07-05 00:02:24,758][45720] Setting fixed seed 200
[2024-07-05 00:02:24,769][45720] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 00:02:24,769][45720] Initializing actor-critic model on device cuda:0
[2024-07-05 00:02:24,769][45720] RunningMeanStd input shape: (3, 72, 128)
[2024-07-05 00:02:24,770][45720] RunningMeanStd input shape: (1,)
[2024-07-05 00:02:24,770][45733] Num visible devices: 1
[2024-07-05 00:02:24,778][45720] Num input channels: 3
[2024-07-05 00:02:24,790][45720] Convolutional layer output size: 4608
[2024-07-05 00:02:24,802][45720] Policy head output size: 512
[2024-07-05 00:02:24,860][45739] Worker 5 uses CPU cores [10, 11]
[2024-07-05 00:02:24,905][45720] Created Actor Critic model with architecture:
[2024-07-05 00:02:24,905][45720] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ResnetEncoder(
      (conv_head): Sequential(
        (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (2): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (3): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (5): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (6): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (7): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (9): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (10): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (11): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (12): ELU(alpha=1.0)
      )
      (mlp_layers): Sequential(
        (0): Linear(in_features=4608, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-07-05 00:02:25,003][45720] Using optimizer
[2024-07-05 00:02:25,507][45720] No checkpoints found
[2024-07-05 00:02:25,507][45720] Did not load from checkpoint, starting from scratch!
[2024-07-05 00:02:25,507][45720] Initialized policy 0 weights for model version 0
[2024-07-05 00:02:25,508][45720] LearnerWorker_p0 finished initialization!
[2024-07-05 00:02:25,509][45720] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 00:02:25,563][45457] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-05 00:02:25,580][45733] RunningMeanStd input shape: (3, 72, 128)
[2024-07-05 00:02:25,582][45733] RunningMeanStd input shape: (1,)
[2024-07-05 00:02:25,589][45733] Num input channels: 3
[2024-07-05 00:02:25,600][45733] Convolutional layer output size: 4608
[2024-07-05 00:02:25,611][45733] Policy head output size: 512
[2024-07-05 00:02:25,735][45457] Inference worker 0-0 is ready!
[2024-07-05 00:02:25,736][45457] All inference workers are ready! Signal rollout workers to start!
[2024-07-05 00:02:25,765][45741] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-05 00:02:25,765][45739] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-05 00:02:25,766][45737] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-05 00:02:25,766][45735] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-05 00:02:25,766][45736] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-05 00:02:25,766][45734] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-05 00:02:25,766][45740] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-05 00:02:25,767][45738] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-05 00:02:26,263][45736] Decorrelating experience for 0 frames...
[2024-07-05 00:02:26,264][45735] Decorrelating experience for 0 frames...
[2024-07-05 00:02:26,264][45741] Decorrelating experience for 0 frames...
[2024-07-05 00:02:26,264][45737] Decorrelating experience for 0 frames...
[2024-07-05 00:02:26,264][45734] Decorrelating experience for 0 frames...
[2024-07-05 00:02:26,265][45739] Decorrelating experience for 0 frames...
[2024-07-05 00:02:26,420][45735] Decorrelating experience for 32 frames...
[2024-07-05 00:02:26,421][45737] Decorrelating experience for 32 frames...
[2024-07-05 00:02:26,422][45734] Decorrelating experience for 32 frames...
[2024-07-05 00:02:26,422][45741] Decorrelating experience for 32 frames...
[2024-07-05 00:02:26,422][45739] Decorrelating experience for 32 frames...
[2024-07-05 00:02:26,467][45738] Decorrelating experience for 0 frames...
[2024-07-05 00:02:26,615][45740] Decorrelating experience for 0 frames...
[2024-07-05 00:02:26,626][45735] Decorrelating experience for 64 frames...
[2024-07-05 00:02:26,628][45737] Decorrelating experience for 64 frames...
[2024-07-05 00:02:26,629][45739] Decorrelating experience for 64 frames...
[2024-07-05 00:02:26,629][45734] Decorrelating experience for 64 frames...
[2024-07-05 00:02:26,677][45736] Decorrelating experience for 32 frames...
[2024-07-05 00:02:26,781][45738] Decorrelating experience for 32 frames...
[2024-07-05 00:02:26,811][45735] Decorrelating experience for 96 frames...
[2024-07-05 00:02:26,811][45737] Decorrelating experience for 96 frames...
[2024-07-05 00:02:26,812][45734] Decorrelating experience for 96 frames...
[2024-07-05 00:02:26,837][45740] Decorrelating experience for 32 frames...
[2024-07-05 00:02:26,845][45741] Decorrelating experience for 64 frames...
[2024-07-05 00:02:26,887][45736] Decorrelating experience for 64 frames...
[2024-07-05 00:02:26,969][45739] Decorrelating experience for 96 frames...
[2024-07-05 00:02:27,025][45741] Decorrelating experience for 96 frames...
[2024-07-05 00:02:27,032][45740] Decorrelating experience for 64 frames...
[2024-07-05 00:02:27,069][45738] Decorrelating experience for 64 frames...
[2024-07-05 00:02:27,161][45736] Decorrelating experience for 96 frames...
[2024-07-05 00:02:27,206][45740] Decorrelating experience for 96 frames...
[2024-07-05 00:02:27,349][45738] Decorrelating experience for 96 frames...
[2024-07-05 00:02:27,930][45720] Signal inference workers to stop experience collection...
[2024-07-05 00:02:27,935][45733] InferenceWorker_p0-w0: stopping experience collection
[2024-07-05 00:02:30,563][45457] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 450.8. Samples: 2254. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-05 00:02:30,564][45457] Avg episode reward: [(0, '1.905')]
[2024-07-05 00:02:31,327][45720] Signal inference workers to resume experience collection...
[2024-07-05 00:02:31,327][45733] InferenceWorker_p0-w0: resuming experience collection
[2024-07-05 00:02:35,563][45457] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3686.4). Total num frames: 36864. Throughput: 0: 877.6. Samples: 8776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-07-05 00:02:35,564][45457] Avg episode reward: [(0, '4.019')]
[2024-07-05 00:02:35,777][45733] Updated weights for policy 0, policy_version 10 (0.0100)
[2024-07-05 00:02:40,408][45733] Updated weights for policy 0, policy_version 20 (0.0012)
[2024-07-05 00:02:40,563][45457] Fps is (10 sec: 8192.0, 60 sec: 5461.3, 300 sec: 5461.3). Total num frames: 81920. Throughput: 0: 1042.0. Samples: 15630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-07-05 00:02:40,564][45457] Avg episode reward: [(0, '4.720')]
[2024-07-05 00:02:40,909][45457] Heartbeat connected on Batcher_0
[2024-07-05 00:02:40,912][45457] Heartbeat connected on LearnerWorker_p0
[2024-07-05 00:02:40,921][45457] Heartbeat connected on InferenceWorker_p0-w0
[2024-07-05 00:02:40,922][45457] Heartbeat connected on RolloutWorker_w0
[2024-07-05 00:02:40,925][45457] Heartbeat connected on RolloutWorker_w1
[2024-07-05 00:02:40,928][45457] Heartbeat connected on RolloutWorker_w2
[2024-07-05 00:02:40,931][45457] Heartbeat connected on RolloutWorker_w3
[2024-07-05 00:02:40,935][45457] Heartbeat connected on RolloutWorker_w4
[2024-07-05 00:02:40,938][45457] Heartbeat connected on RolloutWorker_w5
[2024-07-05 00:02:40,942][45457] Heartbeat connected on RolloutWorker_w6
[2024-07-05 00:02:40,946][45457] Heartbeat connected on RolloutWorker_w7
[2024-07-05 00:02:44,979][45733] Updated weights for policy 0, policy_version 30 (0.0011)
[2024-07-05 00:02:45,563][45457] Fps is (10 sec: 9011.1, 60 sec: 6348.8, 300 sec: 6348.8). Total num frames: 126976. Throughput: 0: 1445.6. Samples: 28912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-07-05 00:02:45,564][45457] Avg episode reward: [(0, '5.009')]
[2024-07-05 00:02:45,905][45720] Saving new best policy, reward=5.009!
[2024-07-05 00:02:49,704][45733] Updated weights for policy 0, policy_version 40 (0.0011)
[2024-07-05 00:02:50,562][45457] Fps is (10 sec: 8601.8, 60 sec: 6717.5, 300 sec: 6717.5). Total num frames: 167936. Throughput: 0: 1678.5. Samples: 41962. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-07-05 00:02:50,563][45457] Avg episode reward: [(0, '5.006')]
[2024-07-05 00:02:54,399][45733] Updated weights for policy 0, policy_version 50 (0.0012)
[2024-07-05 00:02:55,563][45457] Fps is (10 sec: 8601.7, 60 sec: 7099.7, 300 sec: 7099.7). Total num frames: 212992. Throughput: 0: 1619.1. Samples: 48574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-07-05 00:02:55,564][45457] Avg episode reward: [(0, '5.173')]
[2024-07-05 00:02:55,774][45720] Saving new best policy, reward=5.173!
[2024-07-05 00:02:59,032][45733] Updated weights for policy 0, policy_version 60 (0.0012)
[2024-07-05 00:03:00,563][45457] Fps is (10 sec: 9011.1, 60 sec: 7372.8, 300 sec: 7372.8). Total num frames: 258048. Throughput: 0: 1765.3. Samples: 61786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-07-05 00:03:00,564][45457] Avg episode reward: [(0, '5.099')]
[2024-07-05 00:03:03,738][45733] Updated weights for policy 0, policy_version 70 (0.0012)
[2024-07-05 00:03:05,563][45457] Fps is (10 sec: 8601.5, 60 sec: 7475.2, 300 sec: 7475.2). Total num frames: 299008. Throughput: 0: 1872.4. Samples: 74898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-07-05 00:03:05,564][45457] Avg episode reward: [(0, '5.730')]
[2024-07-05 00:03:05,635][45720] Saving new best policy, reward=5.730!
[2024-07-05 00:03:08,546][45733] Updated weights for policy 0, policy_version 80 (0.0011)
[2024-07-05 00:03:10,563][45457] Fps is (10 sec: 8601.6, 60 sec: 7645.9, 300 sec: 7645.9). Total num frames: 344064. Throughput: 0: 1805.3. Samples: 81240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-07-05 00:03:10,563][45457] Avg episode reward: [(0, '5.998')]
[2024-07-05 00:03:10,918][45720] Saving new best policy, reward=5.998!
[2024-07-05 00:03:13,273][45733] Updated weights for policy 0, policy_version 90 (0.0011)
[2024-07-05 00:03:15,562][45457] Fps is (10 sec: 8601.8, 60 sec: 7700.5, 300 sec: 7700.5). Total num frames: 385024. Throughput: 0: 2040.5. Samples: 94076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-07-05 00:03:15,563][45457] Avg episode reward: [(0, '6.024')]
[2024-07-05 00:03:15,667][45720] Saving new best policy, reward=6.024!
[2024-07-05 00:03:18,069][45733] Updated weights for policy 0, policy_version 100 (0.0011)
[2024-07-05 00:03:20,563][45457] Fps is (10 sec: 8601.6, 60 sec: 7819.6, 300 sec: 7819.6). Total num frames: 430080. Throughput: 0: 2184.6. Samples: 107082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:03:20,564][45457] Avg episode reward: [(0, '6.507')]
[2024-07-05 00:03:20,918][45720] Saving new best policy, reward=6.507!
[2024-07-05 00:03:22,877][45733] Updated weights for policy 0, policy_version 110 (0.0011)
[2024-07-05 00:03:25,562][45457] Fps is (10 sec: 8601.5, 60 sec: 7850.7, 300 sec: 7850.7). Total num frames: 471040. Throughput: 0: 2171.5. Samples: 113346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0)
[2024-07-05 00:03:25,563][45457] Avg episode reward: [(0, '5.608')]
[2024-07-05 00:03:27,631][45733] Updated weights for policy 0, policy_version 120 (0.0011)
[2024-07-05 00:03:30,563][45457] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 7939.9). Total num frames: 516096. Throughput: 0: 2167.2. Samples: 126438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:03:30,564][45457] Avg episode reward: [(0, '5.695')]
[2024-07-05 00:03:32,342][45733] Updated weights for policy 0, policy_version 130 (0.0011)
[2024-07-05 00:03:35,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8669.9, 300 sec: 7957.9). Total num frames: 557056. Throughput: 0: 2166.1. Samples: 139436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:03:35,564][45457] Avg episode reward: [(0, '7.029')]
[2024-07-05 00:03:35,633][45720] Saving new best policy, reward=7.029!
[2024-07-05 00:03:37,106][45733] Updated weights for policy 0, policy_version 140 (0.0011)
[2024-07-05 00:03:40,563][45457] Fps is (10 sec: 8601.7, 60 sec: 8669.9, 300 sec: 8028.2). Total num frames: 602112. Throughput: 0: 2160.7. Samples: 145806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:03:40,563][45457] Avg episode reward: [(0, '7.417')]
[2024-07-05 00:03:40,889][45720] Saving new best policy, reward=7.417!
[2024-07-05 00:03:41,840][45733] Updated weights for policy 0, policy_version 150 (0.0012)
[2024-07-05 00:03:45,562][45457] Fps is (10 sec: 8601.7, 60 sec: 8601.6, 300 sec: 8038.4). Total num frames: 643072. Throughput: 0: 2153.0. Samples: 158670. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:03:45,563][45457] Avg episode reward: [(0, '7.178')]
[2024-07-05 00:03:46,612][45733] Updated weights for policy 0, policy_version 160 (0.0011)
[2024-07-05 00:03:50,563][45457] Fps is (10 sec: 8601.0, 60 sec: 8669.8, 300 sec: 8095.6). Total num frames: 688128. Throughput: 0: 2151.0. Samples: 171696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:03:50,568][45457] Avg episode reward: [(0, '6.736')]
[2024-07-05 00:03:51,379][45733] Updated weights for policy 0, policy_version 170 (0.0012)
[2024-07-05 00:03:55,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8601.6, 300 sec: 8101.0). Total num frames: 729088. Throughput: 0: 2151.1. Samples: 178040. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:03:55,563][45457] Avg episode reward: [(0, '8.034')]
[2024-07-05 00:03:55,657][45720] Saving new best policy, reward=8.034!
[2024-07-05 00:03:56,147][45733] Updated weights for policy 0, policy_version 180 (0.0012)
[2024-07-05 00:04:00,564][45457] Fps is (10 sec: 8601.3, 60 sec: 8601.5, 300 sec: 8148.8). Total num frames: 774144. Throughput: 0: 2152.6. Samples: 190946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:04:00,566][45457] Avg episode reward: [(0, '8.884')]
[2024-07-05 00:04:00,938][45720] Saving new best policy, reward=8.884!
[2024-07-05 00:04:00,939][45733] Updated weights for policy 0, policy_version 190 (0.0012)
[2024-07-05 00:04:05,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8601.6, 300 sec: 8151.0). Total num frames: 815104. Throughput: 0: 2147.5. Samples: 203718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:04:05,564][45457] Avg episode reward: [(0, '9.313')]
[2024-07-05 00:04:05,678][45720] Saving new best policy, reward=9.313!
[2024-07-05 00:04:05,681][45733] Updated weights for policy 0, policy_version 200 (0.0011)
[2024-07-05 00:04:10,537][45733] Updated weights for policy 0, policy_version 210 (0.0011)
[2024-07-05 00:04:10,562][45457] Fps is (10 sec: 8602.6, 60 sec: 8601.6, 300 sec: 8192.0). Total num frames: 860160. Throughput: 0: 2155.6. Samples: 210350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:04:10,564][45457] Avg episode reward: [(0, '9.937')]
[2024-07-05 00:04:10,565][45720] Saving new best policy, reward=9.937!
[2024-07-05 00:04:15,465][45733] Updated weights for policy 0, policy_version 220 (0.0012)
[2024-07-05 00:04:15,564][45457] Fps is (10 sec: 8600.8, 60 sec: 8601.4, 300 sec: 8191.9). Total num frames: 901120. Throughput: 0: 2141.2. Samples: 222794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:04:15,568][45457] Avg episode reward: [(0, '12.334')]
[2024-07-05 00:04:15,961][45720] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000000221_905216.pth...
[2024-07-05 00:04:16,068][45720] Saving new best policy, reward=12.334!
[2024-07-05 00:04:20,434][45733] Updated weights for policy 0, policy_version 230 (0.0012)
[2024-07-05 00:04:20,563][45457] Fps is (10 sec: 8191.9, 60 sec: 8533.3, 300 sec: 8192.0). Total num frames: 942080. Throughput: 0: 2128.4. Samples: 235214. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:04:20,563][45457] Avg episode reward: [(0, '13.733')]
[2024-07-05 00:04:20,914][45720] Saving new best policy, reward=13.733!
[2024-07-05 00:04:25,324][45733] Updated weights for policy 0, policy_version 240 (0.0011)
[2024-07-05 00:04:25,562][45457] Fps is (10 sec: 8192.9, 60 sec: 8533.3, 300 sec: 8192.0). Total num frames: 983040. Throughput: 0: 2123.8. Samples: 241376. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:04:25,563][45457] Avg episode reward: [(0, '11.842')]
[2024-07-05 00:04:30,088][45733] Updated weights for policy 0, policy_version 250 (0.0011)
[2024-07-05 00:04:30,562][45457] Fps is (10 sec: 8192.1, 60 sec: 8465.1, 300 sec: 8192.0). Total num frames: 1024000. Throughput: 0: 2126.3. Samples: 254352. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:04:30,564][45457] Avg episode reward: [(0, '12.738')]
[2024-07-05 00:04:34,873][45733] Updated weights for policy 0, policy_version 260 (0.0012)
[2024-07-05 00:04:35,564][45457] Fps is (10 sec: 8600.8, 60 sec: 8533.2, 300 sec: 8223.4). Total num frames: 1069056. Throughput: 0: 2120.1. Samples: 267102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:04:35,565][45457] Avg episode reward: [(0, '13.469')]
[2024-07-05 00:04:39,695][45733] Updated weights for policy 0, policy_version 270 (0.0012)
[2024-07-05 00:04:40,562][45457] Fps is (10 sec: 8601.6, 60 sec: 8465.1, 300 sec: 8222.3). Total num frames: 1110016. Throughput: 0: 2119.2. Samples: 273404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:04:40,563][45457] Avg episode reward: [(0, '12.910')]
[2024-07-05 00:04:44,483][45733] Updated weights for policy 0, policy_version 280 (0.0011)
[2024-07-05 00:04:45,563][45457] Fps is (10 sec: 8601.9, 60 sec: 8533.2, 300 sec: 8250.5). Total num frames: 1155072. Throughput: 0: 2119.9. Samples: 286340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:04:45,568][45457] Avg episode reward: [(0, '14.045')]
[2024-07-05 00:04:45,920][45720] Saving new best policy, reward=14.045!
[2024-07-05 00:04:49,293][45733] Updated weights for policy 0, policy_version 290 (0.0011)
[2024-07-05 00:04:50,563][45457] Fps is (10 sec: 8600.9, 60 sec: 8465.1, 300 sec: 8248.5). Total num frames: 1196032. Throughput: 0: 2117.8. Samples: 299020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:04:50,565][45457] Avg episode reward: [(0, '14.268')]
[2024-07-05 00:04:50,720][45720] Saving new best policy, reward=14.268!
[2024-07-05 00:04:54,165][45733] Updated weights for policy 0, policy_version 300 (0.0011)
[2024-07-05 00:04:55,562][45457] Fps is (10 sec: 8192.6, 60 sec: 8465.1, 300 sec: 8246.6). Total num frames: 1236992. Throughput: 0: 2107.6. Samples: 305194. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-07-05 00:04:55,563][45457] Avg episode reward: [(0, '13.382')]
[2024-07-05 00:04:58,942][45733] Updated weights for policy 0, policy_version 310 (0.0011)
[2024-07-05 00:05:00,563][45457] Fps is (10 sec: 8602.2, 60 sec: 8465.2, 300 sec: 8271.3). Total num frames: 1282048. Throughput: 0: 2121.2. Samples: 318246. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-07-05 00:05:00,564][45457] Avg episode reward: [(0, '15.909')]
[2024-07-05 00:05:00,858][45720] Saving new best policy, reward=15.909!
[2024-07-05 00:05:03,823][45733] Updated weights for policy 0, policy_version 320 (0.0011)
[2024-07-05 00:05:05,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8465.1, 300 sec: 8268.8). Total num frames: 1323008. Throughput: 0: 2123.6. Samples: 330776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-07-05 00:05:05,564][45457] Avg episode reward: [(0, '17.249')]
[2024-07-05 00:05:05,723][45720] Saving new best policy, reward=17.249!
[2024-07-05 00:05:08,668][45733] Updated weights for policy 0, policy_version 330 (0.0011)
[2024-07-05 00:05:10,563][45457] Fps is (10 sec: 8192.1, 60 sec: 8396.8, 300 sec: 8266.5). Total num frames: 1363968. Throughput: 0: 2127.2. Samples: 337102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:05:10,564][45457] Avg episode reward: [(0, '18.132')]
[2024-07-05 00:05:10,565][45720] Saving new best policy, reward=18.132!
[2024-07-05 00:05:13,522][45733] Updated weights for policy 0, policy_version 340 (0.0012)
[2024-07-05 00:05:15,563][45457] Fps is (10 sec: 8601.0, 60 sec: 8465.1, 300 sec: 8288.3). Total num frames: 1409024. Throughput: 0: 2122.3. Samples: 349856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:05:15,565][45457] Avg episode reward: [(0, '18.656')]
[2024-07-05 00:05:15,908][45720] Saving new best policy, reward=18.656!
[2024-07-05 00:05:18,371][45733] Updated weights for policy 0, policy_version 350 (0.0011)
[2024-07-05 00:05:20,563][45457] Fps is (10 sec: 8601.2, 60 sec: 8465.0, 300 sec: 8285.6). Total num frames: 1449984. Throughput: 0: 2118.2. Samples: 362420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:05:20,564][45457] Avg episode reward: [(0, '20.924')]
[2024-07-05 00:05:20,800][45720] Saving new best policy, reward=20.924!
[2024-07-05 00:05:23,208][45733] Updated weights for policy 0, policy_version 360 (0.0011)
[2024-07-05 00:05:25,563][45457] Fps is (10 sec: 8192.6, 60 sec: 8465.0, 300 sec: 8283.0). Total num frames: 1490944. Throughput: 0: 2119.2. Samples: 368770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:05:25,564][45457] Avg episode reward: [(0, '19.165')]
[2024-07-05 00:05:28,046][45733] Updated weights for policy 0, policy_version 370 (0.0011)
[2024-07-05 00:05:30,564][45457] Fps is (10 sec: 8601.2, 60 sec: 8533.2, 300 sec: 8302.7). Total num frames: 1536000. Throughput: 0: 2116.9. Samples: 381602. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:05:30,565][45457] Avg episode reward: [(0, '17.827')]
[2024-07-05 00:05:32,898][45733] Updated weights for policy 0, policy_version 380 (0.0011)
[2024-07-05 00:05:35,564][45457] Fps is (10 sec: 8600.9, 60 sec: 8465.1, 300 sec: 8299.8). Total num frames: 1576960. Throughput: 0: 2113.9. Samples: 394146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:05:35,566][45457] Avg episode reward: [(0, '19.011')]
[2024-07-05 00:05:37,689][45733] Updated weights for policy 0, policy_version 390 (0.0011)
[2024-07-05 00:05:40,563][45457] Fps is (10 sec: 8192.7, 60 sec: 8465.0, 300 sec: 8297.0). Total num frames: 1617920. Throughput: 0: 2121.9. Samples: 400682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-07-05 00:05:40,564][45457] Avg episode reward: [(0, '18.876')]
[2024-07-05 00:05:42,506][45733] Updated weights for policy 0, policy_version 400 (0.0011)
[2024-07-05 00:05:45,563][45457] Fps is (10 sec: 8602.3, 60 sec: 8465.1, 300 sec: 8314.9). Total num frames: 1662976. Throughput: 0: 2113.2. Samples: 413338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:05:45,564][45457] Avg episode reward: [(0, '19.716')]
[2024-07-05 00:05:47,220][45733] Updated weights for policy 0, policy_version 410 (0.0013)
[2024-07-05 00:05:50,562][45457] Fps is (10 sec: 8601.7, 60 sec: 8465.2, 300 sec: 8311.9). Total num frames: 1703936. Throughput: 0: 2124.6. Samples: 426382. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:05:50,563][45457] Avg episode reward: [(0, '19.487')]
[2024-07-05 00:05:52,035][45733] Updated weights for policy 0, policy_version 420 (0.0013)
[2024-07-05 00:05:55,563][45457] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8328.5). Total num frames: 1748992. Throughput: 0: 2124.3. Samples: 432694. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:05:55,564][45457] Avg episode reward: [(0, '19.981')]
[2024-07-05 00:05:56,869][45733] Updated weights for policy 0, policy_version 430 (0.0012)
[2024-07-05 00:06:00,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8465.1, 300 sec: 8325.4). Total num frames: 1789952. Throughput: 0: 2120.3. Samples: 445266. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:00,564][45457] Avg episode reward: [(0, '20.921')]
[2024-07-05 00:06:01,749][45733] Updated weights for policy 0, policy_version 440 (0.0011)
[2024-07-05 00:06:05,563][45457] Fps is (10 sec: 8192.0, 60 sec: 8465.1, 300 sec: 8322.3). Total num frames: 1830912. Throughput: 0: 2125.8. Samples: 458082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:05,564][45457] Avg episode reward: [(0, '23.653')]
[2024-07-05 00:06:05,637][45720] Saving new best policy, reward=23.653!
[2024-07-05 00:06:06,627][45733] Updated weights for policy 0, policy_version 450 (0.0011)
[2024-07-05 00:06:10,562][45457] Fps is (10 sec: 8601.7, 60 sec: 8533.3, 300 sec: 8337.6). Total num frames: 1875968. Throughput: 0: 2123.7. Samples: 464334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:10,563][45457] Avg episode reward: [(0, '22.495')]
[2024-07-05 00:06:11,493][45733] Updated weights for policy 0, policy_version 460 (0.0011)
[2024-07-05 00:06:15,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8465.2, 300 sec: 8334.5). Total num frames: 1916928. Throughput: 0: 2117.2. Samples: 476874. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:15,564][45457] Avg episode reward: [(0, '21.926')]
[2024-07-05 00:06:15,820][45720] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000000469_1921024.pth...
[2024-07-05 00:06:16,341][45733] Updated weights for policy 0, policy_version 470 (0.0011)
[2024-07-05 00:06:20,564][45457] Fps is (10 sec: 8191.2, 60 sec: 8465.0, 300 sec: 8331.4). Total num frames: 1957888. Throughput: 0: 2119.5. Samples: 489522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:20,565][45457] Avg episode reward: [(0, '21.250')]
[2024-07-05 00:06:21,184][45733] Updated weights for policy 0, policy_version 480 (0.0011)
[2024-07-05 00:06:25,564][45457] Fps is (10 sec: 8600.8, 60 sec: 8533.2, 300 sec: 8345.6). Total num frames: 2002944. Throughput: 0: 2118.2. Samples: 496004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:25,567][45457] Avg episode reward: [(0, '23.038')]
[2024-07-05 00:06:26,002][45733] Updated weights for policy 0, policy_version 490 (0.0011)
[2024-07-05 00:06:30,563][45457] Fps is (10 sec: 8602.4, 60 sec: 8465.2, 300 sec: 8342.5). Total num frames: 2043904. Throughput: 0: 2117.6. Samples: 508628. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:30,563][45457] Avg episode reward: [(0, '23.902')]
[2024-07-05 00:06:30,868][45720] Saving new best policy, reward=23.902!
[2024-07-05 00:06:30,870][45733] Updated weights for policy 0, policy_version 500 (0.0011)
[2024-07-05 00:06:35,563][45457] Fps is (10 sec: 8192.9, 60 sec: 8465.2, 300 sec: 8339.5). Total num frames: 2084864. Throughput: 0: 2103.4. Samples: 521034. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:35,563][45457] Avg episode reward: [(0, '24.194')]
[2024-07-05 00:06:35,771][45720] Saving new best policy, reward=24.194!
[2024-07-05 00:06:35,772][45733] Updated weights for policy 0, policy_version 510 (0.0011)
[2024-07-05 00:06:40,563][45457] Fps is (10 sec: 8191.9, 60 sec: 8465.1, 300 sec: 8336.6). Total num frames: 2125824. Throughput: 0: 2096.3. Samples: 527028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:40,564][45457] Avg episode reward: [(0, '23.401')]
[2024-07-05 00:06:40,840][45733] Updated weights for policy 0, policy_version 520 (0.0012)
[2024-07-05 00:06:45,563][45457] Fps is (10 sec: 8192.0, 60 sec: 8396.8, 300 sec: 8333.8). Total num frames: 2166784. Throughput: 0: 2094.9. Samples: 539538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:45,563][45457] Avg episode reward: [(0, '24.109')]
[2024-07-05 00:06:45,664][45733] Updated weights for policy 0, policy_version 530 (0.0011)
[2024-07-05 00:06:50,563][45457] Fps is (10 sec: 8192.1, 60 sec: 8396.8, 300 sec: 8331.1). Total num frames: 2207744. Throughput: 0: 2092.8. Samples: 552258. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:50,564][45457] Avg episode reward: [(0, '25.860')]
[2024-07-05 00:06:50,624][45720] Saving new best policy, reward=25.860!
[2024-07-05 00:06:50,627][45733] Updated weights for policy 0, policy_version 540 (0.0012)
[2024-07-05 00:06:55,562][45457] Fps is (10 sec: 8192.0, 60 sec: 8328.6, 300 sec: 8328.5). Total num frames: 2248704. Throughput: 0: 2092.0. Samples: 558472. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:06:55,563][45457] Avg episode reward: [(0, '24.812')]
[2024-07-05 00:06:55,568][45733] Updated weights for policy 0, policy_version 550 (0.0012)
[2024-07-05 00:07:00,430][45733] Updated weights for policy 0, policy_version 560 (0.0011)
[2024-07-05 00:07:00,563][45457] Fps is (10 sec: 8601.6, 60 sec: 8396.8, 300 sec: 8340.9). Total num frames: 2293760. Throughput: 0: 2090.4. Samples: 570942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:07:00,563][45457] Avg episode reward: [(0, '25.045')]
[2024-07-05 00:07:05,264][45733] Updated weights for policy 0, policy_version 570 (0.0011)
[2024-07-05 00:07:05,563][45457] Fps is (10 sec: 8601.6, 60 sec: 8396.8, 300 sec: 8338.3). Total num frames: 2334720. Throughput: 0: 2088.2. Samples: 583488. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:07:05,564][45457] Avg episode reward: [(0, '24.328')]
[2024-07-05 00:07:10,148][45733] Updated weights for policy 0, policy_version 580 (0.0011)
[2024-07-05 00:07:10,563][45457] Fps is (10 sec: 8192.0, 60 sec: 8328.5, 300 sec: 8335.7). Total num frames: 2375680. Throughput: 0: 2085.9. Samples: 589866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:07:10,564][45457] Avg episode reward: [(0, '24.134')]
[2024-07-05 00:07:14,973][45733] Updated weights for policy 0, policy_version 590 (0.0012)
[2024-07-05 00:07:15,564][45457] Fps is (10 sec: 8600.8, 60 sec: 8396.7, 300 sec: 8347.3). Total num frames: 2420736. Throughput: 0: 2090.0. Samples: 602678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:07:15,566][45457] Avg episode reward: [(0, '24.807')]
[2024-07-05 00:07:19,803][45733] Updated weights for policy 0, policy_version 600 (0.0012)
[2024-07-05 00:07:20,562][45457] Fps is (10 sec: 8601.7, 60 sec: 8396.9, 300 sec: 8344.7). Total num frames: 2461696. Throughput: 0: 2093.0. Samples: 615220. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:07:20,564][45457] Avg episode reward: [(0, '23.460')]
[2024-07-05 00:07:24,650][45733] Updated weights for policy 0, policy_version 610 (0.0012)
[2024-07-05 00:07:25,562][45457] Fps is (10 sec: 8192.8, 60 sec: 8328.7, 300 sec: 8483.6). Total num frames: 2502656. Throughput: 0: 2099.9. Samples: 621524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:07:25,563][45457] Avg episode reward: [(0, '23.811')]
[2024-07-05 00:07:29,488][45733] Updated weights for policy 0, policy_version 620 (0.0012)
[2024-07-05 00:07:30,562][45457] Fps is (10 sec: 8601.6, 60 sec: 8396.8, 300 sec: 8511.3). Total num frames: 2547712. Throughput: 0: 2107.1. Samples: 634356. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 00:07:30,563][45457] Avg episode reward: [(0, '23.031')]
[2024-07-05 00:07:34,368][45733] Updated weights for policy 0, policy_version 630 (0.0011)
[2024-07-05 00:07:35,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8396.8, 300 sec: 8497.5). Total num frames: 2588672. Throughput: 0: 2103.0. Samples: 646892.
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:07:35,563][45457] Avg episode reward: [(0, '24.536')] [2024-07-05 00:07:39,183][45733] Updated weights for policy 0, policy_version 640 (0.0012) [2024-07-05 00:07:40,562][45457] Fps is (10 sec: 8192.0, 60 sec: 8396.8, 300 sec: 8483.6). Total num frames: 2629632. Throughput: 0: 2106.5. Samples: 653264. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:07:40,563][45457] Avg episode reward: [(0, '26.081')] [2024-07-05 00:07:40,612][45720] Saving new best policy, reward=26.081! [2024-07-05 00:07:44,021][45733] Updated weights for policy 0, policy_version 650 (0.0012) [2024-07-05 00:07:45,564][45457] Fps is (10 sec: 8600.5, 60 sec: 8464.9, 300 sec: 8497.4). Total num frames: 2674688. Throughput: 0: 2113.8. Samples: 666066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:07:45,567][45457] Avg episode reward: [(0, '25.760')] [2024-07-05 00:07:48,838][45733] Updated weights for policy 0, policy_version 660 (0.0011) [2024-07-05 00:07:50,562][45457] Fps is (10 sec: 8601.6, 60 sec: 8465.1, 300 sec: 8483.6). Total num frames: 2715648. Throughput: 0: 2115.0. Samples: 678662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:07:50,563][45457] Avg episode reward: [(0, '23.094')] [2024-07-05 00:07:53,710][45733] Updated weights for policy 0, policy_version 670 (0.0012) [2024-07-05 00:07:55,563][45457] Fps is (10 sec: 8193.0, 60 sec: 8465.1, 300 sec: 8469.7). Total num frames: 2756608. Throughput: 0: 2114.3. Samples: 685008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:07:55,564][45457] Avg episode reward: [(0, '23.225')] [2024-07-05 00:07:58,542][45733] Updated weights for policy 0, policy_version 680 (0.0012) [2024-07-05 00:08:00,563][45457] Fps is (10 sec: 8600.9, 60 sec: 8465.0, 300 sec: 8483.6). Total num frames: 2801664. Throughput: 0: 2114.7. Samples: 697840. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:00,566][45457] Avg episode reward: [(0, '22.793')] [2024-07-05 00:08:03,308][45733] Updated weights for policy 0, policy_version 690 (0.0012) [2024-07-05 00:08:05,563][45457] Fps is (10 sec: 8601.6, 60 sec: 8465.1, 300 sec: 8469.7). Total num frames: 2842624. Throughput: 0: 2117.8. Samples: 710522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:05,564][45457] Avg episode reward: [(0, '25.394')] [2024-07-05 00:08:08,057][45733] Updated weights for policy 0, policy_version 700 (0.0012) [2024-07-05 00:08:10,563][45457] Fps is (10 sec: 8602.2, 60 sec: 8533.3, 300 sec: 8483.6). Total num frames: 2887680. Throughput: 0: 2126.8. Samples: 717228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:10,563][45457] Avg episode reward: [(0, '24.393')] [2024-07-05 00:08:12,825][45733] Updated weights for policy 0, policy_version 710 (0.0012) [2024-07-05 00:08:15,563][45457] Fps is (10 sec: 8601.6, 60 sec: 8465.2, 300 sec: 8469.7). Total num frames: 2928640. Throughput: 0: 2124.8. Samples: 729972. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:15,564][45457] Avg episode reward: [(0, '24.082')] [2024-07-05 00:08:15,720][45720] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000000716_2932736.pth... [2024-07-05 00:08:15,806][45720] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000000221_905216.pth [2024-07-05 00:08:17,693][45733] Updated weights for policy 0, policy_version 720 (0.0012) [2024-07-05 00:08:20,562][45457] Fps is (10 sec: 8192.0, 60 sec: 8465.1, 300 sec: 8469.7). Total num frames: 2969600. Throughput: 0: 2131.3. Samples: 742800. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:20,563][45457] Avg episode reward: [(0, '25.008')] [2024-07-05 00:08:22,512][45733] Updated weights for policy 0, policy_version 730 (0.0011) [2024-07-05 00:08:25,563][45457] Fps is (10 sec: 8601.7, 60 sec: 8533.3, 300 sec: 8469.7). Total num frames: 3014656. Throughput: 0: 2129.5. Samples: 749090. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:25,564][45457] Avg episode reward: [(0, '24.908')] [2024-07-05 00:08:27,306][45733] Updated weights for policy 0, policy_version 740 (0.0012) [2024-07-05 00:08:30,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8465.1, 300 sec: 8469.7). Total num frames: 3055616. Throughput: 0: 2126.9. Samples: 761774. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:30,564][45457] Avg episode reward: [(0, '23.716')] [2024-07-05 00:08:32,134][45733] Updated weights for policy 0, policy_version 750 (0.0012) [2024-07-05 00:08:35,563][45457] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8469.7). Total num frames: 3100672. Throughput: 0: 2134.2. Samples: 774702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:35,563][45457] Avg episode reward: [(0, '23.904')] [2024-07-05 00:08:36,933][45733] Updated weights for policy 0, policy_version 760 (0.0011) [2024-07-05 00:08:40,562][45457] Fps is (10 sec: 8601.7, 60 sec: 8533.3, 300 sec: 8469.7). Total num frames: 3141632. Throughput: 0: 2133.9. Samples: 781034. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:40,564][45457] Avg episode reward: [(0, '24.201')] [2024-07-05 00:08:41,784][45733] Updated weights for policy 0, policy_version 770 (0.0012) [2024-07-05 00:08:45,564][45457] Fps is (10 sec: 8191.3, 60 sec: 8465.1, 300 sec: 8455.8). Total num frames: 3182592. Throughput: 0: 2135.2. Samples: 793924. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:08:45,568][45457] Avg episode reward: [(0, '25.184')] [2024-07-05 00:08:46,559][45733] Updated weights for policy 0, policy_version 780 (0.0012) [2024-07-05 00:08:50,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8533.3, 300 sec: 8469.7). Total num frames: 3227648. Throughput: 0: 2133.8. Samples: 806542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:50,564][45457] Avg episode reward: [(0, '25.337')] [2024-07-05 00:08:51,431][45733] Updated weights for policy 0, policy_version 790 (0.0011) [2024-07-05 00:08:55,564][45457] Fps is (10 sec: 8601.6, 60 sec: 8533.2, 300 sec: 8455.8). Total num frames: 3268608. Throughput: 0: 2123.4. Samples: 812782. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:08:55,568][45457] Avg episode reward: [(0, '27.362')] [2024-07-05 00:08:55,781][45720] Saving new best policy, reward=27.362! [2024-07-05 00:08:56,340][45733] Updated weights for policy 0, policy_version 800 (0.0012) [2024-07-05 00:09:00,563][45457] Fps is (10 sec: 8192.1, 60 sec: 8465.2, 300 sec: 8455.8). Total num frames: 3309568. Throughput: 0: 2118.3. Samples: 825294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:00,564][45457] Avg episode reward: [(0, '27.047')] [2024-07-05 00:09:01,150][45733] Updated weights for policy 0, policy_version 810 (0.0012) [2024-07-05 00:09:05,562][45457] Fps is (10 sec: 8602.4, 60 sec: 8533.4, 300 sec: 8455.8). Total num frames: 3354624. Throughput: 0: 2117.7. Samples: 838096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:05,564][45457] Avg episode reward: [(0, '26.206')] [2024-07-05 00:09:06,047][45733] Updated weights for policy 0, policy_version 820 (0.0012) [2024-07-05 00:09:10,562][45457] Fps is (10 sec: 8601.7, 60 sec: 8465.1, 300 sec: 8455.8). Total num frames: 3395584. Throughput: 0: 2113.1. Samples: 844180. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:10,563][45457] Avg episode reward: [(0, '25.412')] [2024-07-05 00:09:11,010][45733] Updated weights for policy 0, policy_version 830 (0.0012) [2024-07-05 00:09:15,563][45457] Fps is (10 sec: 8191.9, 60 sec: 8465.1, 300 sec: 8455.8). Total num frames: 3436544. Throughput: 0: 2112.1. Samples: 856818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:15,564][45457] Avg episode reward: [(0, '24.555')] [2024-07-05 00:09:15,927][45733] Updated weights for policy 0, policy_version 840 (0.0012) [2024-07-05 00:09:20,562][45457] Fps is (10 sec: 8192.0, 60 sec: 8465.1, 300 sec: 8455.8). Total num frames: 3477504. Throughput: 0: 2102.2. Samples: 869302. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:20,563][45457] Avg episode reward: [(0, '25.205')] [2024-07-05 00:09:20,707][45733] Updated weights for policy 0, policy_version 850 (0.0011) [2024-07-05 00:09:25,496][45733] Updated weights for policy 0, policy_version 860 (0.0012) [2024-07-05 00:09:25,563][45457] Fps is (10 sec: 8601.7, 60 sec: 8465.1, 300 sec: 8469.7). Total num frames: 3522560. Throughput: 0: 2108.7. Samples: 875924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:25,563][45457] Avg episode reward: [(0, '25.967')] [2024-07-05 00:09:30,329][45733] Updated weights for policy 0, policy_version 870 (0.0012) [2024-07-05 00:09:30,563][45457] Fps is (10 sec: 8601.4, 60 sec: 8465.0, 300 sec: 8455.8). Total num frames: 3563520. Throughput: 0: 2102.4. Samples: 888530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:30,564][45457] Avg episode reward: [(0, '26.150')] [2024-07-05 00:09:35,125][45733] Updated weights for policy 0, policy_version 880 (0.0012) [2024-07-05 00:09:35,563][45457] Fps is (10 sec: 8192.0, 60 sec: 8396.8, 300 sec: 8455.8). Total num frames: 3604480. Throughput: 0: 2109.8. Samples: 901482. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:35,564][45457] Avg episode reward: [(0, '28.216')] [2024-07-05 00:09:35,606][45720] Saving new best policy, reward=28.216! [2024-07-05 00:09:39,996][45733] Updated weights for policy 0, policy_version 890 (0.0012) [2024-07-05 00:09:40,563][45457] Fps is (10 sec: 8601.7, 60 sec: 8465.0, 300 sec: 8455.8). Total num frames: 3649536. Throughput: 0: 2108.9. Samples: 907680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:40,564][45457] Avg episode reward: [(0, '27.075')] [2024-07-05 00:09:44,929][45733] Updated weights for policy 0, policy_version 900 (0.0012) [2024-07-05 00:09:45,563][45457] Fps is (10 sec: 8601.0, 60 sec: 8465.1, 300 sec: 8455.8). Total num frames: 3690496. Throughput: 0: 2110.1. Samples: 920250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:45,564][45457] Avg episode reward: [(0, '28.084')] [2024-07-05 00:09:49,793][45733] Updated weights for policy 0, policy_version 910 (0.0012) [2024-07-05 00:09:50,564][45457] Fps is (10 sec: 8191.4, 60 sec: 8396.7, 300 sec: 8455.8). Total num frames: 3731456. Throughput: 0: 2102.3. Samples: 932702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:50,568][45457] Avg episode reward: [(0, '26.819')] [2024-07-05 00:09:54,696][45733] Updated weights for policy 0, policy_version 920 (0.0012) [2024-07-05 00:09:55,564][45457] Fps is (10 sec: 8191.9, 60 sec: 8396.8, 300 sec: 8441.9). Total num frames: 3772416. Throughput: 0: 2105.6. Samples: 938934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:09:55,566][45457] Avg episode reward: [(0, '27.790')] [2024-07-05 00:09:59,567][45733] Updated weights for policy 0, policy_version 930 (0.0011) [2024-07-05 00:10:00,563][45457] Fps is (10 sec: 8602.3, 60 sec: 8465.1, 300 sec: 8455.8). Total num frames: 3817472. Throughput: 0: 2109.3. Samples: 951736. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:10:00,563][45457] Avg episode reward: [(0, '27.043')] [2024-07-05 00:10:04,437][45733] Updated weights for policy 0, policy_version 940 (0.0012) [2024-07-05 00:10:05,563][45457] Fps is (10 sec: 8602.3, 60 sec: 8396.8, 300 sec: 8455.8). Total num frames: 3858432. Throughput: 0: 2111.0. Samples: 964298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:10:05,563][45457] Avg episode reward: [(0, '26.886')] [2024-07-05 00:10:09,362][45733] Updated weights for policy 0, policy_version 950 (0.0012) [2024-07-05 00:10:10,563][45457] Fps is (10 sec: 8192.0, 60 sec: 8396.8, 300 sec: 8441.9). Total num frames: 3899392. Throughput: 0: 2102.3. Samples: 970528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:10,564][45457] Avg episode reward: [(0, '28.959')] [2024-07-05 00:10:10,792][45720] Saving new best policy, reward=28.959! [2024-07-05 00:10:14,219][45733] Updated weights for policy 0, policy_version 960 (0.0011) [2024-07-05 00:10:15,563][45457] Fps is (10 sec: 8191.4, 60 sec: 8396.7, 300 sec: 8441.9). Total num frames: 3940352. Throughput: 0: 2100.8. Samples: 983066. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:15,565][45457] Avg episode reward: [(0, '28.199')] [2024-07-05 00:10:15,710][45720] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000000963_3944448.pth... [2024-07-05 00:10:15,818][45720] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000000469_1921024.pth [2024-07-05 00:10:19,263][45733] Updated weights for policy 0, policy_version 970 (0.0012) [2024-07-05 00:10:20,562][45457] Fps is (10 sec: 8192.1, 60 sec: 8396.8, 300 sec: 8441.9). Total num frames: 3981312. Throughput: 0: 2085.1. Samples: 995312. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 00:10:20,563][45457] Avg episode reward: [(0, '27.336')] [2024-07-05 00:10:24,141][45733] Updated weights for policy 0, policy_version 980 (0.0012) [2024-07-05 00:10:25,563][45457] Fps is (10 sec: 8192.7, 60 sec: 8328.5, 300 sec: 8428.1). Total num frames: 4022272. Throughput: 0: 2093.6. Samples: 1001892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:25,564][45457] Avg episode reward: [(0, '26.246')] [2024-07-05 00:10:29,082][45733] Updated weights for policy 0, policy_version 990 (0.0012) [2024-07-05 00:10:30,562][45457] Fps is (10 sec: 8601.5, 60 sec: 8396.8, 300 sec: 8442.0). Total num frames: 4067328. Throughput: 0: 2090.6. Samples: 1014326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:30,563][45457] Avg episode reward: [(0, '28.156')] [2024-07-05 00:10:33,899][45733] Updated weights for policy 0, policy_version 1000 (0.0011) [2024-07-05 00:10:35,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8396.8, 300 sec: 8441.9). Total num frames: 4108288. Throughput: 0: 2092.2. Samples: 1026850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:35,564][45457] Avg episode reward: [(0, '28.749')] [2024-07-05 00:10:38,767][45733] Updated weights for policy 0, policy_version 1010 (0.0012) [2024-07-05 00:10:40,563][45457] Fps is (10 sec: 8192.0, 60 sec: 8328.5, 300 sec: 8428.0). Total num frames: 4149248. Throughput: 0: 2091.6. Samples: 1033056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:40,564][45457] Avg episode reward: [(0, '28.596')] [2024-07-05 00:10:43,658][45733] Updated weights for policy 0, policy_version 1020 (0.0012) [2024-07-05 00:10:45,563][45457] Fps is (10 sec: 8192.1, 60 sec: 8328.6, 300 sec: 8428.0). Total num frames: 4190208. Throughput: 0: 2093.1. Samples: 1045926. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:45,563][45457] Avg episode reward: [(0, '27.802')] [2024-07-05 00:10:48,601][45733] Updated weights for policy 0, policy_version 1030 (0.0012) [2024-07-05 00:10:50,562][45457] Fps is (10 sec: 8601.7, 60 sec: 8396.9, 300 sec: 8428.0). Total num frames: 4235264. Throughput: 0: 2091.1. Samples: 1058398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:50,563][45457] Avg episode reward: [(0, '25.825')] [2024-07-05 00:10:53,300][45733] Updated weights for policy 0, policy_version 1040 (0.0012) [2024-07-05 00:10:55,563][45457] Fps is (10 sec: 8601.5, 60 sec: 8396.9, 300 sec: 8428.0). Total num frames: 4276224. Throughput: 0: 2094.8. Samples: 1064792. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:10:55,564][45457] Avg episode reward: [(0, '25.309')] [2024-07-05 00:10:58,020][45733] Updated weights for policy 0, policy_version 1050 (0.0014) [2024-07-05 00:11:00,563][45457] Fps is (10 sec: 8601.6, 60 sec: 8396.8, 300 sec: 8441.9). Total num frames: 4321280. Throughput: 0: 2106.4. Samples: 1077852. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:11:00,564][45457] Avg episode reward: [(0, '26.441')] [2024-07-05 00:11:02,854][45733] Updated weights for policy 0, policy_version 1060 (0.0023) [2024-07-05 00:11:05,564][45457] Fps is (10 sec: 7372.2, 60 sec: 8191.9, 300 sec: 8386.4). Total num frames: 4349952. Throughput: 0: 2055.4. Samples: 1087806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:11:05,569][45457] Avg episode reward: [(0, '26.663')] [2024-07-05 00:11:10,563][45457] Fps is (10 sec: 3686.1, 60 sec: 7645.8, 300 sec: 8275.3). Total num frames: 4358144. Throughput: 0: 1959.1. Samples: 1090052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:11:10,568][45457] Avg episode reward: [(0, '26.834')] [2024-07-05 00:11:15,564][45457] Fps is (10 sec: 2457.6, 60 sec: 7236.3, 300 sec: 8192.0). Total num frames: 4374528. Throughput: 0: 1770.4. 
Samples: 1093996. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 00:11:15,568][45457] Avg episode reward: [(0, '27.659')] [2024-07-05 00:11:18,343][45733] Updated weights for policy 0, policy_version 1070 (0.0090) [2024-07-05 00:11:20,563][45457] Fps is (10 sec: 2867.2, 60 sec: 6758.3, 300 sec: 8080.9). Total num frames: 4386816. Throughput: 0: 1579.4. Samples: 1097926. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 00:11:20,568][45457] Avg episode reward: [(0, '27.766')] [2024-07-05 00:11:25,564][45457] Fps is (10 sec: 2457.6, 60 sec: 6280.5, 300 sec: 7983.7). Total num frames: 4399104. Throughput: 0: 1485.8. Samples: 1099918. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 00:11:25,568][45457] Avg episode reward: [(0, '27.233')] [2024-07-05 00:11:30,564][45457] Fps is (10 sec: 2457.5, 60 sec: 5734.3, 300 sec: 7886.5). Total num frames: 4411392. Throughput: 0: 1287.7. Samples: 1103874. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 00:11:30,567][45457] Avg episode reward: [(0, '27.273')] [2024-07-05 00:11:33,820][45733] Updated weights for policy 0, policy_version 1080 (0.0088) [2024-07-05 00:11:35,564][45457] Fps is (10 sec: 2867.2, 60 sec: 5324.7, 300 sec: 7803.2). Total num frames: 4427776. Throughput: 0: 1099.1. Samples: 1107860. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 00:11:35,569][45457] Avg episode reward: [(0, '25.135')] [2024-07-05 00:11:40,564][45457] Fps is (10 sec: 2867.2, 60 sec: 4846.9, 300 sec: 7706.0). Total num frames: 4440064. Throughput: 0: 1001.4. Samples: 1109854. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 00:11:40,568][45457] Avg episode reward: [(0, '24.835')] [2024-07-05 00:11:45,564][45457] Fps is (10 sec: 2457.5, 60 sec: 4369.0, 300 sec: 7608.8). Total num frames: 4452352. Throughput: 0: 799.0. Samples: 1113810. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 00:11:45,568][45457] Avg episode reward: [(0, '25.513')] [2024-07-05 00:11:49,287][45733] Updated weights for policy 0, policy_version 1090 (0.0086) [2024-07-05 00:11:50,563][45457] Fps is (10 sec: 2457.6, 60 sec: 3822.9, 300 sec: 7511.6). Total num frames: 4464640. Throughput: 0: 666.1. Samples: 1117780. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 00:11:50,567][45457] Avg episode reward: [(0, '25.722')] [2024-07-05 00:11:55,564][45457] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 7414.4). Total num frames: 4481024. Throughput: 0: 660.4. Samples: 1119770. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 00:11:55,569][45457] Avg episode reward: [(0, '25.471')] [2024-07-05 00:12:00,564][45457] Fps is (10 sec: 2867.1, 60 sec: 2867.2, 300 sec: 7317.2). Total num frames: 4493312. Throughput: 0: 659.1. Samples: 1123656. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 00:12:00,569][45457] Avg episode reward: [(0, '26.600')] [2024-07-05 00:12:04,895][45733] Updated weights for policy 0, policy_version 1100 (0.0089) [2024-07-05 00:12:05,564][45457] Fps is (10 sec: 2457.7, 60 sec: 2594.1, 300 sec: 7220.0). Total num frames: 4505600. Throughput: 0: 656.1. Samples: 1127452. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 00:12:05,568][45457] Avg episode reward: [(0, '26.292')] [2024-07-05 00:12:10,563][45457] Fps is (10 sec: 3686.7, 60 sec: 2867.2, 300 sec: 7150.7). Total num frames: 4530176. Throughput: 0: 664.4. Samples: 1129816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:12:10,564][45457] Avg episode reward: [(0, '26.955')] [2024-07-05 00:12:12,183][45733] Updated weights for policy 0, policy_version 1110 (0.0041) [2024-07-05 00:12:15,563][45457] Fps is (10 sec: 6963.8, 60 sec: 3345.1, 300 sec: 7164.5). Total num frames: 4575232. Throughput: 0: 829.2. Samples: 1141186. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:12:15,564][45457] Avg episode reward: [(0, '29.109')] [2024-07-05 00:12:15,568][45720] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001117_4575232.pth... [2024-07-05 00:12:15,695][45720] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000000716_2932736.pth [2024-07-05 00:12:15,710][45720] Saving new best policy, reward=29.109! [2024-07-05 00:12:17,018][45733] Updated weights for policy 0, policy_version 1120 (0.0015) [2024-07-05 00:12:20,563][45457] Fps is (10 sec: 8601.7, 60 sec: 3823.0, 300 sec: 7164.5). Total num frames: 4616192. Throughput: 0: 1025.5. Samples: 1154008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 00:12:20,564][45457] Avg episode reward: [(0, '28.579')] [2024-07-05 00:12:21,727][45733] Updated weights for policy 0, policy_version 1130 (0.0012) [2024-07-05 00:12:25,562][45457] Fps is (10 sec: 8601.7, 60 sec: 4369.1, 300 sec: 7164.5). Total num frames: 4661248. Throughput: 0: 1128.0. Samples: 1160612. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 00:12:25,563][45457] Avg episode reward: [(0, '27.690')] [2024-07-05 00:12:26,489][45733] Updated weights for policy 0, policy_version 1140 (0.0012) [2024-07-05 00:12:30,562][45457] Fps is (10 sec: 8601.7, 60 sec: 4847.0, 300 sec: 7164.5). Total num frames: 4702208. Throughput: 0: 1323.5. Samples: 1173364. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 00:12:30,563][45457] Avg episode reward: [(0, '28.016')] [2024-07-05 00:12:31,297][45733] Updated weights for policy 0, policy_version 1150 (0.0012) [2024-07-05 00:12:35,563][45457] Fps is (10 sec: 8191.3, 60 sec: 5256.6, 300 sec: 7164.5). Total num frames: 4743168. Throughput: 0: 1524.7. Samples: 1186390. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 00:12:35,568][45457] Avg episode reward: [(0, '29.795')] [2024-07-05 00:12:35,576][45720] Saving new best policy, reward=29.795! [2024-07-05 00:12:36,135][45733] Updated weights for policy 0, policy_version 1160 (0.0012) [2024-07-05 00:12:40,563][45457] Fps is (10 sec: 8600.7, 60 sec: 5802.7, 300 sec: 7164.5). Total num frames: 4788224. Throughput: 0: 1619.8. Samples: 1192662. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 00:12:40,565][45457] Avg episode reward: [(0, '30.795')] [2024-07-05 00:12:40,865][45720] Saving new best policy, reward=30.795! [2024-07-05 00:12:40,868][45733] Updated weights for policy 0, policy_version 1170 (0.0012) [2024-07-05 00:12:45,563][45457] Fps is (10 sec: 8602.2, 60 sec: 6280.7, 300 sec: 7164.5). Total num frames: 4829184. Throughput: 0: 1813.1. Samples: 1205244. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 00:12:45,563][45457] Avg episode reward: [(0, '30.346')] [2024-07-05 00:12:45,736][45733] Updated weights for policy 0, policy_version 1180 (0.0011) [2024-07-05 00:12:50,429][45733] Updated weights for policy 0, policy_version 1190 (0.0011) [2024-07-05 00:12:50,562][45457] Fps is (10 sec: 8602.5, 60 sec: 6826.8, 300 sec: 7178.4). Total num frames: 4874240. Throughput: 0: 2019.3. Samples: 1218318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 00:12:50,563][45457] Avg episode reward: [(0, '29.584')] [2024-07-05 00:12:55,286][45733] Updated weights for policy 0, policy_version 1200 (0.0011) [2024-07-05 00:12:55,564][45457] Fps is (10 sec: 8600.8, 60 sec: 7236.3, 300 sec: 7164.5). Total num frames: 4915200. Throughput: 0: 2105.2. Samples: 1224550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:12:55,568][45457] Avg episode reward: [(0, '27.384')] [2024-07-05 00:13:00,122][45733] Updated weights for policy 0, policy_version 1210 (0.0012) [2024-07-05 00:13:00,562][45457] Fps is (10 sec: 8192.0, 60 sec: 7714.3, 300 sec: 7164.5). 
Total num frames: 4956160. Throughput: 0: 2139.0. Samples: 1237442. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:13:00,563][45457] Avg episode reward: [(0, '26.764')] [2024-07-05 00:13:05,041][45733] Updated weights for policy 0, policy_version 1220 (0.0012) [2024-07-05 00:13:05,563][45457] Fps is (10 sec: 8602.3, 60 sec: 8260.4, 300 sec: 7164.5). Total num frames: 5001216. Throughput: 0: 2131.8. Samples: 1249940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 00:13:05,564][45457] Avg episode reward: [(0, '26.776')] [2024-07-05 00:13:06,018][45720] Stopping Batcher_0... [2024-07-05 00:13:06,019][45720] Loop batcher_evt_loop terminating... [2024-07-05 00:13:06,019][45720] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-07-05 00:13:06,021][45457] Component Batcher_0 stopped! [2024-07-05 00:13:06,027][45735] Stopping RolloutWorker_w1... [2024-07-05 00:13:06,027][45734] Stopping RolloutWorker_w0... [2024-07-05 00:13:06,027][45740] Stopping RolloutWorker_w6... [2024-07-05 00:13:06,027][45739] Stopping RolloutWorker_w5... [2024-07-05 00:13:06,027][45741] Stopping RolloutWorker_w7... [2024-07-05 00:13:06,027][45735] Loop rollout_proc1_evt_loop terminating... [2024-07-05 00:13:06,027][45738] Stopping RolloutWorker_w4... [2024-07-05 00:13:06,027][45734] Loop rollout_proc0_evt_loop terminating... [2024-07-05 00:13:06,027][45740] Loop rollout_proc6_evt_loop terminating... [2024-07-05 00:13:06,027][45739] Loop rollout_proc5_evt_loop terminating... [2024-07-05 00:13:06,028][45741] Loop rollout_proc7_evt_loop terminating... [2024-07-05 00:13:06,028][45738] Loop rollout_proc4_evt_loop terminating... [2024-07-05 00:13:06,028][45737] Stopping RolloutWorker_w3... [2024-07-05 00:13:06,028][45737] Loop rollout_proc3_evt_loop terminating... [2024-07-05 00:13:06,028][45736] Stopping RolloutWorker_w2... [2024-07-05 00:13:06,027][45457] Component RolloutWorker_w1 stopped! 
[2024-07-05 00:13:06,028][45736] Loop rollout_proc2_evt_loop terminating...
[2024-07-05 00:13:06,028][45457] Component RolloutWorker_w0 stopped!
[2024-07-05 00:13:06,029][45457] Component RolloutWorker_w6 stopped!
[2024-07-05 00:13:06,030][45457] Component RolloutWorker_w5 stopped!
[2024-07-05 00:13:06,030][45457] Component RolloutWorker_w7 stopped!
[2024-07-05 00:13:06,031][45457] Component RolloutWorker_w4 stopped!
[2024-07-05 00:13:06,032][45457] Component RolloutWorker_w3 stopped!
[2024-07-05 00:13:06,032][45457] Component RolloutWorker_w2 stopped!
[2024-07-05 00:13:06,051][45733] Weights refcount: 2 0
[2024-07-05 00:13:06,053][45733] Stopping InferenceWorker_p0-w0...
[2024-07-05 00:13:06,053][45733] Loop inference_proc0-0_evt_loop terminating...
[2024-07-05 00:13:06,053][45457] Component InferenceWorker_p0-w0 stopped!
[2024-07-05 00:13:06,115][45720] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000000963_3944448.pth
[2024-07-05 00:13:06,125][45720] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2024-07-05 00:13:06,251][45720] Stopping LearnerWorker_p0...
[2024-07-05 00:13:06,252][45720] Loop learner_proc0_evt_loop terminating...
[2024-07-05 00:13:06,251][45457] Component LearnerWorker_p0 stopped!
[2024-07-05 00:13:06,253][45457] Waiting for process learner_proc0 to stop...
[2024-07-05 00:13:07,243][45457] Waiting for process inference_proc0-0 to join...
[2024-07-05 00:13:07,245][45457] Waiting for process rollout_proc0 to join...
[2024-07-05 00:13:07,246][45457] Waiting for process rollout_proc1 to join...
[2024-07-05 00:13:07,247][45457] Waiting for process rollout_proc2 to join...
[2024-07-05 00:13:07,247][45457] Waiting for process rollout_proc3 to join...
[2024-07-05 00:13:07,248][45457] Waiting for process rollout_proc4 to join...
[2024-07-05 00:13:07,249][45457] Waiting for process rollout_proc5 to join... [2024-07-05 00:13:07,249][45457] Waiting for process rollout_proc6 to join... [2024-07-05 00:13:07,250][45457] Waiting for process rollout_proc7 to join... [2024-07-05 00:13:07,250][45457] Batcher 0 profile tree view: batching: 10.6002, releasing_batches: 0.0368 [2024-07-05 00:13:07,251][45457] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 3.5569 update_model: 4.1065 weight_update: 0.0011 one_step: 0.0028 handle_policy_step: 618.6853 deserialize: 8.4782, stack: 1.2233, obs_to_device_normalize: 101.5059, forward: 377.7692, send_messages: 11.8519 prepare_outputs: 109.2451 to_cpu: 98.5216 [2024-07-05 00:13:07,252][45457] Learner 0 profile tree view: misc: 0.0084, prepare_batch: 23.2111 train: 512.3367 epoch_init: 0.0171, minibatch_init: 0.0237, losses_postprocess: 0.4656, kl_divergence: 0.2496, after_optimizer: 328.6709 calculate_losses: 172.7099 losses_init: 0.0117, forward_head: 6.1247, bptt_initial: 162.9234, tail: 0.5551, advantages_returns: 0.1363, losses: 1.7374 bptt: 0.9351 bptt_forward_core: 0.8944 update: 9.5899 clip: 1.1686 [2024-07-05 00:13:07,252][45457] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.1612, enqueue_policy_requests: 8.4153, env_step: 109.0174, overhead: 10.2251, complete_rollouts: 0.2774 save_policy_outputs: 10.4592 split_output_tensors: 4.8611 [2024-07-05 00:13:07,252][45457] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.1197, enqueue_policy_requests: 8.3571, env_step: 116.6297, overhead: 9.9265, complete_rollouts: 0.2259 save_policy_outputs: 10.4195 split_output_tensors: 4.8793 [2024-07-05 00:13:07,253][45457] Loop Runner_EvtLoop terminating... 
[2024-07-05 00:13:07,254][45457] Runner profile tree view: main_loop: 646.3090 [2024-07-05 00:13:07,254][45457] Collected {0: 5005312}, FPS: 7744.5 [2024-07-05 00:13:30,505][45457] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json [2024-07-05 00:13:30,507][45457] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-05 00:13:30,508][45457] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-05 00:13:30,508][45457] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-05 00:13:30,509][45457] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 00:13:30,509][45457] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-05 00:13:30,510][45457] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 00:13:30,511][45457] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-05 00:13:30,511][45457] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-05 00:13:30,512][45457] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-05 00:13:30,512][45457] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-05 00:13:30,512][45457] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-05 00:13:30,513][45457] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-05 00:13:30,513][45457] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-07-05 00:13:30,513][45457] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-05 00:13:30,540][45457] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 00:13:30,542][45457] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 00:13:30,544][45457] RunningMeanStd input shape: (1,) [2024-07-05 00:13:30,559][45457] Num input channels: 3 [2024-07-05 00:13:30,573][45457] Convolutional layer output size: 4608 [2024-07-05 00:13:30,601][45457] Policy head output size: 512 [2024-07-05 00:13:30,838][45457] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-07-05 00:13:31,866][45457] Num frames 100... [2024-07-05 00:13:31,987][45457] Num frames 200... [2024-07-05 00:13:32,107][45457] Num frames 300... [2024-07-05 00:13:32,245][45457] Num frames 400... [2024-07-05 00:13:32,367][45457] Num frames 500... [2024-07-05 00:13:32,490][45457] Num frames 600... [2024-07-05 00:13:32,614][45457] Num frames 700... [2024-07-05 00:13:32,733][45457] Num frames 800... [2024-07-05 00:13:32,829][45457] Num frames 900... [2024-07-05 00:13:32,926][45457] Num frames 1000... [2024-07-05 00:13:33,022][45457] Num frames 1100... [2024-07-05 00:13:33,124][45457] Num frames 1200... [2024-07-05 00:13:33,228][45457] Num frames 1300... [2024-07-05 00:13:33,330][45457] Num frames 1400... [2024-07-05 00:13:33,429][45457] Num frames 1500... [2024-07-05 00:13:33,528][45457] Num frames 1600... [2024-07-05 00:13:33,624][45457] Num frames 1700... [2024-07-05 00:13:33,718][45457] Num frames 1800... [2024-07-05 00:13:33,817][45457] Num frames 1900... [2024-07-05 00:13:33,912][45457] Num frames 2000... [2024-07-05 00:13:34,012][45457] Num frames 2100... 
[2024-07-05 00:13:34,064][45457] Avg episode rewards: #0: 56.999, true rewards: #0: 21.000 [2024-07-05 00:13:34,065][45457] Avg episode reward: 56.999, avg true_objective: 21.000 [2024-07-05 00:13:34,162][45457] Num frames 2200... [2024-07-05 00:13:34,259][45457] Num frames 2300... [2024-07-05 00:13:34,365][45457] Avg episode rewards: #0: 29.779, true rewards: #0: 11.780 [2024-07-05 00:13:34,367][45457] Avg episode reward: 29.779, avg true_objective: 11.780 [2024-07-05 00:13:34,410][45457] Num frames 2400... [2024-07-05 00:13:34,508][45457] Num frames 2500... [2024-07-05 00:13:34,604][45457] Num frames 2600... [2024-07-05 00:13:34,704][45457] Num frames 2700... [2024-07-05 00:13:34,806][45457] Num frames 2800... [2024-07-05 00:13:34,921][45457] Num frames 2900... [2024-07-05 00:13:35,035][45457] Num frames 3000... [2024-07-05 00:13:35,132][45457] Num frames 3100... [2024-07-05 00:13:35,228][45457] Num frames 3200... [2024-07-05 00:13:35,323][45457] Num frames 3300... [2024-07-05 00:13:35,422][45457] Num frames 3400... [2024-07-05 00:13:35,513][45457] Num frames 3500... [2024-07-05 00:13:35,607][45457] Num frames 3600... [2024-07-05 00:13:35,701][45457] Num frames 3700... [2024-07-05 00:13:35,752][45457] Avg episode rewards: #0: 29.000, true rewards: #0: 12.333 [2024-07-05 00:13:35,753][45457] Avg episode reward: 29.000, avg true_objective: 12.333 [2024-07-05 00:13:35,847][45457] Num frames 3800... [2024-07-05 00:13:35,940][45457] Num frames 3900... [2024-07-05 00:13:36,033][45457] Num frames 4000... [2024-07-05 00:13:36,125][45457] Num frames 4100... [2024-07-05 00:13:36,217][45457] Num frames 4200... [2024-07-05 00:13:36,312][45457] Num frames 4300... [2024-07-05 00:13:36,405][45457] Num frames 4400... [2024-07-05 00:13:36,500][45457] Num frames 4500... [2024-07-05 00:13:36,596][45457] Num frames 4600... [2024-07-05 00:13:36,690][45457] Num frames 4700... [2024-07-05 00:13:36,784][45457] Num frames 4800... 
[2024-07-05 00:13:36,858][45457] Avg episode rewards: #0: 28.800, true rewards: #0: 12.050 [2024-07-05 00:13:36,859][45457] Avg episode reward: 28.800, avg true_objective: 12.050 [2024-07-05 00:13:36,936][45457] Num frames 4900... [2024-07-05 00:13:37,030][45457] Num frames 5000... [2024-07-05 00:13:37,126][45457] Num frames 5100... [2024-07-05 00:13:37,224][45457] Num frames 5200... [2024-07-05 00:13:37,318][45457] Num frames 5300... [2024-07-05 00:13:37,414][45457] Num frames 5400... [2024-07-05 00:13:37,509][45457] Num frames 5500... [2024-07-05 00:13:37,606][45457] Num frames 5600... [2024-07-05 00:13:37,703][45457] Num frames 5700... [2024-07-05 00:13:37,800][45457] Num frames 5800... [2024-07-05 00:13:37,895][45457] Num frames 5900... [2024-07-05 00:13:37,990][45457] Num frames 6000... [2024-07-05 00:13:38,085][45457] Num frames 6100... [2024-07-05 00:13:38,180][45457] Num frames 6200... [2024-07-05 00:13:38,275][45457] Num frames 6300... [2024-07-05 00:13:38,369][45457] Num frames 6400... [2024-07-05 00:13:38,463][45457] Num frames 6500... [2024-07-05 00:13:38,559][45457] Num frames 6600... [2024-07-05 00:13:38,654][45457] Num frames 6700... [2024-07-05 00:13:38,751][45457] Num frames 6800... [2024-07-05 00:13:38,846][45457] Num frames 6900... [2024-07-05 00:13:38,920][45457] Avg episode rewards: #0: 34.240, true rewards: #0: 13.840 [2024-07-05 00:13:38,921][45457] Avg episode reward: 34.240, avg true_objective: 13.840 [2024-07-05 00:13:38,998][45457] Num frames 7000... [2024-07-05 00:13:39,093][45457] Num frames 7100... [2024-07-05 00:13:39,189][45457] Num frames 7200... [2024-07-05 00:13:39,284][45457] Num frames 7300... [2024-07-05 00:13:39,377][45457] Num frames 7400... [2024-07-05 00:13:39,474][45457] Num frames 7500... [2024-07-05 00:13:39,575][45457] Num frames 7600... [2024-07-05 00:13:39,672][45457] Num frames 7700... [2024-07-05 00:13:39,768][45457] Num frames 7800... [2024-07-05 00:13:39,864][45457] Num frames 7900... 
[2024-07-05 00:13:39,960][45457] Num frames 8000... [2024-07-05 00:13:40,057][45457] Num frames 8100... [2024-07-05 00:13:40,157][45457] Num frames 8200... [2024-07-05 00:13:40,255][45457] Num frames 8300... [2024-07-05 00:13:40,352][45457] Num frames 8400... [2024-07-05 00:13:40,450][45457] Num frames 8500... [2024-07-05 00:13:40,548][45457] Num frames 8600... [2024-07-05 00:13:40,697][45457] Avg episode rewards: #0: 35.831, true rewards: #0: 14.498 [2024-07-05 00:13:40,698][45457] Avg episode reward: 35.831, avg true_objective: 14.498 [2024-07-05 00:13:40,699][45457] Num frames 8700... [2024-07-05 00:13:40,791][45457] Num frames 8800... [2024-07-05 00:13:40,883][45457] Num frames 8900... [2024-07-05 00:13:40,978][45457] Num frames 9000... [2024-07-05 00:13:41,073][45457] Num frames 9100... [2024-07-05 00:13:41,163][45457] Num frames 9200... [2024-07-05 00:13:41,253][45457] Num frames 9300... [2024-07-05 00:13:41,310][45457] Avg episode rewards: #0: 32.575, true rewards: #0: 13.290 [2024-07-05 00:13:41,311][45457] Avg episode reward: 32.575, avg true_objective: 13.290 [2024-07-05 00:13:41,399][45457] Num frames 9400... [2024-07-05 00:13:41,489][45457] Num frames 9500... [2024-07-05 00:13:41,583][45457] Num frames 9600... [2024-07-05 00:13:41,677][45457] Num frames 9700... [2024-07-05 00:13:41,771][45457] Num frames 9800... [2024-07-05 00:13:41,864][45457] Num frames 9900... [2024-07-05 00:13:41,960][45457] Num frames 10000... [2024-07-05 00:13:42,056][45457] Num frames 10100... [2024-07-05 00:13:42,152][45457] Num frames 10200... [2024-07-05 00:13:42,243][45457] Num frames 10300... [2024-07-05 00:13:42,335][45457] Num frames 10400... [2024-07-05 00:13:42,432][45457] Num frames 10500... [2024-07-05 00:13:42,523][45457] Num frames 10600... [2024-07-05 00:13:42,615][45457] Num frames 10700... 
[2024-07-05 00:13:42,736][45457] Avg episode rewards: #0: 33.467, true rewards: #0: 13.468 [2024-07-05 00:13:42,738][45457] Avg episode reward: 33.467, avg true_objective: 13.468 [2024-07-05 00:13:42,764][45457] Num frames 10800... [2024-07-05 00:13:42,857][45457] Num frames 10900... [2024-07-05 00:13:42,951][45457] Num frames 11000... [2024-07-05 00:13:43,042][45457] Num frames 11100... [2024-07-05 00:13:43,135][45457] Num frames 11200... [2024-07-05 00:13:43,230][45457] Num frames 11300... [2024-07-05 00:13:43,321][45457] Num frames 11400... [2024-07-05 00:13:43,421][45457] Num frames 11500... [2024-07-05 00:13:43,520][45457] Num frames 11600... [2024-07-05 00:13:43,614][45457] Num frames 11700... [2024-07-05 00:13:43,717][45457] Num frames 11800... [2024-07-05 00:13:43,812][45457] Num frames 11900... [2024-07-05 00:13:43,950][45457] Avg episode rewards: #0: 32.433, true rewards: #0: 13.322 [2024-07-05 00:13:43,951][45457] Avg episode reward: 32.433, avg true_objective: 13.322 [2024-07-05 00:13:43,961][45457] Num frames 12000... [2024-07-05 00:13:44,054][45457] Num frames 12100... [2024-07-05 00:13:44,148][45457] Num frames 12200... [2024-07-05 00:13:44,245][45457] Num frames 12300... [2024-07-05 00:13:44,341][45457] Num frames 12400... [2024-07-05 00:13:44,436][45457] Num frames 12500... [2024-07-05 00:13:44,554][45457] Avg episode rewards: #0: 30.366, true rewards: #0: 12.566 [2024-07-05 00:13:44,555][45457] Avg episode reward: 30.366, avg true_objective: 12.566 [2024-07-05 00:13:59,152][45457] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/replay.mp4! 
[2024-07-05 00:21:35,280][45457] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json [2024-07-05 00:21:35,281][45457] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-05 00:21:35,281][45457] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-05 00:21:35,282][45457] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-05 00:21:35,282][45457] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 00:21:35,283][45457] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-05 00:21:35,283][45457] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-07-05 00:21:35,283][45457] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-05 00:21:35,284][45457] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-05 00:21:35,284][45457] Adding new argument 'hf_repository'='ra9hu/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-05 00:21:35,285][45457] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-05 00:21:35,285][45457] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-05 00:21:35,285][45457] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-05 00:21:35,286][45457] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-07-05 00:21:35,286][45457] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-05 00:21:35,307][45457] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 00:21:35,309][45457] RunningMeanStd input shape: (1,) [2024-07-05 00:21:35,319][45457] Num input channels: 3 [2024-07-05 00:21:35,328][45457] Convolutional layer output size: 4608 [2024-07-05 00:21:35,342][45457] Policy head output size: 512 [2024-07-05 00:21:35,419][45457] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-07-05 00:21:36,108][45457] Num frames 100... [2024-07-05 00:21:36,216][45457] Num frames 200... [2024-07-05 00:21:36,317][45457] Num frames 300... [2024-07-05 00:21:36,421][45457] Num frames 400... [2024-07-05 00:21:36,518][45457] Num frames 500... [2024-07-05 00:21:36,607][45457] Num frames 600... [2024-07-05 00:21:36,701][45457] Num frames 700... [2024-07-05 00:21:36,795][45457] Num frames 800... [2024-07-05 00:21:36,899][45457] Num frames 900... [2024-07-05 00:21:36,985][45457] Num frames 1000... [2024-07-05 00:21:37,075][45457] Num frames 1100... [2024-07-05 00:21:37,167][45457] Num frames 1200... [2024-07-05 00:21:37,262][45457] Num frames 1300... [2024-07-05 00:21:37,360][45457] Num frames 1400... [2024-07-05 00:21:37,457][45457] Num frames 1500... [2024-07-05 00:21:37,550][45457] Num frames 1600... [2024-07-05 00:21:37,645][45457] Num frames 1700... [2024-07-05 00:21:37,738][45457] Num frames 1800... [2024-07-05 00:21:37,816][45457] Avg episode rewards: #0: 46.239, true rewards: #0: 18.240 [2024-07-05 00:21:37,817][45457] Avg episode reward: 46.239, avg true_objective: 18.240 [2024-07-05 00:21:37,886][45457] Num frames 1900... [2024-07-05 00:21:37,980][45457] Num frames 2000... [2024-07-05 00:21:38,072][45457] Num frames 2100... [2024-07-05 00:21:38,167][45457] Num frames 2200... [2024-07-05 00:21:38,262][45457] Num frames 2300... 
[2024-07-05 00:21:38,358][45457] Num frames 2400... [2024-07-05 00:21:38,410][45457] Avg episode rewards: #0: 28.000, true rewards: #0: 12.000 [2024-07-05 00:21:38,411][45457] Avg episode reward: 28.000, avg true_objective: 12.000 [2024-07-05 00:21:38,505][45457] Num frames 2500... [2024-07-05 00:21:38,597][45457] Num frames 2600... [2024-07-05 00:21:38,690][45457] Num frames 2700... [2024-07-05 00:21:38,786][45457] Num frames 2800... [2024-07-05 00:21:38,878][45457] Num frames 2900... [2024-07-05 00:21:38,975][45457] Num frames 3000... [2024-07-05 00:21:39,072][45457] Num frames 3100... [2024-07-05 00:21:39,169][45457] Num frames 3200... [2024-07-05 00:21:39,314][45457] Avg episode rewards: #0: 23.986, true rewards: #0: 10.987 [2024-07-05 00:21:39,316][45457] Avg episode reward: 23.986, avg true_objective: 10.987 [2024-07-05 00:21:39,321][45457] Num frames 3300... [2024-07-05 00:21:39,416][45457] Num frames 3400... [2024-07-05 00:21:39,508][45457] Num frames 3500... [2024-07-05 00:21:39,604][45457] Num frames 3600... [2024-07-05 00:21:39,700][45457] Num frames 3700... [2024-07-05 00:21:39,814][45457] Num frames 3800... [2024-07-05 00:21:39,934][45457] Num frames 3900... [2024-07-05 00:21:40,033][45457] Num frames 4000... [2024-07-05 00:21:40,124][45457] Num frames 4100... [2024-07-05 00:21:40,215][45457] Num frames 4200... [2024-07-05 00:21:40,305][45457] Num frames 4300... [2024-07-05 00:21:40,399][45457] Num frames 4400... [2024-07-05 00:21:40,492][45457] Num frames 4500... [2024-07-05 00:21:40,584][45457] Num frames 4600... [2024-07-05 00:21:40,676][45457] Num frames 4700... [2024-07-05 00:21:40,770][45457] Num frames 4800... [2024-07-05 00:21:40,862][45457] Num frames 4900... [2024-07-05 00:21:40,955][45457] Num frames 5000... [2024-07-05 00:21:41,047][45457] Num frames 5100... [2024-07-05 00:21:41,140][45457] Num frames 5200... [2024-07-05 00:21:41,232][45457] Num frames 5300... 
[2024-07-05 00:21:41,374][45457] Avg episode rewards: #0: 31.490, true rewards: #0: 13.490 [2024-07-05 00:21:41,375][45457] Avg episode reward: 31.490, avg true_objective: 13.490 [2024-07-05 00:21:41,379][45457] Num frames 5400... [2024-07-05 00:21:41,471][45457] Num frames 5500... [2024-07-05 00:21:41,564][45457] Num frames 5600... [2024-07-05 00:21:41,657][45457] Num frames 5700... [2024-07-05 00:21:41,751][45457] Num frames 5800... [2024-07-05 00:21:41,840][45457] Num frames 5900... [2024-07-05 00:21:41,930][45457] Num frames 6000... [2024-07-05 00:21:41,989][45457] Avg episode rewards: #0: 27.008, true rewards: #0: 12.008 [2024-07-05 00:21:41,990][45457] Avg episode reward: 27.008, avg true_objective: 12.008 [2024-07-05 00:21:42,078][45457] Num frames 6100... [2024-07-05 00:21:42,169][45457] Num frames 6200... [2024-07-05 00:21:42,262][45457] Num frames 6300... [2024-07-05 00:21:42,355][45457] Num frames 6400... [2024-07-05 00:21:42,456][45457] Avg episode rewards: #0: 23.420, true rewards: #0: 10.753 [2024-07-05 00:21:42,456][45457] Avg episode reward: 23.420, avg true_objective: 10.753 [2024-07-05 00:21:42,501][45457] Num frames 6500... [2024-07-05 00:21:42,594][45457] Num frames 6600... [2024-07-05 00:21:42,686][45457] Num frames 6700... [2024-07-05 00:21:42,778][45457] Num frames 6800... [2024-07-05 00:21:42,867][45457] Num frames 6900... [2024-07-05 00:21:42,960][45457] Num frames 7000... [2024-07-05 00:21:43,050][45457] Num frames 7100... [2024-07-05 00:21:43,140][45457] Num frames 7200... [2024-07-05 00:21:43,232][45457] Num frames 7300... [2024-07-05 00:21:43,332][45457] Num frames 7400... [2024-07-05 00:21:43,423][45457] Num frames 7500... [2024-07-05 00:21:43,517][45457] Num frames 7600... [2024-07-05 00:21:43,609][45457] Num frames 7700... [2024-07-05 00:21:43,702][45457] Num frames 7800... [2024-07-05 00:21:43,794][45457] Num frames 7900... [2024-07-05 00:21:43,879][45457] Num frames 8000... [2024-07-05 00:21:43,958][45457] Num frames 8100... 
[2024-07-05 00:21:44,040][45457] Num frames 8200... [2024-07-05 00:21:44,119][45457] Num frames 8300... [2024-07-05 00:21:44,197][45457] Num frames 8400... [2024-07-05 00:21:44,277][45457] Num frames 8500... [2024-07-05 00:21:44,371][45457] Avg episode rewards: #0: 28.788, true rewards: #0: 12.217 [2024-07-05 00:21:44,372][45457] Avg episode reward: 28.788, avg true_objective: 12.217 [2024-07-05 00:21:44,408][45457] Num frames 8600... [2024-07-05 00:21:44,479][45457] Num frames 8700... [2024-07-05 00:21:44,561][45457] Num frames 8800... [2024-07-05 00:21:44,641][45457] Num frames 8900... [2024-07-05 00:21:44,727][45457] Num frames 9000... [2024-07-05 00:21:44,813][45457] Num frames 9100... [2024-07-05 00:21:44,897][45457] Num frames 9200... [2024-07-05 00:21:44,987][45457] Num frames 9300... [2024-07-05 00:21:45,072][45457] Num frames 9400... [2024-07-05 00:21:45,157][45457] Num frames 9500... [2024-07-05 00:21:45,242][45457] Num frames 9600... [2024-07-05 00:21:45,328][45457] Num frames 9700... [2024-07-05 00:21:45,412][45457] Num frames 9800... [2024-07-05 00:21:45,497][45457] Num frames 9900... [2024-07-05 00:21:45,583][45457] Num frames 10000... [2024-07-05 00:21:45,668][45457] Num frames 10100... [2024-07-05 00:21:45,754][45457] Num frames 10200... [2024-07-05 00:21:45,838][45457] Num frames 10300... [2024-07-05 00:21:45,923][45457] Num frames 10400... [2024-07-05 00:21:46,006][45457] Num frames 10500... [2024-07-05 00:21:46,092][45457] Num frames 10600... [2024-07-05 00:21:46,191][45457] Avg episode rewards: #0: 32.190, true rewards: #0: 13.315 [2024-07-05 00:21:46,193][45457] Avg episode reward: 32.190, avg true_objective: 13.315 [2024-07-05 00:21:46,234][45457] Num frames 10700... [2024-07-05 00:21:46,317][45457] Num frames 10800... [2024-07-05 00:21:46,398][45457] Num frames 10900... [2024-07-05 00:21:46,484][45457] Num frames 11000... [2024-07-05 00:21:46,566][45457] Num frames 11100... [2024-07-05 00:21:46,648][45457] Num frames 11200... 
[2024-07-05 00:21:46,731][45457] Num frames 11300... [2024-07-05 00:21:46,817][45457] Num frames 11400... [2024-07-05 00:21:46,903][45457] Num frames 11500... [2024-07-05 00:21:46,996][45457] Num frames 11600... [2024-07-05 00:21:47,081][45457] Num frames 11700... [2024-07-05 00:21:47,168][45457] Num frames 11800... [2024-07-05 00:21:47,263][45457] Num frames 11900... [2024-07-05 00:21:47,352][45457] Num frames 12000... [2024-07-05 00:21:47,438][45457] Num frames 12100... [2024-07-05 00:21:47,526][45457] Num frames 12200... [2024-07-05 00:21:47,613][45457] Num frames 12300... [2024-07-05 00:21:47,701][45457] Num frames 12400... [2024-07-05 00:21:47,789][45457] Num frames 12500... [2024-07-05 00:21:47,876][45457] Num frames 12600... [2024-07-05 00:21:47,965][45457] Num frames 12700... [2024-07-05 00:21:48,065][45457] Avg episode rewards: #0: 33.946, true rewards: #0: 14.169 [2024-07-05 00:21:48,068][45457] Avg episode reward: 33.946, avg true_objective: 14.169 [2024-07-05 00:21:48,111][45457] Num frames 12800... [2024-07-05 00:21:48,196][45457] Num frames 12900... [2024-07-05 00:21:48,282][45457] Num frames 13000... [2024-07-05 00:21:48,369][45457] Num frames 13100... [2024-07-05 00:21:48,456][45457] Num frames 13200... [2024-07-05 00:21:48,564][45457] Avg episode rewards: #0: 31.264, true rewards: #0: 13.264 [2024-07-05 00:21:48,566][45457] Avg episode reward: 31.264, avg true_objective: 13.264 [2024-07-05 00:22:03,968][45457] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/replay.mp4! 
[2024-07-05 00:22:47,229][45457] The model has been pushed to https://huggingface.co/ra9hu/rl_course_vizdoom_health_gathering_supreme [2024-07-05 00:32:07,320][45457] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json [2024-07-05 00:32:07,321][45457] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-05 00:32:07,322][45457] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-05 00:32:07,322][45457] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-05 00:32:07,323][45457] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 00:32:07,324][45457] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-05 00:32:07,324][45457] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 00:32:07,324][45457] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-05 00:32:07,325][45457] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-05 00:32:07,325][45457] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-05 00:32:07,325][45457] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-05 00:32:07,325][45457] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-05 00:32:07,326][45457] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-05 00:32:07,326][45457] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-07-05 00:32:07,326][45457] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-05 00:32:07,345][45457] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 00:32:07,346][45457] RunningMeanStd input shape: (1,) [2024-07-05 00:32:07,357][45457] ConvEncoder: input_channels=3 [2024-07-05 00:32:07,480][45457] Conv encoder output size: 512 [2024-07-05 00:32:07,482][45457] Policy head output size: 512 [2024-07-05 00:32:07,507][45457] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... [2024-07-05 00:33:54,772][45457] Environment doom_basic already registered, overwriting... [2024-07-05 00:33:54,774][45457] Environment doom_two_colors_easy already registered, overwriting... [2024-07-05 00:33:54,774][45457] Environment doom_two_colors_hard already registered, overwriting... [2024-07-05 00:33:54,775][45457] Environment doom_dm already registered, overwriting... [2024-07-05 00:33:54,775][45457] Environment doom_dwango5 already registered, overwriting... [2024-07-05 00:33:54,775][45457] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-07-05 00:33:54,776][45457] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-07-05 00:33:54,776][45457] Environment doom_my_way_home already registered, overwriting... [2024-07-05 00:33:54,776][45457] Environment doom_deadly_corridor already registered, overwriting... [2024-07-05 00:33:54,776][45457] Environment doom_defend_the_center already registered, overwriting... [2024-07-05 00:33:54,777][45457] Environment doom_defend_the_line already registered, overwriting... [2024-07-05 00:33:54,777][45457] Environment doom_health_gathering already registered, overwriting... [2024-07-05 00:33:54,777][45457] Environment doom_health_gathering_supreme already registered, overwriting... 
[2024-07-05 00:33:54,778][45457] Environment doom_battle already registered, overwriting... [2024-07-05 00:33:54,778][45457] Environment doom_battle2 already registered, overwriting... [2024-07-05 00:33:54,779][45457] Environment doom_duel_bots already registered, overwriting... [2024-07-05 00:33:54,779][45457] Environment doom_deathmatch_bots already registered, overwriting... [2024-07-05 00:33:54,780][45457] Environment doom_duel already registered, overwriting... [2024-07-05 00:33:54,780][45457] Environment doom_deathmatch_full already registered, overwriting... [2024-07-05 00:33:54,781][45457] Environment doom_benchmark already registered, overwriting... [2024-07-05 00:33:54,781][45457] register_encoder_factory: [2024-07-05 00:33:54,789][45457] Saved parameter configuration for experiment default_experiment not found! [2024-07-05 00:33:54,790][45457] Starting experiment from scratch! [2024-07-05 00:33:54,795][45457] Experiment dir /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment already exists! [2024-07-05 00:33:54,797][45457] Resuming existing experiment from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment... 
[2024-07-05 00:33:54,797][45457] Weights and Biases integration disabled [2024-07-05 00:33:54,799][45457] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-07-05 00:33:58,058][45457] Automatically setting recurrence to 32 [2024-07-05 00:33:58,059][45457] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=20000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 
512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=20000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 20000000} git_hash=unknown git_repo_name=not a git repository [2024-07-05 00:33:58,060][45457] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json... 
[2024-07-05 00:33:58,061][45457] Rollout worker 0 uses device cpu
[2024-07-05 00:33:58,061][45457] Rollout worker 1 uses device cpu
[2024-07-05 00:33:58,062][45457] Rollout worker 2 uses device cpu
[2024-07-05 00:33:58,062][45457] Rollout worker 3 uses device cpu
[2024-07-05 00:33:58,062][45457] Rollout worker 4 uses device cpu
[2024-07-05 00:33:58,063][45457] Rollout worker 5 uses device cpu
[2024-07-05 00:33:58,063][45457] Rollout worker 6 uses device cpu
[2024-07-05 00:33:58,063][45457] Rollout worker 7 uses device cpu
[2024-07-05 00:33:58,093][45457] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 00:33:58,094][45457] InferenceWorker_p0-w0: min num requests: 2
[2024-07-05 00:33:58,121][45457] Starting all processes...
[2024-07-05 00:33:58,122][45457] Starting process learner_proc0
[2024-07-05 00:33:58,171][45457] Starting all processes...
[2024-07-05 00:33:58,174][45457] Starting process inference_proc0-0
[2024-07-05 00:33:58,174][45457] Starting process rollout_proc0
[2024-07-05 00:33:58,175][45457] Starting process rollout_proc1
[2024-07-05 00:33:58,175][45457] Starting process rollout_proc2
[2024-07-05 00:33:58,175][45457] Starting process rollout_proc3
[2024-07-05 00:33:58,177][45457] Starting process rollout_proc4
[2024-07-05 00:33:58,178][45457] Starting process rollout_proc5
[2024-07-05 00:33:58,179][45457] Starting process rollout_proc6
[2024-07-05 00:33:58,179][45457] Starting process rollout_proc7
[2024-07-05 00:34:02,272][45457] Inference worker 0-0 is ready!
[2024-07-05 00:34:02,273][45457] All inference workers are ready! Signal rollout workers to start!
[2024-07-05 00:34:04,800][45457] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-05 00:34:04,801][45457] Avg episode reward: [(0, '1.869')]
[2024-07-05 00:34:09,800][45457] Fps is (10 sec: 18022.6, 60 sec: 18022.6, 300 sec: 18022.6). Total num frames: 90112.
Throughput: 0: 3338.0. Samples: 16690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-05 00:34:09,801][45457] Avg episode reward: [(0, '4.463')] [2024-07-05 00:34:14,800][45457] Fps is (10 sec: 20070.5, 60 sec: 20070.5, 300 sec: 20070.5). Total num frames: 200704. Throughput: 0: 4792.2. Samples: 47922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-05 00:34:14,801][45457] Avg episode reward: [(0, '4.281')] [2024-07-05 00:34:18,085][45457] Heartbeat connected on Batcher_0 [2024-07-05 00:34:18,089][45457] Heartbeat connected on LearnerWorker_p0 [2024-07-05 00:34:18,097][45457] Heartbeat connected on InferenceWorker_p0-w0 [2024-07-05 00:34:18,099][45457] Heartbeat connected on RolloutWorker_w0 [2024-07-05 00:34:18,102][45457] Heartbeat connected on RolloutWorker_w1 [2024-07-05 00:34:18,104][45457] Heartbeat connected on RolloutWorker_w2 [2024-07-05 00:34:18,108][45457] Heartbeat connected on RolloutWorker_w3 [2024-07-05 00:34:18,112][45457] Heartbeat connected on RolloutWorker_w4 [2024-07-05 00:34:18,115][45457] Heartbeat connected on RolloutWorker_w5 [2024-07-05 00:34:18,119][45457] Heartbeat connected on RolloutWorker_w6 [2024-07-05 00:34:18,126][45457] Heartbeat connected on RolloutWorker_w7 [2024-07-05 00:34:19,800][45457] Fps is (10 sec: 21707.3, 60 sec: 20479.1, 300 sec: 20479.1). Total num frames: 307200. Throughput: 0: 4328.5. Samples: 64930. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-05 00:34:19,804][45457] Avg episode reward: [(0, '4.474')] [2024-07-05 00:34:24,800][45457] Fps is (10 sec: 21708.6, 60 sec: 20889.5, 300 sec: 20889.5). Total num frames: 417792. Throughput: 0: 4825.5. Samples: 96510. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-05 00:34:24,801][45457] Avg episode reward: [(0, '4.698')] [2024-07-05 00:34:29,801][45457] Fps is (10 sec: 21298.7, 60 sec: 20807.0, 300 sec: 20807.0). Total num frames: 520192. Throughput: 0: 5156.6. Samples: 128920. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-05 00:34:29,806][45457] Avg episode reward: [(0, '4.553')] [2024-07-05 00:34:34,800][45457] Fps is (10 sec: 21709.1, 60 sec: 21162.7, 300 sec: 21162.7). Total num frames: 634880. Throughput: 0: 4842.4. Samples: 145272. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-05 00:34:34,800][45457] Avg episode reward: [(0, '4.488')] [2024-07-05 00:34:39,800][45457] Fps is (10 sec: 22530.0, 60 sec: 21299.2, 300 sec: 21299.2). Total num frames: 745472. Throughput: 0: 5105.6. Samples: 178696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:34:39,801][45457] Avg episode reward: [(0, '4.690')] [2024-07-05 00:34:44,800][45457] Fps is (10 sec: 22527.9, 60 sec: 21504.0, 300 sec: 21504.0). Total num frames: 860160. Throughput: 0: 5316.6. Samples: 212664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:34:44,800][45457] Avg episode reward: [(0, '4.569')] [2024-07-05 00:34:49,800][45457] Fps is (10 sec: 22528.1, 60 sec: 21572.3, 300 sec: 21572.3). Total num frames: 970752. Throughput: 0: 5103.0. Samples: 229634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:34:49,800][45457] Avg episode reward: [(0, '4.602')] [2024-07-05 00:34:54,800][45457] Fps is (10 sec: 22118.4, 60 sec: 21626.9, 300 sec: 21626.9). Total num frames: 1081344. Throughput: 0: 5466.3. Samples: 262674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:34:54,800][45457] Avg episode reward: [(0, '4.647')] [2024-07-05 00:34:59,800][45457] Fps is (10 sec: 21708.9, 60 sec: 21597.1, 300 sec: 21597.1). Total num frames: 1187840. Throughput: 0: 5499.1. Samples: 295380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:34:59,800][45457] Avg episode reward: [(0, '4.902')] [2024-07-05 00:35:04,801][45457] Fps is (10 sec: 21297.4, 60 sec: 21572.0, 300 sec: 21572.0). Total num frames: 1294336. Throughput: 0: 5484.8. Samples: 311748. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:35:04,802][45457] Avg episode reward: [(0, '5.150')] [2024-07-05 00:35:09,800][45457] Fps is (10 sec: 21708.7, 60 sec: 21913.6, 300 sec: 21614.3). Total num frames: 1404928. Throughput: 0: 5499.5. Samples: 343988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:35:09,801][45457] Avg episode reward: [(0, '5.423')] [2024-07-05 00:35:14,800][45457] Fps is (10 sec: 22120.3, 60 sec: 21913.6, 300 sec: 21650.3). Total num frames: 1515520. Throughput: 0: 5503.3. Samples: 376562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:35:14,800][45457] Avg episode reward: [(0, '6.091')] [2024-07-05 00:35:19,801][45457] Fps is (10 sec: 21706.8, 60 sec: 21913.5, 300 sec: 21626.6). Total num frames: 1622016. Throughput: 0: 5509.0. Samples: 393184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:35:19,803][45457] Avg episode reward: [(0, '6.431')] [2024-07-05 00:35:24,800][45457] Fps is (10 sec: 22118.4, 60 sec: 21981.9, 300 sec: 21708.8). Total num frames: 1736704. Throughput: 0: 5507.9. Samples: 426552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:35:24,800][45457] Avg episode reward: [(0, '7.559')] [2024-07-05 00:35:29,801][45457] Fps is (10 sec: 22118.6, 60 sec: 22050.2, 300 sec: 21684.5). Total num frames: 1843200. Throughput: 0: 5483.1. Samples: 459408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 00:35:29,805][45457] Avg episode reward: [(0, '8.770')] [2024-07-05 00:35:34,800][45457] Fps is (10 sec: 21708.7, 60 sec: 21981.9, 300 sec: 21708.8). Total num frames: 1953792. Throughput: 0: 5470.5. Samples: 475808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:35:34,801][45457] Avg episode reward: [(0, '9.763')] [2024-07-05 00:35:39,801][45457] Fps is (10 sec: 22118.2, 60 sec: 21981.5, 300 sec: 21730.2). Total num frames: 2064384. Throughput: 0: 5474.7. Samples: 509042. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:35:39,807][45457] Avg episode reward: [(0, '11.270')] [2024-07-05 00:35:44,800][45457] Fps is (10 sec: 22118.3, 60 sec: 21913.6, 300 sec: 21749.8). Total num frames: 2174976. Throughput: 0: 5482.3. Samples: 542082. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-05 00:35:44,801][45457] Avg episode reward: [(0, '10.500')] [2024-07-05 00:35:49,800][45457] Fps is (10 sec: 22530.2, 60 sec: 21981.9, 300 sec: 21806.3). Total num frames: 2289664. Throughput: 0: 5498.8. Samples: 559190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:35:49,801][45457] Avg episode reward: [(0, '11.387')] [2024-07-05 00:35:54,800][45457] Fps is (10 sec: 22937.2, 60 sec: 22050.1, 300 sec: 21857.7). Total num frames: 2404352. Throughput: 0: 5542.7. Samples: 593412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-05 00:35:54,804][45457] Avg episode reward: [(0, '13.066')] [2024-07-05 00:35:59,800][45457] Fps is (10 sec: 22937.7, 60 sec: 22186.7, 300 sec: 21904.7). Total num frames: 2519040. Throughput: 0: 5587.8. Samples: 628014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:35:59,800][45457] Avg episode reward: [(0, '15.449')] [2024-07-05 00:36:04,800][45457] Fps is (10 sec: 22938.0, 60 sec: 22323.5, 300 sec: 21947.7). Total num frames: 2633728. Throughput: 0: 5602.1. Samples: 645272. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-05 00:36:04,800][45457] Avg episode reward: [(0, '16.120')] [2024-07-05 00:36:09,800][45457] Fps is (10 sec: 22528.0, 60 sec: 22323.2, 300 sec: 21954.6). Total num frames: 2744320. Throughput: 0: 5603.2. Samples: 678698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-05 00:36:09,801][45457] Avg episode reward: [(0, '20.173')] [2024-07-05 00:36:14,800][45457] Fps is (10 sec: 22118.4, 60 sec: 22323.2, 300 sec: 21960.9). Total num frames: 2854912. Throughput: 0: 5606.5. Samples: 711698. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-05 00:36:14,801][45457] Avg episode reward: [(0, '18.417')] [2024-07-05 00:36:19,800][45457] Fps is (10 sec: 22118.4, 60 sec: 22391.8, 300 sec: 21966.7). Total num frames: 2965504. Throughput: 0: 5612.1. Samples: 728354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:36:19,801][45457] Avg episode reward: [(0, '18.290')] [2024-07-05 00:36:24,800][45457] Fps is (10 sec: 22528.0, 60 sec: 22391.5, 300 sec: 22001.4). Total num frames: 3080192. Throughput: 0: 5632.3. Samples: 762492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:36:24,801][45457] Avg episode reward: [(0, '17.796')] [2024-07-05 00:36:29,800][45457] Fps is (10 sec: 22937.1, 60 sec: 22528.3, 300 sec: 22033.6). Total num frames: 3194880. Throughput: 0: 5666.9. Samples: 797094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:36:29,801][45457] Avg episode reward: [(0, '18.992')] [2024-07-05 00:36:34,800][45457] Fps is (10 sec: 22527.2, 60 sec: 22527.9, 300 sec: 22036.4). Total num frames: 3305472. Throughput: 0: 5657.2. Samples: 813768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:36:34,801][45457] Avg episode reward: [(0, '22.368')] [2024-07-05 00:36:39,800][45457] Fps is (10 sec: 22528.1, 60 sec: 22596.6, 300 sec: 22065.5). Total num frames: 3420160. Throughput: 0: 5643.2. Samples: 847358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-05 00:36:39,801][45457] Avg episode reward: [(0, '19.952')] [2024-07-05 00:36:44,800][45457] Fps is (10 sec: 22938.4, 60 sec: 22664.5, 300 sec: 22092.8). Total num frames: 3534848. Throughput: 0: 5633.3. Samples: 881512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 00:36:44,800][45457] Avg episode reward: [(0, '17.793')] [2024-07-05 00:36:49,800][45457] Fps is (10 sec: 24576.3, 60 sec: 22937.6, 300 sec: 22217.7). Total num frames: 3665920. Throughput: 0: 5646.3. Samples: 899356. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:36:49,801][45457] Avg episode reward: [(0, '22.511')] [2024-07-05 00:36:54,800][45457] Fps is (10 sec: 27443.2, 60 sec: 23415.5, 300 sec: 22407.5). Total num frames: 3809280. Throughput: 0: 5869.8. Samples: 942838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:36:54,800][45457] Avg episode reward: [(0, '22.164')] [2024-07-05 00:36:59,800][45457] Fps is (10 sec: 28671.8, 60 sec: 23893.3, 300 sec: 22586.5). Total num frames: 3952640. Throughput: 0: 6096.7. Samples: 986052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:36:59,801][45457] Avg episode reward: [(0, '22.819')] [2024-07-05 00:37:04,800][45457] Fps is (10 sec: 28672.2, 60 sec: 24371.2, 300 sec: 22755.6). Total num frames: 4096000. Throughput: 0: 6201.9. Samples: 1007440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 00:37:04,800][45457] Avg episode reward: [(0, '22.560')] [2024-07-05 00:37:09,800][45457] Fps is (10 sec: 28262.7, 60 sec: 24849.0, 300 sec: 22893.3). Total num frames: 4235264. Throughput: 0: 6373.3. Samples: 1049292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 00:37:09,801][45457] Avg episode reward: [(0, '21.920')] [2024-07-05 00:37:14,800][45457] Fps is (10 sec: 27852.6, 60 sec: 25326.9, 300 sec: 23023.8). Total num frames: 4374528. Throughput: 0: 6541.7. Samples: 1091470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:14,801][45457] Avg episode reward: [(0, '21.734')] [2024-07-05 00:37:19,800][45457] Fps is (10 sec: 28672.0, 60 sec: 25941.3, 300 sec: 23189.7). Total num frames: 4521984. Throughput: 0: 6646.8. Samples: 1112872. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:19,800][45457] Avg episode reward: [(0, '20.028')] [2024-07-05 00:37:24,800][45457] Fps is (10 sec: 29081.8, 60 sec: 26419.2, 300 sec: 23326.7). Total num frames: 4665344. Throughput: 0: 6876.0. Samples: 1156778. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:24,800][45457] Avg episode reward: [(0, '22.795')] [2024-07-05 00:37:29,800][45457] Fps is (10 sec: 28672.0, 60 sec: 26897.1, 300 sec: 23457.1). Total num frames: 4808704. Throughput: 0: 7079.4. Samples: 1200086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:29,801][45457] Avg episode reward: [(0, '21.090')] [2024-07-05 00:37:34,800][45457] Fps is (10 sec: 29081.5, 60 sec: 27511.6, 300 sec: 23600.8). Total num frames: 4956160. Throughput: 0: 7157.2. Samples: 1221428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:34,800][45457] Avg episode reward: [(0, '22.262')] [2024-07-05 00:37:39,800][45457] Fps is (10 sec: 28672.1, 60 sec: 27921.1, 300 sec: 23699.7). Total num frames: 5095424. Throughput: 0: 7146.4. Samples: 1264428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:39,801][45457] Avg episode reward: [(0, '18.676')] [2024-07-05 00:37:44,800][45457] Fps is (10 sec: 28262.0, 60 sec: 28398.9, 300 sec: 23812.6). Total num frames: 5238784. Throughput: 0: 7148.1. Samples: 1307718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:37:44,802][45457] Avg episode reward: [(0, '23.004')] [2024-07-05 00:37:49,800][45457] Fps is (10 sec: 28262.5, 60 sec: 28535.5, 300 sec: 23902.4). Total num frames: 5378048. Throughput: 0: 7120.5. Samples: 1327862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:49,801][45457] Avg episode reward: [(0, '24.526')] [2024-07-05 00:37:54,800][45457] Fps is (10 sec: 27443.6, 60 sec: 28398.9, 300 sec: 23970.5). Total num frames: 5513216. Throughput: 0: 7125.9. Samples: 1369958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:54,800][45457] Avg episode reward: [(0, '21.678')] [2024-07-05 00:37:59,800][45457] Fps is (10 sec: 27852.5, 60 sec: 28399.0, 300 sec: 24070.5). Total num frames: 5656576. Throughput: 0: 7118.3. Samples: 1411792. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:37:59,801][45457] Avg episode reward: [(0, '23.258')] [2024-07-05 00:38:04,800][45457] Fps is (10 sec: 28262.3, 60 sec: 28330.6, 300 sec: 24149.3). Total num frames: 5795840. Throughput: 0: 7095.2. Samples: 1432156. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:38:04,800][45457] Avg episode reward: [(0, '23.293')] [2024-07-05 00:38:09,800][45457] Fps is (10 sec: 28262.7, 60 sec: 28399.0, 300 sec: 24241.6). Total num frames: 5939200. Throughput: 0: 7071.1. Samples: 1474978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:38:09,800][45457] Avg episode reward: [(0, '22.360')] [2024-07-05 00:38:14,800][45457] Fps is (10 sec: 28672.1, 60 sec: 28467.2, 300 sec: 24330.2). Total num frames: 6082560. Throughput: 0: 7066.7. Samples: 1518088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:38:14,801][45457] Avg episode reward: [(0, '23.374')] [2024-07-05 00:38:19,800][45457] Fps is (10 sec: 28671.9, 60 sec: 28398.9, 300 sec: 24415.4). Total num frames: 6225920. Throughput: 0: 7075.2. Samples: 1539812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:38:19,800][45457] Avg episode reward: [(0, '22.925')] [2024-07-05 00:38:24,800][45457] Fps is (10 sec: 28671.9, 60 sec: 28398.9, 300 sec: 24497.2). Total num frames: 6369280. Throughput: 0: 7076.4. Samples: 1582868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:38:24,800][45457] Avg episode reward: [(0, '22.703')] [2024-07-05 00:38:29,800][45457] Fps is (10 sec: 28671.8, 60 sec: 28398.9, 300 sec: 24576.0). Total num frames: 6512640. Throughput: 0: 7078.2. Samples: 1626238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:38:29,801][45457] Avg episode reward: [(0, '23.612')] [2024-07-05 00:38:34,800][45457] Fps is (10 sec: 29081.7, 60 sec: 28398.9, 300 sec: 24667.0). Total num frames: 6660096. Throughput: 0: 7109.5. Samples: 1647788. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:38:34,800][45457] Avg episode reward: [(0, '21.244')] [2024-07-05 00:38:39,800][45457] Fps is (10 sec: 29081.8, 60 sec: 28467.2, 300 sec: 24739.8). Total num frames: 6803456. Throughput: 0: 7140.7. Samples: 1691288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:38:39,801][45457] Avg episode reward: [(0, '22.165')] [2024-07-05 00:38:44,800][45457] Fps is (10 sec: 28671.8, 60 sec: 28467.2, 300 sec: 24810.1). Total num frames: 6946816. Throughput: 0: 7170.4. Samples: 1734460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:38:44,801][45457] Avg episode reward: [(0, '23.205')] [2024-07-05 00:38:49,800][45457] Fps is (10 sec: 29081.7, 60 sec: 28603.7, 300 sec: 24892.2). Total num frames: 7094272. Throughput: 0: 7194.6. Samples: 1755914. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:38:49,800][45457] Avg episode reward: [(0, '26.917')] [2024-07-05 00:38:54,800][45457] Fps is (10 sec: 29081.8, 60 sec: 28740.3, 300 sec: 24957.4). Total num frames: 7237632. Throughput: 0: 7222.3. Samples: 1799982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:38:54,800][45457] Avg episode reward: [(0, '25.928')] [2024-07-05 00:38:59,800][45457] Fps is (10 sec: 28672.0, 60 sec: 28740.3, 300 sec: 25020.3). Total num frames: 7380992. Throughput: 0: 7216.7. Samples: 1842840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:38:59,800][45457] Avg episode reward: [(0, '23.616')] [2024-07-05 00:39:04,800][45457] Fps is (10 sec: 28672.0, 60 sec: 28808.5, 300 sec: 25200.8). Total num frames: 7524352. Throughput: 0: 7218.4. Samples: 1864638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:39:04,800][45457] Avg episode reward: [(0, '21.793')] [2024-07-05 00:39:09,800][45457] Fps is (10 sec: 28262.5, 60 sec: 28740.3, 300 sec: 25298.0). Total num frames: 7663616. Throughput: 0: 7188.4. Samples: 1906344. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:39:09,800][45457] Avg episode reward: [(0, '22.157')] [2024-07-05 00:39:14,800][45457] Fps is (10 sec: 27853.0, 60 sec: 28672.0, 300 sec: 25409.1). Total num frames: 7802880. Throughput: 0: 7158.8. Samples: 1948382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:39:14,800][45457] Avg episode reward: [(0, '26.325')] [2024-07-05 00:39:19,800][45457] Fps is (10 sec: 28262.4, 60 sec: 28672.0, 300 sec: 25520.2). Total num frames: 7946240. Throughput: 0: 7160.9. Samples: 1970030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 00:39:19,800][45457] Avg episode reward: [(0, '23.914')] [2024-07-05 00:39:24,800][45457] Fps is (10 sec: 29081.5, 60 sec: 28740.3, 300 sec: 25673.0). Total num frames: 8093696. Throughput: 0: 7162.1. Samples: 2013584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:39:24,800][45457] Avg episode reward: [(0, '22.325')] [2024-07-05 00:39:29,800][45457] Fps is (10 sec: 29081.2, 60 sec: 28740.3, 300 sec: 25770.1). Total num frames: 8237056. Throughput: 0: 7152.4. Samples: 2056318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:39:29,800][45457] Avg episode reward: [(0, '24.291')] [2024-07-05 00:39:34,800][45457] Fps is (10 sec: 28672.1, 60 sec: 28672.0, 300 sec: 25881.2). Total num frames: 8380416. Throughput: 0: 7156.2. Samples: 2077942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:39:34,800][45457] Avg episode reward: [(0, '22.779')] [2024-07-05 00:39:39,800][45457] Fps is (10 sec: 28672.2, 60 sec: 28672.0, 300 sec: 25978.4). Total num frames: 8523776. Throughput: 0: 7140.4. Samples: 2121302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:39:39,801][45457] Avg episode reward: [(0, '23.909')] [2024-07-05 00:39:44,800][45457] Fps is (10 sec: 28671.9, 60 sec: 28672.0, 300 sec: 26089.4). Total num frames: 8667136. Throughput: 0: 7146.1. Samples: 2164416. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:39:44,801][45457] Avg episode reward: [(0, '22.889')] [2024-07-05 00:39:49,800][45457] Fps is (10 sec: 28672.0, 60 sec: 28603.7, 300 sec: 26200.5). Total num frames: 8810496. Throughput: 0: 7140.8. Samples: 2185976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:39:49,801][45457] Avg episode reward: [(0, '23.607')] [2024-07-05 00:39:54,800][45457] Fps is (10 sec: 28671.9, 60 sec: 28603.7, 300 sec: 26325.5). Total num frames: 8953856. Throughput: 0: 7173.8. Samples: 2229164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:39:54,801][45457] Avg episode reward: [(0, '24.056')] [2024-07-05 00:39:59,800][45457] Fps is (10 sec: 28671.9, 60 sec: 28603.7, 300 sec: 26450.5). Total num frames: 9097216. Throughput: 0: 7198.3. Samples: 2272306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:39:59,800][45457] Avg episode reward: [(0, '21.414')] [2024-07-05 00:40:04,800][45457] Fps is (10 sec: 29081.9, 60 sec: 28672.0, 300 sec: 26575.4). Total num frames: 9244672. Throughput: 0: 7204.9. Samples: 2294252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:40:04,801][45457] Avg episode reward: [(0, '24.699')] [2024-07-05 00:40:09,800][45457] Fps is (10 sec: 29491.4, 60 sec: 28808.5, 300 sec: 26700.4). Total num frames: 9392128. Throughput: 0: 7211.4. Samples: 2338096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:40:09,800][45457] Avg episode reward: [(0, '24.326')] [2024-07-05 00:40:14,800][45457] Fps is (10 sec: 29491.0, 60 sec: 28945.0, 300 sec: 26839.3). Total num frames: 9539584. Throughput: 0: 7236.0. Samples: 2381938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:40:14,800][45457] Avg episode reward: [(0, '24.015')] [2024-07-05 00:40:19,800][45457] Fps is (10 sec: 29081.5, 60 sec: 28945.0, 300 sec: 26936.4). Total num frames: 9682944. Throughput: 0: 7244.6. Samples: 2403948. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:40:19,801][45457] Avg episode reward: [(0, '24.010')] [2024-07-05 00:40:24,800][45457] Fps is (10 sec: 29081.8, 60 sec: 28945.1, 300 sec: 27075.3). Total num frames: 9830400. Throughput: 0: 7257.2. Samples: 2447876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:40:24,800][45457] Avg episode reward: [(0, '24.237')] [2024-07-05 00:40:29,800][45457] Fps is (10 sec: 29491.3, 60 sec: 29013.4, 300 sec: 27200.2). Total num frames: 9977856. Throughput: 0: 7271.3. Samples: 2491626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:40:29,800][45457] Avg episode reward: [(0, '25.429')] [2024-07-05 00:40:34,800][45457] Fps is (10 sec: 29081.4, 60 sec: 29013.3, 300 sec: 27311.4). Total num frames: 10121216. Throughput: 0: 7279.1. Samples: 2513536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:40:34,800][45457] Avg episode reward: [(0, '23.838')] [2024-07-05 00:40:39,800][45457] Fps is (10 sec: 29081.5, 60 sec: 29081.6, 300 sec: 27436.3). Total num frames: 10268672. Throughput: 0: 7290.7. Samples: 2557246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:40:39,800][45457] Avg episode reward: [(0, '20.812')] [2024-07-05 00:40:44,800][45457] Fps is (10 sec: 29081.5, 60 sec: 29081.6, 300 sec: 27533.4). Total num frames: 10412032. Throughput: 0: 7300.4. Samples: 2600824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:40:44,801][45457] Avg episode reward: [(0, '24.904')] [2024-07-05 00:40:49,800][45457] Fps is (10 sec: 29081.7, 60 sec: 29149.9, 300 sec: 27644.6). Total num frames: 10559488. Throughput: 0: 7298.5. Samples: 2622686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:40:49,801][45457] Avg episode reward: [(0, '23.735')] [2024-07-05 00:40:54,800][45457] Fps is (10 sec: 28262.5, 60 sec: 29013.3, 300 sec: 27713.9). Total num frames: 10694656. Throughput: 0: 7275.5. Samples: 2665494. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:40:54,801][45457] Avg episode reward: [(0, '24.966')] [2024-07-05 00:40:59,800][45457] Fps is (10 sec: 27033.4, 60 sec: 28876.8, 300 sec: 27783.4). Total num frames: 10829824. Throughput: 0: 7180.8. Samples: 2705076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:40:59,801][45457] Avg episode reward: [(0, '26.890')] [2024-07-05 00:41:04,800][45457] Fps is (10 sec: 27443.2, 60 sec: 28740.2, 300 sec: 27880.6). Total num frames: 10969088. Throughput: 0: 7171.3. Samples: 2726656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:41:04,801][45457] Avg episode reward: [(0, '23.776')] [2024-07-05 00:41:09,800][45457] Fps is (10 sec: 27853.0, 60 sec: 28603.7, 300 sec: 27977.8). Total num frames: 11108352. Throughput: 0: 7106.0. Samples: 2767646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:41:09,801][45457] Avg episode reward: [(0, '24.034')] [2024-07-05 00:41:14,800][45457] Fps is (10 sec: 27852.8, 60 sec: 28467.2, 300 sec: 28075.0). Total num frames: 11247616. Throughput: 0: 7064.3. Samples: 2809518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:41:14,801][45457] Avg episode reward: [(0, '23.614')] [2024-07-05 00:41:19,800][45457] Fps is (10 sec: 27852.5, 60 sec: 28398.9, 300 sec: 28158.3). Total num frames: 11386880. Throughput: 0: 7036.1. Samples: 2830162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:41:19,801][45457] Avg episode reward: [(0, '23.876')] [2024-07-05 00:41:24,800][45457] Fps is (10 sec: 27852.3, 60 sec: 28262.3, 300 sec: 28241.6). Total num frames: 11526144. Throughput: 0: 6999.5. Samples: 2872226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:41:24,801][45457] Avg episode reward: [(0, '24.947')] [2024-07-05 00:41:29,800][45457] Fps is (10 sec: 28262.0, 60 sec: 28194.0, 300 sec: 28352.7). Total num frames: 11669504. Throughput: 0: 6972.5. Samples: 2914588. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:41:29,801][45457] Avg episode reward: [(0, '25.547')] [2024-07-05 00:41:34,800][45457] Fps is (10 sec: 28672.7, 60 sec: 28194.2, 300 sec: 28449.9). Total num frames: 11812864. Throughput: 0: 6970.4. Samples: 2936356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:41:34,800][45457] Avg episode reward: [(0, '25.152')] [2024-07-05 00:41:39,800][45457] Fps is (10 sec: 29491.9, 60 sec: 28262.4, 300 sec: 28574.8). Total num frames: 11964416. Throughput: 0: 6999.6. Samples: 2980476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:41:39,800][45457] Avg episode reward: [(0, '28.407')] [2024-07-05 00:41:44,800][45457] Fps is (10 sec: 29900.7, 60 sec: 28330.7, 300 sec: 28630.4). Total num frames: 12111872. Throughput: 0: 7104.9. Samples: 3024798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:41:44,800][45457] Avg episode reward: [(0, '25.111')] [2024-07-05 00:41:49,800][45457] Fps is (10 sec: 29491.1, 60 sec: 28330.6, 300 sec: 28644.2). Total num frames: 12259328. Throughput: 0: 7119.4. Samples: 3047028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:41:49,800][45457] Avg episode reward: [(0, '26.962')] [2024-07-05 00:41:54,800][45457] Fps is (10 sec: 29081.5, 60 sec: 28467.2, 300 sec: 28644.2). Total num frames: 12402688. Throughput: 0: 7189.5. Samples: 3091172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:41:54,801][45457] Avg episode reward: [(0, '27.277')] [2024-07-05 00:41:59,800][45457] Fps is (10 sec: 28672.0, 60 sec: 28603.7, 300 sec: 28644.2). Total num frames: 12546048. Throughput: 0: 7206.2. Samples: 3133798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:41:59,801][45457] Avg episode reward: [(0, '26.010')] [2024-07-05 00:42:04,800][45457] Fps is (10 sec: 28262.2, 60 sec: 28603.7, 300 sec: 28644.2). Total num frames: 12685312. Throughput: 0: 7212.3. Samples: 3154716. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:04,801][45457] Avg episode reward: [(0, '27.871')] [2024-07-05 00:42:09,800][45457] Fps is (10 sec: 28672.0, 60 sec: 28740.2, 300 sec: 28672.0). Total num frames: 12832768. Throughput: 0: 7253.9. Samples: 3198648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:09,800][45457] Avg episode reward: [(0, '27.815')] [2024-07-05 00:42:14,800][45457] Fps is (10 sec: 29491.4, 60 sec: 28876.8, 300 sec: 28672.0). Total num frames: 12980224. Throughput: 0: 7291.7. Samples: 3242712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:14,800][45457] Avg episode reward: [(0, '29.022')] [2024-07-05 00:42:19,800][45457] Fps is (10 sec: 29491.3, 60 sec: 29013.4, 300 sec: 28685.9). Total num frames: 13127680. Throughput: 0: 7294.4. Samples: 3264602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:19,800][45457] Avg episode reward: [(0, '28.886')] [2024-07-05 00:42:24,800][45457] Fps is (10 sec: 29081.8, 60 sec: 29081.7, 300 sec: 28685.9). Total num frames: 13271040. Throughput: 0: 7285.0. Samples: 3308302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:24,801][45457] Avg episode reward: [(0, '26.917')] [2024-07-05 00:42:29,800][45457] Fps is (10 sec: 29081.5, 60 sec: 29150.0, 300 sec: 28685.9). Total num frames: 13418496. Throughput: 0: 7274.3. Samples: 3352140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:42:29,801][45457] Avg episode reward: [(0, '25.300')] [2024-07-05 00:42:34,800][45457] Fps is (10 sec: 29491.1, 60 sec: 29218.1, 300 sec: 28713.7). Total num frames: 13565952. Throughput: 0: 7265.3. Samples: 3373968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:34,801][45457] Avg episode reward: [(0, '28.757')] [2024-07-05 00:42:39,800][45457] Fps is (10 sec: 29081.7, 60 sec: 29081.6, 300 sec: 28713.7). Total num frames: 13709312. Throughput: 0: 7259.7. Samples: 3417856. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:39,800][45457] Avg episode reward: [(0, '28.691')] [2024-07-05 00:42:44,800][45457] Fps is (10 sec: 29081.5, 60 sec: 29081.6, 300 sec: 28741.4). Total num frames: 13856768. Throughput: 0: 7280.4. Samples: 3461418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:44,801][45457] Avg episode reward: [(0, '28.524')] [2024-07-05 00:42:49,800][45457] Fps is (10 sec: 29491.2, 60 sec: 29081.6, 300 sec: 28783.1). Total num frames: 14004224. Throughput: 0: 7303.7. Samples: 3483380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:42:49,800][45457] Avg episode reward: [(0, '26.348')] [2024-07-05 00:42:54,800][45457] Fps is (10 sec: 29081.7, 60 sec: 29081.6, 300 sec: 28783.1). Total num frames: 14147584. Throughput: 0: 7304.2. Samples: 3527336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:42:54,800][45457] Avg episode reward: [(0, '27.243')] [2024-07-05 00:42:59,800][45457] Fps is (10 sec: 29081.4, 60 sec: 29149.9, 300 sec: 28810.8). Total num frames: 14295040. Throughput: 0: 7306.0. Samples: 3571482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:42:59,801][45457] Avg episode reward: [(0, '26.286')] [2024-07-05 00:43:04,800][45457] Fps is (10 sec: 29490.8, 60 sec: 29286.4, 300 sec: 28824.7). Total num frames: 14442496. Throughput: 0: 7308.5. Samples: 3593486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:43:04,800][45457] Avg episode reward: [(0, '29.797')] [2024-07-05 00:43:09,800][45457] Fps is (10 sec: 29491.1, 60 sec: 29286.4, 300 sec: 28838.6). Total num frames: 14589952. Throughput: 0: 7312.9. Samples: 3637382. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:43:09,800][45457] Avg episode reward: [(0, '30.006')] [2024-07-05 00:43:14,800][45457] Fps is (10 sec: 29491.6, 60 sec: 29286.4, 300 sec: 28852.5). Total num frames: 14737408. Throughput: 0: 7318.1. Samples: 3681454. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:43:14,800][45457] Avg episode reward: [(0, '28.449')] [2024-07-05 00:43:19,800][45457] Fps is (10 sec: 29491.4, 60 sec: 29286.4, 300 sec: 28866.4). Total num frames: 14884864. Throughput: 0: 7322.9. Samples: 3703498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:43:19,800][45457] Avg episode reward: [(0, '26.979')] [2024-07-05 00:43:24,800][45457] Fps is (10 sec: 29491.4, 60 sec: 29354.7, 300 sec: 28880.3). Total num frames: 15032320. Throughput: 0: 7325.3. Samples: 3747496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:43:24,800][45457] Avg episode reward: [(0, '26.532')] [2024-07-05 00:43:29,800][45457] Fps is (10 sec: 29081.7, 60 sec: 29286.4, 300 sec: 28866.4). Total num frames: 15175680. Throughput: 0: 7337.3. Samples: 3791596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:43:29,800][45457] Avg episode reward: [(0, '27.630')] [2024-07-05 00:43:34,800][45457] Fps is (10 sec: 29081.5, 60 sec: 29286.4, 300 sec: 28880.3). Total num frames: 15323136. Throughput: 0: 7344.1. Samples: 3813866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:43:34,800][45457] Avg episode reward: [(0, '30.526')] [2024-07-05 00:43:39,800][45457] Fps is (10 sec: 27852.6, 60 sec: 29081.6, 300 sec: 28838.6). Total num frames: 15454208. Throughput: 0: 7273.0. Samples: 3854622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 00:43:39,800][45457] Avg episode reward: [(0, '28.501')] [2024-07-05 00:43:44,800][45457] Fps is (10 sec: 27033.5, 60 sec: 28945.1, 300 sec: 28810.8). Total num frames: 15593472. Throughput: 0: 7204.0. Samples: 3895660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 00:43:44,800][45457] Avg episode reward: [(0, '26.379')] [2024-07-05 00:43:49,800][45457] Fps is (10 sec: 27852.9, 60 sec: 28808.5, 300 sec: 28797.0). Total num frames: 15732736. Throughput: 0: 7178.3. Samples: 3916510. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:43:49,801][45457] Avg episode reward: [(0, '26.720')] [2024-07-05 00:43:54,800][45457] Fps is (10 sec: 28262.3, 60 sec: 28808.5, 300 sec: 28797.0). Total num frames: 15876096. Throughput: 0: 7154.8. Samples: 3959346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:43:54,800][45457] Avg episode reward: [(0, '27.638')] [2024-07-05 00:43:59,800][45457] Fps is (10 sec: 28671.9, 60 sec: 28740.3, 300 sec: 28797.0). Total num frames: 16019456. Throughput: 0: 7134.0. Samples: 4002484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 00:43:59,801][45457] Avg episode reward: [(0, '31.449')] [2024-07-05 00:44:04,800][45457] Fps is (10 sec: 29081.8, 60 sec: 28740.4, 300 sec: 28824.7). Total num frames: 16166912. Throughput: 0: 7139.6. Samples: 4024780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:44:04,800][45457] Avg episode reward: [(0, '34.074')] [2024-07-05 00:44:09,800][45457] Fps is (10 sec: 29081.8, 60 sec: 28672.1, 300 sec: 28838.6). Total num frames: 16310272. Throughput: 0: 7123.0. Samples: 4068032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:44:09,800][45457] Avg episode reward: [(0, '28.801')] [2024-07-05 00:44:14,800][45457] Fps is (10 sec: 29081.5, 60 sec: 28672.0, 300 sec: 28852.5). Total num frames: 16457728. Throughput: 0: 7108.7. Samples: 4111488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:44:14,800][45457] Avg episode reward: [(0, '29.606')] [2024-07-05 00:44:19,800][45457] Fps is (10 sec: 29081.4, 60 sec: 28603.7, 300 sec: 28838.6). Total num frames: 16601088. Throughput: 0: 7099.3. Samples: 4133334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:44:19,800][45457] Avg episode reward: [(0, '33.475')] [2024-07-05 00:44:24,800][45457] Fps is (10 sec: 28671.9, 60 sec: 28535.4, 300 sec: 28838.6). Total num frames: 16744448. Throughput: 0: 7163.8. Samples: 4176994. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:44:24,801][45457] Avg episode reward: [(0, '29.985')] [2024-07-05 00:44:29,800][45457] Fps is (10 sec: 29081.7, 60 sec: 28603.7, 300 sec: 28852.5). Total num frames: 16891904. Throughput: 0: 7222.4. Samples: 4220670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:44:29,800][45457] Avg episode reward: [(0, '30.740')] [2024-07-05 00:44:34,800][45457] Fps is (10 sec: 29081.6, 60 sec: 28535.4, 300 sec: 28852.5). Total num frames: 17035264. Throughput: 0: 7241.3. Samples: 4242370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:44:34,801][45457] Avg episode reward: [(0, '27.860')] [2024-07-05 00:44:39,800][45457] Fps is (10 sec: 28671.7, 60 sec: 28740.2, 300 sec: 28852.5). Total num frames: 17178624. Throughput: 0: 7243.6. Samples: 4285310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:44:39,801][45457] Avg episode reward: [(0, '31.510')] [2024-07-05 00:44:44,800][45457] Fps is (10 sec: 29081.6, 60 sec: 28876.8, 300 sec: 28866.4). Total num frames: 17326080. Throughput: 0: 7252.7. Samples: 4328858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:44:44,800][45457] Avg episode reward: [(0, '33.207')] [2024-07-05 00:44:49,800][45457] Fps is (10 sec: 29081.9, 60 sec: 28945.1, 300 sec: 28866.4). Total num frames: 17469440. Throughput: 0: 7240.0. Samples: 4350580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:44:49,800][45457] Avg episode reward: [(0, '28.141')] [2024-07-05 00:44:54,800][45457] Fps is (10 sec: 29081.6, 60 sec: 29013.3, 300 sec: 28880.3). Total num frames: 17616896. Throughput: 0: 7246.5. Samples: 4394124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:44:54,800][45457] Avg episode reward: [(0, '28.240')] [2024-07-05 00:44:59,800][45457] Fps is (10 sec: 29081.4, 60 sec: 29013.3, 300 sec: 28866.4). Total num frames: 17760256. Throughput: 0: 7253.3. Samples: 4437888. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:44:59,800][45457] Avg episode reward: [(0, '26.141')] [2024-07-05 00:45:04,800][45457] Fps is (10 sec: 29081.8, 60 sec: 29013.3, 300 sec: 28866.4). Total num frames: 17907712. Throughput: 0: 7259.3. Samples: 4460004. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:04,800][45457] Avg episode reward: [(0, '27.048')] [2024-07-05 00:45:09,800][45457] Fps is (10 sec: 29491.2, 60 sec: 29081.6, 300 sec: 28866.4). Total num frames: 18055168. Throughput: 0: 7268.8. Samples: 4504092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:09,800][45457] Avg episode reward: [(0, '32.117')] [2024-07-05 00:45:14,800][45457] Fps is (10 sec: 29491.1, 60 sec: 29081.6, 300 sec: 28880.3). Total num frames: 18202624. Throughput: 0: 7269.5. Samples: 4547796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:14,800][45457] Avg episode reward: [(0, '31.444')] [2024-07-05 00:45:19,800][45457] Fps is (10 sec: 29081.7, 60 sec: 29081.6, 300 sec: 28866.4). Total num frames: 18345984. Throughput: 0: 7271.6. Samples: 4569592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:19,800][45457] Avg episode reward: [(0, '31.100')] [2024-07-05 00:45:24,800][45457] Fps is (10 sec: 28672.0, 60 sec: 29081.6, 300 sec: 28852.5). Total num frames: 18489344. Throughput: 0: 7277.7. Samples: 4612804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:45:24,800][45457] Avg episode reward: [(0, '29.468')] [2024-07-05 00:45:29,800][45457] Fps is (10 sec: 28671.9, 60 sec: 29013.3, 300 sec: 28852.5). Total num frames: 18632704. Throughput: 0: 7267.1. Samples: 4655876. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:29,800][45457] Avg episode reward: [(0, '31.971')] [2024-07-05 00:45:34,800][45457] Fps is (10 sec: 28671.8, 60 sec: 29013.3, 300 sec: 28838.6). Total num frames: 18776064. Throughput: 0: 7264.3. Samples: 4677472. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:34,800][45457] Avg episode reward: [(0, '31.432')] [2024-07-05 00:45:39,800][45457] Fps is (10 sec: 29081.8, 60 sec: 29081.7, 300 sec: 28852.5). Total num frames: 18923520. Throughput: 0: 7264.2. Samples: 4721014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:39,800][45457] Avg episode reward: [(0, '29.573')] [2024-07-05 00:45:44,800][45457] Fps is (10 sec: 29081.5, 60 sec: 29013.3, 300 sec: 28838.6). Total num frames: 19066880. Throughput: 0: 7254.3. Samples: 4764330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:44,801][45457] Avg episode reward: [(0, '31.777')] [2024-07-05 00:45:49,800][45457] Fps is (10 sec: 29081.4, 60 sec: 29081.6, 300 sec: 28880.3). Total num frames: 19214336. Throughput: 0: 7248.6. Samples: 4786190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:49,801][45457] Avg episode reward: [(0, '32.344')] [2024-07-05 00:45:54,800][45457] Fps is (10 sec: 29081.7, 60 sec: 29013.3, 300 sec: 28908.0). Total num frames: 19357696. Throughput: 0: 7240.8. Samples: 4829928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:45:54,800][45457] Avg episode reward: [(0, '32.536')] [2024-07-05 00:45:59,800][45457] Fps is (10 sec: 29081.3, 60 sec: 29081.6, 300 sec: 28935.8). Total num frames: 19505152. Throughput: 0: 7232.0. Samples: 4873236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:45:59,801][45457] Avg episode reward: [(0, '31.148')] [2024-07-05 00:46:04,800][45457] Fps is (10 sec: 29081.8, 60 sec: 29013.3, 300 sec: 28949.7). Total num frames: 19648512. Throughput: 0: 7231.4. Samples: 4895004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:46:04,801][45457] Avg episode reward: [(0, '31.797')] [2024-07-05 00:46:09,800][45457] Fps is (10 sec: 28672.3, 60 sec: 28945.1, 300 sec: 28963.6). Total num frames: 19791872. Throughput: 0: 7236.4. Samples: 4938444. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 00:46:09,800][45457] Avg episode reward: [(0, '32.176')] [2024-07-05 00:46:14,800][45457] Fps is (10 sec: 28671.9, 60 sec: 28876.8, 300 sec: 28977.5). Total num frames: 19935232. Throughput: 0: 7242.5. Samples: 4981788. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 00:46:14,800][45457] Avg episode reward: [(0, '34.624')] [2024-07-05 00:46:17,083][45457] Component Batcher_0 stopped! [2024-07-05 00:46:17,095][45457] Component RolloutWorker_w7 stopped! [2024-07-05 00:46:17,096][45457] Component RolloutWorker_w2 stopped! [2024-07-05 00:46:17,097][45457] Component RolloutWorker_w6 stopped! [2024-07-05 00:46:17,097][45457] Component RolloutWorker_w5 stopped! [2024-07-05 00:46:17,098][45457] Component RolloutWorker_w1 stopped! [2024-07-05 00:46:17,099][45457] Component RolloutWorker_w4 stopped! [2024-07-05 00:46:17,099][45457] Component RolloutWorker_w3 stopped! [2024-07-05 00:46:17,100][45457] Component RolloutWorker_w0 stopped! [2024-07-05 00:46:17,116][45457] Component InferenceWorker_p0-w0 stopped! [2024-07-05 00:46:17,250][45457] Component LearnerWorker_p0 stopped! [2024-07-05 00:46:17,251][45457] Waiting for process learner_proc0 to stop... [2024-07-05 00:46:18,226][45457] Waiting for process inference_proc0-0 to join... [2024-07-05 00:46:18,227][45457] Waiting for process rollout_proc0 to join... [2024-07-05 00:46:18,227][45457] Waiting for process rollout_proc1 to join... [2024-07-05 00:46:18,228][45457] Waiting for process rollout_proc2 to join... [2024-07-05 00:46:18,228][45457] Waiting for process rollout_proc3 to join... [2024-07-05 00:46:18,228][45457] Waiting for process rollout_proc4 to join... [2024-07-05 00:46:18,229][45457] Waiting for process rollout_proc5 to join... [2024-07-05 00:46:18,229][45457] Waiting for process rollout_proc6 to join... [2024-07-05 00:46:18,230][45457] Waiting for process rollout_proc7 to join... 
[2024-07-05 00:46:18,230][45457] Batcher 0 profile tree view:
batching: 26.9927, releasing_batches: 0.1167
[2024-07-05 00:46:18,231][45457] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 8.8195
update_model: 10.7823
  weight_update: 0.0007
one_step: 0.0022
  handle_policy_step: 681.7734
    deserialize: 26.6959, stack: 3.6982, obs_to_device_normalize: 149.7028, forward: 372.6342, send_messages: 32.3696
    prepare_outputs: 72.1520
      to_cpu: 43.3974
[2024-07-05 00:46:18,231][45457] Learner 0 profile tree view:
misc: 0.0172, prepare_batch: 44.0229
train: 124.9755
  epoch_init: 0.0157, minibatch_init: 0.0230, losses_postprocess: 0.6276, kl_divergence: 0.6599, after_optimizer: 50.0722
  calculate_losses: 55.1219
    losses_init: 0.0092, forward_head: 2.0814, bptt_initial: 42.7142, tail: 2.0637, advantages_returns: 0.4992, losses: 2.3709
    bptt: 4.7810
      bptt_forward_core: 4.5881
  update: 17.1506
    clip: 2.1479
[2024-07-05 00:46:18,231][45457] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3953, enqueue_policy_requests: 22.8109, env_step: 280.5938, overhead: 25.4768, complete_rollouts: 0.6825
save_policy_outputs: 21.3364
  split_output_tensors: 10.1050
[2024-07-05 00:46:18,231][45457] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3921, enqueue_policy_requests: 23.0384, env_step: 308.7818, overhead: 25.1818, complete_rollouts: 0.6843
save_policy_outputs: 21.6358
  split_output_tensors: 10.4110
[2024-07-05 00:46:18,232][45457] Loop Runner_EvtLoop terminating...
[2024-07-05 00:46:18,232][45457] Runner profile tree view: main_loop: 740.1108 [2024-07-05 00:46:18,232][45457] Collected {0: 20004864}, FPS: 27029.6 [2024-07-05 00:47:20,823][45457] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json [2024-07-05 00:47:20,823][45457] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-05 00:47:20,824][45457] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-05 00:47:20,824][45457] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-05 00:47:20,824][45457] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 00:47:20,824][45457] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-05 00:47:20,824][45457] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 00:47:20,825][45457] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-05 00:47:20,825][45457] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-05 00:47:20,825][45457] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-05 00:47:20,825][45457] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-05 00:47:20,826][45457] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-05 00:47:20,826][45457] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-05 00:47:20,826][45457] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-07-05 00:47:20,827][45457] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-05 00:47:20,842][45457] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 00:47:20,843][45457] RunningMeanStd input shape: (1,) [2024-07-05 00:47:20,849][45457] ConvEncoder: input_channels=3 [2024-07-05 00:47:20,880][45457] Conv encoder output size: 512 [2024-07-05 00:47:20,881][45457] Policy head output size: 512 [2024-07-05 00:47:20,897][45457] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... [2024-07-05 00:47:21,524][45457] Num frames 100... [2024-07-05 00:47:21,583][45457] Num frames 200... [2024-07-05 00:47:21,641][45457] Num frames 300... [2024-07-05 00:47:21,702][45457] Num frames 400... [2024-07-05 00:47:21,761][45457] Num frames 500... [2024-07-05 00:47:21,819][45457] Num frames 600... [2024-07-05 00:47:21,878][45457] Num frames 700... [2024-07-05 00:47:21,937][45457] Num frames 800... [2024-07-05 00:47:21,995][45457] Num frames 900... [2024-07-05 00:47:22,061][45457] Num frames 1000... [2024-07-05 00:47:22,124][45457] Num frames 1100... [2024-07-05 00:47:22,189][45457] Num frames 1200... [2024-07-05 00:47:22,256][45457] Num frames 1300... [2024-07-05 00:47:22,320][45457] Num frames 1400... [2024-07-05 00:47:22,384][45457] Num frames 1500... [2024-07-05 00:47:22,450][45457] Num frames 1600... [2024-07-05 00:47:22,545][45457] Avg episode rewards: #0: 40.640, true rewards: #0: 16.640 [2024-07-05 00:47:22,546][45457] Avg episode reward: 40.640, avg true_objective: 16.640 [2024-07-05 00:47:22,574][45457] Num frames 1700... [2024-07-05 00:47:22,636][45457] Num frames 1800... [2024-07-05 00:47:22,699][45457] Num frames 1900... [2024-07-05 00:47:22,761][45457] Num frames 2000... [2024-07-05 00:47:22,823][45457] Num frames 2100... [2024-07-05 00:47:22,885][45457] Num frames 2200... [2024-07-05 00:47:22,944][45457] Num frames 2300... 
[2024-07-05 00:47:23,039][45457] Avg episode rewards: #0: 27.840, true rewards: #0: 11.840 [2024-07-05 00:47:23,041][45457] Avg episode reward: 27.840, avg true_objective: 11.840 [2024-07-05 00:47:23,066][45457] Num frames 2400... [2024-07-05 00:47:23,128][45457] Num frames 2500... [2024-07-05 00:47:23,190][45457] Num frames 2600... [2024-07-05 00:47:23,248][45457] Num frames 2700... [2024-07-05 00:47:23,308][45457] Num frames 2800... [2024-07-05 00:47:23,366][45457] Num frames 2900... [2024-07-05 00:47:23,423][45457] Num frames 3000... [2024-07-05 00:47:23,482][45457] Num frames 3100... [2024-07-05 00:47:23,542][45457] Num frames 3200... [2024-07-05 00:47:23,605][45457] Num frames 3300... [2024-07-05 00:47:23,668][45457] Num frames 3400... [2024-07-05 00:47:23,732][45457] Num frames 3500... [2024-07-05 00:47:23,802][45457] Num frames 3600... [2024-07-05 00:47:23,869][45457] Num frames 3700... [2024-07-05 00:47:23,936][45457] Num frames 3800... [2024-07-05 00:47:24,009][45457] Num frames 3900... [2024-07-05 00:47:24,087][45457] Num frames 4000... [2024-07-05 00:47:24,161][45457] Num frames 4100... [2024-07-05 00:47:24,233][45457] Avg episode rewards: #0: 33.093, true rewards: #0: 13.760 [2024-07-05 00:47:24,234][45457] Avg episode reward: 33.093, avg true_objective: 13.760 [2024-07-05 00:47:24,295][45457] Num frames 4200... [2024-07-05 00:47:24,355][45457] Num frames 4300... [2024-07-05 00:47:24,415][45457] Num frames 4400... [2024-07-05 00:47:24,478][45457] Num frames 4500... [2024-07-05 00:47:24,539][45457] Num frames 4600... [2024-07-05 00:47:24,599][45457] Num frames 4700... [2024-07-05 00:47:24,659][45457] Num frames 4800... [2024-07-05 00:47:24,719][45457] Num frames 4900... [2024-07-05 00:47:24,779][45457] Num frames 5000... [2024-07-05 00:47:24,838][45457] Num frames 5100... [2024-07-05 00:47:24,898][45457] Num frames 5200... [2024-07-05 00:47:24,957][45457] Num frames 5300... [2024-07-05 00:47:25,018][45457] Num frames 5400... 
[2024-07-05 00:47:25,077][45457] Num frames 5500... [2024-07-05 00:47:25,137][45457] Num frames 5600... [2024-07-05 00:47:25,197][45457] Num frames 5700... [2024-07-05 00:47:25,256][45457] Num frames 5800... [2024-07-05 00:47:25,360][45457] Avg episode rewards: #0: 37.470, true rewards: #0: 14.720 [2024-07-05 00:47:25,361][45457] Avg episode reward: 37.470, avg true_objective: 14.720 [2024-07-05 00:47:25,373][45457] Num frames 5900... [2024-07-05 00:47:25,433][45457] Num frames 6000... [2024-07-05 00:47:25,492][45457] Num frames 6100... [2024-07-05 00:47:25,552][45457] Num frames 6200... [2024-07-05 00:47:25,613][45457] Num frames 6300... [2024-07-05 00:47:25,672][45457] Num frames 6400... [2024-07-05 00:47:25,731][45457] Num frames 6500... [2024-07-05 00:47:25,791][45457] Num frames 6600... [2024-07-05 00:47:25,851][45457] Num frames 6700... [2024-07-05 00:47:25,909][45457] Num frames 6800... [2024-07-05 00:47:25,968][45457] Num frames 6900... [2024-07-05 00:47:26,026][45457] Num frames 7000... [2024-07-05 00:47:26,085][45457] Num frames 7100... [2024-07-05 00:47:26,141][45457] Avg episode rewards: #0: 34.808, true rewards: #0: 14.208 [2024-07-05 00:47:26,143][45457] Avg episode reward: 34.808, avg true_objective: 14.208 [2024-07-05 00:47:26,206][45457] Num frames 7200... [2024-07-05 00:47:26,264][45457] Num frames 7300... [2024-07-05 00:47:26,323][45457] Num frames 7400... [2024-07-05 00:47:26,383][45457] Num frames 7500... [2024-07-05 00:47:26,444][45457] Num frames 7600... [2024-07-05 00:47:26,505][45457] Num frames 7700... [2024-07-05 00:47:26,569][45457] Num frames 7800... [2024-07-05 00:47:26,627][45457] Num frames 7900... [2024-07-05 00:47:26,687][45457] Num frames 8000... [2024-07-05 00:47:26,749][45457] Num frames 8100... [2024-07-05 00:47:26,814][45457] Num frames 8200... [2024-07-05 00:47:26,875][45457] Num frames 8300... [2024-07-05 00:47:26,935][45457] Num frames 8400... [2024-07-05 00:47:26,992][45457] Num frames 8500... 
[2024-07-05 00:47:27,054][45457] Num frames 8600... [2024-07-05 00:47:27,112][45457] Avg episode rewards: #0: 35.513, true rewards: #0: 14.347 [2024-07-05 00:47:27,114][45457] Avg episode reward: 35.513, avg true_objective: 14.347 [2024-07-05 00:47:27,174][45457] Num frames 8700... [2024-07-05 00:47:27,232][45457] Num frames 8800... [2024-07-05 00:47:27,290][45457] Num frames 8900... [2024-07-05 00:47:27,362][45457] Num frames 9000... [2024-07-05 00:47:27,423][45457] Num frames 9100... [2024-07-05 00:47:27,484][45457] Num frames 9200... [2024-07-05 00:47:27,544][45457] Num frames 9300... [2024-07-05 00:47:27,604][45457] Num frames 9400... [2024-07-05 00:47:27,665][45457] Num frames 9500... [2024-07-05 00:47:27,727][45457] Num frames 9600... [2024-07-05 00:47:27,790][45457] Avg episode rewards: #0: 33.734, true rewards: #0: 13.734 [2024-07-05 00:47:27,792][45457] Avg episode reward: 33.734, avg true_objective: 13.734 [2024-07-05 00:47:27,851][45457] Num frames 9700... [2024-07-05 00:47:27,909][45457] Num frames 9800... [2024-07-05 00:47:27,967][45457] Num frames 9900... [2024-07-05 00:47:28,025][45457] Num frames 10000... [2024-07-05 00:47:28,086][45457] Num frames 10100... [2024-07-05 00:47:28,146][45457] Num frames 10200... [2024-07-05 00:47:28,208][45457] Num frames 10300... [2024-07-05 00:47:28,268][45457] Num frames 10400... [2024-07-05 00:47:28,327][45457] Num frames 10500... [2024-07-05 00:47:28,388][45457] Num frames 10600... [2024-07-05 00:47:28,446][45457] Avg episode rewards: #0: 32.132, true rewards: #0: 13.257 [2024-07-05 00:47:28,447][45457] Avg episode reward: 32.132, avg true_objective: 13.257 [2024-07-05 00:47:28,507][45457] Num frames 10700... [2024-07-05 00:47:28,567][45457] Num frames 10800... [2024-07-05 00:47:28,627][45457] Num frames 10900... [2024-07-05 00:47:28,688][45457] Num frames 11000... [2024-07-05 00:47:28,750][45457] Num frames 11100... [2024-07-05 00:47:28,810][45457] Num frames 11200... 
[2024-07-05 00:47:28,870][45457] Num frames 11300... [2024-07-05 00:47:28,928][45457] Num frames 11400... [2024-07-05 00:47:29,024][45457] Avg episode rewards: #0: 30.744, true rewards: #0: 12.744 [2024-07-05 00:47:29,025][45457] Avg episode reward: 30.744, avg true_objective: 12.744 [2024-07-05 00:47:29,049][45457] Num frames 11500... [2024-07-05 00:47:29,109][45457] Num frames 11600... [2024-07-05 00:47:29,165][45457] Num frames 11700... [2024-07-05 00:47:29,225][45457] Num frames 11800... [2024-07-05 00:47:29,285][45457] Num frames 11900... [2024-07-05 00:47:29,363][45457] Avg episode rewards: #0: 28.639, true rewards: #0: 11.939 [2024-07-05 00:47:29,364][45457] Avg episode reward: 28.639, avg true_objective: 11.939 [2024-07-05 00:47:41,804][45457] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/replay.mp4! [2024-07-05 10:23:44,255][11302] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json... 
[2024-07-05 10:23:44,257][11302] Rollout worker 0 uses device cpu [2024-07-05 10:23:44,258][11302] Rollout worker 1 uses device cpu [2024-07-05 10:23:44,258][11302] Rollout worker 2 uses device cpu [2024-07-05 10:23:44,258][11302] Rollout worker 3 uses device cpu [2024-07-05 10:23:44,259][11302] Rollout worker 4 uses device cpu [2024-07-05 10:23:44,259][11302] Rollout worker 5 uses device cpu [2024-07-05 10:23:44,259][11302] Rollout worker 6 uses device cpu [2024-07-05 10:23:44,260][11302] Rollout worker 7 uses device cpu [2024-07-05 10:23:44,260][11302] Rollout worker 8 uses device cpu [2024-07-05 10:23:44,260][11302] Rollout worker 9 uses device cpu [2024-07-05 10:23:44,260][11302] Rollout worker 10 uses device cpu [2024-07-05 10:23:44,261][11302] Rollout worker 11 uses device cpu [2024-07-05 10:23:44,261][11302] Rollout worker 12 uses device cpu [2024-07-05 10:23:44,261][11302] Rollout worker 13 uses device cpu [2024-07-05 10:23:44,262][11302] Rollout worker 14 uses device cpu [2024-07-05 10:23:44,262][11302] Rollout worker 15 uses device cpu [2024-07-05 10:23:44,370][11302] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:23:44,371][11302] InferenceWorker_p0-w0: min num requests: 5 [2024-07-05 10:23:44,460][11302] Starting all processes... [2024-07-05 10:23:44,461][11302] Starting process learner_proc0 [2024-07-05 10:23:45,124][11302] Starting all processes... 
[2024-07-05 10:23:45,131][11302] Starting process inference_proc0-0 [2024-07-05 10:23:45,132][11302] Starting process rollout_proc0 [2024-07-05 10:23:45,132][11302] Starting process rollout_proc1 [2024-07-05 10:23:45,133][11302] Starting process rollout_proc2 [2024-07-05 10:23:45,135][11302] Starting process rollout_proc3 [2024-07-05 10:23:45,136][11302] Starting process rollout_proc4 [2024-07-05 10:23:45,140][11302] Starting process rollout_proc5 [2024-07-05 10:23:45,142][11302] Starting process rollout_proc6 [2024-07-05 10:23:45,142][11302] Starting process rollout_proc7 [2024-07-05 10:23:45,142][11302] Starting process rollout_proc8 [2024-07-05 10:23:45,143][11302] Starting process rollout_proc9 [2024-07-05 10:23:45,144][11302] Starting process rollout_proc10 [2024-07-05 10:23:45,145][11302] Starting process rollout_proc11 [2024-07-05 10:23:45,148][11302] Starting process rollout_proc12 [2024-07-05 10:23:45,157][11302] Starting process rollout_proc13 [2024-07-05 10:23:45,157][11302] Starting process rollout_proc14 [2024-07-05 10:23:45,171][11302] Starting process rollout_proc15 [2024-07-05 10:23:49,312][11874] Worker 7 uses CPU cores [7] [2024-07-05 10:23:49,324][11867] Worker 0 uses CPU cores [0] [2024-07-05 10:23:49,436][11875] Worker 8 uses CPU cores [8] [2024-07-05 10:23:49,460][11869] Worker 2 uses CPU cores [2] [2024-07-05 10:23:49,665][11878] Worker 11 uses CPU cores [11] [2024-07-05 10:23:49,698][11846] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:23:49,699][11846] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-07-05 10:23:49,725][11894] Worker 12 uses CPU cores [12] [2024-07-05 10:23:49,735][11871] Worker 4 uses CPU cores [4] [2024-07-05 10:23:49,736][11876] Worker 10 uses CPU cores [10] [2024-07-05 10:23:49,753][11870] Worker 3 uses CPU cores [3] [2024-07-05 10:23:49,772][11897] Worker 15 uses CPU cores [15] [2024-07-05 10:23:49,782][11895] Worker 13 uses CPU cores [13] 
[2024-07-05 10:23:49,783][11846] Num visible devices: 1
[2024-07-05 10:23:49,823][11877] Worker 9 uses CPU cores [9]
[2024-07-05 10:23:49,832][11846] Setting fixed seed 200
[2024-07-05 10:23:49,846][11846] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 10:23:49,846][11846] Initializing actor-critic model on device cuda:0
[2024-07-05 10:23:49,846][11846] RunningMeanStd input shape: (3, 72, 128)
[2024-07-05 10:23:49,847][11846] RunningMeanStd input shape: (1,)
[2024-07-05 10:23:49,852][11896] Worker 14 uses CPU cores [14]
[2024-07-05 10:23:49,862][11846] Num input channels: 3
[2024-07-05 10:23:49,867][11872] Worker 5 uses CPU cores [5]
[2024-07-05 10:23:49,902][11868] Worker 1 uses CPU cores [1]
[2024-07-05 10:23:49,906][11846] Convolutional layer output size: 4608
[2024-07-05 10:23:49,926][11846] Policy head output size: 512
[2024-07-05 10:23:50,058][11873] Worker 6 uses CPU cores [6]
[2024-07-05 10:23:50,058][11846] Created Actor Critic model with architecture:
[2024-07-05 10:23:50,058][11846] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ResnetEncoder(
      (conv_head): Sequential(
        (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (2): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (3): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (5): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (6): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (7): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (9): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (10): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (11): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (12): ELU(alpha=1.0)
      )
      (mlp_layers): Sequential(
        (0): Linear(in_features=4608, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-07-05 10:23:50,059][11866] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 10:23:50,059][11866] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-07-05 10:23:50,112][11866] Num visible devices: 1
[2024-07-05 10:23:50,193][11846] Using optimizer
[2024-07-05 10:23:50,713][11846] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2024-07-05 10:23:50,798][11846] Loading model from checkpoint
[2024-07-05 10:23:50,800][11846] Loaded experiment state at self.train_step=1222, self.env_steps=5005312
[2024-07-05 10:23:50,800][11846] Initialized policy 0 weights for model version 1222
[2024-07-05 10:23:50,802][11846] LearnerWorker_p0 finished initialization!
[2024-07-05 10:23:50,802][11846] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 10:23:50,895][11866] RunningMeanStd input shape: (3, 72, 128)
[2024-07-05 10:23:50,895][11866] RunningMeanStd input shape: (1,)
[2024-07-05 10:23:50,902][11866] Num input channels: 3
[2024-07-05 10:23:50,913][11866] Convolutional layer output size: 4608
[2024-07-05 10:23:50,924][11866] Policy head output size: 512
[2024-07-05 10:23:51,053][11302] Inference worker 0-0 is ready!
[2024-07-05 10:23:51,054][11302] All inference workers are ready! Signal rollout workers to start!
[2024-07-05 10:23:51,133][11868] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,145][11874] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,153][11872] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,160][11895] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,163][11871] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,167][11878] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,167][11867] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,167][11873] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,168][11870] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,169][11897] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,170][11894] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,172][11875] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,175][11876] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,180][11896] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,181][11869] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,260][11877] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:23:51,899][11871] Decorrelating experience for 0 frames... [2024-07-05 10:23:51,899][11873] Decorrelating experience for 0 frames... [2024-07-05 10:23:51,899][11868] Decorrelating experience for 0 frames... [2024-07-05 10:23:51,899][11867] Decorrelating experience for 0 frames... [2024-07-05 10:23:51,899][11895] Decorrelating experience for 0 frames... [2024-07-05 10:23:51,899][11894] Decorrelating experience for 0 frames... [2024-07-05 10:23:51,899][11870] Decorrelating experience for 0 frames... [2024-07-05 10:23:51,899][11869] Decorrelating experience for 0 frames... 
[2024-07-05 10:23:52,086][11894] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,113][11876] Decorrelating experience for 0 frames... [2024-07-05 10:23:52,147][11873] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,156][11867] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,156][11868] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,157][11869] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,157][11870] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,180][11874] Decorrelating experience for 0 frames... [2024-07-05 10:23:52,279][11894] Decorrelating experience for 64 frames... [2024-07-05 10:23:52,329][11871] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,340][11897] Decorrelating experience for 0 frames... [2024-07-05 10:23:52,341][11878] Decorrelating experience for 0 frames... [2024-07-05 10:23:52,344][11873] Decorrelating experience for 64 frames... [2024-07-05 10:23:52,349][11869] Decorrelating experience for 64 frames... [2024-07-05 10:23:52,354][11867] Decorrelating experience for 64 frames... [2024-07-05 10:23:52,495][11876] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,512][11871] Decorrelating experience for 64 frames... [2024-07-05 10:23:52,512][11897] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,523][11894] Decorrelating experience for 96 frames... [2024-07-05 10:23:52,609][11867] Decorrelating experience for 96 frames... [2024-07-05 10:23:52,609][11868] Decorrelating experience for 64 frames... [2024-07-05 10:23:52,691][11873] Decorrelating experience for 96 frames... [2024-07-05 10:23:52,701][11896] Decorrelating experience for 0 frames... [2024-07-05 10:23:52,796][11894] Decorrelating experience for 128 frames... [2024-07-05 10:23:52,808][11870] Decorrelating experience for 64 frames... [2024-07-05 10:23:52,808][11871] Decorrelating experience for 96 frames... 
[2024-07-05 10:23:52,880][11896] Decorrelating experience for 32 frames... [2024-07-05 10:23:52,882][11867] Decorrelating experience for 128 frames... [2024-07-05 10:23:52,886][11875] Decorrelating experience for 0 frames... [2024-07-05 10:23:52,989][11876] Decorrelating experience for 64 frames... [2024-07-05 10:23:53,015][11870] Decorrelating experience for 96 frames... [2024-07-05 10:23:53,070][11873] Decorrelating experience for 128 frames... [2024-07-05 10:23:53,080][11896] Decorrelating experience for 64 frames... [2024-07-05 10:23:53,081][11872] Decorrelating experience for 0 frames... [2024-07-05 10:23:53,124][11867] Decorrelating experience for 160 frames... [2024-07-05 10:23:53,151][11874] Decorrelating experience for 32 frames... [2024-07-05 10:23:53,185][11875] Decorrelating experience for 32 frames... [2024-07-05 10:23:53,250][11894] Decorrelating experience for 160 frames... [2024-07-05 10:23:53,317][11896] Decorrelating experience for 96 frames... [2024-07-05 10:23:53,350][11868] Decorrelating experience for 96 frames... [2024-07-05 10:23:53,377][11878] Decorrelating experience for 32 frames... [2024-07-05 10:23:53,396][11876] Decorrelating experience for 96 frames... [2024-07-05 10:23:53,409][11875] Decorrelating experience for 64 frames... [2024-07-05 10:23:53,427][11871] Decorrelating experience for 128 frames... [2024-07-05 10:23:53,433][11874] Decorrelating experience for 64 frames... [2024-07-05 10:23:53,550][11867] Decorrelating experience for 192 frames... [2024-07-05 10:23:53,585][11894] Decorrelating experience for 192 frames... [2024-07-05 10:23:53,657][11869] Decorrelating experience for 96 frames... [2024-07-05 10:23:53,663][11896] Decorrelating experience for 128 frames... [2024-07-05 10:23:53,663][11875] Decorrelating experience for 96 frames... [2024-07-05 10:23:53,692][11872] Decorrelating experience for 32 frames... [2024-07-05 10:23:53,694][11874] Decorrelating experience for 96 frames... 
[2024-07-05 10:23:53,713][11895] Decorrelating experience for 32 frames... [2024-07-05 10:23:53,786][11302] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 5005312. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:23:53,855][11878] Decorrelating experience for 64 frames... [2024-07-05 10:23:53,893][11876] Decorrelating experience for 128 frames... [2024-07-05 10:23:53,929][11894] Decorrelating experience for 224 frames... [2024-07-05 10:23:53,955][11867] Decorrelating experience for 224 frames... [2024-07-05 10:23:53,975][11869] Decorrelating experience for 128 frames... [2024-07-05 10:23:53,987][11872] Decorrelating experience for 64 frames... [2024-07-05 10:23:53,993][11871] Decorrelating experience for 160 frames... [2024-07-05 10:23:54,000][11868] Decorrelating experience for 128 frames... [2024-07-05 10:23:54,138][11895] Decorrelating experience for 64 frames... [2024-07-05 10:23:54,151][11896] Decorrelating experience for 160 frames... [2024-07-05 10:23:54,156][11876] Decorrelating experience for 160 frames... [2024-07-05 10:23:54,192][11874] Decorrelating experience for 128 frames... [2024-07-05 10:23:54,251][11869] Decorrelating experience for 160 frames... [2024-07-05 10:23:54,260][11875] Decorrelating experience for 128 frames... [2024-07-05 10:23:54,301][11868] Decorrelating experience for 160 frames... [2024-07-05 10:23:54,376][11871] Decorrelating experience for 192 frames... [2024-07-05 10:23:54,378][11878] Decorrelating experience for 96 frames... [2024-07-05 10:23:54,381][11895] Decorrelating experience for 96 frames... [2024-07-05 10:23:54,476][11897] Decorrelating experience for 64 frames... [2024-07-05 10:23:54,577][11896] Decorrelating experience for 192 frames... [2024-07-05 10:23:54,633][11874] Decorrelating experience for 160 frames... [2024-07-05 10:23:54,635][11870] Decorrelating experience for 128 frames... 
[2024-07-05 10:23:54,649][11868] Decorrelating experience for 192 frames... [2024-07-05 10:23:54,702][11872] Decorrelating experience for 96 frames... [2024-07-05 10:23:54,739][11895] Decorrelating experience for 128 frames... [2024-07-05 10:23:54,781][11897] Decorrelating experience for 96 frames... [2024-07-05 10:23:54,789][11878] Decorrelating experience for 128 frames... [2024-07-05 10:23:54,879][11876] Decorrelating experience for 192 frames... [2024-07-05 10:23:54,936][11877] Decorrelating experience for 0 frames... [2024-07-05 10:23:54,974][11870] Decorrelating experience for 160 frames... [2024-07-05 10:23:54,980][11868] Decorrelating experience for 224 frames... [2024-07-05 10:23:55,026][11874] Decorrelating experience for 192 frames... [2024-07-05 10:23:55,042][11895] Decorrelating experience for 160 frames... [2024-07-05 10:23:55,061][11872] Decorrelating experience for 128 frames... [2024-07-05 10:23:55,062][11896] Decorrelating experience for 224 frames... [2024-07-05 10:23:55,174][11878] Decorrelating experience for 160 frames... [2024-07-05 10:23:55,186][11869] Decorrelating experience for 192 frames... [2024-07-05 10:23:55,218][11877] Decorrelating experience for 32 frames... [2024-07-05 10:23:55,244][11873] Decorrelating experience for 160 frames... [2024-07-05 10:23:55,287][11876] Decorrelating experience for 224 frames... [2024-07-05 10:23:55,371][11870] Decorrelating experience for 192 frames... [2024-07-05 10:23:55,436][11874] Decorrelating experience for 224 frames... [2024-07-05 10:23:55,482][11897] Decorrelating experience for 128 frames... [2024-07-05 10:23:55,483][11877] Decorrelating experience for 64 frames... [2024-07-05 10:23:55,532][11871] Decorrelating experience for 224 frames... [2024-07-05 10:23:55,605][11869] Decorrelating experience for 224 frames... [2024-07-05 10:23:55,637][11873] Decorrelating experience for 192 frames... [2024-07-05 10:23:55,700][11895] Decorrelating experience for 192 frames... 
[2024-07-05 10:23:55,805][11870] Decorrelating experience for 224 frames... [2024-07-05 10:23:55,820][11875] Decorrelating experience for 160 frames... [2024-07-05 10:23:55,824][11897] Decorrelating experience for 160 frames... [2024-07-05 10:23:55,836][11877] Decorrelating experience for 96 frames... [2024-07-05 10:23:55,884][11878] Decorrelating experience for 192 frames... [2024-07-05 10:23:55,993][11873] Decorrelating experience for 224 frames... [2024-07-05 10:23:56,041][11895] Decorrelating experience for 224 frames... [2024-07-05 10:23:56,184][11897] Decorrelating experience for 192 frames... [2024-07-05 10:23:56,191][11872] Decorrelating experience for 160 frames... [2024-07-05 10:23:56,251][11875] Decorrelating experience for 192 frames... [2024-07-05 10:23:56,289][11878] Decorrelating experience for 224 frames... [2024-07-05 10:23:56,349][11877] Decorrelating experience for 128 frames... [2024-07-05 10:23:56,565][11897] Decorrelating experience for 224 frames... [2024-07-05 10:23:56,607][11872] Decorrelating experience for 192 frames... [2024-07-05 10:23:56,893][11877] Decorrelating experience for 160 frames... [2024-07-05 10:23:56,904][11875] Decorrelating experience for 224 frames... [2024-07-05 10:23:56,960][11846] Signal inference workers to stop experience collection... [2024-07-05 10:23:56,973][11866] InferenceWorker_p0-w0: stopping experience collection [2024-07-05 10:23:57,111][11872] Decorrelating experience for 224 frames... [2024-07-05 10:23:57,199][11877] Decorrelating experience for 192 frames... [2024-07-05 10:23:57,405][11877] Decorrelating experience for 224 frames... [2024-07-05 10:23:58,786][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 1212.8. Samples: 6064. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:23:58,787][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:23:58,841][11846] EvtLoop [learner_proc0_evt_loop, process=learner_proc0] unhandled exception in slot='on_new_training_batch' connected to emitter=Emitter(object_id='Batcher_0', signal_name='training_batches_available'), args=(0,) Traceback (most recent call last): File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/sample_factory/algo/learning/learner_worker.py", line 150, in on_new_training_batch stats = self.learner.train(self.batcher.training_batches[batch_idx]) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 1046, in train train_stats = self._train(buff, self.cfg.batch_size, experience_size, num_invalids) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 731, in _train ) = self._calculate_losses(mb, num_invalids) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 572, in _calculate_losses core_output_seq, _ = self.actor_critic.forward_core(head_output_seq, rnn_states) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/sample_factory/model/actor_critic.py", line 159, in forward_core x, new_rnn_states = self.core(head_output, rnn_states) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/sample_factory/model/core.py", line 49, in forward x, 
new_rnn_states = self.core(head_output, rnn_states.contiguous()) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "/home/raghu/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 1136, in forward result = _VF.gru(input, batch_sizes, hx, self._flat_weights, self.bias, torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB. GPU [2024-07-05 10:23:58,842][11846] Unhandled exception CUDA out of memory. Tried to allocate 28.00 MiB. GPU in evt loop learner_proc0_evt_loop [2024-07-05 10:24:03,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 606.4. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:03,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:04,365][11302] Heartbeat connected on Batcher_0 [2024-07-05 10:24:04,372][11302] Heartbeat connected on InferenceWorker_p0-w0 [2024-07-05 10:24:04,375][11302] Heartbeat connected on RolloutWorker_w0 [2024-07-05 10:24:04,378][11302] Heartbeat connected on RolloutWorker_w1 [2024-07-05 10:24:04,381][11302] Heartbeat connected on RolloutWorker_w2 [2024-07-05 10:24:04,384][11302] Heartbeat connected on RolloutWorker_w3 [2024-07-05 10:24:04,387][11302] Heartbeat connected on RolloutWorker_w4 [2024-07-05 10:24:04,390][11302] Heartbeat connected on RolloutWorker_w5 [2024-07-05 10:24:04,393][11302] Heartbeat connected on RolloutWorker_w6 [2024-07-05 10:24:04,399][11302] Heartbeat connected on RolloutWorker_w8 [2024-07-05 10:24:04,402][11302] Heartbeat connected on RolloutWorker_w7 [2024-07-05 10:24:04,447][11302] Heartbeat connected on RolloutWorker_w10 [2024-07-05 10:24:04,448][11302] Heartbeat connected on 
RolloutWorker_w9 [2024-07-05 10:24:04,449][11302] Heartbeat connected on RolloutWorker_w11 [2024-07-05 10:24:04,452][11302] Heartbeat connected on RolloutWorker_w12 [2024-07-05 10:24:04,455][11302] Heartbeat connected on RolloutWorker_w13 [2024-07-05 10:24:04,458][11302] Heartbeat connected on RolloutWorker_w14 [2024-07-05 10:24:04,460][11302] Heartbeat connected on RolloutWorker_w15 [2024-07-05 10:24:08,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 404.2. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:08,789][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:13,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 303.2. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:13,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:18,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 242.6. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:18,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:23,786][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 202.1. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:23,787][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:28,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 173.3. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:28,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:33,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 151.6. Samples: 6064. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:33,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:38,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 134.8. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:38,789][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:43,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:43,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:48,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:48,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:53,786][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:53,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:24:58,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:24:58,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:03,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:03,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:08,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:08,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:13,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:13,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:18,786][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:18,787][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:23,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:23,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:28,786][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:28,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:33,786][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:33,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:38,786][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:38,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:43,786][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:43,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:48,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:48,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:53,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:53,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:25:58,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:25:58,789][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:26:03,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:26:03,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:26:08,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:26:08,789][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:26:13,787][11302] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 6064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:26:13,788][11302] Avg episode reward: [(0, '2.206')] [2024-07-05 10:26:16,861][11302] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 11302], exiting... [2024-07-05 10:26:16,863][11846] Stopping Batcher_0... [2024-07-05 10:26:16,864][11846] Loop batcher_evt_loop terminating... 
[2024-07-05 10:26:16,863][11302] Runner profile tree view: main_loop: 152.4026 [2024-07-05 10:26:16,872][11302] Collected {0: 5005312}, FPS: 0.0 [2024-07-05 10:26:16,898][11876] Stopping RolloutWorker_w10... [2024-07-05 10:26:16,899][11872] Stopping RolloutWorker_w5... [2024-07-05 10:26:16,899][11876] Loop rollout_proc10_evt_loop terminating... [2024-07-05 10:26:16,899][11872] Loop rollout_proc5_evt_loop terminating... [2024-07-05 10:26:16,899][11895] Stopping RolloutWorker_w13... [2024-07-05 10:26:16,900][11895] Loop rollout_proc13_evt_loop terminating... [2024-07-05 10:26:16,900][11870] Stopping RolloutWorker_w3... [2024-07-05 10:26:16,901][11877] Stopping RolloutWorker_w9... [2024-07-05 10:26:16,901][11870] Loop rollout_proc3_evt_loop terminating... [2024-07-05 10:26:16,902][11877] Loop rollout_proc9_evt_loop terminating... [2024-07-05 10:26:16,902][11868] Stopping RolloutWorker_w1... [2024-07-05 10:26:16,902][11871] Stopping RolloutWorker_w4... [2024-07-05 10:26:16,903][11871] Loop rollout_proc4_evt_loop terminating... [2024-07-05 10:26:16,903][11868] Loop rollout_proc1_evt_loop terminating... [2024-07-05 10:26:16,903][11874] Stopping RolloutWorker_w7... [2024-07-05 10:26:16,904][11874] Loop rollout_proc7_evt_loop terminating... [2024-07-05 10:26:16,904][11894] Stopping RolloutWorker_w12... [2024-07-05 10:26:16,905][11875] Stopping RolloutWorker_w8... [2024-07-05 10:26:16,905][11894] Loop rollout_proc12_evt_loop terminating... [2024-07-05 10:26:16,905][11875] Loop rollout_proc8_evt_loop terminating... [2024-07-05 10:26:16,906][11896] Stopping RolloutWorker_w14... [2024-07-05 10:26:16,907][11896] Loop rollout_proc14_evt_loop terminating... [2024-07-05 10:26:16,915][11869] Stopping RolloutWorker_w2... [2024-07-05 10:26:16,916][11869] Loop rollout_proc2_evt_loop terminating... [2024-07-05 10:26:16,917][11867] Stopping RolloutWorker_w0... [2024-07-05 10:26:16,918][11867] Loop rollout_proc0_evt_loop terminating... 
[2024-07-05 10:26:16,918][11873] Stopping RolloutWorker_w6... [2024-07-05 10:26:16,919][11873] Loop rollout_proc6_evt_loop terminating... [2024-07-05 10:26:16,921][11897] Stopping RolloutWorker_w15... [2024-07-05 10:26:16,921][11897] Loop rollout_proc15_evt_loop terminating... [2024-07-05 10:26:16,983][11878] Stopping RolloutWorker_w11... [2024-07-05 10:26:16,997][11878] Loop rollout_proc11_evt_loop terminating... [2024-07-05 10:26:17,001][11866] Weights refcount: 2 0 [2024-07-05 10:26:17,007][11866] Stopping InferenceWorker_p0-w0... [2024-07-05 10:26:17,009][11866] Loop inference_proc0-0_evt_loop terminating... [2024-07-05 10:29:01,075][17621] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json... [2024-07-05 10:29:01,077][17621] Rollout worker 0 uses device cpu [2024-07-05 10:29:01,078][17621] Rollout worker 1 uses device cpu [2024-07-05 10:29:01,079][17621] Rollout worker 2 uses device cpu [2024-07-05 10:29:01,080][17621] Rollout worker 3 uses device cpu [2024-07-05 10:29:01,080][17621] Rollout worker 4 uses device cpu [2024-07-05 10:29:01,081][17621] Rollout worker 5 uses device cpu [2024-07-05 10:29:01,081][17621] Rollout worker 6 uses device cpu [2024-07-05 10:29:01,082][17621] Rollout worker 7 uses device cpu [2024-07-05 10:29:01,127][17621] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:29:01,128][17621] InferenceWorker_p0-w0: min num requests: 2 [2024-07-05 10:29:01,155][17621] Starting all processes... [2024-07-05 10:29:01,155][17621] Starting process learner_proc0 [2024-07-05 10:29:01,854][17621] Starting all processes... 
[2024-07-05 10:29:01,860][17621] Starting process inference_proc0-0 [2024-07-05 10:29:01,861][17621] Starting process rollout_proc0 [2024-07-05 10:29:01,861][17621] Starting process rollout_proc1 [2024-07-05 10:29:01,861][17621] Starting process rollout_proc2 [2024-07-05 10:29:01,862][17621] Starting process rollout_proc3 [2024-07-05 10:29:01,862][17621] Starting process rollout_proc4 [2024-07-05 10:29:01,862][17621] Starting process rollout_proc5 [2024-07-05 10:29:01,863][17621] Starting process rollout_proc6 [2024-07-05 10:29:01,864][17621] Starting process rollout_proc7 [2024-07-05 10:29:04,561][17915] Worker 3 uses CPU cores [6, 7] [2024-07-05 10:29:04,576][17913] Worker 1 uses CPU cores [2, 3] [2024-07-05 10:29:04,576][17914] Worker 2 uses CPU cores [4, 5] [2024-07-05 10:29:04,729][17898] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:29:04,729][17898] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-07-05 10:29:04,777][17898] Num visible devices: 1 [2024-07-05 10:29:04,793][17919] Worker 7 uses CPU cores [14, 15] [2024-07-05 10:29:04,801][17898] Setting fixed seed 200 [2024-07-05 10:29:04,812][17898] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:29:04,812][17898] Initializing actor-critic model on device cuda:0 [2024-07-05 10:29:04,813][17898] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 10:29:04,814][17898] RunningMeanStd input shape: (1,) [2024-07-05 10:29:04,823][17898] Num input channels: 3 [2024-07-05 10:29:04,854][17898] Convolutional layer output size: 4608 [2024-07-05 10:29:04,867][17898] Policy head output size: 512 [2024-07-05 10:29:04,931][17917] Worker 5 uses CPU cores [10, 11] [2024-07-05 10:29:05,001][17898] Created Actor Critic model with architecture: [2024-07-05 10:29:05,002][17898] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): 
RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ResnetEncoder( (conv_head): Sequential( (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (2): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (3): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (6): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (7): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (9): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (10): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (11): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), 
stride=(1, 1), padding=(1, 1)) ) ) (12): ELU(alpha=1.0) ) (mlp_layers): Sequential( (0): Linear(in_features=4608, out_features=512, bias=True) (1): ELU(alpha=1.0) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-07-05 10:29:05,020][17912] Worker 0 uses CPU cores [0, 1] [2024-07-05 10:29:05,142][17911] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:29:05,142][17911] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-07-05 10:29:05,156][17898] Using optimizer [2024-07-05 10:29:05,182][17916] Worker 4 uses CPU cores [8, 9] [2024-07-05 10:29:05,187][17911] Num visible devices: 1 [2024-07-05 10:29:05,256][17918] Worker 6 uses CPU cores [12, 13] [2024-07-05 10:29:05,675][17898] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-07-05 10:29:05,746][17898] Loading model from checkpoint [2024-07-05 10:29:05,748][17898] Loaded experiment state at self.train_step=1222, self.env_steps=5005312 [2024-07-05 10:29:05,748][17898] Initialized policy 0 weights for model version 1222 [2024-07-05 10:29:05,749][17898] LearnerWorker_p0 finished initialization! [2024-07-05 10:29:05,749][17898] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:29:05,817][17911] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 10:29:05,819][17911] RunningMeanStd input shape: (1,) [2024-07-05 10:29:05,826][17911] Num input channels: 3 [2024-07-05 10:29:05,836][17911] Convolutional layer output size: 4608 [2024-07-05 10:29:05,847][17911] Policy head output size: 512 [2024-07-05 10:29:05,973][17621] Inference worker 0-0 is ready! 
[2024-07-05 10:29:05,974][17621] All inference workers are ready! Signal rollout workers to start! [2024-07-05 10:29:06,014][17918] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:29:06,014][17917] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:29:06,014][17915] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:29:06,014][17912] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:29:06,015][17913] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:29:06,016][17914] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:29:06,016][17919] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:29:06,017][17916] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:29:06,595][17919] Decorrelating experience for 0 frames... [2024-07-05 10:29:06,595][17914] Decorrelating experience for 0 frames... [2024-07-05 10:29:06,595][17917] Decorrelating experience for 0 frames... [2024-07-05 10:29:06,595][17915] Decorrelating experience for 0 frames... [2024-07-05 10:29:06,595][17918] Decorrelating experience for 0 frames... [2024-07-05 10:29:06,595][17913] Decorrelating experience for 0 frames... [2024-07-05 10:29:06,754][17621] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 5005312. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:29:06,785][17919] Decorrelating experience for 32 frames... [2024-07-05 10:29:06,786][17917] Decorrelating experience for 32 frames... [2024-07-05 10:29:06,786][17915] Decorrelating experience for 32 frames... [2024-07-05 10:29:06,786][17918] Decorrelating experience for 32 frames... [2024-07-05 10:29:06,787][17914] Decorrelating experience for 32 frames... [2024-07-05 10:29:06,796][17916] Decorrelating experience for 0 frames... [2024-07-05 10:29:06,797][17912] Decorrelating experience for 0 frames... 
[2024-07-05 10:29:06,976][17912] Decorrelating experience for 32 frames... [2024-07-05 10:29:06,976][17916] Decorrelating experience for 32 frames... [2024-07-05 10:29:06,976][17913] Decorrelating experience for 32 frames... [2024-07-05 10:29:07,011][17919] Decorrelating experience for 64 frames... [2024-07-05 10:29:07,015][17917] Decorrelating experience for 64 frames... [2024-07-05 10:29:07,020][17918] Decorrelating experience for 64 frames... [2024-07-05 10:29:07,163][17914] Decorrelating experience for 64 frames... [2024-07-05 10:29:07,197][17913] Decorrelating experience for 64 frames... [2024-07-05 10:29:07,197][17912] Decorrelating experience for 64 frames... [2024-07-05 10:29:07,199][17916] Decorrelating experience for 64 frames... [2024-07-05 10:29:07,211][17917] Decorrelating experience for 96 frames... [2024-07-05 10:29:07,216][17915] Decorrelating experience for 64 frames... [2024-07-05 10:29:07,385][17913] Decorrelating experience for 96 frames... [2024-07-05 10:29:07,385][17916] Decorrelating experience for 96 frames... [2024-07-05 10:29:07,404][17914] Decorrelating experience for 96 frames... [2024-07-05 10:29:07,406][17915] Decorrelating experience for 96 frames... [2024-07-05 10:29:07,466][17919] Decorrelating experience for 96 frames... [2024-07-05 10:29:07,583][17912] Decorrelating experience for 96 frames... [2024-07-05 10:29:07,655][17918] Decorrelating experience for 96 frames... [2024-07-05 10:29:08,255][17898] Signal inference workers to stop experience collection... [2024-07-05 10:29:08,262][17911] InferenceWorker_p0-w0: stopping experience collection [2024-07-05 10:29:11,754][17621] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 432.0. Samples: 2160. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:29:11,756][17621] Avg episode reward: [(0, '1.390')] [2024-07-05 10:29:11,821][17898] Signal inference workers to resume experience collection... 
[2024-07-05 10:29:11,822][17898] Stopping Batcher_0... [2024-07-05 10:29:11,822][17898] Loop batcher_evt_loop terminating... [2024-07-05 10:29:11,826][17621] Component Batcher_0 stopped! [2024-07-05 10:29:11,834][17913] Stopping RolloutWorker_w1... [2024-07-05 10:29:11,834][17917] Stopping RolloutWorker_w5... [2024-07-05 10:29:11,834][17913] Loop rollout_proc1_evt_loop terminating... [2024-07-05 10:29:11,834][17919] Stopping RolloutWorker_w7... [2024-07-05 10:29:11,834][17915] Stopping RolloutWorker_w3... [2024-07-05 10:29:11,834][17918] Stopping RolloutWorker_w6... [2024-07-05 10:29:11,834][17914] Stopping RolloutWorker_w2... [2024-07-05 10:29:11,834][17917] Loop rollout_proc5_evt_loop terminating... [2024-07-05 10:29:11,834][17919] Loop rollout_proc7_evt_loop terminating... [2024-07-05 10:29:11,834][17914] Loop rollout_proc2_evt_loop terminating... [2024-07-05 10:29:11,834][17918] Loop rollout_proc6_evt_loop terminating... [2024-07-05 10:29:11,834][17915] Loop rollout_proc3_evt_loop terminating... [2024-07-05 10:29:11,835][17916] Stopping RolloutWorker_w4... [2024-07-05 10:29:11,834][17621] Component RolloutWorker_w1 stopped! [2024-07-05 10:29:11,835][17916] Loop rollout_proc4_evt_loop terminating... [2024-07-05 10:29:11,836][17912] Stopping RolloutWorker_w0... [2024-07-05 10:29:11,836][17912] Loop rollout_proc0_evt_loop terminating... [2024-07-05 10:29:11,836][17621] Component RolloutWorker_w5 stopped! [2024-07-05 10:29:11,836][17621] Component RolloutWorker_w7 stopped! [2024-07-05 10:29:11,838][17621] Component RolloutWorker_w3 stopped! [2024-07-05 10:29:11,839][17621] Component RolloutWorker_w6 stopped! [2024-07-05 10:29:11,840][17621] Component RolloutWorker_w2 stopped! [2024-07-05 10:29:11,840][17621] Component RolloutWorker_w4 stopped! [2024-07-05 10:29:11,841][17621] Component RolloutWorker_w0 stopped! [2024-07-05 10:29:11,849][17911] Weights refcount: 2 0 [2024-07-05 10:29:11,851][17911] Stopping InferenceWorker_p0-w0... 
[2024-07-05 10:29:11,851][17911] Loop inference_proc0-0_evt_loop terminating... [2024-07-05 10:29:11,851][17621] Component InferenceWorker_p0-w0 stopped! [2024-07-05 10:29:12,570][17898] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001224_5013504.pth... [2024-07-05 10:29:12,667][17898] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001117_4575232.pth [2024-07-05 10:29:12,668][17898] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001224_5013504.pth... [2024-07-05 10:29:12,765][17898] Stopping LearnerWorker_p0... [2024-07-05 10:29:12,765][17898] Loop learner_proc0_evt_loop terminating... [2024-07-05 10:29:12,765][17621] Component LearnerWorker_p0 stopped! [2024-07-05 10:29:12,767][17621] Waiting for process learner_proc0 to stop... [2024-07-05 10:29:13,428][17621] Waiting for process inference_proc0-0 to join... [2024-07-05 10:29:13,429][17621] Waiting for process rollout_proc0 to join... [2024-07-05 10:29:13,430][17621] Waiting for process rollout_proc1 to join... [2024-07-05 10:29:13,431][17621] Waiting for process rollout_proc2 to join... [2024-07-05 10:29:13,431][17621] Waiting for process rollout_proc3 to join... [2024-07-05 10:29:13,432][17621] Waiting for process rollout_proc4 to join... [2024-07-05 10:29:13,432][17621] Waiting for process rollout_proc5 to join... [2024-07-05 10:29:13,433][17621] Waiting for process rollout_proc6 to join... [2024-07-05 10:29:13,433][17621] Waiting for process rollout_proc7 to join... 
[2024-07-05 10:29:13,434][17621] Batcher 0 profile tree view: batching: 0.0149, releasing_batches: 0.0005 [2024-07-05 10:29:13,434][17621] InferenceWorker_p0-w0 profile tree view: update_model: 0.0039 wait_policy: 0.0000 wait_policy_total: 1.2594 one_step: 0.0027 handle_policy_step: 0.9892 deserialize: 0.0202, stack: 0.0021, obs_to_device_normalize: 0.1560, forward: 0.7558, send_messages: 0.0164 prepare_outputs: 0.0250 to_cpu: 0.0102 [2024-07-05 10:29:13,434][17621] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 0.8941 train: 3.6211 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0002, kl_divergence: 0.0071, after_optimizer: 0.0451 calculate_losses: 1.1230 losses_init: 0.0000, forward_head: 0.7392, bptt_initial: 0.2961, tail: 0.0354, advantages_returns: 0.0008, losses: 0.0406 bptt: 0.0106 bptt_forward_core: 0.0105 update: 2.4448 clip: 0.0529 [2024-07-05 10:29:13,435][17621] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0004, enqueue_policy_requests: 0.0150, env_step: 0.1536, overhead: 0.0144, complete_rollouts: 0.0003 save_policy_outputs: 0.0149 split_output_tensors: 0.0069 [2024-07-05 10:29:13,435][17621] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0004, enqueue_policy_requests: 0.0194, env_step: 0.1854, overhead: 0.0207, complete_rollouts: 0.0004 save_policy_outputs: 0.0159 split_output_tensors: 0.0077 [2024-07-05 10:29:13,436][17621] Loop Runner_EvtLoop terminating... [2024-07-05 10:29:13,436][17621] Runner profile tree view: main_loop: 12.2816 [2024-07-05 10:29:13,437][17621] Collected {0: 5013504}, FPS: 667.0 [2024-07-05 10:30:24,967][17621] Environment doom_basic already registered, overwriting... [2024-07-05 10:30:24,969][17621] Environment doom_two_colors_easy already registered, overwriting... [2024-07-05 10:30:24,970][17621] Environment doom_two_colors_hard already registered, overwriting... [2024-07-05 10:30:24,970][17621] Environment doom_dm already registered, overwriting... 
[2024-07-05 10:30:24,970][17621] Environment doom_dwango5 already registered, overwriting... [2024-07-05 10:30:24,971][17621] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-07-05 10:30:24,971][17621] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-07-05 10:30:24,972][17621] Environment doom_my_way_home already registered, overwriting... [2024-07-05 10:30:24,972][17621] Environment doom_deadly_corridor already registered, overwriting... [2024-07-05 10:30:24,973][17621] Environment doom_defend_the_center already registered, overwriting... [2024-07-05 10:30:24,973][17621] Environment doom_defend_the_line already registered, overwriting... [2024-07-05 10:30:24,974][17621] Environment doom_health_gathering already registered, overwriting... [2024-07-05 10:30:24,974][17621] Environment doom_health_gathering_supreme already registered, overwriting... [2024-07-05 10:30:24,974][17621] Environment doom_battle already registered, overwriting... [2024-07-05 10:30:24,975][17621] Environment doom_battle2 already registered, overwriting... [2024-07-05 10:30:24,975][17621] Environment doom_duel_bots already registered, overwriting... [2024-07-05 10:30:24,975][17621] Environment doom_deathmatch_bots already registered, overwriting... [2024-07-05 10:30:24,975][17621] Environment doom_duel already registered, overwriting... [2024-07-05 10:30:24,976][17621] Environment doom_deathmatch_full already registered, overwriting... [2024-07-05 10:30:24,976][17621] Environment doom_benchmark already registered, overwriting... 
[2024-07-05 10:30:24,976][17621] register_encoder_factory: [2024-07-05 10:30:24,983][17621] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json [2024-07-05 10:30:24,984][17621] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line [2024-07-05 10:30:24,989][17621] Experiment dir /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet already exists! [2024-07-05 10:30:24,990][17621] Resuming existing experiment from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet... [2024-07-05 10:30:24,991][17621] Weights and Biases integration disabled [2024-07-05 10:30:24,993][17621] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-07-05 10:30:28,261][17621] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=conv_resnet train_dir=/home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir restart_behavior=resume device=gpu seed=200 num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 
decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=10000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=resnet_impala encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --experiment=conv_resnet --seed=200 --num_workers=8 --num_envs_per_worker=4 --batch_size=1024 --encoder_conv_architecture=resnet_impala --train_for_env_steps=5000000 cli_args={'env': 'doom_health_gathering_supreme', 'experiment': 'conv_resnet', 'seed': 200, 'num_workers': 8, 'num_envs_per_worker': 4, 'batch_size': 
1024, 'train_for_env_steps': 5000000, 'encoder_conv_architecture': 'resnet_impala'} git_hash=unknown git_repo_name=not a git repository [2024-07-05 10:30:28,262][17621] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json... [2024-07-05 10:30:28,263][17621] Rollout worker 0 uses device cpu [2024-07-05 10:30:28,264][17621] Rollout worker 1 uses device cpu [2024-07-05 10:30:28,264][17621] Rollout worker 2 uses device cpu [2024-07-05 10:30:28,265][17621] Rollout worker 3 uses device cpu [2024-07-05 10:30:28,265][17621] Rollout worker 4 uses device cpu [2024-07-05 10:30:28,266][17621] Rollout worker 5 uses device cpu [2024-07-05 10:30:28,266][17621] Rollout worker 6 uses device cpu [2024-07-05 10:30:28,266][17621] Rollout worker 7 uses device cpu [2024-07-05 10:30:28,301][17621] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:30:28,302][17621] InferenceWorker_p0-w0: min num requests: 2 [2024-07-05 10:30:28,327][17621] Starting all processes... [2024-07-05 10:30:28,327][17621] Starting process learner_proc0 [2024-07-05 10:30:28,377][17621] Starting all processes... 
[2024-07-05 10:30:28,380][17621] Starting process inference_proc0-0 [2024-07-05 10:30:28,380][17621] Starting process rollout_proc0 [2024-07-05 10:30:28,381][17621] Starting process rollout_proc1 [2024-07-05 10:30:28,382][17621] Starting process rollout_proc2 [2024-07-05 10:30:28,382][17621] Starting process rollout_proc3 [2024-07-05 10:30:28,388][17621] Starting process rollout_proc4 [2024-07-05 10:30:28,388][17621] Starting process rollout_proc5 [2024-07-05 10:30:28,388][17621] Starting process rollout_proc6 [2024-07-05 10:30:28,390][17621] Starting process rollout_proc7 [2024-07-05 10:30:31,042][19516] Worker 2 uses CPU cores [4, 5] [2024-07-05 10:30:31,097][19499] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:30:31,097][19499] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-07-05 10:30:31,131][19518] Worker 5 uses CPU cores [10, 11] [2024-07-05 10:30:31,160][19517] Worker 4 uses CPU cores [8, 9] [2024-07-05 10:30:31,167][19513] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:30:31,168][19513] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-07-05 10:30:31,296][19520] Worker 7 uses CPU cores [14, 15] [2024-07-05 10:30:31,423][19514] Worker 1 uses CPU cores [2, 3] [2024-07-05 10:30:31,489][19512] Worker 0 uses CPU cores [0, 1] [2024-07-05 10:30:31,570][19519] Worker 6 uses CPU cores [12, 13] [2024-07-05 10:30:31,686][19515] Worker 3 uses CPU cores [6, 7] [2024-07-05 10:30:32,719][19513] Num visible devices: 1 [2024-07-05 10:30:32,719][19499] Num visible devices: 1 [2024-07-05 10:30:32,744][19499] Setting fixed seed 200 [2024-07-05 10:30:32,745][19499] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:30:32,745][19499] Initializing actor-critic model on device cuda:0 [2024-07-05 10:30:32,746][19499] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 10:30:32,746][19499] RunningMeanStd input shape: 
(1,) [2024-07-05 10:30:32,753][19499] Num input channels: 3 [2024-07-05 10:30:32,764][19499] Convolutional layer output size: 4608 [2024-07-05 10:30:32,775][19499] Policy head output size: 512 [2024-07-05 10:30:32,896][19499] Created Actor Critic model with architecture: [2024-07-05 10:30:32,897][19499] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ResnetEncoder( (conv_head): Sequential( (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (2): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (3): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (6): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (7): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (9): MaxPool2d(kernel_size=3, stride=2, padding=1, 
dilation=1, ceil_mode=False) (10): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (11): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (12): ELU(alpha=1.0) ) (mlp_layers): Sequential( (0): Linear(in_features=4608, out_features=512, bias=True) (1): ELU(alpha=1.0) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-07-05 10:30:32,991][19499] Using optimizer [2024-07-05 10:30:33,485][19499] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001224_5013504.pth... [2024-07-05 10:30:33,518][19499] Loading model from checkpoint [2024-07-05 10:30:33,519][19499] Loaded experiment state at self.train_step=1224, self.env_steps=5013504 [2024-07-05 10:30:33,519][19499] Initialized policy 0 weights for model version 1224 [2024-07-05 10:30:33,521][19499] LearnerWorker_p0 finished initialization! [2024-07-05 10:30:33,521][19499] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:30:33,579][19513] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 10:30:33,580][19513] RunningMeanStd input shape: (1,) [2024-07-05 10:30:33,587][19513] Num input channels: 3 [2024-07-05 10:30:33,597][19513] Convolutional layer output size: 4608 [2024-07-05 10:30:33,608][19513] Policy head output size: 512 [2024-07-05 10:30:33,735][17621] Inference worker 0-0 is ready! 
[2024-07-05 10:30:33,736][17621] All inference workers are ready! Signal rollout workers to start! [2024-07-05 10:30:33,775][19518] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:30:33,776][19520] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:30:33,776][19516] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:30:33,777][19515] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:30:33,778][19514] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:30:33,778][19519] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:30:33,779][19512] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:30:33,779][19517] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:30:34,279][19512] Decorrelating experience for 0 frames... [2024-07-05 10:30:34,280][19515] Decorrelating experience for 0 frames... [2024-07-05 10:30:34,282][19519] Decorrelating experience for 0 frames... [2024-07-05 10:30:34,282][19520] Decorrelating experience for 0 frames... [2024-07-05 10:30:34,282][19518] Decorrelating experience for 0 frames... [2024-07-05 10:30:34,282][19516] Decorrelating experience for 0 frames... [2024-07-05 10:30:34,458][19520] Decorrelating experience for 32 frames... [2024-07-05 10:30:34,458][19518] Decorrelating experience for 32 frames... [2024-07-05 10:30:34,458][19512] Decorrelating experience for 32 frames... [2024-07-05 10:30:34,459][19516] Decorrelating experience for 32 frames... [2024-07-05 10:30:34,459][19519] Decorrelating experience for 32 frames... [2024-07-05 10:30:34,650][19514] Decorrelating experience for 0 frames... [2024-07-05 10:30:34,650][19517] Decorrelating experience for 0 frames... [2024-07-05 10:30:34,683][19512] Decorrelating experience for 64 frames... [2024-07-05 10:30:34,684][19519] Decorrelating experience for 64 frames... [2024-07-05 10:30:34,684][19516] Decorrelating experience for 64 frames... 
[2024-07-05 10:30:34,685][19520] Decorrelating experience for 64 frames... [2024-07-05 10:30:34,818][19514] Decorrelating experience for 32 frames... [2024-07-05 10:30:34,818][19517] Decorrelating experience for 32 frames... [2024-07-05 10:30:34,855][19515] Decorrelating experience for 32 frames... [2024-07-05 10:30:34,884][19520] Decorrelating experience for 96 frames... [2024-07-05 10:30:34,889][19518] Decorrelating experience for 64 frames... [2024-07-05 10:30:34,993][17621] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 5013504. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:30:35,035][19512] Decorrelating experience for 96 frames... [2024-07-05 10:30:35,040][19517] Decorrelating experience for 64 frames... [2024-07-05 10:30:35,040][19514] Decorrelating experience for 64 frames... [2024-07-05 10:30:35,077][19516] Decorrelating experience for 96 frames... [2024-07-05 10:30:35,079][19515] Decorrelating experience for 64 frames... [2024-07-05 10:30:35,083][19518] Decorrelating experience for 96 frames... [2024-07-05 10:30:35,229][19514] Decorrelating experience for 96 frames... [2024-07-05 10:30:35,267][19515] Decorrelating experience for 96 frames... [2024-07-05 10:30:35,275][19519] Decorrelating experience for 96 frames... [2024-07-05 10:30:35,296][19517] Decorrelating experience for 96 frames... [2024-07-05 10:30:35,962][19499] Signal inference workers to stop experience collection... [2024-07-05 10:30:35,968][19513] InferenceWorker_p0-w0: stopping experience collection [2024-07-05 10:30:39,343][19499] Signal inference workers to resume experience collection... [2024-07-05 10:30:39,343][19513] InferenceWorker_p0-w0: resuming experience collection [2024-07-05 10:30:39,993][17621] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5017600. Throughput: 0: 594.4. Samples: 2972. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-07-05 10:30:39,994][17621] Avg episode reward: [(0, '2.544')] [2024-07-05 10:30:43,823][19513] Updated weights for policy 0, policy_version 1234 (0.0105) [2024-07-05 10:30:44,993][17621] Fps is (10 sec: 4915.2, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 5062656. Throughput: 0: 864.6. Samples: 8646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:30:44,994][17621] Avg episode reward: [(0, '15.281')] [2024-07-05 10:30:48,295][17621] Heartbeat connected on Batcher_0 [2024-07-05 10:30:48,306][17621] Heartbeat connected on RolloutWorker_w0 [2024-07-05 10:30:48,309][17621] Heartbeat connected on RolloutWorker_w1 [2024-07-05 10:30:48,312][17621] Heartbeat connected on RolloutWorker_w2 [2024-07-05 10:30:48,313][17621] Heartbeat connected on InferenceWorker_p0-w0 [2024-07-05 10:30:48,315][17621] Heartbeat connected on RolloutWorker_w3 [2024-07-05 10:30:48,318][17621] Heartbeat connected on RolloutWorker_w4 [2024-07-05 10:30:48,321][17621] Heartbeat connected on RolloutWorker_w5 [2024-07-05 10:30:48,326][17621] Heartbeat connected on RolloutWorker_w6 [2024-07-05 10:30:48,327][17621] Heartbeat connected on RolloutWorker_w7 [2024-07-05 10:30:48,452][17621] Heartbeat connected on LearnerWorker_p0 [2024-07-05 10:30:48,454][19513] Updated weights for policy 0, policy_version 1244 (0.0016) [2024-07-05 10:30:49,993][17621] Fps is (10 sec: 9011.1, 60 sec: 6280.4, 300 sec: 6280.4). Total num frames: 5107712. Throughput: 0: 1459.6. Samples: 21894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:30:49,995][17621] Avg episode reward: [(0, '23.001')] [2024-07-05 10:30:53,121][19513] Updated weights for policy 0, policy_version 1254 (0.0016) [2024-07-05 10:30:54,993][17621] Fps is (10 sec: 9011.0, 60 sec: 6963.1, 300 sec: 6963.1). Total num frames: 5152768. Throughput: 0: 1756.3. Samples: 35126. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:30:54,994][17621] Avg episode reward: [(0, '27.101')] [2024-07-05 10:30:57,801][19513] Updated weights for policy 0, policy_version 1264 (0.0017) [2024-07-05 10:30:59,993][17621] Fps is (10 sec: 8601.8, 60 sec: 7209.0, 300 sec: 7209.0). Total num frames: 5193728. Throughput: 0: 1662.2. Samples: 41556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:30:59,994][17621] Avg episode reward: [(0, '30.378')] [2024-07-05 10:31:02,475][19513] Updated weights for policy 0, policy_version 1274 (0.0017) [2024-07-05 10:31:04,993][17621] Fps is (10 sec: 8601.8, 60 sec: 7509.4, 300 sec: 7509.4). Total num frames: 5238784. Throughput: 0: 1826.3. Samples: 54788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:31:04,994][17621] Avg episode reward: [(0, '29.593')] [2024-07-05 10:31:07,112][19513] Updated weights for policy 0, policy_version 1284 (0.0015) [2024-07-05 10:31:09,993][17621] Fps is (10 sec: 9011.2, 60 sec: 7723.9, 300 sec: 7723.9). Total num frames: 5283840. Throughput: 0: 1941.8. Samples: 67964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:31:09,994][17621] Avg episode reward: [(0, '29.305')] [2024-07-05 10:31:11,805][19513] Updated weights for policy 0, policy_version 1294 (0.0014) [2024-07-05 10:31:14,993][17621] Fps is (10 sec: 8601.5, 60 sec: 7782.4, 300 sec: 7782.4). Total num frames: 5324800. Throughput: 0: 1861.8. Samples: 74474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:31:14,994][17621] Avg episode reward: [(0, '29.975')] [2024-07-05 10:31:16,469][19513] Updated weights for policy 0, policy_version 1304 (0.0015) [2024-07-05 10:31:19,993][17621] Fps is (10 sec: 8601.6, 60 sec: 7918.9, 300 sec: 7918.9). Total num frames: 5369856. Throughput: 0: 1947.7. Samples: 87648. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:31:19,994][17621] Avg episode reward: [(0, '28.994')] [2024-07-05 10:31:21,158][19513] Updated weights for policy 0, policy_version 1314 (0.0016) [2024-07-05 10:31:24,993][17621] Fps is (10 sec: 9011.3, 60 sec: 8028.2, 300 sec: 8028.2). Total num frames: 5414912. Throughput: 0: 2173.0. Samples: 100756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:31:24,994][17621] Avg episode reward: [(0, '31.462')] [2024-07-05 10:31:24,997][19499] Saving new best policy, reward=31.462! [2024-07-05 10:31:25,902][19513] Updated weights for policy 0, policy_version 1324 (0.0015) [2024-07-05 10:31:29,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8043.0, 300 sec: 8043.0). Total num frames: 5455872. Throughput: 0: 2189.6. Samples: 107180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:31:29,994][17621] Avg episode reward: [(0, '31.718')] [2024-07-05 10:31:30,143][19499] Saving new best policy, reward=31.718! [2024-07-05 10:31:30,634][19513] Updated weights for policy 0, policy_version 1334 (0.0017) [2024-07-05 10:31:34,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8123.7, 300 sec: 8123.7). Total num frames: 5500928. Throughput: 0: 2187.3. Samples: 120324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:31:34,994][17621] Avg episode reward: [(0, '32.623')] [2024-07-05 10:31:35,323][19499] Saving new best policy, reward=32.623! [2024-07-05 10:31:35,325][19513] Updated weights for policy 0, policy_version 1344 (0.0016) [2024-07-05 10:31:39,993][17621] Fps is (10 sec: 8601.7, 60 sec: 8738.2, 300 sec: 8129.0). Total num frames: 5541888. Throughput: 0: 2182.2. Samples: 133324. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:31:39,994][17621] Avg episode reward: [(0, '30.621')] [2024-07-05 10:31:40,092][19513] Updated weights for policy 0, policy_version 1354 (0.0016) [2024-07-05 10:31:44,774][19513] Updated weights for policy 0, policy_version 1364 (0.0015) [2024-07-05 10:31:44,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8192.0). Total num frames: 5586944. Throughput: 0: 2182.2. Samples: 139754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:31:44,994][17621] Avg episode reward: [(0, '28.763')] [2024-07-05 10:31:49,537][19513] Updated weights for policy 0, policy_version 1374 (0.0015) [2024-07-05 10:31:49,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8192.0). Total num frames: 5627904. Throughput: 0: 2180.1. Samples: 152894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:31:49,994][17621] Avg episode reward: [(0, '29.604')] [2024-07-05 10:31:54,525][19513] Updated weights for policy 0, policy_version 1384 (0.0019) [2024-07-05 10:31:54,993][17621] Fps is (10 sec: 8191.9, 60 sec: 8601.6, 300 sec: 8192.0). Total num frames: 5668864. Throughput: 0: 2161.5. Samples: 165230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:31:54,994][17621] Avg episode reward: [(0, '30.372')] [2024-07-05 10:31:59,509][19513] Updated weights for policy 0, policy_version 1394 (0.0016) [2024-07-05 10:31:59,993][17621] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8192.0). Total num frames: 5709824. Throughput: 0: 2152.9. Samples: 171352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:31:59,994][17621] Avg episode reward: [(0, '30.498')] [2024-07-05 10:32:04,383][19513] Updated weights for policy 0, policy_version 1404 (0.0014) [2024-07-05 10:32:04,993][17621] Fps is (10 sec: 8601.7, 60 sec: 8601.6, 300 sec: 8237.5). Total num frames: 5754880. Throughput: 0: 2138.5. Samples: 183882. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:04,994][17621] Avg episode reward: [(0, '29.607')] [2024-07-05 10:32:09,137][19513] Updated weights for policy 0, policy_version 1414 (0.0014) [2024-07-05 10:32:09,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8533.3, 300 sec: 8235.1). Total num frames: 5795840. Throughput: 0: 2136.6. Samples: 196904. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:09,995][17621] Avg episode reward: [(0, '30.198')] [2024-07-05 10:32:13,865][19513] Updated weights for policy 0, policy_version 1424 (0.0013) [2024-07-05 10:32:14,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8601.6, 300 sec: 8273.9). Total num frames: 5840896. Throughput: 0: 2135.2. Samples: 203266. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:14,995][17621] Avg episode reward: [(0, '31.204')] [2024-07-05 10:32:18,562][19513] Updated weights for policy 0, policy_version 1434 (0.0013) [2024-07-05 10:32:19,993][17621] Fps is (10 sec: 9011.4, 60 sec: 8601.6, 300 sec: 8309.0). Total num frames: 5885952. Throughput: 0: 2136.9. Samples: 216482. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:19,994][17621] Avg episode reward: [(0, '30.875')] [2024-07-05 10:32:23,238][19513] Updated weights for policy 0, policy_version 1444 (0.0013) [2024-07-05 10:32:24,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8303.7). Total num frames: 5926912. Throughput: 0: 2141.5. Samples: 229692. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:24,994][17621] Avg episode reward: [(0, '28.836')] [2024-07-05 10:32:25,079][19499] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001448_5931008.pth... 
[2024-07-05 10:32:25,174][19499] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001222_5005312.pth [2024-07-05 10:32:27,897][19513] Updated weights for policy 0, policy_version 1454 (0.0014) [2024-07-05 10:32:29,994][17621] Fps is (10 sec: 8601.0, 60 sec: 8601.5, 300 sec: 8334.4). Total num frames: 5971968. Throughput: 0: 2141.4. Samples: 236120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:29,995][17621] Avg episode reward: [(0, '29.113')] [2024-07-05 10:32:32,596][19513] Updated weights for policy 0, policy_version 1464 (0.0013) [2024-07-05 10:32:34,993][17621] Fps is (10 sec: 9011.4, 60 sec: 8601.6, 300 sec: 8362.7). Total num frames: 6017024. Throughput: 0: 2140.8. Samples: 249228. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:34,994][17621] Avg episode reward: [(0, '30.351')] [2024-07-05 10:32:37,298][19513] Updated weights for policy 0, policy_version 1474 (0.0013) [2024-07-05 10:32:39,993][17621] Fps is (10 sec: 8602.1, 60 sec: 8601.6, 300 sec: 8355.8). Total num frames: 6057984. Throughput: 0: 2157.4. Samples: 262312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:39,994][17621] Avg episode reward: [(0, '29.981')] [2024-07-05 10:32:42,067][19513] Updated weights for policy 0, policy_version 1484 (0.0014) [2024-07-05 10:32:44,999][17621] Fps is (10 sec: 8596.4, 60 sec: 8600.7, 300 sec: 8380.7). Total num frames: 6103040. Throughput: 0: 2162.9. Samples: 268694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:45,012][17621] Avg episode reward: [(0, '28.907')] [2024-07-05 10:32:46,764][19513] Updated weights for policy 0, policy_version 1494 (0.0014) [2024-07-05 10:32:49,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8374.0). Total num frames: 6144000. Throughput: 0: 2176.8. Samples: 281838. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:49,994][17621] Avg episode reward: [(0, '26.770')] [2024-07-05 10:32:51,448][19513] Updated weights for policy 0, policy_version 1504 (0.0014) [2024-07-05 10:32:54,993][17621] Fps is (10 sec: 8606.9, 60 sec: 8669.9, 300 sec: 8396.8). Total num frames: 6189056. Throughput: 0: 2172.1. Samples: 294650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:32:54,994][17621] Avg episode reward: [(0, '28.208')] [2024-07-05 10:32:56,204][19513] Updated weights for policy 0, policy_version 1514 (0.0014) [2024-07-05 10:32:59,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8389.7). Total num frames: 6230016. Throughput: 0: 2178.7. Samples: 301306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:32:59,994][17621] Avg episode reward: [(0, '30.529')] [2024-07-05 10:33:00,985][19513] Updated weights for policy 0, policy_version 1524 (0.0013) [2024-07-05 10:33:04,993][17621] Fps is (10 sec: 8601.4, 60 sec: 8669.9, 300 sec: 8410.4). Total num frames: 6275072. Throughput: 0: 2166.9. Samples: 313994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:04,994][17621] Avg episode reward: [(0, '32.731')] [2024-07-05 10:33:05,271][19499] Saving new best policy, reward=32.731! [2024-07-05 10:33:05,767][19513] Updated weights for policy 0, policy_version 1534 (0.0014) [2024-07-05 10:33:09,993][17621] Fps is (10 sec: 9011.3, 60 sec: 8738.2, 300 sec: 8429.8). Total num frames: 6320128. Throughput: 0: 2164.4. Samples: 327090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:09,994][17621] Avg episode reward: [(0, '32.906')] [2024-07-05 10:33:09,995][19499] Saving new best policy, reward=32.906! [2024-07-05 10:33:10,511][19513] Updated weights for policy 0, policy_version 1544 (0.0014) [2024-07-05 10:33:14,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8422.4). Total num frames: 6361088. Throughput: 0: 2163.0. Samples: 333454. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:14,995][17621] Avg episode reward: [(0, '30.491')] [2024-07-05 10:33:15,228][19513] Updated weights for policy 0, policy_version 1554 (0.0013) [2024-07-05 10:33:19,993][17621] Fps is (10 sec: 8191.9, 60 sec: 8601.6, 300 sec: 8415.4). Total num frames: 6402048. Throughput: 0: 2161.9. Samples: 346514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:19,994][17621] Avg episode reward: [(0, '28.422')] [2024-07-05 10:33:19,995][19513] Updated weights for policy 0, policy_version 1564 (0.0014) [2024-07-05 10:33:24,760][19513] Updated weights for policy 0, policy_version 1574 (0.0014) [2024-07-05 10:33:24,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8669.9, 300 sec: 8432.9). Total num frames: 6447104. Throughput: 0: 2153.1. Samples: 359202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:24,994][17621] Avg episode reward: [(0, '29.413')] [2024-07-05 10:33:29,580][19513] Updated weights for policy 0, policy_version 1584 (0.0014) [2024-07-05 10:33:29,993][17621] Fps is (10 sec: 8601.7, 60 sec: 8601.7, 300 sec: 8426.1). Total num frames: 6488064. Throughput: 0: 2158.1. Samples: 365796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:29,994][17621] Avg episode reward: [(0, '32.376')] [2024-07-05 10:33:34,410][19513] Updated weights for policy 0, policy_version 1594 (0.0014) [2024-07-05 10:33:34,995][17621] Fps is (10 sec: 8600.3, 60 sec: 8601.4, 300 sec: 8442.2). Total num frames: 6533120. Throughput: 0: 2145.7. Samples: 378398. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:35,008][17621] Avg episode reward: [(0, '31.715')] [2024-07-05 10:33:39,199][19513] Updated weights for policy 0, policy_version 1604 (0.0014) [2024-07-05 10:33:39,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8601.6, 300 sec: 8435.5). Total num frames: 6574080. Throughput: 0: 2146.4. Samples: 391236. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:39,994][17621] Avg episode reward: [(0, '31.934')] [2024-07-05 10:33:43,972][19513] Updated weights for policy 0, policy_version 1614 (0.0014) [2024-07-05 10:33:44,993][17621] Fps is (10 sec: 8603.0, 60 sec: 8602.5, 300 sec: 8450.7). Total num frames: 6619136. Throughput: 0: 2141.7. Samples: 397684. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:44,994][17621] Avg episode reward: [(0, '29.500')] [2024-07-05 10:33:48,764][19513] Updated weights for policy 0, policy_version 1624 (0.0014) [2024-07-05 10:33:49,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8444.1). Total num frames: 6660096. Throughput: 0: 2142.5. Samples: 410406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:49,994][17621] Avg episode reward: [(0, '31.157')] [2024-07-05 10:33:53,594][19513] Updated weights for policy 0, policy_version 1634 (0.0014) [2024-07-05 10:33:54,993][17621] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 8437.8). Total num frames: 6701056. Throughput: 0: 2138.0. Samples: 423302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:54,994][17621] Avg episode reward: [(0, '30.499')] [2024-07-05 10:33:58,371][19513] Updated weights for policy 0, policy_version 1644 (0.0014) [2024-07-05 10:33:59,993][17621] Fps is (10 sec: 8601.7, 60 sec: 8601.6, 300 sec: 8451.7). Total num frames: 6746112. Throughput: 0: 2137.7. Samples: 429652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:33:59,995][17621] Avg episode reward: [(0, '29.661')] [2024-07-05 10:34:03,156][19513] Updated weights for policy 0, policy_version 1654 (0.0014) [2024-07-05 10:34:04,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8533.3, 300 sec: 8445.6). Total num frames: 6787072. Throughput: 0: 2136.5. Samples: 442656. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:34:04,994][17621] Avg episode reward: [(0, '27.723')] [2024-07-05 10:34:07,927][19513] Updated weights for policy 0, policy_version 1664 (0.0014) [2024-07-05 10:34:09,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8458.7). Total num frames: 6832128. Throughput: 0: 2134.1. Samples: 455238. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:34:09,994][17621] Avg episode reward: [(0, '26.168')] [2024-07-05 10:34:12,829][19513] Updated weights for policy 0, policy_version 1674 (0.0016) [2024-07-05 10:34:14,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8452.7). Total num frames: 6873088. Throughput: 0: 2127.0. Samples: 461512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:34:14,994][17621] Avg episode reward: [(0, '25.991')] [2024-07-05 10:34:17,664][19513] Updated weights for policy 0, policy_version 1684 (0.0015) [2024-07-05 10:34:19,993][17621] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 8446.9). Total num frames: 6914048. Throughput: 0: 2133.5. Samples: 474404. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:34:19,994][17621] Avg episode reward: [(0, '26.754')] [2024-07-05 10:34:22,498][19513] Updated weights for policy 0, policy_version 1694 (0.0014) [2024-07-05 10:34:24,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8459.1). Total num frames: 6959104. Throughput: 0: 2128.0. Samples: 486998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:34:24,994][17621] Avg episode reward: [(0, '26.880')] [2024-07-05 10:34:24,998][19499] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001699_6959104.pth... 
[2024-07-05 10:34:25,106][19499] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001224_5013504.pth [2024-07-05 10:34:27,279][19513] Updated weights for policy 0, policy_version 1704 (0.0013) [2024-07-05 10:34:29,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8533.3, 300 sec: 8453.4). Total num frames: 7000064. Throughput: 0: 2127.3. Samples: 493412. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 10:34:29,995][17621] Avg episode reward: [(0, '27.843')] [2024-07-05 10:34:32,068][19513] Updated weights for policy 0, policy_version 1714 (0.0014) [2024-07-05 10:34:34,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8533.6, 300 sec: 8465.1). Total num frames: 7045120. Throughput: 0: 2130.5. Samples: 506280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:34:34,994][17621] Avg episode reward: [(0, '29.866')] [2024-07-05 10:34:36,911][19513] Updated weights for policy 0, policy_version 1724 (0.0015) [2024-07-05 10:34:39,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8459.5). Total num frames: 7086080. Throughput: 0: 2125.0. Samples: 518928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:34:39,995][17621] Avg episode reward: [(0, '30.013')] [2024-07-05 10:34:41,714][19513] Updated weights for policy 0, policy_version 1734 (0.0015) [2024-07-05 10:34:44,993][17621] Fps is (10 sec: 8191.9, 60 sec: 8465.1, 300 sec: 8454.1). Total num frames: 7127040. Throughput: 0: 2128.1. Samples: 525418. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:34:44,994][17621] Avg episode reward: [(0, '29.615')] [2024-07-05 10:34:46,493][19513] Updated weights for policy 0, policy_version 1744 (0.0014) [2024-07-05 10:34:49,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8533.3, 300 sec: 8465.1). Total num frames: 7172096. Throughput: 0: 2124.4. Samples: 538256. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:34:49,995][17621] Avg episode reward: [(0, '28.615')] [2024-07-05 10:34:51,284][19513] Updated weights for policy 0, policy_version 1754 (0.0014) [2024-07-05 10:34:54,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8459.8). Total num frames: 7213056. Throughput: 0: 2135.1. Samples: 551318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:34:54,994][17621] Avg episode reward: [(0, '26.984')] [2024-07-05 10:34:55,984][19513] Updated weights for policy 0, policy_version 1764 (0.0013) [2024-07-05 10:34:59,993][17621] Fps is (10 sec: 8601.8, 60 sec: 8533.3, 300 sec: 8470.2). Total num frames: 7258112. Throughput: 0: 2138.5. Samples: 557744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:34:59,994][17621] Avg episode reward: [(0, '27.164')] [2024-07-05 10:35:00,687][19513] Updated weights for policy 0, policy_version 1774 (0.0013) [2024-07-05 10:35:04,993][17621] Fps is (10 sec: 9011.3, 60 sec: 8601.6, 300 sec: 8480.2). Total num frames: 7303168. Throughput: 0: 2143.7. Samples: 570872. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:35:04,994][17621] Avg episode reward: [(0, '29.269')] [2024-07-05 10:35:05,397][19513] Updated weights for policy 0, policy_version 1784 (0.0013) [2024-07-05 10:35:09,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8533.3, 300 sec: 8475.0). Total num frames: 7344128. Throughput: 0: 2157.2. Samples: 584070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:35:09,994][17621] Avg episode reward: [(0, '28.341')] [2024-07-05 10:35:10,078][19513] Updated weights for policy 0, policy_version 1794 (0.0013) [2024-07-05 10:35:14,767][19513] Updated weights for policy 0, policy_version 1804 (0.0013) [2024-07-05 10:35:14,993][17621] Fps is (10 sec: 8601.7, 60 sec: 8601.6, 300 sec: 8484.6). Total num frames: 7389184. Throughput: 0: 2157.9. Samples: 590518. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:35:14,994][17621] Avg episode reward: [(0, '27.120')] [2024-07-05 10:35:19,464][19513] Updated weights for policy 0, policy_version 1814 (0.0013) [2024-07-05 10:35:19,993][17621] Fps is (10 sec: 9011.3, 60 sec: 8669.9, 300 sec: 8493.8). Total num frames: 7434240. Throughput: 0: 2162.9. Samples: 603612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:35:19,994][17621] Avg episode reward: [(0, '28.006')] [2024-07-05 10:35:24,156][19513] Updated weights for policy 0, policy_version 1824 (0.0013) [2024-07-05 10:35:24,993][17621] Fps is (10 sec: 8601.4, 60 sec: 8601.6, 300 sec: 8488.6). Total num frames: 7475200. Throughput: 0: 2174.5. Samples: 616782. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:35:24,995][17621] Avg episode reward: [(0, '29.148')] [2024-07-05 10:35:28,834][19513] Updated weights for policy 0, policy_version 1834 (0.0013) [2024-07-05 10:35:29,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8497.5). Total num frames: 7520256. Throughput: 0: 2174.8. Samples: 623286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:35:29,994][17621] Avg episode reward: [(0, '28.570')] [2024-07-05 10:35:33,524][19513] Updated weights for policy 0, policy_version 1844 (0.0013) [2024-07-05 10:35:34,993][17621] Fps is (10 sec: 9011.3, 60 sec: 8669.9, 300 sec: 8636.3). Total num frames: 7565312. Throughput: 0: 2180.0. Samples: 636354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:35:34,994][17621] Avg episode reward: [(0, '29.473')] [2024-07-05 10:35:38,228][19513] Updated weights for policy 0, policy_version 1854 (0.0014) [2024-07-05 10:35:39,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8622.4). Total num frames: 7606272. Throughput: 0: 2182.6. Samples: 649534. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:35:39,995][17621] Avg episode reward: [(0, '30.169')] [2024-07-05 10:35:42,886][19513] Updated weights for policy 0, policy_version 1864 (0.0013) [2024-07-05 10:35:44,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8622.4). Total num frames: 7651328. Throughput: 0: 2183.5. Samples: 656000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:35:44,994][17621] Avg episode reward: [(0, '29.035')] [2024-07-05 10:35:47,559][19513] Updated weights for policy 0, policy_version 1874 (0.0013) [2024-07-05 10:35:50,001][17621] Fps is (10 sec: 9005.6, 60 sec: 8737.3, 300 sec: 8622.3). Total num frames: 7696384. Throughput: 0: 2184.6. Samples: 669194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:35:50,004][17621] Avg episode reward: [(0, '29.115')] [2024-07-05 10:35:52,205][19513] Updated weights for policy 0, policy_version 1884 (0.0013) [2024-07-05 10:35:54,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8622.4). Total num frames: 7737344. Throughput: 0: 2186.0. Samples: 682438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:35:54,994][17621] Avg episode reward: [(0, '32.234')] [2024-07-05 10:35:56,883][19513] Updated weights for policy 0, policy_version 1894 (0.0013) [2024-07-05 10:35:59,993][17621] Fps is (10 sec: 8606.8, 60 sec: 8738.1, 300 sec: 8622.4). Total num frames: 7782400. Throughput: 0: 2186.0. Samples: 688890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:35:59,995][17621] Avg episode reward: [(0, '33.914')] [2024-07-05 10:36:00,163][19499] Saving new best policy, reward=33.914! [2024-07-05 10:36:01,551][19513] Updated weights for policy 0, policy_version 1904 (0.0013) [2024-07-05 10:36:04,993][17621] Fps is (10 sec: 9011.3, 60 sec: 8738.1, 300 sec: 8622.4). Total num frames: 7827456. Throughput: 0: 2189.2. Samples: 702128. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:04,994][17621] Avg episode reward: [(0, '34.746')] [2024-07-05 10:36:05,246][19499] Saving new best policy, reward=34.746! [2024-07-05 10:36:06,180][19513] Updated weights for policy 0, policy_version 1914 (0.0013) [2024-07-05 10:36:09,993][17621] Fps is (10 sec: 9011.3, 60 sec: 8806.4, 300 sec: 8636.3). Total num frames: 7872512. Throughput: 0: 2187.1. Samples: 715202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:09,994][17621] Avg episode reward: [(0, '30.502')] [2024-07-05 10:36:10,963][19513] Updated weights for policy 0, policy_version 1924 (0.0014) [2024-07-05 10:36:14,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8738.1, 300 sec: 8622.4). Total num frames: 7913472. Throughput: 0: 2184.5. Samples: 721588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:36:14,994][17621] Avg episode reward: [(0, '28.684')] [2024-07-05 10:36:15,732][19513] Updated weights for policy 0, policy_version 1934 (0.0014) [2024-07-05 10:36:19,993][17621] Fps is (10 sec: 8192.1, 60 sec: 8669.9, 300 sec: 8608.5). Total num frames: 7954432. Throughput: 0: 2183.2. Samples: 734598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:19,994][17621] Avg episode reward: [(0, '30.659')] [2024-07-05 10:36:20,477][19513] Updated weights for policy 0, policy_version 1944 (0.0013) [2024-07-05 10:36:24,993][17621] Fps is (10 sec: 8601.7, 60 sec: 8738.2, 300 sec: 8622.4). Total num frames: 7999488. Throughput: 0: 2176.8. Samples: 747488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:24,994][17621] Avg episode reward: [(0, '31.186')] [2024-07-05 10:36:25,181][19499] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001954_8003584.pth... 
[2024-07-05 10:36:25,182][19513] Updated weights for policy 0, policy_version 1954 (0.0013) [2024-07-05 10:36:25,284][19499] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001448_5931008.pth [2024-07-05 10:36:29,957][19513] Updated weights for policy 0, policy_version 1964 (0.0014) [2024-07-05 10:36:29,993][17621] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8622.4). Total num frames: 8044544. Throughput: 0: 2178.8. Samples: 754044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:29,994][17621] Avg episode reward: [(0, '32.391')] [2024-07-05 10:36:34,661][19513] Updated weights for policy 0, policy_version 1974 (0.0014) [2024-07-05 10:36:34,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8622.4). Total num frames: 8085504. Throughput: 0: 2172.8. Samples: 766958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:34,994][17621] Avg episode reward: [(0, '31.105')] [2024-07-05 10:36:39,345][19513] Updated weights for policy 0, policy_version 1984 (0.0014) [2024-07-05 10:36:39,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8622.4). Total num frames: 8130560. Throughput: 0: 2167.5. Samples: 779974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:39,994][17621] Avg episode reward: [(0, '30.532')] [2024-07-05 10:36:44,137][19513] Updated weights for policy 0, policy_version 1994 (0.0014) [2024-07-05 10:36:44,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8622.4). Total num frames: 8171520. Throughput: 0: 2165.6. Samples: 786344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:44,994][17621] Avg episode reward: [(0, '29.444')] [2024-07-05 10:36:48,971][19513] Updated weights for policy 0, policy_version 2004 (0.0014) [2024-07-05 10:36:49,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8670.7, 300 sec: 8636.3). Total num frames: 8216576. Throughput: 0: 2158.1. Samples: 799242. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:49,994][17621] Avg episode reward: [(0, '30.250')] [2024-07-05 10:36:53,647][19513] Updated weights for policy 0, policy_version 2014 (0.0013) [2024-07-05 10:36:54,993][17621] Fps is (10 sec: 8601.5, 60 sec: 8669.9, 300 sec: 8636.3). Total num frames: 8257536. Throughput: 0: 2158.6. Samples: 812340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:54,994][17621] Avg episode reward: [(0, '31.023')] [2024-07-05 10:36:58,327][19513] Updated weights for policy 0, policy_version 2024 (0.0013) [2024-07-05 10:36:59,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8636.3). Total num frames: 8302592. Throughput: 0: 2160.8. Samples: 818822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:36:59,994][17621] Avg episode reward: [(0, '31.928')] [2024-07-05 10:37:02,975][19513] Updated weights for policy 0, policy_version 2034 (0.0013) [2024-07-05 10:37:04,993][17621] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8650.2). Total num frames: 8347648. Throughput: 0: 2165.2. Samples: 832034. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:37:04,994][17621] Avg episode reward: [(0, '32.171')] [2024-07-05 10:37:07,648][19513] Updated weights for policy 0, policy_version 2044 (0.0013) [2024-07-05 10:37:09,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8636.3). Total num frames: 8388608. Throughput: 0: 2171.8. Samples: 845220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:37:09,994][17621] Avg episode reward: [(0, '31.608')] [2024-07-05 10:37:12,349][19513] Updated weights for policy 0, policy_version 2054 (0.0013) [2024-07-05 10:37:14,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8636.3). Total num frames: 8433664. Throughput: 0: 2169.5. Samples: 851672. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:37:14,994][17621] Avg episode reward: [(0, '31.721')] [2024-07-05 10:37:17,024][19513] Updated weights for policy 0, policy_version 2064 (0.0013) [2024-07-05 10:37:19,993][17621] Fps is (10 sec: 9011.3, 60 sec: 8738.1, 300 sec: 8650.2). Total num frames: 8478720. Throughput: 0: 2175.7. Samples: 864866. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:37:19,994][17621] Avg episode reward: [(0, '30.887')] [2024-07-05 10:37:21,672][19513] Updated weights for policy 0, policy_version 2074 (0.0013) [2024-07-05 10:37:24,993][17621] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8650.2). Total num frames: 8523776. Throughput: 0: 2179.2. Samples: 878040. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:37:24,994][17621] Avg episode reward: [(0, '29.373')] [2024-07-05 10:37:26,364][19513] Updated weights for policy 0, policy_version 2084 (0.0013) [2024-07-05 10:37:29,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8636.3). Total num frames: 8564736. Throughput: 0: 2180.5. Samples: 884468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:37:29,994][17621] Avg episode reward: [(0, '28.091')] [2024-07-05 10:37:31,062][19513] Updated weights for policy 0, policy_version 2094 (0.0013) [2024-07-05 10:37:34,993][17621] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8650.2). Total num frames: 8609792. Throughput: 0: 2188.0. Samples: 897700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 10:37:34,994][17621] Avg episode reward: [(0, '28.210')] [2024-07-05 10:37:35,402][19513] Updated weights for policy 0, policy_version 2104 (0.0013) [2024-07-05 10:37:38,916][19513] Updated weights for policy 0, policy_version 2114 (0.0011) [2024-07-05 10:37:39,993][17621] Fps is (10 sec: 10649.6, 60 sec: 9011.2, 300 sec: 8705.9). Total num frames: 8671232. Throughput: 0: 2273.9. Samples: 914664. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:37:39,994][17621] Avg episode reward: [(0, '29.955')] [2024-07-05 10:37:42,437][19513] Updated weights for policy 0, policy_version 2124 (0.0012) [2024-07-05 10:37:44,993][17621] Fps is (10 sec: 11878.4, 60 sec: 9284.3, 300 sec: 8761.3). Total num frames: 8728576. Throughput: 0: 2318.6. Samples: 923160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:37:44,994][17621] Avg episode reward: [(0, '32.427')] [2024-07-05 10:37:45,930][19513] Updated weights for policy 0, policy_version 2134 (0.0011) [2024-07-05 10:37:49,470][19513] Updated weights for policy 0, policy_version 2144 (0.0012) [2024-07-05 10:37:49,993][17621] Fps is (10 sec: 11468.8, 60 sec: 9489.1, 300 sec: 8802.9). Total num frames: 8785920. Throughput: 0: 2412.4. Samples: 940590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:37:49,994][17621] Avg episode reward: [(0, '31.619')] [2024-07-05 10:37:52,980][19513] Updated weights for policy 0, policy_version 2154 (0.0012) [2024-07-05 10:37:54,993][17621] Fps is (10 sec: 11468.8, 60 sec: 9762.2, 300 sec: 8858.5). Total num frames: 8843264. Throughput: 0: 2506.8. Samples: 958026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:37:54,994][17621] Avg episode reward: [(0, '31.156')] [2024-07-05 10:37:56,509][19513] Updated weights for policy 0, policy_version 2164 (0.0012) [2024-07-05 10:37:59,993][17621] Fps is (10 sec: 11468.8, 60 sec: 9967.0, 300 sec: 8900.1). Total num frames: 8900608. Throughput: 0: 2562.2. Samples: 966972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:37:59,994][17621] Avg episode reward: [(0, '30.361')] [2024-07-05 10:38:00,055][19513] Updated weights for policy 0, policy_version 2174 (0.0012) [2024-07-05 10:38:03,623][19513] Updated weights for policy 0, policy_version 2184 (0.0012) [2024-07-05 10:38:04,993][17621] Fps is (10 sec: 11468.6, 60 sec: 10171.7, 300 sec: 8941.8). Total num frames: 8957952. Throughput: 0: 2652.3. 
Samples: 984220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:04,994][17621] Avg episode reward: [(0, '30.532')] [2024-07-05 10:38:07,205][19513] Updated weights for policy 0, policy_version 2194 (0.0012) [2024-07-05 10:38:09,993][17621] Fps is (10 sec: 11468.7, 60 sec: 10444.8, 300 sec: 8997.3). Total num frames: 9015296. Throughput: 0: 2741.5. Samples: 1001408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:09,994][17621] Avg episode reward: [(0, '32.016')] [2024-07-05 10:38:10,731][19513] Updated weights for policy 0, policy_version 2204 (0.0012) [2024-07-05 10:38:14,241][19513] Updated weights for policy 0, policy_version 2214 (0.0011) [2024-07-05 10:38:14,993][17621] Fps is (10 sec: 11878.6, 60 sec: 10717.9, 300 sec: 9066.7). Total num frames: 9076736. Throughput: 0: 2792.2. Samples: 1010116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:14,994][17621] Avg episode reward: [(0, '32.414')] [2024-07-05 10:38:17,747][19513] Updated weights for policy 0, policy_version 2224 (0.0012) [2024-07-05 10:38:19,993][17621] Fps is (10 sec: 11878.3, 60 sec: 10922.6, 300 sec: 9108.4). Total num frames: 9134080. Throughput: 0: 2886.1. Samples: 1027576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:19,995][17621] Avg episode reward: [(0, '32.555')] [2024-07-05 10:38:21,268][19513] Updated weights for policy 0, policy_version 2234 (0.0012) [2024-07-05 10:38:24,784][19513] Updated weights for policy 0, policy_version 2244 (0.0012) [2024-07-05 10:38:24,993][17621] Fps is (10 sec: 11468.9, 60 sec: 11127.5, 300 sec: 9163.9). Total num frames: 9191424. Throughput: 0: 2897.4. Samples: 1045048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:24,994][17621] Avg episode reward: [(0, '31.650')] [2024-07-05 10:38:25,138][19499] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002245_9195520.pth... 
[2024-07-05 10:38:25,213][19499] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001699_6959104.pth [2024-07-05 10:38:28,339][19513] Updated weights for policy 0, policy_version 2254 (0.0012) [2024-07-05 10:38:29,993][17621] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 9205.6). Total num frames: 9248768. Throughput: 0: 2905.4. Samples: 1053904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:29,995][17621] Avg episode reward: [(0, '30.924')] [2024-07-05 10:38:31,876][19513] Updated weights for policy 0, policy_version 2264 (0.0012) [2024-07-05 10:38:34,993][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 9261.1). Total num frames: 9306112. Throughput: 0: 2897.9. Samples: 1070994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:34,994][17621] Avg episode reward: [(0, '29.626')] [2024-07-05 10:38:35,538][19513] Updated weights for policy 0, policy_version 2274 (0.0012) [2024-07-05 10:38:39,212][19513] Updated weights for policy 0, policy_version 2284 (0.0013) [2024-07-05 10:38:39,993][17621] Fps is (10 sec: 11468.9, 60 sec: 11537.1, 300 sec: 9302.8). Total num frames: 9363456. Throughput: 0: 2882.4. Samples: 1087732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:39,994][17621] Avg episode reward: [(0, '30.748')] [2024-07-05 10:38:42,884][19513] Updated weights for policy 0, policy_version 2294 (0.0013) [2024-07-05 10:38:44,993][17621] Fps is (10 sec: 11059.2, 60 sec: 11468.8, 300 sec: 9344.4). Total num frames: 9416704. Throughput: 0: 2868.5. Samples: 1096054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:44,994][17621] Avg episode reward: [(0, '30.291')] [2024-07-05 10:38:46,573][19513] Updated weights for policy 0, policy_version 2304 (0.0012) [2024-07-05 10:38:49,993][17621] Fps is (10 sec: 11059.3, 60 sec: 11468.8, 300 sec: 9400.0). Total num frames: 9474048. Throughput: 0: 2853.2. Samples: 1112612. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:49,994][17621] Avg episode reward: [(0, '34.336')] [2024-07-05 10:38:50,237][19513] Updated weights for policy 0, policy_version 2314 (0.0012) [2024-07-05 10:38:53,885][19513] Updated weights for policy 0, policy_version 2324 (0.0013) [2024-07-05 10:38:54,993][17621] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 9441.6). Total num frames: 9531392. Throughput: 0: 2850.1. Samples: 1129664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:54,994][17621] Avg episode reward: [(0, '31.355')] [2024-07-05 10:38:57,566][19513] Updated weights for policy 0, policy_version 2334 (0.0012) [2024-07-05 10:38:59,993][17621] Fps is (10 sec: 11059.1, 60 sec: 11400.5, 300 sec: 9483.3). Total num frames: 9584640. Throughput: 0: 2840.4. Samples: 1137934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:38:59,994][17621] Avg episode reward: [(0, '33.801')] [2024-07-05 10:39:01,230][19513] Updated weights for policy 0, policy_version 2344 (0.0013) [2024-07-05 10:39:04,901][19513] Updated weights for policy 0, policy_version 2354 (0.0012) [2024-07-05 10:39:04,993][17621] Fps is (10 sec: 11059.1, 60 sec: 11400.5, 300 sec: 9524.9). Total num frames: 9641984. Throughput: 0: 2821.2. Samples: 1154532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:39:04,994][17621] Avg episode reward: [(0, '31.694')] [2024-07-05 10:39:08,588][19513] Updated weights for policy 0, policy_version 2364 (0.0013) [2024-07-05 10:39:09,993][17621] Fps is (10 sec: 11059.3, 60 sec: 11332.3, 300 sec: 9566.6). Total num frames: 9695232. Throughput: 0: 2805.2. Samples: 1171280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:39:09,994][17621] Avg episode reward: [(0, '34.175')] [2024-07-05 10:39:12,262][19513] Updated weights for policy 0, policy_version 2374 (0.0012) [2024-07-05 10:39:14,993][17621] Fps is (10 sec: 11059.3, 60 sec: 11264.0, 300 sec: 9622.1). Total num frames: 9752576. 
Throughput: 0: 2797.8. Samples: 1179804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:39:14,994][17621] Avg episode reward: [(0, '34.321')] [2024-07-05 10:39:15,931][19513] Updated weights for policy 0, policy_version 2384 (0.0012) [2024-07-05 10:39:19,581][19513] Updated weights for policy 0, policy_version 2394 (0.0013) [2024-07-05 10:39:19,993][17621] Fps is (10 sec: 11468.9, 60 sec: 11264.0, 300 sec: 9663.8). Total num frames: 9809920. Throughput: 0: 2787.5. Samples: 1196430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:39:19,994][17621] Avg episode reward: [(0, '31.875')] [2024-07-05 10:39:23,240][19513] Updated weights for policy 0, policy_version 2404 (0.0013) [2024-07-05 10:39:24,993][17621] Fps is (10 sec: 11059.2, 60 sec: 11195.7, 300 sec: 9705.4). Total num frames: 9863168. Throughput: 0: 2787.5. Samples: 1213168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:39:24,994][17621] Avg episode reward: [(0, '31.454')] [2024-07-05 10:39:26,939][19513] Updated weights for policy 0, policy_version 2414 (0.0012) [2024-07-05 10:39:29,993][17621] Fps is (10 sec: 11059.0, 60 sec: 11195.7, 300 sec: 9747.1). Total num frames: 9920512. Throughput: 0: 2790.3. Samples: 1221618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:39:29,994][17621] Avg episode reward: [(0, '32.980')] [2024-07-05 10:39:30,609][19513] Updated weights for policy 0, policy_version 2424 (0.0012) [2024-07-05 10:39:34,284][19513] Updated weights for policy 0, policy_version 2434 (0.0012) [2024-07-05 10:39:34,993][17621] Fps is (10 sec: 11059.3, 60 sec: 11127.5, 300 sec: 9788.8). Total num frames: 9973760. Throughput: 0: 2794.3. Samples: 1238354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:39:34,994][17621] Avg episode reward: [(0, '32.667')] [2024-07-05 10:39:37,586][19499] Stopping Batcher_0... [2024-07-05 10:39:37,586][19499] Loop batcher_evt_loop terminating... [2024-07-05 10:39:37,586][17621] Component Batcher_0 stopped! 
[2024-07-05 10:39:37,587][19499] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-07-05 10:39:37,595][19518] Stopping RolloutWorker_w5... [2024-07-05 10:39:37,595][19520] Stopping RolloutWorker_w7... [2024-07-05 10:39:37,596][19516] Stopping RolloutWorker_w2... [2024-07-05 10:39:37,596][19514] Stopping RolloutWorker_w1... [2024-07-05 10:39:37,596][19517] Stopping RolloutWorker_w4... [2024-07-05 10:39:37,596][19518] Loop rollout_proc5_evt_loop terminating... [2024-07-05 10:39:37,596][19520] Loop rollout_proc7_evt_loop terminating... [2024-07-05 10:39:37,596][19514] Loop rollout_proc1_evt_loop terminating... [2024-07-05 10:39:37,596][19516] Loop rollout_proc2_evt_loop terminating... [2024-07-05 10:39:37,596][19517] Loop rollout_proc4_evt_loop terminating... [2024-07-05 10:39:37,596][19515] Stopping RolloutWorker_w3... [2024-07-05 10:39:37,596][19512] Stopping RolloutWorker_w0... [2024-07-05 10:39:37,596][19519] Stopping RolloutWorker_w6... [2024-07-05 10:39:37,596][17621] Component RolloutWorker_w5 stopped! [2024-07-05 10:39:37,597][19512] Loop rollout_proc0_evt_loop terminating... [2024-07-05 10:39:37,597][19515] Loop rollout_proc3_evt_loop terminating... [2024-07-05 10:39:37,597][19519] Loop rollout_proc6_evt_loop terminating... [2024-07-05 10:39:37,597][17621] Component RolloutWorker_w7 stopped! [2024-07-05 10:39:37,598][17621] Component RolloutWorker_w2 stopped! [2024-07-05 10:39:37,599][17621] Component RolloutWorker_w1 stopped! [2024-07-05 10:39:37,600][17621] Component RolloutWorker_w4 stopped! [2024-07-05 10:39:37,601][17621] Component RolloutWorker_w3 stopped! [2024-07-05 10:39:37,605][17621] Component RolloutWorker_w0 stopped! [2024-07-05 10:39:37,606][17621] Component RolloutWorker_w6 stopped! [2024-07-05 10:39:37,619][19513] Weights refcount: 2 0 [2024-07-05 10:39:37,622][19513] Stopping InferenceWorker_p0-w0... 
[2024-07-05 10:39:37,622][19513] Loop inference_proc0-0_evt_loop terminating... [2024-07-05 10:39:37,622][17621] Component InferenceWorker_p0-w0 stopped! [2024-07-05 10:39:37,689][19499] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000001954_8003584.pth [2024-07-05 10:39:37,703][19499] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-07-05 10:39:37,832][19499] Stopping LearnerWorker_p0... [2024-07-05 10:39:37,832][19499] Loop learner_proc0_evt_loop terminating... [2024-07-05 10:39:37,832][17621] Component LearnerWorker_p0 stopped! [2024-07-05 10:39:37,833][17621] Waiting for process learner_proc0 to stop... [2024-07-05 10:39:38,874][17621] Waiting for process inference_proc0-0 to join... [2024-07-05 10:39:38,875][17621] Waiting for process rollout_proc0 to join... [2024-07-05 10:39:38,876][17621] Waiting for process rollout_proc1 to join... [2024-07-05 10:39:38,877][17621] Waiting for process rollout_proc2 to join... [2024-07-05 10:39:38,877][17621] Waiting for process rollout_proc3 to join... [2024-07-05 10:39:38,878][17621] Waiting for process rollout_proc4 to join... [2024-07-05 10:39:38,878][17621] Waiting for process rollout_proc5 to join... [2024-07-05 10:39:38,879][17621] Waiting for process rollout_proc6 to join... [2024-07-05 10:39:38,879][17621] Waiting for process rollout_proc7 to join... 
[2024-07-05 10:39:38,880][17621] Batcher 0 profile tree view:
batching: 7.8226, releasing_batches: 0.0301
[2024-07-05 10:39:38,880][17621] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 3.3309
update_model: 3.8678
  weight_update: 0.0012
one_step: 0.0029
  handle_policy_step: 523.9220
    deserialize: 7.3432, stack: 1.1157, obs_to_device_normalize: 87.8864, forward: 310.3393, send_messages: 10.3367
    prepare_outputs: 98.4612
      to_cpu: 88.9070
[2024-07-05 10:39:38,881][17621] Learner 0 profile tree view:
misc: 0.0053, prepare_batch: 18.7216
train: 415.7077
  epoch_init: 0.0536, minibatch_init: 0.0575, losses_postprocess: 0.4229, kl_divergence: 0.2211, after_optimizer: 1.9888
  calculate_losses: 139.0886
    losses_init: 0.0246, forward_head: 12.1004, bptt_initial: 123.7953, tail: 0.5311, advantages_returns: 0.1292, losses: 1.3283
    bptt: 0.8529
      bptt_forward_core: 0.8145
  update: 272.4793
    clip: 1.0972
[2024-07-05 10:39:38,881][17621] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1033, enqueue_policy_requests: 7.1716, env_step: 106.9268, overhead: 9.2646, complete_rollouts: 0.1950
save_policy_outputs: 9.6318
  split_output_tensors: 4.4998
[2024-07-05 10:39:38,881][17621] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1014, enqueue_policy_requests: 7.3546, env_step: 106.6730, overhead: 9.3844, complete_rollouts: 0.1968
save_policy_outputs: 9.7707
  split_output_tensors: 4.5739
[2024-07-05 10:39:38,882][17621] Loop Runner_EvtLoop terminating...
[2024-07-05 10:39:38,882][17621] Runner profile tree view: main_loop: 550.5557 [2024-07-05 10:39:38,883][17621] Collected {0: 10006528}, FPS: 9069.1 [2024-07-05 10:41:06,918][17621] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json [2024-07-05 10:41:06,919][17621] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-05 10:41:06,920][17621] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-05 10:41:06,920][17621] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-05 10:41:06,921][17621] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 10:41:06,921][17621] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-05 10:41:06,921][17621] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 10:41:06,921][17621] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-05 10:41:06,922][17621] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-05 10:41:06,922][17621] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-05 10:41:06,922][17621] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-05 10:41:06,923][17621] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-05 10:41:06,923][17621] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-05 10:41:06,923][17621] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-07-05 10:41:06,924][17621] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-05 10:41:06,941][17621] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:41:06,942][17621] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 10:41:06,943][17621] RunningMeanStd input shape: (1,) [2024-07-05 10:41:06,951][17621] Num input channels: 3 [2024-07-05 10:41:06,960][17621] Convolutional layer output size: 4608 [2024-07-05 10:41:06,973][17621] Policy head output size: 512 [2024-07-05 10:41:08,681][17621] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-07-05 10:41:09,496][17621] Num frames 100... [2024-07-05 10:41:09,572][17621] Num frames 200... [2024-07-05 10:41:09,655][17621] Num frames 300... [2024-07-05 10:41:09,736][17621] Num frames 400... [2024-07-05 10:41:09,814][17621] Num frames 500... [2024-07-05 10:41:09,896][17621] Num frames 600... [2024-07-05 10:41:10,005][17621] Avg episode rewards: #0: 14.720, true rewards: #0: 6.720 [2024-07-05 10:41:10,006][17621] Avg episode reward: 14.720, avg true_objective: 6.720 [2024-07-05 10:41:10,034][17621] Num frames 700... [2024-07-05 10:41:10,108][17621] Num frames 800... [2024-07-05 10:41:10,184][17621] Num frames 900... [2024-07-05 10:41:10,267][17621] Num frames 1000... [2024-07-05 10:41:10,342][17621] Num frames 1100... [2024-07-05 10:41:10,415][17621] Num frames 1200... [2024-07-05 10:41:10,491][17621] Num frames 1300... [2024-07-05 10:41:10,565][17621] Num frames 1400... [2024-07-05 10:41:10,637][17621] Num frames 1500... [2024-07-05 10:41:10,712][17621] Num frames 1600... [2024-07-05 10:41:10,787][17621] Num frames 1700... [2024-07-05 10:41:10,869][17621] Num frames 1800... 
[2024-07-05 10:41:10,940][17621] Avg episode rewards: #0: 21.120, true rewards: #0: 9.120 [2024-07-05 10:41:10,941][17621] Avg episode reward: 21.120, avg true_objective: 9.120 [2024-07-05 10:41:11,003][17621] Num frames 1900... [2024-07-05 10:41:11,074][17621] Num frames 2000... [2024-07-05 10:41:11,147][17621] Num frames 2100... [2024-07-05 10:41:11,220][17621] Num frames 2200... [2024-07-05 10:41:11,297][17621] Num frames 2300... [2024-07-05 10:41:11,372][17621] Num frames 2400... [2024-07-05 10:41:11,445][17621] Num frames 2500... [2024-07-05 10:41:11,522][17621] Num frames 2600... [2024-07-05 10:41:11,601][17621] Num frames 2700... [2024-07-05 10:41:11,675][17621] Num frames 2800... [2024-07-05 10:41:11,753][17621] Num frames 2900... [2024-07-05 10:41:11,826][17621] Num frames 3000... [2024-07-05 10:41:11,908][17621] Num frames 3100... [2024-07-05 10:41:11,983][17621] Num frames 3200... [2024-07-05 10:41:12,058][17621] Num frames 3300... [2024-07-05 10:41:12,137][17621] Num frames 3400... [2024-07-05 10:41:12,215][17621] Num frames 3500... [2024-07-05 10:41:12,291][17621] Num frames 3600... [2024-07-05 10:41:12,371][17621] Num frames 3700... [2024-07-05 10:41:12,446][17621] Num frames 3800... [2024-07-05 10:41:12,523][17621] Num frames 3900... [2024-07-05 10:41:12,595][17621] Avg episode rewards: #0: 32.746, true rewards: #0: 13.080 [2024-07-05 10:41:12,596][17621] Avg episode reward: 32.746, avg true_objective: 13.080 [2024-07-05 10:41:12,658][17621] Num frames 4000... [2024-07-05 10:41:12,737][17621] Num frames 4100... [2024-07-05 10:41:12,814][17621] Num frames 4200... [2024-07-05 10:41:12,904][17621] Num frames 4300... [2024-07-05 10:41:12,985][17621] Num frames 4400... [2024-07-05 10:41:13,070][17621] Num frames 4500... [2024-07-05 10:41:13,149][17621] Avg episode rewards: #0: 27.580, true rewards: #0: 11.330 [2024-07-05 10:41:13,150][17621] Avg episode reward: 27.580, avg true_objective: 11.330 [2024-07-05 10:41:13,205][17621] Num frames 4600... 
[2024-07-05 10:41:13,283][17621] Num frames 4700... [2024-07-05 10:41:13,361][17621] Num frames 4800... [2024-07-05 10:41:13,437][17621] Num frames 4900... [2024-07-05 10:41:13,511][17621] Num frames 5000... [2024-07-05 10:41:13,587][17621] Num frames 5100... [2024-07-05 10:41:13,659][17621] Num frames 5200... [2024-07-05 10:41:13,730][17621] Num frames 5300... [2024-07-05 10:41:13,804][17621] Num frames 5400... [2024-07-05 10:41:13,879][17621] Avg episode rewards: #0: 27.456, true rewards: #0: 10.856 [2024-07-05 10:41:13,880][17621] Avg episode reward: 27.456, avg true_objective: 10.856 [2024-07-05 10:41:13,939][17621] Num frames 5500... [2024-07-05 10:41:14,010][17621] Num frames 5600... [2024-07-05 10:41:14,083][17621] Num frames 5700... [2024-07-05 10:41:14,156][17621] Num frames 5800... [2024-07-05 10:41:14,232][17621] Num frames 5900... [2024-07-05 10:41:14,307][17621] Num frames 6000... [2024-07-05 10:41:14,383][17621] Num frames 6100... [2024-07-05 10:41:14,459][17621] Num frames 6200... [2024-07-05 10:41:14,534][17621] Num frames 6300... [2024-07-05 10:41:14,607][17621] Num frames 6400... [2024-07-05 10:41:14,685][17621] Num frames 6500... [2024-07-05 10:41:14,780][17621] Num frames 6600... [2024-07-05 10:41:14,861][17621] Num frames 6700... [2024-07-05 10:41:14,938][17621] Num frames 6800... [2024-07-05 10:41:15,018][17621] Num frames 6900... [2024-07-05 10:41:15,098][17621] Num frames 7000... [2024-07-05 10:41:15,177][17621] Num frames 7100... [2024-07-05 10:41:15,252][17621] Num frames 7200... [2024-07-05 10:41:15,334][17621] Num frames 7300... [2024-07-05 10:41:15,414][17621] Num frames 7400... [2024-07-05 10:41:15,494][17621] Num frames 7500... [2024-07-05 10:41:15,571][17621] Avg episode rewards: #0: 32.213, true rewards: #0: 12.547 [2024-07-05 10:41:15,572][17621] Avg episode reward: 32.213, avg true_objective: 12.547 [2024-07-05 10:41:15,631][17621] Num frames 7600... [2024-07-05 10:41:15,708][17621] Num frames 7700... 
[2024-07-05 10:41:15,787][17621] Num frames 7800... [2024-07-05 10:41:15,864][17621] Num frames 7900... [2024-07-05 10:41:15,945][17621] Num frames 8000... [2024-07-05 10:41:16,030][17621] Num frames 8100... [2024-07-05 10:41:16,108][17621] Num frames 8200... [2024-07-05 10:41:16,187][17621] Num frames 8300... [2024-07-05 10:41:16,271][17621] Num frames 8400... [2024-07-05 10:41:16,362][17621] Num frames 8500... [2024-07-05 10:41:16,443][17621] Num frames 8600... [2024-07-05 10:41:16,519][17621] Num frames 8700... [2024-07-05 10:41:16,598][17621] Num frames 8800... [2024-07-05 10:41:16,672][17621] Num frames 8900... [2024-07-05 10:41:16,749][17621] Num frames 9000... [2024-07-05 10:41:16,827][17621] Num frames 9100... [2024-07-05 10:41:16,905][17621] Num frames 9200... [2024-07-05 10:41:16,998][17621] Avg episode rewards: #0: 33.213, true rewards: #0: 13.213 [2024-07-05 10:41:16,999][17621] Avg episode reward: 33.213, avg true_objective: 13.213 [2024-07-05 10:41:17,047][17621] Num frames 9300... [2024-07-05 10:41:17,126][17621] Num frames 9400... [2024-07-05 10:41:17,207][17621] Num frames 9500... [2024-07-05 10:41:17,288][17621] Num frames 9600... [2024-07-05 10:41:17,368][17621] Num frames 9700... [2024-07-05 10:41:17,446][17621] Num frames 9800... [2024-07-05 10:41:17,524][17621] Num frames 9900... [2024-07-05 10:41:17,607][17621] Num frames 10000... [2024-07-05 10:41:17,688][17621] Num frames 10100... [2024-07-05 10:41:17,770][17621] Num frames 10200... [2024-07-05 10:41:17,854][17621] Num frames 10300... [2024-07-05 10:41:17,934][17621] Num frames 10400... [2024-07-05 10:41:18,017][17621] Avg episode rewards: #0: 32.162, true rewards: #0: 13.038 [2024-07-05 10:41:18,018][17621] Avg episode reward: 32.162, avg true_objective: 13.038 [2024-07-05 10:41:18,081][17621] Num frames 10500... [2024-07-05 10:41:18,165][17621] Num frames 10600... [2024-07-05 10:41:18,244][17621] Num frames 10700... [2024-07-05 10:41:18,326][17621] Num frames 10800... 
[2024-07-05 10:41:18,411][17621] Num frames 10900... [2024-07-05 10:41:18,496][17621] Num frames 11000... [2024-07-05 10:41:18,583][17621] Num frames 11100... [2024-07-05 10:41:18,662][17621] Num frames 11200... [2024-07-05 10:41:18,745][17621] Num frames 11300... [2024-07-05 10:41:18,827][17621] Num frames 11400... [2024-07-05 10:41:18,900][17621] Avg episode rewards: #0: 30.802, true rewards: #0: 12.691 [2024-07-05 10:41:18,902][17621] Avg episode reward: 30.802, avg true_objective: 12.691 [2024-07-05 10:41:18,970][17621] Num frames 11500... [2024-07-05 10:41:19,051][17621] Num frames 11600... [2024-07-05 10:41:19,131][17621] Num frames 11700... [2024-07-05 10:41:19,213][17621] Num frames 11800... [2024-07-05 10:41:19,296][17621] Num frames 11900... [2024-07-05 10:41:19,380][17621] Num frames 12000... [2024-07-05 10:41:19,463][17621] Num frames 12100... [2024-07-05 10:41:19,546][17621] Num frames 12200... [2024-07-05 10:41:19,627][17621] Num frames 12300... [2024-07-05 10:41:19,709][17621] Num frames 12400... [2024-07-05 10:41:19,795][17621] Num frames 12500... [2024-07-05 10:41:19,872][17621] Num frames 12600... [2024-07-05 10:41:19,962][17621] Num frames 12700... [2024-07-05 10:41:20,041][17621] Num frames 12800... [2024-07-05 10:41:20,124][17621] Num frames 12900... [2024-07-05 10:41:20,204][17621] Num frames 13000... [2024-07-05 10:41:20,284][17621] Num frames 13100... [2024-07-05 10:41:20,365][17621] Num frames 13200... [2024-07-05 10:41:20,443][17621] Num frames 13300... [2024-07-05 10:41:20,527][17621] Num frames 13400... [2024-07-05 10:41:20,613][17621] Num frames 13500... [2024-07-05 10:41:20,691][17621] Avg episode rewards: #0: 32.822, true rewards: #0: 13.522 [2024-07-05 10:41:20,692][17621] Avg episode reward: 32.822, avg true_objective: 13.522 [2024-07-05 10:41:35,312][17621] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/replay.mp4! 
[2024-07-05 10:44:13,506][17621] Environment doom_basic already registered, overwriting... [2024-07-05 10:44:13,508][17621] Environment doom_two_colors_easy already registered, overwriting... [2024-07-05 10:44:13,509][17621] Environment doom_two_colors_hard already registered, overwriting... [2024-07-05 10:44:13,509][17621] Environment doom_dm already registered, overwriting... [2024-07-05 10:44:13,509][17621] Environment doom_dwango5 already registered, overwriting... [2024-07-05 10:44:13,509][17621] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-07-05 10:44:13,510][17621] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-07-05 10:44:13,510][17621] Environment doom_my_way_home already registered, overwriting... [2024-07-05 10:44:13,510][17621] Environment doom_deadly_corridor already registered, overwriting... [2024-07-05 10:44:13,510][17621] Environment doom_defend_the_center already registered, overwriting... [2024-07-05 10:44:13,511][17621] Environment doom_defend_the_line already registered, overwriting... [2024-07-05 10:44:13,511][17621] Environment doom_health_gathering already registered, overwriting... [2024-07-05 10:44:13,511][17621] Environment doom_health_gathering_supreme already registered, overwriting... [2024-07-05 10:44:13,512][17621] Environment doom_battle already registered, overwriting... [2024-07-05 10:44:13,512][17621] Environment doom_battle2 already registered, overwriting... [2024-07-05 10:44:13,512][17621] Environment doom_duel_bots already registered, overwriting... [2024-07-05 10:44:13,512][17621] Environment doom_deathmatch_bots already registered, overwriting... [2024-07-05 10:44:13,513][17621] Environment doom_duel already registered, overwriting... [2024-07-05 10:44:13,513][17621] Environment doom_deathmatch_full already registered, overwriting... [2024-07-05 10:44:13,513][17621] Environment doom_benchmark already registered, overwriting... 
[2024-07-05 10:44:13,513][17621] register_encoder_factory: [2024-07-05 10:44:13,519][17621] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json [2024-07-05 10:44:13,519][17621] Overriding arg 'train_for_env_steps' with value 20000000 passed from command line [2024-07-05 10:44:13,523][17621] Experiment dir /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet already exists! [2024-07-05 10:44:13,524][17621] Resuming existing experiment from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet... [2024-07-05 10:44:13,524][17621] Weights and Biases integration disabled [2024-07-05 10:44:13,525][17621] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-07-05 10:44:15,828][17621] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=conv_resnet train_dir=/home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir restart_behavior=resume device=gpu seed=200 num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 
decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=20000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=resnet_impala encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --experiment=conv_resnet --seed=200 --num_workers=8 --num_envs_per_worker=4 --batch_size=1024 --encoder_conv_architecture=resnet_impala --train_for_env_steps=5000000 cli_args={'env': 'doom_health_gathering_supreme', 'experiment': 'conv_resnet', 'seed': 200, 'num_workers': 8, 'num_envs_per_worker': 4, 'batch_size': 
1024, 'train_for_env_steps': 5000000, 'encoder_conv_architecture': 'resnet_impala'} git_hash=unknown git_repo_name=not a git repository [2024-07-05 10:44:15,829][17621] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json... [2024-07-05 10:44:15,830][17621] Rollout worker 0 uses device cpu [2024-07-05 10:44:15,830][17621] Rollout worker 1 uses device cpu [2024-07-05 10:44:15,831][17621] Rollout worker 2 uses device cpu [2024-07-05 10:44:15,831][17621] Rollout worker 3 uses device cpu [2024-07-05 10:44:15,832][17621] Rollout worker 4 uses device cpu [2024-07-05 10:44:15,832][17621] Rollout worker 5 uses device cpu [2024-07-05 10:44:15,832][17621] Rollout worker 6 uses device cpu [2024-07-05 10:44:15,833][17621] Rollout worker 7 uses device cpu [2024-07-05 10:44:15,907][17621] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:44:15,907][17621] InferenceWorker_p0-w0: min num requests: 2 [2024-07-05 10:44:15,933][17621] Starting all processes... [2024-07-05 10:44:15,934][17621] Starting process learner_proc0 [2024-07-05 10:44:15,983][17621] Starting all processes... 
[2024-07-05 10:44:15,986][17621] Starting process inference_proc0-0 [2024-07-05 10:44:15,986][17621] Starting process rollout_proc0 [2024-07-05 10:44:15,987][17621] Starting process rollout_proc1 [2024-07-05 10:44:15,987][17621] Starting process rollout_proc2 [2024-07-05 10:44:15,987][17621] Starting process rollout_proc3 [2024-07-05 10:44:15,987][17621] Starting process rollout_proc4 [2024-07-05 10:44:15,987][17621] Starting process rollout_proc5 [2024-07-05 10:44:15,987][17621] Starting process rollout_proc6 [2024-07-05 10:44:15,987][17621] Starting process rollout_proc7 [2024-07-05 10:44:18,628][22241] Worker 2 uses CPU cores [4, 5] [2024-07-05 10:44:18,664][22245] Worker 6 uses CPU cores [12, 13] [2024-07-05 10:44:18,783][22239] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:44:18,783][22239] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-07-05 10:44:18,784][22240] Worker 1 uses CPU cores [2, 3] [2024-07-05 10:44:18,793][22238] Worker 0 uses CPU cores [0, 1] [2024-07-05 10:44:18,801][22244] Worker 5 uses CPU cores [10, 11] [2024-07-05 10:44:18,829][22239] Num visible devices: 1 [2024-07-05 10:44:18,949][22225] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:44:18,949][22225] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-07-05 10:44:18,992][22225] Num visible devices: 1 [2024-07-05 10:44:19,009][22242] Worker 4 uses CPU cores [8, 9] [2024-07-05 10:44:19,020][22225] Setting fixed seed 200 [2024-07-05 10:44:19,021][22225] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 10:44:19,021][22225] Initializing actor-critic model on device cuda:0 [2024-07-05 10:44:19,022][22225] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 10:44:19,023][22225] RunningMeanStd input shape: (1,) [2024-07-05 10:44:19,032][22225] Num input channels: 3 [2024-07-05 10:44:19,046][22225] Convolutional layer output size: 4608 
[2024-07-05 10:44:19,052][22246] Worker 7 uses CPU cores [14, 15]
[2024-07-05 10:44:19,057][22225] Policy head output size: 512
[2024-07-05 10:44:19,144][22243] Worker 3 uses CPU cores [6, 7]
[2024-07-05 10:44:19,159][22225] Created Actor Critic model with architecture:
[2024-07-05 10:44:19,159][22225] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ResnetEncoder(
      (conv_head): Sequential(
        (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (2): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (3): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (5): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (6): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (7): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (9): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (10): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (11): ResBlock(
          (res_block_core): Sequential(
            (0): ELU(alpha=1.0)
            (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (2): ELU(alpha=1.0)
            (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (12): ELU(alpha=1.0)
      )
      (mlp_layers): Sequential(
        (0): Linear(in_features=4608, out_features=512, bias=True)
        (1): ELU(alpha=1.0)
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-07-05 10:44:19,253][22225] Using optimizer
[2024-07-05 10:44:19,778][22225] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-07-05 10:44:19,812][22225] Loading model from checkpoint
[2024-07-05 10:44:19,814][22225] Loaded experiment state at self.train_step=2443, self.env_steps=10006528
[2024-07-05 10:44:19,814][22225] Initialized policy 0 weights for model version 2443
[2024-07-05 10:44:19,815][22225] LearnerWorker_p0 finished initialization!
[2024-07-05 10:44:19,815][22225] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-05 10:44:19,874][22239] RunningMeanStd input shape: (3, 72, 128)
[2024-07-05 10:44:19,877][22239] RunningMeanStd input shape: (1,)
[2024-07-05 10:44:19,885][22239] Num input channels: 3
[2024-07-05 10:44:19,896][22239] Convolutional layer output size: 4608
[2024-07-05 10:44:19,907][22239] Policy head output size: 512
[2024-07-05 10:44:20,034][17621] Inference worker 0-0 is ready!
[2024-07-05 10:44:20,035][17621] All inference workers are ready! Signal rollout workers to start! [2024-07-05 10:44:20,063][22245] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:44:20,063][22238] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:44:20,063][22246] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:44:20,063][22243] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:44:20,063][22241] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:44:20,063][22242] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:44:20,063][22240] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:44:20,063][22244] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 10:44:20,544][22238] Decorrelating experience for 0 frames... [2024-07-05 10:44:20,545][22241] Decorrelating experience for 0 frames... [2024-07-05 10:44:20,545][22243] Decorrelating experience for 0 frames... [2024-07-05 10:44:20,546][22245] Decorrelating experience for 0 frames... [2024-07-05 10:44:20,547][22246] Decorrelating experience for 0 frames... [2024-07-05 10:44:20,547][22242] Decorrelating experience for 0 frames... [2024-07-05 10:44:20,548][22240] Decorrelating experience for 0 frames... [2024-07-05 10:44:20,696][22238] Decorrelating experience for 32 frames... [2024-07-05 10:44:20,697][22242] Decorrelating experience for 32 frames... [2024-07-05 10:44:20,697][22245] Decorrelating experience for 32 frames... [2024-07-05 10:44:20,697][22246] Decorrelating experience for 32 frames... [2024-07-05 10:44:20,761][22243] Decorrelating experience for 32 frames... [2024-07-05 10:44:20,862][22241] Decorrelating experience for 32 frames... [2024-07-05 10:44:20,864][22240] Decorrelating experience for 32 frames... [2024-07-05 10:44:20,901][22238] Decorrelating experience for 64 frames... [2024-07-05 10:44:20,903][22242] Decorrelating experience for 64 frames... 
[2024-07-05 10:44:20,914][22245] Decorrelating experience for 64 frames... [2024-07-05 10:44:20,964][22243] Decorrelating experience for 64 frames... [2024-07-05 10:44:21,060][22244] Decorrelating experience for 0 frames... [2024-07-05 10:44:21,060][22246] Decorrelating experience for 64 frames... [2024-07-05 10:44:21,068][22241] Decorrelating experience for 64 frames... [2024-07-05 10:44:21,074][22240] Decorrelating experience for 64 frames... [2024-07-05 10:44:21,084][22238] Decorrelating experience for 96 frames... [2024-07-05 10:44:21,106][22245] Decorrelating experience for 96 frames... [2024-07-05 10:44:21,149][22243] Decorrelating experience for 96 frames... [2024-07-05 10:44:21,238][22242] Decorrelating experience for 96 frames... [2024-07-05 10:44:21,245][22241] Decorrelating experience for 96 frames... [2024-07-05 10:44:21,274][22244] Decorrelating experience for 32 frames... [2024-07-05 10:44:21,346][22246] Decorrelating experience for 96 frames... [2024-07-05 10:44:21,370][22240] Decorrelating experience for 96 frames... [2024-07-05 10:44:21,495][22244] Decorrelating experience for 64 frames... [2024-07-05 10:44:21,667][22244] Decorrelating experience for 96 frames... [2024-07-05 10:44:22,094][22225] Signal inference workers to stop experience collection... [2024-07-05 10:44:22,100][22239] InferenceWorker_p0-w0: stopping experience collection [2024-07-05 10:44:23,526][17621] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10006528. Throughput: 0: nan. Samples: 2270. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 10:44:23,526][17621] Avg episode reward: [(0, '2.078')] [2024-07-05 10:44:24,772][22225] Signal inference workers to resume experience collection... 
[2024-07-05 10:44:24,772][22239] InferenceWorker_p0-w0: resuming experience collection [2024-07-05 10:44:28,203][22239] Updated weights for policy 0, policy_version 2453 (0.0099) [2024-07-05 10:44:28,525][17621] Fps is (10 sec: 8192.1, 60 sec: 8192.1, 300 sec: 8192.1). Total num frames: 10047488. Throughput: 0: 568.4. Samples: 5112. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-07-05 10:44:28,526][17621] Avg episode reward: [(0, '12.420')] [2024-07-05 10:44:31,807][22239] Updated weights for policy 0, policy_version 2463 (0.0012) [2024-07-05 10:44:33,525][17621] Fps is (10 sec: 9830.5, 60 sec: 9830.5, 300 sec: 9830.5). Total num frames: 10104832. Throughput: 0: 1968.8. Samples: 21958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:44:33,526][17621] Avg episode reward: [(0, '25.039')] [2024-07-05 10:44:35,424][22239] Updated weights for policy 0, policy_version 2473 (0.0012) [2024-07-05 10:44:35,898][17621] Heartbeat connected on Batcher_0 [2024-07-05 10:44:35,911][17621] Heartbeat connected on RolloutWorker_w0 [2024-07-05 10:44:35,915][17621] Heartbeat connected on RolloutWorker_w1 [2024-07-05 10:44:35,917][17621] Heartbeat connected on RolloutWorker_w2 [2024-07-05 10:44:35,920][17621] Heartbeat connected on RolloutWorker_w3 [2024-07-05 10:44:35,923][17621] Heartbeat connected on RolloutWorker_w4 [2024-07-05 10:44:35,924][17621] Heartbeat connected on InferenceWorker_p0-w0 [2024-07-05 10:44:35,927][17621] Heartbeat connected on RolloutWorker_w5 [2024-07-05 10:44:35,930][17621] Heartbeat connected on RolloutWorker_w6 [2024-07-05 10:44:35,933][17621] Heartbeat connected on RolloutWorker_w7 [2024-07-05 10:44:36,142][17621] Heartbeat connected on LearnerWorker_p0 [2024-07-05 10:44:38,526][17621] Fps is (10 sec: 11468.8, 60 sec: 10376.5, 300 sec: 10376.5). Total num frames: 10162176. Throughput: 0: 2447.2. Samples: 38978. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:44:38,526][17621] Avg episode reward: [(0, '31.312')] [2024-07-05 10:44:39,050][22239] Updated weights for policy 0, policy_version 2483 (0.0012) [2024-07-05 10:44:42,661][22239] Updated weights for policy 0, policy_version 2493 (0.0012) [2024-07-05 10:44:43,526][17621] Fps is (10 sec: 11468.7, 60 sec: 10649.6, 300 sec: 10649.6). Total num frames: 10219520. Throughput: 0: 2258.3. Samples: 47436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 10:44:43,526][17621] Avg episode reward: [(0, '33.632')] [2024-07-05 10:44:46,335][22239] Updated weights for policy 0, policy_version 2503 (0.0012) [2024-07-05 10:44:48,525][17621] Fps is (10 sec: 11468.8, 60 sec: 10813.5, 300 sec: 10813.5). Total num frames: 10276864. Throughput: 0: 2474.2. Samples: 64124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:44:48,527][17621] Avg episode reward: [(0, '33.295')] [2024-07-05 10:44:49,952][22239] Updated weights for policy 0, policy_version 2513 (0.0012) [2024-07-05 10:44:53,526][17621] Fps is (10 sec: 11059.2, 60 sec: 10786.1, 300 sec: 10786.1). Total num frames: 10330112. Throughput: 0: 2636.4. Samples: 81362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:44:53,526][17621] Avg episode reward: [(0, '34.199')] [2024-07-05 10:44:53,556][22239] Updated weights for policy 0, policy_version 2523 (0.0012) [2024-07-05 10:44:57,157][22239] Updated weights for policy 0, policy_version 2533 (0.0012) [2024-07-05 10:44:58,525][17621] Fps is (10 sec: 11059.2, 60 sec: 10883.7, 300 sec: 10883.7). Total num frames: 10387456. Throughput: 0: 2500.9. Samples: 89802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 10:44:58,526][17621] Avg episode reward: [(0, '32.141')] [2024-07-05 10:45:00,759][22239] Updated weights for policy 0, policy_version 2543 (0.0012) [2024-07-05 10:45:03,525][17621] Fps is (10 sec: 11468.9, 60 sec: 10956.8, 300 sec: 10956.8). Total num frames: 10444800. 
Throughput: 0: 2619.9. Samples: 107066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:03,526][17621] Avg episode reward: [(0, '32.091')] [2024-07-05 10:45:04,293][22239] Updated weights for policy 0, policy_version 2553 (0.0012) [2024-07-05 10:45:07,830][22239] Updated weights for policy 0, policy_version 2563 (0.0012) [2024-07-05 10:45:08,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11013.7, 300 sec: 11013.7). Total num frames: 10502144. Throughput: 0: 2715.5. Samples: 124468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:08,526][17621] Avg episode reward: [(0, '29.199')] [2024-07-05 10:45:11,363][22239] Updated weights for policy 0, policy_version 2573 (0.0012) [2024-07-05 10:45:13,526][17621] Fps is (10 sec: 11878.2, 60 sec: 11141.1, 300 sec: 11141.1). Total num frames: 10563584. Throughput: 0: 2844.1. Samples: 133098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:13,527][17621] Avg episode reward: [(0, '32.140')] [2024-07-05 10:45:14,919][22239] Updated weights for policy 0, policy_version 2583 (0.0012) [2024-07-05 10:45:18,482][22239] Updated weights for policy 0, policy_version 2593 (0.0012) [2024-07-05 10:45:18,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11170.9, 300 sec: 11170.9). Total num frames: 10620928. Throughput: 0: 2851.8. Samples: 150288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:18,526][17621] Avg episode reward: [(0, '31.332')] [2024-07-05 10:45:22,019][22239] Updated weights for policy 0, policy_version 2603 (0.0012) [2024-07-05 10:45:23,526][17621] Fps is (10 sec: 11468.9, 60 sec: 11195.7, 300 sec: 11195.7). Total num frames: 10678272. Throughput: 0: 2859.4. Samples: 167652. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:23,527][17621] Avg episode reward: [(0, '31.411')] [2024-07-05 10:45:25,579][22239] Updated weights for policy 0, policy_version 2613 (0.0012) [2024-07-05 10:45:28,525][17621] Fps is (10 sec: 11468.9, 60 sec: 11468.8, 300 sec: 11216.8). Total num frames: 10735616. Throughput: 0: 2868.9. Samples: 176534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:28,526][17621] Avg episode reward: [(0, '30.890')] [2024-07-05 10:45:29,150][22239] Updated weights for policy 0, policy_version 2623 (0.0011) [2024-07-05 10:45:32,727][22239] Updated weights for policy 0, policy_version 2633 (0.0012) [2024-07-05 10:45:33,525][17621] Fps is (10 sec: 11468.9, 60 sec: 11468.8, 300 sec: 11234.8). Total num frames: 10792960. Throughput: 0: 2875.8. Samples: 193534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:33,527][17621] Avg episode reward: [(0, '28.854')] [2024-07-05 10:45:36,294][22239] Updated weights for policy 0, policy_version 2643 (0.0012) [2024-07-05 10:45:38,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11250.4). Total num frames: 10850304. Throughput: 0: 2874.8. Samples: 210730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:38,526][17621] Avg episode reward: [(0, '29.227')] [2024-07-05 10:45:39,854][22239] Updated weights for policy 0, policy_version 2653 (0.0012) [2024-07-05 10:45:43,402][22239] Updated weights for policy 0, policy_version 2663 (0.0012) [2024-07-05 10:45:43,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11468.8, 300 sec: 11264.0). Total num frames: 10907648. Throughput: 0: 2884.1. Samples: 219588. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 10:45:43,526][17621] Avg episode reward: [(0, '31.927')] [2024-07-05 10:45:46,959][22239] Updated weights for policy 0, policy_version 2673 (0.0012) [2024-07-05 10:45:48,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11276.1). Total num frames: 10964992. 
Throughput: 0: 2885.6. Samples: 236918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0) [2024-07-05 10:45:48,527][17621] Avg episode reward: [(0, '33.178')] [2024-07-05 10:45:50,506][22239] Updated weights for policy 0, policy_version 2683 (0.0012) [2024-07-05 10:45:53,525][17621] Fps is (10 sec: 11468.9, 60 sec: 11537.1, 300 sec: 11286.8). Total num frames: 11022336. Throughput: 0: 2878.9. Samples: 254020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0) [2024-07-05 10:45:53,526][17621] Avg episode reward: [(0, '34.124')] [2024-07-05 10:45:54,072][22239] Updated weights for policy 0, policy_version 2693 (0.0012) [2024-07-05 10:45:57,623][22239] Updated weights for policy 0, policy_version 2703 (0.0012) [2024-07-05 10:45:58,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11296.3). Total num frames: 11079680. Throughput: 0: 2881.7. Samples: 262772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0) [2024-07-05 10:45:58,526][17621] Avg episode reward: [(0, '32.672')] [2024-07-05 10:46:01,157][22239] Updated weights for policy 0, policy_version 2713 (0.0012) [2024-07-05 10:46:03,526][17621] Fps is (10 sec: 11468.6, 60 sec: 11537.0, 300 sec: 11305.0). Total num frames: 11137024. Throughput: 0: 2886.4. Samples: 280176. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:46:03,527][17621] Avg episode reward: [(0, '32.289')] [2024-07-05 10:46:04,715][22239] Updated weights for policy 0, policy_version 2723 (0.0012) [2024-07-05 10:46:08,248][22239] Updated weights for policy 0, policy_version 2733 (0.0012) [2024-07-05 10:46:08,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11312.8). Total num frames: 11194368. Throughput: 0: 2886.8. Samples: 297556. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:46:08,527][17621] Avg episode reward: [(0, '30.228')] [2024-07-05 10:46:11,775][22239] Updated weights for policy 0, policy_version 2743 (0.0012) [2024-07-05 10:46:13,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11319.8). Total num frames: 11251712. Throughput: 0: 2877.8. Samples: 306034. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:46:13,527][17621] Avg episode reward: [(0, '30.845')] [2024-07-05 10:46:13,530][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002748_11255808.pth... [2024-07-05 10:46:13,604][22225] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002245_9195520.pth [2024-07-05 10:46:15,263][22239] Updated weights for policy 0, policy_version 2753 (0.0012) [2024-07-05 10:46:18,525][17621] Fps is (10 sec: 11878.4, 60 sec: 11537.1, 300 sec: 11362.0). Total num frames: 11313152. Throughput: 0: 2895.8. Samples: 323844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 10:46:18,526][17621] Avg episode reward: [(0, '30.248')] [2024-07-05 10:46:18,717][22239] Updated weights for policy 0, policy_version 2763 (0.0011) [2024-07-05 10:46:22,151][22239] Updated weights for policy 0, policy_version 2773 (0.0011) [2024-07-05 10:46:23,525][17621] Fps is (10 sec: 12288.2, 60 sec: 11605.4, 300 sec: 11400.5). Total num frames: 11374592. Throughput: 0: 2912.4. Samples: 341786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:46:23,526][17621] Avg episode reward: [(0, '30.197')] [2024-07-05 10:46:25,592][22239] Updated weights for policy 0, policy_version 2783 (0.0011) [2024-07-05 10:46:28,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11605.3, 300 sec: 11403.3). Total num frames: 11431936. Throughput: 0: 2915.2. Samples: 350770. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:46:28,527][17621] Avg episode reward: [(0, '31.984')] [2024-07-05 10:46:29,112][22239] Updated weights for policy 0, policy_version 2793 (0.0011) [2024-07-05 10:46:32,617][22239] Updated weights for policy 0, policy_version 2803 (0.0011) [2024-07-05 10:46:33,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11405.8). Total num frames: 11489280. Throughput: 0: 2917.5. Samples: 368204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:46:33,527][17621] Avg episode reward: [(0, '33.702')] [2024-07-05 10:46:36,088][22239] Updated weights for policy 0, policy_version 2813 (0.0011) [2024-07-05 10:46:38,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11408.1). Total num frames: 11546624. Throughput: 0: 2928.5. Samples: 385804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:46:38,526][17621] Avg episode reward: [(0, '33.315')] [2024-07-05 10:46:39,595][22239] Updated weights for policy 0, policy_version 2823 (0.0012) [2024-07-05 10:46:43,057][22239] Updated weights for policy 0, policy_version 2833 (0.0011) [2024-07-05 10:46:43,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11439.6). Total num frames: 11608064. Throughput: 0: 2931.8. Samples: 394702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:46:43,526][17621] Avg episode reward: [(0, '33.610')] [2024-07-05 10:46:46,574][22239] Updated weights for policy 0, policy_version 2843 (0.0012) [2024-07-05 10:46:48,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11440.6). Total num frames: 11665408. Throughput: 0: 2936.0. Samples: 412296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:46:48,526][17621] Avg episode reward: [(0, '34.826')] [2024-07-05 10:46:48,648][22225] Saving new best policy, reward=34.826! 
[2024-07-05 10:46:50,058][22239] Updated weights for policy 0, policy_version 2853 (0.0012) [2024-07-05 10:46:53,526][17621] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11441.5). Total num frames: 11722752. Throughput: 0: 2939.5. Samples: 429832. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:46:53,526][17621] Avg episode reward: [(0, '35.397')] [2024-07-05 10:46:53,532][22225] Saving new best policy, reward=35.397! [2024-07-05 10:46:53,534][22239] Updated weights for policy 0, policy_version 2863 (0.0011) [2024-07-05 10:46:57,050][22239] Updated weights for policy 0, policy_version 2873 (0.0014) [2024-07-05 10:46:58,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11468.8). Total num frames: 11784192. Throughput: 0: 2943.9. Samples: 438510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:46:58,526][17621] Avg episode reward: [(0, '34.461')] [2024-07-05 10:47:00,528][22239] Updated weights for policy 0, policy_version 2883 (0.0011) [2024-07-05 10:47:03,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11468.8). Total num frames: 11841536. Throughput: 0: 2944.3. Samples: 456338. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:47:03,526][17621] Avg episode reward: [(0, '31.910')] [2024-07-05 10:47:03,982][22239] Updated weights for policy 0, policy_version 2893 (0.0011) [2024-07-05 10:47:07,426][22239] Updated weights for policy 0, policy_version 2903 (0.0011) [2024-07-05 10:47:08,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11493.6). Total num frames: 11902976. Throughput: 0: 2938.0. Samples: 473996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:47:08,526][17621] Avg episode reward: [(0, '31.829')] [2024-07-05 10:47:10,881][22239] Updated weights for policy 0, policy_version 2913 (0.0011) [2024-07-05 10:47:13,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11492.9). Total num frames: 11960320. Throughput: 0: 2938.6. Samples: 483008. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:47:13,526][17621] Avg episode reward: [(0, '33.901')] [2024-07-05 10:47:14,321][22239] Updated weights for policy 0, policy_version 2923 (0.0011) [2024-07-05 10:47:17,780][22239] Updated weights for policy 0, policy_version 2933 (0.0011) [2024-07-05 10:47:18,525][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11515.6). Total num frames: 12021760. Throughput: 0: 2943.9. Samples: 500678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:47:18,526][17621] Avg episode reward: [(0, '35.163')] [2024-07-05 10:47:21,228][22239] Updated weights for policy 0, policy_version 2943 (0.0011) [2024-07-05 10:47:23,526][17621] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11514.3). Total num frames: 12079104. Throughput: 0: 2955.3. Samples: 518794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 10:47:23,527][17621] Avg episode reward: [(0, '33.913')] [2024-07-05 10:47:24,675][22239] Updated weights for policy 0, policy_version 2953 (0.0011) [2024-07-05 10:47:28,120][22239] Updated weights for policy 0, policy_version 2963 (0.0011) [2024-07-05 10:47:28,525][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11535.2). Total num frames: 12140544. Throughput: 0: 2949.0. Samples: 527406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:47:28,526][17621] Avg episode reward: [(0, '35.517')] [2024-07-05 10:47:28,527][22225] Saving new best policy, reward=35.517! [2024-07-05 10:47:31,580][22239] Updated weights for policy 0, policy_version 2973 (0.0011) [2024-07-05 10:47:33,526][17621] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11533.5). Total num frames: 12197888. Throughput: 0: 2959.5. Samples: 545472. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:47:33,527][17621] Avg episode reward: [(0, '34.441')] [2024-07-05 10:47:35,028][22239] Updated weights for policy 0, policy_version 2983 (0.0011) [2024-07-05 10:47:38,520][22239] Updated weights for policy 0, policy_version 2993 (0.0012) [2024-07-05 10:47:38,525][17621] Fps is (10 sec: 11878.4, 60 sec: 11878.4, 300 sec: 11552.8). Total num frames: 12259328. Throughput: 0: 2960.7. Samples: 563062. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:47:38,526][17621] Avg episode reward: [(0, '34.588')] [2024-07-05 10:47:41,981][22239] Updated weights for policy 0, policy_version 3003 (0.0011) [2024-07-05 10:47:43,526][17621] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11550.7). Total num frames: 12316672. Throughput: 0: 2968.5. Samples: 572094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:47:43,526][17621] Avg episode reward: [(0, '32.322')] [2024-07-05 10:47:45,456][22239] Updated weights for policy 0, policy_version 3013 (0.0011) [2024-07-05 10:47:48,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11810.1, 300 sec: 11548.7). Total num frames: 12374016. Throughput: 0: 2960.8. Samples: 589574. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:47:48,526][17621] Avg episode reward: [(0, '31.797')] [2024-07-05 10:47:48,965][22239] Updated weights for policy 0, policy_version 3023 (0.0011) [2024-07-05 10:47:52,411][22239] Updated weights for policy 0, policy_version 3033 (0.0011) [2024-07-05 10:47:53,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11878.4, 300 sec: 11566.3). Total num frames: 12435456. Throughput: 0: 2960.5. Samples: 607218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:47:53,526][17621] Avg episode reward: [(0, '31.331')] [2024-07-05 10:47:55,869][22239] Updated weights for policy 0, policy_version 3043 (0.0011) [2024-07-05 10:47:58,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11564.1). Total num frames: 12492800. 
Throughput: 0: 2960.5. Samples: 616230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:47:58,527][17621] Avg episode reward: [(0, '33.645')] [2024-07-05 10:47:59,338][22239] Updated weights for policy 0, policy_version 3053 (0.0011) [2024-07-05 10:48:02,819][22239] Updated weights for policy 0, policy_version 3063 (0.0011) [2024-07-05 10:48:03,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11810.1, 300 sec: 11561.9). Total num frames: 12550144. Throughput: 0: 2958.0. Samples: 633790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:03,526][17621] Avg episode reward: [(0, '34.520')] [2024-07-05 10:48:06,339][22239] Updated weights for policy 0, policy_version 3073 (0.0011) [2024-07-05 10:48:08,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11578.0). Total num frames: 12611584. Throughput: 0: 2944.2. Samples: 651284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:08,527][17621] Avg episode reward: [(0, '34.222')] [2024-07-05 10:48:09,813][22239] Updated weights for policy 0, policy_version 3083 (0.0012) [2024-07-05 10:48:13,289][22239] Updated weights for policy 0, policy_version 3093 (0.0011) [2024-07-05 10:48:13,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11575.7). Total num frames: 12668928. Throughput: 0: 2952.7. Samples: 660278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:13,526][17621] Avg episode reward: [(0, '34.243')] [2024-07-05 10:48:13,633][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000003094_12673024.pth... [2024-07-05 10:48:13,702][22225] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002443_10006528.pth [2024-07-05 10:48:16,759][22239] Updated weights for policy 0, policy_version 3103 (0.0011) [2024-07-05 10:48:18,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11590.8). Total num frames: 12730368. 
Throughput: 0: 2942.1. Samples: 677864. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:18,526][17621] Avg episode reward: [(0, '34.706')] [2024-07-05 10:48:20,245][22239] Updated weights for policy 0, policy_version 3113 (0.0011) [2024-07-05 10:48:23,526][17621] Fps is (10 sec: 11878.2, 60 sec: 11810.1, 300 sec: 11588.3). Total num frames: 12787712. Throughput: 0: 2941.9. Samples: 695450. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:23,527][17621] Avg episode reward: [(0, '34.809')] [2024-07-05 10:48:23,730][22239] Updated weights for policy 0, policy_version 3123 (0.0011) [2024-07-05 10:48:27,240][22239] Updated weights for policy 0, policy_version 3133 (0.0011) [2024-07-05 10:48:28,526][17621] Fps is (10 sec: 11468.5, 60 sec: 11741.8, 300 sec: 11585.8). Total num frames: 12845056. Throughput: 0: 2939.5. Samples: 704370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:28,526][17621] Avg episode reward: [(0, '36.109')] [2024-07-05 10:48:28,635][22225] Saving new best policy, reward=36.109! [2024-07-05 10:48:30,712][22239] Updated weights for policy 0, policy_version 3143 (0.0011) [2024-07-05 10:48:33,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11810.1, 300 sec: 11599.9). Total num frames: 12906496. Throughput: 0: 2942.0. Samples: 721964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:33,526][17621] Avg episode reward: [(0, '36.198')] [2024-07-05 10:48:33,529][22225] Saving new best policy, reward=36.198! [2024-07-05 10:48:34,211][22239] Updated weights for policy 0, policy_version 3153 (0.0012) [2024-07-05 10:48:37,676][22239] Updated weights for policy 0, policy_version 3163 (0.0011) [2024-07-05 10:48:38,526][17621] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11597.3). Total num frames: 12963840. Throughput: 0: 2941.0. Samples: 739562. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:38,527][17621] Avg episode reward: [(0, '32.251')] [2024-07-05 10:48:41,148][22239] Updated weights for policy 0, policy_version 3173 (0.0012) [2024-07-05 10:48:43,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11594.8). Total num frames: 13021184. Throughput: 0: 2938.5. Samples: 748462. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:43,526][17621] Avg episode reward: [(0, '30.368')] [2024-07-05 10:48:44,637][22239] Updated weights for policy 0, policy_version 3183 (0.0011) [2024-07-05 10:48:48,135][22239] Updated weights for policy 0, policy_version 3193 (0.0011) [2024-07-05 10:48:48,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11810.1, 300 sec: 11607.9). Total num frames: 13082624. Throughput: 0: 2937.7. Samples: 765986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:48,526][17621] Avg episode reward: [(0, '30.422')] [2024-07-05 10:48:51,636][22239] Updated weights for policy 0, policy_version 3203 (0.0012) [2024-07-05 10:48:53,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11605.3). Total num frames: 13139968. Throughput: 0: 2936.4. Samples: 783424. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:53,526][17621] Avg episode reward: [(0, '32.225')] [2024-07-05 10:48:55,158][22239] Updated weights for policy 0, policy_version 3213 (0.0012) [2024-07-05 10:48:58,526][17621] Fps is (10 sec: 11468.2, 60 sec: 11741.8, 300 sec: 11602.8). Total num frames: 13197312. Throughput: 0: 2933.7. Samples: 792298. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:48:58,527][17621] Avg episode reward: [(0, '33.683')] [2024-07-05 10:48:58,761][22239] Updated weights for policy 0, policy_version 3223 (0.0012) [2024-07-05 10:49:02,299][22239] Updated weights for policy 0, policy_version 3233 (0.0012) [2024-07-05 10:49:03,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11600.5). Total num frames: 13254656. 
Throughput: 0: 2928.4. Samples: 809640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:03,526][17621] Avg episode reward: [(0, '33.838')] [2024-07-05 10:49:05,826][22239] Updated weights for policy 0, policy_version 3243 (0.0012) [2024-07-05 10:49:08,526][17621] Fps is (10 sec: 11469.3, 60 sec: 11673.6, 300 sec: 11598.1). Total num frames: 13312000. Throughput: 0: 2923.7. Samples: 827016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:08,526][17621] Avg episode reward: [(0, '30.303')] [2024-07-05 10:49:09,350][22239] Updated weights for policy 0, policy_version 3253 (0.0012) [2024-07-05 10:49:12,864][22239] Updated weights for policy 0, policy_version 3263 (0.0012) [2024-07-05 10:49:13,526][17621] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11595.9). Total num frames: 13369344. Throughput: 0: 2915.1. Samples: 835548. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:13,527][17621] Avg episode reward: [(0, '29.833')] [2024-07-05 10:49:16,387][22239] Updated weights for policy 0, policy_version 3273 (0.0012) [2024-07-05 10:49:18,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11607.7). Total num frames: 13430784. Throughput: 0: 2911.5. Samples: 852980. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:18,526][17621] Avg episode reward: [(0, '30.754')] [2024-07-05 10:49:19,925][22239] Updated weights for policy 0, policy_version 3283 (0.0012) [2024-07-05 10:49:23,449][22239] Updated weights for policy 0, policy_version 3293 (0.0011) [2024-07-05 10:49:23,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 13488128. Throughput: 0: 2907.3. Samples: 870390. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:23,526][17621] Avg episode reward: [(0, '33.408')] [2024-07-05 10:49:26,970][22239] Updated weights for policy 0, policy_version 3303 (0.0012) [2024-07-05 10:49:28,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.7, 300 sec: 11663.2). Total num frames: 13545472. Throughput: 0: 2908.0. Samples: 879320. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:28,526][17621] Avg episode reward: [(0, '35.185')] [2024-07-05 10:49:30,495][22239] Updated weights for policy 0, policy_version 3313 (0.0011) [2024-07-05 10:49:33,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 13602816. Throughput: 0: 2906.5. Samples: 896780. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:33,526][17621] Avg episode reward: [(0, '34.412')] [2024-07-05 10:49:34,017][22239] Updated weights for policy 0, policy_version 3323 (0.0012) [2024-07-05 10:49:37,535][22239] Updated weights for policy 0, policy_version 3333 (0.0012) [2024-07-05 10:49:38,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.4, 300 sec: 11663.2). Total num frames: 13660160. Throughput: 0: 2906.7. Samples: 914224. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:38,526][17621] Avg episode reward: [(0, '33.102')] [2024-07-05 10:49:41,053][22239] Updated weights for policy 0, policy_version 3343 (0.0012) [2024-07-05 10:49:43,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 13721600. Throughput: 0: 2898.7. Samples: 922738. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:43,527][17621] Avg episode reward: [(0, '34.745')] [2024-07-05 10:49:44,569][22239] Updated weights for policy 0, policy_version 3353 (0.0011) [2024-07-05 10:49:48,101][22239] Updated weights for policy 0, policy_version 3363 (0.0012) [2024-07-05 10:49:48,525][17621] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11691.0). Total num frames: 13778944. 
Throughput: 0: 2900.5. Samples: 940162. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:49:48,526][17621] Avg episode reward: [(0, '32.323')] [2024-07-05 10:49:51,625][22239] Updated weights for policy 0, policy_version 3373 (0.0011) [2024-07-05 10:49:53,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11691.0). Total num frames: 13836288. Throughput: 0: 2902.3. Samples: 957618. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:53,526][17621] Avg episode reward: [(0, '31.738')] [2024-07-05 10:49:55,150][22239] Updated weights for policy 0, policy_version 3383 (0.0012) [2024-07-05 10:49:58,526][17621] Fps is (10 sec: 11468.6, 60 sec: 11605.4, 300 sec: 11691.0). Total num frames: 13893632. Throughput: 0: 2910.7. Samples: 966530. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:49:58,526][17621] Avg episode reward: [(0, '31.940')] [2024-07-05 10:49:58,627][22239] Updated weights for policy 0, policy_version 3393 (0.0011) [2024-07-05 10:50:02,120][22239] Updated weights for policy 0, policy_version 3403 (0.0012) [2024-07-05 10:50:03,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 13955072. Throughput: 0: 2913.4. Samples: 984082. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:03,526][17621] Avg episode reward: [(0, '35.552')] [2024-07-05 10:50:05,607][22239] Updated weights for policy 0, policy_version 3413 (0.0012) [2024-07-05 10:50:08,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 14012416. Throughput: 0: 2913.1. Samples: 1001478. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:08,526][17621] Avg episode reward: [(0, '32.818')] [2024-07-05 10:50:09,164][22239] Updated weights for policy 0, policy_version 3423 (0.0011) [2024-07-05 10:50:12,690][22239] Updated weights for policy 0, policy_version 3433 (0.0012) [2024-07-05 10:50:13,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 14069760. Throughput: 0: 2912.1. Samples: 1010366. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:13,526][17621] Avg episode reward: [(0, '28.854')] [2024-07-05 10:50:13,743][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000003436_14073856.pth... [2024-07-05 10:50:13,817][22225] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000002748_11255808.pth [2024-07-05 10:50:16,231][22239] Updated weights for policy 0, policy_version 3443 (0.0011) [2024-07-05 10:50:18,526][17621] Fps is (10 sec: 11468.4, 60 sec: 11605.3, 300 sec: 11690.9). Total num frames: 14127104. Throughput: 0: 2911.5. Samples: 1027800. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:18,527][17621] Avg episode reward: [(0, '30.917')] [2024-07-05 10:50:19,742][22239] Updated weights for policy 0, policy_version 3453 (0.0012) [2024-07-05 10:50:23,271][22239] Updated weights for policy 0, policy_version 3463 (0.0012) [2024-07-05 10:50:23,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11691.0). Total num frames: 14184448. Throughput: 0: 2910.7. Samples: 1045204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:23,526][17621] Avg episode reward: [(0, '33.489')] [2024-07-05 10:50:26,749][22239] Updated weights for policy 0, policy_version 3473 (0.0011) [2024-07-05 10:50:28,525][17621] Fps is (10 sec: 11878.7, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 14245888. Throughput: 0: 2911.8. Samples: 1053770. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:28,526][17621] Avg episode reward: [(0, '32.658')] [2024-07-05 10:50:30,220][22239] Updated weights for policy 0, policy_version 3483 (0.0011) [2024-07-05 10:50:33,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 14303232. Throughput: 0: 2924.7. Samples: 1071776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:33,527][17621] Avg episode reward: [(0, '30.914')] [2024-07-05 10:50:33,695][22239] Updated weights for policy 0, policy_version 3493 (0.0013) [2024-07-05 10:50:37,153][22239] Updated weights for policy 0, policy_version 3503 (0.0011) [2024-07-05 10:50:38,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 14360576. Throughput: 0: 2927.5. Samples: 1089356. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:38,527][17621] Avg episode reward: [(0, '31.179')] [2024-07-05 10:50:40,653][22239] Updated weights for policy 0, policy_version 3513 (0.0012) [2024-07-05 10:50:43,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 14422016. Throughput: 0: 2924.1. Samples: 1098116. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:43,526][17621] Avg episode reward: [(0, '34.625')] [2024-07-05 10:50:44,112][22239] Updated weights for policy 0, policy_version 3523 (0.0011) [2024-07-05 10:50:47,617][22239] Updated weights for policy 0, policy_version 3533 (0.0011) [2024-07-05 10:50:48,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 14479360. Throughput: 0: 2930.0. Samples: 1115932. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:48,526][17621] Avg episode reward: [(0, '34.909')] [2024-07-05 10:50:51,075][22239] Updated weights for policy 0, policy_version 3543 (0.0011) [2024-07-05 10:50:53,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 14540800. 
Throughput: 0: 2935.2. Samples: 1133560. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:50:53,526][17621] Avg episode reward: [(0, '36.872')] [2024-07-05 10:50:53,529][22225] Saving new best policy, reward=36.872! [2024-07-05 10:50:54,550][22239] Updated weights for policy 0, policy_version 3553 (0.0011) [2024-07-05 10:50:58,016][22239] Updated weights for policy 0, policy_version 3563 (0.0011) [2024-07-05 10:50:58,526][17621] Fps is (10 sec: 11878.2, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 14598144. Throughput: 0: 2936.3. Samples: 1142498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:50:58,527][17621] Avg episode reward: [(0, '38.247')] [2024-07-05 10:50:58,709][22225] Saving new best policy, reward=38.247! [2024-07-05 10:51:01,499][22239] Updated weights for policy 0, policy_version 3573 (0.0011) [2024-07-05 10:51:03,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 14655488. Throughput: 0: 2939.6. Samples: 1160082. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:51:03,526][17621] Avg episode reward: [(0, '34.732')] [2024-07-05 10:51:04,965][22239] Updated weights for policy 0, policy_version 3583 (0.0011) [2024-07-05 10:51:08,423][22239] Updated weights for policy 0, policy_version 3593 (0.0011) [2024-07-05 10:51:08,526][17621] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 14716928. Throughput: 0: 2943.4. Samples: 1177656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:51:08,526][17621] Avg episode reward: [(0, '31.727')] [2024-07-05 10:51:11,899][22239] Updated weights for policy 0, policy_version 3603 (0.0011) [2024-07-05 10:51:13,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 14774272. Throughput: 0: 2953.7. Samples: 1186688. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:51:13,526][17621] Avg episode reward: [(0, '32.747')] [2024-07-05 10:51:15,361][22239] Updated weights for policy 0, policy_version 3613 (0.0011) [2024-07-05 10:51:18,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.2, 300 sec: 11732.6). Total num frames: 14835712. Throughput: 0: 2944.2. Samples: 1204264. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:51:18,526][17621] Avg episode reward: [(0, '34.567')] [2024-07-05 10:51:18,846][22239] Updated weights for policy 0, policy_version 3623 (0.0012) [2024-07-05 10:51:22,309][22239] Updated weights for policy 0, policy_version 3633 (0.0011) [2024-07-05 10:51:23,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 14893056. Throughput: 0: 2948.2. Samples: 1222024. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:51:23,526][17621] Avg episode reward: [(0, '33.371')] [2024-07-05 10:51:25,794][22239] Updated weights for policy 0, policy_version 3643 (0.0011) [2024-07-05 10:51:28,526][17621] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 14950400. Throughput: 0: 2950.0. Samples: 1230864. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:51:28,526][17621] Avg episode reward: [(0, '33.892')] [2024-07-05 10:51:29,266][22239] Updated weights for policy 0, policy_version 3653 (0.0011) [2024-07-05 10:51:32,736][22239] Updated weights for policy 0, policy_version 3663 (0.0011) [2024-07-05 10:51:33,526][17621] Fps is (10 sec: 11878.2, 60 sec: 11810.1, 300 sec: 11746.5). Total num frames: 15011840. Throughput: 0: 2945.0. Samples: 1248456. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:51:33,527][17621] Avg episode reward: [(0, '34.873')] [2024-07-05 10:51:36,199][22239] Updated weights for policy 0, policy_version 3673 (0.0011) [2024-07-05 10:51:38,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 15069184. 
Throughput: 0: 2953.4. Samples: 1266464. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:51:38,527][17621] Avg episode reward: [(0, '34.261')] [2024-07-05 10:51:39,671][22239] Updated weights for policy 0, policy_version 3683 (0.0011) [2024-07-05 10:51:43,151][22239] Updated weights for policy 0, policy_version 3693 (0.0011) [2024-07-05 10:51:43,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11746.5). Total num frames: 15130624. Throughput: 0: 2945.2. Samples: 1275030. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:51:43,526][17621] Avg episode reward: [(0, '35.402')] [2024-07-05 10:51:46,632][22239] Updated weights for policy 0, policy_version 3703 (0.0012) [2024-07-05 10:51:48,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11746.5). Total num frames: 15187968. Throughput: 0: 2951.3. Samples: 1292892. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:51:48,526][17621] Avg episode reward: [(0, '36.698')] [2024-07-05 10:51:50,093][22239] Updated weights for policy 0, policy_version 3713 (0.0011) [2024-07-05 10:51:53,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 15245312. Throughput: 0: 2953.5. Samples: 1310562. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:51:53,526][17621] Avg episode reward: [(0, '37.104')] [2024-07-05 10:51:53,584][22239] Updated weights for policy 0, policy_version 3723 (0.0011) [2024-07-05 10:51:57,049][22239] Updated weights for policy 0, policy_version 3733 (0.0012) [2024-07-05 10:51:58,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11746.5). Total num frames: 15306752. Throughput: 0: 2945.8. Samples: 1319250. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:51:58,526][17621] Avg episode reward: [(0, '36.921')] [2024-07-05 10:52:00,538][22239] Updated weights for policy 0, policy_version 3743 (0.0011) [2024-07-05 10:52:03,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 15364096. Throughput: 0: 2952.2. Samples: 1337112. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:52:03,526][17621] Avg episode reward: [(0, '36.353')] [2024-07-05 10:52:04,015][22239] Updated weights for policy 0, policy_version 3753 (0.0011) [2024-07-05 10:52:07,470][22239] Updated weights for policy 0, policy_version 3763 (0.0011) [2024-07-05 10:52:08,525][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.2, 300 sec: 11746.5). Total num frames: 15425536. Throughput: 0: 2949.0. Samples: 1354728. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:08,526][17621] Avg episode reward: [(0, '35.090')] [2024-07-05 10:52:10,948][22239] Updated weights for policy 0, policy_version 3773 (0.0011) [2024-07-05 10:52:13,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 15482880. Throughput: 0: 2951.9. Samples: 1363698. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:52:13,527][17621] Avg episode reward: [(0, '34.143')] [2024-07-05 10:52:13,723][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000003781_15486976.pth... [2024-07-05 10:52:13,794][22225] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000003094_12673024.pth [2024-07-05 10:52:14,428][22239] Updated weights for policy 0, policy_version 3783 (0.0012) [2024-07-05 10:52:17,913][22239] Updated weights for policy 0, policy_version 3793 (0.0011) [2024-07-05 10:52:18,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 15540224. Throughput: 0: 2950.1. Samples: 1381210. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:18,526][17621] Avg episode reward: [(0, '33.997')] [2024-07-05 10:52:21,382][22239] Updated weights for policy 0, policy_version 3803 (0.0011) [2024-07-05 10:52:23,526][17621] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 15601664. Throughput: 0: 2940.3. Samples: 1398778. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:23,526][17621] Avg episode reward: [(0, '35.518')] [2024-07-05 10:52:24,855][22239] Updated weights for policy 0, policy_version 3813 (0.0011) [2024-07-05 10:52:28,329][22239] Updated weights for policy 0, policy_version 3823 (0.0012) [2024-07-05 10:52:28,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 15659008. Throughput: 0: 2949.6. Samples: 1407764. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:28,526][17621] Avg episode reward: [(0, '34.590')] [2024-07-05 10:52:31,815][22239] Updated weights for policy 0, policy_version 3833 (0.0012) [2024-07-05 10:52:33,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 15716352. Throughput: 0: 2942.8. Samples: 1425318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:33,527][17621] Avg episode reward: [(0, '34.683')] [2024-07-05 10:52:35,306][22239] Updated weights for policy 0, policy_version 3843 (0.0012) [2024-07-05 10:52:38,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 15777792. Throughput: 0: 2939.1. Samples: 1442824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:38,527][17621] Avg episode reward: [(0, '34.328')] [2024-07-05 10:52:38,783][22239] Updated weights for policy 0, policy_version 3853 (0.0011) [2024-07-05 10:52:42,285][22239] Updated weights for policy 0, policy_version 3863 (0.0011) [2024-07-05 10:52:43,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 15835136. 
Throughput: 0: 2946.0. Samples: 1451818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:43,526][17621] Avg episode reward: [(0, '33.907')] [2024-07-05 10:52:45,791][22239] Updated weights for policy 0, policy_version 3873 (0.0012) [2024-07-05 10:52:48,526][17621] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 15892480. Throughput: 0: 2936.6. Samples: 1469258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:48,526][17621] Avg episode reward: [(0, '35.607')] [2024-07-05 10:52:49,315][22239] Updated weights for policy 0, policy_version 3883 (0.0012) [2024-07-05 10:52:52,795][22239] Updated weights for policy 0, policy_version 3893 (0.0012) [2024-07-05 10:52:53,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 15953920. Throughput: 0: 2934.7. Samples: 1486792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:53,526][17621] Avg episode reward: [(0, '36.249')] [2024-07-05 10:52:56,291][22239] Updated weights for policy 0, policy_version 3903 (0.0011) [2024-07-05 10:52:58,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11732.6). Total num frames: 16011264. Throughput: 0: 2934.6. Samples: 1495754. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:52:58,526][17621] Avg episode reward: [(0, '34.536')] [2024-07-05 10:52:59,773][22239] Updated weights for policy 0, policy_version 3913 (0.0011) [2024-07-05 10:53:03,274][22239] Updated weights for policy 0, policy_version 3923 (0.0011) [2024-07-05 10:53:03,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 16068608. Throughput: 0: 2934.7. Samples: 1513270. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:03,526][17621] Avg episode reward: [(0, '33.284')] [2024-07-05 10:53:06,750][22239] Updated weights for policy 0, policy_version 3933 (0.0011) [2024-07-05 10:53:08,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 16130048. Throughput: 0: 2934.4. Samples: 1530826. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:08,526][17621] Avg episode reward: [(0, '34.184')] [2024-07-05 10:53:10,256][22239] Updated weights for policy 0, policy_version 3943 (0.0011) [2024-07-05 10:53:13,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 16187392. Throughput: 0: 2929.6. Samples: 1539598. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:13,527][17621] Avg episode reward: [(0, '33.827')] [2024-07-05 10:53:13,806][22239] Updated weights for policy 0, policy_version 3953 (0.0012) [2024-07-05 10:53:17,350][22239] Updated weights for policy 0, policy_version 3963 (0.0012) [2024-07-05 10:53:18,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 16244736. Throughput: 0: 2923.2. Samples: 1556862. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:18,526][17621] Avg episode reward: [(0, '36.455')] [2024-07-05 10:53:20,882][22239] Updated weights for policy 0, policy_version 3973 (0.0012) [2024-07-05 10:53:23,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 16302080. Throughput: 0: 2921.3. Samples: 1574280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:23,527][17621] Avg episode reward: [(0, '34.882')] [2024-07-05 10:53:24,409][22239] Updated weights for policy 0, policy_version 3983 (0.0012) [2024-07-05 10:53:27,948][22239] Updated weights for policy 0, policy_version 3993 (0.0012) [2024-07-05 10:53:28,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 16359424. 
Throughput: 0: 2915.7. Samples: 1583024. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:28,527][17621] Avg episode reward: [(0, '34.714')] [2024-07-05 10:53:31,470][22239] Updated weights for policy 0, policy_version 4003 (0.0012) [2024-07-05 10:53:33,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 16416768. Throughput: 0: 2914.6. Samples: 1600416. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:33,526][17621] Avg episode reward: [(0, '35.665')] [2024-07-05 10:53:35,004][22239] Updated weights for policy 0, policy_version 4013 (0.0012) [2024-07-05 10:53:38,526][17621] Fps is (10 sec: 11468.6, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 16474112. Throughput: 0: 2911.9. Samples: 1617828. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:38,527][17621] Avg episode reward: [(0, '35.951')] [2024-07-05 10:53:38,549][22239] Updated weights for policy 0, policy_version 4023 (0.0012) [2024-07-05 10:53:42,095][22239] Updated weights for policy 0, policy_version 4033 (0.0012) [2024-07-05 10:53:43,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 16535552. Throughput: 0: 2901.5. Samples: 1626320. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:43,526][17621] Avg episode reward: [(0, '36.758')] [2024-07-05 10:53:45,589][22239] Updated weights for policy 0, policy_version 4043 (0.0013) [2024-07-05 10:53:48,526][17621] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 16592896. Throughput: 0: 2905.2. Samples: 1644004. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:48,526][17621] Avg episode reward: [(0, '36.450')] [2024-07-05 10:53:49,071][22239] Updated weights for policy 0, policy_version 4053 (0.0012) [2024-07-05 10:53:52,553][22239] Updated weights for policy 0, policy_version 4063 (0.0013) [2024-07-05 10:53:53,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11704.9). Total num frames: 16650240. Throughput: 0: 2908.0. Samples: 1661684. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:53,526][17621] Avg episode reward: [(0, '35.086')] [2024-07-05 10:53:56,049][22239] Updated weights for policy 0, policy_version 4073 (0.0011) [2024-07-05 10:53:58,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 16711680. Throughput: 0: 2903.5. Samples: 1670256. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:53:58,526][17621] Avg episode reward: [(0, '37.287')] [2024-07-05 10:53:59,540][22239] Updated weights for policy 0, policy_version 4083 (0.0012) [2024-07-05 10:54:03,009][22239] Updated weights for policy 0, policy_version 4093 (0.0011) [2024-07-05 10:54:03,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 16769024. Throughput: 0: 2918.9. Samples: 1688212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:03,526][17621] Avg episode reward: [(0, '38.159')] [2024-07-05 10:54:06,493][22239] Updated weights for policy 0, policy_version 4103 (0.0012) [2024-07-05 10:54:08,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 16826368. Throughput: 0: 2922.6. Samples: 1705796. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:08,526][17621] Avg episode reward: [(0, '36.902')] [2024-07-05 10:54:09,976][22239] Updated weights for policy 0, policy_version 4113 (0.0012) [2024-07-05 10:54:13,481][22239] Updated weights for policy 0, policy_version 4123 (0.0012) [2024-07-05 10:54:13,525][17621] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 16887808. Throughput: 0: 2918.3. Samples: 1714348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:13,526][17621] Avg episode reward: [(0, '37.767')] [2024-07-05 10:54:13,529][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004123_16887808.pth... [2024-07-05 10:54:13,602][22225] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000003436_14073856.pth [2024-07-05 10:54:16,980][22239] Updated weights for policy 0, policy_version 4133 (0.0011) [2024-07-05 10:54:18,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 16945152. Throughput: 0: 2926.2. Samples: 1732094. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:54:18,527][17621] Avg episode reward: [(0, '35.990')] [2024-07-05 10:54:20,484][22239] Updated weights for policy 0, policy_version 4143 (0.0011) [2024-07-05 10:54:23,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 17002496. Throughput: 0: 2931.6. Samples: 1749748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:23,526][17621] Avg episode reward: [(0, '36.294')] [2024-07-05 10:54:23,999][22239] Updated weights for policy 0, policy_version 4153 (0.0012) [2024-07-05 10:54:27,482][22239] Updated weights for policy 0, policy_version 4163 (0.0011) [2024-07-05 10:54:28,525][17621] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 17059840. Throughput: 0: 2932.8. Samples: 1758294. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:54:28,526][17621] Avg episode reward: [(0, '37.143')] [2024-07-05 10:54:30,968][22239] Updated weights for policy 0, policy_version 4173 (0.0011) [2024-07-05 10:54:33,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11732.6). Total num frames: 17121280. Throughput: 0: 2930.3. Samples: 1775870. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:33,526][17621] Avg episode reward: [(0, '37.790')] [2024-07-05 10:54:34,458][22239] Updated weights for policy 0, policy_version 4183 (0.0011) [2024-07-05 10:54:37,964][22239] Updated weights for policy 0, policy_version 4193 (0.0011) [2024-07-05 10:54:38,526][17621] Fps is (10 sec: 11878.2, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 17178624. Throughput: 0: 2933.4. Samples: 1793686. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:38,527][17621] Avg episode reward: [(0, '33.890')] [2024-07-05 10:54:41,437][22239] Updated weights for policy 0, policy_version 4203 (0.0011) [2024-07-05 10:54:43,525][17621] Fps is (10 sec: 11469.0, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 17235968. Throughput: 0: 2934.9. Samples: 1802326. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:54:43,527][17621] Avg episode reward: [(0, '33.734')] [2024-07-05 10:54:44,911][22239] Updated weights for policy 0, policy_version 4213 (0.0011) [2024-07-05 10:54:48,411][22239] Updated weights for policy 0, policy_version 4223 (0.0011) [2024-07-05 10:54:48,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 17297408. Throughput: 0: 2928.4. Samples: 1819992. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:48,526][17621] Avg episode reward: [(0, '33.976')] [2024-07-05 10:54:51,907][22239] Updated weights for policy 0, policy_version 4233 (0.0011) [2024-07-05 10:54:53,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 17354752. 
Throughput: 0: 2931.9. Samples: 1837732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:53,526][17621] Avg episode reward: [(0, '35.507')] [2024-07-05 10:54:55,407][22239] Updated weights for policy 0, policy_version 4243 (0.0011) [2024-07-05 10:54:58,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 17412096. Throughput: 0: 2932.5. Samples: 1846310. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:54:58,526][17621] Avg episode reward: [(0, '35.977')] [2024-07-05 10:54:58,890][22239] Updated weights for policy 0, policy_version 4253 (0.0011) [2024-07-05 10:55:02,374][22239] Updated weights for policy 0, policy_version 4263 (0.0011) [2024-07-05 10:55:03,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11732.6). Total num frames: 17473536. Throughput: 0: 2929.0. Samples: 1863900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:55:03,526][17621] Avg episode reward: [(0, '35.078')] [2024-07-05 10:55:05,917][22239] Updated weights for policy 0, policy_version 4273 (0.0011) [2024-07-05 10:55:08,526][17621] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11732.6). Total num frames: 17530880. Throughput: 0: 2924.8. Samples: 1881364. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:55:08,526][17621] Avg episode reward: [(0, '35.458')] [2024-07-05 10:55:09,433][22239] Updated weights for policy 0, policy_version 4283 (0.0012) [2024-07-05 10:55:12,960][22239] Updated weights for policy 0, policy_version 4293 (0.0012) [2024-07-05 10:55:13,526][17621] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 17588224. Throughput: 0: 2932.3. Samples: 1890246. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:55:13,526][17621] Avg episode reward: [(0, '35.649')] [2024-07-05 10:55:16,469][22239] Updated weights for policy 0, policy_version 4303 (0.0012) [2024-07-05 10:55:18,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 17645568. Throughput: 0: 2929.5. Samples: 1907696. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:55:18,527][17621] Avg episode reward: [(0, '36.062')] [2024-07-05 10:55:19,959][22239] Updated weights for policy 0, policy_version 4313 (0.0012) [2024-07-05 10:55:23,447][22239] Updated weights for policy 0, policy_version 4323 (0.0011) [2024-07-05 10:55:23,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 17707008. Throughput: 0: 2923.5. Samples: 1925242. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:55:23,527][17621] Avg episode reward: [(0, '36.211')] [2024-07-05 10:55:26,942][22239] Updated weights for policy 0, policy_version 4333 (0.0011) [2024-07-05 10:55:28,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11732.6). Total num frames: 17764352. Throughput: 0: 2929.9. Samples: 1934172. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:55:28,527][17621] Avg episode reward: [(0, '38.604')] [2024-07-05 10:55:28,670][22225] Saving new best policy, reward=38.604! [2024-07-05 10:55:30,432][22239] Updated weights for policy 0, policy_version 4343 (0.0011) [2024-07-05 10:55:33,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 17821696. Throughput: 0: 2926.0. Samples: 1951662. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:55:33,526][17621] Avg episode reward: [(0, '34.447')] [2024-07-05 10:55:33,945][22239] Updated weights for policy 0, policy_version 4353 (0.0012) [2024-07-05 10:55:37,418][22239] Updated weights for policy 0, policy_version 4363 (0.0011) [2024-07-05 10:55:38,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 17883136. Throughput: 0: 2921.6. Samples: 1969204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:55:38,526][17621] Avg episode reward: [(0, '34.099')] [2024-07-05 10:55:40,874][22239] Updated weights for policy 0, policy_version 4373 (0.0011) [2024-07-05 10:55:43,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 17940480. Throughput: 0: 2930.8. Samples: 1978194. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:55:43,526][17621] Avg episode reward: [(0, '35.279')] [2024-07-05 10:55:44,351][22239] Updated weights for policy 0, policy_version 4383 (0.0012) [2024-07-05 10:55:47,825][22239] Updated weights for policy 0, policy_version 4393 (0.0011) [2024-07-05 10:55:48,525][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 18001920. Throughput: 0: 2929.8. Samples: 1995740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:55:48,526][17621] Avg episode reward: [(0, '38.352')] [2024-07-05 10:55:51,306][22239] Updated weights for policy 0, policy_version 4403 (0.0011) [2024-07-05 10:55:53,526][17621] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 18059264. Throughput: 0: 2931.5. Samples: 2013280. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:55:53,526][17621] Avg episode reward: [(0, '35.935')] [2024-07-05 10:55:54,793][22239] Updated weights for policy 0, policy_version 4413 (0.0011) [2024-07-05 10:55:58,394][22239] Updated weights for policy 0, policy_version 4423 (0.0012) [2024-07-05 10:55:58,526][17621] Fps is (10 sec: 11468.3, 60 sec: 11741.8, 300 sec: 11732.6). Total num frames: 18116608. Throughput: 0: 2932.4. Samples: 2022204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:55:58,527][17621] Avg episode reward: [(0, '35.283')] [2024-07-05 10:56:01,944][22239] Updated weights for policy 0, policy_version 4433 (0.0012) [2024-07-05 10:56:03,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 18173952. Throughput: 0: 2928.9. Samples: 2039498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:56:03,527][17621] Avg episode reward: [(0, '34.056')] [2024-07-05 10:56:05,469][22239] Updated weights for policy 0, policy_version 4443 (0.0011) [2024-07-05 10:56:08,526][17621] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 18231296. Throughput: 0: 2926.5. Samples: 2056936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:56:08,527][17621] Avg episode reward: [(0, '33.183')] [2024-07-05 10:56:08,978][22239] Updated weights for policy 0, policy_version 4453 (0.0012) [2024-07-05 10:56:12,499][22239] Updated weights for policy 0, policy_version 4463 (0.0012) [2024-07-05 10:56:13,526][17621] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 18288640. Throughput: 0: 2917.5. Samples: 2065460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:56:13,527][17621] Avg episode reward: [(0, '37.524')] [2024-07-05 10:56:13,558][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004466_18292736.pth... 
[2024-07-05 10:56:13,632][22225] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000003781_15486976.pth [2024-07-05 10:56:16,033][22239] Updated weights for policy 0, policy_version 4473 (0.0012) [2024-07-05 10:56:18,526][17621] Fps is (10 sec: 11469.0, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 18345984. Throughput: 0: 2915.6. Samples: 2082864. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:56:18,526][17621] Avg episode reward: [(0, '36.274')] [2024-07-05 10:56:19,586][22239] Updated weights for policy 0, policy_version 4483 (0.0012) [2024-07-05 10:56:23,108][22239] Updated weights for policy 0, policy_version 4493 (0.0012) [2024-07-05 10:56:23,526][17621] Fps is (10 sec: 11878.7, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 18407424. Throughput: 0: 2911.5. Samples: 2100222. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:56:23,526][17621] Avg episode reward: [(0, '38.506')] [2024-07-05 10:56:26,660][22239] Updated weights for policy 0, policy_version 4503 (0.0012) [2024-07-05 10:56:28,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 18464768. Throughput: 0: 2909.4. Samples: 2109118. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:56:28,526][17621] Avg episode reward: [(0, '35.978')] [2024-07-05 10:56:30,181][22239] Updated weights for policy 0, policy_version 4513 (0.0012) [2024-07-05 10:56:33,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 18522112. Throughput: 0: 2902.6. Samples: 2126358. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:56:33,527][17621] Avg episode reward: [(0, '38.099')] [2024-07-05 10:56:33,742][22239] Updated weights for policy 0, policy_version 4523 (0.0012) [2024-07-05 10:56:37,248][22239] Updated weights for policy 0, policy_version 4533 (0.0011) [2024-07-05 10:56:38,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11691.0). Total num frames: 18579456. Throughput: 0: 2905.1. Samples: 2144010. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:56:38,526][17621] Avg episode reward: [(0, '38.060')] [2024-07-05 10:56:40,732][22239] Updated weights for policy 0, policy_version 4543 (0.0011) [2024-07-05 10:56:43,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 18640896. Throughput: 0: 2896.7. Samples: 2152554. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:56:43,526][17621] Avg episode reward: [(0, '40.711')] [2024-07-05 10:56:43,528][22225] Saving new best policy, reward=40.711! [2024-07-05 10:56:44,222][22239] Updated weights for policy 0, policy_version 4553 (0.0011) [2024-07-05 10:56:47,723][22239] Updated weights for policy 0, policy_version 4563 (0.0011) [2024-07-05 10:56:48,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 18698240. Throughput: 0: 2903.7. Samples: 2170164. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:56:48,526][17621] Avg episode reward: [(0, '39.459')] [2024-07-05 10:56:51,262][22239] Updated weights for policy 0, policy_version 4573 (0.0014) [2024-07-05 10:56:53,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11691.0). Total num frames: 18755584. Throughput: 0: 2904.4. Samples: 2187632. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:56:53,526][17621] Avg episode reward: [(0, '36.632')] [2024-07-05 10:56:54,787][22239] Updated weights for policy 0, policy_version 4583 (0.0012) [2024-07-05 10:56:58,336][22239] Updated weights for policy 0, policy_version 4593 (0.0012) [2024-07-05 10:56:58,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.4, 300 sec: 11691.0). Total num frames: 18812928. Throughput: 0: 2909.1. Samples: 2196370. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:56:58,526][17621] Avg episode reward: [(0, '34.809')] [2024-07-05 10:57:01,826][22239] Updated weights for policy 0, policy_version 4603 (0.0011) [2024-07-05 10:57:03,525][17621] Fps is (10 sec: 11468.9, 60 sec: 11605.4, 300 sec: 11677.1). Total num frames: 18870272. Throughput: 0: 2910.5. Samples: 2213834. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:03,526][17621] Avg episode reward: [(0, '35.190')] [2024-07-05 10:57:05,335][22239] Updated weights for policy 0, policy_version 4613 (0.0011) [2024-07-05 10:57:08,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 18931712. Throughput: 0: 2912.8. Samples: 2231298. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:08,527][17621] Avg episode reward: [(0, '34.187')] [2024-07-05 10:57:08,834][22239] Updated weights for policy 0, policy_version 4623 (0.0012) [2024-07-05 10:57:12,351][22239] Updated weights for policy 0, policy_version 4633 (0.0012) [2024-07-05 10:57:13,526][17621] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 18989056. Throughput: 0: 2914.2. Samples: 2240258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 10:57:13,527][17621] Avg episode reward: [(0, '33.250')] [2024-07-05 10:57:15,853][22239] Updated weights for policy 0, policy_version 4643 (0.0011) [2024-07-05 10:57:18,526][17621] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 19046400. 
Throughput: 0: 2919.0. Samples: 2257712. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:18,527][17621] Avg episode reward: [(0, '33.154')] [2024-07-05 10:57:19,380][22239] Updated weights for policy 0, policy_version 4653 (0.0011) [2024-07-05 10:57:22,876][22239] Updated weights for policy 0, policy_version 4663 (0.0012) [2024-07-05 10:57:23,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11677.1). Total num frames: 19103744. Throughput: 0: 2914.7. Samples: 2275172. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:23,527][17621] Avg episode reward: [(0, '34.708')] [2024-07-05 10:57:26,344][22239] Updated weights for policy 0, policy_version 4673 (0.0011) [2024-07-05 10:57:28,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 19165184. Throughput: 0: 2921.2. Samples: 2284006. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:28,526][17621] Avg episode reward: [(0, '34.441')] [2024-07-05 10:57:29,847][22239] Updated weights for policy 0, policy_version 4683 (0.0011) [2024-07-05 10:57:33,344][22239] Updated weights for policy 0, policy_version 4693 (0.0012) [2024-07-05 10:57:33,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 19222528. Throughput: 0: 2923.3. Samples: 2301712. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:33,526][17621] Avg episode reward: [(0, '33.052')] [2024-07-05 10:57:36,843][22239] Updated weights for policy 0, policy_version 4703 (0.0011) [2024-07-05 10:57:38,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 19279872. Throughput: 0: 2923.1. Samples: 2319172. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:38,527][17621] Avg episode reward: [(0, '34.933')] [2024-07-05 10:57:40,358][22239] Updated weights for policy 0, policy_version 4713 (0.0014) [2024-07-05 10:57:43,526][17621] Fps is (10 sec: 11468.6, 60 sec: 11605.3, 300 sec: 11677.1). Total num frames: 19337216. Throughput: 0: 2918.5. Samples: 2327704. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:43,527][17621] Avg episode reward: [(0, '36.222')] [2024-07-05 10:57:43,885][22239] Updated weights for policy 0, policy_version 4723 (0.0011) [2024-07-05 10:57:47,473][22239] Updated weights for policy 0, policy_version 4733 (0.0012) [2024-07-05 10:57:48,526][17621] Fps is (10 sec: 11468.5, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 19394560. Throughput: 0: 2917.1. Samples: 2345104. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:48,527][17621] Avg episode reward: [(0, '36.659')] [2024-07-05 10:57:51,031][22239] Updated weights for policy 0, policy_version 4743 (0.0012) [2024-07-05 10:57:53,526][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 19456000. Throughput: 0: 2914.0. Samples: 2362426. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:53,526][17621] Avg episode reward: [(0, '36.453')] [2024-07-05 10:57:54,561][22239] Updated weights for policy 0, policy_version 4753 (0.0012) [2024-07-05 10:57:58,121][22239] Updated weights for policy 0, policy_version 4763 (0.0011) [2024-07-05 10:57:58,525][17621] Fps is (10 sec: 11878.8, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 19513344. Throughput: 0: 2906.1. Samples: 2371034. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:57:58,526][17621] Avg episode reward: [(0, '34.101')] [2024-07-05 10:58:01,686][22239] Updated weights for policy 0, policy_version 4773 (0.0012) [2024-07-05 10:58:03,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 19570688. 
Throughput: 0: 2899.8. Samples: 2388202. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:58:03,527][17621] Avg episode reward: [(0, '34.025')] [2024-07-05 10:58:05,205][22239] Updated weights for policy 0, policy_version 4783 (0.0011) [2024-07-05 10:58:08,525][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 19628032. Throughput: 0: 2906.2. Samples: 2405950. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:58:08,527][17621] Avg episode reward: [(0, '36.401')] [2024-07-05 10:58:08,677][22239] Updated weights for policy 0, policy_version 4793 (0.0011) [2024-07-05 10:58:12,162][22239] Updated weights for policy 0, policy_version 4803 (0.0011) [2024-07-05 10:58:13,526][17621] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 19685376. Throughput: 0: 2903.3. Samples: 2414656. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:58:13,526][17621] Avg episode reward: [(0, '37.105')] [2024-07-05 10:58:13,551][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004807_19689472.pth... [2024-07-05 10:58:13,627][22225] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004123_16887808.pth [2024-07-05 10:58:15,714][22239] Updated weights for policy 0, policy_version 4813 (0.0012) [2024-07-05 10:58:18,526][17621] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 19742720. Throughput: 0: 2895.2. Samples: 2431996. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:58:18,526][17621] Avg episode reward: [(0, '34.284')] [2024-07-05 10:58:19,293][22239] Updated weights for policy 0, policy_version 4823 (0.0012) [2024-07-05 10:58:22,803][22239] Updated weights for policy 0, policy_version 4833 (0.0011) [2024-07-05 10:58:23,525][17621] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 19804160. 
Throughput: 0: 2894.9. Samples: 2449442. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:58:23,526][17621] Avg episode reward: [(0, '34.538')] [2024-07-05 10:58:26,325][22239] Updated weights for policy 0, policy_version 4843 (0.0012) [2024-07-05 10:58:28,525][17621] Fps is (10 sec: 11878.6, 60 sec: 11605.3, 300 sec: 11677.1). Total num frames: 19861504. Throughput: 0: 2901.2. Samples: 2458256. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:58:28,526][17621] Avg episode reward: [(0, '36.055')] [2024-07-05 10:58:29,961][22239] Updated weights for policy 0, policy_version 4853 (0.0012) [2024-07-05 10:58:33,525][17621] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11677.1). Total num frames: 19918848. Throughput: 0: 2890.9. Samples: 2475192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:58:33,526][17621] Avg episode reward: [(0, '37.918')] [2024-07-05 10:58:33,527][22239] Updated weights for policy 0, policy_version 4863 (0.0012) [2024-07-05 10:58:37,120][22239] Updated weights for policy 0, policy_version 4873 (0.0012) [2024-07-05 10:58:38,526][17621] Fps is (10 sec: 11059.1, 60 sec: 11537.1, 300 sec: 11649.3). Total num frames: 19972096. Throughput: 0: 2889.0. Samples: 2492432. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 10:58:38,526][17621] Avg episode reward: [(0, '37.645')] [2024-07-05 10:58:40,633][22239] Updated weights for policy 0, policy_version 4883 (0.0011) [2024-07-05 10:58:41,001][22225] Stopping Batcher_0... [2024-07-05 10:58:41,002][22225] Loop batcher_evt_loop terminating... [2024-07-05 10:58:41,001][17621] Component Batcher_0 stopped! [2024-07-05 10:58:41,002][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004884_20004864.pth... [2024-07-05 10:58:41,011][22241] Stopping RolloutWorker_w2... [2024-07-05 10:58:41,011][22242] Stopping RolloutWorker_w4... [2024-07-05 10:58:41,011][22238] Stopping RolloutWorker_w0... 
[2024-07-05 10:58:41,011][22240] Stopping RolloutWorker_w1... [2024-07-05 10:58:41,011][22241] Loop rollout_proc2_evt_loop terminating... [2024-07-05 10:58:41,011][22242] Loop rollout_proc4_evt_loop terminating... [2024-07-05 10:58:41,011][22238] Loop rollout_proc0_evt_loop terminating... [2024-07-05 10:58:41,011][22244] Stopping RolloutWorker_w5... [2024-07-05 10:58:41,011][22240] Loop rollout_proc1_evt_loop terminating... [2024-07-05 10:58:41,011][22243] Stopping RolloutWorker_w3... [2024-07-05 10:58:41,011][22244] Loop rollout_proc5_evt_loop terminating... [2024-07-05 10:58:41,011][22243] Loop rollout_proc3_evt_loop terminating... [2024-07-05 10:58:41,011][22245] Stopping RolloutWorker_w6... [2024-07-05 10:58:41,011][17621] Component RolloutWorker_w2 stopped! [2024-07-05 10:58:41,012][22245] Loop rollout_proc6_evt_loop terminating... [2024-07-05 10:58:41,012][22246] Stopping RolloutWorker_w7... [2024-07-05 10:58:41,012][22246] Loop rollout_proc7_evt_loop terminating... [2024-07-05 10:58:41,012][17621] Component RolloutWorker_w4 stopped! [2024-07-05 10:58:41,012][17621] Component RolloutWorker_w0 stopped! [2024-07-05 10:58:41,013][17621] Component RolloutWorker_w1 stopped! [2024-07-05 10:58:41,014][17621] Component RolloutWorker_w5 stopped! [2024-07-05 10:58:41,014][17621] Component RolloutWorker_w3 stopped! [2024-07-05 10:58:41,015][17621] Component RolloutWorker_w6 stopped! [2024-07-05 10:58:41,016][17621] Component RolloutWorker_w7 stopped! [2024-07-05 10:58:41,036][22239] Weights refcount: 2 0 [2024-07-05 10:58:41,037][22239] Stopping InferenceWorker_p0-w0... [2024-07-05 10:58:41,037][22239] Loop inference_proc0-0_evt_loop terminating... [2024-07-05 10:58:41,037][17621] Component InferenceWorker_p0-w0 stopped! 
[2024-07-05 10:58:41,096][22225] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004466_18292736.pth [2024-07-05 10:58:41,108][22225] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004884_20004864.pth... [2024-07-05 10:58:41,223][22225] Stopping LearnerWorker_p0... [2024-07-05 10:58:41,224][22225] Loop learner_proc0_evt_loop terminating... [2024-07-05 10:58:41,224][17621] Component LearnerWorker_p0 stopped! [2024-07-05 10:58:41,226][17621] Waiting for process learner_proc0 to stop... [2024-07-05 10:58:42,188][17621] Waiting for process inference_proc0-0 to join... [2024-07-05 10:58:42,189][17621] Waiting for process rollout_proc0 to join... [2024-07-05 10:58:42,189][17621] Waiting for process rollout_proc1 to join... [2024-07-05 10:58:42,189][17621] Waiting for process rollout_proc2 to join... [2024-07-05 10:58:42,189][17621] Waiting for process rollout_proc3 to join... [2024-07-05 10:58:42,190][17621] Waiting for process rollout_proc4 to join... [2024-07-05 10:58:42,190][17621] Waiting for process rollout_proc5 to join... [2024-07-05 10:58:42,190][17621] Waiting for process rollout_proc6 to join... [2024-07-05 10:58:42,191][17621] Waiting for process rollout_proc7 to join... 
[2024-07-05 10:58:42,191][17621] Batcher 0 profile tree view: batching: 11.5262, releasing_batches: 0.0404 [2024-07-05 10:58:42,192][17621] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 4.7420 update_model: 6.4541 weight_update: 0.0011 one_step: 0.0028 handle_policy_step: 831.2903 deserialize: 14.0334, stack: 1.9051, obs_to_device_normalize: 140.6688, forward: 486.9789, send_messages: 16.8591 prepare_outputs: 158.2148 to_cpu: 143.4491 [2024-07-05 10:58:42,192][17621] Learner 0 profile tree view: misc: 0.0081, prepare_batch: 24.7747 train: 621.7223 epoch_init: 0.0069, minibatch_init: 0.0093, losses_postprocess: 0.7488, kl_divergence: 0.5096, after_optimizer: 2.1024 calculate_losses: 204.2291 losses_init: 0.0040, forward_head: 3.0489, bptt_initial: 196.2611, tail: 0.9628, advantages_returns: 0.2643, losses: 1.9828 bptt: 1.4519 bptt_forward_core: 1.3872 update: 413.5385 clip: 1.8356 [2024-07-05 10:58:42,192][17621] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.1888, enqueue_policy_requests: 10.8761, env_step: 156.2023, overhead: 13.0633, complete_rollouts: 0.3381 save_policy_outputs: 13.9536 split_output_tensors: 6.6527 [2024-07-05 10:58:42,192][17621] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2006, enqueue_policy_requests: 11.7703, env_step: 169.0861, overhead: 14.4208, complete_rollouts: 0.3581 save_policy_outputs: 13.8412 split_output_tensors: 6.5571 [2024-07-05 10:58:42,193][17621] Loop Runner_EvtLoop terminating... 
[2024-07-05 10:58:42,193][17621] Runner profile tree view: main_loop: 866.2605 [2024-07-05 10:58:42,194][17621] Collected {0: 20004864}, FPS: 11542.0 [2024-07-05 11:00:05,182][17621] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json [2024-07-05 11:00:05,183][17621] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-05 11:00:05,184][17621] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-05 11:00:05,184][17621] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-05 11:00:05,184][17621] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 11:00:05,185][17621] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-05 11:00:05,185][17621] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 11:00:05,185][17621] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-05 11:00:05,186][17621] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-05 11:00:05,186][17621] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-05 11:00:05,186][17621] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-05 11:00:05,187][17621] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-05 11:00:05,187][17621] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-05 11:00:05,187][17621] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-07-05 11:00:05,188][17621] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-05 11:00:05,201][17621] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 11:00:05,202][17621] RunningMeanStd input shape: (1,) [2024-07-05 11:00:05,208][17621] Num input channels: 3 [2024-07-05 11:00:05,213][17621] Convolutional layer output size: 4608 [2024-07-05 11:00:05,222][17621] Policy head output size: 512 [2024-07-05 11:00:05,277][17621] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004884_20004864.pth... [2024-07-05 11:00:05,866][17621] Num frames 100... [2024-07-05 11:00:05,937][17621] Num frames 200... [2024-07-05 11:00:06,008][17621] Num frames 300... [2024-07-05 11:00:06,078][17621] Num frames 400... [2024-07-05 11:00:06,150][17621] Num frames 500... [2024-07-05 11:00:06,223][17621] Num frames 600... [2024-07-05 11:00:06,295][17621] Num frames 700... [2024-07-05 11:00:06,368][17621] Num frames 800... [2024-07-05 11:00:06,442][17621] Num frames 900... [2024-07-05 11:00:06,509][17621] Avg episode rewards: #0: 20.170, true rewards: #0: 9.170 [2024-07-05 11:00:06,510][17621] Avg episode reward: 20.170, avg true_objective: 9.170 [2024-07-05 11:00:06,571][17621] Num frames 1000... [2024-07-05 11:00:06,647][17621] Num frames 1100... [2024-07-05 11:00:06,717][17621] Num frames 1200... [2024-07-05 11:00:06,790][17621] Num frames 1300... [2024-07-05 11:00:06,861][17621] Num frames 1400... [2024-07-05 11:00:06,932][17621] Num frames 1500... [2024-07-05 11:00:07,028][17621] Avg episode rewards: #0: 15.785, true rewards: #0: 7.785 [2024-07-05 11:00:07,029][17621] Avg episode reward: 15.785, avg true_objective: 7.785 [2024-07-05 11:00:07,066][17621] Num frames 1600... [2024-07-05 11:00:07,137][17621] Num frames 1700... [2024-07-05 11:00:07,209][17621] Num frames 1800... [2024-07-05 11:00:07,281][17621] Num frames 1900... [2024-07-05 11:00:07,355][17621] Num frames 2000... 
[2024-07-05 11:00:07,427][17621] Num frames 2100... [2024-07-05 11:00:07,499][17621] Num frames 2200... [2024-07-05 11:00:07,599][17621] Avg episode rewards: #0: 15.870, true rewards: #0: 7.537 [2024-07-05 11:00:07,601][17621] Avg episode reward: 15.870, avg true_objective: 7.537 [2024-07-05 11:00:07,637][17621] Num frames 2300... [2024-07-05 11:00:07,707][17621] Num frames 2400... [2024-07-05 11:00:07,780][17621] Num frames 2500... [2024-07-05 11:00:07,853][17621] Num frames 2600... [2024-07-05 11:00:07,926][17621] Num frames 2700... [2024-07-05 11:00:07,997][17621] Num frames 2800... [2024-07-05 11:00:08,070][17621] Num frames 2900... [2024-07-05 11:00:08,143][17621] Num frames 3000... [2024-07-05 11:00:08,218][17621] Num frames 3100... [2024-07-05 11:00:08,290][17621] Num frames 3200... [2024-07-05 11:00:08,363][17621] Num frames 3300... [2024-07-05 11:00:08,473][17621] Avg episode rewards: #0: 17.448, true rewards: #0: 8.447 [2024-07-05 11:00:08,475][17621] Avg episode reward: 17.448, avg true_objective: 8.447 [2024-07-05 11:00:08,498][17621] Num frames 3400... [2024-07-05 11:00:08,570][17621] Num frames 3500... [2024-07-05 11:00:08,641][17621] Num frames 3600... [2024-07-05 11:00:08,712][17621] Num frames 3700... [2024-07-05 11:00:08,812][17621] Avg episode rewards: #0: 14.726, true rewards: #0: 7.526 [2024-07-05 11:00:08,813][17621] Avg episode reward: 14.726, avg true_objective: 7.526 [2024-07-05 11:00:08,846][17621] Num frames 3800... [2024-07-05 11:00:08,923][17621] Num frames 3900... [2024-07-05 11:00:08,997][17621] Num frames 4000... [2024-07-05 11:00:09,075][17621] Num frames 4100... [2024-07-05 11:00:09,150][17621] Num frames 4200... [2024-07-05 11:00:09,226][17621] Num frames 4300... [2024-07-05 11:00:09,313][17621] Num frames 4400... [2024-07-05 11:00:09,385][17621] Num frames 4500... [2024-07-05 11:00:09,457][17621] Num frames 4600... [2024-07-05 11:00:09,532][17621] Num frames 4700... [2024-07-05 11:00:09,608][17621] Num frames 4800... 
[2024-07-05 11:00:09,683][17621] Num frames 4900... [2024-07-05 11:00:09,756][17621] Num frames 5000... [2024-07-05 11:00:09,832][17621] Num frames 5100... [2024-07-05 11:00:09,904][17621] Num frames 5200... [2024-07-05 11:00:09,981][17621] Num frames 5300... [2024-07-05 11:00:10,053][17621] Num frames 5400... [2024-07-05 11:00:10,131][17621] Num frames 5500... [2024-07-05 11:00:10,206][17621] Num frames 5600... [2024-07-05 11:00:10,290][17621] Num frames 5700... [2024-07-05 11:00:10,367][17621] Num frames 5800... [2024-07-05 11:00:10,468][17621] Avg episode rewards: #0: 21.105, true rewards: #0: 9.772 [2024-07-05 11:00:10,469][17621] Avg episode reward: 21.105, avg true_objective: 9.772 [2024-07-05 11:00:10,504][17621] Num frames 5900... [2024-07-05 11:00:10,581][17621] Num frames 6000... [2024-07-05 11:00:10,656][17621] Num frames 6100... [2024-07-05 11:00:10,731][17621] Num frames 6200... [2024-07-05 11:00:10,804][17621] Num frames 6300... [2024-07-05 11:00:10,876][17621] Num frames 6400... [2024-07-05 11:00:10,951][17621] Num frames 6500... [2024-07-05 11:00:11,022][17621] Num frames 6600... [2024-07-05 11:00:11,094][17621] Num frames 6700... [2024-07-05 11:00:11,168][17621] Num frames 6800... [2024-07-05 11:00:11,240][17621] Num frames 6900... [2024-07-05 11:00:11,313][17621] Num frames 7000... [2024-07-05 11:00:11,389][17621] Num frames 7100... [2024-07-05 11:00:11,462][17621] Num frames 7200... [2024-07-05 11:00:11,537][17621] Num frames 7300... [2024-07-05 11:00:11,614][17621] Num frames 7400... [2024-07-05 11:00:11,688][17621] Num frames 7500... [2024-07-05 11:00:11,761][17621] Num frames 7600... [2024-07-05 11:00:11,836][17621] Num frames 7700... [2024-07-05 11:00:11,911][17621] Num frames 7800... [2024-07-05 11:00:11,984][17621] Num frames 7900... 
[2024-07-05 11:00:12,084][17621] Avg episode rewards: #0: 25.518, true rewards: #0: 11.376 [2024-07-05 11:00:12,086][17621] Avg episode reward: 25.518, avg true_objective: 11.376 [2024-07-05 11:00:12,120][17621] Num frames 8000... [2024-07-05 11:00:12,190][17621] Num frames 8100... [2024-07-05 11:00:12,264][17621] Num frames 8200... [2024-07-05 11:00:12,345][17621] Num frames 8300... [2024-07-05 11:00:12,415][17621] Num frames 8400... [2024-07-05 11:00:12,492][17621] Num frames 8500... [2024-07-05 11:00:12,569][17621] Num frames 8600... [2024-07-05 11:00:12,642][17621] Num frames 8700... [2024-07-05 11:00:12,714][17621] Num frames 8800... [2024-07-05 11:00:12,787][17621] Num frames 8900... [2024-07-05 11:00:12,861][17621] Num frames 9000... [2024-07-05 11:00:12,935][17621] Num frames 9100... [2024-07-05 11:00:13,023][17621] Avg episode rewards: #0: 25.434, true rewards: #0: 11.434 [2024-07-05 11:00:13,025][17621] Avg episode reward: 25.434, avg true_objective: 11.434 [2024-07-05 11:00:13,065][17621] Num frames 9200... [2024-07-05 11:00:13,137][17621] Num frames 9300... [2024-07-05 11:00:13,211][17621] Num frames 9400... [2024-07-05 11:00:13,281][17621] Num frames 9500... [2024-07-05 11:00:13,354][17621] Num frames 9600... [2024-07-05 11:00:13,424][17621] Num frames 9700... [2024-07-05 11:00:13,496][17621] Num frames 9800... [2024-07-05 11:00:13,611][17621] Avg episode rewards: #0: 24.314, true rewards: #0: 10.981 [2024-07-05 11:00:13,612][17621] Avg episode reward: 24.314, avg true_objective: 10.981 [2024-07-05 11:00:13,628][17621] Num frames 9900... [2024-07-05 11:00:13,703][17621] Num frames 10000... [2024-07-05 11:00:13,775][17621] Num frames 10100... [2024-07-05 11:00:13,846][17621] Num frames 10200... [2024-07-05 11:00:13,919][17621] Num frames 10300... [2024-07-05 11:00:13,992][17621] Num frames 10400... [2024-07-05 11:00:14,065][17621] Num frames 10500... [2024-07-05 11:00:14,137][17621] Num frames 10600... 
[2024-07-05 11:00:14,211][17621] Num frames 10700... [2024-07-05 11:00:14,284][17621] Num frames 10800... [2024-07-05 11:00:14,360][17621] Num frames 10900... [2024-07-05 11:00:14,434][17621] Num frames 11000... [2024-07-05 11:00:14,507][17621] Num frames 11100... [2024-07-05 11:00:14,580][17621] Num frames 11200... [2024-07-05 11:00:14,652][17621] Num frames 11300... [2024-07-05 11:00:14,729][17621] Num frames 11400... [2024-07-05 11:00:14,800][17621] Num frames 11500... [2024-07-05 11:00:14,875][17621] Num frames 11600... [2024-07-05 11:00:14,949][17621] Num frames 11700... [2024-07-05 11:00:15,024][17621] Num frames 11800... [2024-07-05 11:00:15,103][17621] Num frames 11900... [2024-07-05 11:00:15,218][17621] Avg episode rewards: #0: 27.083, true rewards: #0: 11.983 [2024-07-05 11:00:15,219][17621] Avg episode reward: 27.083, avg true_objective: 11.983 [2024-07-05 11:00:27,747][17621] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/replay.mp4! [2024-07-05 15:58:39,084][04005] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json... [2024-07-05 15:58:39,086][04005] Rollout worker 0 uses device cpu [2024-07-05 15:58:39,086][04005] Rollout worker 1 uses device cpu [2024-07-05 15:58:39,086][04005] Rollout worker 2 uses device cpu [2024-07-05 15:58:39,087][04005] Rollout worker 3 uses device cpu [2024-07-05 15:58:39,087][04005] Rollout worker 4 uses device cpu [2024-07-05 15:58:39,087][04005] Rollout worker 5 uses device cpu [2024-07-05 15:58:39,088][04005] Rollout worker 6 uses device cpu [2024-07-05 15:58:39,088][04005] Rollout worker 7 uses device cpu [2024-07-05 15:58:39,130][04005] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 15:58:39,131][04005] InferenceWorker_p0-w0: min num requests: 2 [2024-07-05 15:58:39,157][04005] Starting all processes... 
[2024-07-05 15:58:39,157][04005] Starting process learner_proc0 [2024-07-05 15:58:39,884][04005] Starting all processes... [2024-07-05 15:58:39,889][04005] Starting process inference_proc0-0 [2024-07-05 15:58:39,889][04005] Starting process rollout_proc0 [2024-07-05 15:58:39,890][04005] Starting process rollout_proc1 [2024-07-05 15:58:39,890][04005] Starting process rollout_proc2 [2024-07-05 15:58:39,890][04005] Starting process rollout_proc3 [2024-07-05 15:58:39,891][04005] Starting process rollout_proc4 [2024-07-05 15:58:39,891][04005] Starting process rollout_proc5 [2024-07-05 15:58:39,892][04005] Starting process rollout_proc6 [2024-07-05 15:58:39,894][04005] Starting process rollout_proc7 [2024-07-05 15:58:42,480][04599] Worker 3 uses CPU cores [6, 7] [2024-07-05 15:58:42,497][04596] Worker 2 uses CPU cores [4, 5] [2024-07-05 15:58:42,498][04581] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 15:58:42,498][04581] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-07-05 15:58:42,548][04601] Worker 7 uses CPU cores [14, 15] [2024-07-05 15:58:42,553][04581] Num visible devices: 1 [2024-07-05 15:58:42,567][04595] Worker 0 uses CPU cores [0, 1] [2024-07-05 15:58:42,585][04581] Setting fixed seed 200 [2024-07-05 15:58:42,588][04581] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 15:58:42,588][04581] Initializing actor-critic model on device cuda:0 [2024-07-05 15:58:42,588][04581] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 15:58:42,589][04581] RunningMeanStd input shape: (1,) [2024-07-05 15:58:42,598][04581] Num input channels: 3 [2024-07-05 15:58:42,599][04602] Worker 6 uses CPU cores [12, 13] [2024-07-05 15:58:42,630][04581] Convolutional layer output size: 4608 [2024-07-05 15:58:42,644][04581] Policy head output size: 512 [2024-07-05 15:58:42,718][04598] Worker 4 uses CPU cores [8, 9] [2024-07-05 15:58:42,762][04581] Created Actor Critic model with architecture: 
[2024-07-05 15:58:42,762][04581] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ResnetEncoder( (conv_head): Sequential( (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (2): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (3): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (6): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (7): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (9): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (10): ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (11): 
ResBlock( (res_block_core): Sequential( (0): ELU(alpha=1.0) (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): ELU(alpha=1.0) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (12): ELU(alpha=1.0) ) (mlp_layers): Sequential( (0): Linear(in_features=4608, out_features=512, bias=True) (1): ELU(alpha=1.0) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-07-05 15:58:42,791][04600] Worker 5 uses CPU cores [10, 11] [2024-07-05 15:58:42,808][04597] Worker 1 uses CPU cores [2, 3] [2024-07-05 15:58:42,849][04594] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 15:58:42,849][04594] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-07-05 15:58:42,876][04581] Using optimizer [2024-07-05 15:58:42,891][04594] Num visible devices: 1 [2024-07-05 15:58:43,368][04581] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004884_20004864.pth... [2024-07-05 15:58:43,439][04581] Loading model from checkpoint [2024-07-05 15:58:43,440][04581] Loaded experiment state at self.train_step=4884, self.env_steps=20004864 [2024-07-05 15:58:43,441][04581] Initialized policy 0 weights for model version 4884 [2024-07-05 15:58:43,442][04581] LearnerWorker_p0 finished initialization! 
[2024-07-05 15:58:43,442][04581] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-05 15:58:43,501][04594] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 15:58:43,503][04594] RunningMeanStd input shape: (1,) [2024-07-05 15:58:43,510][04594] Num input channels: 3 [2024-07-05 15:58:43,520][04594] Convolutional layer output size: 4608 [2024-07-05 15:58:43,531][04594] Policy head output size: 512 [2024-07-05 15:58:43,652][04005] Inference worker 0-0 is ready! [2024-07-05 15:58:43,653][04005] All inference workers are ready! Signal rollout workers to start! [2024-07-05 15:58:43,680][04602] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 15:58:43,680][04598] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 15:58:43,681][04600] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 15:58:43,681][04601] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 15:58:43,681][04599] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 15:58:43,681][04595] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 15:58:43,682][04597] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 15:58:43,682][04596] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 15:58:43,941][04005] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 20004864. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-05 15:58:44,254][04602] Decorrelating experience for 0 frames... [2024-07-05 15:58:44,254][04601] Decorrelating experience for 0 frames... [2024-07-05 15:58:44,254][04597] Decorrelating experience for 0 frames... [2024-07-05 15:58:44,254][04600] Decorrelating experience for 0 frames... [2024-07-05 15:58:44,254][04598] Decorrelating experience for 0 frames... [2024-07-05 15:58:44,254][04595] Decorrelating experience for 0 frames... [2024-07-05 15:58:44,402][04601] Decorrelating experience for 32 frames... 
[2024-07-05 15:58:44,402][04595] Decorrelating experience for 32 frames... [2024-07-05 15:58:44,402][04597] Decorrelating experience for 32 frames... [2024-07-05 15:58:44,405][04602] Decorrelating experience for 32 frames... [2024-07-05 15:58:44,414][04596] Decorrelating experience for 0 frames... [2024-07-05 15:58:44,414][04599] Decorrelating experience for 0 frames... [2024-07-05 15:58:44,562][04600] Decorrelating experience for 32 frames... [2024-07-05 15:58:44,562][04598] Decorrelating experience for 32 frames... [2024-07-05 15:58:44,597][04597] Decorrelating experience for 64 frames... [2024-07-05 15:58:44,597][04595] Decorrelating experience for 64 frames... [2024-07-05 15:58:44,597][04601] Decorrelating experience for 64 frames... [2024-07-05 15:58:44,719][04596] Decorrelating experience for 32 frames... [2024-07-05 15:58:44,752][04600] Decorrelating experience for 64 frames... [2024-07-05 15:58:44,760][04595] Decorrelating experience for 96 frames... [2024-07-05 15:58:44,760][04597] Decorrelating experience for 96 frames... [2024-07-05 15:58:44,901][04602] Decorrelating experience for 64 frames... [2024-07-05 15:58:44,901][04598] Decorrelating experience for 64 frames... [2024-07-05 15:58:44,906][04596] Decorrelating experience for 64 frames... [2024-07-05 15:58:44,922][04600] Decorrelating experience for 96 frames... [2024-07-05 15:58:45,066][04602] Decorrelating experience for 96 frames... [2024-07-05 15:58:45,067][04598] Decorrelating experience for 96 frames... [2024-07-05 15:58:45,067][04601] Decorrelating experience for 96 frames... [2024-07-05 15:58:45,073][04596] Decorrelating experience for 96 frames... [2024-07-05 15:58:45,228][04599] Decorrelating experience for 32 frames... [2024-07-05 15:58:45,421][04599] Decorrelating experience for 64 frames... [2024-07-05 15:58:45,596][04599] Decorrelating experience for 96 frames... [2024-07-05 15:58:46,023][04581] Signal inference workers to stop experience collection... 
[2024-07-05 15:58:46,029][04594] InferenceWorker_p0-w0: stopping experience collection [2024-07-05 15:58:48,865][04581] Signal inference workers to resume experience collection... [2024-07-05 15:58:48,866][04594] InferenceWorker_p0-w0: resuming experience collection [2024-07-05 15:58:48,942][04005] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 20008960. Throughput: 0: 580.4. Samples: 2902. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-07-05 15:58:48,943][04005] Avg episode reward: [(0, '2.016')] [2024-07-05 15:58:52,050][04594] Updated weights for policy 0, policy_version 4894 (0.0102) [2024-07-05 15:58:53,941][04005] Fps is (10 sec: 6144.0, 60 sec: 6144.0, 300 sec: 6144.0). Total num frames: 20066304. Throughput: 0: 1339.0. Samples: 13390. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-07-05 15:58:53,942][04005] Avg episode reward: [(0, '28.170')] [2024-07-05 15:58:55,529][04594] Updated weights for policy 0, policy_version 4904 (0.0012) [2024-07-05 15:58:58,941][04005] Fps is (10 sec: 11468.9, 60 sec: 7918.9, 300 sec: 7918.9). Total num frames: 20123648. Throughput: 0: 2075.2. Samples: 31128. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 15:58:58,942][04005] Avg episode reward: [(0, '31.070')] [2024-07-05 15:58:58,981][04594] Updated weights for policy 0, policy_version 4914 (0.0012) [2024-07-05 15:58:59,123][04005] Heartbeat connected on Batcher_0 [2024-07-05 15:58:59,134][04005] Heartbeat connected on RolloutWorker_w0 [2024-07-05 15:58:59,139][04005] Heartbeat connected on RolloutWorker_w1 [2024-07-05 15:58:59,140][04005] Heartbeat connected on RolloutWorker_w2 [2024-07-05 15:58:59,144][04005] Heartbeat connected on RolloutWorker_w3 [2024-07-05 15:58:59,147][04005] Heartbeat connected on RolloutWorker_w4 [2024-07-05 15:58:59,147][04005] Heartbeat connected on InferenceWorker_p0-w0 [2024-07-05 15:58:59,150][04005] Heartbeat connected on RolloutWorker_w5 [2024-07-05 15:58:59,153][04005] Heartbeat connected on RolloutWorker_w6 [2024-07-05 15:58:59,155][04005] Heartbeat connected on RolloutWorker_w7 [2024-07-05 15:58:59,332][04005] Heartbeat connected on LearnerWorker_p0 [2024-07-05 15:59:02,435][04594] Updated weights for policy 0, policy_version 4924 (0.0012) [2024-07-05 15:59:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 9011.2, 300 sec: 9011.2). Total num frames: 20185088. Throughput: 0: 2000.1. Samples: 40002. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 15:59:03,942][04005] Avg episode reward: [(0, '36.562')] [2024-07-05 15:59:05,898][04594] Updated weights for policy 0, policy_version 4934 (0.0012) [2024-07-05 15:59:08,942][04005] Fps is (10 sec: 11878.3, 60 sec: 9502.7, 300 sec: 9502.7). Total num frames: 20242432. Throughput: 0: 2303.9. Samples: 57598. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 15:59:08,943][04005] Avg episode reward: [(0, '36.161')] [2024-07-05 15:59:09,433][04594] Updated weights for policy 0, policy_version 4944 (0.0012) [2024-07-05 15:59:12,948][04594] Updated weights for policy 0, policy_version 4954 (0.0012) [2024-07-05 15:59:13,941][04005] Fps is (10 sec: 11468.7, 60 sec: 9830.4, 300 sec: 9830.4). Total num frames: 20299776. Throughput: 0: 2500.6. Samples: 75018. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 15:59:13,942][04005] Avg episode reward: [(0, '36.971')] [2024-07-05 15:59:16,445][04594] Updated weights for policy 0, policy_version 4964 (0.0012) [2024-07-05 15:59:18,941][04005] Fps is (10 sec: 11878.6, 60 sec: 10181.5, 300 sec: 10181.5). Total num frames: 20361216. Throughput: 0: 2399.7. Samples: 83990. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 15:59:18,942][04005] Avg episode reward: [(0, '39.046')] [2024-07-05 15:59:19,904][04594] Updated weights for policy 0, policy_version 4974 (0.0012) [2024-07-05 15:59:23,368][04594] Updated weights for policy 0, policy_version 4984 (0.0011) [2024-07-05 15:59:23,942][04005] Fps is (10 sec: 11878.3, 60 sec: 10342.4, 300 sec: 10342.4). Total num frames: 20418560. Throughput: 0: 2539.6. Samples: 101584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 15:59:23,943][04005] Avg episode reward: [(0, '39.963')] [2024-07-05 15:59:26,839][04594] Updated weights for policy 0, policy_version 4994 (0.0012) [2024-07-05 15:59:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 10558.6, 300 sec: 10558.6). Total num frames: 20480000. Throughput: 0: 2657.6. Samples: 119590. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 15:59:28,942][04005] Avg episode reward: [(0, '40.570')] [2024-07-05 15:59:30,322][04594] Updated weights for policy 0, policy_version 5004 (0.0011) [2024-07-05 15:59:33,790][04594] Updated weights for policy 0, policy_version 5014 (0.0012) [2024-07-05 15:59:33,942][04005] Fps is (10 sec: 11878.4, 60 sec: 10649.6, 300 sec: 10649.6). Total num frames: 20537344. Throughput: 0: 2783.6. Samples: 128164. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 15:59:33,943][04005] Avg episode reward: [(0, '38.056')] [2024-07-05 15:59:37,260][04594] Updated weights for policy 0, policy_version 5024 (0.0011) [2024-07-05 15:59:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 10724.1, 300 sec: 10724.1). Total num frames: 20594688. Throughput: 0: 2946.8. Samples: 145998. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 15:59:38,942][04005] Avg episode reward: [(0, '37.824')] [2024-07-05 15:59:40,744][04594] Updated weights for policy 0, policy_version 5034 (0.0011) [2024-07-05 15:59:43,942][04005] Fps is (10 sec: 11878.4, 60 sec: 10854.4, 300 sec: 10854.4). Total num frames: 20656128. Throughput: 0: 2946.5. Samples: 163720. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 15:59:43,942][04005] Avg episode reward: [(0, '36.803')] [2024-07-05 15:59:44,218][04594] Updated weights for policy 0, policy_version 5044 (0.0012) [2024-07-05 15:59:47,729][04594] Updated weights for policy 0, policy_version 5054 (0.0011) [2024-07-05 15:59:48,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 10901.6). Total num frames: 20713472. Throughput: 0: 2938.9. Samples: 172254. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 15:59:48,942][04005] Avg episode reward: [(0, '38.379')] [2024-07-05 15:59:51,202][04594] Updated weights for policy 0, policy_version 5064 (0.0011) [2024-07-05 15:59:53,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 10942.2). Total num frames: 20770816. 
Throughput: 0: 2945.4. Samples: 190142. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 15:59:53,943][04005] Avg episode reward: [(0, '36.251')] [2024-07-05 15:59:54,677][04594] Updated weights for policy 0, policy_version 5074 (0.0012) [2024-07-05 15:59:58,159][04594] Updated weights for policy 0, policy_version 5084 (0.0011) [2024-07-05 15:59:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11031.9). Total num frames: 20832256. Throughput: 0: 2951.2. Samples: 207820. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 15:59:58,942][04005] Avg episode reward: [(0, '34.416')] [2024-07-05 16:00:01,637][04594] Updated weights for policy 0, policy_version 5094 (0.0012) [2024-07-05 16:00:03,942][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11059.2). Total num frames: 20889600. Throughput: 0: 2943.8. Samples: 216462. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 16:00:03,943][04005] Avg episode reward: [(0, '35.229')] [2024-07-05 16:00:05,119][04594] Updated weights for policy 0, policy_version 5104 (0.0012) [2024-07-05 16:00:08,628][04594] Updated weights for policy 0, policy_version 5114 (0.0012) [2024-07-05 16:00:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11083.3). Total num frames: 20946944. Throughput: 0: 2948.4. Samples: 234262. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 16:00:08,943][04005] Avg episode reward: [(0, '37.528')] [2024-07-05 16:00:12,118][04594] Updated weights for policy 0, policy_version 5124 (0.0011) [2024-07-05 16:00:13,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11150.2). Total num frames: 21008384. Throughput: 0: 2938.4. Samples: 251820. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 16:00:13,943][04005] Avg episode reward: [(0, '38.785')] [2024-07-05 16:00:15,607][04594] Updated weights for policy 0, policy_version 5134 (0.0012) [2024-07-05 16:00:18,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11167.0). Total num frames: 21065728. Throughput: 0: 2938.2. Samples: 260384. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 16:00:18,942][04005] Avg episode reward: [(0, '36.914')] [2024-07-05 16:00:19,099][04594] Updated weights for policy 0, policy_version 5144 (0.0012) [2024-07-05 16:00:22,588][04594] Updated weights for policy 0, policy_version 5154 (0.0011) [2024-07-05 16:00:23,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11741.9, 300 sec: 11182.1). Total num frames: 21123072. Throughput: 0: 2937.6. Samples: 278188. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 16:00:23,942][04005] Avg episode reward: [(0, '35.316')] [2024-07-05 16:00:26,076][04594] Updated weights for policy 0, policy_version 5164 (0.0012) [2024-07-05 16:00:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11234.8). Total num frames: 21184512. Throughput: 0: 2934.7. Samples: 295780. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 16:00:28,942][04005] Avg episode reward: [(0, '33.375')] [2024-07-05 16:00:29,573][04594] Updated weights for policy 0, policy_version 5174 (0.0011) [2024-07-05 16:00:33,064][04594] Updated weights for policy 0, policy_version 5184 (0.0011) [2024-07-05 16:00:33,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11245.4). Total num frames: 21241856. Throughput: 0: 2934.5. Samples: 304306. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 16:00:33,942][04005] Avg episode reward: [(0, '36.148')] [2024-07-05 16:00:34,116][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000005187_21245952.pth... 
[2024-07-05 16:00:34,191][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004807_19689472.pth [2024-07-05 16:00:36,597][04594] Updated weights for policy 0, policy_version 5194 (0.0011) [2024-07-05 16:00:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11255.1). Total num frames: 21299200. Throughput: 0: 2925.5. Samples: 321790. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-07-05 16:00:38,942][04005] Avg episode reward: [(0, '35.679')] [2024-07-05 16:00:40,107][04594] Updated weights for policy 0, policy_version 5204 (0.0012) [2024-07-05 16:00:43,598][04594] Updated weights for policy 0, policy_version 5214 (0.0011) [2024-07-05 16:00:43,942][04005] Fps is (10 sec: 11468.5, 60 sec: 11673.6, 300 sec: 11264.0). Total num frames: 21356544. Throughput: 0: 2923.9. Samples: 339398. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-07-05 16:00:43,943][04005] Avg episode reward: [(0, '38.941')] [2024-07-05 16:00:47,097][04594] Updated weights for policy 0, policy_version 5224 (0.0012) [2024-07-05 16:00:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11305.0). Total num frames: 21417984. Throughput: 0: 2926.1. Samples: 348138. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:00:48,942][04005] Avg episode reward: [(0, '38.210')] [2024-07-05 16:00:50,591][04594] Updated weights for policy 0, policy_version 5234 (0.0012) [2024-07-05 16:00:53,942][04005] Fps is (10 sec: 11878.7, 60 sec: 11741.9, 300 sec: 11311.3). Total num frames: 21475328. Throughput: 0: 2918.9. Samples: 365614. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:00:53,942][04005] Avg episode reward: [(0, '37.468')] [2024-07-05 16:00:54,085][04594] Updated weights for policy 0, policy_version 5244 (0.0012) [2024-07-05 16:00:57,592][04594] Updated weights for policy 0, policy_version 5254 (0.0011) [2024-07-05 16:00:58,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11317.1). Total num frames: 21532672. Throughput: 0: 2920.1. Samples: 383222. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:00:58,942][04005] Avg episode reward: [(0, '37.018')] [2024-07-05 16:01:01,076][04594] Updated weights for policy 0, policy_version 5264 (0.0011) [2024-07-05 16:01:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11351.8). Total num frames: 21594112. Throughput: 0: 2927.9. Samples: 392138. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:03,942][04005] Avg episode reward: [(0, '39.217')] [2024-07-05 16:01:04,574][04594] Updated weights for policy 0, policy_version 5274 (0.0011) [2024-07-05 16:01:08,064][04594] Updated weights for policy 0, policy_version 5284 (0.0012) [2024-07-05 16:01:08,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11355.8). Total num frames: 21651456. Throughput: 0: 2920.6. Samples: 409616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:08,943][04005] Avg episode reward: [(0, '38.099')] [2024-07-05 16:01:11,564][04594] Updated weights for policy 0, policy_version 5294 (0.0012) [2024-07-05 16:01:13,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11359.6). Total num frames: 21708800. Throughput: 0: 2919.3. Samples: 427148. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:13,943][04005] Avg episode reward: [(0, '38.902')] [2024-07-05 16:01:15,051][04594] Updated weights for policy 0, policy_version 5304 (0.0011) [2024-07-05 16:01:18,536][04594] Updated weights for policy 0, policy_version 5314 (0.0011) [2024-07-05 16:01:18,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11389.5). Total num frames: 21770240. Throughput: 0: 2929.9. Samples: 436150. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:18,942][04005] Avg episode reward: [(0, '37.198')] [2024-07-05 16:01:22,040][04594] Updated weights for policy 0, policy_version 5324 (0.0012) [2024-07-05 16:01:23,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11392.0). Total num frames: 21827584. Throughput: 0: 2929.9. Samples: 453638. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:23,943][04005] Avg episode reward: [(0, '35.189')] [2024-07-05 16:01:25,536][04594] Updated weights for policy 0, policy_version 5334 (0.0012) [2024-07-05 16:01:28,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11394.3). Total num frames: 21884928. Throughput: 0: 2927.5. Samples: 471134. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:28,942][04005] Avg episode reward: [(0, '36.917')] [2024-07-05 16:01:29,033][04594] Updated weights for policy 0, policy_version 5344 (0.0012) [2024-07-05 16:01:32,538][04594] Updated weights for policy 0, policy_version 5354 (0.0011) [2024-07-05 16:01:33,941][04005] Fps is (10 sec: 11878.7, 60 sec: 11741.9, 300 sec: 11420.6). Total num frames: 21946368. Throughput: 0: 2932.4. Samples: 480096. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:33,942][04005] Avg episode reward: [(0, '36.823')] [2024-07-05 16:01:36,057][04594] Updated weights for policy 0, policy_version 5364 (0.0011) [2024-07-05 16:01:38,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11422.0). Total num frames: 22003712. 
Throughput: 0: 2931.7. Samples: 497540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:38,942][04005] Avg episode reward: [(0, '35.043')] [2024-07-05 16:01:39,567][04594] Updated weights for policy 0, policy_version 5374 (0.0012) [2024-07-05 16:01:43,080][04594] Updated weights for policy 0, policy_version 5384 (0.0012) [2024-07-05 16:01:43,942][04005] Fps is (10 sec: 11468.5, 60 sec: 11741.9, 300 sec: 11423.3). Total num frames: 22061056. Throughput: 0: 2928.9. Samples: 515022. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:43,943][04005] Avg episode reward: [(0, '36.434')] [2024-07-05 16:01:46,605][04594] Updated weights for policy 0, policy_version 5394 (0.0012) [2024-07-05 16:01:48,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11424.5). Total num frames: 22118400. Throughput: 0: 2923.9. Samples: 523712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:48,942][04005] Avg episode reward: [(0, '34.980')] [2024-07-05 16:01:50,164][04594] Updated weights for policy 0, policy_version 5404 (0.0012) [2024-07-05 16:01:53,709][04594] Updated weights for policy 0, policy_version 5414 (0.0011) [2024-07-05 16:01:53,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11673.6, 300 sec: 11425.7). Total num frames: 22175744. Throughput: 0: 2915.2. Samples: 540800. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:53,942][04005] Avg episode reward: [(0, '35.104')] [2024-07-05 16:01:57,212][04594] Updated weights for policy 0, policy_version 5424 (0.0012) [2024-07-05 16:01:58,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11426.8). Total num frames: 22233088. Throughput: 0: 2918.5. Samples: 558480. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:01:58,943][04005] Avg episode reward: [(0, '34.921')] [2024-07-05 16:02:00,711][04594] Updated weights for policy 0, policy_version 5434 (0.0012) [2024-07-05 16:02:03,942][04005] Fps is (10 sec: 11878.0, 60 sec: 11673.5, 300 sec: 11448.3). Total num frames: 22294528. Throughput: 0: 2913.1. Samples: 567240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:03,943][04005] Avg episode reward: [(0, '37.775')] [2024-07-05 16:02:04,208][04594] Updated weights for policy 0, policy_version 5444 (0.0011) [2024-07-05 16:02:07,729][04594] Updated weights for policy 0, policy_version 5454 (0.0012) [2024-07-05 16:02:08,942][04005] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11448.8). Total num frames: 22351872. Throughput: 0: 2912.4. Samples: 584698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:08,943][04005] Avg episode reward: [(0, '38.988')] [2024-07-05 16:02:11,239][04594] Updated weights for policy 0, policy_version 5464 (0.0012) [2024-07-05 16:02:13,941][04005] Fps is (10 sec: 11469.1, 60 sec: 11673.6, 300 sec: 11449.3). Total num frames: 22409216. Throughput: 0: 2911.2. Samples: 602140. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:13,942][04005] Avg episode reward: [(0, '37.844')] [2024-07-05 16:02:14,746][04594] Updated weights for policy 0, policy_version 5474 (0.0012) [2024-07-05 16:02:18,250][04594] Updated weights for policy 0, policy_version 5484 (0.0012) [2024-07-05 16:02:18,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11449.8). Total num frames: 22466560. Throughput: 0: 2910.7. Samples: 611080. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:18,942][04005] Avg episode reward: [(0, '37.513')] [2024-07-05 16:02:21,757][04594] Updated weights for policy 0, policy_version 5494 (0.0011) [2024-07-05 16:02:23,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11468.8). Total num frames: 22528000. 
Throughput: 0: 2912.4. Samples: 628596. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:23,942][04005] Avg episode reward: [(0, '37.673')] [2024-07-05 16:02:25,261][04594] Updated weights for policy 0, policy_version 5504 (0.0012) [2024-07-05 16:02:28,747][04594] Updated weights for policy 0, policy_version 5514 (0.0012) [2024-07-05 16:02:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11468.8). Total num frames: 22585344. Throughput: 0: 2912.9. Samples: 646104. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:28,942][04005] Avg episode reward: [(0, '38.139')] [2024-07-05 16:02:32,243][04594] Updated weights for policy 0, policy_version 5524 (0.0011) [2024-07-05 16:02:33,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11468.8). Total num frames: 22642688. Throughput: 0: 2918.9. Samples: 655064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:33,942][04005] Avg episode reward: [(0, '36.187')] [2024-07-05 16:02:33,988][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000005529_22646784.pth... [2024-07-05 16:02:34,060][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000004884_20004864.pth [2024-07-05 16:02:35,758][04594] Updated weights for policy 0, policy_version 5534 (0.0012) [2024-07-05 16:02:38,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11486.2). Total num frames: 22704128. Throughput: 0: 2928.1. Samples: 672564. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:38,942][04005] Avg episode reward: [(0, '35.902')] [2024-07-05 16:02:39,272][04594] Updated weights for policy 0, policy_version 5544 (0.0011) [2024-07-05 16:02:42,781][04594] Updated weights for policy 0, policy_version 5554 (0.0012) [2024-07-05 16:02:43,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11485.9). Total num frames: 22761472. 
Throughput: 0: 2922.9. Samples: 690012. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:43,942][04005] Avg episode reward: [(0, '36.412')] [2024-07-05 16:02:46,287][04594] Updated weights for policy 0, policy_version 5564 (0.0012) [2024-07-05 16:02:48,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11485.5). Total num frames: 22818816. Throughput: 0: 2919.7. Samples: 698624. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:48,942][04005] Avg episode reward: [(0, '39.078')] [2024-07-05 16:02:49,783][04594] Updated weights for policy 0, policy_version 5574 (0.0011) [2024-07-05 16:02:53,276][04594] Updated weights for policy 0, policy_version 5584 (0.0011) [2024-07-05 16:02:53,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11485.2). Total num frames: 22876160. Throughput: 0: 2927.6. Samples: 716440. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:53,942][04005] Avg episode reward: [(0, '39.994')] [2024-07-05 16:02:56,778][04594] Updated weights for policy 0, policy_version 5594 (0.0011) [2024-07-05 16:02:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11500.9). Total num frames: 22937600. Throughput: 0: 2928.7. Samples: 733932. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:02:58,942][04005] Avg episode reward: [(0, '39.747')] [2024-07-05 16:03:00,285][04594] Updated weights for policy 0, policy_version 5604 (0.0012) [2024-07-05 16:03:03,779][04594] Updated weights for policy 0, policy_version 5614 (0.0012) [2024-07-05 16:03:03,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11500.3). Total num frames: 22994944. Throughput: 0: 2919.5. Samples: 742456. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:03:03,942][04005] Avg episode reward: [(0, '40.073')] [2024-07-05 16:03:07,291][04594] Updated weights for policy 0, policy_version 5624 (0.0012) [2024-07-05 16:03:08,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11499.7). Total num frames: 23052288. Throughput: 0: 2920.2. Samples: 760004. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:03:08,943][04005] Avg episode reward: [(0, '39.329')] [2024-07-05 16:03:10,802][04594] Updated weights for policy 0, policy_version 5634 (0.0011) [2024-07-05 16:03:13,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11499.1). Total num frames: 23109632. Throughput: 0: 2922.3. Samples: 777608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:13,943][04005] Avg episode reward: [(0, '38.912')] [2024-07-05 16:03:14,303][04594] Updated weights for policy 0, policy_version 5644 (0.0011) [2024-07-05 16:03:17,809][04594] Updated weights for policy 0, policy_version 5654 (0.0012) [2024-07-05 16:03:18,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11513.5). Total num frames: 23171072. Throughput: 0: 2916.2. Samples: 786294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:18,942][04005] Avg episode reward: [(0, '38.858')] [2024-07-05 16:03:21,322][04594] Updated weights for policy 0, policy_version 5664 (0.0012) [2024-07-05 16:03:23,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11512.7). Total num frames: 23228416. Throughput: 0: 2915.8. Samples: 803776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:23,942][04005] Avg episode reward: [(0, '39.360')] [2024-07-05 16:03:24,812][04594] Updated weights for policy 0, policy_version 5674 (0.0012) [2024-07-05 16:03:28,309][04594] Updated weights for policy 0, policy_version 5684 (0.0012) [2024-07-05 16:03:28,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11511.9). Total num frames: 23285760. 
Throughput: 0: 2916.7. Samples: 821262. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:28,942][04005] Avg episode reward: [(0, '39.606')] [2024-07-05 16:03:31,805][04594] Updated weights for policy 0, policy_version 5694 (0.0012) [2024-07-05 16:03:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11525.3). Total num frames: 23347200. Throughput: 0: 2924.8. Samples: 830242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:33,942][04005] Avg episode reward: [(0, '37.605')] [2024-07-05 16:03:35,326][04594] Updated weights for policy 0, policy_version 5704 (0.0012) [2024-07-05 16:03:38,833][04594] Updated weights for policy 0, policy_version 5714 (0.0011) [2024-07-05 16:03:38,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11524.3). Total num frames: 23404544. Throughput: 0: 2916.4. Samples: 847676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:38,942][04005] Avg episode reward: [(0, '37.575')] [2024-07-05 16:03:42,337][04594] Updated weights for policy 0, policy_version 5724 (0.0012) [2024-07-05 16:03:43,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 23461888. Throughput: 0: 2916.6. Samples: 865180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:43,942][04005] Avg episode reward: [(0, '38.607')] [2024-07-05 16:03:45,834][04594] Updated weights for policy 0, policy_version 5734 (0.0011) [2024-07-05 16:03:48,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 23519232. Throughput: 0: 2926.0. Samples: 874128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:48,943][04005] Avg episode reward: [(0, '37.394')] [2024-07-05 16:03:49,344][04594] Updated weights for policy 0, policy_version 5744 (0.0011) [2024-07-05 16:03:52,832][04594] Updated weights for policy 0, policy_version 5754 (0.0012) [2024-07-05 16:03:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 23580672. Throughput: 0: 2925.6. Samples: 891656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:03:53,943][04005] Avg episode reward: [(0, '34.938')] [2024-07-05 16:03:56,344][04594] Updated weights for policy 0, policy_version 5764 (0.0011) [2024-07-05 16:03:58,941][04005] Fps is (10 sec: 11878.7, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 23638016. Throughput: 0: 2923.2. Samples: 909152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:03:58,942][04005] Avg episode reward: [(0, '33.847')] [2024-07-05 16:03:59,837][04594] Updated weights for policy 0, policy_version 5774 (0.0011) [2024-07-05 16:04:03,327][04594] Updated weights for policy 0, policy_version 5784 (0.0012) [2024-07-05 16:04:03,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 23695360. Throughput: 0: 2927.8. Samples: 918044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:04:03,943][04005] Avg episode reward: [(0, '36.956')] [2024-07-05 16:04:06,826][04594] Updated weights for policy 0, policy_version 5794 (0.0012) [2024-07-05 16:04:08,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 23756800. Throughput: 0: 2930.4. Samples: 935646. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:04:08,942][04005] Avg episode reward: [(0, '37.428')] [2024-07-05 16:04:10,345][04594] Updated weights for policy 0, policy_version 5804 (0.0012) [2024-07-05 16:04:13,850][04594] Updated weights for policy 0, policy_version 5814 (0.0012) [2024-07-05 16:04:13,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 23814144. Throughput: 0: 2928.9. Samples: 953064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:04:13,942][04005] Avg episode reward: [(0, '38.566')] [2024-07-05 16:04:17,364][04594] Updated weights for policy 0, policy_version 5824 (0.0012) [2024-07-05 16:04:18,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 23871488. Throughput: 0: 2918.1. Samples: 961558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:04:18,942][04005] Avg episode reward: [(0, '39.139')] [2024-07-05 16:04:20,869][04594] Updated weights for policy 0, policy_version 5834 (0.0012) [2024-07-05 16:04:23,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 23928832. Throughput: 0: 2923.1. Samples: 979216. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:04:23,942][04005] Avg episode reward: [(0, '39.102')] [2024-07-05 16:04:24,378][04594] Updated weights for policy 0, policy_version 5844 (0.0012) [2024-07-05 16:04:27,876][04594] Updated weights for policy 0, policy_version 5854 (0.0012) [2024-07-05 16:04:28,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 23990272. Throughput: 0: 2927.2. Samples: 996904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:04:28,942][04005] Avg episode reward: [(0, '35.748')] [2024-07-05 16:04:31,391][04594] Updated weights for policy 0, policy_version 5864 (0.0012) [2024-07-05 16:04:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 24047616. 
Throughput: 0: 2918.4. Samples: 1005456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:04:33,942][04005] Avg episode reward: [(0, '36.600')] [2024-07-05 16:04:34,180][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000005872_24051712.pth... [2024-07-05 16:04:34,252][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000005187_21245952.pth [2024-07-05 16:04:34,893][04594] Updated weights for policy 0, policy_version 5874 (0.0011) [2024-07-05 16:04:38,387][04594] Updated weights for policy 0, policy_version 5884 (0.0012) [2024-07-05 16:04:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24104960. Throughput: 0: 2917.9. Samples: 1022960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:04:38,943][04005] Avg episode reward: [(0, '35.120')] [2024-07-05 16:04:41,908][04594] Updated weights for policy 0, policy_version 5894 (0.0012) [2024-07-05 16:04:43,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24162304. Throughput: 0: 2917.0. Samples: 1040416. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:04:43,942][04005] Avg episode reward: [(0, '37.577')] [2024-07-05 16:04:45,409][04594] Updated weights for policy 0, policy_version 5904 (0.0012) [2024-07-05 16:04:48,928][04594] Updated weights for policy 0, policy_version 5914 (0.0012) [2024-07-05 16:04:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.9). Total num frames: 24223744. Throughput: 0: 2918.6. Samples: 1049380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:04:48,942][04005] Avg episode reward: [(0, '39.249')] [2024-07-05 16:04:52,435][04594] Updated weights for policy 0, policy_version 5924 (0.0012) [2024-07-05 16:04:53,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24281088. 
Throughput: 0: 2916.0. Samples: 1066866. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:04:53,942][04005] Avg episode reward: [(0, '41.215')] [2024-07-05 16:04:54,182][04581] Saving new best policy, reward=41.215! [2024-07-05 16:04:55,948][04594] Updated weights for policy 0, policy_version 5934 (0.0012) [2024-07-05 16:04:58,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24338432. Throughput: 0: 2917.2. Samples: 1084338. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:04:58,942][04005] Avg episode reward: [(0, '38.113')] [2024-07-05 16:04:59,449][04594] Updated weights for policy 0, policy_version 5944 (0.0012) [2024-07-05 16:05:02,963][04594] Updated weights for policy 0, policy_version 5954 (0.0012) [2024-07-05 16:05:03,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24395776. Throughput: 0: 2926.9. Samples: 1093270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:03,942][04005] Avg episode reward: [(0, '36.393')] [2024-07-05 16:05:06,475][04594] Updated weights for policy 0, policy_version 5964 (0.0012) [2024-07-05 16:05:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24457216. Throughput: 0: 2923.0. Samples: 1110750. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:05:08,942][04005] Avg episode reward: [(0, '34.817')] [2024-07-05 16:05:10,003][04594] Updated weights for policy 0, policy_version 5974 (0.0011) [2024-07-05 16:05:13,497][04594] Updated weights for policy 0, policy_version 5984 (0.0011) [2024-07-05 16:05:13,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24514560. Throughput: 0: 2916.9. Samples: 1128166. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:13,942][04005] Avg episode reward: [(0, '35.833')] [2024-07-05 16:05:17,002][04594] Updated weights for policy 0, policy_version 5994 (0.0011) [2024-07-05 16:05:18,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24571904. Throughput: 0: 2916.9. Samples: 1136718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:18,942][04005] Avg episode reward: [(0, '36.809')] [2024-07-05 16:05:20,516][04594] Updated weights for policy 0, policy_version 6004 (0.0012) [2024-07-05 16:05:23,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 24629248. Throughput: 0: 2920.1. Samples: 1154366. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:23,942][04005] Avg episode reward: [(0, '35.293')] [2024-07-05 16:05:24,013][04594] Updated weights for policy 0, policy_version 6014 (0.0012) [2024-07-05 16:05:27,509][04594] Updated weights for policy 0, policy_version 6024 (0.0012) [2024-07-05 16:05:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24690688. Throughput: 0: 2926.6. Samples: 1172112. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:05:28,942][04005] Avg episode reward: [(0, '38.152')] [2024-07-05 16:05:31,034][04594] Updated weights for policy 0, policy_version 6034 (0.0011) [2024-07-05 16:05:33,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24748032. Throughput: 0: 2916.7. Samples: 1180632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:33,942][04005] Avg episode reward: [(0, '38.182')] [2024-07-05 16:05:34,542][04594] Updated weights for policy 0, policy_version 6044 (0.0011) [2024-07-05 16:05:38,052][04594] Updated weights for policy 0, policy_version 6054 (0.0012) [2024-07-05 16:05:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24805376. 
Throughput: 0: 2915.3. Samples: 1198056. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:05:38,942][04005] Avg episode reward: [(0, '38.365')] [2024-07-05 16:05:41,560][04594] Updated weights for policy 0, policy_version 6064 (0.0012) [2024-07-05 16:05:43,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 24862720. Throughput: 0: 2914.9. Samples: 1215510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:43,942][04005] Avg episode reward: [(0, '37.063')] [2024-07-05 16:05:45,077][04594] Updated weights for policy 0, policy_version 6074 (0.0013) [2024-07-05 16:05:48,577][04594] Updated weights for policy 0, policy_version 6084 (0.0011) [2024-07-05 16:05:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24924160. Throughput: 0: 2915.2. Samples: 1224456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:48,942][04005] Avg episode reward: [(0, '38.329')] [2024-07-05 16:05:52,084][04594] Updated weights for policy 0, policy_version 6094 (0.0012) [2024-07-05 16:05:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 24981504. Throughput: 0: 2915.1. Samples: 1241930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:53,942][04005] Avg episode reward: [(0, '37.814')] [2024-07-05 16:05:55,596][04594] Updated weights for policy 0, policy_version 6104 (0.0012) [2024-07-05 16:05:58,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25038848. Throughput: 0: 2916.0. Samples: 1259386. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:05:58,943][04005] Avg episode reward: [(0, '37.550')] [2024-07-05 16:05:59,115][04594] Updated weights for policy 0, policy_version 6114 (0.0012) [2024-07-05 16:06:02,643][04594] Updated weights for policy 0, policy_version 6124 (0.0011) [2024-07-05 16:06:03,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25096192. Throughput: 0: 2920.9. Samples: 1268160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:06:03,942][04005] Avg episode reward: [(0, '38.212')] [2024-07-05 16:06:06,139][04594] Updated weights for policy 0, policy_version 6134 (0.0012) [2024-07-05 16:06:08,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11605.3, 300 sec: 11677.1). Total num frames: 25153536. Throughput: 0: 2918.8. Samples: 1285710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:06:08,942][04005] Avg episode reward: [(0, '39.495')] [2024-07-05 16:06:09,678][04594] Updated weights for policy 0, policy_version 6144 (0.0012) [2024-07-05 16:06:13,188][04594] Updated weights for policy 0, policy_version 6154 (0.0012) [2024-07-05 16:06:13,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25214976. Throughput: 0: 2912.0. Samples: 1303150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:06:13,942][04005] Avg episode reward: [(0, '41.898')] [2024-07-05 16:06:13,945][04581] Saving new best policy, reward=41.898! [2024-07-05 16:06:16,703][04594] Updated weights for policy 0, policy_version 6164 (0.0012) [2024-07-05 16:06:18,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25272320. Throughput: 0: 2912.1. Samples: 1311678. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:06:18,942][04005] Avg episode reward: [(0, '41.385')] [2024-07-05 16:06:20,203][04594] Updated weights for policy 0, policy_version 6174 (0.0012) [2024-07-05 16:06:23,713][04594] Updated weights for policy 0, policy_version 6184 (0.0012) [2024-07-05 16:06:23,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25329664. Throughput: 0: 2912.0. Samples: 1329098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:06:23,942][04005] Avg episode reward: [(0, '38.714')] [2024-07-05 16:06:27,216][04594] Updated weights for policy 0, policy_version 6194 (0.0012) [2024-07-05 16:06:28,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 25387008. Throughput: 0: 2916.3. Samples: 1346742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:06:28,942][04005] Avg episode reward: [(0, '35.164')] [2024-07-05 16:06:30,733][04594] Updated weights for policy 0, policy_version 6204 (0.0012) [2024-07-05 16:06:33,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25448448. Throughput: 0: 2912.3. Samples: 1355508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:06:33,943][04005] Avg episode reward: [(0, '33.322')] [2024-07-05 16:06:33,946][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000006213_25448448.pth... [2024-07-05 16:06:34,018][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000005529_22646784.pth [2024-07-05 16:06:34,289][04594] Updated weights for policy 0, policy_version 6214 (0.0012) [2024-07-05 16:06:37,758][04594] Updated weights for policy 0, policy_version 6224 (0.0013) [2024-07-05 16:06:38,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25505792. Throughput: 0: 2911.7. Samples: 1372958. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:06:38,942][04005] Avg episode reward: [(0, '34.852')] [2024-07-05 16:06:41,276][04594] Updated weights for policy 0, policy_version 6234 (0.0012) [2024-07-05 16:06:43,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25563136. Throughput: 0: 2911.9. Samples: 1390422. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:06:43,943][04005] Avg episode reward: [(0, '38.761')] [2024-07-05 16:06:44,784][04594] Updated weights for policy 0, policy_version 6244 (0.0012) [2024-07-05 16:06:48,296][04594] Updated weights for policy 0, policy_version 6254 (0.0011) [2024-07-05 16:06:48,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11677.1). Total num frames: 25620480. Throughput: 0: 2915.3. Samples: 1399350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:06:48,942][04005] Avg episode reward: [(0, '39.320')] [2024-07-05 16:06:51,802][04594] Updated weights for policy 0, policy_version 6264 (0.0011) [2024-07-05 16:06:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 25681920. Throughput: 0: 2913.0. Samples: 1416796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:06:53,942][04005] Avg episode reward: [(0, '39.177')] [2024-07-05 16:06:55,313][04594] Updated weights for policy 0, policy_version 6274 (0.0012) [2024-07-05 16:06:58,832][04594] Updated weights for policy 0, policy_version 6284 (0.0012) [2024-07-05 16:06:58,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25739264. Throughput: 0: 2914.3. Samples: 1434294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:06:58,943][04005] Avg episode reward: [(0, '36.483')] [2024-07-05 16:07:02,347][04594] Updated weights for policy 0, policy_version 6294 (0.0012) [2024-07-05 16:07:03,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25796608. 
Throughput: 0: 2914.6. Samples: 1442834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:07:03,942][04005] Avg episode reward: [(0, '38.030')] [2024-07-05 16:07:05,882][04594] Updated weights for policy 0, policy_version 6304 (0.0012) [2024-07-05 16:07:08,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25853952. Throughput: 0: 2917.4. Samples: 1460380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:07:08,942][04005] Avg episode reward: [(0, '39.230')] [2024-07-05 16:07:09,391][04594] Updated weights for policy 0, policy_version 6314 (0.0012) [2024-07-05 16:07:12,901][04594] Updated weights for policy 0, policy_version 6324 (0.0012) [2024-07-05 16:07:13,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11677.1). Total num frames: 25911296. Throughput: 0: 2917.5. Samples: 1478030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:07:13,942][04005] Avg episode reward: [(0, '39.288')] [2024-07-05 16:07:16,409][04594] Updated weights for policy 0, policy_version 6334 (0.0012) [2024-07-05 16:07:18,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 25972736. Throughput: 0: 2915.9. Samples: 1486722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:07:18,942][04005] Avg episode reward: [(0, '39.710')] [2024-07-05 16:07:19,916][04594] Updated weights for policy 0, policy_version 6344 (0.0012) [2024-07-05 16:07:23,425][04594] Updated weights for policy 0, policy_version 6354 (0.0012) [2024-07-05 16:07:23,942][04005] Fps is (10 sec: 11878.0, 60 sec: 11673.5, 300 sec: 11677.1). Total num frames: 26030080. Throughput: 0: 2916.6. Samples: 1504206. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:07:23,943][04005] Avg episode reward: [(0, '36.320')] [2024-07-05 16:07:26,936][04594] Updated weights for policy 0, policy_version 6364 (0.0013) [2024-07-05 16:07:28,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26087424. Throughput: 0: 2915.7. Samples: 1521630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:07:28,942][04005] Avg episode reward: [(0, '37.512')] [2024-07-05 16:07:30,452][04594] Updated weights for policy 0, policy_version 6374 (0.0012) [2024-07-05 16:07:33,941][04005] Fps is (10 sec: 11469.2, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 26144768. Throughput: 0: 2916.8. Samples: 1530608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:07:33,942][04005] Avg episode reward: [(0, '38.490')] [2024-07-05 16:07:33,951][04594] Updated weights for policy 0, policy_version 6384 (0.0012) [2024-07-05 16:07:37,461][04594] Updated weights for policy 0, policy_version 6394 (0.0012) [2024-07-05 16:07:38,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26206208. Throughput: 0: 2917.3. Samples: 1548076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:07:38,942][04005] Avg episode reward: [(0, '39.602')] [2024-07-05 16:07:40,980][04594] Updated weights for policy 0, policy_version 6404 (0.0012) [2024-07-05 16:07:43,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26263552. Throughput: 0: 2917.0. Samples: 1565558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:07:43,943][04005] Avg episode reward: [(0, '39.232')] [2024-07-05 16:07:44,494][04594] Updated weights for policy 0, policy_version 6414 (0.0011) [2024-07-05 16:07:48,011][04594] Updated weights for policy 0, policy_version 6424 (0.0012) [2024-07-05 16:07:48,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26320896. 
Throughput: 0: 2921.0. Samples: 1574278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:07:48,943][04005] Avg episode reward: [(0, '39.272')] [2024-07-05 16:07:51,515][04594] Updated weights for policy 0, policy_version 6434 (0.0012) [2024-07-05 16:07:53,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 26378240. Throughput: 0: 2924.0. Samples: 1591960. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:07:53,943][04005] Avg episode reward: [(0, '39.161')] [2024-07-05 16:07:55,035][04594] Updated weights for policy 0, policy_version 6444 (0.0012) [2024-07-05 16:07:58,533][04594] Updated weights for policy 0, policy_version 6454 (0.0011) [2024-07-05 16:07:58,941][04005] Fps is (10 sec: 11878.7, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26439680. Throughput: 0: 2919.0. Samples: 1609384. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:07:58,942][04005] Avg episode reward: [(0, '36.501')] [2024-07-05 16:08:02,051][04594] Updated weights for policy 0, policy_version 6464 (0.0012) [2024-07-05 16:08:03,942][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26497024. Throughput: 0: 2914.8. Samples: 1617888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:03,943][04005] Avg episode reward: [(0, '35.316')] [2024-07-05 16:08:05,573][04594] Updated weights for policy 0, policy_version 6474 (0.0012) [2024-07-05 16:08:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26554368. Throughput: 0: 2913.8. Samples: 1635326. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:08,942][04005] Avg episode reward: [(0, '37.040')] [2024-07-05 16:08:09,080][04594] Updated weights for policy 0, policy_version 6484 (0.0011) [2024-07-05 16:08:12,611][04594] Updated weights for policy 0, policy_version 6494 (0.0012) [2024-07-05 16:08:13,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 26611712. Throughput: 0: 2913.6. Samples: 1652742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:13,943][04005] Avg episode reward: [(0, '38.075')] [2024-07-05 16:08:16,124][04594] Updated weights for policy 0, policy_version 6504 (0.0012) [2024-07-05 16:08:18,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26673152. Throughput: 0: 2912.9. Samples: 1661690. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:18,942][04005] Avg episode reward: [(0, '40.264')] [2024-07-05 16:08:19,639][04594] Updated weights for policy 0, policy_version 6514 (0.0012) [2024-07-05 16:08:23,146][04594] Updated weights for policy 0, policy_version 6524 (0.0012) [2024-07-05 16:08:23,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11673.7, 300 sec: 11677.1). Total num frames: 26730496. Throughput: 0: 2912.7. Samples: 1679148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:23,942][04005] Avg episode reward: [(0, '40.484')] [2024-07-05 16:08:26,662][04594] Updated weights for policy 0, policy_version 6534 (0.0012) [2024-07-05 16:08:28,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 26787840. Throughput: 0: 2912.4. Samples: 1696614. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:28,943][04005] Avg episode reward: [(0, '40.487')] [2024-07-05 16:08:30,153][04594] Updated weights for policy 0, policy_version 6544 (0.0012) [2024-07-05 16:08:33,677][04594] Updated weights for policy 0, policy_version 6554 (0.0012) [2024-07-05 16:08:33,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 26845184. Throughput: 0: 2917.8. Samples: 1705578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:33,942][04005] Avg episode reward: [(0, '37.458')] [2024-07-05 16:08:34,025][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000006555_26849280.pth... [2024-07-05 16:08:34,097][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000005872_24051712.pth [2024-07-05 16:08:37,193][04594] Updated weights for policy 0, policy_version 6564 (0.0011) [2024-07-05 16:08:38,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 26902528. Throughput: 0: 2912.0. Samples: 1723002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:38,943][04005] Avg episode reward: [(0, '35.862')] [2024-07-05 16:08:40,722][04594] Updated weights for policy 0, policy_version 6574 (0.0012) [2024-07-05 16:08:43,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 26963968. Throughput: 0: 2911.1. Samples: 1740386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:43,942][04005] Avg episode reward: [(0, '35.705')] [2024-07-05 16:08:44,234][04594] Updated weights for policy 0, policy_version 6584 (0.0012) [2024-07-05 16:08:47,754][04594] Updated weights for policy 0, policy_version 6594 (0.0011) [2024-07-05 16:08:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27021312. Throughput: 0: 2911.9. Samples: 1748924. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:48,942][04005] Avg episode reward: [(0, '36.941')] [2024-07-05 16:08:51,254][04594] Updated weights for policy 0, policy_version 6604 (0.0011) [2024-07-05 16:08:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27078656. Throughput: 0: 2912.6. Samples: 1766394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:53,942][04005] Avg episode reward: [(0, '37.346')] [2024-07-05 16:08:54,764][04594] Updated weights for policy 0, policy_version 6614 (0.0011) [2024-07-05 16:08:58,269][04594] Updated weights for policy 0, policy_version 6624 (0.0012) [2024-07-05 16:08:58,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 27136000. Throughput: 0: 2918.8. Samples: 1784086. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:08:58,943][04005] Avg episode reward: [(0, '37.157')] [2024-07-05 16:09:01,773][04594] Updated weights for policy 0, policy_version 6634 (0.0012) [2024-07-05 16:09:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27197440. Throughput: 0: 2914.2. Samples: 1792830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:03,942][04005] Avg episode reward: [(0, '38.656')] [2024-07-05 16:09:05,279][04594] Updated weights for policy 0, policy_version 6644 (0.0011) [2024-07-05 16:09:08,790][04594] Updated weights for policy 0, policy_version 6654 (0.0012) [2024-07-05 16:09:08,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27254784. Throughput: 0: 2914.6. Samples: 1810304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:08,943][04005] Avg episode reward: [(0, '37.093')] [2024-07-05 16:09:12,311][04594] Updated weights for policy 0, policy_version 6664 (0.0012) [2024-07-05 16:09:13,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27312128. 
Throughput: 0: 2914.0. Samples: 1827746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:13,943][04005] Avg episode reward: [(0, '37.523')] [2024-07-05 16:09:15,821][04594] Updated weights for policy 0, policy_version 6674 (0.0012) [2024-07-05 16:09:18,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11663.2). Total num frames: 27369472. Throughput: 0: 2913.8. Samples: 1836700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:18,942][04005] Avg episode reward: [(0, '38.064')] [2024-07-05 16:09:19,335][04594] Updated weights for policy 0, policy_version 6684 (0.0012) [2024-07-05 16:09:22,839][04594] Updated weights for policy 0, policy_version 6694 (0.0012) [2024-07-05 16:09:23,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27430912. Throughput: 0: 2914.3. Samples: 1854146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:23,942][04005] Avg episode reward: [(0, '40.388')] [2024-07-05 16:09:26,360][04594] Updated weights for policy 0, policy_version 6704 (0.0012) [2024-07-05 16:09:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27488256. Throughput: 0: 2917.4. Samples: 1871670. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:28,942][04005] Avg episode reward: [(0, '38.826')] [2024-07-05 16:09:29,876][04594] Updated weights for policy 0, policy_version 6714 (0.0012) [2024-07-05 16:09:33,380][04594] Updated weights for policy 0, policy_version 6724 (0.0012) [2024-07-05 16:09:33,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27545600. Throughput: 0: 2919.4. Samples: 1880296. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:33,942][04005] Avg episode reward: [(0, '36.840')] [2024-07-05 16:09:36,894][04594] Updated weights for policy 0, policy_version 6734 (0.0012) [2024-07-05 16:09:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27602944. Throughput: 0: 2921.8. Samples: 1897876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:38,942][04005] Avg episode reward: [(0, '37.263')] [2024-07-05 16:09:40,420][04594] Updated weights for policy 0, policy_version 6744 (0.0011) [2024-07-05 16:09:43,933][04594] Updated weights for policy 0, policy_version 6754 (0.0012) [2024-07-05 16:09:43,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27664384. Throughput: 0: 2918.9. Samples: 1915438. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:09:43,942][04005] Avg episode reward: [(0, '36.834')] [2024-07-05 16:09:47,452][04594] Updated weights for policy 0, policy_version 6764 (0.0012) [2024-07-05 16:09:48,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27721728. Throughput: 0: 2914.8. Samples: 1923998. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:09:48,942][04005] Avg episode reward: [(0, '39.941')] [2024-07-05 16:09:50,970][04594] Updated weights for policy 0, policy_version 6774 (0.0012) [2024-07-05 16:09:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27779072. Throughput: 0: 2914.5. Samples: 1941456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:09:53,943][04005] Avg episode reward: [(0, '40.073')] [2024-07-05 16:09:54,476][04594] Updated weights for policy 0, policy_version 6784 (0.0012) [2024-07-05 16:09:57,980][04594] Updated weights for policy 0, policy_version 6794 (0.0011) [2024-07-05 16:09:58,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27836416. 
Throughput: 0: 2915.2. Samples: 1958930. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:09:58,943][04005] Avg episode reward: [(0, '41.417')] [2024-07-05 16:10:01,495][04594] Updated weights for policy 0, policy_version 6804 (0.0012) [2024-07-05 16:10:03,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 27893760. Throughput: 0: 2915.1. Samples: 1967880. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:10:03,943][04005] Avg episode reward: [(0, '40.898')] [2024-07-05 16:10:05,009][04594] Updated weights for policy 0, policy_version 6814 (0.0012) [2024-07-05 16:10:08,516][04594] Updated weights for policy 0, policy_version 6824 (0.0012) [2024-07-05 16:10:08,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 27955200. Throughput: 0: 2915.1. Samples: 1985324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:10:08,942][04005] Avg episode reward: [(0, '43.238')] [2024-07-05 16:10:08,943][04581] Saving new best policy, reward=43.238! [2024-07-05 16:10:12,050][04594] Updated weights for policy 0, policy_version 6834 (0.0012) [2024-07-05 16:10:13,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28012544. Throughput: 0: 2912.7. Samples: 2002740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:10:13,942][04005] Avg episode reward: [(0, '42.991')] [2024-07-05 16:10:15,554][04594] Updated weights for policy 0, policy_version 6844 (0.0013) [2024-07-05 16:10:18,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28069888. Throughput: 0: 2915.1. Samples: 2011474. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:18,942][04005] Avg episode reward: [(0, '42.267')] [2024-07-05 16:10:19,068][04594] Updated weights for policy 0, policy_version 6854 (0.0012) [2024-07-05 16:10:22,592][04594] Updated weights for policy 0, policy_version 6864 (0.0012) [2024-07-05 16:10:23,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 28127232. Throughput: 0: 2913.5. Samples: 2028982. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:23,942][04005] Avg episode reward: [(0, '40.480')] [2024-07-05 16:10:26,109][04594] Updated weights for policy 0, policy_version 6874 (0.0012) [2024-07-05 16:10:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28188672. Throughput: 0: 2912.0. Samples: 2046480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:28,942][04005] Avg episode reward: [(0, '41.765')] [2024-07-05 16:10:29,633][04594] Updated weights for policy 0, policy_version 6884 (0.0012) [2024-07-05 16:10:33,156][04594] Updated weights for policy 0, policy_version 6894 (0.0012) [2024-07-05 16:10:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28246016. Throughput: 0: 2912.6. Samples: 2055064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:33,942][04005] Avg episode reward: [(0, '41.493')] [2024-07-05 16:10:34,204][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000006897_28250112.pth... [2024-07-05 16:10:34,276][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000006213_25448448.pth [2024-07-05 16:10:36,675][04594] Updated weights for policy 0, policy_version 6904 (0.0011) [2024-07-05 16:10:38,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28303360. Throughput: 0: 2911.2. Samples: 2072462. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:38,942][04005] Avg episode reward: [(0, '42.011')] [2024-07-05 16:10:40,215][04594] Updated weights for policy 0, policy_version 6914 (0.0012) [2024-07-05 16:10:43,722][04594] Updated weights for policy 0, policy_version 6924 (0.0013) [2024-07-05 16:10:43,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 28360704. Throughput: 0: 2910.8. Samples: 2089916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:43,942][04005] Avg episode reward: [(0, '43.406')] [2024-07-05 16:10:44,068][04581] Saving new best policy, reward=43.406! [2024-07-05 16:10:47,252][04594] Updated weights for policy 0, policy_version 6934 (0.0012) [2024-07-05 16:10:48,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.4, 300 sec: 11649.3). Total num frames: 28418048. Throughput: 0: 2909.6. Samples: 2098810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:48,942][04005] Avg episode reward: [(0, '44.616')] [2024-07-05 16:10:48,999][04581] Saving new best policy, reward=44.616! [2024-07-05 16:10:50,765][04594] Updated weights for policy 0, policy_version 6944 (0.0011) [2024-07-05 16:10:53,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28479488. Throughput: 0: 2909.4. Samples: 2116248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:53,942][04005] Avg episode reward: [(0, '43.147')] [2024-07-05 16:10:54,287][04594] Updated weights for policy 0, policy_version 6954 (0.0012) [2024-07-05 16:10:57,804][04594] Updated weights for policy 0, policy_version 6964 (0.0012) [2024-07-05 16:10:58,942][04005] Fps is (10 sec: 11878.1, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28536832. Throughput: 0: 2909.0. Samples: 2133646. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:10:58,942][04005] Avg episode reward: [(0, '42.230')] [2024-07-05 16:11:01,311][04594] Updated weights for policy 0, policy_version 6974 (0.0011) [2024-07-05 16:11:03,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28594176. Throughput: 0: 2904.9. Samples: 2142194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:11:03,943][04005] Avg episode reward: [(0, '40.920')] [2024-07-05 16:11:04,825][04594] Updated weights for policy 0, policy_version 6984 (0.0011) [2024-07-05 16:11:08,328][04594] Updated weights for policy 0, policy_version 6994 (0.0012) [2024-07-05 16:11:08,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 28651520. Throughput: 0: 2906.9. Samples: 2159792. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:11:08,942][04005] Avg episode reward: [(0, '42.598')] [2024-07-05 16:11:11,851][04594] Updated weights for policy 0, policy_version 7004 (0.0012) [2024-07-05 16:11:13,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 28708864. Throughput: 0: 2907.9. Samples: 2177336. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:11:13,943][04005] Avg episode reward: [(0, '42.848')] [2024-07-05 16:11:15,359][04594] Updated weights for policy 0, policy_version 7014 (0.0012) [2024-07-05 16:11:18,864][04594] Updated weights for policy 0, policy_version 7024 (0.0012) [2024-07-05 16:11:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28770304. Throughput: 0: 2910.0. Samples: 2186014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:11:18,942][04005] Avg episode reward: [(0, '42.863')] [2024-07-05 16:11:22,392][04594] Updated weights for policy 0, policy_version 7034 (0.0012) [2024-07-05 16:11:23,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 28827648. 
Throughput: 0: 2910.9. Samples: 2203454. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:11:23,942][04005] Avg episode reward: [(0, '40.716')] [2024-07-05 16:11:25,908][04594] Updated weights for policy 0, policy_version 7044 (0.0011) [2024-07-05 16:11:28,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 28884992. Throughput: 0: 2911.4. Samples: 2220930. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:11:28,942][04005] Avg episode reward: [(0, '41.833')] [2024-07-05 16:11:29,411][04594] Updated weights for policy 0, policy_version 7054 (0.0012) [2024-07-05 16:11:32,916][04594] Updated weights for policy 0, policy_version 7064 (0.0012) [2024-07-05 16:11:33,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 28942336. Throughput: 0: 2912.5. Samples: 2229872. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:11:33,942][04005] Avg episode reward: [(0, '41.820')] [2024-07-05 16:11:36,435][04594] Updated weights for policy 0, policy_version 7074 (0.0012) [2024-07-05 16:11:38,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29003776. Throughput: 0: 2912.7. Samples: 2247320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:11:38,942][04005] Avg episode reward: [(0, '42.209')] [2024-07-05 16:11:39,960][04594] Updated weights for policy 0, policy_version 7084 (0.0012) [2024-07-05 16:11:43,469][04594] Updated weights for policy 0, policy_version 7094 (0.0012) [2024-07-05 16:11:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29061120. Throughput: 0: 2913.0. Samples: 2264730. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:11:43,942][04005] Avg episode reward: [(0, '43.482')] [2024-07-05 16:11:46,979][04594] Updated weights for policy 0, policy_version 7104 (0.0012) [2024-07-05 16:11:48,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 29118464. Throughput: 0: 2915.5. Samples: 2273392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:11:48,943][04005] Avg episode reward: [(0, '42.050')] [2024-07-05 16:11:50,483][04594] Updated weights for policy 0, policy_version 7114 (0.0011) [2024-07-05 16:11:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 29175808. Throughput: 0: 2917.4. Samples: 2291074. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:11:53,942][04005] Avg episode reward: [(0, '40.702')] [2024-07-05 16:11:53,991][04594] Updated weights for policy 0, policy_version 7124 (0.0012) [2024-07-05 16:11:57,495][04594] Updated weights for policy 0, policy_version 7134 (0.0011) [2024-07-05 16:11:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29237248. Throughput: 0: 2917.8. Samples: 2308636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:11:58,942][04005] Avg episode reward: [(0, '40.698')] [2024-07-05 16:12:01,014][04594] Updated weights for policy 0, policy_version 7144 (0.0012) [2024-07-05 16:12:03,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29294592. Throughput: 0: 2914.3. Samples: 2317158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:03,942][04005] Avg episode reward: [(0, '41.819')] [2024-07-05 16:12:04,529][04594] Updated weights for policy 0, policy_version 7154 (0.0012) [2024-07-05 16:12:08,022][04594] Updated weights for policy 0, policy_version 7164 (0.0012) [2024-07-05 16:12:08,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29351936. 
Throughput: 0: 2915.1. Samples: 2334636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:08,943][04005] Avg episode reward: [(0, '42.453')] [2024-07-05 16:12:11,548][04594] Updated weights for policy 0, policy_version 7174 (0.0012) [2024-07-05 16:12:13,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 29409280. Throughput: 0: 2914.4. Samples: 2352076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:13,942][04005] Avg episode reward: [(0, '43.004')] [2024-07-05 16:12:15,059][04594] Updated weights for policy 0, policy_version 7184 (0.0012) [2024-07-05 16:12:18,560][04594] Updated weights for policy 0, policy_version 7194 (0.0012) [2024-07-05 16:12:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29470720. Throughput: 0: 2914.5. Samples: 2361026. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:18,942][04005] Avg episode reward: [(0, '43.932')] [2024-07-05 16:12:22,083][04594] Updated weights for policy 0, policy_version 7204 (0.0012) [2024-07-05 16:12:23,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29528064. Throughput: 0: 2914.7. Samples: 2378480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:23,943][04005] Avg episode reward: [(0, '43.726')] [2024-07-05 16:12:25,595][04594] Updated weights for policy 0, policy_version 7214 (0.0012) [2024-07-05 16:12:28,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29585408. Throughput: 0: 2914.5. Samples: 2395884. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:28,942][04005] Avg episode reward: [(0, '43.135')] [2024-07-05 16:12:29,110][04594] Updated weights for policy 0, policy_version 7224 (0.0011) [2024-07-05 16:12:32,616][04594] Updated weights for policy 0, policy_version 7234 (0.0012) [2024-07-05 16:12:33,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 29642752. Throughput: 0: 2921.3. Samples: 2404850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:33,943][04005] Avg episode reward: [(0, '42.912')] [2024-07-05 16:12:34,023][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000007238_29646848.pth... [2024-07-05 16:12:34,095][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000006555_26849280.pth [2024-07-05 16:12:36,146][04594] Updated weights for policy 0, policy_version 7244 (0.0012) [2024-07-05 16:12:38,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 29700096. Throughput: 0: 2915.4. Samples: 2422266. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:38,943][04005] Avg episode reward: [(0, '43.262')] [2024-07-05 16:12:39,674][04594] Updated weights for policy 0, policy_version 7254 (0.0012) [2024-07-05 16:12:43,179][04594] Updated weights for policy 0, policy_version 7264 (0.0011) [2024-07-05 16:12:43,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29761536. Throughput: 0: 2912.5. Samples: 2439698. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:43,942][04005] Avg episode reward: [(0, '42.142')] [2024-07-05 16:12:46,701][04594] Updated weights for policy 0, policy_version 7274 (0.0012) [2024-07-05 16:12:48,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29818880. Throughput: 0: 2911.9. Samples: 2448194. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:48,942][04005] Avg episode reward: [(0, '41.035')] [2024-07-05 16:12:50,196][04594] Updated weights for policy 0, policy_version 7284 (0.0012) [2024-07-05 16:12:53,715][04594] Updated weights for policy 0, policy_version 7294 (0.0012) [2024-07-05 16:12:53,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 29876224. Throughput: 0: 2911.7. Samples: 2465664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:53,943][04005] Avg episode reward: [(0, '42.210')] [2024-07-05 16:12:57,227][04594] Updated weights for policy 0, policy_version 7304 (0.0012) [2024-07-05 16:12:58,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 29933568. Throughput: 0: 2914.5. Samples: 2483228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:12:58,942][04005] Avg episode reward: [(0, '42.615')] [2024-07-05 16:13:00,741][04594] Updated weights for policy 0, policy_version 7314 (0.0011) [2024-07-05 16:13:03,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 29995008. Throughput: 0: 2912.2. Samples: 2492076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:03,942][04005] Avg episode reward: [(0, '42.768')] [2024-07-05 16:13:04,254][04594] Updated weights for policy 0, policy_version 7324 (0.0012) [2024-07-05 16:13:07,762][04594] Updated weights for policy 0, policy_version 7334 (0.0011) [2024-07-05 16:13:08,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 30052352. Throughput: 0: 2912.2. Samples: 2509528. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:08,942][04005] Avg episode reward: [(0, '44.627')] [2024-07-05 16:13:09,159][04581] Saving new best policy, reward=44.627! 
[2024-07-05 16:13:11,295][04594] Updated weights for policy 0, policy_version 7344 (0.0012) [2024-07-05 16:13:13,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 30109696. Throughput: 0: 2912.7. Samples: 2526958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:13,943][04005] Avg episode reward: [(0, '47.066')] [2024-07-05 16:13:14,103][04581] Saving new best policy, reward=47.066! [2024-07-05 16:13:14,818][04594] Updated weights for policy 0, policy_version 7354 (0.0011) [2024-07-05 16:13:18,322][04594] Updated weights for policy 0, policy_version 7364 (0.0012) [2024-07-05 16:13:18,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 30167040. Throughput: 0: 2911.9. Samples: 2535886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:18,942][04005] Avg episode reward: [(0, '47.367')] [2024-07-05 16:13:19,029][04581] Saving new best policy, reward=47.367! [2024-07-05 16:13:21,840][04594] Updated weights for policy 0, policy_version 7374 (0.0011) [2024-07-05 16:13:23,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 30224384. Throughput: 0: 2912.9. Samples: 2553348. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:23,942][04005] Avg episode reward: [(0, '45.956')] [2024-07-05 16:13:25,359][04594] Updated weights for policy 0, policy_version 7384 (0.0012) [2024-07-05 16:13:28,866][04594] Updated weights for policy 0, policy_version 7394 (0.0012) [2024-07-05 16:13:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 30285824. Throughput: 0: 2912.8. Samples: 2570776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:28,943][04005] Avg episode reward: [(0, '43.665')] [2024-07-05 16:13:32,385][04594] Updated weights for policy 0, policy_version 7404 (0.0012) [2024-07-05 16:13:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11663.2). 
Total num frames: 30343168. Throughput: 0: 2913.6. Samples: 2579304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:33,943][04005] Avg episode reward: [(0, '45.070')] [2024-07-05 16:13:35,890][04594] Updated weights for policy 0, policy_version 7414 (0.0011) [2024-07-05 16:13:38,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 30400512. Throughput: 0: 2915.1. Samples: 2596842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:38,942][04005] Avg episode reward: [(0, '45.274')] [2024-07-05 16:13:39,458][04594] Updated weights for policy 0, policy_version 7424 (0.0012) [2024-07-05 16:13:42,919][04594] Updated weights for policy 0, policy_version 7434 (0.0011) [2024-07-05 16:13:43,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 30457856. Throughput: 0: 2915.1. Samples: 2614410. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:43,942][04005] Avg episode reward: [(0, '45.026')] [2024-07-05 16:13:46,440][04594] Updated weights for policy 0, policy_version 7444 (0.0012) [2024-07-05 16:13:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 30519296. Throughput: 0: 2914.0. Samples: 2623204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:48,942][04005] Avg episode reward: [(0, '43.992')] [2024-07-05 16:13:49,951][04594] Updated weights for policy 0, policy_version 7454 (0.0012) [2024-07-05 16:13:53,488][04594] Updated weights for policy 0, policy_version 7464 (0.0012) [2024-07-05 16:13:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 30576640. Throughput: 0: 2913.1. Samples: 2640618. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:53,942][04005] Avg episode reward: [(0, '45.407')] [2024-07-05 16:13:56,999][04594] Updated weights for policy 0, policy_version 7474 (0.0012) [2024-07-05 16:13:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 30633984. Throughput: 0: 2913.3. Samples: 2658058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:13:58,942][04005] Avg episode reward: [(0, '46.773')] [2024-07-05 16:14:00,516][04594] Updated weights for policy 0, policy_version 7484 (0.0012) [2024-07-05 16:14:03,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 30691328. Throughput: 0: 2913.4. Samples: 2666990. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:14:03,942][04005] Avg episode reward: [(0, '46.169')] [2024-07-05 16:14:04,022][04594] Updated weights for policy 0, policy_version 7494 (0.0011) [2024-07-05 16:14:07,543][04594] Updated weights for policy 0, policy_version 7504 (0.0012) [2024-07-05 16:14:08,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 30748672. Throughput: 0: 2912.9. Samples: 2684430. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:14:08,943][04005] Avg episode reward: [(0, '46.517')] [2024-07-05 16:14:11,084][04594] Updated weights for policy 0, policy_version 7514 (0.0012) [2024-07-05 16:14:13,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 30810112. Throughput: 0: 2913.0. Samples: 2701862. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:14:13,942][04005] Avg episode reward: [(0, '44.600')] [2024-07-05 16:14:14,591][04594] Updated weights for policy 0, policy_version 7524 (0.0012) [2024-07-05 16:14:18,116][04594] Updated weights for policy 0, policy_version 7534 (0.0011) [2024-07-05 16:14:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 30867456. 
Throughput: 0: 2912.9. Samples: 2710386. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:14:18,942][04005] Avg episode reward: [(0, '44.582')] [2024-07-05 16:14:21,628][04594] Updated weights for policy 0, policy_version 7544 (0.0012) [2024-07-05 16:14:23,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 30924800. Throughput: 0: 2910.5. Samples: 2727814. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:14:23,942][04005] Avg episode reward: [(0, '44.007')] [2024-07-05 16:14:25,141][04594] Updated weights for policy 0, policy_version 7554 (0.0011) [2024-07-05 16:14:28,651][04594] Updated weights for policy 0, policy_version 7564 (0.0012) [2024-07-05 16:14:28,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 30982144. Throughput: 0: 2908.7. Samples: 2745302. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:14:28,942][04005] Avg episode reward: [(0, '45.314')] [2024-07-05 16:14:32,160][04594] Updated weights for policy 0, policy_version 7574 (0.0012) [2024-07-05 16:14:33,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 31043584. Throughput: 0: 2912.2. Samples: 2754254. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:14:33,943][04005] Avg episode reward: [(0, '43.264')] [2024-07-05 16:14:33,946][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000007579_31043584.pth... [2024-07-05 16:14:34,019][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000006897_28250112.pth [2024-07-05 16:14:35,690][04594] Updated weights for policy 0, policy_version 7584 (0.0011) [2024-07-05 16:14:38,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 31100928. Throughput: 0: 2913.1. Samples: 2771708. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:14:38,942][04005] Avg episode reward: [(0, '43.272')] [2024-07-05 16:14:39,203][04594] Updated weights for policy 0, policy_version 7594 (0.0012) [2024-07-05 16:14:42,729][04594] Updated weights for policy 0, policy_version 7604 (0.0012) [2024-07-05 16:14:43,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 31158272. Throughput: 0: 2913.2. Samples: 2789152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:14:43,942][04005] Avg episode reward: [(0, '41.753')] [2024-07-05 16:14:46,242][04594] Updated weights for policy 0, policy_version 7614 (0.0012) [2024-07-05 16:14:48,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 31215616. Throughput: 0: 2910.0. Samples: 2797942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:14:48,943][04005] Avg episode reward: [(0, '43.471')] [2024-07-05 16:14:49,773][04594] Updated weights for policy 0, policy_version 7624 (0.0012) [2024-07-05 16:14:53,282][04594] Updated weights for policy 0, policy_version 7634 (0.0011) [2024-07-05 16:14:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 31272960. Throughput: 0: 2912.0. Samples: 2815472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:14:53,943][04005] Avg episode reward: [(0, '43.800')] [2024-07-05 16:14:56,802][04594] Updated weights for policy 0, policy_version 7644 (0.0012) [2024-07-05 16:14:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 31334400. Throughput: 0: 2913.6. Samples: 2832974. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:14:58,942][04005] Avg episode reward: [(0, '45.194')] [2024-07-05 16:15:00,331][04594] Updated weights for policy 0, policy_version 7654 (0.0012) [2024-07-05 16:15:03,835][04594] Updated weights for policy 0, policy_version 7664 (0.0012) [2024-07-05 16:15:03,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 31391744. Throughput: 0: 2912.8. Samples: 2841464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:03,943][04005] Avg episode reward: [(0, '44.621')] [2024-07-05 16:15:07,355][04594] Updated weights for policy 0, policy_version 7674 (0.0012) [2024-07-05 16:15:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 31449088. Throughput: 0: 2912.8. Samples: 2858890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:08,942][04005] Avg episode reward: [(0, '45.975')] [2024-07-05 16:15:10,885][04594] Updated weights for policy 0, policy_version 7684 (0.0011) [2024-07-05 16:15:13,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 31506432. Throughput: 0: 2912.5. Samples: 2876364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:13,942][04005] Avg episode reward: [(0, '46.466')] [2024-07-05 16:15:14,394][04594] Updated weights for policy 0, policy_version 7694 (0.0012) [2024-07-05 16:15:17,903][04594] Updated weights for policy 0, policy_version 7704 (0.0011) [2024-07-05 16:15:18,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 31563776. Throughput: 0: 2911.9. Samples: 2885288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:18,942][04005] Avg episode reward: [(0, '48.412')] [2024-07-05 16:15:18,954][04581] Saving new best policy, reward=48.412! 
[2024-07-05 16:15:21,417][04594] Updated weights for policy 0, policy_version 7714 (0.0012) [2024-07-05 16:15:23,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 31625216. Throughput: 0: 2912.5. Samples: 2902772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:23,942][04005] Avg episode reward: [(0, '46.420')] [2024-07-05 16:15:24,920][04594] Updated weights for policy 0, policy_version 7724 (0.0011) [2024-07-05 16:15:28,437][04594] Updated weights for policy 0, policy_version 7734 (0.0011) [2024-07-05 16:15:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 31682560. Throughput: 0: 2912.5. Samples: 2920216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:28,942][04005] Avg episode reward: [(0, '44.816')] [2024-07-05 16:15:31,941][04594] Updated weights for policy 0, policy_version 7744 (0.0011) [2024-07-05 16:15:33,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 31739904. Throughput: 0: 2914.2. Samples: 2929082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:33,942][04005] Avg episode reward: [(0, '44.984')] [2024-07-05 16:15:35,456][04594] Updated weights for policy 0, policy_version 7754 (0.0011) [2024-07-05 16:15:38,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11649.3). Total num frames: 31797248. Throughput: 0: 2914.4. Samples: 2946620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:38,942][04005] Avg episode reward: [(0, '45.549')] [2024-07-05 16:15:38,964][04594] Updated weights for policy 0, policy_version 7764 (0.0012) [2024-07-05 16:15:42,491][04594] Updated weights for policy 0, policy_version 7774 (0.0012) [2024-07-05 16:15:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 31858688. Throughput: 0: 2913.2. Samples: 2964068. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:43,942][04005] Avg episode reward: [(0, '47.283')] [2024-07-05 16:15:45,996][04594] Updated weights for policy 0, policy_version 7784 (0.0011) [2024-07-05 16:15:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 31916032. Throughput: 0: 2914.6. Samples: 2972622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:48,942][04005] Avg episode reward: [(0, '47.540')] [2024-07-05 16:15:49,466][04594] Updated weights for policy 0, policy_version 7794 (0.0011) [2024-07-05 16:15:52,944][04594] Updated weights for policy 0, policy_version 7804 (0.0011) [2024-07-05 16:15:53,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11649.3). Total num frames: 31973376. Throughput: 0: 2925.1. Samples: 2990518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:53,942][04005] Avg episode reward: [(0, '45.508')] [2024-07-05 16:15:56,402][04594] Updated weights for policy 0, policy_version 7814 (0.0011) [2024-07-05 16:15:58,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 32034816. Throughput: 0: 2929.6. Samples: 3008194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:15:58,942][04005] Avg episode reward: [(0, '46.395')] [2024-07-05 16:15:59,878][04594] Updated weights for policy 0, policy_version 7824 (0.0011) [2024-07-05 16:16:03,349][04594] Updated weights for policy 0, policy_version 7834 (0.0012) [2024-07-05 16:16:03,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11663.2). Total num frames: 32092160. Throughput: 0: 2927.2. Samples: 3017012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:03,942][04005] Avg episode reward: [(0, '45.910')] [2024-07-05 16:16:06,821][04594] Updated weights for policy 0, policy_version 7844 (0.0012) [2024-07-05 16:16:08,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11677.1). Total num frames: 32153600. 
Throughput: 0: 2933.6. Samples: 3034786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:08,943][04005] Avg episode reward: [(0, '47.997')] [2024-07-05 16:16:10,299][04594] Updated weights for policy 0, policy_version 7854 (0.0011) [2024-07-05 16:16:13,779][04594] Updated weights for policy 0, policy_version 7864 (0.0011) [2024-07-05 16:16:13,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11663.2). Total num frames: 32210944. Throughput: 0: 2936.6. Samples: 3052362. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:13,943][04005] Avg episode reward: [(0, '47.741')] [2024-07-05 16:16:17,264][04594] Updated weights for policy 0, policy_version 7874 (0.0011) [2024-07-05 16:16:18,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11663.2). Total num frames: 32268288. Throughput: 0: 2937.9. Samples: 3061288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:18,942][04005] Avg episode reward: [(0, '49.097')] [2024-07-05 16:16:18,994][04581] Saving new best policy, reward=49.097! [2024-07-05 16:16:20,741][04594] Updated weights for policy 0, policy_version 7884 (0.0011) [2024-07-05 16:16:23,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11677.1). Total num frames: 32329728. Throughput: 0: 2937.8. Samples: 3078822. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:16:23,942][04005] Avg episode reward: [(0, '49.886')] [2024-07-05 16:16:23,945][04581] Saving new best policy, reward=49.886! [2024-07-05 16:16:24,293][04594] Updated weights for policy 0, policy_version 7894 (0.0011) [2024-07-05 16:16:27,714][04594] Updated weights for policy 0, policy_version 7904 (0.0011) [2024-07-05 16:16:28,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11677.1). Total num frames: 32387072. Throughput: 0: 2940.7. Samples: 3096398. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:16:28,943][04005] Avg episode reward: [(0, '48.980')] [2024-07-05 16:16:31,184][04594] Updated weights for policy 0, policy_version 7914 (0.0011) [2024-07-05 16:16:33,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.8, 300 sec: 11663.2). Total num frames: 32444416. Throughput: 0: 2950.0. Samples: 3105372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:33,943][04005] Avg episode reward: [(0, '48.051')] [2024-07-05 16:16:33,969][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000007922_32448512.pth... [2024-07-05 16:16:34,040][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000007238_29646848.pth [2024-07-05 16:16:34,670][04594] Updated weights for policy 0, policy_version 7924 (0.0011) [2024-07-05 16:16:38,144][04594] Updated weights for policy 0, policy_version 7934 (0.0011) [2024-07-05 16:16:38,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11677.1). Total num frames: 32505856. Throughput: 0: 2943.1. Samples: 3122960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:38,943][04005] Avg episode reward: [(0, '47.458')] [2024-07-05 16:16:41,628][04594] Updated weights for policy 0, policy_version 7944 (0.0011) [2024-07-05 16:16:43,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11677.1). Total num frames: 32563200. Throughput: 0: 2939.6. Samples: 3140474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:43,942][04005] Avg episode reward: [(0, '46.945')] [2024-07-05 16:16:45,105][04594] Updated weights for policy 0, policy_version 7954 (0.0011) [2024-07-05 16:16:48,588][04594] Updated weights for policy 0, policy_version 7964 (0.0013) [2024-07-05 16:16:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11691.0). Total num frames: 32624640. Throughput: 0: 2942.9. Samples: 3149442. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:48,942][04005] Avg episode reward: [(0, '46.584')] [2024-07-05 16:16:52,062][04594] Updated weights for policy 0, policy_version 7974 (0.0011) [2024-07-05 16:16:53,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11677.1). Total num frames: 32681984. Throughput: 0: 2938.8. Samples: 3167030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:53,942][04005] Avg episode reward: [(0, '47.806')] [2024-07-05 16:16:55,543][04594] Updated weights for policy 0, policy_version 7984 (0.0011) [2024-07-05 16:16:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11677.1). Total num frames: 32739328. Throughput: 0: 2938.3. Samples: 3184586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:16:58,942][04005] Avg episode reward: [(0, '47.462')] [2024-07-05 16:16:59,016][04594] Updated weights for policy 0, policy_version 7994 (0.0011) [2024-07-05 16:17:02,487][04594] Updated weights for policy 0, policy_version 8004 (0.0011) [2024-07-05 16:17:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11691.0). Total num frames: 32800768. Throughput: 0: 2940.5. Samples: 3193612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:17:03,942][04005] Avg episode reward: [(0, '47.697')] [2024-07-05 16:17:05,959][04594] Updated weights for policy 0, policy_version 8014 (0.0011) [2024-07-05 16:17:08,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 32858112. Throughput: 0: 2941.2. Samples: 3211174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:17:08,942][04005] Avg episode reward: [(0, '45.504')] [2024-07-05 16:17:09,462][04594] Updated weights for policy 0, policy_version 8024 (0.0011) [2024-07-05 16:17:12,929][04594] Updated weights for policy 0, policy_version 8034 (0.0011) [2024-07-05 16:17:13,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11677.1). Total num frames: 32915456. 
Throughput: 0: 2943.7. Samples: 3228864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:17:13,942][04005] Avg episode reward: [(0, '46.172')] [2024-07-05 16:17:16,407][04594] Updated weights for policy 0, policy_version 8044 (0.0011) [2024-07-05 16:17:18,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11691.0). Total num frames: 32976896. Throughput: 0: 2940.4. Samples: 3237688. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:17:18,942][04005] Avg episode reward: [(0, '46.346')] [2024-07-05 16:17:19,900][04594] Updated weights for policy 0, policy_version 8054 (0.0011) [2024-07-05 16:17:23,366][04594] Updated weights for policy 0, policy_version 8064 (0.0011) [2024-07-05 16:17:23,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11691.0). Total num frames: 33034240. Throughput: 0: 2938.7. Samples: 3255200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:17:23,943][04005] Avg episode reward: [(0, '46.003')] [2024-07-05 16:17:26,845][04594] Updated weights for policy 0, policy_version 8074 (0.0011) [2024-07-05 16:17:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11704.8). Total num frames: 33095680. Throughput: 0: 2948.3. Samples: 3273148. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:17:28,942][04005] Avg episode reward: [(0, '45.687')] [2024-07-05 16:17:30,320][04594] Updated weights for policy 0, policy_version 8084 (0.0011) [2024-07-05 16:17:33,793][04594] Updated weights for policy 0, policy_version 8094 (0.0011) [2024-07-05 16:17:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11704.8). Total num frames: 33153024. Throughput: 0: 2940.6. Samples: 3281768. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:17:33,942][04005] Avg episode reward: [(0, '44.812')] [2024-07-05 16:17:37,276][04594] Updated weights for policy 0, policy_version 8104 (0.0011) [2024-07-05 16:17:38,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 33210368. Throughput: 0: 2943.9. Samples: 3299504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:17:38,942][04005] Avg episode reward: [(0, '44.012')] [2024-07-05 16:17:40,757][04594] Updated weights for policy 0, policy_version 8114 (0.0011) [2024-07-05 16:17:43,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11810.1, 300 sec: 11704.8). Total num frames: 33271808. Throughput: 0: 2949.3. Samples: 3317306. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:17:43,942][04005] Avg episode reward: [(0, '43.446')] [2024-07-05 16:17:44,250][04594] Updated weights for policy 0, policy_version 8124 (0.0011) [2024-07-05 16:17:47,729][04594] Updated weights for policy 0, policy_version 8134 (0.0011) [2024-07-05 16:17:48,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 33329152. Throughput: 0: 2939.0. Samples: 3325866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:17:48,943][04005] Avg episode reward: [(0, '45.237')] [2024-07-05 16:17:51,206][04594] Updated weights for policy 0, policy_version 8144 (0.0011) [2024-07-05 16:17:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 33386496. Throughput: 0: 2945.4. Samples: 3343718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:17:53,942][04005] Avg episode reward: [(0, '48.305')] [2024-07-05 16:17:54,682][04594] Updated weights for policy 0, policy_version 8154 (0.0012) [2024-07-05 16:17:58,156][04594] Updated weights for policy 0, policy_version 8164 (0.0012) [2024-07-05 16:17:58,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11704.8). Total num frames: 33447936. 
Throughput: 0: 2945.1. Samples: 3361396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:17:58,943][04005] Avg episode reward: [(0, '49.196')] [2024-07-05 16:18:01,624][04594] Updated weights for policy 0, policy_version 8174 (0.0011) [2024-07-05 16:18:03,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 33505280. Throughput: 0: 2942.8. Samples: 3370114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:03,944][04005] Avg episode reward: [(0, '46.616')] [2024-07-05 16:18:05,101][04594] Updated weights for policy 0, policy_version 8184 (0.0012) [2024-07-05 16:18:08,589][04594] Updated weights for policy 0, policy_version 8194 (0.0011) [2024-07-05 16:18:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11718.7). Total num frames: 33566720. Throughput: 0: 2949.7. Samples: 3387938. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:08,942][04005] Avg episode reward: [(0, '45.445')] [2024-07-05 16:18:12,072][04594] Updated weights for policy 0, policy_version 8204 (0.0011) [2024-07-05 16:18:13,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11718.7). Total num frames: 33624064. Throughput: 0: 2940.7. Samples: 3405480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:13,943][04005] Avg episode reward: [(0, '45.422')] [2024-07-05 16:18:15,550][04594] Updated weights for policy 0, policy_version 8214 (0.0011) [2024-07-05 16:18:18,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 33681408. Throughput: 0: 2945.5. Samples: 3414316. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:18,942][04005] Avg episode reward: [(0, '45.350')] [2024-07-05 16:18:19,027][04594] Updated weights for policy 0, policy_version 8224 (0.0011) [2024-07-05 16:18:22,492][04594] Updated weights for policy 0, policy_version 8234 (0.0011) [2024-07-05 16:18:23,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11718.7). Total num frames: 33742848. Throughput: 0: 2945.1. Samples: 3432032. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:23,942][04005] Avg episode reward: [(0, '46.699')] [2024-07-05 16:18:25,959][04594] Updated weights for policy 0, policy_version 8244 (0.0011) [2024-07-05 16:18:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 33800192. Throughput: 0: 2940.4. Samples: 3449624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:28,942][04005] Avg episode reward: [(0, '45.970')] [2024-07-05 16:18:29,437][04594] Updated weights for policy 0, policy_version 8254 (0.0011) [2024-07-05 16:18:32,915][04594] Updated weights for policy 0, policy_version 8264 (0.0011) [2024-07-05 16:18:33,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 33857536. Throughput: 0: 2949.9. Samples: 3458612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:33,944][04005] Avg episode reward: [(0, '47.758')] [2024-07-05 16:18:33,956][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000008267_33861632.pth... [2024-07-05 16:18:34,028][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000007579_31043584.pth [2024-07-05 16:18:36,400][04594] Updated weights for policy 0, policy_version 8274 (0.0011) [2024-07-05 16:18:38,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 33918976. Throughput: 0: 2943.1. Samples: 3476158. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:38,942][04005] Avg episode reward: [(0, '46.848')] [2024-07-05 16:18:39,901][04594] Updated weights for policy 0, policy_version 8284 (0.0011) [2024-07-05 16:18:43,374][04594] Updated weights for policy 0, policy_version 8294 (0.0011) [2024-07-05 16:18:43,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 33976320. Throughput: 0: 2939.1. Samples: 3493656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:43,942][04005] Avg episode reward: [(0, '47.524')] [2024-07-05 16:18:46,852][04594] Updated weights for policy 0, policy_version 8304 (0.0012) [2024-07-05 16:18:48,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 34033664. Throughput: 0: 2945.5. Samples: 3502662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:48,942][04005] Avg episode reward: [(0, '46.822')] [2024-07-05 16:18:50,348][04594] Updated weights for policy 0, policy_version 8314 (0.0011) [2024-07-05 16:18:53,822][04594] Updated weights for policy 0, policy_version 8324 (0.0011) [2024-07-05 16:18:53,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 34095104. Throughput: 0: 2939.4. Samples: 3520210. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:53,942][04005] Avg episode reward: [(0, '44.826')] [2024-07-05 16:18:57,296][04594] Updated weights for policy 0, policy_version 8334 (0.0011) [2024-07-05 16:18:58,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 34152448. Throughput: 0: 2939.1. Samples: 3537738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:18:58,942][04005] Avg episode reward: [(0, '44.142')] [2024-07-05 16:19:00,773][04594] Updated weights for policy 0, policy_version 8344 (0.0011) [2024-07-05 16:19:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11746.5). Total num frames: 34213888. 
Throughput: 0: 2943.2. Samples: 3546760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:03,942][04005] Avg episode reward: [(0, '44.466')] [2024-07-05 16:19:04,259][04594] Updated weights for policy 0, policy_version 8354 (0.0011) [2024-07-05 16:19:07,732][04594] Updated weights for policy 0, policy_version 8364 (0.0012) [2024-07-05 16:19:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 34271232. Throughput: 0: 2939.0. Samples: 3564286. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:08,942][04005] Avg episode reward: [(0, '47.074')] [2024-07-05 16:19:11,230][04594] Updated weights for policy 0, policy_version 8374 (0.0011) [2024-07-05 16:19:13,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 34328576. Throughput: 0: 2937.6. Samples: 3581818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:19:13,942][04005] Avg episode reward: [(0, '46.700')] [2024-07-05 16:19:14,698][04594] Updated weights for policy 0, policy_version 8384 (0.0011) [2024-07-05 16:19:18,169][04594] Updated weights for policy 0, policy_version 8394 (0.0011) [2024-07-05 16:19:18,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11746.5). Total num frames: 34390016. Throughput: 0: 2937.6. Samples: 3590804. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:19:18,944][04005] Avg episode reward: [(0, '46.757')] [2024-07-05 16:19:21,644][04594] Updated weights for policy 0, policy_version 8404 (0.0012) [2024-07-05 16:19:23,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 34447360. Throughput: 0: 2937.6. Samples: 3608350. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:19:23,942][04005] Avg episode reward: [(0, '46.262')] [2024-07-05 16:19:25,123][04594] Updated weights for policy 0, policy_version 8414 (0.0011) [2024-07-05 16:19:28,625][04594] Updated weights for policy 0, policy_version 8424 (0.0011) [2024-07-05 16:19:28,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 34504704. Throughput: 0: 2941.9. Samples: 3626042. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:28,942][04005] Avg episode reward: [(0, '46.832')] [2024-07-05 16:19:32,102][04594] Updated weights for policy 0, policy_version 8434 (0.0011) [2024-07-05 16:19:33,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11746.5). Total num frames: 34566144. Throughput: 0: 2938.7. Samples: 3634904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:33,943][04005] Avg episode reward: [(0, '45.388')] [2024-07-05 16:19:35,603][04594] Updated weights for policy 0, policy_version 8444 (0.0011) [2024-07-05 16:19:38,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 34623488. Throughput: 0: 2938.4. Samples: 3652436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:38,942][04005] Avg episode reward: [(0, '47.085')] [2024-07-05 16:19:39,072][04594] Updated weights for policy 0, policy_version 8454 (0.0011) [2024-07-05 16:19:42,578][04594] Updated weights for policy 0, policy_version 8464 (0.0012) [2024-07-05 16:19:43,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 34680832. Throughput: 0: 2941.9. Samples: 3670122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:43,942][04005] Avg episode reward: [(0, '47.124')] [2024-07-05 16:19:46,055][04594] Updated weights for policy 0, policy_version 8474 (0.0011) [2024-07-05 16:19:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 34742272. 
Throughput: 0: 2937.8. Samples: 3678962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:48,943][04005] Avg episode reward: [(0, '49.370')] [2024-07-05 16:19:49,542][04594] Updated weights for policy 0, policy_version 8484 (0.0011) [2024-07-05 16:19:53,026][04594] Updated weights for policy 0, policy_version 8494 (0.0011) [2024-07-05 16:19:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 34799616. Throughput: 0: 2938.3. Samples: 3696508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:53,942][04005] Avg episode reward: [(0, '48.300')] [2024-07-05 16:19:56,517][04594] Updated weights for policy 0, policy_version 8504 (0.0012) [2024-07-05 16:19:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 34856960. Throughput: 0: 2944.4. Samples: 3714318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:19:58,942][04005] Avg episode reward: [(0, '48.422')] [2024-07-05 16:20:00,007][04594] Updated weights for policy 0, policy_version 8514 (0.0011) [2024-07-05 16:20:03,479][04594] Updated weights for policy 0, policy_version 8524 (0.0011) [2024-07-05 16:20:03,942][04005] Fps is (10 sec: 11877.8, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 34918400. Throughput: 0: 2938.0. Samples: 3723016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:03,943][04005] Avg episode reward: [(0, '46.459')] [2024-07-05 16:20:06,960][04594] Updated weights for policy 0, policy_version 8534 (0.0011) [2024-07-05 16:20:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 34975744. Throughput: 0: 2938.4. Samples: 3740576. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:08,942][04005] Avg episode reward: [(0, '46.055')] [2024-07-05 16:20:10,450][04594] Updated weights for policy 0, policy_version 8544 (0.0011) [2024-07-05 16:20:13,942][04005] Fps is (10 sec: 11469.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 35033088. Throughput: 0: 2940.3. Samples: 3758356. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:13,943][04005] Avg episode reward: [(0, '45.833')] [2024-07-05 16:20:13,947][04594] Updated weights for policy 0, policy_version 8554 (0.0012) [2024-07-05 16:20:17,426][04594] Updated weights for policy 0, policy_version 8564 (0.0011) [2024-07-05 16:20:18,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 35094528. Throughput: 0: 2937.0. Samples: 3767070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:18,942][04005] Avg episode reward: [(0, '46.972')] [2024-07-05 16:20:20,893][04594] Updated weights for policy 0, policy_version 8574 (0.0011) [2024-07-05 16:20:23,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 35151872. Throughput: 0: 2940.5. Samples: 3784758. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:23,942][04005] Avg episode reward: [(0, '46.905')] [2024-07-05 16:20:24,365][04594] Updated weights for policy 0, policy_version 8584 (0.0011) [2024-07-05 16:20:27,859][04594] Updated weights for policy 0, policy_version 8594 (0.0011) [2024-07-05 16:20:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11774.3). Total num frames: 35213312. Throughput: 0: 2944.1. Samples: 3802606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:28,942][04005] Avg episode reward: [(0, '47.395')] [2024-07-05 16:20:31,347][04594] Updated weights for policy 0, policy_version 8604 (0.0015) [2024-07-05 16:20:33,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11774.3). Total num frames: 35270656. 
Throughput: 0: 2937.6. Samples: 3811156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:33,942][04005] Avg episode reward: [(0, '47.865')] [2024-07-05 16:20:34,131][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000008612_35274752.pth... [2024-07-05 16:20:34,203][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000007922_32448512.pth [2024-07-05 16:20:34,830][04594] Updated weights for policy 0, policy_version 8614 (0.0012) [2024-07-05 16:20:38,307][04594] Updated weights for policy 0, policy_version 8624 (0.0011) [2024-07-05 16:20:38,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 35328000. Throughput: 0: 2942.9. Samples: 3828940. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:38,943][04005] Avg episode reward: [(0, '47.140')] [2024-07-05 16:20:41,796][04594] Updated weights for policy 0, policy_version 8634 (0.0011) [2024-07-05 16:20:43,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11774.3). Total num frames: 35389440. Throughput: 0: 2940.8. Samples: 3846654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:43,942][04005] Avg episode reward: [(0, '45.214')] [2024-07-05 16:20:45,297][04594] Updated weights for policy 0, policy_version 8644 (0.0012) [2024-07-05 16:20:48,787][04594] Updated weights for policy 0, policy_version 8654 (0.0011) [2024-07-05 16:20:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11774.3). Total num frames: 35446784. Throughput: 0: 2937.1. Samples: 3855184. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:48,942][04005] Avg episode reward: [(0, '45.763')] [2024-07-05 16:20:52,277][04594] Updated weights for policy 0, policy_version 8664 (0.0011) [2024-07-05 16:20:53,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 35504128. 
Throughput: 0: 2939.7. Samples: 3872862. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:53,942][04005] Avg episode reward: [(0, '47.209')] [2024-07-05 16:20:55,752][04594] Updated weights for policy 0, policy_version 8674 (0.0012) [2024-07-05 16:20:58,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11774.3). Total num frames: 35565568. Throughput: 0: 2940.1. Samples: 3890660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:20:58,942][04005] Avg episode reward: [(0, '49.696')] [2024-07-05 16:20:59,247][04594] Updated weights for policy 0, policy_version 8684 (0.0011) [2024-07-05 16:21:02,733][04594] Updated weights for policy 0, policy_version 8694 (0.0011) [2024-07-05 16:21:03,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11742.0, 300 sec: 11760.4). Total num frames: 35622912. Throughput: 0: 2936.5. Samples: 3899212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:21:03,942][04005] Avg episode reward: [(0, '49.397')] [2024-07-05 16:21:06,226][04594] Updated weights for policy 0, policy_version 8704 (0.0011) [2024-07-05 16:21:08,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 35680256. Throughput: 0: 2937.1. Samples: 3916926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:21:08,943][04005] Avg episode reward: [(0, '49.549')] [2024-07-05 16:21:09,712][04594] Updated weights for policy 0, policy_version 8714 (0.0011) [2024-07-05 16:21:13,197][04594] Updated weights for policy 0, policy_version 8724 (0.0011) [2024-07-05 16:21:13,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11810.2, 300 sec: 11774.3). Total num frames: 35741696. Throughput: 0: 2934.8. Samples: 3934672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:13,942][04005] Avg episode reward: [(0, '50.149')] [2024-07-05 16:21:13,945][04581] Saving new best policy, reward=50.149! 
[2024-07-05 16:21:16,687][04594] Updated weights for policy 0, policy_version 8734 (0.0011) [2024-07-05 16:21:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 35799040. Throughput: 0: 2934.0. Samples: 3943188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:18,942][04005] Avg episode reward: [(0, '50.233')] [2024-07-05 16:21:19,118][04581] Saving new best policy, reward=50.233! [2024-07-05 16:21:20,174][04594] Updated weights for policy 0, policy_version 8744 (0.0011) [2024-07-05 16:21:23,661][04594] Updated weights for policy 0, policy_version 8754 (0.0013) [2024-07-05 16:21:23,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 35856384. Throughput: 0: 2933.3. Samples: 3960938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:23,942][04005] Avg episode reward: [(0, '51.516')] [2024-07-05 16:21:24,004][04581] Saving new best policy, reward=51.516! [2024-07-05 16:21:27,152][04594] Updated weights for policy 0, policy_version 8764 (0.0011) [2024-07-05 16:21:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11774.3). Total num frames: 35917824. Throughput: 0: 2934.4. Samples: 3978702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:28,942][04005] Avg episode reward: [(0, '50.440')] [2024-07-05 16:21:30,637][04594] Updated weights for policy 0, policy_version 8774 (0.0012) [2024-07-05 16:21:33,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 35975168. Throughput: 0: 2934.6. Samples: 3987242. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:33,942][04005] Avg episode reward: [(0, '52.027')] [2024-07-05 16:21:34,125][04581] Saving new best policy, reward=52.027! 
[2024-07-05 16:21:34,128][04594] Updated weights for policy 0, policy_version 8784 (0.0011) [2024-07-05 16:21:37,619][04594] Updated weights for policy 0, policy_version 8794 (0.0012) [2024-07-05 16:21:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36032512. Throughput: 0: 2934.6. Samples: 4004920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:38,942][04005] Avg episode reward: [(0, '49.444')] [2024-07-05 16:21:41,099][04594] Updated weights for policy 0, policy_version 8804 (0.0011) [2024-07-05 16:21:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36093952. Throughput: 0: 2934.0. Samples: 4022690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:43,942][04005] Avg episode reward: [(0, '48.956')] [2024-07-05 16:21:44,585][04594] Updated weights for policy 0, policy_version 8814 (0.0011) [2024-07-05 16:21:48,060][04594] Updated weights for policy 0, policy_version 8824 (0.0011) [2024-07-05 16:21:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36151296. Throughput: 0: 2934.7. Samples: 4031272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:48,942][04005] Avg episode reward: [(0, '46.199')] [2024-07-05 16:21:51,558][04594] Updated weights for policy 0, policy_version 8834 (0.0011) [2024-07-05 16:21:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36208640. Throughput: 0: 2936.7. Samples: 4049076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:53,942][04005] Avg episode reward: [(0, '47.576')] [2024-07-05 16:21:55,037][04594] Updated weights for policy 0, policy_version 8844 (0.0011) [2024-07-05 16:21:58,504][04594] Updated weights for policy 0, policy_version 8854 (0.0011) [2024-07-05 16:21:58,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36270080. Throughput: 0: 2935.5. 
Samples: 4066770. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:21:58,942][04005] Avg episode reward: [(0, '47.814')] [2024-07-05 16:22:01,983][04594] Updated weights for policy 0, policy_version 8864 (0.0011) [2024-07-05 16:22:03,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36327424. Throughput: 0: 2939.5. Samples: 4075468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:22:03,943][04005] Avg episode reward: [(0, '48.085')] [2024-07-05 16:22:05,455][04594] Updated weights for policy 0, policy_version 8874 (0.0011) [2024-07-05 16:22:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36384768. Throughput: 0: 2940.9. Samples: 4093280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:22:08,942][04005] Avg episode reward: [(0, '48.632')] [2024-07-05 16:22:08,947][04594] Updated weights for policy 0, policy_version 8884 (0.0012) [2024-07-05 16:22:12,435][04594] Updated weights for policy 0, policy_version 8894 (0.0011) [2024-07-05 16:22:13,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36446208. Throughput: 0: 2935.5. Samples: 4110798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:22:13,942][04005] Avg episode reward: [(0, '47.280')] [2024-07-05 16:22:15,926][04594] Updated weights for policy 0, policy_version 8904 (0.0011) [2024-07-05 16:22:18,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 36503552. Throughput: 0: 2940.0. Samples: 4119542. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:22:18,943][04005] Avg episode reward: [(0, '48.050')] [2024-07-05 16:22:19,409][04594] Updated weights for policy 0, policy_version 8914 (0.0011) [2024-07-05 16:22:22,899][04594] Updated weights for policy 0, policy_version 8924 (0.0011) [2024-07-05 16:22:23,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11760.4). 
Total num frames: 36564992. Throughput: 0: 2942.3. Samples: 4137322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:22:23,942][04005] Avg episode reward: [(0, '48.158')] [2024-07-05 16:22:26,379][04594] Updated weights for policy 0, policy_version 8934 (0.0011) [2024-07-05 16:22:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36622336. Throughput: 0: 2937.5. Samples: 4154878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:22:28,942][04005] Avg episode reward: [(0, '49.922')] [2024-07-05 16:22:29,856][04594] Updated weights for policy 0, policy_version 8944 (0.0011) [2024-07-05 16:22:33,340][04594] Updated weights for policy 0, policy_version 8954 (0.0011) [2024-07-05 16:22:33,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36679680. Throughput: 0: 2942.6. Samples: 4163688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:22:33,942][04005] Avg episode reward: [(0, '51.462')] [2024-07-05 16:22:34,037][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000008956_36683776.pth... [2024-07-05 16:22:34,107][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000008267_33861632.pth [2024-07-05 16:22:36,824][04594] Updated weights for policy 0, policy_version 8964 (0.0011) [2024-07-05 16:22:38,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 36741120. Throughput: 0: 2940.0. Samples: 4181376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:22:38,942][04005] Avg episode reward: [(0, '49.611')] [2024-07-05 16:22:40,326][04594] Updated weights for policy 0, policy_version 8974 (0.0012) [2024-07-05 16:22:43,810][04594] Updated weights for policy 0, policy_version 8984 (0.0011) [2024-07-05 16:22:43,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). 
Total num frames: 36798464. Throughput: 0: 2936.8. Samples: 4198928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:22:43,942][04005] Avg episode reward: [(0, '48.276')] [2024-07-05 16:22:47,285][04594] Updated weights for policy 0, policy_version 8994 (0.0011) [2024-07-05 16:22:48,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 36855808. Throughput: 0: 2940.3. Samples: 4207780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:22:48,943][04005] Avg episode reward: [(0, '47.943')] [2024-07-05 16:22:50,761][04594] Updated weights for policy 0, policy_version 9004 (0.0011) [2024-07-05 16:22:53,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 36917248. Throughput: 0: 2936.0. Samples: 4225402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:22:53,942][04005] Avg episode reward: [(0, '49.359')] [2024-07-05 16:22:54,264][04594] Updated weights for policy 0, policy_version 9014 (0.0011) [2024-07-05 16:22:57,734][04594] Updated weights for policy 0, policy_version 9024 (0.0011) [2024-07-05 16:22:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 36974592. Throughput: 0: 2937.2. Samples: 4242972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:22:58,942][04005] Avg episode reward: [(0, '50.127')] [2024-07-05 16:23:01,220][04594] Updated weights for policy 0, policy_version 9034 (0.0011) [2024-07-05 16:23:03,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 37031936. Throughput: 0: 2942.0. Samples: 4251930. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:03,943][04005] Avg episode reward: [(0, '49.385')] [2024-07-05 16:23:04,696][04594] Updated weights for policy 0, policy_version 9044 (0.0011) [2024-07-05 16:23:08,174][04594] Updated weights for policy 0, policy_version 9054 (0.0011) [2024-07-05 16:23:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 37093376. Throughput: 0: 2936.8. Samples: 4269476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:08,942][04005] Avg episode reward: [(0, '48.462')] [2024-07-05 16:23:11,674][04594] Updated weights for policy 0, policy_version 9064 (0.0011) [2024-07-05 16:23:13,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 37150720. Throughput: 0: 2935.9. Samples: 4286994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:13,942][04005] Avg episode reward: [(0, '48.973')] [2024-07-05 16:23:15,143][04594] Updated weights for policy 0, policy_version 9074 (0.0011) [2024-07-05 16:23:18,621][04594] Updated weights for policy 0, policy_version 9084 (0.0011) [2024-07-05 16:23:18,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 37208064. Throughput: 0: 2940.6. Samples: 4296014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:18,942][04005] Avg episode reward: [(0, '49.416')] [2024-07-05 16:23:22,098][04594] Updated weights for policy 0, policy_version 9094 (0.0011) [2024-07-05 16:23:23,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 37269504. Throughput: 0: 2937.1. Samples: 4313546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:23,943][04005] Avg episode reward: [(0, '52.101')] [2024-07-05 16:23:24,195][04581] Saving new best policy, reward=52.101! 
[2024-07-05 16:23:25,601][04594] Updated weights for policy 0, policy_version 9104 (0.0012) [2024-07-05 16:23:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 37326848. Throughput: 0: 2936.8. Samples: 4331084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:28,942][04005] Avg episode reward: [(0, '51.577')] [2024-07-05 16:23:29,080][04594] Updated weights for policy 0, policy_version 9114 (0.0013) [2024-07-05 16:23:32,563][04594] Updated weights for policy 0, policy_version 9124 (0.0011) [2024-07-05 16:23:33,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 37384192. Throughput: 0: 2939.8. Samples: 4340072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:33,942][04005] Avg episode reward: [(0, '49.821')] [2024-07-05 16:23:36,054][04594] Updated weights for policy 0, policy_version 9134 (0.0011) [2024-07-05 16:23:38,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 37445632. Throughput: 0: 2938.4. Samples: 4357630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:38,942][04005] Avg episode reward: [(0, '49.233')] [2024-07-05 16:23:39,537][04594] Updated weights for policy 0, policy_version 9144 (0.0011) [2024-07-05 16:23:43,012][04594] Updated weights for policy 0, policy_version 9154 (0.0011) [2024-07-05 16:23:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 37502976. Throughput: 0: 2937.9. Samples: 4375176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:43,942][04005] Avg episode reward: [(0, '49.435')] [2024-07-05 16:23:46,498][04594] Updated weights for policy 0, policy_version 9164 (0.0011) [2024-07-05 16:23:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 37564416. Throughput: 0: 2938.1. Samples: 4384146. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:48,942][04005] Avg episode reward: [(0, '50.496')] [2024-07-05 16:23:49,982][04594] Updated weights for policy 0, policy_version 9174 (0.0011) [2024-07-05 16:23:53,461][04594] Updated weights for policy 0, policy_version 9184 (0.0012) [2024-07-05 16:23:53,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 37621760. Throughput: 0: 2939.1. Samples: 4401736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:53,944][04005] Avg episode reward: [(0, '50.089')] [2024-07-05 16:23:56,950][04594] Updated weights for policy 0, policy_version 9194 (0.0013) [2024-07-05 16:23:58,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 37679104. Throughput: 0: 2938.4. Samples: 4419224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:23:58,943][04005] Avg episode reward: [(0, '49.250')] [2024-07-05 16:24:00,429][04594] Updated weights for policy 0, policy_version 9204 (0.0011) [2024-07-05 16:24:03,923][04594] Updated weights for policy 0, policy_version 9214 (0.0011) [2024-07-05 16:24:03,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 37740544. Throughput: 0: 2938.3. Samples: 4428236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:03,942][04005] Avg episode reward: [(0, '48.943')] [2024-07-05 16:24:07,402][04594] Updated weights for policy 0, policy_version 9224 (0.0013) [2024-07-05 16:24:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 37797888. Throughput: 0: 2937.7. Samples: 4445744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:08,942][04005] Avg episode reward: [(0, '49.042')] [2024-07-05 16:24:10,901][04594] Updated weights for policy 0, policy_version 9234 (0.0012) [2024-07-05 16:24:13,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 37855232. 
Throughput: 0: 2937.6. Samples: 4463278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:13,942][04005] Avg episode reward: [(0, '48.452')] [2024-07-05 16:24:14,390][04594] Updated weights for policy 0, policy_version 9244 (0.0014) [2024-07-05 16:24:17,883][04594] Updated weights for policy 0, policy_version 9254 (0.0011) [2024-07-05 16:24:18,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 37916672. Throughput: 0: 2936.9. Samples: 4472232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:18,942][04005] Avg episode reward: [(0, '48.075')] [2024-07-05 16:24:21,364][04594] Updated weights for policy 0, policy_version 9264 (0.0012) [2024-07-05 16:24:23,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 37974016. Throughput: 0: 2936.8. Samples: 4489788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:23,942][04005] Avg episode reward: [(0, '49.301')] [2024-07-05 16:24:24,872][04594] Updated weights for policy 0, policy_version 9274 (0.0011) [2024-07-05 16:24:28,347][04594] Updated weights for policy 0, policy_version 9284 (0.0011) [2024-07-05 16:24:28,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 38031360. Throughput: 0: 2937.2. Samples: 4507348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:28,942][04005] Avg episode reward: [(0, '49.878')] [2024-07-05 16:24:31,821][04594] Updated weights for policy 0, policy_version 9294 (0.0011) [2024-07-05 16:24:33,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 38092800. Throughput: 0: 2938.1. Samples: 4516360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:33,942][04005] Avg episode reward: [(0, '49.788')] [2024-07-05 16:24:33,945][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000009300_38092800.pth... 
[2024-07-05 16:24:34,018][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000008612_35274752.pth [2024-07-05 16:24:35,310][04594] Updated weights for policy 0, policy_version 9304 (0.0012) [2024-07-05 16:24:38,797][04594] Updated weights for policy 0, policy_version 9314 (0.0011) [2024-07-05 16:24:38,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 38150144. Throughput: 0: 2936.0. Samples: 4533858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:38,942][04005] Avg episode reward: [(0, '50.099')] [2024-07-05 16:24:42,292][04594] Updated weights for policy 0, policy_version 9324 (0.0011) [2024-07-05 16:24:43,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 38207488. Throughput: 0: 2937.1. Samples: 4551392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:24:43,942][04005] Avg episode reward: [(0, '51.074')] [2024-07-05 16:24:45,768][04594] Updated weights for policy 0, policy_version 9334 (0.0011) [2024-07-05 16:24:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 38268928. Throughput: 0: 2937.2. Samples: 4560408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:24:48,942][04005] Avg episode reward: [(0, '51.541')] [2024-07-05 16:24:49,266][04594] Updated weights for policy 0, policy_version 9344 (0.0011) [2024-07-05 16:24:52,739][04594] Updated weights for policy 0, policy_version 9354 (0.0011) [2024-07-05 16:24:53,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 38326272. Throughput: 0: 2937.7. Samples: 4577940. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:24:53,942][04005] Avg episode reward: [(0, '51.766')] [2024-07-05 16:24:56,226][04594] Updated weights for policy 0, policy_version 9364 (0.0011) [2024-07-05 16:24:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 38383616. Throughput: 0: 2937.4. Samples: 4595460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:24:58,942][04005] Avg episode reward: [(0, '49.415')] [2024-07-05 16:24:59,702][04594] Updated weights for policy 0, policy_version 9374 (0.0011) [2024-07-05 16:25:03,179][04594] Updated weights for policy 0, policy_version 9384 (0.0011) [2024-07-05 16:25:03,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 38445056. Throughput: 0: 2938.8. Samples: 4604476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:03,942][04005] Avg episode reward: [(0, '49.255')] [2024-07-05 16:25:06,655][04594] Updated weights for policy 0, policy_version 9394 (0.0011) [2024-07-05 16:25:08,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 38502400. Throughput: 0: 2939.0. Samples: 4622042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:08,943][04005] Avg episode reward: [(0, '49.161')] [2024-07-05 16:25:10,147][04594] Updated weights for policy 0, policy_version 9404 (0.0011) [2024-07-05 16:25:13,636][04594] Updated weights for policy 0, policy_version 9414 (0.0011) [2024-07-05 16:25:13,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 38559744. Throughput: 0: 2939.7. Samples: 4639636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:13,942][04005] Avg episode reward: [(0, '50.114')] [2024-07-05 16:25:17,121][04594] Updated weights for policy 0, policy_version 9424 (0.0011) [2024-07-05 16:25:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 38621184. 
Throughput: 0: 2937.6. Samples: 4648554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:18,942][04005] Avg episode reward: [(0, '50.532')] [2024-07-05 16:25:20,604][04594] Updated weights for policy 0, policy_version 9434 (0.0013) [2024-07-05 16:25:23,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 38678528. Throughput: 0: 2938.0. Samples: 4666066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:23,943][04005] Avg episode reward: [(0, '49.455')] [2024-07-05 16:25:24,083][04594] Updated weights for policy 0, policy_version 9444 (0.0011) [2024-07-05 16:25:27,572][04594] Updated weights for policy 0, policy_version 9454 (0.0011) [2024-07-05 16:25:28,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 38735872. Throughput: 0: 2942.4. Samples: 4683800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:28,942][04005] Avg episode reward: [(0, '46.931')] [2024-07-05 16:25:31,053][04594] Updated weights for policy 0, policy_version 9464 (0.0011) [2024-07-05 16:25:33,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 38797312. Throughput: 0: 2937.2. Samples: 4692584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:33,943][04005] Avg episode reward: [(0, '45.696')] [2024-07-05 16:25:34,531][04594] Updated weights for policy 0, policy_version 9474 (0.0011) [2024-07-05 16:25:38,004][04594] Updated weights for policy 0, policy_version 9484 (0.0011) [2024-07-05 16:25:38,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 38854656. Throughput: 0: 2937.6. Samples: 4710130. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:38,942][04005] Avg episode reward: [(0, '44.838')] [2024-07-05 16:25:41,502][04594] Updated weights for policy 0, policy_version 9494 (0.0011) [2024-07-05 16:25:43,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 38916096. Throughput: 0: 2945.7. Samples: 4728018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:43,942][04005] Avg episode reward: [(0, '46.614')] [2024-07-05 16:25:44,982][04594] Updated weights for policy 0, policy_version 9504 (0.0013) [2024-07-05 16:25:48,454][04594] Updated weights for policy 0, policy_version 9514 (0.0011) [2024-07-05 16:25:48,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 38973440. Throughput: 0: 2938.4. Samples: 4736706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:48,943][04005] Avg episode reward: [(0, '48.443')] [2024-07-05 16:25:51,939][04594] Updated weights for policy 0, policy_version 9524 (0.0012) [2024-07-05 16:25:53,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 39030784. Throughput: 0: 2940.0. Samples: 4754342. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:53,942][04005] Avg episode reward: [(0, '48.767')] [2024-07-05 16:25:55,415][04594] Updated weights for policy 0, policy_version 9534 (0.0011) [2024-07-05 16:25:58,911][04594] Updated weights for policy 0, policy_version 9544 (0.0012) [2024-07-05 16:25:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 39092224. Throughput: 0: 2945.9. Samples: 4772202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:25:58,942][04005] Avg episode reward: [(0, '50.872')] [2024-07-05 16:26:02,386][04594] Updated weights for policy 0, policy_version 9554 (0.0011) [2024-07-05 16:26:03,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 39149568. 
Throughput: 0: 2938.9. Samples: 4780806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:26:03,943][04005] Avg episode reward: [(0, '50.645')] [2024-07-05 16:26:05,872][04594] Updated weights for policy 0, policy_version 9564 (0.0011) [2024-07-05 16:26:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 39206912. Throughput: 0: 2944.4. Samples: 4798562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:26:08,942][04005] Avg episode reward: [(0, '51.197')] [2024-07-05 16:26:09,351][04594] Updated weights for policy 0, policy_version 9574 (0.0011) [2024-07-05 16:26:12,849][04594] Updated weights for policy 0, policy_version 9584 (0.0011) [2024-07-05 16:26:13,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 39268352. Throughput: 0: 2943.7. Samples: 4816268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:26:13,942][04005] Avg episode reward: [(0, '50.613')] [2024-07-05 16:26:16,323][04594] Updated weights for policy 0, policy_version 9594 (0.0011) [2024-07-05 16:26:18,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 39325696. Throughput: 0: 2938.7. Samples: 4824826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:26:18,942][04005] Avg episode reward: [(0, '48.845')] [2024-07-05 16:26:19,797][04594] Updated weights for policy 0, policy_version 9604 (0.0012) [2024-07-05 16:26:23,292][04594] Updated weights for policy 0, policy_version 9614 (0.0011) [2024-07-05 16:26:23,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 39383040. Throughput: 0: 2945.1. Samples: 4842658. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:26:23,943][04005] Avg episode reward: [(0, '45.157')] [2024-07-05 16:26:26,767][04594] Updated weights for policy 0, policy_version 9624 (0.0013) [2024-07-05 16:26:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 39444480. Throughput: 0: 2940.2. Samples: 4860326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:26:28,942][04005] Avg episode reward: [(0, '46.111')] [2024-07-05 16:26:30,252][04594] Updated weights for policy 0, policy_version 9634 (0.0011) [2024-07-05 16:26:33,730][04594] Updated weights for policy 0, policy_version 9644 (0.0011) [2024-07-05 16:26:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 39501824. Throughput: 0: 2939.4. Samples: 4868980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:26:33,943][04005] Avg episode reward: [(0, '46.749')] [2024-07-05 16:26:34,075][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000009645_39505920.pth... [2024-07-05 16:26:34,147][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000008956_36683776.pth [2024-07-05 16:26:37,241][04594] Updated weights for policy 0, policy_version 9654 (0.0011) [2024-07-05 16:26:38,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 39559168. Throughput: 0: 2942.5. Samples: 4886754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:26:38,943][04005] Avg episode reward: [(0, '48.925')] [2024-07-05 16:26:40,721][04594] Updated weights for policy 0, policy_version 9664 (0.0013) [2024-07-05 16:26:43,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 39620608. Throughput: 0: 2936.5. Samples: 4904346. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:26:43,942][04005] Avg episode reward: [(0, '46.931')] [2024-07-05 16:26:44,205][04594] Updated weights for policy 0, policy_version 9674 (0.0011) [2024-07-05 16:26:47,686][04594] Updated weights for policy 0, policy_version 9684 (0.0011) [2024-07-05 16:26:48,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 39677952. Throughput: 0: 2936.5. Samples: 4912948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:26:48,943][04005] Avg episode reward: [(0, '46.808')] [2024-07-05 16:26:51,171][04594] Updated weights for policy 0, policy_version 9694 (0.0013) [2024-07-05 16:26:53,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 39735296. Throughput: 0: 2940.0. Samples: 4930862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:26:53,942][04005] Avg episode reward: [(0, '49.176')] [2024-07-05 16:26:54,663][04594] Updated weights for policy 0, policy_version 9704 (0.0012) [2024-07-05 16:26:58,135][04594] Updated weights for policy 0, policy_version 9714 (0.0012) [2024-07-05 16:26:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 39796736. Throughput: 0: 2936.4. Samples: 4948406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:26:58,942][04005] Avg episode reward: [(0, '50.578')] [2024-07-05 16:27:01,617][04594] Updated weights for policy 0, policy_version 9724 (0.0011) [2024-07-05 16:27:03,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 39854080. Throughput: 0: 2941.0. Samples: 4957170. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:03,942][04005] Avg episode reward: [(0, '48.457')] [2024-07-05 16:27:05,098][04594] Updated weights for policy 0, policy_version 9734 (0.0011) [2024-07-05 16:27:08,577][04594] Updated weights for policy 0, policy_version 9744 (0.0011) [2024-07-05 16:27:08,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 39915520. Throughput: 0: 2939.3. Samples: 4974926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:08,942][04005] Avg episode reward: [(0, '49.329')] [2024-07-05 16:27:12,081][04594] Updated weights for policy 0, policy_version 9754 (0.0011) [2024-07-05 16:27:13,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 39972864. Throughput: 0: 2935.8. Samples: 4992436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:13,942][04005] Avg episode reward: [(0, '50.260')] [2024-07-05 16:27:15,569][04594] Updated weights for policy 0, policy_version 9764 (0.0011) [2024-07-05 16:27:18,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 40030208. Throughput: 0: 2939.1. Samples: 5001240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:18,942][04005] Avg episode reward: [(0, '50.302')] [2024-07-05 16:27:19,046][04594] Updated weights for policy 0, policy_version 9774 (0.0013) [2024-07-05 16:27:22,522][04594] Updated weights for policy 0, policy_version 9784 (0.0013) [2024-07-05 16:27:23,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 40091648. Throughput: 0: 2938.8. Samples: 5018998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:23,942][04005] Avg episode reward: [(0, '48.885')] [2024-07-05 16:27:25,996][04594] Updated weights for policy 0, policy_version 9794 (0.0012) [2024-07-05 16:27:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 40148992. 
Throughput: 0: 2937.8. Samples: 5036548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:28,942][04005] Avg episode reward: [(0, '48.681')] [2024-07-05 16:27:29,477][04594] Updated weights for policy 0, policy_version 9804 (0.0011) [2024-07-05 16:27:32,961][04594] Updated weights for policy 0, policy_version 9814 (0.0011) [2024-07-05 16:27:33,942][04005] Fps is (10 sec: 11468.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 40206336. Throughput: 0: 2946.3. Samples: 5045534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:33,943][04005] Avg episode reward: [(0, '49.883')] [2024-07-05 16:27:36,433][04594] Updated weights for policy 0, policy_version 9824 (0.0013) [2024-07-05 16:27:38,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 40267776. Throughput: 0: 2939.0. Samples: 5063118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:38,942][04005] Avg episode reward: [(0, '50.528')] [2024-07-05 16:27:39,948][04594] Updated weights for policy 0, policy_version 9834 (0.0012) [2024-07-05 16:27:43,427][04594] Updated weights for policy 0, policy_version 9844 (0.0011) [2024-07-05 16:27:43,942][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 40325120. Throughput: 0: 2938.1. Samples: 5080620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:43,942][04005] Avg episode reward: [(0, '50.736')] [2024-07-05 16:27:46,912][04594] Updated weights for policy 0, policy_version 9854 (0.0011) [2024-07-05 16:27:48,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 40382464. Throughput: 0: 2943.1. Samples: 5089610. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:48,942][04005] Avg episode reward: [(0, '50.498')] [2024-07-05 16:27:50,393][04594] Updated weights for policy 0, policy_version 9864 (0.0011) [2024-07-05 16:27:53,861][04594] Updated weights for policy 0, policy_version 9874 (0.0011) [2024-07-05 16:27:53,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 40443904. Throughput: 0: 2938.5. Samples: 5107158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:27:53,942][04005] Avg episode reward: [(0, '52.749')] [2024-07-05 16:27:53,945][04581] Saving new best policy, reward=52.749! [2024-07-05 16:27:57,364][04594] Updated weights for policy 0, policy_version 9884 (0.0011) [2024-07-05 16:27:58,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 40501248. Throughput: 0: 2939.4. Samples: 5124710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:27:58,943][04005] Avg episode reward: [(0, '53.010')] [2024-07-05 16:27:59,091][04581] Saving new best policy, reward=53.010! [2024-07-05 16:28:00,850][04594] Updated weights for policy 0, policy_version 9894 (0.0011) [2024-07-05 16:28:03,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 40558592. Throughput: 0: 2942.9. Samples: 5133668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-07-05 16:28:03,942][04005] Avg episode reward: [(0, '51.865')] [2024-07-05 16:28:04,327][04594] Updated weights for policy 0, policy_version 9904 (0.0011) [2024-07-05 16:28:07,809][04594] Updated weights for policy 0, policy_version 9914 (0.0011) [2024-07-05 16:28:08,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 40620032. Throughput: 0: 2938.2. Samples: 5151218. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:08,942][04005] Avg episode reward: [(0, '51.416')] [2024-07-05 16:28:11,309][04594] Updated weights for policy 0, policy_version 9924 (0.0011) [2024-07-05 16:28:13,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 40677376. Throughput: 0: 2938.0. Samples: 5168756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:13,943][04005] Avg episode reward: [(0, '50.699')] [2024-07-05 16:28:14,780][04594] Updated weights for policy 0, policy_version 9934 (0.0011) [2024-07-05 16:28:18,280][04594] Updated weights for policy 0, policy_version 9944 (0.0011) [2024-07-05 16:28:18,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 40734720. Throughput: 0: 2936.9. Samples: 5177694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:18,943][04005] Avg episode reward: [(0, '51.029')] [2024-07-05 16:28:21,759][04594] Updated weights for policy 0, policy_version 9954 (0.0011) [2024-07-05 16:28:23,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 40796160. Throughput: 0: 2936.7. Samples: 5195268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:23,943][04005] Avg episode reward: [(0, '51.319')] [2024-07-05 16:28:25,243][04594] Updated weights for policy 0, policy_version 9964 (0.0012) [2024-07-05 16:28:28,716][04594] Updated weights for policy 0, policy_version 9974 (0.0012) [2024-07-05 16:28:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 40853504. Throughput: 0: 2937.5. Samples: 5212808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:28,942][04005] Avg episode reward: [(0, '50.038')] [2024-07-05 16:28:32,195][04594] Updated weights for policy 0, policy_version 9984 (0.0012) [2024-07-05 16:28:33,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 40914944. 
Throughput: 0: 2938.0. Samples: 5221818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:33,942][04005] Avg episode reward: [(0, '50.502')] [2024-07-05 16:28:33,945][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000009989_40914944.pth... [2024-07-05 16:28:34,019][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000009300_38092800.pth [2024-07-05 16:28:35,691][04594] Updated weights for policy 0, policy_version 9994 (0.0011) [2024-07-05 16:28:38,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 40972288. Throughput: 0: 2936.9. Samples: 5239318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:38,942][04005] Avg episode reward: [(0, '50.696')] [2024-07-05 16:28:39,170][04594] Updated weights for policy 0, policy_version 10004 (0.0011) [2024-07-05 16:28:42,665][04594] Updated weights for policy 0, policy_version 10014 (0.0012) [2024-07-05 16:28:43,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 41029632. Throughput: 0: 2936.5. Samples: 5256852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:43,942][04005] Avg episode reward: [(0, '52.575')] [2024-07-05 16:28:46,140][04594] Updated weights for policy 0, policy_version 10024 (0.0011) [2024-07-05 16:28:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 41091072. Throughput: 0: 2937.2. Samples: 5265840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:48,942][04005] Avg episode reward: [(0, '52.598')] [2024-07-05 16:28:49,624][04594] Updated weights for policy 0, policy_version 10034 (0.0013) [2024-07-05 16:28:53,107][04594] Updated weights for policy 0, policy_version 10044 (0.0011) [2024-07-05 16:28:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). 
Total num frames: 41148416. Throughput: 0: 2937.2. Samples: 5283390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:53,942][04005] Avg episode reward: [(0, '52.393')] [2024-07-05 16:28:56,595][04594] Updated weights for policy 0, policy_version 10054 (0.0012) [2024-07-05 16:28:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 41205760. Throughput: 0: 2936.9. Samples: 5300918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:28:58,942][04005] Avg episode reward: [(0, '51.448')] [2024-07-05 16:29:00,069][04594] Updated weights for policy 0, policy_version 10064 (0.0011) [2024-07-05 16:29:03,558][04594] Updated weights for policy 0, policy_version 10074 (0.0011) [2024-07-05 16:29:03,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 41267200. Throughput: 0: 2938.7. Samples: 5309934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:03,942][04005] Avg episode reward: [(0, '50.519')] [2024-07-05 16:29:07,035][04594] Updated weights for policy 0, policy_version 10084 (0.0011) [2024-07-05 16:29:08,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 41324544. Throughput: 0: 2938.3. Samples: 5327494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:08,943][04005] Avg episode reward: [(0, '49.900')] [2024-07-05 16:29:10,521][04594] Updated weights for policy 0, policy_version 10094 (0.0012) [2024-07-05 16:29:13,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 41381888. Throughput: 0: 2936.9. Samples: 5344968. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:13,943][04005] Avg episode reward: [(0, '50.178')] [2024-07-05 16:29:14,014][04594] Updated weights for policy 0, policy_version 10104 (0.0011) [2024-07-05 16:29:17,486][04594] Updated weights for policy 0, policy_version 10114 (0.0012) [2024-07-05 16:29:18,942][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 41443328. Throughput: 0: 2936.6. Samples: 5353966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:18,942][04005] Avg episode reward: [(0, '50.181')] [2024-07-05 16:29:20,961][04594] Updated weights for policy 0, policy_version 10124 (0.0013) [2024-07-05 16:29:23,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 41500672. Throughput: 0: 2938.3. Samples: 5371540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:23,943][04005] Avg episode reward: [(0, '49.918')] [2024-07-05 16:29:24,440][04594] Updated weights for policy 0, policy_version 10134 (0.0013) [2024-07-05 16:29:27,918][04594] Updated weights for policy 0, policy_version 10144 (0.0012) [2024-07-05 16:29:28,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 41558016. Throughput: 0: 2942.8. Samples: 5389278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:28,942][04005] Avg episode reward: [(0, '48.692')] [2024-07-05 16:29:31,436][04594] Updated weights for policy 0, policy_version 10154 (0.0011) [2024-07-05 16:29:33,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 41619456. Throughput: 0: 2937.5. Samples: 5398028. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:33,942][04005] Avg episode reward: [(0, '49.097')] [2024-07-05 16:29:34,937][04594] Updated weights for policy 0, policy_version 10164 (0.0011) [2024-07-05 16:29:38,428][04594] Updated weights for policy 0, policy_version 10174 (0.0013) [2024-07-05 16:29:38,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 41676800. Throughput: 0: 2936.5. Samples: 5415534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:38,943][04005] Avg episode reward: [(0, '50.047')] [2024-07-05 16:29:41,926][04594] Updated weights for policy 0, policy_version 10184 (0.0012) [2024-07-05 16:29:43,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 41734144. Throughput: 0: 2936.3. Samples: 5433052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:43,942][04005] Avg episode reward: [(0, '48.867')] [2024-07-05 16:29:45,407][04594] Updated weights for policy 0, policy_version 10194 (0.0011) [2024-07-05 16:29:48,896][04594] Updated weights for policy 0, policy_version 10204 (0.0011) [2024-07-05 16:29:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 41795584. Throughput: 0: 2936.4. Samples: 5442072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:48,942][04005] Avg episode reward: [(0, '49.818')] [2024-07-05 16:29:52,382][04594] Updated weights for policy 0, policy_version 10214 (0.0011) [2024-07-05 16:29:53,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 41852928. Throughput: 0: 2934.5. Samples: 5459546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:53,942][04005] Avg episode reward: [(0, '49.389')] [2024-07-05 16:29:55,874][04594] Updated weights for policy 0, policy_version 10224 (0.0011) [2024-07-05 16:29:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 41910272. 
Throughput: 0: 2936.0. Samples: 5477086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:29:58,942][04005] Avg episode reward: [(0, '48.155')] [2024-07-05 16:29:59,352][04594] Updated weights for policy 0, policy_version 10234 (0.0012) [2024-07-05 16:30:02,821][04594] Updated weights for policy 0, policy_version 10244 (0.0011) [2024-07-05 16:30:03,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 41971712. Throughput: 0: 2936.6. Samples: 5486112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:03,943][04005] Avg episode reward: [(0, '45.773')] [2024-07-05 16:30:06,298][04594] Updated weights for policy 0, policy_version 10254 (0.0013) [2024-07-05 16:30:08,942][04005] Fps is (10 sec: 11877.9, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 42029056. Throughput: 0: 2935.8. Samples: 5503650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:08,943][04005] Avg episode reward: [(0, '44.966')] [2024-07-05 16:30:09,787][04594] Updated weights for policy 0, policy_version 10264 (0.0011) [2024-07-05 16:30:13,276][04594] Updated weights for policy 0, policy_version 10274 (0.0011) [2024-07-05 16:30:13,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 42086400. Throughput: 0: 2933.8. Samples: 5521300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:13,943][04005] Avg episode reward: [(0, '47.862')] [2024-07-05 16:30:16,754][04594] Updated weights for policy 0, policy_version 10284 (0.0011) [2024-07-05 16:30:18,942][04005] Fps is (10 sec: 11878.8, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 42147840. Throughput: 0: 2935.7. Samples: 5530134. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:18,942][04005] Avg episode reward: [(0, '49.156')] [2024-07-05 16:30:20,252][04594] Updated weights for policy 0, policy_version 10294 (0.0013) [2024-07-05 16:30:23,735][04594] Updated weights for policy 0, policy_version 10304 (0.0012) [2024-07-05 16:30:23,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 42205184. Throughput: 0: 2936.0. Samples: 5547656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:23,942][04005] Avg episode reward: [(0, '50.223')] [2024-07-05 16:30:27,230][04594] Updated weights for policy 0, policy_version 10314 (0.0011) [2024-07-05 16:30:28,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 42262528. Throughput: 0: 2940.8. Samples: 5565386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:28,942][04005] Avg episode reward: [(0, '49.594')] [2024-07-05 16:30:30,710][04594] Updated weights for policy 0, policy_version 10324 (0.0011) [2024-07-05 16:30:33,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 42323968. Throughput: 0: 2935.7. Samples: 5574180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:33,942][04005] Avg episode reward: [(0, '48.195')] [2024-07-05 16:30:34,192][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000010334_42328064.pth... [2024-07-05 16:30:34,194][04594] Updated weights for policy 0, policy_version 10334 (0.0011) [2024-07-05 16:30:34,263][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000009645_39505920.pth [2024-07-05 16:30:37,686][04594] Updated weights for policy 0, policy_version 10344 (0.0012) [2024-07-05 16:30:38,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 42381312. Throughput: 0: 2937.1. 
Samples: 5591716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:38,943][04005] Avg episode reward: [(0, '46.474')] [2024-07-05 16:30:41,185][04594] Updated weights for policy 0, policy_version 10354 (0.0011) [2024-07-05 16:30:43,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 42438656. Throughput: 0: 2939.2. Samples: 5609350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:43,942][04005] Avg episode reward: [(0, '45.812')] [2024-07-05 16:30:44,669][04594] Updated weights for policy 0, policy_version 10364 (0.0011) [2024-07-05 16:30:48,158][04594] Updated weights for policy 0, policy_version 10374 (0.0011) [2024-07-05 16:30:48,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 42500096. Throughput: 0: 2935.1. Samples: 5618192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:48,942][04005] Avg episode reward: [(0, '46.339')] [2024-07-05 16:30:51,648][04594] Updated weights for policy 0, policy_version 10384 (0.0011) [2024-07-05 16:30:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 42557440. Throughput: 0: 2936.3. Samples: 5635780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:53,942][04005] Avg episode reward: [(0, '46.719')] [2024-07-05 16:30:55,122][04594] Updated weights for policy 0, policy_version 10394 (0.0012) [2024-07-05 16:30:58,617][04594] Updated weights for policy 0, policy_version 10404 (0.0012) [2024-07-05 16:30:58,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 42614784. Throughput: 0: 2938.6. Samples: 5653536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:30:58,942][04005] Avg episode reward: [(0, '48.068')] [2024-07-05 16:31:02,100][04594] Updated weights for policy 0, policy_version 10414 (0.0011) [2024-07-05 16:31:03,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). 
Total num frames: 42676224. Throughput: 0: 2937.2. Samples: 5662308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:03,942][04005] Avg episode reward: [(0, '48.511')] [2024-07-05 16:31:05,595][04594] Updated weights for policy 0, policy_version 10424 (0.0011) [2024-07-05 16:31:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11742.0, 300 sec: 11746.5). Total num frames: 42733568. Throughput: 0: 2936.7. Samples: 5679806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:08,942][04005] Avg episode reward: [(0, '50.677')] [2024-07-05 16:31:09,086][04594] Updated weights for policy 0, policy_version 10434 (0.0011) [2024-07-05 16:31:12,572][04594] Updated weights for policy 0, policy_version 10444 (0.0011) [2024-07-05 16:31:13,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 42790912. Throughput: 0: 2937.0. Samples: 5697552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:13,943][04005] Avg episode reward: [(0, '51.728')] [2024-07-05 16:31:16,063][04594] Updated weights for policy 0, policy_version 10454 (0.0012) [2024-07-05 16:31:18,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 42852352. Throughput: 0: 2936.8. Samples: 5706334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:18,943][04005] Avg episode reward: [(0, '50.396')] [2024-07-05 16:31:19,540][04594] Updated weights for policy 0, policy_version 10464 (0.0011) [2024-07-05 16:31:23,015][04594] Updated weights for policy 0, policy_version 10474 (0.0011) [2024-07-05 16:31:23,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 42909696. Throughput: 0: 2936.2. Samples: 5723844. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:23,943][04005] Avg episode reward: [(0, '49.326')] [2024-07-05 16:31:26,499][04594] Updated weights for policy 0, policy_version 10484 (0.0013) [2024-07-05 16:31:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 42971136. Throughput: 0: 2943.6. Samples: 5741810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:28,942][04005] Avg episode reward: [(0, '48.535')] [2024-07-05 16:31:29,977][04594] Updated weights for policy 0, policy_version 10494 (0.0011) [2024-07-05 16:31:33,448][04594] Updated weights for policy 0, policy_version 10504 (0.0011) [2024-07-05 16:31:33,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 43028480. Throughput: 0: 2937.9. Samples: 5750396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:33,942][04005] Avg episode reward: [(0, '48.514')] [2024-07-05 16:31:36,953][04594] Updated weights for policy 0, policy_version 10514 (0.0012) [2024-07-05 16:31:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 43085824. Throughput: 0: 2938.1. Samples: 5767994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:38,942][04005] Avg episode reward: [(0, '48.782')] [2024-07-05 16:31:40,447][04594] Updated weights for policy 0, policy_version 10524 (0.0011) [2024-07-05 16:31:43,942][04005] Fps is (10 sec: 11468.5, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 43143168. Throughput: 0: 2937.7. Samples: 5785734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:43,943][04005] Avg episode reward: [(0, '48.862')] [2024-07-05 16:31:43,945][04594] Updated weights for policy 0, policy_version 10534 (0.0012) [2024-07-05 16:31:47,433][04594] Updated weights for policy 0, policy_version 10544 (0.0011) [2024-07-05 16:31:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 43204608. 
Throughput: 0: 2935.7. Samples: 5794414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:48,942][04005] Avg episode reward: [(0, '47.496')] [2024-07-05 16:31:50,927][04594] Updated weights for policy 0, policy_version 10554 (0.0012) [2024-07-05 16:31:53,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 43261952. Throughput: 0: 2936.3. Samples: 5811940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:53,942][04005] Avg episode reward: [(0, '49.390')] [2024-07-05 16:31:54,392][04594] Updated weights for policy 0, policy_version 10564 (0.0011) [2024-07-05 16:31:57,892][04594] Updated weights for policy 0, policy_version 10574 (0.0011) [2024-07-05 16:31:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 43323392. Throughput: 0: 2938.5. Samples: 5829784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:31:58,942][04005] Avg episode reward: [(0, '48.598')] [2024-07-05 16:32:01,374][04594] Updated weights for policy 0, policy_version 10584 (0.0011) [2024-07-05 16:32:03,942][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 43380736. Throughput: 0: 2936.1. Samples: 5838460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:03,942][04005] Avg episode reward: [(0, '49.772')] [2024-07-05 16:32:04,859][04594] Updated weights for policy 0, policy_version 10594 (0.0012) [2024-07-05 16:32:08,343][04594] Updated weights for policy 0, policy_version 10604 (0.0011) [2024-07-05 16:32:08,942][04005] Fps is (10 sec: 11467.9, 60 sec: 11741.7, 300 sec: 11746.5). Total num frames: 43438080. Throughput: 0: 2937.3. Samples: 5856024. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:08,945][04005] Avg episode reward: [(0, '49.480')] [2024-07-05 16:32:11,827][04594] Updated weights for policy 0, policy_version 10614 (0.0013) [2024-07-05 16:32:13,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 43499520. Throughput: 0: 2935.2. Samples: 5873894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:13,942][04005] Avg episode reward: [(0, '50.589')] [2024-07-05 16:32:15,318][04594] Updated weights for policy 0, policy_version 10624 (0.0015) [2024-07-05 16:32:18,792][04594] Updated weights for policy 0, policy_version 10634 (0.0011) [2024-07-05 16:32:18,941][04005] Fps is (10 sec: 11879.1, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 43556864. Throughput: 0: 2934.9. Samples: 5882466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:18,942][04005] Avg episode reward: [(0, '49.487')] [2024-07-05 16:32:22,274][04594] Updated weights for policy 0, policy_version 10644 (0.0011) [2024-07-05 16:32:23,942][04005] Fps is (10 sec: 11468.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 43614208. Throughput: 0: 2937.5. Samples: 5900184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:23,942][04005] Avg episode reward: [(0, '50.762')] [2024-07-05 16:32:25,765][04594] Updated weights for policy 0, policy_version 10654 (0.0011) [2024-07-05 16:32:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 43675648. Throughput: 0: 2938.2. Samples: 5917954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:28,942][04005] Avg episode reward: [(0, '48.951')] [2024-07-05 16:32:29,259][04594] Updated weights for policy 0, policy_version 10664 (0.0011) [2024-07-05 16:32:32,753][04594] Updated weights for policy 0, policy_version 10674 (0.0011) [2024-07-05 16:32:33,942][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 43732992. 
Throughput: 0: 2935.0. Samples: 5926490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:33,943][04005] Avg episode reward: [(0, '50.421')] [2024-07-05 16:32:34,147][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000010678_43737088.pth... [2024-07-05 16:32:34,219][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000009989_40914944.pth [2024-07-05 16:32:36,254][04594] Updated weights for policy 0, policy_version 10684 (0.0013) [2024-07-05 16:32:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 43790336. Throughput: 0: 2935.8. Samples: 5944050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:38,942][04005] Avg episode reward: [(0, '49.034')] [2024-07-05 16:32:39,751][04594] Updated weights for policy 0, policy_version 10694 (0.0012) [2024-07-05 16:32:43,253][04594] Updated weights for policy 0, policy_version 10704 (0.0011) [2024-07-05 16:32:43,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 43851776. Throughput: 0: 2935.0. Samples: 5961858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:43,942][04005] Avg episode reward: [(0, '49.506')] [2024-07-05 16:32:46,729][04594] Updated weights for policy 0, policy_version 10714 (0.0011) [2024-07-05 16:32:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 43909120. Throughput: 0: 2934.2. Samples: 5970498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:48,942][04005] Avg episode reward: [(0, '49.308')] [2024-07-05 16:32:50,224][04594] Updated weights for policy 0, policy_version 10724 (0.0011) [2024-07-05 16:32:53,695][04594] Updated weights for policy 0, policy_version 10734 (0.0011) [2024-07-05 16:32:53,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). 
Total num frames: 43966464. Throughput: 0: 2933.6. Samples: 5988034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:53,942][04005] Avg episode reward: [(0, '49.224')] [2024-07-05 16:32:57,180][04594] Updated weights for policy 0, policy_version 10744 (0.0011) [2024-07-05 16:32:58,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 44027904. Throughput: 0: 2934.9. Samples: 6005966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:32:58,942][04005] Avg episode reward: [(0, '48.291')] [2024-07-05 16:33:00,677][04594] Updated weights for policy 0, policy_version 10754 (0.0013) [2024-07-05 16:33:03,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44085248. Throughput: 0: 2934.6. Samples: 6014522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 16:33:03,943][04005] Avg episode reward: [(0, '47.299')] [2024-07-05 16:33:04,150][04594] Updated weights for policy 0, policy_version 10764 (0.0011) [2024-07-05 16:33:07,645][04594] Updated weights for policy 0, policy_version 10774 (0.0012) [2024-07-05 16:33:08,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11742.0, 300 sec: 11746.5). Total num frames: 44142592. Throughput: 0: 2931.7. Samples: 6032110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 16:33:08,943][04005] Avg episode reward: [(0, '49.080')] [2024-07-05 16:33:11,130][04594] Updated weights for policy 0, policy_version 10784 (0.0011) [2024-07-05 16:33:13,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 44204032. Throughput: 0: 2934.8. Samples: 6050018. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 16:33:13,942][04005] Avg episode reward: [(0, '50.970')] [2024-07-05 16:33:14,620][04594] Updated weights for policy 0, policy_version 10794 (0.0013) [2024-07-05 16:33:18,108][04594] Updated weights for policy 0, policy_version 10804 (0.0011) [2024-07-05 16:33:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44261376. Throughput: 0: 2935.2. Samples: 6058574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:33:18,942][04005] Avg episode reward: [(0, '52.729')] [2024-07-05 16:33:21,598][04594] Updated weights for policy 0, policy_version 10814 (0.0011) [2024-07-05 16:33:23,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44318720. Throughput: 0: 2935.2. Samples: 6076136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:33:23,943][04005] Avg episode reward: [(0, '50.394')] [2024-07-05 16:33:25,075][04594] Updated weights for policy 0, policy_version 10824 (0.0013) [2024-07-05 16:33:28,559][04594] Updated weights for policy 0, policy_version 10834 (0.0012) [2024-07-05 16:33:28,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 44380160. Throughput: 0: 2937.5. Samples: 6094046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 16:33:28,942][04005] Avg episode reward: [(0, '49.018')] [2024-07-05 16:33:32,047][04594] Updated weights for policy 0, policy_version 10844 (0.0011) [2024-07-05 16:33:33,941][04005] Fps is (10 sec: 11878.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44437504. Throughput: 0: 2935.4. Samples: 6102592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 16:33:33,942][04005] Avg episode reward: [(0, '47.904')] [2024-07-05 16:33:35,538][04594] Updated weights for policy 0, policy_version 10854 (0.0011) [2024-07-05 16:33:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44494848. 
Throughput: 0: 2937.5. Samples: 6120220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 16:33:38,942][04005] Avg episode reward: [(0, '48.791')] [2024-07-05 16:33:39,028][04594] Updated weights for policy 0, policy_version 10864 (0.0011) [2024-07-05 16:33:42,519][04594] Updated weights for policy 0, policy_version 10874 (0.0011) [2024-07-05 16:33:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44556288. Throughput: 0: 2936.1. Samples: 6138090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 16:33:43,942][04005] Avg episode reward: [(0, '48.344')] [2024-07-05 16:33:46,007][04594] Updated weights for policy 0, policy_version 10884 (0.0014) [2024-07-05 16:33:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44613632. Throughput: 0: 2935.7. Samples: 6146628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 16:33:48,942][04005] Avg episode reward: [(0, '48.669')] [2024-07-05 16:33:49,493][04594] Updated weights for policy 0, policy_version 10894 (0.0011) [2024-07-05 16:33:52,986][04594] Updated weights for policy 0, policy_version 10904 (0.0012) [2024-07-05 16:33:53,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44670976. Throughput: 0: 2937.6. Samples: 6164302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:33:53,942][04005] Avg episode reward: [(0, '48.497')] [2024-07-05 16:33:56,460][04594] Updated weights for policy 0, policy_version 10914 (0.0011) [2024-07-05 16:33:58,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44732416. Throughput: 0: 2936.8. Samples: 6182176. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:33:58,942][04005] Avg episode reward: [(0, '50.216')] [2024-07-05 16:33:59,949][04594] Updated weights for policy 0, policy_version 10924 (0.0011) [2024-07-05 16:34:03,422][04594] Updated weights for policy 0, policy_version 10934 (0.0011) [2024-07-05 16:34:03,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44789760. Throughput: 0: 2936.9. Samples: 6190736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:03,943][04005] Avg episode reward: [(0, '51.098')] [2024-07-05 16:34:06,902][04594] Updated weights for policy 0, policy_version 10944 (0.0013) [2024-07-05 16:34:08,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44847104. Throughput: 0: 2942.8. Samples: 6208560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:08,943][04005] Avg episode reward: [(0, '51.417')] [2024-07-05 16:34:10,407][04594] Updated weights for policy 0, policy_version 10954 (0.0011) [2024-07-05 16:34:13,884][04594] Updated weights for policy 0, policy_version 10964 (0.0011) [2024-07-05 16:34:13,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 44908544. Throughput: 0: 2937.2. Samples: 6226218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:13,942][04005] Avg episode reward: [(0, '52.335')] [2024-07-05 16:34:17,376][04594] Updated weights for policy 0, policy_version 10974 (0.0011) [2024-07-05 16:34:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 44965888. Throughput: 0: 2936.8. Samples: 6234748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:18,942][04005] Avg episode reward: [(0, '50.808')] [2024-07-05 16:34:20,873][04594] Updated weights for policy 0, policy_version 10984 (0.0013) [2024-07-05 16:34:23,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45023232. 
Throughput: 0: 2939.2. Samples: 6252486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:23,943][04005] Avg episode reward: [(0, '50.446')] [2024-07-05 16:34:24,349][04594] Updated weights for policy 0, policy_version 10994 (0.0011) [2024-07-05 16:34:27,840][04594] Updated weights for policy 0, policy_version 11004 (0.0011) [2024-07-05 16:34:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45084672. Throughput: 0: 2937.2. Samples: 6270262. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:28,942][04005] Avg episode reward: [(0, '49.619')] [2024-07-05 16:34:31,314][04594] Updated weights for policy 0, policy_version 11014 (0.0012) [2024-07-05 16:34:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 45142016. Throughput: 0: 2937.7. Samples: 6278824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:33,942][04005] Avg episode reward: [(0, '49.265')] [2024-07-05 16:34:34,095][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000011022_45146112.pth... [2024-07-05 16:34:34,167][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000010334_42328064.pth [2024-07-05 16:34:34,809][04594] Updated weights for policy 0, policy_version 11024 (0.0011) [2024-07-05 16:34:38,278][04594] Updated weights for policy 0, policy_version 11034 (0.0011) [2024-07-05 16:34:38,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45199360. Throughput: 0: 2942.9. Samples: 6296734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:38,942][04005] Avg episode reward: [(0, '49.671')] [2024-07-05 16:34:41,763][04594] Updated weights for policy 0, policy_version 11044 (0.0011) [2024-07-05 16:34:43,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11746.5). 
Total num frames: 45260800. Throughput: 0: 2936.7. Samples: 6314326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:43,942][04005] Avg episode reward: [(0, '49.837')] [2024-07-05 16:34:45,267][04594] Updated weights for policy 0, policy_version 11054 (0.0012) [2024-07-05 16:34:48,738][04594] Updated weights for policy 0, policy_version 11064 (0.0011) [2024-07-05 16:34:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45318144. Throughput: 0: 2936.8. Samples: 6322892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:48,942][04005] Avg episode reward: [(0, '49.859')] [2024-07-05 16:34:52,231][04594] Updated weights for policy 0, policy_version 11074 (0.0011) [2024-07-05 16:34:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45375488. Throughput: 0: 2939.2. Samples: 6340824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:53,942][04005] Avg episode reward: [(0, '49.523')] [2024-07-05 16:34:55,720][04594] Updated weights for policy 0, policy_version 11084 (0.0011) [2024-07-05 16:34:58,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45436928. Throughput: 0: 2937.4. Samples: 6358400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:34:58,942][04005] Avg episode reward: [(0, '49.690')] [2024-07-05 16:34:59,196][04594] Updated weights for policy 0, policy_version 11094 (0.0012) [2024-07-05 16:35:02,684][04594] Updated weights for policy 0, policy_version 11104 (0.0013) [2024-07-05 16:35:03,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45494272. Throughput: 0: 2940.7. Samples: 6367082. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:35:03,943][04005] Avg episode reward: [(0, '50.336')] [2024-07-05 16:35:06,168][04594] Updated weights for policy 0, policy_version 11114 (0.0011) [2024-07-05 16:35:08,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45551616. Throughput: 0: 2943.6. Samples: 6384948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-05 16:35:08,942][04005] Avg episode reward: [(0, '51.464')] [2024-07-05 16:35:09,670][04594] Updated weights for policy 0, policy_version 11124 (0.0012) [2024-07-05 16:35:13,145][04594] Updated weights for policy 0, policy_version 11134 (0.0011) [2024-07-05 16:35:13,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45613056. Throughput: 0: 2938.3. Samples: 6402488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 16:35:13,942][04005] Avg episode reward: [(0, '49.237')] [2024-07-05 16:35:16,635][04594] Updated weights for policy 0, policy_version 11144 (0.0011) [2024-07-05 16:35:18,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45670400. Throughput: 0: 2940.8. Samples: 6411160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 16:35:18,942][04005] Avg episode reward: [(0, '51.199')] [2024-07-05 16:35:20,106][04594] Updated weights for policy 0, policy_version 11154 (0.0011) [2024-07-05 16:35:23,599][04594] Updated weights for policy 0, policy_version 11164 (0.0013) [2024-07-05 16:35:23,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45727744. Throughput: 0: 2939.8. Samples: 6429024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 16:35:23,942][04005] Avg episode reward: [(0, '49.111')] [2024-07-05 16:35:27,078][04594] Updated weights for policy 0, policy_version 11174 (0.0012) [2024-07-05 16:35:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45789184. 
Throughput: 0: 2937.9. Samples: 6446532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 16:35:28,942][04005] Avg episode reward: [(0, '48.303')] [2024-07-05 16:35:30,575][04594] Updated weights for policy 0, policy_version 11184 (0.0012) [2024-07-05 16:35:33,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45846528. Throughput: 0: 2941.6. Samples: 6455262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:35:33,942][04005] Avg episode reward: [(0, '47.076')] [2024-07-05 16:35:34,062][04594] Updated weights for policy 0, policy_version 11194 (0.0011) [2024-07-05 16:35:37,543][04594] Updated weights for policy 0, policy_version 11204 (0.0011) [2024-07-05 16:35:38,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 45907968. Throughput: 0: 2938.8. Samples: 6473072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 16:35:38,943][04005] Avg episode reward: [(0, '48.867')] [2024-07-05 16:35:41,035][04594] Updated weights for policy 0, policy_version 11214 (0.0011) [2024-07-05 16:35:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 45965312. Throughput: 0: 2938.0. Samples: 6490608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 16:35:43,942][04005] Avg episode reward: [(0, '51.048')] [2024-07-05 16:35:44,508][04594] Updated weights for policy 0, policy_version 11224 (0.0011) [2024-07-05 16:35:47,994][04594] Updated weights for policy 0, policy_version 11234 (0.0011) [2024-07-05 16:35:48,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46022656. Throughput: 0: 2941.4. Samples: 6499446. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 16:35:48,943][04005] Avg episode reward: [(0, '52.012')] [2024-07-05 16:35:51,476][04594] Updated weights for policy 0, policy_version 11244 (0.0011) [2024-07-05 16:35:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 46084096. Throughput: 0: 2937.7. Samples: 6517146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 16:35:53,942][04005] Avg episode reward: [(0, '53.305')] [2024-07-05 16:35:53,945][04581] Saving new best policy, reward=53.305! [2024-07-05 16:35:54,972][04594] Updated weights for policy 0, policy_version 11254 (0.0012) [2024-07-05 16:35:58,457][04594] Updated weights for policy 0, policy_version 11264 (0.0011) [2024-07-05 16:35:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46141440. Throughput: 0: 2938.6. Samples: 6534724. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:35:58,942][04005] Avg episode reward: [(0, '52.576')] [2024-07-05 16:36:01,937][04594] Updated weights for policy 0, policy_version 11274 (0.0011) [2024-07-05 16:36:03,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46198784. Throughput: 0: 2943.1. Samples: 6543600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:03,943][04005] Avg episode reward: [(0, '50.455')] [2024-07-05 16:36:05,413][04594] Updated weights for policy 0, policy_version 11284 (0.0011) [2024-07-05 16:36:08,908][04594] Updated weights for policy 0, policy_version 11294 (0.0011) [2024-07-05 16:36:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 46260224. Throughput: 0: 2938.1. Samples: 6561236. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:08,942][04005] Avg episode reward: [(0, '49.859')] [2024-07-05 16:36:12,397][04594] Updated weights for policy 0, policy_version 11304 (0.0013) [2024-07-05 16:36:13,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46317568. Throughput: 0: 2937.8. Samples: 6578734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:13,942][04005] Avg episode reward: [(0, '49.633')] [2024-07-05 16:36:15,881][04594] Updated weights for policy 0, policy_version 11314 (0.0012) [2024-07-05 16:36:18,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46374912. Throughput: 0: 2942.2. Samples: 6587662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:18,942][04005] Avg episode reward: [(0, '51.385')] [2024-07-05 16:36:19,366][04594] Updated weights for policy 0, policy_version 11324 (0.0011) [2024-07-05 16:36:22,852][04594] Updated weights for policy 0, policy_version 11334 (0.0011) [2024-07-05 16:36:23,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11746.5). Total num frames: 46436352. Throughput: 0: 2937.5. Samples: 6605258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:23,942][04005] Avg episode reward: [(0, '50.288')] [2024-07-05 16:36:26,332][04594] Updated weights for policy 0, policy_version 11344 (0.0012) [2024-07-05 16:36:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46493696. Throughput: 0: 2938.5. Samples: 6622842. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:28,942][04005] Avg episode reward: [(0, '48.083')] [2024-07-05 16:36:29,810][04594] Updated weights for policy 0, policy_version 11354 (0.0011) [2024-07-05 16:36:33,298][04594] Updated weights for policy 0, policy_version 11364 (0.0011) [2024-07-05 16:36:33,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46551040. 
Throughput: 0: 2942.0. Samples: 6631834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:33,942][04005] Avg episode reward: [(0, '48.888')] [2024-07-05 16:36:33,987][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000011366_46555136.pth... [2024-07-05 16:36:34,058][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000010678_43737088.pth [2024-07-05 16:36:36,785][04594] Updated weights for policy 0, policy_version 11374 (0.0011) [2024-07-05 16:36:38,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 46612480. Throughput: 0: 2938.7. Samples: 6649386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:38,942][04005] Avg episode reward: [(0, '49.482')] [2024-07-05 16:36:40,294][04594] Updated weights for policy 0, policy_version 11384 (0.0012) [2024-07-05 16:36:43,785][04594] Updated weights for policy 0, policy_version 11394 (0.0011) [2024-07-05 16:36:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46669824. Throughput: 0: 2936.4. Samples: 6666864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:43,942][04005] Avg episode reward: [(0, '51.452')] [2024-07-05 16:36:47,279][04594] Updated weights for policy 0, policy_version 11404 (0.0012) [2024-07-05 16:36:48,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46727168. Throughput: 0: 2937.2. Samples: 6675774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:48,942][04005] Avg episode reward: [(0, '48.515')] [2024-07-05 16:36:50,750][04594] Updated weights for policy 0, policy_version 11414 (0.0011) [2024-07-05 16:36:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46788608. Throughput: 0: 2936.2. Samples: 6693364. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:53,942][04005] Avg episode reward: [(0, '46.742')] [2024-07-05 16:36:54,248][04594] Updated weights for policy 0, policy_version 11424 (0.0011) [2024-07-05 16:36:57,722][04594] Updated weights for policy 0, policy_version 11434 (0.0012) [2024-07-05 16:36:58,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46845952. Throughput: 0: 2938.0. Samples: 6710942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:36:58,942][04005] Avg episode reward: [(0, '46.073')] [2024-07-05 16:37:01,216][04594] Updated weights for policy 0, policy_version 11444 (0.0014) [2024-07-05 16:37:03,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46903296. Throughput: 0: 2938.3. Samples: 6719884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:03,942][04005] Avg episode reward: [(0, '47.813')] [2024-07-05 16:37:04,705][04594] Updated weights for policy 0, policy_version 11454 (0.0011) [2024-07-05 16:37:08,183][04594] Updated weights for policy 0, policy_version 11464 (0.0011) [2024-07-05 16:37:08,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 46964736. Throughput: 0: 2937.0. Samples: 6737424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:08,942][04005] Avg episode reward: [(0, '49.559')] [2024-07-05 16:37:11,674][04594] Updated weights for policy 0, policy_version 11474 (0.0011) [2024-07-05 16:37:13,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47022080. Throughput: 0: 2935.8. Samples: 6754954. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:13,943][04005] Avg episode reward: [(0, '50.344')] [2024-07-05 16:37:15,167][04594] Updated weights for policy 0, policy_version 11484 (0.0011) [2024-07-05 16:37:18,638][04594] Updated weights for policy 0, policy_version 11494 (0.0011) [2024-07-05 16:37:18,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47079424. Throughput: 0: 2936.0. Samples: 6763952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:18,942][04005] Avg episode reward: [(0, '50.588')] [2024-07-05 16:37:22,121][04594] Updated weights for policy 0, policy_version 11504 (0.0012) [2024-07-05 16:37:23,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47140864. Throughput: 0: 2935.7. Samples: 6781490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:23,942][04005] Avg episode reward: [(0, '51.437')] [2024-07-05 16:37:25,612][04594] Updated weights for policy 0, policy_version 11514 (0.0011) [2024-07-05 16:37:28,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 47198208. Throughput: 0: 2937.4. Samples: 6799048. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:28,943][04005] Avg episode reward: [(0, '50.962')] [2024-07-05 16:37:29,094][04594] Updated weights for policy 0, policy_version 11524 (0.0011) [2024-07-05 16:37:32,583][04594] Updated weights for policy 0, policy_version 11534 (0.0011) [2024-07-05 16:37:33,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47255552. Throughput: 0: 2938.1. Samples: 6807990. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:33,942][04005] Avg episode reward: [(0, '50.523')] [2024-07-05 16:37:36,074][04594] Updated weights for policy 0, policy_version 11544 (0.0011) [2024-07-05 16:37:38,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47316992. 
Throughput: 0: 2937.2. Samples: 6825536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:38,942][04005] Avg episode reward: [(0, '49.497')] [2024-07-05 16:37:39,571][04594] Updated weights for policy 0, policy_version 11554 (0.0011) [2024-07-05 16:37:43,054][04594] Updated weights for policy 0, policy_version 11564 (0.0011) [2024-07-05 16:37:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47374336. Throughput: 0: 2936.3. Samples: 6843074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:43,942][04005] Avg episode reward: [(0, '49.931')] [2024-07-05 16:37:46,546][04594] Updated weights for policy 0, policy_version 11574 (0.0012) [2024-07-05 16:37:48,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 47431680. Throughput: 0: 2937.6. Samples: 6852076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:37:48,943][04005] Avg episode reward: [(0, '49.983')] [2024-07-05 16:37:50,029][04594] Updated weights for policy 0, policy_version 11584 (0.0011) [2024-07-05 16:37:53,505][04594] Updated weights for policy 0, policy_version 11594 (0.0011) [2024-07-05 16:37:53,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47493120. Throughput: 0: 2938.0. Samples: 6869636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:37:53,942][04005] Avg episode reward: [(0, '49.103')] [2024-07-05 16:37:56,982][04594] Updated weights for policy 0, policy_version 11604 (0.0011) [2024-07-05 16:37:58,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47550464. Throughput: 0: 2937.7. Samples: 6887150. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:37:58,942][04005] Avg episode reward: [(0, '49.347')] [2024-07-05 16:38:00,459][04594] Updated weights for policy 0, policy_version 11614 (0.0011) [2024-07-05 16:38:03,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47607808. Throughput: 0: 2937.7. Samples: 6896146. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:03,942][04005] Avg episode reward: [(0, '48.356')] [2024-07-05 16:38:03,962][04594] Updated weights for policy 0, policy_version 11624 (0.0011) [2024-07-05 16:38:07,438][04594] Updated weights for policy 0, policy_version 11634 (0.0011) [2024-07-05 16:38:08,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47669248. Throughput: 0: 2938.0. Samples: 6913702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:08,942][04005] Avg episode reward: [(0, '48.807')] [2024-07-05 16:38:10,929][04594] Updated weights for policy 0, policy_version 11644 (0.0011) [2024-07-05 16:38:13,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47726592. Throughput: 0: 2937.8. Samples: 6931250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:13,942][04005] Avg episode reward: [(0, '48.808')] [2024-07-05 16:38:14,402][04594] Updated weights for policy 0, policy_version 11654 (0.0011) [2024-07-05 16:38:17,899][04594] Updated weights for policy 0, policy_version 11664 (0.0011) [2024-07-05 16:38:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 47788032. Throughput: 0: 2938.0. Samples: 6940198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:18,942][04005] Avg episode reward: [(0, '48.579')] [2024-07-05 16:38:21,385][04594] Updated weights for policy 0, policy_version 11674 (0.0011) [2024-07-05 16:38:23,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 47845376. 
Throughput: 0: 2938.3. Samples: 6957760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:23,942][04005] Avg episode reward: [(0, '47.296')] [2024-07-05 16:38:24,852][04594] Updated weights for policy 0, policy_version 11684 (0.0011) [2024-07-05 16:38:28,330][04594] Updated weights for policy 0, policy_version 11694 (0.0012) [2024-07-05 16:38:28,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 47902720. Throughput: 0: 2937.8. Samples: 6975276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:28,942][04005] Avg episode reward: [(0, '48.753')] [2024-07-05 16:38:31,819][04594] Updated weights for policy 0, policy_version 11704 (0.0011) [2024-07-05 16:38:33,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 47964160. Throughput: 0: 2938.1. Samples: 6984290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:38:33,942][04005] Avg episode reward: [(0, '50.810')] [2024-07-05 16:38:33,945][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000011710_47964160.pth... [2024-07-05 16:38:34,027][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000011022_45146112.pth [2024-07-05 16:38:35,323][04594] Updated weights for policy 0, policy_version 11714 (0.0011) [2024-07-05 16:38:38,802][04594] Updated weights for policy 0, policy_version 11724 (0.0011) [2024-07-05 16:38:38,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 48021504. Throughput: 0: 2937.0. Samples: 7001800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:38:38,942][04005] Avg episode reward: [(0, '50.553')] [2024-07-05 16:38:42,315][04594] Updated weights for policy 0, policy_version 11734 (0.0011) [2024-07-05 16:38:43,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). 
Total num frames: 48078848. Throughput: 0: 2937.2. Samples: 7019326. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:43,944][04005] Avg episode reward: [(0, '51.312')] [2024-07-05 16:38:45,792][04594] Updated weights for policy 0, policy_version 11744 (0.0011) [2024-07-05 16:38:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 48140288. Throughput: 0: 2936.6. Samples: 7028294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:48,942][04005] Avg episode reward: [(0, '48.755')] [2024-07-05 16:38:49,281][04594] Updated weights for policy 0, policy_version 11754 (0.0014) [2024-07-05 16:38:52,747][04594] Updated weights for policy 0, policy_version 11764 (0.0012) [2024-07-05 16:38:53,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 48197632. Throughput: 0: 2936.8. Samples: 7045860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:53,942][04005] Avg episode reward: [(0, '50.276')] [2024-07-05 16:38:56,236][04594] Updated weights for policy 0, policy_version 11774 (0.0011) [2024-07-05 16:38:58,941][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 48254976. Throughput: 0: 2937.2. Samples: 7063424. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:38:58,942][04005] Avg episode reward: [(0, '49.892')] [2024-07-05 16:38:59,715][04594] Updated weights for policy 0, policy_version 11784 (0.0011) [2024-07-05 16:39:03,200][04594] Updated weights for policy 0, policy_version 11794 (0.0012) [2024-07-05 16:39:03,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 48316416. Throughput: 0: 2937.9. Samples: 7072404. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:03,942][04005] Avg episode reward: [(0, '52.650')] [2024-07-05 16:39:06,681][04594] Updated weights for policy 0, policy_version 11804 (0.0012) [2024-07-05 16:39:08,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 48373760. Throughput: 0: 2937.7. Samples: 7089958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:08,942][04005] Avg episode reward: [(0, '52.714')] [2024-07-05 16:39:10,166][04594] Updated weights for policy 0, policy_version 11814 (0.0011) [2024-07-05 16:39:13,646][04594] Updated weights for policy 0, policy_version 11824 (0.0013) [2024-07-05 16:39:13,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 48431104. Throughput: 0: 2938.7. Samples: 7107516. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:13,942][04005] Avg episode reward: [(0, '51.811')] [2024-07-05 16:39:17,137][04594] Updated weights for policy 0, policy_version 11834 (0.0012) [2024-07-05 16:39:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 48492544. Throughput: 0: 2937.4. Samples: 7116474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:18,942][04005] Avg episode reward: [(0, '50.847')] [2024-07-05 16:39:20,625][04594] Updated weights for policy 0, policy_version 11844 (0.0012) [2024-07-05 16:39:23,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 48549888. Throughput: 0: 2937.2. Samples: 7133976. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:23,942][04005] Avg episode reward: [(0, '50.308')] [2024-07-05 16:39:24,101][04594] Updated weights for policy 0, policy_version 11854 (0.0011) [2024-07-05 16:39:27,595][04594] Updated weights for policy 0, policy_version 11864 (0.0011) [2024-07-05 16:39:28,942][04005] Fps is (10 sec: 11468.2, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 48607232. 
Throughput: 0: 2939.6. Samples: 7151608. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:39:28,944][04005] Avg episode reward: [(0, '51.111')] [2024-07-05 16:39:31,071][04594] Updated weights for policy 0, policy_version 11874 (0.0011) [2024-07-05 16:39:33,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 48668672. Throughput: 0: 2938.7. Samples: 7160536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:39:33,943][04005] Avg episode reward: [(0, '51.793')] [2024-07-05 16:39:34,560][04594] Updated weights for policy 0, policy_version 11884 (0.0011) [2024-07-05 16:39:38,037][04594] Updated weights for policy 0, policy_version 11894 (0.0011) [2024-07-05 16:39:38,942][04005] Fps is (10 sec: 11878.8, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 48726016. Throughput: 0: 2938.7. Samples: 7178102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:38,943][04005] Avg episode reward: [(0, '50.672')] [2024-07-05 16:39:41,523][04594] Updated weights for policy 0, policy_version 11904 (0.0011) [2024-07-05 16:39:43,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 48783360. Throughput: 0: 2941.6. Samples: 7195798. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:43,942][04005] Avg episode reward: [(0, '48.781')] [2024-07-05 16:39:45,016][04594] Updated weights for policy 0, policy_version 11914 (0.0011) [2024-07-05 16:39:48,493][04594] Updated weights for policy 0, policy_version 11924 (0.0011) [2024-07-05 16:39:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 48844800. Throughput: 0: 2937.9. Samples: 7204608. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:48,943][04005] Avg episode reward: [(0, '49.741')] [2024-07-05 16:39:51,966][04594] Updated weights for policy 0, policy_version 11934 (0.0011) [2024-07-05 16:39:53,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 48902144. Throughput: 0: 2938.5. Samples: 7222190. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:53,943][04005] Avg episode reward: [(0, '51.164')] [2024-07-05 16:39:55,446][04594] Updated weights for policy 0, policy_version 11944 (0.0011) [2024-07-05 16:39:58,940][04594] Updated weights for policy 0, policy_version 11954 (0.0011) [2024-07-05 16:39:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 48963584. Throughput: 0: 2944.5. Samples: 7240018. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:39:58,942][04005] Avg episode reward: [(0, '51.874')] [2024-07-05 16:40:02,427][04594] Updated weights for policy 0, policy_version 11964 (0.0011) [2024-07-05 16:40:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 49020928. Throughput: 0: 2938.0. Samples: 7248684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:40:03,942][04005] Avg episode reward: [(0, '52.097')] [2024-07-05 16:40:05,914][04594] Updated weights for policy 0, policy_version 11974 (0.0012) [2024-07-05 16:40:08,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49078272. Throughput: 0: 2938.2. Samples: 7266196. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:40:08,943][04005] Avg episode reward: [(0, '50.524')] [2024-07-05 16:40:09,451][04594] Updated weights for policy 0, policy_version 11984 (0.0011) [2024-07-05 16:40:12,888][04594] Updated weights for policy 0, policy_version 11994 (0.0013) [2024-07-05 16:40:13,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 49139712. 
Throughput: 0: 2943.6. Samples: 7284070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:40:13,942][04005] Avg episode reward: [(0, '49.308')] [2024-07-05 16:40:16,365][04594] Updated weights for policy 0, policy_version 12004 (0.0011) [2024-07-05 16:40:18,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 49197056. Throughput: 0: 2937.3. Samples: 7292712. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:40:18,942][04005] Avg episode reward: [(0, '48.621')] [2024-07-05 16:40:19,846][04594] Updated weights for policy 0, policy_version 12014 (0.0011) [2024-07-05 16:40:23,331][04594] Updated weights for policy 0, policy_version 12024 (0.0011) [2024-07-05 16:40:23,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49254400. Throughput: 0: 2939.2. Samples: 7310368. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:40:23,942][04005] Avg episode reward: [(0, '48.919')] [2024-07-05 16:40:26,803][04594] Updated weights for policy 0, policy_version 12034 (0.0011) [2024-07-05 16:40:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 49315840. Throughput: 0: 2943.2. Samples: 7328240. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:40:28,942][04005] Avg episode reward: [(0, '51.087')] [2024-07-05 16:40:30,297][04594] Updated weights for policy 0, policy_version 12044 (0.0011) [2024-07-05 16:40:33,780][04594] Updated weights for policy 0, policy_version 12054 (0.0011) [2024-07-05 16:40:33,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49373184. Throughput: 0: 2937.1. Samples: 7336778. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:40:33,942][04005] Avg episode reward: [(0, '52.462')] [2024-07-05 16:40:34,123][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000012055_49377280.pth... 
[2024-07-05 16:40:34,197][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000011366_46555136.pth [2024-07-05 16:40:37,284][04594] Updated weights for policy 0, policy_version 12064 (0.0011) [2024-07-05 16:40:38,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49430528. Throughput: 0: 2936.9. Samples: 7354352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:40:38,942][04005] Avg episode reward: [(0, '53.781')] [2024-07-05 16:40:39,019][04581] Saving new best policy, reward=53.781! [2024-07-05 16:40:40,792][04594] Updated weights for policy 0, policy_version 12074 (0.0012) [2024-07-05 16:40:43,941][04005] Fps is (10 sec: 11878.7, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 49491968. Throughput: 0: 2936.0. Samples: 7372138. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:40:43,942][04005] Avg episode reward: [(0, '52.510')] [2024-07-05 16:40:44,276][04594] Updated weights for policy 0, policy_version 12084 (0.0011) [2024-07-05 16:40:47,758][04594] Updated weights for policy 0, policy_version 12094 (0.0011) [2024-07-05 16:40:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49549312. Throughput: 0: 2934.1. Samples: 7380718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:40:48,942][04005] Avg episode reward: [(0, '51.087')] [2024-07-05 16:40:51,234][04594] Updated weights for policy 0, policy_version 12104 (0.0011) [2024-07-05 16:40:53,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49606656. Throughput: 0: 2938.1. Samples: 7398412. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:40:53,943][04005] Avg episode reward: [(0, '49.817')] [2024-07-05 16:40:54,711][04594] Updated weights for policy 0, policy_version 12114 (0.0011) [2024-07-05 16:40:58,202][04594] Updated weights for policy 0, policy_version 12124 (0.0013) [2024-07-05 16:40:58,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 49668096. Throughput: 0: 2936.5. Samples: 7416212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:40:58,942][04005] Avg episode reward: [(0, '49.742')] [2024-07-05 16:41:01,684][04594] Updated weights for policy 0, policy_version 12134 (0.0012) [2024-07-05 16:41:03,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49725440. Throughput: 0: 2934.8. Samples: 7424780. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:03,942][04005] Avg episode reward: [(0, '49.113')] [2024-07-05 16:41:05,167][04594] Updated weights for policy 0, policy_version 12144 (0.0011) [2024-07-05 16:41:08,646][04594] Updated weights for policy 0, policy_version 12154 (0.0011) [2024-07-05 16:41:08,942][04005] Fps is (10 sec: 11468.1, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 49782784. Throughput: 0: 2937.1. Samples: 7442540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:08,943][04005] Avg episode reward: [(0, '48.409')] [2024-07-05 16:41:12,139][04594] Updated weights for policy 0, policy_version 12164 (0.0014) [2024-07-05 16:41:13,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 49844224. Throughput: 0: 2934.0. Samples: 7460268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:13,942][04005] Avg episode reward: [(0, '50.590')] [2024-07-05 16:41:15,633][04594] Updated weights for policy 0, policy_version 12174 (0.0012) [2024-07-05 16:41:18,941][04005] Fps is (10 sec: 11879.1, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49901568. 
Throughput: 0: 2934.4. Samples: 7468826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:18,942][04005] Avg episode reward: [(0, '51.798')] [2024-07-05 16:41:19,121][04594] Updated weights for policy 0, policy_version 12184 (0.0011) [2024-07-05 16:41:22,607][04594] Updated weights for policy 0, policy_version 12194 (0.0013) [2024-07-05 16:41:23,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 49958912. Throughput: 0: 2938.3. Samples: 7486576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:23,942][04005] Avg episode reward: [(0, '52.953')] [2024-07-05 16:41:26,090][04594] Updated weights for policy 0, policy_version 12204 (0.0011) [2024-07-05 16:41:28,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 50020352. Throughput: 0: 2938.1. Samples: 7504354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:28,942][04005] Avg episode reward: [(0, '51.217')] [2024-07-05 16:41:29,580][04594] Updated weights for policy 0, policy_version 12214 (0.0011) [2024-07-05 16:41:33,060][04594] Updated weights for policy 0, policy_version 12224 (0.0011) [2024-07-05 16:41:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 50077696. Throughput: 0: 2937.9. Samples: 7512922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:33,943][04005] Avg episode reward: [(0, '49.002')] [2024-07-05 16:41:36,556][04594] Updated weights for policy 0, policy_version 12234 (0.0011) [2024-07-05 16:41:38,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 50135040. Throughput: 0: 2940.3. Samples: 7530724. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:38,942][04005] Avg episode reward: [(0, '48.517')] [2024-07-05 16:41:40,052][04594] Updated weights for policy 0, policy_version 12244 (0.0011) [2024-07-05 16:41:43,535][04594] Updated weights for policy 0, policy_version 12254 (0.0013) [2024-07-05 16:41:43,941][04005] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 50196480. Throughput: 0: 2936.9. Samples: 7548374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:43,942][04005] Avg episode reward: [(0, '49.504')] [2024-07-05 16:41:47,024][04594] Updated weights for policy 0, policy_version 12264 (0.0012) [2024-07-05 16:41:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 50253824. Throughput: 0: 2936.8. Samples: 7556934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:48,942][04005] Avg episode reward: [(0, '49.769')] [2024-07-05 16:41:50,506][04594] Updated weights for policy 0, policy_version 12274 (0.0011) [2024-07-05 16:41:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 50311168. Throughput: 0: 2939.9. Samples: 7574832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 16:41:53,942][04005] Avg episode reward: [(0, '51.417')] [2024-07-05 16:41:53,982][04594] Updated weights for policy 0, policy_version 12284 (0.0012) [2024-07-05 16:41:57,483][04594] Updated weights for policy 0, policy_version 12294 (0.0011) [2024-07-05 16:41:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 50372608. Throughput: 0: 2937.9. Samples: 7592472. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:41:58,942][04005] Avg episode reward: [(0, '51.745')] [2024-07-05 16:42:00,958][04594] Updated weights for policy 0, policy_version 12304 (0.0011) [2024-07-05 16:42:03,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 50429952. 
Throughput: 0: 2938.3. Samples: 7601048. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:03,943][04005] Avg episode reward: [(0, '52.256')] [2024-07-05 16:42:04,431][04594] Updated weights for policy 0, policy_version 12314 (0.0011) [2024-07-05 16:42:07,935][04594] Updated weights for policy 0, policy_version 12324 (0.0012) [2024-07-05 16:42:08,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11742.0, 300 sec: 11746.5). Total num frames: 50487296. Throughput: 0: 2939.9. Samples: 7618870. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:08,942][04005] Avg episode reward: [(0, '51.085')] [2024-07-05 16:42:11,428][04594] Updated weights for policy 0, policy_version 12334 (0.0011) [2024-07-05 16:42:13,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 50548736. Throughput: 0: 2936.4. Samples: 7636494. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:13,942][04005] Avg episode reward: [(0, '50.318')] [2024-07-05 16:42:14,925][04594] Updated weights for policy 0, policy_version 12344 (0.0012) [2024-07-05 16:42:18,413][04594] Updated weights for policy 0, policy_version 12354 (0.0012) [2024-07-05 16:42:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 50606080. Throughput: 0: 2935.3. Samples: 7645010. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:18,942][04005] Avg episode reward: [(0, '51.215')] [2024-07-05 16:42:21,907][04594] Updated weights for policy 0, policy_version 12364 (0.0013) [2024-07-05 16:42:23,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 50663424. Throughput: 0: 2935.8. Samples: 7662834. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:23,942][04005] Avg episode reward: [(0, '51.016')] [2024-07-05 16:42:25,385][04594] Updated weights for policy 0, policy_version 12374 (0.0013) [2024-07-05 16:42:28,871][04594] Updated weights for policy 0, policy_version 12384 (0.0011) [2024-07-05 16:42:28,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 50724864. Throughput: 0: 2936.8. Samples: 7680530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:28,942][04005] Avg episode reward: [(0, '50.200')] [2024-07-05 16:42:32,358][04594] Updated weights for policy 0, policy_version 12394 (0.0011) [2024-07-05 16:42:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 50782208. Throughput: 0: 2936.3. Samples: 7689068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:33,942][04005] Avg episode reward: [(0, '49.145')] [2024-07-05 16:42:34,094][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000012399_50786304.pth... [2024-07-05 16:42:34,166][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000011710_47964160.pth [2024-07-05 16:42:35,861][04594] Updated weights for policy 0, policy_version 12404 (0.0013) [2024-07-05 16:42:38,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 50839552. Throughput: 0: 2935.4. Samples: 7706924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:38,942][04005] Avg episode reward: [(0, '49.475')] [2024-07-05 16:42:39,334][04594] Updated weights for policy 0, policy_version 12414 (0.0012) [2024-07-05 16:42:42,824][04594] Updated weights for policy 0, policy_version 12424 (0.0011) [2024-07-05 16:42:43,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 50900992. Throughput: 0: 2936.8. 
Samples: 7724626. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:43,942][04005] Avg episode reward: [(0, '50.700')] [2024-07-05 16:42:46,300][04594] Updated weights for policy 0, policy_version 12434 (0.0011) [2024-07-05 16:42:48,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 50958336. Throughput: 0: 2936.9. Samples: 7733208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:48,943][04005] Avg episode reward: [(0, '49.446')] [2024-07-05 16:42:49,777][04594] Updated weights for policy 0, policy_version 12444 (0.0011) [2024-07-05 16:42:53,251][04594] Updated weights for policy 0, policy_version 12454 (0.0011) [2024-07-05 16:42:53,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 51015680. Throughput: 0: 2940.8. Samples: 7751204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:53,942][04005] Avg episode reward: [(0, '49.973')] [2024-07-05 16:42:56,732][04594] Updated weights for policy 0, policy_version 12464 (0.0011) [2024-07-05 16:42:58,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 51077120. Throughput: 0: 2939.1. Samples: 7768754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:42:58,943][04005] Avg episode reward: [(0, '49.986')] [2024-07-05 16:43:00,210][04594] Updated weights for policy 0, policy_version 12474 (0.0011) [2024-07-05 16:43:03,688][04594] Updated weights for policy 0, policy_version 12484 (0.0011) [2024-07-05 16:43:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 51134464. Throughput: 0: 2945.9. Samples: 7777576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:43:03,942][04005] Avg episode reward: [(0, '50.736')] [2024-07-05 16:43:07,154][04594] Updated weights for policy 0, policy_version 12494 (0.0011) [2024-07-05 16:43:08,941][04005] Fps is (10 sec: 11878.7, 60 sec: 11810.2, 300 sec: 11760.4). 
Total num frames: 51195904. Throughput: 0: 2944.4. Samples: 7795332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:08,942][04005] Avg episode reward: [(0, '48.510')] [2024-07-05 16:43:10,639][04594] Updated weights for policy 0, policy_version 12504 (0.0012) [2024-07-05 16:43:13,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 51253248. Throughput: 0: 2942.0. Samples: 7812920. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:13,942][04005] Avg episode reward: [(0, '48.920')] [2024-07-05 16:43:14,110][04594] Updated weights for policy 0, policy_version 12514 (0.0011) [2024-07-05 16:43:17,607][04594] Updated weights for policy 0, policy_version 12524 (0.0011) [2024-07-05 16:43:18,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 51310592. Throughput: 0: 2951.5. Samples: 7821884. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:18,943][04005] Avg episode reward: [(0, '49.736')] [2024-07-05 16:43:21,084][04594] Updated weights for policy 0, policy_version 12534 (0.0011) [2024-07-05 16:43:23,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 51372032. Throughput: 0: 2944.1. Samples: 7839408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:23,942][04005] Avg episode reward: [(0, '49.606')] [2024-07-05 16:43:24,570][04594] Updated weights for policy 0, policy_version 12544 (0.0012) [2024-07-05 16:43:28,044][04594] Updated weights for policy 0, policy_version 12554 (0.0011) [2024-07-05 16:43:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 51429376. Throughput: 0: 2941.7. Samples: 7857004. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:28,942][04005] Avg episode reward: [(0, '50.619')] [2024-07-05 16:43:31,519][04594] Updated weights for policy 0, policy_version 12564 (0.0011) [2024-07-05 16:43:33,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 51486720. Throughput: 0: 2950.2. Samples: 7865968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:33,942][04005] Avg episode reward: [(0, '50.127')] [2024-07-05 16:43:35,004][04594] Updated weights for policy 0, policy_version 12574 (0.0011) [2024-07-05 16:43:38,486][04594] Updated weights for policy 0, policy_version 12584 (0.0012) [2024-07-05 16:43:38,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 51548160. Throughput: 0: 2940.4. Samples: 7883522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:38,942][04005] Avg episode reward: [(0, '51.213')] [2024-07-05 16:43:41,970][04594] Updated weights for policy 0, policy_version 12594 (0.0011) [2024-07-05 16:43:43,942][04005] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 51605504. Throughput: 0: 2940.5. Samples: 7901078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:43,943][04005] Avg episode reward: [(0, '51.256')] [2024-07-05 16:43:45,447][04594] Updated weights for policy 0, policy_version 12604 (0.0011) [2024-07-05 16:43:48,940][04594] Updated weights for policy 0, policy_version 12614 (0.0011) [2024-07-05 16:43:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11760.4). Total num frames: 51666944. Throughput: 0: 2943.9. Samples: 7910052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:43:48,942][04005] Avg episode reward: [(0, '52.539')] [2024-07-05 16:43:52,424][04594] Updated weights for policy 0, policy_version 12624 (0.0011) [2024-07-05 16:43:53,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 51724288. 
Throughput: 0: 2937.9. Samples: 7927540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:43:53,943][04005] Avg episode reward: [(0, '51.567')] [2024-07-05 16:43:55,909][04594] Updated weights for policy 0, policy_version 12634 (0.0011) [2024-07-05 16:43:58,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 51781632. Throughput: 0: 2937.2. Samples: 7945092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:43:58,942][04005] Avg episode reward: [(0, '50.460')] [2024-07-05 16:43:59,378][04594] Updated weights for policy 0, policy_version 12644 (0.0011) [2024-07-05 16:44:02,859][04594] Updated weights for policy 0, policy_version 12654 (0.0011) [2024-07-05 16:44:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 51843072. Throughput: 0: 2938.2. Samples: 7954102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:03,943][04005] Avg episode reward: [(0, '50.530')] [2024-07-05 16:44:06,354][04594] Updated weights for policy 0, policy_version 12664 (0.0012) [2024-07-05 16:44:08,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 51900416. Throughput: 0: 2938.0. Samples: 7971620. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:08,942][04005] Avg episode reward: [(0, '49.469')] [2024-07-05 16:44:09,838][04594] Updated weights for policy 0, policy_version 12674 (0.0013) [2024-07-05 16:44:13,327][04594] Updated weights for policy 0, policy_version 12684 (0.0012) [2024-07-05 16:44:13,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 51957760. Throughput: 0: 2936.1. Samples: 7989128. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:13,943][04005] Avg episode reward: [(0, '51.035')] [2024-07-05 16:44:16,791][04594] Updated weights for policy 0, policy_version 12694 (0.0011) [2024-07-05 16:44:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 52019200. Throughput: 0: 2937.0. Samples: 7998132. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:18,942][04005] Avg episode reward: [(0, '49.330')] [2024-07-05 16:44:20,296][04594] Updated weights for policy 0, policy_version 12704 (0.0011) [2024-07-05 16:44:23,774][04594] Updated weights for policy 0, policy_version 12714 (0.0013) [2024-07-05 16:44:23,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 52076544. Throughput: 0: 2936.3. Samples: 8015654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:23,942][04005] Avg episode reward: [(0, '50.318')] [2024-07-05 16:44:27,254][04594] Updated weights for policy 0, policy_version 12724 (0.0011) [2024-07-05 16:44:28,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 52133888. Throughput: 0: 2936.8. Samples: 8033234. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:28,942][04005] Avg episode reward: [(0, '50.329')] [2024-07-05 16:44:30,732][04594] Updated weights for policy 0, policy_version 12734 (0.0011) [2024-07-05 16:44:33,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 52195328. Throughput: 0: 2936.4. Samples: 8042192. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:33,942][04005] Avg episode reward: [(0, '50.615')] [2024-07-05 16:44:33,945][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000012743_52195328.pth... 
[2024-07-05 16:44:34,019][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000012055_49377280.pth [2024-07-05 16:44:34,295][04594] Updated weights for policy 0, policy_version 12744 (0.0013) [2024-07-05 16:44:37,705][04594] Updated weights for policy 0, policy_version 12754 (0.0011) [2024-07-05 16:44:38,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 52252672. Throughput: 0: 2937.2. Samples: 8059712. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:38,942][04005] Avg episode reward: [(0, '51.924')] [2024-07-05 16:44:41,197][04594] Updated weights for policy 0, policy_version 12764 (0.0012) [2024-07-05 16:44:43,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 52310016. Throughput: 0: 2939.5. Samples: 8077372. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:43,942][04005] Avg episode reward: [(0, '52.167')] [2024-07-05 16:44:44,670][04594] Updated weights for policy 0, policy_version 12774 (0.0011) [2024-07-05 16:44:48,146][04594] Updated weights for policy 0, policy_version 12784 (0.0011) [2024-07-05 16:44:48,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 52371456. Throughput: 0: 2936.3. Samples: 8086234. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:48,943][04005] Avg episode reward: [(0, '51.846')] [2024-07-05 16:44:51,628][04594] Updated weights for policy 0, policy_version 12794 (0.0011) [2024-07-05 16:44:53,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 52428800. Throughput: 0: 2937.1. Samples: 8103788. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:53,943][04005] Avg episode reward: [(0, '52.008')] [2024-07-05 16:44:55,105][04594] Updated weights for policy 0, policy_version 12804 (0.0011) [2024-07-05 16:44:58,588][04594] Updated weights for policy 0, policy_version 12814 (0.0011) [2024-07-05 16:44:58,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 52490240. Throughput: 0: 2945.7. Samples: 8121686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:44:58,942][04005] Avg episode reward: [(0, '52.319')] [2024-07-05 16:45:02,076][04594] Updated weights for policy 0, policy_version 12824 (0.0012) [2024-07-05 16:45:03,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11760.4). Total num frames: 52547584. Throughput: 0: 2937.9. Samples: 8130338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:45:03,943][04005] Avg episode reward: [(0, '51.024')] [2024-07-05 16:45:05,553][04594] Updated weights for policy 0, policy_version 12834 (0.0011) [2024-07-05 16:45:08,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 52604928. Throughput: 0: 2938.3. Samples: 8147878. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:45:08,942][04005] Avg episode reward: [(0, '49.974')] [2024-07-05 16:45:09,039][04594] Updated weights for policy 0, policy_version 12844 (0.0011) [2024-07-05 16:45:12,806][04594] Updated weights for policy 0, policy_version 12854 (0.0013) [2024-07-05 16:45:13,942][04005] Fps is (10 sec: 11059.3, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 52658176. Throughput: 0: 2913.6. Samples: 8164348. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:45:13,943][04005] Avg episode reward: [(0, '49.731')] [2024-07-05 16:45:17,329][04594] Updated weights for policy 0, policy_version 12864 (0.0015) [2024-07-05 16:45:18,942][04005] Fps is (10 sec: 9830.3, 60 sec: 11400.5, 300 sec: 11691.0). Total num frames: 52703232. 
Throughput: 0: 2867.7. Samples: 8171240. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:45:18,943][04005] Avg episode reward: [(0, '50.278')] [2024-07-05 16:45:21,860][04594] Updated weights for policy 0, policy_version 12874 (0.0018) [2024-07-05 16:45:23,941][04005] Fps is (10 sec: 9011.2, 60 sec: 11195.7, 300 sec: 11635.4). Total num frames: 52748288. Throughput: 0: 2778.4. Samples: 8184738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:45:23,942][04005] Avg episode reward: [(0, '49.749')] [2024-07-05 16:45:26,354][04594] Updated weights for policy 0, policy_version 12884 (0.0016) [2024-07-05 16:45:28,941][04005] Fps is (10 sec: 9011.3, 60 sec: 10991.0, 300 sec: 11593.8). Total num frames: 52793344. Throughput: 0: 2692.9. Samples: 8198550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:45:28,943][04005] Avg episode reward: [(0, '50.911')] [2024-07-05 16:45:30,842][04594] Updated weights for policy 0, policy_version 12894 (0.0015) [2024-07-05 16:45:33,942][04005] Fps is (10 sec: 9011.2, 60 sec: 10717.8, 300 sec: 11552.1). Total num frames: 52838400. Throughput: 0: 2645.0. Samples: 8205258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:45:33,943][04005] Avg episode reward: [(0, '51.265')] [2024-07-05 16:45:35,323][04594] Updated weights for policy 0, policy_version 12904 (0.0015) [2024-07-05 16:45:38,941][04005] Fps is (10 sec: 9420.7, 60 sec: 10581.3, 300 sec: 11510.5). Total num frames: 52887552. Throughput: 0: 2563.1. Samples: 8219128. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:45:38,942][04005] Avg episode reward: [(0, '49.719')] [2024-07-05 16:45:39,851][04594] Updated weights for policy 0, policy_version 12914 (0.0015) [2024-07-05 16:45:43,942][04005] Fps is (10 sec: 9420.5, 60 sec: 10376.5, 300 sec: 11468.8). Total num frames: 52932608. Throughput: 0: 2467.2. Samples: 8232712. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:45:43,943][04005] Avg episode reward: [(0, '51.237')] [2024-07-05 16:45:44,313][04594] Updated weights for policy 0, policy_version 12924 (0.0016) [2024-07-05 16:45:48,806][04594] Updated weights for policy 0, policy_version 12934 (0.0016) [2024-07-05 16:45:48,942][04005] Fps is (10 sec: 9011.1, 60 sec: 10103.5, 300 sec: 11427.1). Total num frames: 52977664. Throughput: 0: 2428.8. Samples: 8239632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:45:48,942][04005] Avg episode reward: [(0, '51.006')] [2024-07-05 16:45:53,286][04594] Updated weights for policy 0, policy_version 12944 (0.0016) [2024-07-05 16:45:53,942][04005] Fps is (10 sec: 9011.5, 60 sec: 9898.7, 300 sec: 11371.6). Total num frames: 53022720. Throughput: 0: 2341.5. Samples: 8253244. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:45:53,942][04005] Avg episode reward: [(0, '51.285')] [2024-07-05 16:45:57,777][04594] Updated weights for policy 0, policy_version 12954 (0.0015) [2024-07-05 16:45:58,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9625.6, 300 sec: 11329.9). Total num frames: 53067776. Throughput: 0: 2277.1. Samples: 8266818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:45:58,943][04005] Avg episode reward: [(0, '51.529')] [2024-07-05 16:46:02,275][04594] Updated weights for policy 0, policy_version 12964 (0.0017) [2024-07-05 16:46:03,942][04005] Fps is (10 sec: 9011.0, 60 sec: 9420.8, 300 sec: 11288.3). Total num frames: 53112832. Throughput: 0: 2278.6. Samples: 8273778. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:03,944][04005] Avg episode reward: [(0, '52.277')] [2024-07-05 16:46:06,742][04594] Updated weights for policy 0, policy_version 12974 (0.0016) [2024-07-05 16:46:08,942][04005] Fps is (10 sec: 9011.3, 60 sec: 9216.0, 300 sec: 11232.8). Total num frames: 53157888. Throughput: 0: 2281.6. Samples: 8287408. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:08,942][04005] Avg episode reward: [(0, '53.101')] [2024-07-05 16:46:11,257][04594] Updated weights for policy 0, policy_version 12984 (0.0016) [2024-07-05 16:46:13,941][04005] Fps is (10 sec: 9011.5, 60 sec: 9079.5, 300 sec: 11191.1). Total num frames: 53202944. Throughput: 0: 2280.9. Samples: 8301192. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:13,942][04005] Avg episode reward: [(0, '52.675')] [2024-07-05 16:46:15,753][04594] Updated weights for policy 0, policy_version 12994 (0.0018) [2024-07-05 16:46:18,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9147.8, 300 sec: 11163.3). Total num frames: 53252096. Throughput: 0: 2279.1. Samples: 8307818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:18,942][04005] Avg episode reward: [(0, '52.811')] [2024-07-05 16:46:20,229][04594] Updated weights for policy 0, policy_version 13004 (0.0016) [2024-07-05 16:46:23,942][04005] Fps is (10 sec: 9420.7, 60 sec: 9147.7, 300 sec: 11107.8). Total num frames: 53297152. Throughput: 0: 2279.6. Samples: 8321708. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:23,942][04005] Avg episode reward: [(0, '51.814')] [2024-07-05 16:46:24,729][04594] Updated weights for policy 0, policy_version 13014 (0.0018) [2024-07-05 16:46:28,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.7, 300 sec: 11066.1). Total num frames: 53342208. Throughput: 0: 2279.0. Samples: 8335268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:28,943][04005] Avg episode reward: [(0, '50.651')] [2024-07-05 16:46:29,230][04594] Updated weights for policy 0, policy_version 13024 (0.0019) [2024-07-05 16:46:33,712][04594] Updated weights for policy 0, policy_version 13034 (0.0017) [2024-07-05 16:46:33,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 11024.5). Total num frames: 53387264. Throughput: 0: 2279.0. Samples: 8342186. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:33,943][04005] Avg episode reward: [(0, '50.588')] [2024-07-05 16:46:34,163][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000013035_53391360.pth... [2024-07-05 16:46:34,251][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000012399_50786304.pth [2024-07-05 16:46:38,239][04594] Updated weights for policy 0, policy_version 13044 (0.0016) [2024-07-05 16:46:38,942][04005] Fps is (10 sec: 9010.9, 60 sec: 9079.4, 300 sec: 10968.9). Total num frames: 53432320. Throughput: 0: 2276.9. Samples: 8355706. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:38,943][04005] Avg episode reward: [(0, '50.735')] [2024-07-05 16:46:42,740][04594] Updated weights for policy 0, policy_version 13054 (0.0016) [2024-07-05 16:46:43,941][04005] Fps is (10 sec: 9011.4, 60 sec: 9079.5, 300 sec: 10927.3). Total num frames: 53477376. Throughput: 0: 2279.2. Samples: 8369380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:43,942][04005] Avg episode reward: [(0, '52.144')] [2024-07-05 16:46:47,209][04594] Updated weights for policy 0, policy_version 13064 (0.0015) [2024-07-05 16:46:48,941][04005] Fps is (10 sec: 9011.6, 60 sec: 9079.5, 300 sec: 10885.6). Total num frames: 53522432. Throughput: 0: 2277.1. Samples: 8376248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:48,942][04005] Avg episode reward: [(0, '51.828')] [2024-07-05 16:46:51,690][04594] Updated weights for policy 0, policy_version 13074 (0.0017) [2024-07-05 16:46:53,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 10830.1). Total num frames: 53567488. Throughput: 0: 2282.7. Samples: 8390128. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:53,942][04005] Avg episode reward: [(0, '50.998')] [2024-07-05 16:46:56,200][04594] Updated weights for policy 0, policy_version 13084 (0.0018) [2024-07-05 16:46:58,941][04005] Fps is (10 sec: 9420.8, 60 sec: 9147.8, 300 sec: 10802.3). Total num frames: 53616640. Throughput: 0: 2278.6. Samples: 8403728. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:46:58,942][04005] Avg episode reward: [(0, '50.819')] [2024-07-05 16:47:00,699][04594] Updated weights for policy 0, policy_version 13094 (0.0017) [2024-07-05 16:47:03,942][04005] Fps is (10 sec: 9420.8, 60 sec: 9147.8, 300 sec: 10760.7). Total num frames: 53661696. Throughput: 0: 2283.5. Samples: 8410578. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:03,943][04005] Avg episode reward: [(0, '51.016')] [2024-07-05 16:47:05,198][04594] Updated weights for policy 0, policy_version 13104 (0.0017) [2024-07-05 16:47:08,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 10705.1). Total num frames: 53706752. Throughput: 0: 2277.5. Samples: 8424196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:08,942][04005] Avg episode reward: [(0, '51.758')] [2024-07-05 16:47:09,721][04594] Updated weights for policy 0, policy_version 13114 (0.0019) [2024-07-05 16:47:13,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 10663.5). Total num frames: 53751808. Throughput: 0: 2276.8. Samples: 8437724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:13,942][04005] Avg episode reward: [(0, '52.306')] [2024-07-05 16:47:14,220][04594] Updated weights for policy 0, policy_version 13124 (0.0015) [2024-07-05 16:47:18,690][04594] Updated weights for policy 0, policy_version 13134 (0.0016) [2024-07-05 16:47:18,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 10621.8). Total num frames: 53796864. Throughput: 0: 2276.4. Samples: 8444622. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:18,942][04005] Avg episode reward: [(0, '51.881')] [2024-07-05 16:47:23,151][04594] Updated weights for policy 0, policy_version 13144 (0.0017) [2024-07-05 16:47:23,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 10566.3). Total num frames: 53841920. Throughput: 0: 2278.8. Samples: 8458252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:23,942][04005] Avg episode reward: [(0, '50.611')] [2024-07-05 16:47:27,617][04594] Updated weights for policy 0, policy_version 13154 (0.0015) [2024-07-05 16:47:28,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 10524.6). Total num frames: 53886976. Throughput: 0: 2284.8. Samples: 8472196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:28,942][04005] Avg episode reward: [(0, '50.096')] [2024-07-05 16:47:32,075][04594] Updated weights for policy 0, policy_version 13164 (0.0015) [2024-07-05 16:47:33,942][04005] Fps is (10 sec: 9420.8, 60 sec: 9147.7, 300 sec: 10496.9). Total num frames: 53936128. Throughput: 0: 2282.1. Samples: 8478942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:33,942][04005] Avg episode reward: [(0, '49.168')] [2024-07-05 16:47:36,545][04594] Updated weights for policy 0, policy_version 13174 (0.0017) [2024-07-05 16:47:38,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9147.8, 300 sec: 10441.3). Total num frames: 53981184. Throughput: 0: 2281.1. Samples: 8492776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:38,942][04005] Avg episode reward: [(0, '51.047')] [2024-07-05 16:47:41,027][04594] Updated weights for policy 0, policy_version 13184 (0.0017) [2024-07-05 16:47:43,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 10399.7). Total num frames: 54026240. Throughput: 0: 2280.4. Samples: 8506344. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:43,943][04005] Avg episode reward: [(0, '52.220')] [2024-07-05 16:47:45,504][04594] Updated weights for policy 0, policy_version 13194 (0.0015) [2024-07-05 16:47:48,942][04005] Fps is (10 sec: 9010.8, 60 sec: 9147.7, 300 sec: 10358.0). Total num frames: 54071296. Throughput: 0: 2282.6. Samples: 8513294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:48,943][04005] Avg episode reward: [(0, '52.869')] [2024-07-05 16:47:49,988][04594] Updated weights for policy 0, policy_version 13204 (0.0015) [2024-07-05 16:47:53,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 10302.5). Total num frames: 54116352. Throughput: 0: 2282.5. Samples: 8526910. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:53,943][04005] Avg episode reward: [(0, '51.299')] [2024-07-05 16:47:54,440][04594] Updated weights for policy 0, policy_version 13214 (0.0015) [2024-07-05 16:47:58,892][04594] Updated weights for policy 0, policy_version 13224 (0.0016) [2024-07-05 16:47:58,941][04005] Fps is (10 sec: 9421.3, 60 sec: 9147.7, 300 sec: 10274.7). Total num frames: 54165504. Throughput: 0: 2292.5. Samples: 8540886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:47:58,942][04005] Avg episode reward: [(0, '49.457')] [2024-07-05 16:48:03,341][04594] Updated weights for policy 0, policy_version 13234 (0.0015) [2024-07-05 16:48:03,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9147.7, 300 sec: 10219.2). Total num frames: 54210560. Throughput: 0: 2294.8. Samples: 8547886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:03,942][04005] Avg episode reward: [(0, '49.262')] [2024-07-05 16:48:07,818][04594] Updated weights for policy 0, policy_version 13244 (0.0016) [2024-07-05 16:48:08,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.7, 300 sec: 10177.5). Total num frames: 54255616. Throughput: 0: 2294.4. Samples: 8561502. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:08,943][04005] Avg episode reward: [(0, '49.248')] [2024-07-05 16:48:12,322][04594] Updated weights for policy 0, policy_version 13254 (0.0016) [2024-07-05 16:48:13,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.7, 300 sec: 10135.9). Total num frames: 54300672. Throughput: 0: 2285.7. Samples: 8575054. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:13,943][04005] Avg episode reward: [(0, '49.436')] [2024-07-05 16:48:16,797][04594] Updated weights for policy 0, policy_version 13264 (0.0016) [2024-07-05 16:48:18,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 10080.3). Total num frames: 54345728. Throughput: 0: 2291.3. Samples: 8582050. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:18,942][04005] Avg episode reward: [(0, '50.543')] [2024-07-05 16:48:21,269][04594] Updated weights for policy 0, policy_version 13274 (0.0016) [2024-07-05 16:48:23,942][04005] Fps is (10 sec: 9420.8, 60 sec: 9216.0, 300 sec: 10052.6). Total num frames: 54394880. Throughput: 0: 2290.3. Samples: 8595838. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:23,943][04005] Avg episode reward: [(0, '50.589')] [2024-07-05 16:48:25,738][04594] Updated weights for policy 0, policy_version 13284 (0.0015) [2024-07-05 16:48:28,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9216.0, 300 sec: 10010.9). Total num frames: 54439936. Throughput: 0: 2293.8. Samples: 8609566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:28,943][04005] Avg episode reward: [(0, '50.215')] [2024-07-05 16:48:30,217][04594] Updated weights for policy 0, policy_version 13294 (0.0016) [2024-07-05 16:48:33,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9955.4). Total num frames: 54484992. Throughput: 0: 2292.7. Samples: 8616464. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:33,942][04005] Avg episode reward: [(0, '51.331')] [2024-07-05 16:48:34,282][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000013303_54489088.pth... [2024-07-05 16:48:34,370][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000012743_52195328.pth [2024-07-05 16:48:34,758][04594] Updated weights for policy 0, policy_version 13304 (0.0016) [2024-07-05 16:48:38,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9913.7). Total num frames: 54530048. Throughput: 0: 2291.0. Samples: 8630006. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:38,942][04005] Avg episode reward: [(0, '50.764')] [2024-07-05 16:48:39,230][04594] Updated weights for policy 0, policy_version 13314 (0.0016) [2024-07-05 16:48:43,709][04594] Updated weights for policy 0, policy_version 13324 (0.0015) [2024-07-05 16:48:43,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9858.2). Total num frames: 54575104. Throughput: 0: 2282.2. Samples: 8643586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:43,942][04005] Avg episode reward: [(0, '51.252')] [2024-07-05 16:48:48,226][04594] Updated weights for policy 0, policy_version 13334 (0.0016) [2024-07-05 16:48:48,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.8, 300 sec: 9816.5). Total num frames: 54620160. Throughput: 0: 2280.6. Samples: 8650512. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:48,943][04005] Avg episode reward: [(0, '51.421')] [2024-07-05 16:48:52,720][04594] Updated weights for policy 0, policy_version 13344 (0.0014) [2024-07-05 16:48:53,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9774.9). Total num frames: 54665216. Throughput: 0: 2279.2. Samples: 8664064. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:53,942][04005] Avg episode reward: [(0, '52.238')] [2024-07-05 16:48:57,180][04594] Updated weights for policy 0, policy_version 13354 (0.0016) [2024-07-05 16:48:58,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9719.3). Total num frames: 54710272. Throughput: 0: 2286.4. Samples: 8677940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:48:58,943][04005] Avg episode reward: [(0, '50.623')] [2024-07-05 16:49:01,657][04594] Updated weights for policy 0, policy_version 13364 (0.0016) [2024-07-05 16:49:03,941][04005] Fps is (10 sec: 9420.8, 60 sec: 9147.7, 300 sec: 9691.6). Total num frames: 54759424. Throughput: 0: 2278.5. Samples: 8684582. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:49:03,942][04005] Avg episode reward: [(0, '49.951')] [2024-07-05 16:49:06,162][04594] Updated weights for policy 0, policy_version 13374 (0.0016) [2024-07-05 16:49:08,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9147.7, 300 sec: 9649.9). Total num frames: 54804480. Throughput: 0: 2279.0. Samples: 8698392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:49:08,942][04005] Avg episode reward: [(0, '49.430')] [2024-07-05 16:49:10,700][04594] Updated weights for policy 0, policy_version 13384 (0.0017) [2024-07-05 16:49:13,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9594.4). Total num frames: 54849536. Throughput: 0: 2274.8. Samples: 8711932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:49:13,942][04005] Avg episode reward: [(0, '48.922')] [2024-07-05 16:49:15,230][04594] Updated weights for policy 0, policy_version 13394 (0.0015) [2024-07-05 16:49:18,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.7, 300 sec: 9552.7). Total num frames: 54894592. Throughput: 0: 2275.5. Samples: 8718860. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:49:18,943][04005] Avg episode reward: [(0, '51.042')] [2024-07-05 16:49:19,677][04594] Updated weights for policy 0, policy_version 13404 (0.0016) [2024-07-05 16:49:23,942][04005] Fps is (10 sec: 9010.7, 60 sec: 9079.4, 300 sec: 9511.0). Total num frames: 54939648. Throughput: 0: 2276.6. Samples: 8732456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:49:23,943][04005] Avg episode reward: [(0, '50.725')] [2024-07-05 16:49:24,163][04594] Updated weights for policy 0, policy_version 13414 (0.0016) [2024-07-05 16:49:28,680][04594] Updated weights for policy 0, policy_version 13424 (0.0015) [2024-07-05 16:49:28,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9455.5). Total num frames: 54984704. Throughput: 0: 2275.3. Samples: 8745976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:49:28,943][04005] Avg episode reward: [(0, '51.129')] [2024-07-05 16:49:33,145][04594] Updated weights for policy 0, policy_version 13434 (0.0017) [2024-07-05 16:49:33,942][04005] Fps is (10 sec: 9011.6, 60 sec: 9079.5, 300 sec: 9413.9). Total num frames: 55029760. Throughput: 0: 2276.2. Samples: 8752942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:49:33,942][04005] Avg episode reward: [(0, '52.316')] [2024-07-05 16:49:37,608][04594] Updated weights for policy 0, policy_version 13444 (0.0016) [2024-07-05 16:49:38,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.4, 300 sec: 9372.2). Total num frames: 55074816. Throughput: 0: 2281.6. Samples: 8766738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:49:38,943][04005] Avg episode reward: [(0, '51.554')] [2024-07-05 16:49:42,104][04594] Updated weights for policy 0, policy_version 13454 (0.0016) [2024-07-05 16:49:43,942][04005] Fps is (10 sec: 9420.7, 60 sec: 9147.7, 300 sec: 9330.5). Total num frames: 55123968. Throughput: 0: 2279.0. Samples: 8780494. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:49:43,942][04005] Avg episode reward: [(0, '52.439')] [2024-07-05 16:49:46,582][04594] Updated weights for policy 0, policy_version 13464 (0.0015) [2024-07-05 16:49:48,941][04005] Fps is (10 sec: 9421.0, 60 sec: 9147.8, 300 sec: 9288.9). Total num frames: 55169024. Throughput: 0: 2285.1. Samples: 8787410. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:49:48,942][04005] Avg episode reward: [(0, '52.630')] [2024-07-05 16:49:51,036][04594] Updated weights for policy 0, policy_version 13474 (0.0017) [2024-07-05 16:49:53,942][04005] Fps is (10 sec: 9010.8, 60 sec: 9147.7, 300 sec: 9233.3). Total num frames: 55214080. Throughput: 0: 2280.9. Samples: 8801032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:49:53,944][04005] Avg episode reward: [(0, '51.626')] [2024-07-05 16:49:55,536][04594] Updated weights for policy 0, policy_version 13484 (0.0017) [2024-07-05 16:49:58,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.7, 300 sec: 9191.7). Total num frames: 55259136. Throughput: 0: 2280.7. Samples: 8814562. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:49:58,942][04005] Avg episode reward: [(0, '51.039')] [2024-07-05 16:50:00,040][04594] Updated weights for policy 0, policy_version 13494 (0.0015) [2024-07-05 16:50:03,941][04005] Fps is (10 sec: 9011.8, 60 sec: 9079.5, 300 sec: 9150.0). Total num frames: 55304192. Throughput: 0: 2281.4. Samples: 8821522. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:50:03,942][04005] Avg episode reward: [(0, '50.887')] [2024-07-05 16:50:04,541][04594] Updated weights for policy 0, policy_version 13504 (0.0018) [2024-07-05 16:50:08,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.4, 300 sec: 9122.3). Total num frames: 55349248. Throughput: 0: 2283.8. Samples: 8835224. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:50:08,942][04005] Avg episode reward: [(0, '50.760')] [2024-07-05 16:50:08,984][04594] Updated weights for policy 0, policy_version 13514 (0.0016) [2024-07-05 16:50:13,424][04594] Updated weights for policy 0, policy_version 13524 (0.0016) [2024-07-05 16:50:13,941][04005] Fps is (10 sec: 9420.7, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 55398400. Throughput: 0: 2292.1. Samples: 8849122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:50:13,942][04005] Avg episode reward: [(0, '50.254')] [2024-07-05 16:50:17,910][04594] Updated weights for policy 0, policy_version 13534 (0.0018) [2024-07-05 16:50:18,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 55443456. Throughput: 0: 2292.0. Samples: 8856080. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:50:18,942][04005] Avg episode reward: [(0, '51.375')] [2024-07-05 16:50:22,394][04594] Updated weights for policy 0, policy_version 13544 (0.0015) [2024-07-05 16:50:23,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9147.8, 300 sec: 9136.2). Total num frames: 55488512. Throughput: 0: 2287.0. Samples: 8869654. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:50:23,942][04005] Avg episode reward: [(0, '51.612')] [2024-07-05 16:50:26,877][04594] Updated weights for policy 0, policy_version 13554 (0.0018) [2024-07-05 16:50:28,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.8, 300 sec: 9136.2). Total num frames: 55533568. Throughput: 0: 2282.6. Samples: 8883212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:50:28,942][04005] Avg episode reward: [(0, '51.397')] [2024-07-05 16:50:31,365][04594] Updated weights for policy 0, policy_version 13564 (0.0016) [2024-07-05 16:50:33,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9122.3). Total num frames: 55578624. Throughput: 0: 2283.2. Samples: 8890154. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:50:33,942][04005] Avg episode reward: [(0, '50.968')] [2024-07-05 16:50:34,061][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000013570_55582720.pth... [2024-07-05 16:50:34,149][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000013035_53391360.pth [2024-07-05 16:50:35,859][04594] Updated weights for policy 0, policy_version 13574 (0.0018) [2024-07-05 16:50:38,942][04005] Fps is (10 sec: 9010.9, 60 sec: 9147.7, 300 sec: 9122.3). Total num frames: 55623680. Throughput: 0: 2282.9. Samples: 8903760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:50:38,943][04005] Avg episode reward: [(0, '50.781')] [2024-07-05 16:50:40,344][04594] Updated weights for policy 0, policy_version 13584 (0.0016) [2024-07-05 16:50:43,941][04005] Fps is (10 sec: 9420.7, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 55672832. Throughput: 0: 2290.3. Samples: 8917624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:50:43,943][04005] Avg episode reward: [(0, '51.018')] [2024-07-05 16:50:44,818][04594] Updated weights for policy 0, policy_version 13594 (0.0015) [2024-07-05 16:50:48,941][04005] Fps is (10 sec: 9421.1, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 55717888. Throughput: 0: 2289.0. Samples: 8924526. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:50:48,943][04005] Avg episode reward: [(0, '51.151')] [2024-07-05 16:50:49,294][04594] Updated weights for policy 0, policy_version 13604 (0.0017) [2024-07-05 16:50:53,745][04594] Updated weights for policy 0, policy_version 13614 (0.0016) [2024-07-05 16:50:53,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.8, 300 sec: 9136.2). Total num frames: 55762944. Throughput: 0: 2289.0. Samples: 8938228. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:50:53,943][04005] Avg episode reward: [(0, '53.581')] [2024-07-05 16:50:58,242][04594] Updated weights for policy 0, policy_version 13624 (0.0018) [2024-07-05 16:50:58,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 55808000. Throughput: 0: 2281.0. Samples: 8951766. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:50:58,943][04005] Avg episode reward: [(0, '52.737')] [2024-07-05 16:51:02,744][04594] Updated weights for policy 0, policy_version 13634 (0.0020) [2024-07-05 16:51:03,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 55853056. Throughput: 0: 2280.4. Samples: 8958700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:51:03,943][04005] Avg episode reward: [(0, '52.388')] [2024-07-05 16:51:07,237][04594] Updated weights for policy 0, policy_version 13644 (0.0017) [2024-07-05 16:51:08,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 55898112. Throughput: 0: 2280.6. Samples: 8972280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:51:08,943][04005] Avg episode reward: [(0, '51.691')] [2024-07-05 16:51:11,731][04594] Updated weights for policy 0, policy_version 13654 (0.0015) [2024-07-05 16:51:13,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9122.3). Total num frames: 55943168. Throughput: 0: 2287.1. Samples: 8986132. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:13,942][04005] Avg episode reward: [(0, '50.252')] [2024-07-05 16:51:16,246][04594] Updated weights for policy 0, policy_version 13664 (0.0016) [2024-07-05 16:51:18,941][04005] Fps is (10 sec: 9421.0, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 55992320. Throughput: 0: 2279.7. Samples: 8992742. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:18,943][04005] Avg episode reward: [(0, '50.998')] [2024-07-05 16:51:20,747][04594] Updated weights for policy 0, policy_version 13674 (0.0017) [2024-07-05 16:51:23,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 56037376. Throughput: 0: 2286.1. Samples: 9006634. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:23,942][04005] Avg episode reward: [(0, '51.222')] [2024-07-05 16:51:25,235][04594] Updated weights for policy 0, policy_version 13684 (0.0016) [2024-07-05 16:51:28,941][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 56082432. Throughput: 0: 2280.8. Samples: 9020260. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:28,942][04005] Avg episode reward: [(0, '50.864')] [2024-07-05 16:51:29,705][04594] Updated weights for policy 0, policy_version 13694 (0.0016) [2024-07-05 16:51:33,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 56127488. Throughput: 0: 2280.9. Samples: 9027166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:33,942][04005] Avg episode reward: [(0, '51.130')] [2024-07-05 16:51:34,226][04594] Updated weights for policy 0, policy_version 13704 (0.0017) [2024-07-05 16:51:38,720][04594] Updated weights for policy 0, policy_version 13714 (0.0016) [2024-07-05 16:51:38,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9147.8, 300 sec: 9136.2). Total num frames: 56172544. Throughput: 0: 2277.4. Samples: 9040712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:38,943][04005] Avg episode reward: [(0, '51.682')] [2024-07-05 16:51:43,234][04594] Updated weights for policy 0, policy_version 13724 (0.0017) [2024-07-05 16:51:43,941][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9136.2). Total num frames: 56217600. Throughput: 0: 2277.2. Samples: 9054238. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:43,943][04005] Avg episode reward: [(0, '52.592')] [2024-07-05 16:51:47,761][04594] Updated weights for policy 0, policy_version 13734 (0.0016) [2024-07-05 16:51:48,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9136.2). Total num frames: 56262656. Throughput: 0: 2277.0. Samples: 9061166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:48,943][04005] Avg episode reward: [(0, '52.176')] [2024-07-05 16:51:52,260][04594] Updated weights for policy 0, policy_version 13744 (0.0018) [2024-07-05 16:51:53,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9122.3). Total num frames: 56307712. Throughput: 0: 2275.6. Samples: 9074680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:53,943][04005] Avg episode reward: [(0, '51.238')] [2024-07-05 16:51:56,756][04594] Updated weights for policy 0, policy_version 13754 (0.0015) [2024-07-05 16:51:58,941][04005] Fps is (10 sec: 9011.4, 60 sec: 9079.5, 300 sec: 9122.3). Total num frames: 56352768. Throughput: 0: 2275.0. Samples: 9088508. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:51:58,942][04005] Avg episode reward: [(0, '50.152')] [2024-07-05 16:52:01,223][04594] Updated weights for policy 0, policy_version 13764 (0.0016) [2024-07-05 16:52:03,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9147.8, 300 sec: 9136.2). Total num frames: 56401920. Throughput: 0: 2276.4. Samples: 9095180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:03,943][04005] Avg episode reward: [(0, '49.208')] [2024-07-05 16:52:05,742][04594] Updated weights for policy 0, policy_version 13774 (0.0019) [2024-07-05 16:52:08,941][04005] Fps is (10 sec: 9420.8, 60 sec: 9147.8, 300 sec: 9136.2). Total num frames: 56446976. Throughput: 0: 2275.0. Samples: 9109008. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:08,942][04005] Avg episode reward: [(0, '49.159')] [2024-07-05 16:52:10,252][04594] Updated weights for policy 0, policy_version 13784 (0.0016) [2024-07-05 16:52:13,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 56492032. Throughput: 0: 2273.9. Samples: 9122584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:52:13,943][04005] Avg episode reward: [(0, '50.414')] [2024-07-05 16:52:14,749][04594] Updated weights for policy 0, policy_version 13794 (0.0018) [2024-07-05 16:52:18,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9136.2). Total num frames: 56537088. Throughput: 0: 2273.8. Samples: 9129486. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:52:18,942][04005] Avg episode reward: [(0, '50.322')] [2024-07-05 16:52:19,239][04594] Updated weights for policy 0, policy_version 13804 (0.0016) [2024-07-05 16:52:23,727][04594] Updated weights for policy 0, policy_version 13814 (0.0015) [2024-07-05 16:52:23,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.4, 300 sec: 9136.2). Total num frames: 56582144. Throughput: 0: 2275.3. Samples: 9143102. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 16:52:23,943][04005] Avg episode reward: [(0, '51.276')] [2024-07-05 16:52:28,236][04594] Updated weights for policy 0, policy_version 13824 (0.0017) [2024-07-05 16:52:28,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9122.3). Total num frames: 56627200. Throughput: 0: 2274.8. Samples: 9156604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:28,942][04005] Avg episode reward: [(0, '50.709')] [2024-07-05 16:52:32,762][04594] Updated weights for policy 0, policy_version 13834 (0.0016) [2024-07-05 16:52:33,942][04005] Fps is (10 sec: 9010.7, 60 sec: 9079.4, 300 sec: 9122.3). Total num frames: 56672256. Throughput: 0: 2273.7. Samples: 9163484. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:33,943][04005] Avg episode reward: [(0, '51.922')] [2024-07-05 16:52:34,111][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000013837_56676352.pth... [2024-07-05 16:52:34,203][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000013303_54489088.pth [2024-07-05 16:52:37,253][04594] Updated weights for policy 0, policy_version 13844 (0.0016) [2024-07-05 16:52:38,942][04005] Fps is (10 sec: 9010.8, 60 sec: 9079.4, 300 sec: 9122.3). Total num frames: 56717312. Throughput: 0: 2275.3. Samples: 9177068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:38,943][04005] Avg episode reward: [(0, '51.802')] [2024-07-05 16:52:41,730][04594] Updated weights for policy 0, policy_version 13854 (0.0016) [2024-07-05 16:52:43,942][04005] Fps is (10 sec: 9011.7, 60 sec: 9079.5, 300 sec: 9122.3). Total num frames: 56762368. Throughput: 0: 2276.3. Samples: 9190942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:43,943][04005] Avg episode reward: [(0, '49.878')] [2024-07-05 16:52:46,213][04594] Updated weights for policy 0, policy_version 13864 (0.0014) [2024-07-05 16:52:48,941][04005] Fps is (10 sec: 9421.2, 60 sec: 9147.7, 300 sec: 9136.2). Total num frames: 56811520. Throughput: 0: 2274.4. Samples: 9197530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:48,942][04005] Avg episode reward: [(0, '50.597')] [2024-07-05 16:52:50,697][04594] Updated weights for policy 0, policy_version 13874 (0.0016) [2024-07-05 16:52:53,942][04005] Fps is (10 sec: 9420.8, 60 sec: 9147.7, 300 sec: 9122.3). Total num frames: 56856576. Throughput: 0: 2275.3. Samples: 9211396. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:53,943][04005] Avg episode reward: [(0, '50.482')] [2024-07-05 16:52:55,218][04594] Updated weights for policy 0, policy_version 13884 (0.0015) [2024-07-05 16:52:58,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9147.7, 300 sec: 9122.3). Total num frames: 56901632. Throughput: 0: 2274.9. Samples: 9224954. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:52:58,942][04005] Avg episode reward: [(0, '50.349')] [2024-07-05 16:52:59,737][04594] Updated weights for policy 0, policy_version 13894 (0.0019) [2024-07-05 16:53:03,942][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 9122.3). Total num frames: 56946688. Throughput: 0: 2275.2. Samples: 9231868. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:03,942][04005] Avg episode reward: [(0, '50.640')] [2024-07-05 16:53:04,239][04594] Updated weights for policy 0, policy_version 13904 (0.0016) [2024-07-05 16:53:08,731][04594] Updated weights for policy 0, policy_version 13914 (0.0015) [2024-07-05 16:53:08,942][04005] Fps is (10 sec: 9010.7, 60 sec: 9079.4, 300 sec: 9122.3). Total num frames: 56991744. Throughput: 0: 2274.0. Samples: 9245434. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:08,943][04005] Avg episode reward: [(0, '50.690')] [2024-07-05 16:53:13,233][04594] Updated weights for policy 0, policy_version 13924 (0.0017) [2024-07-05 16:53:13,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9122.3). Total num frames: 57036800. Throughput: 0: 2275.4. Samples: 9258996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:13,943][04005] Avg episode reward: [(0, '50.405')] [2024-07-05 16:53:17,753][04594] Updated weights for policy 0, policy_version 13934 (0.0016) [2024-07-05 16:53:18,941][04005] Fps is (10 sec: 9011.7, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57081856. Throughput: 0: 2276.1. Samples: 9265908. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:18,942][04005] Avg episode reward: [(0, '50.526')] [2024-07-05 16:53:22,240][04594] Updated weights for policy 0, policy_version 13944 (0.0017) [2024-07-05 16:53:23,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57126912. Throughput: 0: 2274.9. Samples: 9279436. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:23,942][04005] Avg episode reward: [(0, '50.178')] [2024-07-05 16:53:26,764][04594] Updated weights for policy 0, policy_version 13954 (0.0015) [2024-07-05 16:53:28,942][04005] Fps is (10 sec: 9010.9, 60 sec: 9079.4, 300 sec: 9108.4). Total num frames: 57171968. Throughput: 0: 2275.0. Samples: 9293318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:28,943][04005] Avg episode reward: [(0, '49.921')] [2024-07-05 16:53:31,251][04594] Updated weights for policy 0, policy_version 13964 (0.0016) [2024-07-05 16:53:33,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.6, 300 sec: 9108.4). Total num frames: 57217024. Throughput: 0: 2276.0. Samples: 9299948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:33,943][04005] Avg episode reward: [(0, '50.283')] [2024-07-05 16:53:35,782][04594] Updated weights for policy 0, policy_version 13974 (0.0016) [2024-07-05 16:53:38,941][04005] Fps is (10 sec: 9421.1, 60 sec: 9147.8, 300 sec: 9122.3). Total num frames: 57266176. Throughput: 0: 2273.9. Samples: 9313722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:38,942][04005] Avg episode reward: [(0, '49.230')] [2024-07-05 16:53:40,309][04594] Updated weights for policy 0, policy_version 13984 (0.0017) [2024-07-05 16:53:43,942][04005] Fps is (10 sec: 9420.7, 60 sec: 9147.7, 300 sec: 9122.3). Total num frames: 57311232. Throughput: 0: 2274.3. Samples: 9327300. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:43,943][04005] Avg episode reward: [(0, '49.725')] [2024-07-05 16:53:44,804][04594] Updated weights for policy 0, policy_version 13994 (0.0016) [2024-07-05 16:53:48,941][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9122.3). Total num frames: 57356288. Throughput: 0: 2271.6. Samples: 9334092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:48,942][04005] Avg episode reward: [(0, '50.346')] [2024-07-05 16:53:49,316][04594] Updated weights for policy 0, policy_version 14004 (0.0015) [2024-07-05 16:53:53,806][04594] Updated weights for policy 0, policy_version 14014 (0.0016) [2024-07-05 16:53:53,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.4, 300 sec: 9122.3). Total num frames: 57401344. Throughput: 0: 2273.9. Samples: 9347760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:53,943][04005] Avg episode reward: [(0, '50.250')] [2024-07-05 16:53:58,311][04594] Updated weights for policy 0, policy_version 14024 (0.0015) [2024-07-05 16:53:58,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.4, 300 sec: 9108.4). Total num frames: 57446400. Throughput: 0: 2273.2. Samples: 9361288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:53:58,942][04005] Avg episode reward: [(0, '51.446')] [2024-07-05 16:54:02,815][04594] Updated weights for policy 0, policy_version 14034 (0.0019) [2024-07-05 16:54:03,942][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57491456. Throughput: 0: 2272.3. Samples: 9368162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:03,942][04005] Avg episode reward: [(0, '50.718')] [2024-07-05 16:54:07,352][04594] Updated weights for policy 0, policy_version 14044 (0.0017) [2024-07-05 16:54:08,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57536512. Throughput: 0: 2271.6. Samples: 9381656. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:08,942][04005] Avg episode reward: [(0, '50.418')] [2024-07-05 16:54:11,891][04594] Updated weights for policy 0, policy_version 14054 (0.0018) [2024-07-05 16:54:13,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57581568. Throughput: 0: 2263.3. Samples: 9395164. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:13,942][04005] Avg episode reward: [(0, '49.869')] [2024-07-05 16:54:16,374][04594] Updated weights for policy 0, policy_version 14064 (0.0016) [2024-07-05 16:54:18,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57626624. Throughput: 0: 2269.8. Samples: 9402088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:18,942][04005] Avg episode reward: [(0, '49.918')] [2024-07-05 16:54:20,881][04594] Updated weights for policy 0, policy_version 14074 (0.0016) [2024-07-05 16:54:23,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57671680. Throughput: 0: 2264.9. Samples: 9415642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:23,942][04005] Avg episode reward: [(0, '50.217')] [2024-07-05 16:54:25,376][04594] Updated weights for policy 0, policy_version 14084 (0.0016) [2024-07-05 16:54:28,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57716736. Throughput: 0: 2271.6. Samples: 9429522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:28,943][04005] Avg episode reward: [(0, '50.691')] [2024-07-05 16:54:29,885][04594] Updated weights for policy 0, policy_version 14094 (0.0015) [2024-07-05 16:54:33,941][04005] Fps is (10 sec: 9420.8, 60 sec: 9147.8, 300 sec: 9122.3). Total num frames: 57765888. Throughput: 0: 2267.1. Samples: 9436112. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:33,942][04005] Avg episode reward: [(0, '51.438')] [2024-07-05 16:54:33,947][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000014103_57765888.pth... [2024-07-05 16:54:34,034][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000013570_55582720.pth [2024-07-05 16:54:34,419][04594] Updated weights for policy 0, policy_version 14104 (0.0017) [2024-07-05 16:54:38,894][04594] Updated weights for policy 0, policy_version 14114 (0.0018) [2024-07-05 16:54:38,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57810944. Throughput: 0: 2270.9. Samples: 9449948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:38,942][04005] Avg episode reward: [(0, '51.141')] [2024-07-05 16:54:43,415][04594] Updated weights for policy 0, policy_version 14124 (0.0017) [2024-07-05 16:54:43,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57856000. Throughput: 0: 2270.6. Samples: 9463464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:43,943][04005] Avg episode reward: [(0, '49.823')] [2024-07-05 16:54:47,908][04594] Updated weights for policy 0, policy_version 14134 (0.0016) [2024-07-05 16:54:48,942][04005] Fps is (10 sec: 9011.0, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57901056. Throughput: 0: 2271.7. Samples: 9470388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:48,943][04005] Avg episode reward: [(0, '48.254')] [2024-07-05 16:54:52,386][04594] Updated weights for policy 0, policy_version 14144 (0.0016) [2024-07-05 16:54:53,942][04005] Fps is (10 sec: 9010.9, 60 sec: 9079.4, 300 sec: 9108.4). Total num frames: 57946112. Throughput: 0: 2274.6. Samples: 9484012. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:53,943][04005] Avg episode reward: [(0, '47.918')] [2024-07-05 16:54:56,911][04594] Updated weights for policy 0, policy_version 14154 (0.0015) [2024-07-05 16:54:58,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 57991168. Throughput: 0: 2273.8. Samples: 9497486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:54:58,943][04005] Avg episode reward: [(0, '47.725')] [2024-07-05 16:55:01,427][04594] Updated weights for policy 0, policy_version 14164 (0.0015) [2024-07-05 16:55:03,942][04005] Fps is (10 sec: 9011.5, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 58036224. Throughput: 0: 2273.8. Samples: 9504410. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:55:03,943][04005] Avg episode reward: [(0, '48.965')] [2024-07-05 16:55:05,949][04594] Updated weights for policy 0, policy_version 14174 (0.0016) [2024-07-05 16:55:08,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58081280. Throughput: 0: 2272.6. Samples: 9517908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:55:08,942][04005] Avg episode reward: [(0, '49.137')] [2024-07-05 16:55:10,463][04594] Updated weights for policy 0, policy_version 14184 (0.0015) [2024-07-05 16:55:13,942][04005] Fps is (10 sec: 9011.0, 60 sec: 9079.4, 300 sec: 9094.5). Total num frames: 58126336. Throughput: 0: 2266.2. Samples: 9531502. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:55:13,943][04005] Avg episode reward: [(0, '49.590')] [2024-07-05 16:55:14,979][04594] Updated weights for policy 0, policy_version 14194 (0.0017) [2024-07-05 16:55:18,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58171392. Throughput: 0: 2271.0. Samples: 9538306. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:18,943][04005] Avg episode reward: [(0, '51.474')] [2024-07-05 16:55:19,481][04594] Updated weights for policy 0, policy_version 14204 (0.0016) [2024-07-05 16:55:23,942][04005] Fps is (10 sec: 9011.4, 60 sec: 9079.4, 300 sec: 9094.5). Total num frames: 58216448. Throughput: 0: 2265.6. Samples: 9551902. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:23,943][04005] Avg episode reward: [(0, '51.677')] [2024-07-05 16:55:23,994][04594] Updated weights for policy 0, policy_version 14214 (0.0017) [2024-07-05 16:55:28,520][04594] Updated weights for policy 0, policy_version 14224 (0.0015) [2024-07-05 16:55:28,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58261504. Throughput: 0: 2270.5. Samples: 9565636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:28,942][04005] Avg episode reward: [(0, '53.646')] [2024-07-05 16:55:33,037][04594] Updated weights for policy 0, policy_version 14234 (0.0015) [2024-07-05 16:55:33,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9079.5, 300 sec: 9108.4). Total num frames: 58310656. Throughput: 0: 2263.6. Samples: 9572250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:33,942][04005] Avg episode reward: [(0, '52.250')] [2024-07-05 16:55:37,537][04594] Updated weights for policy 0, policy_version 14244 (0.0015) [2024-07-05 16:55:38,941][04005] Fps is (10 sec: 9420.8, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58355712. Throughput: 0: 2268.3. Samples: 9586084. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:38,942][04005] Avg episode reward: [(0, '50.757')] [2024-07-05 16:55:42,047][04594] Updated weights for policy 0, policy_version 14254 (0.0015) [2024-07-05 16:55:43,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58400768. Throughput: 0: 2270.3. Samples: 9599648. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:43,942][04005] Avg episode reward: [(0, '50.536')] [2024-07-05 16:55:46,544][04594] Updated weights for policy 0, policy_version 14264 (0.0016) [2024-07-05 16:55:48,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58445824. Throughput: 0: 2270.3. Samples: 9606574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:48,942][04005] Avg episode reward: [(0, '49.929')] [2024-07-05 16:55:51,066][04594] Updated weights for policy 0, policy_version 14274 (0.0016) [2024-07-05 16:55:53,942][04005] Fps is (10 sec: 9010.8, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58490880. Throughput: 0: 2271.1. Samples: 9620106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:53,943][04005] Avg episode reward: [(0, '51.306')] [2024-07-05 16:55:55,575][04594] Updated weights for policy 0, policy_version 14284 (0.0015) [2024-07-05 16:55:58,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58535936. Throughput: 0: 2269.4. Samples: 9633626. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:55:58,942][04005] Avg episode reward: [(0, '50.709')] [2024-07-05 16:56:00,100][04594] Updated weights for policy 0, policy_version 14294 (0.0017) [2024-07-05 16:56:03,941][04005] Fps is (10 sec: 9011.6, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58580992. Throughput: 0: 2271.7. Samples: 9640534. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:03,943][04005] Avg episode reward: [(0, '52.264')] [2024-07-05 16:56:04,598][04594] Updated weights for policy 0, policy_version 14304 (0.0016) [2024-07-05 16:56:08,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58626048. Throughput: 0: 2269.9. Samples: 9654046. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:08,942][04005] Avg episode reward: [(0, '52.323')] [2024-07-05 16:56:09,098][04594] Updated weights for policy 0, policy_version 14314 (0.0016) [2024-07-05 16:56:13,627][04594] Updated weights for policy 0, policy_version 14324 (0.0017) [2024-07-05 16:56:13,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 58671104. Throughput: 0: 2267.9. Samples: 9667690. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:13,943][04005] Avg episode reward: [(0, '52.270')] [2024-07-05 16:56:18,121][04594] Updated weights for policy 0, policy_version 14334 (0.0016) [2024-07-05 16:56:18,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 58716160. Throughput: 0: 2272.0. Samples: 9674490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:18,943][04005] Avg episode reward: [(0, '52.779')] [2024-07-05 16:56:22,637][04594] Updated weights for policy 0, policy_version 14344 (0.0018) [2024-07-05 16:56:23,942][04005] Fps is (10 sec: 9010.7, 60 sec: 9079.4, 300 sec: 9080.6). Total num frames: 58761216. Throughput: 0: 2266.9. Samples: 9688094. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:23,943][04005] Avg episode reward: [(0, '51.339')] [2024-07-05 16:56:27,154][04594] Updated weights for policy 0, policy_version 14354 (0.0017) [2024-07-05 16:56:28,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.4, 300 sec: 9080.6). Total num frames: 58806272. Throughput: 0: 2270.7. Samples: 9701828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:28,943][04005] Avg episode reward: [(0, '51.400')] [2024-07-05 16:56:31,695][04594] Updated weights for policy 0, policy_version 14364 (0.0019) [2024-07-05 16:56:33,942][04005] Fps is (10 sec: 9011.6, 60 sec: 9011.2, 300 sec: 9080.6). Total num frames: 58851328. Throughput: 0: 2263.2. Samples: 9708420. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:33,943][04005] Avg episode reward: [(0, '49.805')] [2024-07-05 16:56:33,970][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000014369_58855424.pth... [2024-07-05 16:56:34,061][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000013837_56676352.pth [2024-07-05 16:56:36,227][04594] Updated weights for policy 0, policy_version 14374 (0.0020) [2024-07-05 16:56:38,942][04005] Fps is (10 sec: 9010.6, 60 sec: 9011.1, 300 sec: 9080.6). Total num frames: 58896384. Throughput: 0: 2267.9. Samples: 9722160. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:38,944][04005] Avg episode reward: [(0, '50.241')] [2024-07-05 16:56:40,758][04594] Updated weights for policy 0, policy_version 14384 (0.0017) [2024-07-05 16:56:43,941][04005] Fps is (10 sec: 9421.1, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58945536. Throughput: 0: 2268.0. Samples: 9735686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:43,943][04005] Avg episode reward: [(0, '51.349')] [2024-07-05 16:56:45,308][04594] Updated weights for policy 0, policy_version 14394 (0.0017) [2024-07-05 16:56:48,942][04005] Fps is (10 sec: 9421.4, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 58990592. Throughput: 0: 2260.9. Samples: 9742274. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:48,942][04005] Avg episode reward: [(0, '50.742')] [2024-07-05 16:56:49,853][04594] Updated weights for policy 0, policy_version 14404 (0.0017) [2024-07-05 16:56:53,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9094.5). Total num frames: 59035648. Throughput: 0: 2266.7. Samples: 9756048. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:53,943][04005] Avg episode reward: [(0, '51.053')] [2024-07-05 16:56:54,404][04594] Updated weights for policy 0, policy_version 14414 (0.0016) [2024-07-05 16:56:58,920][04594] Updated weights for policy 0, policy_version 14424 (0.0017) [2024-07-05 16:56:58,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 59080704. Throughput: 0: 2262.8. Samples: 9769514. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:56:58,942][04005] Avg episode reward: [(0, '51.638')] [2024-07-05 16:57:03,442][04594] Updated weights for policy 0, policy_version 14434 (0.0017) [2024-07-05 16:57:03,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 59125760. Throughput: 0: 2260.4. Samples: 9776208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:03,943][04005] Avg episode reward: [(0, '52.314')] [2024-07-05 16:57:07,966][04594] Updated weights for policy 0, policy_version 14444 (0.0015) [2024-07-05 16:57:08,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 59170816. Throughput: 0: 2262.5. Samples: 9789906. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:08,942][04005] Avg episode reward: [(0, '51.443')] [2024-07-05 16:57:12,492][04594] Updated weights for policy 0, policy_version 14454 (0.0016) [2024-07-05 16:57:13,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 59215872. Throughput: 0: 2257.7. Samples: 9803424. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:13,943][04005] Avg episode reward: [(0, '51.775')] [2024-07-05 16:57:17,000][04594] Updated weights for policy 0, policy_version 14464 (0.0015) [2024-07-05 16:57:18,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 59260928. Throughput: 0: 2264.1. Samples: 9810306. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:18,942][04005] Avg episode reward: [(0, '52.149')] [2024-07-05 16:57:21,523][04594] Updated weights for policy 0, policy_version 14474 (0.0019) [2024-07-05 16:57:23,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.6, 300 sec: 9080.6). Total num frames: 59305984. Throughput: 0: 2259.5. Samples: 9823834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:23,942][04005] Avg episode reward: [(0, '52.452')] [2024-07-05 16:57:26,056][04594] Updated weights for policy 0, policy_version 14484 (0.0017) [2024-07-05 16:57:28,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 59351040. Throughput: 0: 2258.3. Samples: 9837308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:28,943][04005] Avg episode reward: [(0, '51.523')] [2024-07-05 16:57:30,570][04594] Updated weights for policy 0, policy_version 14494 (0.0016) [2024-07-05 16:57:33,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9080.6). Total num frames: 59396096. Throughput: 0: 2265.0. Samples: 9844200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:33,943][04005] Avg episode reward: [(0, '52.343')] [2024-07-05 16:57:35,102][04594] Updated weights for policy 0, policy_version 14504 (0.0016) [2024-07-05 16:57:38,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.6, 300 sec: 9080.6). Total num frames: 59441152. Throughput: 0: 2259.0. Samples: 9857704. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:38,942][04005] Avg episode reward: [(0, '52.272')] [2024-07-05 16:57:39,662][04594] Updated weights for policy 0, policy_version 14514 (0.0016) [2024-07-05 16:57:43,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59486208. Throughput: 0: 2259.3. Samples: 9871184. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:43,943][04005] Avg episode reward: [(0, '50.983')] [2024-07-05 16:57:44,182][04594] Updated weights for policy 0, policy_version 14524 (0.0017) [2024-07-05 16:57:48,716][04594] Updated weights for policy 0, policy_version 14534 (0.0016) [2024-07-05 16:57:48,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59531264. Throughput: 0: 2263.8. Samples: 9878080. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:48,942][04005] Avg episode reward: [(0, '52.466')] [2024-07-05 16:57:53,243][04594] Updated weights for policy 0, policy_version 14544 (0.0018) [2024-07-05 16:57:53,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59576320. Throughput: 0: 2259.0. Samples: 9891560. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:53,942][04005] Avg episode reward: [(0, '50.897')] [2024-07-05 16:57:57,775][04594] Updated weights for policy 0, policy_version 14554 (0.0017) [2024-07-05 16:57:58,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59621376. Throughput: 0: 2258.4. Samples: 9905052. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:57:58,943][04005] Avg episode reward: [(0, '49.607')] [2024-07-05 16:58:02,312][04594] Updated weights for policy 0, policy_version 14564 (0.0016) [2024-07-05 16:58:03,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9066.8). Total num frames: 59666432. Throughput: 0: 2258.7. Samples: 9911946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:58:03,942][04005] Avg episode reward: [(0, '50.580')] [2024-07-05 16:58:06,826][04594] Updated weights for policy 0, policy_version 14574 (0.0019) [2024-07-05 16:58:08,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59711488. Throughput: 0: 2257.7. Samples: 9925430. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:58:08,943][04005] Avg episode reward: [(0, '49.995')] [2024-07-05 16:58:11,343][04594] Updated weights for policy 0, policy_version 14584 (0.0016) [2024-07-05 16:58:13,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59756544. Throughput: 0: 2263.2. Samples: 9939150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:58:13,943][04005] Avg episode reward: [(0, '51.003')] [2024-07-05 16:58:15,869][04594] Updated weights for policy 0, policy_version 14594 (0.0016) [2024-07-05 16:58:18,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59801600. Throughput: 0: 2258.6. Samples: 9945838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:58:18,943][04005] Avg episode reward: [(0, '51.455')] [2024-07-05 16:58:20,374][04594] Updated weights for policy 0, policy_version 14604 (0.0016) [2024-07-05 16:58:23,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59846656. Throughput: 0: 2262.1. Samples: 9959500. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:58:23,943][04005] Avg episode reward: [(0, '50.376')] [2024-07-05 16:58:24,889][04594] Updated weights for policy 0, policy_version 14614 (0.0018) [2024-07-05 16:58:28,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9011.2, 300 sec: 9066.7). Total num frames: 59891712. Throughput: 0: 2267.0. Samples: 9973200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:58:28,942][04005] Avg episode reward: [(0, '51.521')] [2024-07-05 16:58:29,429][04594] Updated weights for policy 0, policy_version 14624 (0.0016) [2024-07-05 16:58:33,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9052.8). Total num frames: 59936768. Throughput: 0: 2260.4. Samples: 9979798. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:58:33,943][04005] Avg episode reward: [(0, '49.435')] [2024-07-05 16:58:33,964][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000014634_59940864.pth... [2024-07-05 16:58:33,967][04594] Updated weights for policy 0, policy_version 14634 (0.0017) [2024-07-05 16:58:34,050][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000014103_57765888.pth [2024-07-05 16:58:38,482][04594] Updated weights for policy 0, policy_version 14644 (0.0016) [2024-07-05 16:58:38,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9079.5, 300 sec: 9066.7). Total num frames: 59985920. Throughput: 0: 2267.6. Samples: 9993600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:58:38,942][04005] Avg episode reward: [(0, '50.244')] [2024-07-05 16:58:43,026][04594] Updated weights for policy 0, policy_version 14654 (0.0015) [2024-07-05 16:58:43,941][04005] Fps is (10 sec: 9421.0, 60 sec: 9079.5, 300 sec: 9066.7). Total num frames: 60030976. Throughput: 0: 2267.6. Samples: 10007092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:58:43,942][04005] Avg episode reward: [(0, '49.311')] [2024-07-05 16:58:47,546][04594] Updated weights for policy 0, policy_version 14664 (0.0018) [2024-07-05 16:58:48,941][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9066.7). Total num frames: 60076032. Throughput: 0: 2262.4. Samples: 10013756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:58:48,942][04005] Avg episode reward: [(0, '48.732')] [2024-07-05 16:58:52,049][04594] Updated weights for policy 0, policy_version 14674 (0.0016) [2024-07-05 16:58:53,949][04005] Fps is (10 sec: 9005.1, 60 sec: 9078.5, 300 sec: 9066.5). Total num frames: 60121088. Throughput: 0: 2268.9. Samples: 10027546. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:58:53,950][04005] Avg episode reward: [(0, '50.663')] [2024-07-05 16:58:56,571][04594] Updated weights for policy 0, policy_version 14684 (0.0017) [2024-07-05 16:58:58,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9066.7). Total num frames: 60166144. Throughput: 0: 2264.2. Samples: 10041038. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:58:58,942][04005] Avg episode reward: [(0, '51.273')] [2024-07-05 16:59:01,087][04594] Updated weights for policy 0, policy_version 14694 (0.0016) [2024-07-05 16:59:03,942][04005] Fps is (10 sec: 9017.2, 60 sec: 9079.4, 300 sec: 9066.7). Total num frames: 60211200. Throughput: 0: 2269.6. Samples: 10047968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:59:03,942][04005] Avg episode reward: [(0, '53.057')] [2024-07-05 16:59:05,600][04594] Updated weights for policy 0, policy_version 14704 (0.0016) [2024-07-05 16:59:08,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9066.7). Total num frames: 60256256. Throughput: 0: 2267.1. Samples: 10061520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:59:08,942][04005] Avg episode reward: [(0, '53.547')] [2024-07-05 16:59:10,116][04594] Updated weights for policy 0, policy_version 14714 (0.0017) [2024-07-05 16:59:13,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9066.7). Total num frames: 60301312. Throughput: 0: 2261.3. Samples: 10074958. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:59:13,942][04005] Avg episode reward: [(0, '52.893')] [2024-07-05 16:59:14,640][04594] Updated weights for policy 0, policy_version 14724 (0.0016) [2024-07-05 16:59:18,942][04005] Fps is (10 sec: 9010.9, 60 sec: 9079.4, 300 sec: 9066.7). Total num frames: 60346368. Throughput: 0: 2268.2. Samples: 10081868. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:59:18,943][04005] Avg episode reward: [(0, '52.129')] [2024-07-05 16:59:19,163][04594] Updated weights for policy 0, policy_version 14734 (0.0015) [2024-07-05 16:59:23,671][04594] Updated weights for policy 0, policy_version 14744 (0.0017) [2024-07-05 16:59:23,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.5, 300 sec: 9066.7). Total num frames: 60391424. Throughput: 0: 2261.5. Samples: 10095368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 16:59:23,943][04005] Avg episode reward: [(0, '51.576')] [2024-07-05 16:59:28,217][04594] Updated weights for policy 0, policy_version 14754 (0.0015) [2024-07-05 16:59:28,941][04005] Fps is (10 sec: 9011.5, 60 sec: 9079.5, 300 sec: 9052.9). Total num frames: 60436480. Throughput: 0: 2261.8. Samples: 10108872. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:59:28,942][04005] Avg episode reward: [(0, '51.010')] [2024-07-05 16:59:32,739][04594] Updated weights for policy 0, policy_version 14764 (0.0015) [2024-07-05 16:59:33,942][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 9052.9). Total num frames: 60481536. Throughput: 0: 2266.6. Samples: 10115754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:59:33,942][04005] Avg episode reward: [(0, '51.544')] [2024-07-05 16:59:37,354][04594] Updated weights for policy 0, policy_version 14774 (0.0015) [2024-07-05 16:59:38,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9011.2, 300 sec: 9052.9). Total num frames: 60526592. Throughput: 0: 2255.9. Samples: 10129046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:59:38,943][04005] Avg episode reward: [(0, '52.689')] [2024-07-05 16:59:43,948][04005] Fps is (10 sec: 6959.5, 60 sec: 8669.1, 300 sec: 8983.3). Total num frames: 60551168. Throughput: 0: 2143.0. Samples: 10137486. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:59:43,974][04005] Avg episode reward: [(0, '50.974')] [2024-07-05 16:59:48,948][04005] Fps is (10 sec: 2456.2, 60 sec: 7918.2, 300 sec: 8830.5). Total num frames: 60551168. Throughput: 0: 1997.2. Samples: 10137852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:59:48,978][04005] Avg episode reward: [(0, '51.482')] [2024-07-05 16:59:49,486][04594] Updated weights for policy 0, policy_version 14784 (0.0133) [2024-07-05 16:59:53,947][04005] Fps is (10 sec: 409.6, 60 sec: 7236.5, 300 sec: 8691.7). Total num frames: 60555264. Throughput: 0: 1713.3. Samples: 10138626. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:59:53,982][04005] Avg episode reward: [(0, '51.352')] [2024-07-05 16:59:58,950][04005] Fps is (10 sec: 409.5, 60 sec: 6484.6, 300 sec: 8538.9). Total num frames: 60555264. Throughput: 0: 1417.4. Samples: 10138750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 16:59:58,987][04005] Avg episode reward: [(0, '51.352')] [2024-07-05 17:00:03,946][04005] Fps is (10 sec: 409.6, 60 sec: 5802.3, 300 sec: 8400.2). Total num frames: 60559360. Throughput: 0: 1280.3. Samples: 10139484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:00:03,978][04005] Avg episode reward: [(0, '51.451')] [2024-07-05 17:00:08,947][04005] Fps is (10 sec: 819.4, 60 sec: 5119.6, 300 sec: 8261.3). Total num frames: 60563456. Throughput: 0: 996.3. Samples: 10140208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:00:08,974][04005] Avg episode reward: [(0, '51.431')] [2024-07-05 17:00:13,950][04005] Fps is (10 sec: 409.5, 60 sec: 4368.7, 300 sec: 8108.6). Total num frames: 60563456. Throughput: 0: 712.5. Samples: 10140938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:00:13,983][04005] Avg episode reward: [(0, '51.471')] [2024-07-05 17:00:18,948][04005] Fps is (10 sec: 409.6, 60 sec: 3686.1, 300 sec: 7969.7). Total num frames: 60567552. Throughput: 0: 567.0. 
Samples: 10141272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:18,968][04005] Avg episode reward: [(0, '51.481')] [2024-07-05 17:00:23,952][04005] Fps is (10 sec: 409.4, 60 sec: 2935.0, 300 sec: 7816.9). Total num frames: 60567552. Throughput: 0: 287.4. Samples: 10141980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:23,994][04005] Avg episode reward: [(0, '51.481')] [2024-07-05 17:00:28,954][04005] Fps is (10 sec: 409.4, 60 sec: 2252.4, 300 sec: 7664.1). Total num frames: 60571648. Throughput: 0: 116.4. Samples: 10142724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:28,987][04005] Avg episode reward: [(0, '51.481')] [2024-07-05 17:00:33,954][04005] Fps is (10 sec: 819.1, 60 sec: 1569.9, 300 sec: 7525.3). Total num frames: 60575744. Throughput: 0: 117.1. Samples: 10143124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:33,990][04005] Avg episode reward: [(0, '51.481')] [2024-07-05 17:00:34,251][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000014789_60575744.pth... [2024-07-05 17:00:37,772][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000014369_58855424.pth [2024-07-05 17:00:38,951][04005] Fps is (10 sec: 409.7, 60 sec: 819.1, 300 sec: 7372.6). Total num frames: 60575744. Throughput: 0: 114.8. Samples: 10143794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:38,980][04005] Avg episode reward: [(0, '51.276')] [2024-07-05 17:00:43,948][04005] Fps is (10 sec: 409.8, 60 sec: 477.9, 300 sec: 7233.8). Total num frames: 60579840. Throughput: 0: 127.3. Samples: 10144480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:43,991][04005] Avg episode reward: [(0, '51.276')] [2024-07-05 17:00:48,953][04005] Fps is (10 sec: 409.6, 60 sec: 477.8, 300 sec: 7081.0). Total num frames: 60579840. Throughput: 0: 118.5. Samples: 10144818. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:48,996][04005] Avg episode reward: [(0, '51.135')] [2024-07-05 17:00:53,113][04594] Updated weights for policy 0, policy_version 14794 (0.0410) [2024-07-05 17:00:53,941][04005] Fps is (10 sec: 2048.9, 60 sec: 751.0, 300 sec: 6997.9). Total num frames: 60600320. Throughput: 0: 169.3. Samples: 10147824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:53,943][04005] Avg episode reward: [(0, '51.434')] [2024-07-05 17:00:57,715][04594] Updated weights for policy 0, policy_version 14804 (0.0015) [2024-07-05 17:00:58,942][04005] Fps is (10 sec: 6559.1, 60 sec: 1502.0, 300 sec: 6997.9). Total num frames: 60645376. Throughput: 0: 451.0. Samples: 10161232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:00:58,943][04005] Avg episode reward: [(0, '50.936')] [2024-07-05 17:01:02,262][04594] Updated weights for policy 0, policy_version 14814 (0.0013) [2024-07-05 17:01:03,942][04005] Fps is (10 sec: 9011.1, 60 sec: 2184.7, 300 sec: 6997.9). Total num frames: 60690432. Throughput: 0: 594.2. Samples: 10168008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:01:03,943][04005] Avg episode reward: [(0, '50.593')] [2024-07-05 17:01:06,750][04594] Updated weights for policy 0, policy_version 14824 (0.0014) [2024-07-05 17:01:08,942][04005] Fps is (10 sec: 9011.2, 60 sec: 2867.4, 300 sec: 6997.9). Total num frames: 60735488. Throughput: 0: 882.9. Samples: 10181702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:01:08,942][04005] Avg episode reward: [(0, '51.110')] [2024-07-05 17:01:11,249][04594] Updated weights for policy 0, policy_version 14834 (0.0014) [2024-07-05 17:01:13,942][04005] Fps is (10 sec: 9011.3, 60 sec: 3618.4, 300 sec: 6997.9). Total num frames: 60780544. Throughput: 0: 1172.4. Samples: 10195468. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:01:13,943][04005] Avg episode reward: [(0, '51.730')] [2024-07-05 17:01:15,756][04594] Updated weights for policy 0, policy_version 14844 (0.0016) [2024-07-05 17:01:18,942][04005] Fps is (10 sec: 9420.8, 60 sec: 4369.5, 300 sec: 7011.8). Total num frames: 60829696. Throughput: 0: 1310.4. Samples: 10202078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:01:18,943][04005] Avg episode reward: [(0, '51.650')] [2024-07-05 17:01:20,257][04594] Updated weights for policy 0, policy_version 14854 (0.0016) [2024-07-05 17:01:23,942][04005] Fps is (10 sec: 9420.8, 60 sec: 5120.8, 300 sec: 7011.8). Total num frames: 60874752. Throughput: 0: 1604.5. Samples: 10215986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:01:23,953][04005] Avg episode reward: [(0, '50.597')] [2024-07-05 17:01:24,746][04594] Updated weights for policy 0, policy_version 14864 (0.0015) [2024-07-05 17:01:28,942][04005] Fps is (10 sec: 9010.8, 60 sec: 5803.7, 300 sec: 7011.8). Total num frames: 60919808. Throughput: 0: 1890.0. Samples: 10229524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:01:28,943][04005] Avg episode reward: [(0, '51.289')] [2024-07-05 17:01:29,267][04594] Updated weights for policy 0, policy_version 14874 (0.0017) [2024-07-05 17:01:33,807][04594] Updated weights for policy 0, policy_version 14884 (0.0016) [2024-07-05 17:01:33,942][04005] Fps is (10 sec: 9011.1, 60 sec: 6486.4, 300 sec: 7011.8). Total num frames: 60964864. Throughput: 0: 2036.2. Samples: 10236430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:01:33,943][04005] Avg episode reward: [(0, '50.903')] [2024-07-05 17:01:38,318][04594] Updated weights for policy 0, policy_version 14894 (0.0014) [2024-07-05 17:01:38,942][04005] Fps is (10 sec: 9011.5, 60 sec: 7237.2, 300 sec: 6997.9). Total num frames: 61009920. Throughput: 0: 2269.2. Samples: 10249936. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:01:38,943][04005] Avg episode reward: [(0, '50.222')] [2024-07-05 17:01:42,857][04594] Updated weights for policy 0, policy_version 14904 (0.0016) [2024-07-05 17:01:43,942][04005] Fps is (10 sec: 9011.3, 60 sec: 7919.5, 300 sec: 6997.9). Total num frames: 61054976. Throughput: 0: 2270.1. Samples: 10263388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:01:43,943][04005] Avg episode reward: [(0, '51.565')] [2024-07-05 17:01:47,381][04594] Updated weights for policy 0, policy_version 14914 (0.0015) [2024-07-05 17:01:48,942][04005] Fps is (10 sec: 9011.2, 60 sec: 8671.1, 300 sec: 6997.9). Total num frames: 61100032. Throughput: 0: 2272.8. Samples: 10270284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:01:48,943][04005] Avg episode reward: [(0, '51.461')] [2024-07-05 17:01:51,892][04594] Updated weights for policy 0, policy_version 14924 (0.0015) [2024-07-05 17:01:53,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 6997.9). Total num frames: 61145088. Throughput: 0: 2269.6. Samples: 10283832. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:01:53,943][04005] Avg episode reward: [(0, '52.532')] [2024-07-05 17:01:56,426][04594] Updated weights for policy 0, policy_version 14934 (0.0015) [2024-07-05 17:01:58,942][04005] Fps is (10 sec: 9010.9, 60 sec: 9079.4, 300 sec: 6997.9). Total num frames: 61190144. Throughput: 0: 2262.1. Samples: 10297264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:01:58,943][04005] Avg episode reward: [(0, '51.852')] [2024-07-05 17:02:00,959][04594] Updated weights for policy 0, policy_version 14944 (0.0015) [2024-07-05 17:02:03,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 6997.9). Total num frames: 61235200. Throughput: 0: 2268.9. Samples: 10304180. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:03,942][04005] Avg episode reward: [(0, '50.405')] [2024-07-05 17:02:05,477][04594] Updated weights for policy 0, policy_version 14954 (0.0015) [2024-07-05 17:02:08,941][04005] Fps is (10 sec: 9011.6, 60 sec: 9079.5, 300 sec: 6997.9). Total num frames: 61280256. Throughput: 0: 2260.1. Samples: 10317690. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:08,942][04005] Avg episode reward: [(0, '51.241')] [2024-07-05 17:02:10,022][04594] Updated weights for policy 0, policy_version 14964 (0.0017) [2024-07-05 17:02:13,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9079.4, 300 sec: 6997.9). Total num frames: 61325312. Throughput: 0: 2262.0. Samples: 10331314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:13,943][04005] Avg episode reward: [(0, '52.037')] [2024-07-05 17:02:14,533][04594] Updated weights for policy 0, policy_version 14974 (0.0016) [2024-07-05 17:02:18,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9011.2, 300 sec: 6997.9). Total num frames: 61370368. Throughput: 0: 2260.0. Samples: 10338128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:18,943][04005] Avg episode reward: [(0, '53.524')] [2024-07-05 17:02:19,064][04594] Updated weights for policy 0, policy_version 14984 (0.0017) [2024-07-05 17:02:23,596][04594] Updated weights for policy 0, policy_version 14994 (0.0017) [2024-07-05 17:02:23,942][04005] Fps is (10 sec: 9011.3, 60 sec: 9011.2, 300 sec: 6997.9). Total num frames: 61415424. Throughput: 0: 2259.0. Samples: 10351592. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:23,943][04005] Avg episode reward: [(0, '54.633')] [2024-07-05 17:02:24,049][04581] Saving new best policy, reward=54.633! [2024-07-05 17:02:28,091][04594] Updated weights for policy 0, policy_version 15004 (0.0017) [2024-07-05 17:02:28,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9011.3, 300 sec: 6997.9). Total num frames: 61460480. Throughput: 0: 2267.4. 
Samples: 10365422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:28,942][04005] Avg episode reward: [(0, '54.707')] [2024-07-05 17:02:28,995][04581] Saving new best policy, reward=54.707! [2024-07-05 17:02:32,612][04594] Updated weights for policy 0, policy_version 15014 (0.0016) [2024-07-05 17:02:33,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 6997.9). Total num frames: 61505536. Throughput: 0: 2261.2. Samples: 10372040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:33,943][04005] Avg episode reward: [(0, '55.424')] [2024-07-05 17:02:33,971][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015017_61509632.pth... [2024-07-05 17:02:34,060][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000014634_59940864.pth [2024-07-05 17:02:34,071][04581] Saving new best policy, reward=55.424! [2024-07-05 17:02:37,140][04594] Updated weights for policy 0, policy_version 15024 (0.0016) [2024-07-05 17:02:38,942][04005] Fps is (10 sec: 9011.0, 60 sec: 9011.2, 300 sec: 6997.9). Total num frames: 61550592. Throughput: 0: 2264.9. Samples: 10385752. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:38,943][04005] Avg episode reward: [(0, '54.501')] [2024-07-05 17:02:41,716][04594] Updated weights for policy 0, policy_version 15034 (0.0018) [2024-07-05 17:02:43,942][04005] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 6997.9). Total num frames: 61595648. Throughput: 0: 2266.0. Samples: 10399234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:43,943][04005] Avg episode reward: [(0, '54.323')] [2024-07-05 17:02:46,273][04594] Updated weights for policy 0, policy_version 15044 (0.0015) [2024-07-05 17:02:48,941][04005] Fps is (10 sec: 9011.4, 60 sec: 9011.2, 300 sec: 6997.9). Total num frames: 61640704. Throughput: 0: 2258.2. Samples: 10405800. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:48,943][04005] Avg episode reward: [(0, '54.372')] [2024-07-05 17:02:50,788][04594] Updated weights for policy 0, policy_version 15054 (0.0015) [2024-07-05 17:02:53,942][04005] Fps is (10 sec: 9011.0, 60 sec: 9011.2, 300 sec: 6997.9). Total num frames: 61685760. Throughput: 0: 2263.0. Samples: 10419526. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:53,943][04005] Avg episode reward: [(0, '52.484')] [2024-07-05 17:02:55,327][04594] Updated weights for policy 0, policy_version 15064 (0.0016) [2024-07-05 17:02:58,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9011.3, 300 sec: 6997.9). Total num frames: 61730816. Throughput: 0: 2261.2. Samples: 10433068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:02:58,943][04005] Avg episode reward: [(0, '50.779')] [2024-07-05 17:02:59,866][04594] Updated weights for policy 0, policy_version 15074 (0.0017) [2024-07-05 17:03:03,941][04005] Fps is (10 sec: 9421.1, 60 sec: 9079.5, 300 sec: 7011.8). Total num frames: 61779968. Throughput: 0: 2256.0. Samples: 10439648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:03,942][04005] Avg episode reward: [(0, '50.438')] [2024-07-05 17:03:04,403][04594] Updated weights for policy 0, policy_version 15084 (0.0017) [2024-07-05 17:03:08,941][04005] Fps is (10 sec: 9420.9, 60 sec: 9079.5, 300 sec: 7011.8). Total num frames: 61825024. Throughput: 0: 2261.6. Samples: 10453366. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:08,942][04005] Avg episode reward: [(0, '50.030')] [2024-07-05 17:03:08,945][04594] Updated weights for policy 0, policy_version 15094 (0.0015) [2024-07-05 17:03:13,499][04594] Updated weights for policy 0, policy_version 15104 (0.0020) [2024-07-05 17:03:13,942][04005] Fps is (10 sec: 8601.5, 60 sec: 9011.2, 300 sec: 6997.9). Total num frames: 61865984. Throughput: 0: 2254.1. Samples: 10466858. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:13,943][04005] Avg episode reward: [(0, '49.809')] [2024-07-05 17:03:18,035][04594] Updated weights for policy 0, policy_version 15114 (0.0016) [2024-07-05 17:03:18,941][04005] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 7011.8). Total num frames: 61915136. Throughput: 0: 2252.9. Samples: 10473420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:18,943][04005] Avg episode reward: [(0, '51.183')] [2024-07-05 17:03:22,564][04594] Updated weights for policy 0, policy_version 15124 (0.0018) [2024-07-05 17:03:23,942][04005] Fps is (10 sec: 9420.8, 60 sec: 9079.5, 300 sec: 7011.8). Total num frames: 61960192. Throughput: 0: 2255.2. Samples: 10487234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:23,942][04005] Avg episode reward: [(0, '49.288')] [2024-07-05 17:03:27,099][04594] Updated weights for policy 0, policy_version 15134 (0.0016) [2024-07-05 17:03:28,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 7011.8). Total num frames: 62005248. Throughput: 0: 2254.8. Samples: 10500698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:28,942][04005] Avg episode reward: [(0, '49.154')] [2024-07-05 17:03:31,635][04594] Updated weights for policy 0, policy_version 15144 (0.0015) [2024-07-05 17:03:33,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9079.5, 300 sec: 6997.9). Total num frames: 62050304. Throughput: 0: 2256.8. Samples: 10507358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:33,942][04005] Avg episode reward: [(0, '50.538')] [2024-07-05 17:03:36,241][04594] Updated weights for policy 0, policy_version 15154 (0.0013) [2024-07-05 17:03:38,941][04005] Fps is (10 sec: 8601.6, 60 sec: 9011.2, 300 sec: 6984.0). Total num frames: 62091264. Throughput: 0: 2245.3. Samples: 10520562. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:38,943][04005] Avg episode reward: [(0, '52.238')] [2024-07-05 17:03:40,998][04594] Updated weights for policy 0, policy_version 15164 (0.0017) [2024-07-05 17:03:43,950][04005] Fps is (10 sec: 7776.3, 60 sec: 8873.5, 300 sec: 6956.1). Total num frames: 62128128. Throughput: 0: 2194.1. Samples: 10531820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:43,973][04005] Avg episode reward: [(0, '51.602')] [2024-07-05 17:03:48,952][04005] Fps is (10 sec: 4092.1, 60 sec: 8190.7, 300 sec: 6817.3). Total num frames: 62132224. Throughput: 0: 2056.0. Samples: 10532186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:48,983][04005] Avg episode reward: [(0, '51.924')] [2024-07-05 17:03:53,953][04005] Fps is (10 sec: 409.5, 60 sec: 7439.8, 300 sec: 6664.4). Total num frames: 62132224. Throughput: 0: 1767.8. Samples: 10532936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:03:53,994][04005] Avg episode reward: [(0, '51.924')] [2024-07-05 17:03:58,950][04005] Fps is (10 sec: 409.7, 60 sec: 6757.6, 300 sec: 6525.7). Total num frames: 62136320. Throughput: 0: 1483.4. Samples: 10533622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:03:58,984][04005] Avg episode reward: [(0, '51.954')] [2024-07-05 17:04:03,947][04005] Fps is (10 sec: 409.8, 60 sec: 5938.7, 300 sec: 6373.0). Total num frames: 62136320. Throughput: 0: 1346.2. Samples: 10534004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:04:03,990][04005] Avg episode reward: [(0, '52.266')] [2024-07-05 17:04:08,950][04005] Fps is (10 sec: 409.6, 60 sec: 5255.9, 300 sec: 6234.1). Total num frames: 62140416. Throughput: 0: 1055.6. Samples: 10534746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:04:08,985][04005] Avg episode reward: [(0, '52.632')] [2024-07-05 17:04:13,952][04005] Fps is (10 sec: 409.4, 60 sec: 4573.1, 300 sec: 6081.3). Total num frames: 62140416. Throughput: 0: 772.9. 
Samples: 10535484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:04:13,994][04005] Avg episode reward: [(0, '52.417')] [2024-07-05 17:04:18,952][04005] Fps is (10 sec: 409.5, 60 sec: 3822.3, 300 sec: 5942.5). Total num frames: 62144512. Throughput: 0: 632.7. Samples: 10535834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 17:04:19,027][04005] Avg episode reward: [(0, '52.659')] [2024-07-05 17:04:23,949][04005] Fps is (10 sec: 819.5, 60 sec: 3140.0, 300 sec: 5803.7). Total num frames: 62148608. Throughput: 0: 354.5. Samples: 10536518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 17:04:23,977][04005] Avg episode reward: [(0, '52.957')] [2024-07-05 17:04:28,955][04005] Fps is (10 sec: 409.5, 60 sec: 2388.9, 300 sec: 5650.9). Total num frames: 62148608. Throughput: 0: 119.8. Samples: 10537210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 17:04:28,983][04005] Avg episode reward: [(0, '52.957')] [2024-07-05 17:04:32,037][04594] Updated weights for policy 0, policy_version 15174 (0.0440) [2024-07-05 17:04:33,958][04005] Fps is (10 sec: 409.3, 60 sec: 1706.3, 300 sec: 5512.0). Total num frames: 62152704. Throughput: 0: 119.9. Samples: 10537582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:04:34,012][04005] Avg episode reward: [(0, '52.632')] [2024-07-05 17:04:34,402][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015174_62152704.pth... [2024-07-05 17:04:38,182][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000014789_60575744.pth [2024-07-05 17:04:38,950][04005] Fps is (10 sec: 409.8, 60 sec: 1023.9, 300 sec: 5428.9). Total num frames: 62152704. Throughput: 0: 115.1. Samples: 10538114. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:04:38,994][04005] Avg episode reward: [(0, '52.652')] [2024-07-05 17:04:43,957][04005] Fps is (10 sec: 409.7, 60 sec: 477.8, 300 sec: 5442.7). Total num frames: 62156800. Throughput: 0: 105.7. Samples: 10538378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:04:44,035][04005] Avg episode reward: [(0, '52.652')] [2024-07-05 17:04:48,955][04005] Fps is (10 sec: 409.4, 60 sec: 409.6, 300 sec: 5428.8). Total num frames: 62156800. Throughput: 0: 111.1. Samples: 10539004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:04:48,989][04005] Avg episode reward: [(0, '52.682')] [2024-07-05 17:04:53,948][04005] Fps is (10 sec: 409.8, 60 sec: 477.9, 300 sec: 5442.8). Total num frames: 62160896. Throughput: 0: 109.3. Samples: 10539666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:04:53,976][04005] Avg episode reward: [(0, '53.149')] [2024-07-05 17:04:58,942][04005] Fps is (10 sec: 2460.4, 60 sec: 751.0, 300 sec: 5498.4). Total num frames: 62181376. Throughput: 0: 205.0. Samples: 10544706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:04:58,943][04005] Avg episode reward: [(0, '53.534')] [2024-07-05 17:05:00,308][04594] Updated weights for policy 0, policy_version 15184 (0.0121) [2024-07-05 17:05:03,941][04005] Fps is (10 sec: 6147.9, 60 sec: 1433.7, 300 sec: 5623.4). Total num frames: 62222336. Throughput: 0: 341.3. Samples: 10551188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:05:03,942][04005] Avg episode reward: [(0, '53.650')] [2024-07-05 17:05:04,929][04594] Updated weights for policy 0, policy_version 15194 (0.0014) [2024-07-05 17:05:08,942][04005] Fps is (10 sec: 8601.6, 60 sec: 2116.5, 300 sec: 5776.1). Total num frames: 62267392. Throughput: 0: 626.8. Samples: 10564720. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:08,942][04005] Avg episode reward: [(0, '52.590')] [2024-07-05 17:05:09,434][04594] Updated weights for policy 0, policy_version 15204 (0.0015) [2024-07-05 17:05:13,942][04005] Fps is (10 sec: 9011.1, 60 sec: 2867.7, 300 sec: 5915.0). Total num frames: 62312448. Throughput: 0: 916.9. Samples: 10578458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:13,943][04005] Avg episode reward: [(0, '52.639')] [2024-07-05 17:05:13,958][04594] Updated weights for policy 0, policy_version 15214 (0.0016) [2024-07-05 17:05:18,472][04594] Updated weights for policy 0, policy_version 15224 (0.0017) [2024-07-05 17:05:18,941][04005] Fps is (10 sec: 9420.9, 60 sec: 3618.7, 300 sec: 6081.7). Total num frames: 62361600. Throughput: 0: 1055.6. Samples: 10585068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:18,942][04005] Avg episode reward: [(0, '49.682')] [2024-07-05 17:05:22,982][04594] Updated weights for policy 0, policy_version 15234 (0.0018) [2024-07-05 17:05:23,941][04005] Fps is (10 sec: 9420.9, 60 sec: 4301.2, 300 sec: 6220.6). Total num frames: 62406656. Throughput: 0: 1350.7. Samples: 10598888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:23,942][04005] Avg episode reward: [(0, '49.289')] [2024-07-05 17:05:27,514][04594] Updated weights for policy 0, policy_version 15244 (0.0015) [2024-07-05 17:05:28,942][04005] Fps is (10 sec: 9010.8, 60 sec: 5052.7, 300 sec: 6359.4). Total num frames: 62451712. Throughput: 0: 1645.2. Samples: 10612396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:28,943][04005] Avg episode reward: [(0, '49.613')] [2024-07-05 17:05:32,020][04594] Updated weights for policy 0, policy_version 15254 (0.0016) [2024-07-05 17:05:33,942][04005] Fps is (10 sec: 9011.1, 60 sec: 5735.7, 300 sec: 6512.1). Total num frames: 62496768. Throughput: 0: 1785.6. Samples: 10619334. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:33,943][04005] Avg episode reward: [(0, '49.252')] [2024-07-05 17:05:36,554][04594] Updated weights for policy 0, policy_version 15264 (0.0017) [2024-07-05 17:05:38,949][04005] Fps is (10 sec: 9005.5, 60 sec: 6485.4, 300 sec: 6650.7). Total num frames: 62541824. Throughput: 0: 2069.2. Samples: 10632780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:38,955][04005] Avg episode reward: [(0, '51.048')] [2024-07-05 17:05:41,135][04594] Updated weights for policy 0, policy_version 15274 (0.0019) [2024-07-05 17:05:43,942][04005] Fps is (10 sec: 9011.1, 60 sec: 7169.3, 300 sec: 6803.7). Total num frames: 62586880. Throughput: 0: 2255.5. Samples: 10646204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:43,943][04005] Avg episode reward: [(0, '51.797')] [2024-07-05 17:05:45,724][04594] Updated weights for policy 0, policy_version 15284 (0.0015) [2024-07-05 17:05:48,942][04005] Fps is (10 sec: 8607.4, 60 sec: 7852.1, 300 sec: 6872.9). Total num frames: 62627840. Throughput: 0: 2256.6. Samples: 10652736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:48,943][04005] Avg episode reward: [(0, '52.530')] [2024-07-05 17:05:50,356][04594] Updated weights for policy 0, policy_version 15294 (0.0016) [2024-07-05 17:05:53,941][04005] Fps is (10 sec: 8601.8, 60 sec: 8534.2, 300 sec: 6873.0). Total num frames: 62672896. Throughput: 0: 2250.4. Samples: 10665986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:53,943][04005] Avg episode reward: [(0, '53.352')] [2024-07-05 17:05:54,987][04594] Updated weights for policy 0, policy_version 15304 (0.0015) [2024-07-05 17:05:58,942][04005] Fps is (10 sec: 9011.2, 60 sec: 8942.9, 300 sec: 6873.0). Total num frames: 62717952. Throughput: 0: 2240.6. Samples: 10679284. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:05:58,943][04005] Avg episode reward: [(0, '52.311')] [2024-07-05 17:05:59,606][04594] Updated weights for policy 0, policy_version 15314 (0.0015) [2024-07-05 17:06:03,942][04005] Fps is (10 sec: 9011.1, 60 sec: 9011.2, 300 sec: 6872.9). Total num frames: 62763008. Throughput: 0: 2244.1. Samples: 10686052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:03,943][04005] Avg episode reward: [(0, '51.519')] [2024-07-05 17:06:04,231][04594] Updated weights for policy 0, policy_version 15324 (0.0016) [2024-07-05 17:06:08,863][04594] Updated weights for policy 0, policy_version 15334 (0.0016) [2024-07-05 17:06:08,941][04005] Fps is (10 sec: 9011.3, 60 sec: 9011.2, 300 sec: 6873.0). Total num frames: 62808064. Throughput: 0: 2232.2. Samples: 10699338. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:08,942][04005] Avg episode reward: [(0, '51.731')] [2024-07-05 17:06:13,509][04594] Updated weights for policy 0, policy_version 15344 (0.0016) [2024-07-05 17:06:13,942][04005] Fps is (10 sec: 8601.6, 60 sec: 8942.9, 300 sec: 6845.2). Total num frames: 62849024. Throughput: 0: 2226.7. Samples: 10712598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:13,942][04005] Avg episode reward: [(0, '51.814')] [2024-07-05 17:06:18,165][04594] Updated weights for policy 0, policy_version 15354 (0.0016) [2024-07-05 17:06:18,941][04005] Fps is (10 sec: 8601.6, 60 sec: 8874.7, 300 sec: 6845.2). Total num frames: 62894080. Throughput: 0: 2216.4. Samples: 10719070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:18,942][04005] Avg episode reward: [(0, '51.643')] [2024-07-05 17:06:22,791][04594] Updated weights for policy 0, policy_version 15364 (0.0016) [2024-07-05 17:06:23,941][04005] Fps is (10 sec: 9011.3, 60 sec: 8874.7, 300 sec: 6845.2). Total num frames: 62939136. Throughput: 0: 2212.2. Samples: 10732312. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:23,942][04005] Avg episode reward: [(0, '51.339')] [2024-07-05 17:06:27,482][04594] Updated weights for policy 0, policy_version 15374 (0.0016) [2024-07-05 17:06:28,941][04005] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 6845.2). Total num frames: 62984192. Throughput: 0: 2206.7. Samples: 10745504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:28,942][04005] Avg episode reward: [(0, '50.436')] [2024-07-05 17:06:32,146][04594] Updated weights for policy 0, policy_version 15384 (0.0016) [2024-07-05 17:06:33,941][04005] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 6831.3). Total num frames: 63025152. Throughput: 0: 2205.1. Samples: 10751964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:33,942][04005] Avg episode reward: [(0, '48.863')] [2024-07-05 17:06:34,000][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015388_63029248.pth... [2024-07-05 17:06:34,090][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015017_61509632.pth [2024-07-05 17:06:36,795][04594] Updated weights for policy 0, policy_version 15394 (0.0016) [2024-07-05 17:06:38,941][04005] Fps is (10 sec: 8601.7, 60 sec: 8807.4, 300 sec: 6831.3). Total num frames: 63070208. Throughput: 0: 2206.0. Samples: 10765258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:38,942][04005] Avg episode reward: [(0, '49.870')] [2024-07-05 17:06:41,421][04594] Updated weights for policy 0, policy_version 15404 (0.0016) [2024-07-05 17:06:43,942][04005] Fps is (10 sec: 9011.1, 60 sec: 8806.4, 300 sec: 6831.3). Total num frames: 63115264. Throughput: 0: 2206.7. Samples: 10778586. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:43,943][04005] Avg episode reward: [(0, '50.293')] [2024-07-05 17:06:46,032][04594] Updated weights for policy 0, policy_version 15414 (0.0016) [2024-07-05 17:06:48,942][04005] Fps is (10 sec: 9010.8, 60 sec: 8874.6, 300 sec: 6831.3). Total num frames: 63160320. Throughput: 0: 2206.3. Samples: 10785334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:48,943][04005] Avg episode reward: [(0, '51.186')] [2024-07-05 17:06:50,663][04594] Updated weights for policy 0, policy_version 15424 (0.0015) [2024-07-05 17:06:53,941][04005] Fps is (10 sec: 9011.3, 60 sec: 8874.7, 300 sec: 6831.3). Total num frames: 63205376. Throughput: 0: 2206.9. Samples: 10798650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:53,943][04005] Avg episode reward: [(0, '51.154')] [2024-07-05 17:06:55,292][04594] Updated weights for policy 0, policy_version 15434 (0.0014) [2024-07-05 17:06:58,941][04005] Fps is (10 sec: 8601.9, 60 sec: 8806.4, 300 sec: 6817.4). Total num frames: 63246336. Throughput: 0: 2202.5. Samples: 10811710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:06:58,942][04005] Avg episode reward: [(0, '51.685')] [2024-07-05 17:07:00,032][04594] Updated weights for policy 0, policy_version 15444 (0.0015) [2024-07-05 17:07:03,942][04005] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 6817.4). Total num frames: 63291392. Throughput: 0: 2201.9. Samples: 10818154. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:07:03,943][04005] Avg episode reward: [(0, '51.077')] [2024-07-05 17:07:04,795][04594] Updated weights for policy 0, policy_version 15454 (0.0016) [2024-07-05 17:07:08,953][04005] Fps is (10 sec: 5728.9, 60 sec: 8259.0, 300 sec: 6706.1). Total num frames: 63303680. Throughput: 0: 2068.9. Samples: 10825432. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:07:09,010][04005] Avg episode reward: [(0, '51.513')] [2024-07-05 17:07:13,956][04005] Fps is (10 sec: 1227.6, 60 sec: 7576.4, 300 sec: 6553.4). Total num frames: 63303680. Throughput: 0: 1780.8. Samples: 10825658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:07:13,996][04005] Avg episode reward: [(0, '51.513')] [2024-07-05 17:07:18,984][04005] Fps is (10 sec: 408.5, 60 sec: 6890.7, 300 sec: 6413.9). Total num frames: 63307776. Throughput: 0: 1648.3. Samples: 10826200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:07:19,052][04005] Avg episode reward: [(0, '51.613')] [2024-07-05 17:07:23,952][04005] Fps is (10 sec: 409.7, 60 sec: 6143.1, 300 sec: 6261.8). Total num frames: 63307776. Throughput: 0: 1365.7. Samples: 10826724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:07:24,009][04005] Avg episode reward: [(0, '51.613')] [2024-07-05 17:07:28,949][04005] Fps is (10 sec: 0.0, 60 sec: 5392.5, 300 sec: 6109.2). Total num frames: 63307776. Throughput: 0: 1080.1. Samples: 10827198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:07:29,009][04005] Avg episode reward: [(0, '51.613')] [2024-07-05 17:07:33,950][04005] Fps is (10 sec: 409.7, 60 sec: 4778.1, 300 sec: 5970.3). Total num frames: 63311872. Throughput: 0: 935.2. Samples: 10827424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:07:34,025][04005] Avg episode reward: [(0, '51.859')] [2024-07-05 17:07:38,952][04005] Fps is (10 sec: 409.5, 60 sec: 4027.2, 300 sec: 5817.5). Total num frames: 63311872. Throughput: 0: 651.2. Samples: 10827960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-05 17:07:38,989][04005] Avg episode reward: [(0, '51.859')] [2024-07-05 17:07:43,949][04005] Fps is (10 sec: 409.6, 60 sec: 3344.7, 300 sec: 5678.7). Total num frames: 63315968. Throughput: 0: 372.0. Samples: 10828450. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:07:44,001][04005] Avg episode reward: [(0, '51.859')] [2024-07-05 17:07:48,952][04005] Fps is (10 sec: 409.6, 60 sec: 2593.8, 300 sec: 5526.0). Total num frames: 63315968. Throughput: 0: 234.0. Samples: 10828688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:07:49,044][04005] Avg episode reward: [(0, '51.859')] [2024-07-05 17:07:53,968][04005] Fps is (10 sec: 0.0, 60 sec: 1842.9, 300 sec: 5373.2). Total num frames: 63315968. Throughput: 0: 83.6. Samples: 10829192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-05 17:07:54,054][04005] Avg episode reward: [(0, '51.787')] [2024-07-05 17:07:58,950][04005] Fps is (10 sec: 409.7, 60 sec: 1228.7, 300 sec: 5220.5). Total num frames: 63320064. Throughput: 0: 89.7. Samples: 10829692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 17:07:59,011][04005] Avg episode reward: [(0, '51.875')] [2024-07-05 17:08:03,960][04005] Fps is (10 sec: 409.5, 60 sec: 477.8, 300 sec: 5067.7). Total num frames: 63320064. Throughput: 0: 83.4. Samples: 10829952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 17:08:04,016][04005] Avg episode reward: [(0, '52.006')] [2024-07-05 17:08:08,962][04005] Fps is (10 sec: 409.1, 60 sec: 341.3, 300 sec: 4942.7). Total num frames: 63324160. Throughput: 0: 83.1. Samples: 10830466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 17:08:09,027][04005] Avg episode reward: [(0, '52.006')] [2024-07-05 17:08:13,952][04005] Fps is (10 sec: 409.7, 60 sec: 341.3, 300 sec: 4776.2). Total num frames: 63324160. Throughput: 0: 82.9. Samples: 10830928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 17:08:14,002][04005] Avg episode reward: [(0, '52.036')] [2024-07-05 17:08:18,953][04005] Fps is (10 sec: 0.0, 60 sec: 273.2, 300 sec: 4623.5). Total num frames: 63324160. Throughput: 0: 83.9. Samples: 10831198. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-05 17:08:19,003][04005] Avg episode reward: [(0, '52.036')] [2024-07-05 17:08:23,949][04005] Fps is (10 sec: 409.7, 60 sec: 341.3, 300 sec: 4484.7). Total num frames: 63328256. Throughput: 0: 82.3. Samples: 10831662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:08:23,993][04005] Avg episode reward: [(0, '52.036')] [2024-07-05 17:08:28,015][04594] Updated weights for policy 0, policy_version 15464 (0.0447) [2024-07-05 17:08:28,942][04005] Fps is (10 sec: 2049.6, 60 sec: 614.5, 300 sec: 4387.6). Total num frames: 63344640. Throughput: 0: 180.0. Samples: 10836548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:08:28,944][04005] Avg episode reward: [(0, '52.471')] [2024-07-05 17:08:32,765][04594] Updated weights for policy 0, policy_version 15474 (0.0016) [2024-07-05 17:08:33,942][04005] Fps is (10 sec: 6147.8, 60 sec: 1297.2, 300 sec: 4401.5). Total num frames: 63389696. Throughput: 0: 316.8. Samples: 10842940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:08:33,944][04005] Avg episode reward: [(0, '52.987')] [2024-07-05 17:08:34,167][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015477_63393792.pth... [2024-07-05 17:08:34,285][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015174_62152704.pth [2024-07-05 17:08:37,433][04594] Updated weights for policy 0, policy_version 15484 (0.0013) [2024-07-05 17:08:38,942][04005] Fps is (10 sec: 9011.9, 60 sec: 2048.3, 300 sec: 4429.4). Total num frames: 63434752. Throughput: 0: 597.7. Samples: 10856084. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:08:38,942][04005] Avg episode reward: [(0, '53.003')] [2024-07-05 17:08:42,054][04594] Updated weights for policy 0, policy_version 15494 (0.0015) [2024-07-05 17:08:43,941][04005] Fps is (10 sec: 9011.4, 60 sec: 2730.9, 300 sec: 4568.2). Total num frames: 63479808. Throughput: 0: 883.1. Samples: 10869426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:08:43,942][04005] Avg episode reward: [(0, '52.538')] [2024-07-05 17:08:46,674][04594] Updated weights for policy 0, policy_version 15504 (0.0015) [2024-07-05 17:08:48,942][04005] Fps is (10 sec: 8601.6, 60 sec: 3413.8, 300 sec: 4707.1). Total num frames: 63520768. Throughput: 0: 1021.8. Samples: 10875922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:08:48,943][04005] Avg episode reward: [(0, '52.207')] [2024-07-05 17:08:51,276][04594] Updated weights for policy 0, policy_version 15514 (0.0015) [2024-07-05 17:08:53,942][04005] Fps is (10 sec: 8601.4, 60 sec: 4164.8, 300 sec: 4845.9). Total num frames: 63565824. Throughput: 0: 1308.1. Samples: 10889304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:08:53,943][04005] Avg episode reward: [(0, '51.142')] [2024-07-05 17:08:55,874][04594] Updated weights for policy 0, policy_version 15524 (0.0015) [2024-07-05 17:08:58,942][04005] Fps is (10 sec: 9010.9, 60 sec: 4847.5, 300 sec: 4998.6). Total num frames: 63610880. Throughput: 0: 1597.4. Samples: 10902800. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:08:58,943][04005] Avg episode reward: [(0, '51.608')] [2024-07-05 17:09:00,464][04594] Updated weights for policy 0, policy_version 15534 (0.0016) [2024-07-05 17:09:03,941][04005] Fps is (10 sec: 9011.3, 60 sec: 5598.9, 300 sec: 5137.5). Total num frames: 63655936. Throughput: 0: 1738.8. Samples: 10909428. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:09:03,943][04005] Avg episode reward: [(0, '52.604')] [2024-07-05 17:09:05,079][04594] Updated weights for policy 0, policy_version 15544 (0.0016) [2024-07-05 17:09:08,942][04005] Fps is (10 sec: 9011.5, 60 sec: 6282.5, 300 sec: 5290.3). Total num frames: 63700992. Throughput: 0: 2024.9. Samples: 10922768. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:09:08,942][04005] Avg episode reward: [(0, '52.189')] [2024-07-05 17:09:09,716][04594] Updated weights for policy 0, policy_version 15554 (0.0017) [2024-07-05 17:09:13,942][04005] Fps is (10 sec: 9011.0, 60 sec: 7032.4, 300 sec: 5429.1). Total num frames: 63746048. Throughput: 0: 2208.9. Samples: 10935948. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:09:13,943][04005] Avg episode reward: [(0, '53.022')] [2024-07-05 17:09:14,371][04594] Updated weights for policy 0, policy_version 15564 (0.0017) [2024-07-05 17:09:18,942][04005] Fps is (10 sec: 8601.6, 60 sec: 7715.3, 300 sec: 5554.0). Total num frames: 63787008. Throughput: 0: 2211.0. Samples: 10942434. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:18,943][04005] Avg episode reward: [(0, '52.312')] [2024-07-05 17:09:19,025][04594] Updated weights for policy 0, policy_version 15574 (0.0016) [2024-07-05 17:09:23,649][04594] Updated weights for policy 0, policy_version 15584 (0.0018) [2024-07-05 17:09:23,941][04005] Fps is (10 sec: 8601.8, 60 sec: 8397.7, 300 sec: 5706.9). Total num frames: 63832064. Throughput: 0: 2213.7. Samples: 10955700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:23,942][04005] Avg episode reward: [(0, '51.219')] [2024-07-05 17:09:28,278][04594] Updated weights for policy 0, policy_version 15594 (0.0017) [2024-07-05 17:09:28,942][04005] Fps is (10 sec: 9011.2, 60 sec: 8874.8, 300 sec: 5845.8). Total num frames: 63877120. Throughput: 0: 2212.8. Samples: 10969004. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:28,943][04005] Avg episode reward: [(0, '51.541')] [2024-07-05 17:09:32,918][04594] Updated weights for policy 0, policy_version 15604 (0.0015) [2024-07-05 17:09:33,943][04005] Fps is (10 sec: 9009.9, 60 sec: 8874.5, 300 sec: 5998.3). Total num frames: 63922176. Throughput: 0: 2219.0. Samples: 10975780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:33,956][04005] Avg episode reward: [(0, '52.768')] [2024-07-05 17:09:37,544][04594] Updated weights for policy 0, policy_version 15614 (0.0017) [2024-07-05 17:09:38,941][04005] Fps is (10 sec: 9011.3, 60 sec: 8874.7, 300 sec: 6137.3). Total num frames: 63967232. Throughput: 0: 2216.7. Samples: 10989056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:38,942][04005] Avg episode reward: [(0, '51.510')] [2024-07-05 17:09:42,156][04594] Updated weights for policy 0, policy_version 15624 (0.0016) [2024-07-05 17:09:43,942][04005] Fps is (10 sec: 8602.7, 60 sec: 8806.4, 300 sec: 6276.1). Total num frames: 64008192. Throughput: 0: 2213.2. Samples: 11002394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:43,943][04005] Avg episode reward: [(0, '49.879')] [2024-07-05 17:09:46,814][04594] Updated weights for policy 0, policy_version 15634 (0.0015) [2024-07-05 17:09:48,942][04005] Fps is (10 sec: 8601.5, 60 sec: 8874.7, 300 sec: 6414.9). Total num frames: 64053248. Throughput: 0: 2209.0. Samples: 11008832. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:48,943][04005] Avg episode reward: [(0, '49.068')] [2024-07-05 17:09:51,420][04594] Updated weights for policy 0, policy_version 15644 (0.0015) [2024-07-05 17:09:53,941][04005] Fps is (10 sec: 9011.3, 60 sec: 8874.7, 300 sec: 6498.1). Total num frames: 64098304. Throughput: 0: 2207.6. Samples: 11022112. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:53,942][04005] Avg episode reward: [(0, '49.103')] [2024-07-05 17:09:56,058][04594] Updated weights for policy 0, policy_version 15654 (0.0015) [2024-07-05 17:09:58,941][04005] Fps is (10 sec: 9011.3, 60 sec: 8874.7, 300 sec: 6511.9). Total num frames: 64143360. Throughput: 0: 2209.9. Samples: 11035392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:09:58,942][04005] Avg episode reward: [(0, '50.379')] [2024-07-05 17:10:00,679][04594] Updated weights for policy 0, policy_version 15664 (0.0015) [2024-07-05 17:10:03,941][04005] Fps is (10 sec: 9011.2, 60 sec: 8874.7, 300 sec: 6511.9). Total num frames: 64188416. Throughput: 0: 2214.8. Samples: 11042102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:10:03,942][04005] Avg episode reward: [(0, '51.395')] [2024-07-05 17:10:05,299][04594] Updated weights for policy 0, policy_version 15674 (0.0015) [2024-07-05 17:10:08,942][04005] Fps is (10 sec: 8601.4, 60 sec: 8806.4, 300 sec: 6498.1). Total num frames: 64229376. Throughput: 0: 2212.8. Samples: 11055278. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:10:08,943][04005] Avg episode reward: [(0, '52.641')] [2024-07-05 17:10:09,965][04594] Updated weights for policy 0, policy_version 15684 (0.0015) [2024-07-05 17:10:13,942][04005] Fps is (10 sec: 8601.5, 60 sec: 8806.4, 300 sec: 6484.2). Total num frames: 64274432. Throughput: 0: 2209.0. Samples: 11068410. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:10:13,943][04005] Avg episode reward: [(0, '52.032')] [2024-07-05 17:10:14,629][04594] Updated weights for policy 0, policy_version 15694 (0.0014) [2024-07-05 17:10:18,941][04005] Fps is (10 sec: 9011.3, 60 sec: 8874.7, 300 sec: 6484.2). Total num frames: 64319488. Throughput: 0: 2207.7. Samples: 11075122. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:10:18,943][04005] Avg episode reward: [(0, '51.952')] [2024-07-05 17:10:19,366][04594] Updated weights for policy 0, policy_version 15704 (0.0016) [2024-07-05 17:10:23,947][04005] Fps is (10 sec: 8188.6, 60 sec: 8737.5, 300 sec: 6456.3). Total num frames: 64356352. Throughput: 0: 2193.7. Samples: 11087784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:10:23,977][04005] Avg episode reward: [(0, '51.673')] [2024-07-05 17:10:28,950][04005] Fps is (10 sec: 3683.9, 60 sec: 7986.3, 300 sec: 6303.5). Total num frames: 64356352. Throughput: 0: 1929.2. Samples: 11089222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:10:29,016][04005] Avg episode reward: [(0, '52.153')] [2024-07-05 17:10:33,954][04005] Fps is (10 sec: 409.4, 60 sec: 7303.5, 300 sec: 6164.8). Total num frames: 64360448. Throughput: 0: 1792.2. Samples: 11089498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:10:34,022][04005] Avg episode reward: [(0, '52.542')] [2024-07-05 17:10:34,423][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015713_64360448.pth... [2024-07-05 17:10:38,164][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015388_63029248.pth [2024-07-05 17:10:38,952][04005] Fps is (10 sec: 409.5, 60 sec: 6552.6, 300 sec: 6011.9). Total num frames: 64360448. Throughput: 0: 1508.8. Samples: 11090020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:10:39,000][04005] Avg episode reward: [(0, '52.592')] [2024-07-05 17:10:43,949][04005] Fps is (10 sec: 0.0, 60 sec: 5870.3, 300 sec: 5873.1). Total num frames: 64360448. Throughput: 0: 1225.7. Samples: 11090556. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:10:43,989][04005] Avg episode reward: [(0, '52.632')] [2024-07-05 17:10:44,140][04594] Updated weights for policy 0, policy_version 15714 (0.0197) [2024-07-05 17:10:48,948][04005] Fps is (10 sec: 409.7, 60 sec: 5187.8, 300 sec: 5734.3). Total num frames: 64364544. Throughput: 0: 1082.6. Samples: 11090824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:10:48,970][04005] Avg episode reward: [(0, '52.702')] [2024-07-05 17:10:53,952][04005] Fps is (10 sec: 409.5, 60 sec: 4436.7, 300 sec: 5581.5). Total num frames: 64364544. Throughput: 0: 802.2. Samples: 11091382. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:10:54,031][04005] Avg episode reward: [(0, '52.702')] [2024-07-05 17:10:58,950][04005] Fps is (10 sec: 409.6, 60 sec: 3754.2, 300 sec: 5442.7). Total num frames: 64368640. Throughput: 0: 522.1. Samples: 11091908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:10:59,002][04005] Avg episode reward: [(0, '52.702')] [2024-07-05 17:11:03,941][04005] Fps is (10 sec: 3279.7, 60 sec: 3481.6, 300 sec: 5387.3). Total num frames: 64397312. Throughput: 0: 421.0. Samples: 11094068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-05 17:11:03,943][04005] Avg episode reward: [(0, '51.561')] [2024-07-05 17:11:04,414][04594] Updated weights for policy 0, policy_version 15724 (0.0075) [2024-07-05 17:11:07,884][04594] Updated weights for policy 0, policy_version 15734 (0.0012) [2024-07-05 17:11:08,941][04005] Fps is (10 sec: 9017.6, 60 sec: 3822.9, 300 sec: 5456.7). Total num frames: 64458752. Throughput: 0: 514.0. Samples: 11110910. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:11:08,942][04005] Avg episode reward: [(0, '51.553')] [2024-07-05 17:11:11,392][04594] Updated weights for policy 0, policy_version 15744 (0.0012) [2024-07-05 17:11:13,941][04005] Fps is (10 sec: 11878.5, 60 sec: 4027.7, 300 sec: 5498.4). Total num frames: 64516096. 
Throughput: 0: 868.9. Samples: 11128318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:11:13,942][04005] Avg episode reward: [(0, '50.989')] [2024-07-05 17:11:14,902][04594] Updated weights for policy 0, policy_version 15754 (0.0012) [2024-07-05 17:11:18,399][04594] Updated weights for policy 0, policy_version 15764 (0.0012) [2024-07-05 17:11:18,941][04005] Fps is (10 sec: 11468.7, 60 sec: 4232.5, 300 sec: 5540.0). Total num frames: 64573440. Throughput: 0: 1052.8. Samples: 11136862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:11:18,942][04005] Avg episode reward: [(0, '51.298')] [2024-07-05 17:11:21,886][04594] Updated weights for policy 0, policy_version 15774 (0.0011) [2024-07-05 17:11:23,941][04005] Fps is (10 sec: 11468.9, 60 sec: 4574.2, 300 sec: 5581.7). Total num frames: 64630784. Throughput: 0: 1438.2. Samples: 11154728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:11:23,942][04005] Avg episode reward: [(0, '50.336')] [2024-07-05 17:11:25,380][04594] Updated weights for policy 0, policy_version 15784 (0.0012) [2024-07-05 17:11:28,862][04594] Updated weights for policy 0, policy_version 15794 (0.0012) [2024-07-05 17:11:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 5598.5, 300 sec: 5651.1). Total num frames: 64692224. Throughput: 0: 1818.0. Samples: 11172354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:11:28,942][04005] Avg episode reward: [(0, '50.771')] [2024-07-05 17:11:32,356][04594] Updated weights for policy 0, policy_version 15804 (0.0012) [2024-07-05 17:11:33,941][04005] Fps is (10 sec: 11878.3, 60 sec: 6486.4, 300 sec: 5692.7). Total num frames: 64749568. Throughput: 0: 2001.7. Samples: 11180890. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:11:33,942][04005] Avg episode reward: [(0, '51.491')] [2024-07-05 17:11:35,846][04594] Updated weights for policy 0, policy_version 15814 (0.0012) [2024-07-05 17:11:38,942][04005] Fps is (10 sec: 11468.7, 60 sec: 7442.1, 300 sec: 5734.4). Total num frames: 64806912. Throughput: 0: 2384.3. Samples: 11198656. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:11:38,942][04005] Avg episode reward: [(0, '50.412')] [2024-07-05 17:11:39,345][04594] Updated weights for policy 0, policy_version 15824 (0.0011) [2024-07-05 17:11:42,835][04594] Updated weights for policy 0, policy_version 15834 (0.0012) [2024-07-05 17:11:43,941][04005] Fps is (10 sec: 11878.4, 60 sec: 8465.9, 300 sec: 5789.9). Total num frames: 64868352. Throughput: 0: 2766.3. Samples: 11216370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:11:43,942][04005] Avg episode reward: [(0, '50.521')] [2024-07-05 17:11:46,330][04594] Updated weights for policy 0, policy_version 15844 (0.0011) [2024-07-05 17:11:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 9353.5, 300 sec: 5831.6). Total num frames: 64925696. Throughput: 0: 2907.6. Samples: 11224912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:11:48,942][04005] Avg episode reward: [(0, '50.502')] [2024-07-05 17:11:49,816][04594] Updated weights for policy 0, policy_version 15854 (0.0011) [2024-07-05 17:11:53,309][04594] Updated weights for policy 0, policy_version 15864 (0.0012) [2024-07-05 17:11:53,942][04005] Fps is (10 sec: 11468.6, 60 sec: 10309.8, 300 sec: 5887.1). Total num frames: 64983040. Throughput: 0: 2928.2. Samples: 11242680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:11:53,943][04005] Avg episode reward: [(0, '50.272')] [2024-07-05 17:11:56,800][04594] Updated weights for policy 0, policy_version 15874 (0.0012) [2024-07-05 17:11:58,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11265.3, 300 sec: 5942.7). Total num frames: 65044480. 
Throughput: 0: 2935.8. Samples: 11260428. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:11:58,942][04005] Avg episode reward: [(0, '52.633')] [2024-07-05 17:12:00,312][04594] Updated weights for policy 0, policy_version 15884 (0.0012) [2024-07-05 17:12:03,804][04594] Updated weights for policy 0, policy_version 15894 (0.0012) [2024-07-05 17:12:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 6095.6). Total num frames: 65101824. Throughput: 0: 2934.8. Samples: 11268926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:12:03,942][04005] Avg episode reward: [(0, '52.778')] [2024-07-05 17:12:07,296][04594] Updated weights for policy 0, policy_version 15904 (0.0012) [2024-07-05 17:12:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 6290.0). Total num frames: 65159168. Throughput: 0: 2928.5. Samples: 11286512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:12:08,942][04005] Avg episode reward: [(0, '53.802')] [2024-07-05 17:12:10,793][04594] Updated weights for policy 0, policy_version 15914 (0.0011) [2024-07-05 17:12:13,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 6485.0). Total num frames: 65220608. Throughput: 0: 2930.3. Samples: 11304220. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:12:13,943][04005] Avg episode reward: [(0, '53.616')] [2024-07-05 17:12:14,294][04594] Updated weights for policy 0, policy_version 15924 (0.0012) [2024-07-05 17:12:17,792][04594] Updated weights for policy 0, policy_version 15934 (0.0012) [2024-07-05 17:12:18,941][04005] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 6678.8). Total num frames: 65277952. Throughput: 0: 2932.2. Samples: 11312838. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:12:18,943][04005] Avg episode reward: [(0, '53.807')] [2024-07-05 17:12:21,290][04594] Updated weights for policy 0, policy_version 15944 (0.0012) [2024-07-05 17:12:23,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 6873.1). Total num frames: 65335296. Throughput: 0: 2924.5. Samples: 11330260. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:12:23,942][04005] Avg episode reward: [(0, '53.207')] [2024-07-05 17:12:24,846][04594] Updated weights for policy 0, policy_version 15954 (0.0012) [2024-07-05 17:12:28,552][04594] Updated weights for policy 0, policy_version 15964 (0.0012) [2024-07-05 17:12:28,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 7053.6). Total num frames: 65392640. Throughput: 0: 2911.5. Samples: 11347386. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:12:28,942][04005] Avg episode reward: [(0, '52.467')] [2024-07-05 17:12:32,226][04594] Updated weights for policy 0, policy_version 15974 (0.0012) [2024-07-05 17:12:33,942][04005] Fps is (10 sec: 11059.1, 60 sec: 11605.3, 300 sec: 7234.2). Total num frames: 65445888. Throughput: 0: 2906.1. Samples: 11355686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:12:33,942][04005] Avg episode reward: [(0, '50.129')] [2024-07-05 17:12:34,100][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015979_65449984.pth... [2024-07-05 17:12:34,174][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015477_63393792.pth [2024-07-05 17:12:35,920][04594] Updated weights for policy 0, policy_version 15984 (0.0012) [2024-07-05 17:12:38,941][04005] Fps is (10 sec: 11059.3, 60 sec: 11605.4, 300 sec: 7414.6). Total num frames: 65503232. Throughput: 0: 2882.1. Samples: 11372374. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:12:38,942][04005] Avg episode reward: [(0, '49.886')] [2024-07-05 17:12:39,493][04594] Updated weights for policy 0, policy_version 15994 (0.0012) [2024-07-05 17:12:43,031][04594] Updated weights for policy 0, policy_version 16004 (0.0011) [2024-07-05 17:12:43,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 7609.1). Total num frames: 65560576. Throughput: 0: 2872.0. Samples: 11389668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:12:43,942][04005] Avg episode reward: [(0, '50.522')] [2024-07-05 17:12:46,555][04594] Updated weights for policy 0, policy_version 16014 (0.0012) [2024-07-05 17:12:48,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 7803.4). Total num frames: 65617920. Throughput: 0: 2880.7. Samples: 11398556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:12:48,943][04005] Avg episode reward: [(0, '52.184')] [2024-07-05 17:12:50,138][04594] Updated weights for policy 0, policy_version 16024 (0.0012) [2024-07-05 17:12:53,749][04594] Updated weights for policy 0, policy_version 16034 (0.0011) [2024-07-05 17:12:53,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 7983.9). Total num frames: 65675264. Throughput: 0: 2867.7. Samples: 11415560. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:12:53,943][04005] Avg episode reward: [(0, '52.362')] [2024-07-05 17:12:57,273][04594] Updated weights for policy 0, policy_version 16044 (0.0012) [2024-07-05 17:12:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 8178.4). Total num frames: 65732608. Throughput: 0: 2857.0. Samples: 11432786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-05 17:12:58,942][04005] Avg episode reward: [(0, '52.552')] [2024-07-05 17:13:00,800][04594] Updated weights for policy 0, policy_version 16054 (0.0012) [2024-07-05 17:13:03,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11468.8, 300 sec: 8359.2). Total num frames: 65789952. 
Throughput: 0: 2863.6. Samples: 11441702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:13:03,942][04005] Avg episode reward: [(0, '51.249')] [2024-07-05 17:13:04,318][04594] Updated weights for policy 0, policy_version 16064 (0.0012) [2024-07-05 17:13:07,880][04594] Updated weights for policy 0, policy_version 16074 (0.0012) [2024-07-05 17:13:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 8553.2). Total num frames: 65847296. Throughput: 0: 2863.1. Samples: 11459100. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:13:08,942][04005] Avg episode reward: [(0, '49.092')] [2024-07-05 17:13:11,555][04594] Updated weights for policy 0, policy_version 16084 (0.0012) [2024-07-05 17:13:13,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 8747.7). Total num frames: 65904640. Throughput: 0: 2854.4. Samples: 11475834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:13:13,943][04005] Avg episode reward: [(0, '48.952')] [2024-07-05 17:13:15,087][04594] Updated weights for policy 0, policy_version 16094 (0.0012) [2024-07-05 17:13:18,637][04594] Updated weights for policy 0, policy_version 16104 (0.0012) [2024-07-05 17:13:18,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11400.5, 300 sec: 8928.1). Total num frames: 65961984. Throughput: 0: 2867.8. Samples: 11484738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:13:18,943][04005] Avg episode reward: [(0, '50.410')] [2024-07-05 17:13:22,345][04594] Updated weights for policy 0, policy_version 16114 (0.0013) [2024-07-05 17:13:23,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 9066.8). Total num frames: 66019328. Throughput: 0: 2867.7. Samples: 11501422. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:13:23,942][04005] Avg episode reward: [(0, '52.836')] [2024-07-05 17:13:26,021][04594] Updated weights for policy 0, policy_version 16124 (0.0012) [2024-07-05 17:13:28,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11400.6, 300 sec: 9108.4). Total num frames: 66076672. Throughput: 0: 2868.3. Samples: 11518742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:13:28,942][04005] Avg episode reward: [(0, '52.864')] [2024-07-05 17:13:29,542][04594] Updated weights for policy 0, policy_version 16134 (0.0012) [2024-07-05 17:13:33,116][04594] Updated weights for policy 0, policy_version 16144 (0.0012) [2024-07-05 17:13:33,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11468.8, 300 sec: 9150.0). Total num frames: 66134016. Throughput: 0: 2858.6. Samples: 11527194. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:13:33,943][04005] Avg episode reward: [(0, '54.369')] [2024-07-05 17:13:36,671][04594] Updated weights for policy 0, policy_version 16154 (0.0012) [2024-07-05 17:13:38,942][04005] Fps is (10 sec: 11468.5, 60 sec: 11468.8, 300 sec: 9191.7). Total num frames: 66191360. Throughput: 0: 2866.0. Samples: 11544528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:13:38,943][04005] Avg episode reward: [(0, '53.155')] [2024-07-05 17:13:40,219][04594] Updated weights for policy 0, policy_version 16164 (0.0012) [2024-07-05 17:13:43,847][04594] Updated weights for policy 0, policy_version 16174 (0.0012) [2024-07-05 17:13:43,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11468.8, 300 sec: 9247.2). Total num frames: 66248704. Throughput: 0: 2865.7. Samples: 11561742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:13:43,942][04005] Avg episode reward: [(0, '52.922')] [2024-07-05 17:13:47,368][04594] Updated weights for policy 0, policy_version 16184 (0.0012) [2024-07-05 17:13:48,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11468.8, 300 sec: 9288.9). Total num frames: 66306048. 
Throughput: 0: 2857.0. Samples: 11570266. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:13:48,942][04005] Avg episode reward: [(0, '52.796')] [2024-07-05 17:13:50,999][04594] Updated weights for policy 0, policy_version 16194 (0.0011) [2024-07-05 17:13:53,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 9330.6). Total num frames: 66363392. Throughput: 0: 2852.2. Samples: 11587448. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:13:53,943][04005] Avg episode reward: [(0, '52.305')] [2024-07-05 17:13:54,562][04594] Updated weights for policy 0, policy_version 16204 (0.0011) [2024-07-05 17:13:58,439][04594] Updated weights for policy 0, policy_version 16214 (0.0013) [2024-07-05 17:13:58,942][04005] Fps is (10 sec: 11059.2, 60 sec: 11400.5, 300 sec: 9358.3). Total num frames: 66416640. Throughput: 0: 2845.9. Samples: 11603900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:13:58,943][04005] Avg episode reward: [(0, '53.819')] [2024-07-05 17:14:02,077][04594] Updated weights for policy 0, policy_version 16224 (0.0012) [2024-07-05 17:14:03,941][04005] Fps is (10 sec: 11059.2, 60 sec: 11400.5, 300 sec: 9400.0). Total num frames: 66473984. Throughput: 0: 2831.3. Samples: 11612148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:14:03,942][04005] Avg episode reward: [(0, '53.363')] [2024-07-05 17:14:05,756][04594] Updated weights for policy 0, policy_version 16234 (0.0012) [2024-07-05 17:14:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 9441.6). Total num frames: 66531328. Throughput: 0: 2840.2. Samples: 11629230. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-07-05 17:14:08,942][04005] Avg episode reward: [(0, '54.270')] [2024-07-05 17:14:09,295][04594] Updated weights for policy 0, policy_version 16244 (0.0012) [2024-07-05 17:14:12,908][04594] Updated weights for policy 0, policy_version 16254 (0.0012) [2024-07-05 17:14:13,941][04005] Fps is (10 sec: 11059.2, 60 sec: 11332.3, 300 sec: 9483.3). Total num frames: 66584576. Throughput: 0: 2832.8. Samples: 11646216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:13,942][04005] Avg episode reward: [(0, '53.072')] [2024-07-05 17:14:16,476][04594] Updated weights for policy 0, policy_version 16264 (0.0011) [2024-07-05 17:14:18,941][04005] Fps is (10 sec: 11059.3, 60 sec: 11332.3, 300 sec: 9524.9). Total num frames: 66641920. Throughput: 0: 2836.6. Samples: 11654842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:18,942][04005] Avg episode reward: [(0, '52.267')] [2024-07-05 17:14:20,031][04594] Updated weights for policy 0, policy_version 16274 (0.0012) [2024-07-05 17:14:23,566][04594] Updated weights for policy 0, policy_version 16284 (0.0012) [2024-07-05 17:14:23,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11400.5, 300 sec: 9580.5). Total num frames: 66703360. Throughput: 0: 2837.7. Samples: 11672226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:23,942][04005] Avg episode reward: [(0, '51.190')] [2024-07-05 17:14:27,112][04594] Updated weights for policy 0, policy_version 16294 (0.0012) [2024-07-05 17:14:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11400.5, 300 sec: 9622.2). Total num frames: 66760704. Throughput: 0: 2841.9. Samples: 11689626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:28,942][04005] Avg episode reward: [(0, '51.173')] [2024-07-05 17:14:30,631][04594] Updated weights for policy 0, policy_version 16304 (0.0012) [2024-07-05 17:14:33,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11400.6, 300 sec: 9663.8). Total num frames: 66818048. 
Throughput: 0: 2841.4. Samples: 11698130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:33,942][04005] Avg episode reward: [(0, '51.412')] [2024-07-05 17:14:34,140][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000016314_66822144.pth... [2024-07-05 17:14:34,142][04594] Updated weights for policy 0, policy_version 16314 (0.0012) [2024-07-05 17:14:34,214][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015713_64360448.pth [2024-07-05 17:14:37,673][04594] Updated weights for policy 0, policy_version 16324 (0.0011) [2024-07-05 17:14:38,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11400.6, 300 sec: 9719.3). Total num frames: 66875392. Throughput: 0: 2847.1. Samples: 11715568. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:38,942][04005] Avg episode reward: [(0, '49.609')] [2024-07-05 17:14:41,208][04594] Updated weights for policy 0, policy_version 16334 (0.0012) [2024-07-05 17:14:43,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11400.5, 300 sec: 9761.0). Total num frames: 66932736. Throughput: 0: 2867.2. Samples: 11732926. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:43,942][04005] Avg episode reward: [(0, '48.476')] [2024-07-05 17:14:44,737][04594] Updated weights for policy 0, policy_version 16344 (0.0012) [2024-07-05 17:14:48,260][04594] Updated weights for policy 0, policy_version 16354 (0.0012) [2024-07-05 17:14:48,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 9802.6). Total num frames: 66990080. Throughput: 0: 2882.2. Samples: 11741848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:48,942][04005] Avg episode reward: [(0, '48.407')] [2024-07-05 17:14:51,789][04594] Updated weights for policy 0, policy_version 16364 (0.0012) [2024-07-05 17:14:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11468.8, 300 sec: 9858.2). 
Total num frames: 67051520. Throughput: 0: 2889.0. Samples: 11759236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:53,942][04005] Avg episode reward: [(0, '46.632')] [2024-07-05 17:14:55,323][04594] Updated weights for policy 0, policy_version 16374 (0.0012) [2024-07-05 17:14:58,839][04594] Updated weights for policy 0, policy_version 16384 (0.0012) [2024-07-05 17:14:58,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11537.1, 300 sec: 9899.8). Total num frames: 67108864. Throughput: 0: 2898.8. Samples: 11776664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:14:58,942][04005] Avg episode reward: [(0, '49.165')] [2024-07-05 17:15:02,372][04594] Updated weights for policy 0, policy_version 16394 (0.0011) [2024-07-05 17:15:03,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 9955.4). Total num frames: 67166208. Throughput: 0: 2895.5. Samples: 11785140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:15:03,942][04005] Avg episode reward: [(0, '46.908')] [2024-07-05 17:15:05,893][04594] Updated weights for policy 0, policy_version 16404 (0.0011) [2024-07-05 17:15:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 9997.0). Total num frames: 67223552. Throughput: 0: 2896.6. Samples: 11802574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:15:08,943][04005] Avg episode reward: [(0, '48.900')] [2024-07-05 17:15:09,466][04594] Updated weights for policy 0, policy_version 16414 (0.0012) [2024-07-05 17:15:12,961][04594] Updated weights for policy 0, policy_version 16424 (0.0012) [2024-07-05 17:15:13,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 10038.7). Total num frames: 67280896. Throughput: 0: 2896.8. Samples: 11819984. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:15:13,943][04005] Avg episode reward: [(0, '48.531')] [2024-07-05 17:15:16,471][04594] Updated weights for policy 0, policy_version 16434 (0.0011) [2024-07-05 17:15:18,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 10122.1). Total num frames: 67342336. Throughput: 0: 2906.5. Samples: 11828922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:15:18,942][04005] Avg episode reward: [(0, '48.714')] [2024-07-05 17:15:19,998][04594] Updated weights for policy 0, policy_version 16444 (0.0012) [2024-07-05 17:15:23,514][04594] Updated weights for policy 0, policy_version 16454 (0.0012) [2024-07-05 17:15:23,942][04005] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 10316.6). Total num frames: 67399680. Throughput: 0: 2906.5. Samples: 11846362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:15:23,942][04005] Avg episode reward: [(0, '49.089')] [2024-07-05 17:15:27,039][04594] Updated weights for policy 0, policy_version 16464 (0.0012) [2024-07-05 17:15:28,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11605.3, 300 sec: 10497.2). Total num frames: 67457024. Throughput: 0: 2907.7. Samples: 11863772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:15:28,942][04005] Avg episode reward: [(0, '49.819')] [2024-07-05 17:15:30,565][04594] Updated weights for policy 0, policy_version 16474 (0.0012) [2024-07-05 17:15:33,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 10691.6). Total num frames: 67514368. Throughput: 0: 2901.8. Samples: 11872430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:15:33,942][04005] Avg episode reward: [(0, '52.236')] [2024-07-05 17:15:34,090][04594] Updated weights for policy 0, policy_version 16484 (0.0012) [2024-07-05 17:15:37,638][04594] Updated weights for policy 0, policy_version 16494 (0.0012) [2024-07-05 17:15:38,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 10885.9). 
Total num frames: 67571712. Throughput: 0: 2900.5. Samples: 11889760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:15:38,942][04005] Avg episode reward: [(0, '51.859')] [2024-07-05 17:15:41,183][04594] Updated weights for policy 0, policy_version 16504 (0.0012) [2024-07-05 17:15:43,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11605.3, 300 sec: 11066.4). Total num frames: 67629056. Throughput: 0: 2898.6. Samples: 11907100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:15:43,943][04005] Avg episode reward: [(0, '52.390')] [2024-07-05 17:15:44,698][04594] Updated weights for policy 0, policy_version 16514 (0.0012) [2024-07-05 17:15:48,219][04594] Updated weights for policy 0, policy_version 16524 (0.0012) [2024-07-05 17:15:48,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11274.8). Total num frames: 67690496. Throughput: 0: 2907.8. Samples: 11915992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-07-05 17:15:48,942][04005] Avg episode reward: [(0, '52.316')] [2024-07-05 17:15:51,758][04594] Updated weights for policy 0, policy_version 16534 (0.0012) [2024-07-05 17:15:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11605.3, 300 sec: 11455.2). Total num frames: 67747840. Throughput: 0: 2907.2. Samples: 11933398. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:15:53,942][04005] Avg episode reward: [(0, '51.584')] [2024-07-05 17:15:55,300][04594] Updated weights for policy 0, policy_version 16544 (0.0012) [2024-07-05 17:15:58,827][04594] Updated weights for policy 0, policy_version 16554 (0.0012) [2024-07-05 17:15:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11552.1). Total num frames: 67805184. Throughput: 0: 2905.8. Samples: 11950746. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:15:58,942][04005] Avg episode reward: [(0, '52.465')] [2024-07-05 17:16:02,358][04594] Updated weights for policy 0, policy_version 16564 (0.0012) [2024-07-05 17:16:03,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11538.2). Total num frames: 67862528. Throughput: 0: 2896.0. Samples: 11959244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:03,942][04005] Avg episode reward: [(0, '52.668')] [2024-07-05 17:16:05,864][04594] Updated weights for policy 0, policy_version 16574 (0.0011) [2024-07-05 17:16:08,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11538.2). Total num frames: 67919872. Throughput: 0: 2899.1. Samples: 11976820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:08,942][04005] Avg episode reward: [(0, '52.930')] [2024-07-05 17:16:09,390][04594] Updated weights for policy 0, policy_version 16584 (0.0012) [2024-07-05 17:16:12,922][04594] Updated weights for policy 0, policy_version 16594 (0.0012) [2024-07-05 17:16:13,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11538.2). Total num frames: 67977216. Throughput: 0: 2898.2. Samples: 11994190. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:13,942][04005] Avg episode reward: [(0, '52.417')] [2024-07-05 17:16:16,455][04594] Updated weights for policy 0, policy_version 16604 (0.0012) [2024-07-05 17:16:18,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11605.3, 300 sec: 11552.1). Total num frames: 68038656. Throughput: 0: 2901.6. Samples: 12003002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:18,942][04005] Avg episode reward: [(0, '50.826')] [2024-07-05 17:16:19,993][04594] Updated weights for policy 0, policy_version 16614 (0.0012) [2024-07-05 17:16:23,524][04594] Updated weights for policy 0, policy_version 16624 (0.0012) [2024-07-05 17:16:23,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11605.4, 300 sec: 11538.2). 
Total num frames: 68096000. Throughput: 0: 2902.7. Samples: 12020380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:23,942][04005] Avg episode reward: [(0, '51.293')] [2024-07-05 17:16:27,062][04594] Updated weights for policy 0, policy_version 16634 (0.0012) [2024-07-05 17:16:28,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11538.2). Total num frames: 68153344. Throughput: 0: 2903.4. Samples: 12037752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:28,942][04005] Avg episode reward: [(0, '51.947')] [2024-07-05 17:16:30,574][04594] Updated weights for policy 0, policy_version 16644 (0.0011) [2024-07-05 17:16:33,942][04005] Fps is (10 sec: 11468.5, 60 sec: 11605.3, 300 sec: 11538.2). Total num frames: 68210688. Throughput: 0: 2897.7. Samples: 12046390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:33,943][04005] Avg episode reward: [(0, '51.119')] [2024-07-05 17:16:34,089][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000016654_68214784.pth... [2024-07-05 17:16:34,091][04594] Updated weights for policy 0, policy_version 16654 (0.0013) [2024-07-05 17:16:34,164][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000015979_65449984.pth [2024-07-05 17:16:37,631][04594] Updated weights for policy 0, policy_version 16664 (0.0012) [2024-07-05 17:16:38,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11524.3). Total num frames: 68268032. Throughput: 0: 2897.5. Samples: 12063786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:38,942][04005] Avg episode reward: [(0, '51.911')] [2024-07-05 17:16:41,180][04594] Updated weights for policy 0, policy_version 16674 (0.0012) [2024-07-05 17:16:43,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11605.4, 300 sec: 11524.3). Total num frames: 68325376. Throughput: 0: 2896.8. Samples: 12081100. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:43,942][04005] Avg episode reward: [(0, '50.072')] [2024-07-05 17:16:44,710][04594] Updated weights for policy 0, policy_version 16684 (0.0012) [2024-07-05 17:16:48,225][04594] Updated weights for policy 0, policy_version 16694 (0.0012) [2024-07-05 17:16:48,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11605.3, 300 sec: 11538.2). Total num frames: 68386816. Throughput: 0: 2905.9. Samples: 12090010. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:48,943][04005] Avg episode reward: [(0, '52.223')] [2024-07-05 17:16:51,766][04594] Updated weights for policy 0, policy_version 16704 (0.0012) [2024-07-05 17:16:53,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11605.3, 300 sec: 11524.3). Total num frames: 68444160. Throughput: 0: 2901.9. Samples: 12107406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:53,942][04005] Avg episode reward: [(0, '51.167')] [2024-07-05 17:16:55,304][04594] Updated weights for policy 0, policy_version 16714 (0.0012) [2024-07-05 17:16:58,835][04594] Updated weights for policy 0, policy_version 16724 (0.0011) [2024-07-05 17:16:58,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11524.3). Total num frames: 68501504. Throughput: 0: 2901.4. Samples: 12124752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:16:58,942][04005] Avg episode reward: [(0, '51.364')] [2024-07-05 17:17:02,365][04594] Updated weights for policy 0, policy_version 16734 (0.0012) [2024-07-05 17:17:03,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11605.3, 300 sec: 11524.3). Total num frames: 68558848. Throughput: 0: 2895.1. Samples: 12133280. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:03,943][04005] Avg episode reward: [(0, '51.426')] [2024-07-05 17:17:05,896][04594] Updated weights for policy 0, policy_version 16744 (0.0012) [2024-07-05 17:17:08,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11510.5). 
Total num frames: 68616192. Throughput: 0: 2895.5. Samples: 12150676. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:08,943][04005] Avg episode reward: [(0, '52.885')] [2024-07-05 17:17:09,463][04594] Updated weights for policy 0, policy_version 16754 (0.0011) [2024-07-05 17:17:12,970][04594] Updated weights for policy 0, policy_version 16764 (0.0012) [2024-07-05 17:17:13,941][04005] Fps is (10 sec: 11469.0, 60 sec: 11605.4, 300 sec: 11510.5). Total num frames: 68673536. Throughput: 0: 2895.9. Samples: 12168068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:13,942][04005] Avg episode reward: [(0, '53.670')] [2024-07-05 17:17:16,493][04594] Updated weights for policy 0, policy_version 16774 (0.0011) [2024-07-05 17:17:18,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11537.1, 300 sec: 11510.5). Total num frames: 68730880. Throughput: 0: 2902.1. Samples: 12176986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:18,942][04005] Avg episode reward: [(0, '54.096')] [2024-07-05 17:17:20,025][04594] Updated weights for policy 0, policy_version 16784 (0.0012) [2024-07-05 17:17:23,546][04594] Updated weights for policy 0, policy_version 16794 (0.0012) [2024-07-05 17:17:23,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11524.3). Total num frames: 68792320. Throughput: 0: 2902.0. Samples: 12194376. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:23,942][04005] Avg episode reward: [(0, '53.991')] [2024-07-05 17:17:27,100][04594] Updated weights for policy 0, policy_version 16804 (0.0012) [2024-07-05 17:17:28,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11605.4, 300 sec: 11538.2). Total num frames: 68849664. Throughput: 0: 2903.1. Samples: 12211740. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:28,942][04005] Avg episode reward: [(0, '53.230')] [2024-07-05 17:17:30,650][04594] Updated weights for policy 0, policy_version 16814 (0.0011) [2024-07-05 17:17:33,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.4, 300 sec: 11538.2). Total num frames: 68907008. Throughput: 0: 2893.8. Samples: 12220232. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:33,942][04005] Avg episode reward: [(0, '52.495')] [2024-07-05 17:17:34,178][04594] Updated weights for policy 0, policy_version 16824 (0.0012) [2024-07-05 17:17:37,704][04594] Updated weights for policy 0, policy_version 16834 (0.0012) [2024-07-05 17:17:38,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11538.2). Total num frames: 68964352. Throughput: 0: 2894.8. Samples: 12237672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:38,942][04005] Avg episode reward: [(0, '50.856')] [2024-07-05 17:17:41,252][04594] Updated weights for policy 0, policy_version 16844 (0.0012) [2024-07-05 17:17:43,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11538.2). Total num frames: 69021696. Throughput: 0: 2895.7. Samples: 12255060. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:43,942][04005] Avg episode reward: [(0, '50.207')] [2024-07-05 17:17:44,770][04594] Updated weights for policy 0, policy_version 16854 (0.0012) [2024-07-05 17:17:48,304][04594] Updated weights for policy 0, policy_version 16864 (0.0012) [2024-07-05 17:17:48,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11538.2). Total num frames: 69079040. Throughput: 0: 2904.1. Samples: 12263962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:48,942][04005] Avg episode reward: [(0, '50.457')] [2024-07-05 17:17:51,840][04594] Updated weights for policy 0, policy_version 16874 (0.0012) [2024-07-05 17:17:53,942][04005] Fps is (10 sec: 11468.6, 60 sec: 11537.0, 300 sec: 11538.2). 
Total num frames: 69136384. Throughput: 0: 2904.1. Samples: 12281360. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:53,942][04005] Avg episode reward: [(0, '51.209')] [2024-07-05 17:17:55,374][04594] Updated weights for policy 0, policy_version 16884 (0.0011) [2024-07-05 17:17:58,905][04594] Updated weights for policy 0, policy_version 16894 (0.0012) [2024-07-05 17:17:58,941][04005] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11552.1). Total num frames: 69197824. Throughput: 0: 2904.5. Samples: 12298770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:17:58,942][04005] Avg episode reward: [(0, '51.114')] [2024-07-05 17:18:02,449][04594] Updated weights for policy 0, policy_version 16904 (0.0012) [2024-07-05 17:18:03,942][04005] Fps is (10 sec: 11878.5, 60 sec: 11605.3, 300 sec: 11552.1). Total num frames: 69255168. Throughput: 0: 2894.7. Samples: 12307246. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:18:03,943][04005] Avg episode reward: [(0, '51.540')] [2024-07-05 17:18:05,981][04594] Updated weights for policy 0, policy_version 16914 (0.0012) [2024-07-05 17:18:08,942][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11552.1). Total num frames: 69312512. Throughput: 0: 2894.3. Samples: 12324620. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:18:08,942][04005] Avg episode reward: [(0, '52.000')] [2024-07-05 17:18:09,524][04594] Updated weights for policy 0, policy_version 16924 (0.0012) [2024-07-05 17:18:13,063][04594] Updated weights for policy 0, policy_version 16934 (0.0012) [2024-07-05 17:18:13,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11552.1). Total num frames: 69369856. Throughput: 0: 2895.6. Samples: 12342040. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:18:13,943][04005] Avg episode reward: [(0, '52.676')] [2024-07-05 17:18:16,588][04594] Updated weights for policy 0, policy_version 16944 (0.0011) [2024-07-05 17:18:18,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11552.1). Total num frames: 69427200. Throughput: 0: 2902.2. Samples: 12350830. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:18:18,942][04005] Avg episode reward: [(0, '50.470')] [2024-07-05 17:18:20,110][04594] Updated weights for policy 0, policy_version 16954 (0.0012) [2024-07-05 17:18:23,639][04594] Updated weights for policy 0, policy_version 16964 (0.0012) [2024-07-05 17:18:23,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11537.1, 300 sec: 11552.1). Total num frames: 69484544. Throughput: 0: 2902.5. Samples: 12368286. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:18:23,942][04005] Avg episode reward: [(0, '49.573')] [2024-07-05 17:18:27,166][04594] Updated weights for policy 0, policy_version 16974 (0.0012) [2024-07-05 17:18:28,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11605.3, 300 sec: 11566.0). Total num frames: 69545984. Throughput: 0: 2903.6. Samples: 12385720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:18:28,942][04005] Avg episode reward: [(0, '49.038')] [2024-07-05 17:18:30,705][04594] Updated weights for policy 0, policy_version 16984 (0.0012) [2024-07-05 17:18:33,942][04005] Fps is (10 sec: 11878.3, 60 sec: 11605.3, 300 sec: 11566.0). Total num frames: 69603328. Throughput: 0: 2894.5. Samples: 12394216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-07-05 17:18:33,943][04005] Avg episode reward: [(0, '50.829')] [2024-07-05 17:18:33,946][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000016993_69603328.pth... 
[2024-07-05 17:18:34,022][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000016314_66822144.pth
[2024-07-05 17:18:34,295][04594] Updated weights for policy 0, policy_version 16994 (0.0012)
[2024-07-05 17:18:37,779][04594] Updated weights for policy 0, policy_version 17004 (0.0012)
[2024-07-05 17:18:38,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11566.0). Total num frames: 69660672. Throughput: 0: 2894.3. Samples: 12411602. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 17:18:38,942][04005] Avg episode reward: [(0, '52.609')]
[2024-07-05 17:18:41,320][04594] Updated weights for policy 0, policy_version 17014 (0.0012)
[2024-07-05 17:18:43,942][04005] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11566.0). Total num frames: 69718016. Throughput: 0: 2893.4. Samples: 12428972. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 17:18:43,943][04005] Avg episode reward: [(0, '52.285')]
[2024-07-05 17:18:44,844][04594] Updated weights for policy 0, policy_version 17024 (0.0012)
[2024-07-05 17:18:48,379][04594] Updated weights for policy 0, policy_version 17034 (0.0012)
[2024-07-05 17:18:48,941][04005] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11566.0). Total num frames: 69775360. Throughput: 0: 2897.6. Samples: 12437638. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 17:18:48,942][04005] Avg episode reward: [(0, '52.558')]
[2024-07-05 17:18:51,913][04594] Updated weights for policy 0, policy_version 17044 (0.0012)
[2024-07-05 17:18:53,941][04005] Fps is (10 sec: 11468.9, 60 sec: 11605.4, 300 sec: 11579.9). Total num frames: 69832704. Throughput: 0: 2897.6. Samples: 12455010. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 17:18:53,942][04005] Avg episode reward: [(0, '51.826')]
[2024-07-05 17:18:55,439][04594] Updated weights for policy 0, policy_version 17054 (0.0012)
[2024-07-05 17:18:58,941][04005] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11579.9). Total num frames: 69890048. Throughput: 0: 2897.7. Samples: 12472438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-07-05 17:18:58,942][04005] Avg episode reward: [(0, '53.025')]
[2024-07-05 17:18:58,970][04594] Updated weights for policy 0, policy_version 17064 (0.0012)
[2024-07-05 17:19:02,510][04594] Updated weights for policy 0, policy_version 17074 (0.0012)
[2024-07-05 17:19:03,941][04005] Fps is (10 sec: 11878.5, 60 sec: 11605.4, 300 sec: 11593.8). Total num frames: 69951488. Throughput: 0: 2896.6. Samples: 12481176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-07-05 17:19:03,942][04005] Avg episode reward: [(0, '53.389')]
[2024-07-05 17:19:06,065][04594] Updated weights for policy 0, policy_version 17084 (0.0012)
[2024-07-05 17:19:08,538][04581] Stopping Batcher_0...
[2024-07-05 17:19:08,538][04581] Loop batcher_evt_loop terminating...
[2024-07-05 17:19:08,538][04005] Component Batcher_0 stopped!
[2024-07-05 17:19:08,539][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000017091_70004736.pth...
[2024-07-05 17:19:08,551][04598] Stopping RolloutWorker_w4...
[2024-07-05 17:19:08,551][04602] Stopping RolloutWorker_w6...
[2024-07-05 17:19:08,551][04595] Stopping RolloutWorker_w0...
[2024-07-05 17:19:08,551][04602] Loop rollout_proc6_evt_loop terminating...
[2024-07-05 17:19:08,551][04596] Stopping RolloutWorker_w2...
[2024-07-05 17:19:08,551][04598] Loop rollout_proc4_evt_loop terminating...
[2024-07-05 17:19:08,551][04600] Stopping RolloutWorker_w5...
[2024-07-05 17:19:08,551][04597] Stopping RolloutWorker_w1...
[2024-07-05 17:19:08,551][04595] Loop rollout_proc0_evt_loop terminating...
[2024-07-05 17:19:08,552][04596] Loop rollout_proc2_evt_loop terminating... [2024-07-05 17:19:08,552][04597] Loop rollout_proc1_evt_loop terminating... [2024-07-05 17:19:08,551][04005] Component RolloutWorker_w4 stopped! [2024-07-05 17:19:08,552][04600] Loop rollout_proc5_evt_loop terminating... [2024-07-05 17:19:08,552][04599] Stopping RolloutWorker_w3... [2024-07-05 17:19:08,552][04599] Loop rollout_proc3_evt_loop terminating... [2024-07-05 17:19:08,552][04601] Stopping RolloutWorker_w7... [2024-07-05 17:19:08,553][04601] Loop rollout_proc7_evt_loop terminating... [2024-07-05 17:19:08,552][04005] Component RolloutWorker_w6 stopped! [2024-07-05 17:19:08,554][04005] Component RolloutWorker_w0 stopped! [2024-07-05 17:19:08,555][04005] Component RolloutWorker_w2 stopped! [2024-07-05 17:19:08,556][04005] Component RolloutWorker_w5 stopped! [2024-07-05 17:19:08,556][04005] Component RolloutWorker_w1 stopped! [2024-07-05 17:19:08,557][04005] Component RolloutWorker_w3 stopped! [2024-07-05 17:19:08,558][04005] Component RolloutWorker_w7 stopped! [2024-07-05 17:19:08,570][04594] Weights refcount: 2 0 [2024-07-05 17:19:08,572][04594] Stopping InferenceWorker_p0-w0... [2024-07-05 17:19:08,572][04594] Loop inference_proc0-0_evt_loop terminating... [2024-07-05 17:19:08,572][04005] Component InferenceWorker_p0-w0 stopped! [2024-07-05 17:19:08,630][04581] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000016654_68214784.pth [2024-07-05 17:19:08,642][04581] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000017091_70004736.pth... [2024-07-05 17:19:08,756][04581] Stopping LearnerWorker_p0... [2024-07-05 17:19:08,756][04581] Loop learner_proc0_evt_loop terminating... [2024-07-05 17:19:08,756][04005] Component LearnerWorker_p0 stopped! [2024-07-05 17:19:08,758][04005] Waiting for process learner_proc0 to stop... 
[2024-07-05 17:19:09,723][04005] Waiting for process inference_proc0-0 to join... [2024-07-05 17:19:09,724][04005] Waiting for process rollout_proc0 to join... [2024-07-05 17:19:09,725][04005] Waiting for process rollout_proc1 to join... [2024-07-05 17:19:09,726][04005] Waiting for process rollout_proc2 to join... [2024-07-05 17:19:09,726][04005] Waiting for process rollout_proc3 to join... [2024-07-05 17:19:09,726][04005] Waiting for process rollout_proc4 to join... [2024-07-05 17:19:09,727][04005] Waiting for process rollout_proc5 to join... [2024-07-05 17:19:09,727][04005] Waiting for process rollout_proc6 to join... [2024-07-05 17:19:09,728][04005] Waiting for process rollout_proc7 to join... [2024-07-05 17:19:09,728][04005] Batcher 0 profile tree view: batching: 78.0789, releasing_batches: 0.2794 [2024-07-05 17:19:09,728][04005] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 23.0186 update_model: 37.8160 weight_update: 0.0011 one_step: 0.0029 handle_policy_step: 4669.4145 deserialize: 76.9583, stack: 10.5236, obs_to_device_normalize: 802.4971, forward: 2767.7908, send_messages: 94.5028 prepare_outputs: 844.8409 to_cpu: 761.6729 [2024-07-05 17:19:09,729][04005] Learner 0 profile tree view: misc: 0.0551, prepare_batch: 144.0114 train: 3409.8811 epoch_init: 0.1600, minibatch_init: 0.1716, losses_postprocess: 4.1734, kl_divergence: 2.5807, after_optimizer: 10.7510 calculate_losses: 1133.8747 losses_init: 0.0841, forward_head: 30.6552, bptt_initial: 1074.4381, tail: 5.4593, advantages_returns: 1.4418, losses: 11.3527 bptt: 8.4248 bptt_forward_core: 8.0201 update: 2252.6446 clip: 10.5088 [2024-07-05 17:19:09,729][04005] RolloutWorker_w0 profile tree view: wait_for_trajectories: 1.0361, enqueue_policy_requests: 64.7761, env_step: 1045.3496, overhead: 80.8314, complete_rollouts: 1.9245 save_policy_outputs: 87.0457 split_output_tensors: 40.6469 [2024-07-05 17:19:09,730][04005] RolloutWorker_w7 profile tree view: wait_for_trajectories: 
1.1689, enqueue_policy_requests: 67.3333, env_step: 994.4821, overhead: 85.3259, complete_rollouts: 2.0731 save_policy_outputs: 87.7417 split_output_tensors: 41.3133 [2024-07-05 17:19:09,730][04005] Loop Runner_EvtLoop terminating... [2024-07-05 17:19:09,731][04005] Runner profile tree view: main_loop: 4830.5743 [2024-07-05 17:19:09,732][04005] Collected {0: 70004736}, FPS: 10350.7 [2024-07-05 17:19:53,356][04005] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json [2024-07-05 17:19:53,357][04005] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-05 17:19:53,358][04005] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-05 17:19:53,358][04005] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-05 17:19:53,359][04005] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 17:19:53,359][04005] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-05 17:19:53,360][04005] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 17:19:53,360][04005] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-05 17:19:53,361][04005] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-05 17:19:53,362][04005] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-05 17:19:53,362][04005] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-05 17:19:53,363][04005] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-05 17:19:53,364][04005] Adding new argument 'train_script'=None that is not in the saved config file! 
[2024-07-05 17:19:53,364][04005] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-05 17:19:53,365][04005] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-05 17:19:53,386][04005] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-05 17:19:53,388][04005] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 17:19:53,389][04005] RunningMeanStd input shape: (1,) [2024-07-05 17:19:53,399][04005] Num input channels: 3 [2024-07-05 17:19:53,408][04005] Convolutional layer output size: 4608 [2024-07-05 17:19:53,423][04005] Policy head output size: 512 [2024-07-05 17:19:55,169][04005] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000017091_70004736.pth... [2024-07-05 17:19:55,945][04005] Num frames 100... [2024-07-05 17:19:56,021][04005] Num frames 200... [2024-07-05 17:19:56,094][04005] Num frames 300... [2024-07-05 17:19:56,169][04005] Num frames 400... [2024-07-05 17:19:56,242][04005] Num frames 500... [2024-07-05 17:19:56,318][04005] Num frames 600... [2024-07-05 17:19:56,390][04005] Num frames 700... [2024-07-05 17:19:56,461][04005] Num frames 800... [2024-07-05 17:19:56,536][04005] Num frames 900... [2024-07-05 17:19:56,610][04005] Num frames 1000... [2024-07-05 17:19:56,685][04005] Num frames 1100... [2024-07-05 17:19:56,755][04005] Num frames 1200... [2024-07-05 17:19:56,828][04005] Num frames 1300... [2024-07-05 17:19:56,900][04005] Num frames 1400... [2024-07-05 17:19:56,973][04005] Num frames 1500... [2024-07-05 17:19:57,049][04005] Num frames 1600... [2024-07-05 17:19:57,120][04005] Num frames 1700... [2024-07-05 17:19:57,190][04005] Num frames 1800... [2024-07-05 17:19:57,262][04005] Num frames 1900... [2024-07-05 17:19:57,338][04005] Num frames 2000... [2024-07-05 17:19:57,412][04005] Num frames 2100... 
[2024-07-05 17:19:57,463][04005] Avg episode rewards: #0: 53.999, true rewards: #0: 21.000 [2024-07-05 17:19:57,465][04005] Avg episode reward: 53.999, avg true_objective: 21.000 [2024-07-05 17:19:57,565][04005] Num frames 2200... [2024-07-05 17:19:57,663][04005] Num frames 2300... [2024-07-05 17:19:57,743][04005] Num frames 2400... [2024-07-05 17:19:57,819][04005] Num frames 2500... [2024-07-05 17:19:57,891][04005] Num frames 2600... [2024-07-05 17:19:57,964][04005] Num frames 2700... [2024-07-05 17:19:58,036][04005] Num frames 2800... [2024-07-05 17:19:58,107][04005] Num frames 2900... [2024-07-05 17:19:58,190][04005] Num frames 3000... [2024-07-05 17:19:58,265][04005] Num frames 3100... [2024-07-05 17:19:58,341][04005] Num frames 3200... [2024-07-05 17:19:58,418][04005] Num frames 3300... [2024-07-05 17:19:58,493][04005] Num frames 3400... [2024-07-05 17:19:58,569][04005] Num frames 3500... [2024-07-05 17:19:58,644][04005] Num frames 3600... [2024-07-05 17:19:58,716][04005] Num frames 3700... [2024-07-05 17:19:58,791][04005] Num frames 3800... [2024-07-05 17:19:58,868][04005] Num frames 3900... [2024-07-05 17:19:58,945][04005] Num frames 4000... [2024-07-05 17:19:59,022][04005] Num frames 4100... [2024-07-05 17:19:59,098][04005] Num frames 4200... [2024-07-05 17:19:59,149][04005] Avg episode rewards: #0: 54.999, true rewards: #0: 21.000 [2024-07-05 17:19:59,150][04005] Avg episode reward: 54.999, avg true_objective: 21.000 [2024-07-05 17:19:59,223][04005] Num frames 4300... [2024-07-05 17:19:59,297][04005] Num frames 4400... [2024-07-05 17:19:59,371][04005] Num frames 4500... [2024-07-05 17:19:59,454][04005] Num frames 4600... [2024-07-05 17:19:59,542][04005] Num frames 4700... [2024-07-05 17:19:59,644][04005] Num frames 4800... [2024-07-05 17:19:59,736][04005] Num frames 4900... [2024-07-05 17:19:59,819][04005] Num frames 5000... [2024-07-05 17:19:59,895][04005] Num frames 5100... [2024-07-05 17:19:59,971][04005] Num frames 5200... 
[2024-07-05 17:20:00,045][04005] Num frames 5300... [2024-07-05 17:20:00,120][04005] Num frames 5400... [2024-07-05 17:20:00,190][04005] Num frames 5500... [2024-07-05 17:20:00,263][04005] Num frames 5600... [2024-07-05 17:20:00,335][04005] Num frames 5700... [2024-07-05 17:20:00,412][04005] Num frames 5800... [2024-07-05 17:20:00,487][04005] Num frames 5900... [2024-07-05 17:20:00,557][04005] Num frames 6000... [2024-07-05 17:20:00,631][04005] Num frames 6100... [2024-07-05 17:20:00,706][04005] Num frames 6200... [2024-07-05 17:20:00,800][04005] Avg episode rewards: #0: 54.835, true rewards: #0: 20.837 [2024-07-05 17:20:00,801][04005] Avg episode reward: 54.835, avg true_objective: 20.837 [2024-07-05 17:20:00,840][04005] Num frames 6300... [2024-07-05 17:20:00,913][04005] Num frames 6400... [2024-07-05 17:20:00,987][04005] Num frames 6500... [2024-07-05 17:20:01,063][04005] Num frames 6600... [2024-07-05 17:20:01,149][04005] Num frames 6700... [2024-07-05 17:20:01,223][04005] Num frames 6800... [2024-07-05 17:20:01,293][04005] Num frames 6900... [2024-07-05 17:20:01,363][04005] Num frames 7000... [2024-07-05 17:20:01,435][04005] Num frames 7100... [2024-07-05 17:20:01,507][04005] Num frames 7200... [2024-07-05 17:20:01,583][04005] Num frames 7300... [2024-07-05 17:20:01,657][04005] Num frames 7400... [2024-07-05 17:20:01,729][04005] Num frames 7500... [2024-07-05 17:20:01,801][04005] Num frames 7600... [2024-07-05 17:20:01,874][04005] Num frames 7700... [2024-07-05 17:20:01,947][04005] Num frames 7800... [2024-07-05 17:20:02,020][04005] Num frames 7900... [2024-07-05 17:20:02,092][04005] Num frames 8000... [2024-07-05 17:20:02,170][04005] Num frames 8100... [2024-07-05 17:20:02,239][04005] Num frames 8200... [2024-07-05 17:20:02,311][04005] Num frames 8300... 
[2024-07-05 17:20:02,402][04005] Avg episode rewards: #0: 55.376, true rewards: #0: 20.878 [2024-07-05 17:20:02,403][04005] Avg episode reward: 55.376, avg true_objective: 20.878 [2024-07-05 17:20:02,444][04005] Num frames 8400... [2024-07-05 17:20:02,516][04005] Num frames 8500... [2024-07-05 17:20:02,586][04005] Num frames 8600... [2024-07-05 17:20:02,656][04005] Num frames 8700... [2024-07-05 17:20:02,728][04005] Num frames 8800... [2024-07-05 17:20:02,798][04005] Num frames 8900... [2024-07-05 17:20:02,873][04005] Num frames 9000... [2024-07-05 17:20:02,945][04005] Num frames 9100... [2024-07-05 17:20:03,017][04005] Num frames 9200... [2024-07-05 17:20:03,093][04005] Num frames 9300... [2024-07-05 17:20:03,170][04005] Num frames 9400... [2024-07-05 17:20:03,247][04005] Num frames 9500... [2024-07-05 17:20:03,325][04005] Num frames 9600... [2024-07-05 17:20:03,397][04005] Num frames 9700... [2024-07-05 17:20:03,473][04005] Num frames 9800... [2024-07-05 17:20:03,547][04005] Num frames 9900... [2024-07-05 17:20:03,622][04005] Num frames 10000... [2024-07-05 17:20:03,696][04005] Num frames 10100... [2024-07-05 17:20:03,770][04005] Num frames 10200... [2024-07-05 17:20:03,841][04005] Num frames 10300... [2024-07-05 17:20:03,915][04005] Num frames 10400... [2024-07-05 17:20:04,007][04005] Avg episode rewards: #0: 55.701, true rewards: #0: 20.902 [2024-07-05 17:20:04,009][04005] Avg episode reward: 55.701, avg true_objective: 20.902 [2024-07-05 17:20:04,047][04005] Num frames 10500... [2024-07-05 17:20:04,120][04005] Num frames 10600... [2024-07-05 17:20:04,193][04005] Num frames 10700... [2024-07-05 17:20:04,266][04005] Num frames 10800... [2024-07-05 17:20:04,339][04005] Num frames 10900... [2024-07-05 17:20:04,409][04005] Num frames 11000... [2024-07-05 17:20:04,482][04005] Num frames 11100... [2024-07-05 17:20:04,555][04005] Num frames 11200... [2024-07-05 17:20:04,626][04005] Num frames 11300... [2024-07-05 17:20:04,712][04005] Num frames 11400... 
[2024-07-05 17:20:04,786][04005] Num frames 11500... [2024-07-05 17:20:04,857][04005] Num frames 11600... [2024-07-05 17:20:04,931][04005] Num frames 11700... [2024-07-05 17:20:05,003][04005] Num frames 11800... [2024-07-05 17:20:05,075][04005] Num frames 11900... [2024-07-05 17:20:05,148][04005] Num frames 12000... [2024-07-05 17:20:05,220][04005] Num frames 12100... [2024-07-05 17:20:05,292][04005] Num frames 12200... [2024-07-05 17:20:05,365][04005] Num frames 12300... [2024-07-05 17:20:05,442][04005] Num frames 12400... [2024-07-05 17:20:05,517][04005] Num frames 12500... [2024-07-05 17:20:05,609][04005] Avg episode rewards: #0: 56.084, true rewards: #0: 20.918 [2024-07-05 17:20:05,611][04005] Avg episode reward: 56.084, avg true_objective: 20.918 [2024-07-05 17:20:05,648][04005] Num frames 12600... [2024-07-05 17:20:05,719][04005] Num frames 12700... [2024-07-05 17:20:05,791][04005] Num frames 12800... [2024-07-05 17:20:05,863][04005] Num frames 12900... [2024-07-05 17:20:05,936][04005] Num frames 13000... [2024-07-05 17:20:06,008][04005] Num frames 13100... [2024-07-05 17:20:06,082][04005] Num frames 13200... [2024-07-05 17:20:06,154][04005] Num frames 13300... [2024-07-05 17:20:06,226][04005] Num frames 13400... [2024-07-05 17:20:06,299][04005] Num frames 13500... [2024-07-05 17:20:06,371][04005] Num frames 13600... [2024-07-05 17:20:06,442][04005] Num frames 13700... [2024-07-05 17:20:06,514][04005] Num frames 13800... [2024-07-05 17:20:06,586][04005] Num frames 13900... [2024-07-05 17:20:06,656][04005] Num frames 14000... [2024-07-05 17:20:06,730][04005] Num frames 14100... [2024-07-05 17:20:06,807][04005] Num frames 14200... [2024-07-05 17:20:06,876][04005] Num frames 14300... [2024-07-05 17:20:06,950][04005] Num frames 14400... [2024-07-05 17:20:07,024][04005] Num frames 14500... [2024-07-05 17:20:07,099][04005] Num frames 14600... 
[2024-07-05 17:20:07,191][04005] Avg episode rewards: #0: 55.929, true rewards: #0: 20.930 [2024-07-05 17:20:07,193][04005] Avg episode reward: 55.929, avg true_objective: 20.930 [2024-07-05 17:20:07,231][04005] Num frames 14700... [2024-07-05 17:20:07,304][04005] Num frames 14800... [2024-07-05 17:20:07,376][04005] Num frames 14900... [2024-07-05 17:20:07,452][04005] Num frames 15000... [2024-07-05 17:20:07,525][04005] Num frames 15100... [2024-07-05 17:20:07,599][04005] Num frames 15200... [2024-07-05 17:20:07,673][04005] Num frames 15300... [2024-07-05 17:20:07,748][04005] Num frames 15400... [2024-07-05 17:20:07,823][04005] Num frames 15500... [2024-07-05 17:20:07,897][04005] Num frames 15600... [2024-07-05 17:20:07,972][04005] Num frames 15700... [2024-07-05 17:20:08,056][04005] Num frames 15800... [2024-07-05 17:20:08,129][04005] Num frames 15900... [2024-07-05 17:20:08,201][04005] Num frames 16000... [2024-07-05 17:20:08,273][04005] Num frames 16100... [2024-07-05 17:20:08,346][04005] Num frames 16200... [2024-07-05 17:20:08,415][04005] Num frames 16300... [2024-07-05 17:20:08,487][04005] Num frames 16400... [2024-07-05 17:20:08,559][04005] Num frames 16500... [2024-07-05 17:20:08,631][04005] Num frames 16600... [2024-07-05 17:20:08,703][04005] Num frames 16700... [2024-07-05 17:20:08,782][04005] Avg episode rewards: #0: 56.417, true rewards: #0: 20.919 [2024-07-05 17:20:08,783][04005] Avg episode reward: 56.417, avg true_objective: 20.919 [2024-07-05 17:20:08,833][04005] Num frames 16800... [2024-07-05 17:20:08,905][04005] Num frames 16900... [2024-07-05 17:20:08,982][04005] Num frames 17000... [2024-07-05 17:20:09,054][04005] Num frames 17100... [2024-07-05 17:20:09,127][04005] Num frames 17200... [2024-07-05 17:20:09,196][04005] Num frames 17300... [2024-07-05 17:20:09,268][04005] Num frames 17400... [2024-07-05 17:20:09,345][04005] Num frames 17500... [2024-07-05 17:20:09,418][04005] Num frames 17600... 
[2024-07-05 17:20:09,489][04005] Num frames 17700... [2024-07-05 17:20:09,565][04005] Num frames 17800... [2024-07-05 17:20:09,646][04005] Num frames 17900... [2024-07-05 17:20:09,721][04005] Num frames 18000... [2024-07-05 17:20:09,793][04005] Num frames 18100... [2024-07-05 17:20:09,871][04005] Num frames 18200... [2024-07-05 17:20:09,949][04005] Num frames 18300... [2024-07-05 17:20:10,024][04005] Num frames 18400... [2024-07-05 17:20:10,096][04005] Num frames 18500... [2024-07-05 17:20:10,169][04005] Num frames 18600... [2024-07-05 17:20:10,244][04005] Num frames 18700... [2024-07-05 17:20:10,317][04005] Num frames 18800... [2024-07-05 17:20:10,400][04005] Avg episode rewards: #0: 56.593, true rewards: #0: 20.928 [2024-07-05 17:20:10,402][04005] Avg episode reward: 56.593, avg true_objective: 20.928 [2024-07-05 17:20:10,454][04005] Num frames 18900... [2024-07-05 17:20:10,524][04005] Num frames 19000... [2024-07-05 17:20:10,598][04005] Num frames 19100... [2024-07-05 17:20:10,671][04005] Num frames 19200... [2024-07-05 17:20:10,745][04005] Num frames 19300... [2024-07-05 17:20:10,819][04005] Num frames 19400... [2024-07-05 17:20:10,894][04005] Num frames 19500... [2024-07-05 17:20:10,968][04005] Num frames 19600... [2024-07-05 17:20:11,043][04005] Num frames 19700... [2024-07-05 17:20:11,122][04005] Num frames 19800... [2024-07-05 17:20:11,198][04005] Num frames 19900... [2024-07-05 17:20:11,271][04005] Num frames 20000... [2024-07-05 17:20:11,346][04005] Num frames 20100... [2024-07-05 17:20:11,431][04005] Num frames 20200... [2024-07-05 17:20:11,503][04005] Num frames 20300... [2024-07-05 17:20:11,576][04005] Num frames 20400... [2024-07-05 17:20:11,651][04005] Num frames 20500... [2024-07-05 17:20:11,725][04005] Num frames 20600... [2024-07-05 17:20:11,798][04005] Num frames 20700... [2024-07-05 17:20:11,870][04005] Num frames 20800... [2024-07-05 17:20:11,943][04005] Num frames 20900... 
[2024-07-05 17:20:12,022][04005] Avg episode rewards: #0: 57.434, true rewards: #0: 20.935 [2024-07-05 17:20:12,023][04005] Avg episode reward: 57.434, avg true_objective: 20.935 [2024-07-05 17:20:33,642][04005] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/replay.mp4! [2024-07-05 17:23:13,711][04005] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/config.json [2024-07-05 17:23:13,712][04005] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-05 17:23:13,712][04005] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-05 17:23:13,713][04005] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-05 17:23:13,713][04005] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-05 17:23:13,714][04005] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-05 17:23:13,714][04005] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-07-05 17:23:13,715][04005] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-05 17:23:13,715][04005] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-05 17:23:13,716][04005] Adding new argument 'hf_repository'='ra9hu/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-05 17:23:13,716][04005] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-05 17:23:13,717][04005] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-05 17:23:13,717][04005] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-05 17:23:13,718][04005] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-07-05 17:23:13,718][04005] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-05 17:23:13,735][04005] RunningMeanStd input shape: (3, 72, 128) [2024-07-05 17:23:13,736][04005] RunningMeanStd input shape: (1,) [2024-07-05 17:23:13,744][04005] Num input channels: 3 [2024-07-05 17:23:13,750][04005] Convolutional layer output size: 4608 [2024-07-05 17:23:13,762][04005] Policy head output size: 512 [2024-07-05 17:23:13,829][04005] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/checkpoint_p0/checkpoint_000017091_70004736.pth... [2024-07-05 17:23:14,511][04005] Num frames 100... [2024-07-05 17:23:14,603][04005] Num frames 200... [2024-07-05 17:23:14,696][04005] Num frames 300... [2024-07-05 17:23:14,787][04005] Num frames 400... [2024-07-05 17:23:14,884][04005] Num frames 500... [2024-07-05 17:23:14,978][04005] Num frames 600... [2024-07-05 17:23:15,065][04005] Num frames 700... [2024-07-05 17:23:15,156][04005] Num frames 800... [2024-07-05 17:23:15,233][04005] Num frames 900... [2024-07-05 17:23:15,310][04005] Num frames 1000... [2024-07-05 17:23:15,387][04005] Num frames 1100... [2024-07-05 17:23:15,458][04005] Num frames 1200... [2024-07-05 17:23:15,531][04005] Num frames 1300... [2024-07-05 17:23:15,604][04005] Num frames 1400... [2024-07-05 17:23:15,683][04005] Num frames 1500... [2024-07-05 17:23:15,755][04005] Num frames 1600... [2024-07-05 17:23:15,834][04005] Num frames 1700... [2024-07-05 17:23:15,910][04005] Num frames 1800... [2024-07-05 17:23:15,983][04005] Num frames 1900... [2024-07-05 17:23:16,059][04005] Num frames 2000... [2024-07-05 17:23:16,138][04005] Num frames 2100... [2024-07-05 17:23:16,189][04005] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000 [2024-07-05 17:23:16,191][04005] Avg episode reward: 58.999, avg true_objective: 21.000 [2024-07-05 17:23:16,266][04005] Num frames 2200... [2024-07-05 17:23:16,340][04005] Num frames 2300... 
[2024-07-05 17:23:16,414][04005] Num frames 2400... [2024-07-05 17:23:16,486][04005] Num frames 2500... [2024-07-05 17:23:16,558][04005] Num frames 2600... [2024-07-05 17:23:16,635][04005] Num frames 2700... [2024-07-05 17:23:16,711][04005] Num frames 2800... [2024-07-05 17:23:16,786][04005] Num frames 2900... [2024-07-05 17:23:16,867][04005] Num frames 3000... [2024-07-05 17:23:16,942][04005] Num frames 3100... [2024-07-05 17:23:17,016][04005] Num frames 3200... [2024-07-05 17:23:17,087][04005] Num frames 3300... [2024-07-05 17:23:17,161][04005] Num frames 3400... [2024-07-05 17:23:17,234][04005] Num frames 3500... [2024-07-05 17:23:17,309][04005] Num frames 3600... [2024-07-05 17:23:17,383][04005] Num frames 3700... [2024-07-05 17:23:17,457][04005] Num frames 3800... [2024-07-05 17:23:17,530][04005] Num frames 3900... [2024-07-05 17:23:17,604][04005] Num frames 4000... [2024-07-05 17:23:17,677][04005] Num frames 4100... [2024-07-05 17:23:17,753][04005] Num frames 4200... [2024-07-05 17:23:17,805][04005] Avg episode rewards: #0: 62.499, true rewards: #0: 21.000 [2024-07-05 17:23:17,806][04005] Avg episode reward: 62.499, avg true_objective: 21.000 [2024-07-05 17:23:17,882][04005] Num frames 4300... [2024-07-05 17:23:17,954][04005] Num frames 4400... [2024-07-05 17:23:18,028][04005] Num frames 4500... [2024-07-05 17:23:18,101][04005] Num frames 4600... [2024-07-05 17:23:18,175][04005] Num frames 4700... [2024-07-05 17:23:18,248][04005] Num frames 4800... [2024-07-05 17:23:18,322][04005] Num frames 4900... [2024-07-05 17:23:18,396][04005] Num frames 5000... [2024-07-05 17:23:18,469][04005] Num frames 5100... [2024-07-05 17:23:18,542][04005] Num frames 5200... [2024-07-05 17:23:18,616][04005] Num frames 5300... [2024-07-05 17:23:18,697][04005] Num frames 5400... [2024-07-05 17:23:18,771][04005] Num frames 5500... [2024-07-05 17:23:18,842][04005] Num frames 5600... [2024-07-05 17:23:18,914][04005] Num frames 5700... [2024-07-05 17:23:18,986][04005] Num frames 5800... 
[2024-07-05 17:23:19,060][04005] Num frames 5900... [2024-07-05 17:23:19,132][04005] Num frames 6000... [2024-07-05 17:23:19,203][04005] Num frames 6100... [2024-07-05 17:23:19,273][04005] Num frames 6200... [2024-07-05 17:23:19,347][04005] Num frames 6300... [2024-07-05 17:23:19,398][04005] Avg episode rewards: #0: 59.999, true rewards: #0: 21.000 [2024-07-05 17:23:19,399][04005] Avg episode reward: 59.999, avg true_objective: 21.000 [2024-07-05 17:23:19,472][04005] Num frames 6400... [2024-07-05 17:23:19,542][04005] Num frames 6500... [2024-07-05 17:23:19,616][04005] Num frames 6600... [2024-07-05 17:23:19,686][04005] Num frames 6700... [2024-07-05 17:23:19,757][04005] Num frames 6800... [2024-07-05 17:23:19,829][04005] Num frames 6900... [2024-07-05 17:23:19,901][04005] Num frames 7000... [2024-07-05 17:23:19,972][04005] Num frames 7100... [2024-07-05 17:23:20,043][04005] Num frames 7200... [2024-07-05 17:23:20,113][04005] Num frames 7300... [2024-07-05 17:23:20,183][04005] Num frames 7400... [2024-07-05 17:23:20,253][04005] Num frames 7500... [2024-07-05 17:23:20,323][04005] Num frames 7600... [2024-07-05 17:23:20,392][04005] Num frames 7700... [2024-07-05 17:23:20,463][04005] Num frames 7800... [2024-07-05 17:23:20,536][04005] Num frames 7900... [2024-07-05 17:23:20,613][04005] Avg episode rewards: #0: 56.829, true rewards: #0: 19.830 [2024-07-05 17:23:20,615][04005] Avg episode reward: 56.829, avg true_objective: 19.830 [2024-07-05 17:23:20,663][04005] Num frames 8000... [2024-07-05 17:23:20,735][04005] Num frames 8100... [2024-07-05 17:23:20,812][04005] Num frames 8200... [2024-07-05 17:23:20,886][04005] Num frames 8300... [2024-07-05 17:23:20,964][04005] Num frames 8400... [2024-07-05 17:23:21,038][04005] Num frames 8500... [2024-07-05 17:23:21,114][04005] Num frames 8600... [2024-07-05 17:23:21,186][04005] Num frames 8700... [2024-07-05 17:23:21,266][04005] Num frames 8800... [2024-07-05 17:23:21,338][04005] Num frames 8900... 
[2024-07-05 17:23:21,412][04005] Num frames 9000... [2024-07-05 17:23:21,485][04005] Num frames 9100... [2024-07-05 17:23:21,557][04005] Num frames 9200... [2024-07-05 17:23:21,629][04005] Num frames 9300... [2024-07-05 17:23:21,702][04005] Num frames 9400... [2024-07-05 17:23:21,778][04005] Num frames 9500... [2024-07-05 17:23:21,849][04005] Num frames 9600... [2024-07-05 17:23:21,922][04005] Num frames 9700... [2024-07-05 17:23:21,994][04005] Num frames 9800... [2024-07-05 17:23:22,069][04005] Num frames 9900... [2024-07-05 17:23:22,142][04005] Num frames 10000... [2024-07-05 17:23:22,219][04005] Avg episode rewards: #0: 55.863, true rewards: #0: 20.064 [2024-07-05 17:23:22,220][04005] Avg episode reward: 55.863, avg true_objective: 20.064 [2024-07-05 17:23:22,272][04005] Num frames 10100... [2024-07-05 17:23:22,345][04005] Num frames 10200... [2024-07-05 17:23:22,418][04005] Num frames 10300... [2024-07-05 17:23:22,496][04005] Num frames 10400... [2024-07-05 17:23:22,569][04005] Num frames 10500... [2024-07-05 17:23:22,640][04005] Num frames 10600... [2024-07-05 17:23:22,711][04005] Num frames 10700... [2024-07-05 17:23:22,785][04005] Num frames 10800... [2024-07-05 17:23:22,859][04005] Num frames 10900... [2024-07-05 17:23:22,931][04005] Num frames 11000... [2024-07-05 17:23:23,002][04005] Num frames 11100... [2024-07-05 17:23:23,074][04005] Num frames 11200... [2024-07-05 17:23:23,144][04005] Num frames 11300... [2024-07-05 17:23:23,216][04005] Num frames 11400... [2024-07-05 17:23:23,291][04005] Num frames 11500... [2024-07-05 17:23:23,364][04005] Num frames 11600... [2024-07-05 17:23:23,439][04005] Num frames 11700... [2024-07-05 17:23:23,509][04005] Num frames 11800... [2024-07-05 17:23:23,581][04005] Num frames 11900... [2024-07-05 17:23:23,658][04005] Num frames 12000... [2024-07-05 17:23:23,735][04005] Num frames 12100... 
[2024-07-05 17:23:23,812][04005] Avg episode rewards: #0: 56.719, true rewards: #0: 20.220 [2024-07-05 17:23:23,814][04005] Avg episode reward: 56.719, avg true_objective: 20.220 [2024-07-05 17:23:23,863][04005] Num frames 12200... [2024-07-05 17:23:23,933][04005] Num frames 12300... [2024-07-05 17:23:24,006][04005] Num frames 12400... [2024-07-05 17:23:24,079][04005] Num frames 12500... [2024-07-05 17:23:24,150][04005] Num frames 12600... [2024-07-05 17:23:24,219][04005] Num frames 12700... [2024-07-05 17:23:24,289][04005] Num frames 12800... [2024-07-05 17:23:24,361][04005] Num frames 12900... [2024-07-05 17:23:24,431][04005] Num frames 13000... [2024-07-05 17:23:24,501][04005] Num frames 13100... [2024-07-05 17:23:24,574][04005] Num frames 13200... [2024-07-05 17:23:24,642][04005] Num frames 13300... [2024-07-05 17:23:24,714][04005] Num frames 13400... [2024-07-05 17:23:24,786][04005] Num frames 13500... [2024-07-05 17:23:24,859][04005] Num frames 13600... [2024-07-05 17:23:24,932][04005] Num frames 13700... [2024-07-05 17:23:25,006][04005] Num frames 13800... [2024-07-05 17:23:25,081][04005] Num frames 13900... [2024-07-05 17:23:25,156][04005] Num frames 14000... [2024-07-05 17:23:25,227][04005] Num frames 14100... [2024-07-05 17:23:25,301][04005] Num frames 14200... [2024-07-05 17:23:25,380][04005] Avg episode rewards: #0: 57.330, true rewards: #0: 20.331 [2024-07-05 17:23:25,381][04005] Avg episode reward: 57.330, avg true_objective: 20.331 [2024-07-05 17:23:25,430][04005] Num frames 14300... [2024-07-05 17:23:25,502][04005] Num frames 14400... [2024-07-05 17:23:25,574][04005] Num frames 14500... [2024-07-05 17:23:25,646][04005] Num frames 14600... [2024-07-05 17:23:25,716][04005] Num frames 14700... [2024-07-05 17:23:25,802][04005] Num frames 14800... [2024-07-05 17:23:25,876][04005] Num frames 14900... [2024-07-05 17:23:25,950][04005] Num frames 15000... [2024-07-05 17:23:26,022][04005] Num frames 15100... 
[2024-07-05 17:23:26,096][04005] Num frames 15200... [2024-07-05 17:23:26,171][04005] Num frames 15300... [2024-07-05 17:23:26,243][04005] Num frames 15400... [2024-07-05 17:23:26,315][04005] Num frames 15500... [2024-07-05 17:23:26,389][04005] Num frames 15600... [2024-07-05 17:23:26,464][04005] Num frames 15700... [2024-07-05 17:23:26,538][04005] Num frames 15800... [2024-07-05 17:23:26,614][04005] Num frames 15900... [2024-07-05 17:23:26,691][04005] Num frames 16000... [2024-07-05 17:23:26,793][04005] Num frames 16100... [2024-07-05 17:23:26,897][04005] Num frames 16200... [2024-07-05 17:23:26,972][04005] Num frames 16300... [2024-07-05 17:23:27,050][04005] Avg episode rewards: #0: 57.789, true rewards: #0: 20.415 [2024-07-05 17:23:27,051][04005] Avg episode reward: 57.789, avg true_objective: 20.415 [2024-07-05 17:23:27,107][04005] Num frames 16400... [2024-07-05 17:23:27,189][04005] Num frames 16500... [2024-07-05 17:23:27,260][04005] Num frames 16600... [2024-07-05 17:23:27,335][04005] Num frames 16700... [2024-07-05 17:23:27,412][04005] Num frames 16800... [2024-07-05 17:23:27,486][04005] Num frames 16900... [2024-07-05 17:23:27,559][04005] Num frames 17000... [2024-07-05 17:23:27,632][04005] Num frames 17100... [2024-07-05 17:23:27,704][04005] Num frames 17200... [2024-07-05 17:23:27,776][04005] Num frames 17300... [2024-07-05 17:23:27,846][04005] Num frames 17400... [2024-07-05 17:23:27,919][04005] Num frames 17500... [2024-07-05 17:23:27,994][04005] Num frames 17600... [2024-07-05 17:23:28,068][04005] Num frames 17700... [2024-07-05 17:23:28,141][04005] Num frames 17800... [2024-07-05 17:23:28,213][04005] Num frames 17900... [2024-07-05 17:23:28,287][04005] Num frames 18000... [2024-07-05 17:23:28,361][04005] Num frames 18100... [2024-07-05 17:23:28,432][04005] Num frames 18200... [2024-07-05 17:23:28,506][04005] Num frames 18300... [2024-07-05 17:23:28,582][04005] Num frames 18400... 
[2024-07-05 17:23:28,660][04005] Avg episode rewards: #0: 57.590, true rewards: #0: 20.480 [2024-07-05 17:23:28,661][04005] Avg episode reward: 57.590, avg true_objective: 20.480 [2024-07-05 17:23:28,711][04005] Num frames 18500... [2024-07-05 17:23:28,783][04005] Num frames 18600... [2024-07-05 17:23:28,854][04005] Num frames 18700... [2024-07-05 17:23:28,928][04005] Num frames 18800... [2024-07-05 17:23:29,002][04005] Num frames 18900... [2024-07-05 17:23:29,077][04005] Num frames 19000... [2024-07-05 17:23:29,148][04005] Num frames 19100... [2024-07-05 17:23:29,229][04005] Num frames 19200... [2024-07-05 17:23:29,303][04005] Num frames 19300... [2024-07-05 17:23:29,373][04005] Num frames 19400... [2024-07-05 17:23:29,445][04005] Num frames 19500... [2024-07-05 17:23:29,516][04005] Num frames 19600... [2024-07-05 17:23:29,586][04005] Num frames 19700... [2024-07-05 17:23:29,657][04005] Num frames 19800... [2024-07-05 17:23:29,728][04005] Num frames 19900... [2024-07-05 17:23:29,800][04005] Num frames 20000... [2024-07-05 17:23:29,872][04005] Num frames 20100... [2024-07-05 17:23:29,942][04005] Num frames 20200... [2024-07-05 17:23:30,012][04005] Num frames 20300... [2024-07-05 17:23:30,089][04005] Num frames 20400... [2024-07-05 17:23:30,162][04005] Num frames 20500... [2024-07-05 17:23:30,244][04005] Avg episode rewards: #0: 57.831, true rewards: #0: 20.532 [2024-07-05 17:23:30,245][04005] Avg episode reward: 57.831, avg true_objective: 20.532 [2024-07-05 17:23:51,410][04005] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/conv_resnet/replay.mp4!