diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1587 @@
+[2024-07-03 21:59:57,637][50933] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json...
+[2024-07-03 21:59:57,638][50933] Rollout worker 0 uses device cpu
+[2024-07-03 21:59:57,639][50933] Rollout worker 1 uses device cpu
+[2024-07-03 21:59:57,639][50933] Rollout worker 2 uses device cpu
+[2024-07-03 21:59:57,639][50933] Rollout worker 3 uses device cpu
+[2024-07-03 21:59:57,639][50933] Rollout worker 4 uses device cpu
+[2024-07-03 21:59:57,640][50933] Rollout worker 5 uses device cpu
+[2024-07-03 21:59:57,640][50933] Rollout worker 6 uses device cpu
+[2024-07-03 21:59:57,640][50933] Rollout worker 7 uses device cpu
+[2024-07-03 21:59:57,684][50933] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-03 21:59:57,685][50933] InferenceWorker_p0-w0: min num requests: 2
+[2024-07-03 21:59:57,712][50933] Starting all processes...
+[2024-07-03 21:59:57,713][50933] Starting process learner_proc0
+[2024-07-03 21:59:58,429][50933] Starting all processes...
+[2024-07-03 21:59:58,435][50933] Starting process inference_proc0-0
+[2024-07-03 21:59:58,436][50933] Starting process rollout_proc0
+[2024-07-03 21:59:58,436][50933] Starting process rollout_proc1
+[2024-07-03 21:59:58,436][50933] Starting process rollout_proc2
+[2024-07-03 21:59:58,437][50933] Starting process rollout_proc3
+[2024-07-03 21:59:58,437][50933] Starting process rollout_proc4
+[2024-07-03 21:59:58,437][50933] Starting process rollout_proc5
+[2024-07-03 21:59:58,438][50933] Starting process rollout_proc6
+[2024-07-03 21:59:58,438][50933] Starting process rollout_proc7
+[2024-07-03 22:00:01,016][52280] Worker 0 uses CPU cores [0, 1]
+[2024-07-03 22:00:01,104][52286] Worker 3 uses CPU cores [6, 7]
+[2024-07-03 22:00:01,197][52287] Worker 7 uses CPU cores [14, 15]
+[2024-07-03 22:00:01,246][52285] Worker 5 uses CPU cores [10, 11]
+[2024-07-03 22:00:01,369][52284] Worker 4 uses CPU cores [8, 9]
+[2024-07-03 22:00:01,369][52281] Worker 1 uses CPU cores [2, 3]
+[2024-07-03 22:00:01,378][52267] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-03 22:00:01,378][52267] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-07-03 22:00:01,424][52267] Num visible devices: 1
+[2024-07-03 22:00:01,425][52283] Worker 2 uses CPU cores [4, 5]
+[2024-07-03 22:00:01,442][52267] Starting seed is not provided
+[2024-07-03 22:00:01,443][52267] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-03 22:00:01,443][52267] Initializing actor-critic model on device cuda:0
+[2024-07-03 22:00:01,443][52267] RunningMeanStd input shape: (3, 72, 128)
+[2024-07-03 22:00:01,452][52267] RunningMeanStd input shape: (1,)
+[2024-07-03 22:00:01,460][52267] ConvEncoder: input_channels=3
+[2024-07-03 22:00:01,501][52282] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-07-03 22:00:01,501][52282] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-07-03 22:00:01,540][52267] Conv encoder output size: 512
+[2024-07-03 22:00:01,540][52267] Policy head output size: 512
+[2024-07-03 22:00:01,546][52282] Num visible devices: 1
+[2024-07-03 22:00:01,556][52267] Created Actor Critic model with architecture:
+[2024-07-03 22:00:01,556][52267] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-07-03 22:00:01,567][52288] Worker 6 uses CPU cores [12, 13] +[2024-07-03 22:00:01,675][52267] Using optimizer +[2024-07-03 22:00:02,200][52267] No checkpoints found +[2024-07-03 22:00:02,200][52267] Did not load from checkpoint, starting from scratch! +[2024-07-03 22:00:02,200][52267] Initialized policy 0 weights for model version 0 +[2024-07-03 22:00:02,202][52267] LearnerWorker_p0 finished initialization! +[2024-07-03 22:00:02,202][52267] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-07-03 22:00:02,268][52282] RunningMeanStd input shape: (3, 72, 128) +[2024-07-03 22:00:02,268][52282] RunningMeanStd input shape: (1,) +[2024-07-03 22:00:02,276][52282] ConvEncoder: input_channels=3 +[2024-07-03 22:00:02,331][52282] Conv encoder output size: 512 +[2024-07-03 22:00:02,331][52282] Policy head output size: 512 +[2024-07-03 22:00:02,365][50933] Inference worker 0-0 is ready! +[2024-07-03 22:00:02,366][50933] All inference workers are ready! Signal rollout workers to start! +[2024-07-03 22:00:02,406][52284] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:00:02,412][52281] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:00:02,414][52280] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:00:02,416][52285] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:00:02,416][52283] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:00:02,416][52288] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:00:02,419][52286] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:00:02,419][52287] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:00:02,543][50933] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-07-03 22:00:02,910][52284] Decorrelating experience for 0 frames... +[2024-07-03 22:00:02,910][52285] Decorrelating experience for 0 frames... +[2024-07-03 22:00:02,910][52286] Decorrelating experience for 0 frames... +[2024-07-03 22:00:02,910][52288] Decorrelating experience for 0 frames... +[2024-07-03 22:00:02,911][52287] Decorrelating experience for 0 frames... +[2024-07-03 22:00:03,061][52286] Decorrelating experience for 32 frames... 
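Note: the printout above pins down the whole model: a three-layer conv encoder feeding a 512-unit MLP ("Conv encoder output size: 512"), a single-layer GRU core, and linear critic/action heads over 5 discrete actions. Below is a shape-compatible PyTorch sketch for the (3, 72, 128) observations, assuming the standard convnet_simple filters (32@8x8 stride 4, 64@4x4 stride 2, 128@3x3 stride 2); it is not Sample Factory's actual classes.

import torch
from torch import nn

class ActorCriticSketch(nn.Module):
    """Shape-compatible sketch of the ActorCriticSharedWeights printout above."""

    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size from a dummy observation
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())  # -> "Conv encoder output size: 512"
        self.core = nn.GRU(hidden, hidden)                                    # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)
        self.distribution_linear = nn.Linear(hidden, num_actions)             # logits for 5 discrete actions

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))   # (batch, 512)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)   # GRU expects (seq, batch, feat)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

# usage: logits, value, h = ActorCriticSketch()(torch.zeros(8, 3, 72, 128), torch.zeros(1, 8, 512))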
+[2024-07-03 22:00:03,062][52284] Decorrelating experience for 32 frames... +[2024-07-03 22:00:03,062][52288] Decorrelating experience for 32 frames... +[2024-07-03 22:00:03,063][52285] Decorrelating experience for 32 frames... +[2024-07-03 22:00:03,228][52287] Decorrelating experience for 32 frames... +[2024-07-03 22:00:03,251][52281] Decorrelating experience for 0 frames... +[2024-07-03 22:00:03,259][52288] Decorrelating experience for 64 frames... +[2024-07-03 22:00:03,259][52286] Decorrelating experience for 64 frames... +[2024-07-03 22:00:03,269][52285] Decorrelating experience for 64 frames... +[2024-07-03 22:00:03,403][52281] Decorrelating experience for 32 frames... +[2024-07-03 22:00:03,404][52284] Decorrelating experience for 64 frames... +[2024-07-03 22:00:03,431][52288] Decorrelating experience for 96 frames... +[2024-07-03 22:00:03,431][52286] Decorrelating experience for 96 frames... +[2024-07-03 22:00:03,576][52287] Decorrelating experience for 64 frames... +[2024-07-03 22:00:03,576][52284] Decorrelating experience for 96 frames... +[2024-07-03 22:00:03,597][52281] Decorrelating experience for 64 frames... +[2024-07-03 22:00:03,597][52285] Decorrelating experience for 96 frames... +[2024-07-03 22:00:03,745][52287] Decorrelating experience for 96 frames... +[2024-07-03 22:00:03,762][52280] Decorrelating experience for 0 frames... +[2024-07-03 22:00:03,788][52281] Decorrelating experience for 96 frames... +[2024-07-03 22:00:03,916][52280] Decorrelating experience for 32 frames... +[2024-07-03 22:00:04,113][52283] Decorrelating experience for 0 frames... +[2024-07-03 22:00:04,120][52280] Decorrelating experience for 64 frames... +[2024-07-03 22:00:04,273][52283] Decorrelating experience for 32 frames... +[2024-07-03 22:00:04,455][52280] Decorrelating experience for 96 frames... +[2024-07-03 22:00:04,478][52283] Decorrelating experience for 64 frames... +[2024-07-03 22:00:04,483][52267] Signal inference workers to stop experience collection... +[2024-07-03 22:00:04,488][52282] InferenceWorker_p0-w0: stopping experience collection +[2024-07-03 22:00:04,653][52283] Decorrelating experience for 96 frames... +[2024-07-03 22:00:05,734][52267] Signal inference workers to resume experience collection... +[2024-07-03 22:00:05,735][52282] InferenceWorker_p0-w0: resuming experience collection +[2024-07-03 22:00:07,144][52282] Updated weights for policy 0, policy_version 10 (0.0096) +[2024-07-03 22:00:07,543][50933] Fps is (10 sec: 9830.8, 60 sec: 9830.8, 300 sec: 9830.8). Total num frames: 49152. Throughput: 0: 491.6. Samples: 2458. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-07-03 22:00:07,543][50933] Avg episode reward: [(0, '4.343')] +[2024-07-03 22:00:08,657][52282] Updated weights for policy 0, policy_version 20 (0.0007) +[2024-07-03 22:00:10,147][52282] Updated weights for policy 0, policy_version 30 (0.0007) +[2024-07-03 22:00:11,627][52282] Updated weights for policy 0, policy_version 40 (0.0007) +[2024-07-03 22:00:12,543][50933] Fps is (10 sec: 18841.9, 60 sec: 18841.9, 300 sec: 18841.9). Total num frames: 188416. Throughput: 0: 3962.1. Samples: 39620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-07-03 22:00:12,543][50933] Avg episode reward: [(0, '4.315')] +[2024-07-03 22:00:12,546][52267] Saving new best policy, reward=4.315! 
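Note: each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line above averages frame throughput over three trailing windows, which is why the very first report shows nan (no earlier snapshot to diff against). A minimal sketch of that bookkeeping, assuming (wall time, total frames) snapshots rather than Sample Factory's actual internals:

import time
from collections import deque

class FpsWindows:
    """Average FPS over trailing windows, as in the log's 'Fps is (...)' lines."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (wall_time, total_env_frames) snapshots

    def record(self, total_frames):
        now = time.time()
        self.samples.append((now, total_frames))
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()  # drop snapshots older than the largest window

    def fps(self, window):
        now, total = self.samples[-1]
        t0, f0 = next((t, f) for t, f in self.samples if now - t <= window)
        return float("nan") if now == t0 else (total - f0) / (now - t0)  # nan on the first report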
+[2024-07-03 22:00:13,106][52282] Updated weights for policy 0, policy_version 50 (0.0006) +[2024-07-03 22:00:14,546][52282] Updated weights for policy 0, policy_version 60 (0.0006) +[2024-07-03 22:00:16,014][52282] Updated weights for policy 0, policy_version 70 (0.0006) +[2024-07-03 22:00:17,463][52282] Updated weights for policy 0, policy_version 80 (0.0007) +[2024-07-03 22:00:17,543][50933] Fps is (10 sec: 27852.7, 60 sec: 21845.6, 300 sec: 21845.6). Total num frames: 327680. Throughput: 0: 5455.0. Samples: 81824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-03 22:00:17,543][50933] Avg episode reward: [(0, '4.443')] +[2024-07-03 22:00:17,544][52267] Saving new best policy, reward=4.443! +[2024-07-03 22:00:17,677][50933] Heartbeat connected on Batcher_0 +[2024-07-03 22:00:17,680][50933] Heartbeat connected on LearnerWorker_p0 +[2024-07-03 22:00:17,687][50933] Heartbeat connected on InferenceWorker_p0-w0 +[2024-07-03 22:00:17,691][50933] Heartbeat connected on RolloutWorker_w0 +[2024-07-03 22:00:17,693][50933] Heartbeat connected on RolloutWorker_w1 +[2024-07-03 22:00:17,696][50933] Heartbeat connected on RolloutWorker_w2 +[2024-07-03 22:00:17,699][50933] Heartbeat connected on RolloutWorker_w3 +[2024-07-03 22:00:17,703][50933] Heartbeat connected on RolloutWorker_w4 +[2024-07-03 22:00:17,705][50933] Heartbeat connected on RolloutWorker_w5 +[2024-07-03 22:00:17,708][50933] Heartbeat connected on RolloutWorker_w6 +[2024-07-03 22:00:17,713][50933] Heartbeat connected on RolloutWorker_w7 +[2024-07-03 22:00:18,934][52282] Updated weights for policy 0, policy_version 90 (0.0007) +[2024-07-03 22:00:20,382][52282] Updated weights for policy 0, policy_version 100 (0.0006) +[2024-07-03 22:00:21,925][52282] Updated weights for policy 0, policy_version 110 (0.0007) +[2024-07-03 22:00:22,543][50933] Fps is (10 sec: 27852.9, 60 sec: 23347.5, 300 sec: 23347.5). Total num frames: 466944. Throughput: 0: 5144.2. Samples: 102882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-03 22:00:22,543][50933] Avg episode reward: [(0, '4.404')] +[2024-07-03 22:00:23,477][52282] Updated weights for policy 0, policy_version 120 (0.0007) +[2024-07-03 22:00:24,948][52282] Updated weights for policy 0, policy_version 130 (0.0006) +[2024-07-03 22:00:26,408][52282] Updated weights for policy 0, policy_version 140 (0.0006) +[2024-07-03 22:00:27,543][50933] Fps is (10 sec: 27443.4, 60 sec: 24084.7, 300 sec: 24084.7). Total num frames: 602112. Throughput: 0: 5749.5. Samples: 143736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-03 22:00:27,543][50933] Avg episode reward: [(0, '4.278')] +[2024-07-03 22:00:27,842][52282] Updated weights for policy 0, policy_version 150 (0.0006) +[2024-07-03 22:00:29,278][52282] Updated weights for policy 0, policy_version 160 (0.0006) +[2024-07-03 22:00:30,728][52282] Updated weights for policy 0, policy_version 170 (0.0006) +[2024-07-03 22:00:32,148][52282] Updated weights for policy 0, policy_version 180 (0.0006) +[2024-07-03 22:00:32,543][50933] Fps is (10 sec: 27852.7, 60 sec: 24849.2, 300 sec: 24849.2). Total num frames: 745472. Throughput: 0: 6213.5. Samples: 186404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-07-03 22:00:32,543][50933] Avg episode reward: [(0, '4.483')] +[2024-07-03 22:00:32,546][52267] Saving new best policy, reward=4.483! 
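Note: "Saving new best policy, reward=4.483!" fires whenever the trailing average episode reward (stats_avg=100, save_best_metric=reward in the config dump later in this log) beats the previous best, once past save_best_after=100000 env steps. The implied bookkeeping, sketched with a hypothetical save_fn callback:

import math

class BestPolicySaver:
    """Save a checkpoint whenever the tracked metric improves (save_best_metric=reward)."""

    def __init__(self, save_best_after=100_000):
        self.best = -math.inf
        self.save_best_after = save_best_after  # don't save a "best" before this many env steps

    def maybe_save(self, env_steps, avg_reward, save_fn):
        if env_steps >= self.save_best_after and avg_reward > self.best:
            self.best = avg_reward
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
            save_fn()  # hypothetical callback that writes the checkpoint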
+[2024-07-03 22:00:33,627][52282] Updated weights for policy 0, policy_version 190 (0.0006) +[2024-07-03 22:00:35,062][52282] Updated weights for policy 0, policy_version 200 (0.0006) +[2024-07-03 22:00:36,504][52282] Updated weights for policy 0, policy_version 210 (0.0006) +[2024-07-03 22:00:37,543][50933] Fps is (10 sec: 28672.0, 60 sec: 25395.3, 300 sec: 25395.3). Total num frames: 888832. Throughput: 0: 5928.0. Samples: 207480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-03 22:00:37,543][50933] Avg episode reward: [(0, '4.433')] +[2024-07-03 22:00:37,965][52282] Updated weights for policy 0, policy_version 220 (0.0006) +[2024-07-03 22:00:39,441][52282] Updated weights for policy 0, policy_version 230 (0.0007) +[2024-07-03 22:00:40,885][52282] Updated weights for policy 0, policy_version 240 (0.0007) +[2024-07-03 22:00:42,382][52282] Updated weights for policy 0, policy_version 250 (0.0006) +[2024-07-03 22:00:42,543][50933] Fps is (10 sec: 28262.2, 60 sec: 25702.5, 300 sec: 25702.5). Total num frames: 1028096. Throughput: 0: 6235.4. Samples: 249414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-03 22:00:42,543][50933] Avg episode reward: [(0, '4.472')] +[2024-07-03 22:00:43,824][52282] Updated weights for policy 0, policy_version 260 (0.0006) +[2024-07-03 22:00:45,266][52282] Updated weights for policy 0, policy_version 270 (0.0006) +[2024-07-03 22:00:46,707][52282] Updated weights for policy 0, policy_version 280 (0.0006) +[2024-07-03 22:00:47,543][50933] Fps is (10 sec: 27852.6, 60 sec: 25941.4, 300 sec: 25941.4). Total num frames: 1167360. Throughput: 0: 6483.3. Samples: 291748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-03 22:00:47,544][50933] Avg episode reward: [(0, '4.797')] +[2024-07-03 22:00:47,544][52267] Saving new best policy, reward=4.797! +[2024-07-03 22:00:48,213][52282] Updated weights for policy 0, policy_version 290 (0.0006) +[2024-07-03 22:00:49,746][52282] Updated weights for policy 0, policy_version 300 (0.0007) +[2024-07-03 22:00:51,238][52282] Updated weights for policy 0, policy_version 310 (0.0006) +[2024-07-03 22:00:52,543][50933] Fps is (10 sec: 27852.8, 60 sec: 26132.5, 300 sec: 26132.5). Total num frames: 1306624. Throughput: 0: 6881.9. Samples: 312146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-03 22:00:52,544][50933] Avg episode reward: [(0, '4.885')] +[2024-07-03 22:00:52,546][52267] Saving new best policy, reward=4.885! +[2024-07-03 22:00:52,687][52282] Updated weights for policy 0, policy_version 320 (0.0007) +[2024-07-03 22:00:54,137][52282] Updated weights for policy 0, policy_version 330 (0.0007) +[2024-07-03 22:00:55,574][52282] Updated weights for policy 0, policy_version 340 (0.0006) +[2024-07-03 22:00:57,017][52282] Updated weights for policy 0, policy_version 350 (0.0007) +[2024-07-03 22:00:57,543][50933] Fps is (10 sec: 27852.9, 60 sec: 26289.0, 300 sec: 26289.0). Total num frames: 1445888. Throughput: 0: 6993.8. Samples: 354340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-07-03 22:00:57,543][50933] Avg episode reward: [(0, '4.610')] +[2024-07-03 22:00:58,451][52282] Updated weights for policy 0, policy_version 360 (0.0007) +[2024-07-03 22:00:59,994][52282] Updated weights for policy 0, policy_version 370 (0.0007) +[2024-07-03 22:01:01,493][52282] Updated weights for policy 0, policy_version 380 (0.0006) +[2024-07-03 22:01:02,543][50933] Fps is (10 sec: 27443.1, 60 sec: 26351.0, 300 sec: 26351.0). Total num frames: 1581056. Throughput: 0: 6970.4. Samples: 395494. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-07-03 22:01:02,544][50933] Avg episode reward: [(0, '4.584')] +[2024-07-03 22:01:03,020][52282] Updated weights for policy 0, policy_version 390 (0.0007) +[2024-07-03 22:01:04,454][52282] Updated weights for policy 0, policy_version 400 (0.0007) +[2024-07-03 22:01:05,889][52282] Updated weights for policy 0, policy_version 410 (0.0006) +[2024-07-03 22:01:07,336][52282] Updated weights for policy 0, policy_version 420 (0.0006) +[2024-07-03 22:01:07,543][50933] Fps is (10 sec: 27852.7, 60 sec: 27921.0, 300 sec: 26529.5). Total num frames: 1724416. Throughput: 0: 6976.0. Samples: 416802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-07-03 22:01:07,543][50933] Avg episode reward: [(0, '4.922')] +[2024-07-03 22:01:07,544][52267] Saving new best policy, reward=4.922! +[2024-07-03 22:01:08,773][52282] Updated weights for policy 0, policy_version 430 (0.0007) +[2024-07-03 22:01:10,227][52282] Updated weights for policy 0, policy_version 440 (0.0007) +[2024-07-03 22:01:11,666][52282] Updated weights for policy 0, policy_version 450 (0.0007) +[2024-07-03 22:01:12,543][50933] Fps is (10 sec: 28672.2, 60 sec: 27989.3, 300 sec: 26682.6). Total num frames: 1867776. Throughput: 0: 7014.9. Samples: 459406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-07-03 22:01:12,543][50933] Avg episode reward: [(0, '4.441')] +[2024-07-03 22:01:13,110][52282] Updated weights for policy 0, policy_version 460 (0.0006) +[2024-07-03 22:01:14,584][52282] Updated weights for policy 0, policy_version 470 (0.0006) +[2024-07-03 22:01:16,030][52282] Updated weights for policy 0, policy_version 480 (0.0007) +[2024-07-03 22:01:17,495][52282] Updated weights for policy 0, policy_version 490 (0.0006) +[2024-07-03 22:01:17,543][50933] Fps is (10 sec: 28262.5, 60 sec: 27989.3, 300 sec: 26760.6). Total num frames: 2007040. Throughput: 0: 7008.0. Samples: 501762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-07-03 22:01:17,543][50933] Avg episode reward: [(0, '4.585')] +[2024-07-03 22:01:18,991][52282] Updated weights for policy 0, policy_version 500 (0.0006) +[2024-07-03 22:01:20,449][52282] Updated weights for policy 0, policy_version 510 (0.0006) +[2024-07-03 22:01:21,967][52282] Updated weights for policy 0, policy_version 520 (0.0007) +[2024-07-03 22:01:22,543][50933] Fps is (10 sec: 27442.9, 60 sec: 27921.0, 300 sec: 26777.6). Total num frames: 2142208. Throughput: 0: 7002.1. Samples: 522574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:01:22,544][50933] Avg episode reward: [(0, '4.604')] +[2024-07-03 22:01:23,448][52282] Updated weights for policy 0, policy_version 530 (0.0006) +[2024-07-03 22:01:24,898][52282] Updated weights for policy 0, policy_version 540 (0.0007) +[2024-07-03 22:01:26,351][52282] Updated weights for policy 0, policy_version 550 (0.0007) +[2024-07-03 22:01:27,543][50933] Fps is (10 sec: 27852.8, 60 sec: 28057.6, 300 sec: 26889.1). Total num frames: 2285568. Throughput: 0: 6995.5. Samples: 564212. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-07-03 22:01:27,543][50933] Avg episode reward: [(0, '4.713')] +[2024-07-03 22:01:27,808][52282] Updated weights for policy 0, policy_version 560 (0.0007) +[2024-07-03 22:01:29,252][52282] Updated weights for policy 0, policy_version 570 (0.0007) +[2024-07-03 22:01:30,694][52282] Updated weights for policy 0, policy_version 580 (0.0007) +[2024-07-03 22:01:32,150][52282] Updated weights for policy 0, policy_version 590 (0.0007) +[2024-07-03 22:01:32,543][50933] Fps is (10 sec: 28262.5, 60 sec: 27989.3, 300 sec: 26942.6). Total num frames: 2424832. Throughput: 0: 6997.7. Samples: 606646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-07-03 22:01:32,543][50933] Avg episode reward: [(0, '4.567')] +[2024-07-03 22:01:33,604][52282] Updated weights for policy 0, policy_version 600 (0.0006) +[2024-07-03 22:01:35,046][52282] Updated weights for policy 0, policy_version 610 (0.0006) +[2024-07-03 22:01:36,494][52282] Updated weights for policy 0, policy_version 620 (0.0006) +[2024-07-03 22:01:37,543][50933] Fps is (10 sec: 28262.5, 60 sec: 27989.3, 300 sec: 27033.7). Total num frames: 2568192. Throughput: 0: 7013.8. Samples: 627768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:01:37,543][50933] Avg episode reward: [(0, '4.946')] +[2024-07-03 22:01:37,544][52267] Saving new best policy, reward=4.946! +[2024-07-03 22:01:37,956][52282] Updated weights for policy 0, policy_version 630 (0.0007) +[2024-07-03 22:01:39,427][52282] Updated weights for policy 0, policy_version 640 (0.0006) +[2024-07-03 22:01:40,877][52282] Updated weights for policy 0, policy_version 650 (0.0006) +[2024-07-03 22:01:42,330][52282] Updated weights for policy 0, policy_version 660 (0.0007) +[2024-07-03 22:01:42,543][50933] Fps is (10 sec: 28262.4, 60 sec: 27989.3, 300 sec: 27074.6). Total num frames: 2707456. Throughput: 0: 7012.8. Samples: 669918. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:01:42,544][50933] Avg episode reward: [(0, '5.278')] +[2024-07-03 22:01:42,546][52267] Saving new best policy, reward=5.278! +[2024-07-03 22:01:43,776][52282] Updated weights for policy 0, policy_version 670 (0.0006) +[2024-07-03 22:01:45,229][52282] Updated weights for policy 0, policy_version 680 (0.0006) +[2024-07-03 22:01:46,677][52282] Updated weights for policy 0, policy_version 690 (0.0006) +[2024-07-03 22:01:47,543][50933] Fps is (10 sec: 27852.5, 60 sec: 27989.3, 300 sec: 27111.6). Total num frames: 2846720. Throughput: 0: 7042.0. Samples: 712382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:01:47,544][50933] Avg episode reward: [(0, '4.903')] +[2024-07-03 22:01:48,130][52282] Updated weights for policy 0, policy_version 700 (0.0006) +[2024-07-03 22:01:49,575][52282] Updated weights for policy 0, policy_version 710 (0.0006) +[2024-07-03 22:01:51,030][52282] Updated weights for policy 0, policy_version 720 (0.0007) +[2024-07-03 22:01:52,479][52282] Updated weights for policy 0, policy_version 730 (0.0007) +[2024-07-03 22:01:52,543][50933] Fps is (10 sec: 28262.5, 60 sec: 28057.6, 300 sec: 27182.6). Total num frames: 2990080. Throughput: 0: 7039.1. Samples: 733560. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:01:52,544][50933] Avg episode reward: [(0, '4.446')] +[2024-07-03 22:01:52,546][52267] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth... 
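Note: the checkpoint name above encodes both counters: policy_version 730 at 2,990,080 env frames. The two stay locked together by the config dumped later in this log: each policy version consumes one batch of batch_size=1024 samples (num_batches_per_epoch=1, num_epochs=1), and env_frameskip=4 turns each sample into 4 env frames, so frames = version x 4096:

BATCH_SIZE = 1024      # samples consumed per policy version
ENV_FRAMESKIP = 4      # env frames per sample

frames_per_version = BATCH_SIZE * ENV_FRAMESKIP     # 4096
assert 730 * frames_per_version == 2_990_080        # checkpoint_000000730_2990080.pth
assert 978 * frames_per_version == 4_005_888        # the shutdown checkpoint saved below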
+[2024-07-03 22:01:53,969][52282] Updated weights for policy 0, policy_version 740 (0.0007) +[2024-07-03 22:01:55,412][52282] Updated weights for policy 0, policy_version 750 (0.0007) +[2024-07-03 22:01:56,890][52282] Updated weights for policy 0, policy_version 760 (0.0006) +[2024-07-03 22:01:57,543][50933] Fps is (10 sec: 28262.5, 60 sec: 28057.6, 300 sec: 27211.7). Total num frames: 3129344. Throughput: 0: 7028.6. Samples: 775692. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:01:57,544][50933] Avg episode reward: [(0, '4.802')] +[2024-07-03 22:01:58,332][52282] Updated weights for policy 0, policy_version 770 (0.0006) +[2024-07-03 22:01:59,764][52282] Updated weights for policy 0, policy_version 780 (0.0006) +[2024-07-03 22:02:01,218][52282] Updated weights for policy 0, policy_version 790 (0.0006) +[2024-07-03 22:02:02,543][50933] Fps is (10 sec: 27852.8, 60 sec: 28125.9, 300 sec: 27238.4). Total num frames: 3268608. Throughput: 0: 7022.9. Samples: 817794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:02:02,544][50933] Avg episode reward: [(0, '5.027')] +[2024-07-03 22:02:02,701][52282] Updated weights for policy 0, policy_version 800 (0.0006) +[2024-07-03 22:02:04,196][52282] Updated weights for policy 0, policy_version 810 (0.0006) +[2024-07-03 22:02:05,648][52282] Updated weights for policy 0, policy_version 820 (0.0006) +[2024-07-03 22:02:07,099][52282] Updated weights for policy 0, policy_version 830 (0.0006) +[2024-07-03 22:02:07,543][50933] Fps is (10 sec: 28262.3, 60 sec: 28125.8, 300 sec: 27295.8). Total num frames: 3411968. Throughput: 0: 7021.3. Samples: 838534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:02:07,543][50933] Avg episode reward: [(0, '4.713')] +[2024-07-03 22:02:08,556][52282] Updated weights for policy 0, policy_version 840 (0.0006) +[2024-07-03 22:02:10,015][52282] Updated weights for policy 0, policy_version 850 (0.0006) +[2024-07-03 22:02:11,468][52282] Updated weights for policy 0, policy_version 860 (0.0006) +[2024-07-03 22:02:12,543][50933] Fps is (10 sec: 28262.4, 60 sec: 28057.6, 300 sec: 27317.2). Total num frames: 3551232. Throughput: 0: 7036.9. Samples: 880872. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-07-03 22:02:12,543][50933] Avg episode reward: [(0, '4.729')] +[2024-07-03 22:02:12,923][52282] Updated weights for policy 0, policy_version 870 (0.0007) +[2024-07-03 22:02:14,392][52282] Updated weights for policy 0, policy_version 880 (0.0006) +[2024-07-03 22:02:15,852][52282] Updated weights for policy 0, policy_version 890 (0.0006) +[2024-07-03 22:02:17,339][52282] Updated weights for policy 0, policy_version 900 (0.0007) +[2024-07-03 22:02:17,543][50933] Fps is (10 sec: 27852.9, 60 sec: 28057.6, 300 sec: 27337.0). Total num frames: 3690496. Throughput: 0: 7021.6. Samples: 922616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:02:17,543][50933] Avg episode reward: [(0, '4.525')] +[2024-07-03 22:02:18,797][52282] Updated weights for policy 0, policy_version 910 (0.0007) +[2024-07-03 22:02:20,247][52282] Updated weights for policy 0, policy_version 920 (0.0006) +[2024-07-03 22:02:21,693][52282] Updated weights for policy 0, policy_version 930 (0.0006) +[2024-07-03 22:02:22,543][50933] Fps is (10 sec: 27852.3, 60 sec: 28125.8, 300 sec: 27355.4). Total num frames: 3829760. Throughput: 0: 7023.3. Samples: 943816. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-07-03 22:02:22,544][50933] Avg episode reward: [(0, '4.576')] +[2024-07-03 22:02:23,161][52282] Updated weights for policy 0, policy_version 940 (0.0007) +[2024-07-03 22:02:24,604][52282] Updated weights for policy 0, policy_version 950 (0.0006) +[2024-07-03 22:02:26,058][52282] Updated weights for policy 0, policy_version 960 (0.0006) +[2024-07-03 22:02:27,515][52282] Updated weights for policy 0, policy_version 970 (0.0006) +[2024-07-03 22:02:27,543][50933] Fps is (10 sec: 28262.6, 60 sec: 28125.9, 300 sec: 27400.9). Total num frames: 3973120. Throughput: 0: 7024.6. Samples: 986024. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-07-03 22:02:27,543][50933] Avg episode reward: [(0, '4.954')] +[2024-07-03 22:02:28,661][52267] Stopping Batcher_0... +[2024-07-03 22:02:28,661][52267] Loop batcher_evt_loop terminating... +[2024-07-03 22:02:28,661][52267] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-07-03 22:02:28,661][50933] Component Batcher_0 stopped! +[2024-07-03 22:02:28,670][52284] Stopping RolloutWorker_w4... +[2024-07-03 22:02:28,670][52286] Stopping RolloutWorker_w3... +[2024-07-03 22:02:28,670][52285] Stopping RolloutWorker_w5... +[2024-07-03 22:02:28,670][52284] Loop rollout_proc4_evt_loop terminating... +[2024-07-03 22:02:28,670][52286] Loop rollout_proc3_evt_loop terminating... +[2024-07-03 22:02:28,670][52285] Loop rollout_proc5_evt_loop terminating... +[2024-07-03 22:02:28,670][52281] Stopping RolloutWorker_w1... +[2024-07-03 22:02:28,670][52281] Loop rollout_proc1_evt_loop terminating... +[2024-07-03 22:02:28,670][52288] Stopping RolloutWorker_w6... +[2024-07-03 22:02:28,670][52287] Stopping RolloutWorker_w7... +[2024-07-03 22:02:28,670][52288] Loop rollout_proc6_evt_loop terminating... +[2024-07-03 22:02:28,670][50933] Component RolloutWorker_w4 stopped! +[2024-07-03 22:02:28,670][52283] Stopping RolloutWorker_w2... +[2024-07-03 22:02:28,670][52287] Loop rollout_proc7_evt_loop terminating... +[2024-07-03 22:02:28,671][52283] Loop rollout_proc2_evt_loop terminating... +[2024-07-03 22:02:28,671][52280] Stopping RolloutWorker_w0... +[2024-07-03 22:02:28,671][50933] Component RolloutWorker_w3 stopped! +[2024-07-03 22:02:28,671][52280] Loop rollout_proc0_evt_loop terminating... +[2024-07-03 22:02:28,671][50933] Component RolloutWorker_w5 stopped! +[2024-07-03 22:02:28,672][50933] Component RolloutWorker_w1 stopped! +[2024-07-03 22:02:28,673][50933] Component RolloutWorker_w6 stopped! +[2024-07-03 22:02:28,674][50933] Component RolloutWorker_w7 stopped! +[2024-07-03 22:02:28,674][50933] Component RolloutWorker_w2 stopped! +[2024-07-03 22:02:28,675][50933] Component RolloutWorker_w0 stopped! +[2024-07-03 22:02:28,688][52282] Weights refcount: 2 0 +[2024-07-03 22:02:28,690][52282] Stopping InferenceWorker_p0-w0... +[2024-07-03 22:02:28,690][52282] Loop inference_proc0-0_evt_loop terminating... +[2024-07-03 22:02:28,690][50933] Component InferenceWorker_p0-w0 stopped! +[2024-07-03 22:02:28,726][52267] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-07-03 22:02:28,812][52267] Stopping LearnerWorker_p0... +[2024-07-03 22:02:28,813][52267] Loop learner_proc0_evt_loop terminating... +[2024-07-03 22:02:28,812][50933] Component LearnerWorker_p0 stopped! +[2024-07-03 22:02:28,814][50933] Waiting for process learner_proc0 to stop... 
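Note: shutdown above runs in dependency order — Batcher first, then rollout workers, the inference worker, and last the learner — before the runner joins each child process (the "Waiting for process ... to join..." lines that follow). A generic stop-then-join sketch of that pattern; illustrative only, not Sample Factory's event-loop code:

def stop_and_join(stop_events, processes, join_timeout=30.0):
    """Signal each component's event loop to terminate, then join the child processes in order."""
    for stop_event in stop_events:   # batcher -> rollout workers -> inference worker -> learner
        stop_event.set()             # each loop logs "Loop ... terminating..." and exits
    for proc in processes:           # multiprocessing.Process objects assumed
        proc.join(timeout=join_timeout)   # "Waiting for process ... to join..."
        if proc.is_alive():
            proc.terminate()              # last resort for a hung worker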
+[2024-07-03 22:02:29,814][50933] Waiting for process inference_proc0-0 to join... +[2024-07-03 22:02:29,814][50933] Waiting for process rollout_proc0 to join... +[2024-07-03 22:02:29,815][50933] Waiting for process rollout_proc1 to join... +[2024-07-03 22:02:29,816][50933] Waiting for process rollout_proc2 to join... +[2024-07-03 22:02:29,817][50933] Waiting for process rollout_proc3 to join... +[2024-07-03 22:02:29,817][50933] Waiting for process rollout_proc4 to join... +[2024-07-03 22:02:29,817][50933] Waiting for process rollout_proc5 to join... +[2024-07-03 22:02:29,818][50933] Waiting for process rollout_proc6 to join... +[2024-07-03 22:02:29,818][50933] Waiting for process rollout_proc7 to join... +[2024-07-03 22:02:29,818][50933] Batcher 0 profile tree view: +batching: 5.8192, releasing_batches: 0.0206 +[2024-07-03 22:02:29,818][50933] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 2.6022 +update_model: 2.1613 + weight_update: 0.0006 +one_step: 0.0022 + handle_policy_step: 133.9007 + deserialize: 5.6377, stack: 0.7367, obs_to_device_normalize: 28.7127, forward: 73.9404, send_messages: 6.5120 + prepare_outputs: 13.6292 + to_cpu: 7.9695 +[2024-07-03 22:02:29,819][50933] Learner 0 profile tree view: +misc: 0.0036, prepare_batch: 8.9830 +train: 20.5424 + epoch_init: 0.0030, minibatch_init: 0.0042, losses_postprocess: 0.1365, kl_divergence: 0.1275, after_optimizer: 8.7303 + calculate_losses: 7.5125 + losses_init: 0.0018, forward_head: 0.4894, bptt_initial: 5.4018, tail: 0.3618, advantages_returns: 0.0883, losses: 0.4649 + bptt: 0.6121 + bptt_forward_core: 0.5857 + update: 3.8011 + clip: 0.4336 +[2024-07-03 22:02:29,819][50933] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0687, enqueue_policy_requests: 4.1133, env_step: 59.0065, overhead: 4.4516, complete_rollouts: 0.1292 +save_policy_outputs: 3.8450 + split_output_tensors: 1.8805 +[2024-07-03 22:02:29,819][50933] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.0727, enqueue_policy_requests: 4.1815, env_step: 55.6920, overhead: 4.5741, complete_rollouts: 0.1285 +save_policy_outputs: 3.9624 + split_output_tensors: 1.9538 +[2024-07-03 22:02:29,820][50933] Loop Runner_EvtLoop terminating... +[2024-07-03 22:02:29,820][50933] Runner profile tree view: +main_loop: 152.1079 +[2024-07-03 22:02:29,820][50933] Collected {0: 4005888}, FPS: 26335.8 +[2024-07-03 22:04:44,342][50933] Environment doom_basic already registered, overwriting... +[2024-07-03 22:04:44,343][50933] Environment doom_two_colors_easy already registered, overwriting... +[2024-07-03 22:04:44,343][50933] Environment doom_two_colors_hard already registered, overwriting... +[2024-07-03 22:04:44,344][50933] Environment doom_dm already registered, overwriting... +[2024-07-03 22:04:44,344][50933] Environment doom_dwango5 already registered, overwriting... +[2024-07-03 22:04:44,344][50933] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-07-03 22:04:44,345][50933] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-07-03 22:04:44,345][50933] Environment doom_my_way_home already registered, overwriting... +[2024-07-03 22:04:44,345][50933] Environment doom_deadly_corridor already registered, overwriting... +[2024-07-03 22:04:44,345][50933] Environment doom_defend_the_center already registered, overwriting... +[2024-07-03 22:04:44,346][50933] Environment doom_defend_the_line already registered, overwriting... 
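Note: the runner summary above closes the first 4M-frame run. The headline FPS is pure arithmetic over the main loop's wall time, and dividing by the 8 workers x 4 envs gives per-environment throughput:

total_frames = 4_005_888          # "Collected {0: 4005888}"
main_loop_seconds = 152.1079      # "main_loop: 152.1079"

fps = total_frames / main_loop_seconds
print(round(fps, 1))              # 26335.8, matching the reported FPS

per_env_fps = fps / (8 * 4)       # num_workers=8 x num_envs_per_worker=4 -> ~823 env frames/s each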
+[2024-07-03 22:04:44,346][50933] Environment doom_health_gathering already registered, overwriting... +[2024-07-03 22:04:44,346][50933] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-07-03 22:04:44,346][50933] Environment doom_battle already registered, overwriting... +[2024-07-03 22:04:44,347][50933] Environment doom_battle2 already registered, overwriting... +[2024-07-03 22:04:44,347][50933] Environment doom_duel_bots already registered, overwriting... +[2024-07-03 22:04:44,347][50933] Environment doom_deathmatch_bots already registered, overwriting... +[2024-07-03 22:04:44,347][50933] Environment doom_duel already registered, overwriting... +[2024-07-03 22:04:44,348][50933] Environment doom_deathmatch_full already registered, overwriting... +[2024-07-03 22:04:44,348][50933] Environment doom_benchmark already registered, overwriting... +[2024-07-03 22:04:44,348][50933] register_encoder_factory: +[2024-07-03 22:04:44,353][50933] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json +[2024-07-03 22:04:44,354][50933] Overriding arg 'train_for_env_steps' with value 20000000 passed from command line +[2024-07-03 22:04:44,358][50933] Experiment dir /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment already exists! +[2024-07-03 22:04:44,358][50933] Resuming existing experiment from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment... +[2024-07-03 22:04:44,359][50933] Weights and Biases integration disabled +[2024-07-03 22:04:44,360][50933] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-07-03 22:04:46,643][50933] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=20000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 
512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-07-03 22:04:46,644][50933] Saving configuration to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json... +[2024-07-03 22:04:46,645][50933] Rollout worker 0 uses device cpu +[2024-07-03 22:04:46,645][50933] Rollout worker 1 uses device cpu +[2024-07-03 22:04:46,645][50933] Rollout worker 2 uses device cpu +[2024-07-03 22:04:46,646][50933] Rollout worker 3 uses device cpu +[2024-07-03 22:04:46,646][50933] Rollout worker 4 uses device cpu +[2024-07-03 22:04:46,646][50933] Rollout worker 5 uses device cpu +[2024-07-03 22:04:46,647][50933] Rollout worker 6 uses device cpu +[2024-07-03 22:04:46,647][50933] Rollout worker 7 uses device cpu +[2024-07-03 22:04:46,680][50933] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-07-03 22:04:46,680][50933] InferenceWorker_p0-w0: min num requests: 2 +[2024-07-03 22:04:46,708][50933] Starting all processes... +[2024-07-03 22:04:46,708][50933] Starting process learner_proc0 +[2024-07-03 22:04:46,758][50933] Starting all processes... 
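Note: the config dump above records the original invocation (command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000), with train_for_env_steps now overridden to 20000000 for the resumed run. A sketch of how such a run is typically launched through Sample Factory's ViZDoom example helpers, mirroring the course notebook; treat the exact import paths as assumptions:

import functools

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec

def register_vizdoom_envs():
    # source of the "Environment ... already registered, overwriting..." lines on resume
    for env_spec in DOOM_ENVS:
        register_env(env_spec.name, functools.partial(make_doom_env_from_spec, env_spec))

def main(argv):
    register_vizdoom_envs()
    parser, _ = parse_sf_args(argv=argv, evaluation=False)
    add_doom_env_args(parser)
    doom_override_defaults(parser)
    return run_rl(parse_full_cfg(parser, argv))

# resuming with a higher frame budget, as in this log:
main(["--env=doom_health_gathering_supreme", "--num_workers=8",
      "--num_envs_per_worker=4", "--train_for_env_steps=20000000"])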
+[2024-07-03 22:04:46,761][50933] Starting process inference_proc0-0 +[2024-07-03 22:04:46,762][50933] Starting process rollout_proc0 +[2024-07-03 22:04:46,762][50933] Starting process rollout_proc1 +[2024-07-03 22:04:46,762][50933] Starting process rollout_proc2 +[2024-07-03 22:04:46,762][50933] Starting process rollout_proc3 +[2024-07-03 22:04:46,762][50933] Starting process rollout_proc4 +[2024-07-03 22:04:46,762][50933] Starting process rollout_proc5 +[2024-07-03 22:04:46,762][50933] Starting process rollout_proc6 +[2024-07-03 22:04:46,763][50933] Starting process rollout_proc7 +[2024-07-03 22:04:49,297][53888] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-07-03 22:04:49,297][53888] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-07-03 22:04:49,330][53907] Worker 6 uses CPU cores [12, 13] +[2024-07-03 22:04:49,337][53904] Worker 1 uses CPU cores [2, 3] +[2024-07-03 22:04:49,354][53901] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-07-03 22:04:49,355][53901] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-07-03 22:04:49,413][53903] Worker 2 uses CPU cores [4, 5] +[2024-07-03 22:04:49,417][53905] Worker 3 uses CPU cores [6, 7] +[2024-07-03 22:04:49,638][53908] Worker 5 uses CPU cores [10, 11] +[2024-07-03 22:04:49,666][53909] Worker 7 uses CPU cores [14, 15] +[2024-07-03 22:04:49,681][53906] Worker 4 uses CPU cores [8, 9] +[2024-07-03 22:04:49,681][53902] Worker 0 uses CPU cores [0, 1] +[2024-07-03 22:04:50,920][53888] Num visible devices: 1 +[2024-07-03 22:04:50,921][53901] Num visible devices: 1 +[2024-07-03 22:04:50,945][53888] Starting seed is not provided +[2024-07-03 22:04:50,945][53888] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-07-03 22:04:50,945][53888] Initializing actor-critic model on device cuda:0 +[2024-07-03 22:04:50,945][53888] RunningMeanStd input shape: (3, 72, 128) +[2024-07-03 22:04:50,946][53888] RunningMeanStd input shape: (1,) +[2024-07-03 22:04:50,953][53888] ConvEncoder: input_channels=3 +[2024-07-03 22:04:51,007][53888] Conv encoder output size: 512 +[2024-07-03 22:04:51,007][53888] Policy head output size: 512 +[2024-07-03 22:04:51,015][53888] Created Actor Critic model with architecture: +[2024-07-03 22:04:51,015][53888] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): 
Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-07-03 22:04:51,104][53888] Using optimizer +[2024-07-03 22:04:51,607][53888] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-07-03 22:04:51,635][53888] Loading model from checkpoint +[2024-07-03 22:04:51,636][53888] Loaded experiment state at self.train_step=978, self.env_steps=4005888 +[2024-07-03 22:04:51,636][53888] Initialized policy 0 weights for model version 978 +[2024-07-03 22:04:51,637][53888] LearnerWorker_p0 finished initialization! +[2024-07-03 22:04:51,637][53888] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-07-03 22:04:51,693][53901] RunningMeanStd input shape: (3, 72, 128) +[2024-07-03 22:04:51,693][53901] RunningMeanStd input shape: (1,) +[2024-07-03 22:04:51,701][53901] ConvEncoder: input_channels=3 +[2024-07-03 22:04:51,754][53901] Conv encoder output size: 512 +[2024-07-03 22:04:51,754][53901] Policy head output size: 512 +[2024-07-03 22:04:51,787][50933] Inference worker 0-0 is ready! +[2024-07-03 22:04:51,787][50933] All inference workers are ready! Signal rollout workers to start! +[2024-07-03 22:04:51,813][53904] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:04:51,814][53906] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:04:51,814][53903] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:04:51,814][53902] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:04:51,814][53909] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:04:51,814][53907] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:04:51,815][53908] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:04:51,815][53905] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-07-03 22:04:52,303][53906] Decorrelating experience for 0 frames... +[2024-07-03 22:04:52,303][53903] Decorrelating experience for 0 frames... +[2024-07-03 22:04:52,303][53907] Decorrelating experience for 0 frames... +[2024-07-03 22:04:52,304][53904] Decorrelating experience for 0 frames... +[2024-07-03 22:04:52,304][53902] Decorrelating experience for 0 frames... +[2024-07-03 22:04:52,304][53909] Decorrelating experience for 0 frames... +[2024-07-03 22:04:52,458][53906] Decorrelating experience for 32 frames... +[2024-07-03 22:04:52,459][53903] Decorrelating experience for 32 frames... +[2024-07-03 22:04:52,459][53902] Decorrelating experience for 32 frames... +[2024-07-03 22:04:52,459][53904] Decorrelating experience for 32 frames... +[2024-07-03 22:04:52,459][53907] Decorrelating experience for 32 frames... +[2024-07-03 22:04:52,460][53909] Decorrelating experience for 32 frames... +[2024-07-03 22:04:52,510][53908] Decorrelating experience for 0 frames... +[2024-07-03 22:04:52,659][53905] Decorrelating experience for 0 frames... +[2024-07-03 22:04:52,660][53903] Decorrelating experience for 64 frames... +[2024-07-03 22:04:52,661][53906] Decorrelating experience for 64 frames... +[2024-07-03 22:04:52,663][53908] Decorrelating experience for 32 frames... +[2024-07-03 22:04:52,663][53904] Decorrelating experience for 64 frames... +[2024-07-03 22:04:52,663][53902] Decorrelating experience for 64 frames... +[2024-07-03 22:04:52,815][53905] Decorrelating experience for 32 frames... +[2024-07-03 22:04:52,815][53909] Decorrelating experience for 64 frames... 
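Note: unlike the first run, the learner above finds and restores the shutdown checkpoint (load_checkpoint_kind=latest), resuming at train_step=978 / env_steps=4005888. Because the filenames are zero-padded by policy version, "latest" reduces to a lexicographic sort; a sketch with assumed state-dict keys:

from pathlib import Path

import torch

def load_latest_checkpoint(checkpoint_dir, model):
    """Pick the highest-versioned checkpoint_<version>_<frames>.pth and restore the model."""
    paths = sorted(Path(checkpoint_dir).glob("checkpoint_*.pth"))  # zero-padded names sort correctly
    latest = paths[-1]  # e.g. checkpoint_000000978_4005888.pth
    state = torch.load(latest, map_location="cpu")
    model.load_state_dict(state["model"])                 # key name assumed
    return state.get("env_steps"), state.get("train_step")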
+[2024-07-03 22:04:52,832][53903] Decorrelating experience for 96 frames... +[2024-07-03 22:04:52,833][53906] Decorrelating experience for 96 frames... +[2024-07-03 22:04:52,977][53907] Decorrelating experience for 64 frames... +[2024-07-03 22:04:52,987][53902] Decorrelating experience for 96 frames... +[2024-07-03 22:04:52,988][53904] Decorrelating experience for 96 frames... +[2024-07-03 22:04:52,988][53909] Decorrelating experience for 96 frames... +[2024-07-03 22:04:53,015][53905] Decorrelating experience for 64 frames... +[2024-07-03 22:04:53,154][53907] Decorrelating experience for 96 frames... +[2024-07-03 22:04:53,174][53908] Decorrelating experience for 64 frames... +[2024-07-03 22:04:53,192][53905] Decorrelating experience for 96 frames... +[2024-07-03 22:04:53,347][53908] Decorrelating experience for 96 frames... +[2024-07-03 22:04:53,673][53888] Signal inference workers to stop experience collection... +[2024-07-03 22:04:53,677][53901] InferenceWorker_p0-w0: stopping experience collection +[2024-07-03 22:04:54,360][50933] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-07-03 22:04:54,362][50933] Avg episode reward: [(0, '2.608')] +[2024-07-03 22:04:54,753][53888] Signal inference workers to resume experience collection... +[2024-07-03 22:04:54,753][53901] InferenceWorker_p0-w0: resuming experience collection +[2024-07-03 22:04:56,130][53901] Updated weights for policy 0, policy_version 988 (0.0093) +[2024-07-03 22:04:57,648][53901] Updated weights for policy 0, policy_version 998 (0.0007) +[2024-07-03 22:04:59,130][53901] Updated weights for policy 0, policy_version 1008 (0.0007) +[2024-07-03 22:04:59,360][50933] Fps is (10 sec: 25395.3, 60 sec: 25395.3, 300 sec: 25395.3). Total num frames: 4132864. Throughput: 0: 4490.0. Samples: 22450. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-07-03 22:04:59,361][50933] Avg episode reward: [(0, '4.436')] +[2024-07-03 22:05:00,608][53901] Updated weights for policy 0, policy_version 1018 (0.0007) +[2024-07-03 22:05:02,081][53901] Updated weights for policy 0, policy_version 1028 (0.0006) +[2024-07-03 22:05:03,533][53901] Updated weights for policy 0, policy_version 1038 (0.0006) +[2024-07-03 22:05:04,360][50933] Fps is (10 sec: 26624.2, 60 sec: 26624.2, 300 sec: 26624.2). Total num frames: 4272128. Throughput: 0: 6432.8. Samples: 64328. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-07-03 22:05:04,361][50933] Avg episode reward: [(0, '4.777')] +[2024-07-03 22:05:04,988][53901] Updated weights for policy 0, policy_version 1048 (0.0006) +[2024-07-03 22:05:06,508][53901] Updated weights for policy 0, policy_version 1058 (0.0007) +[2024-07-03 22:05:06,673][50933] Heartbeat connected on Batcher_0 +[2024-07-03 22:05:06,676][50933] Heartbeat connected on LearnerWorker_p0 +[2024-07-03 22:05:06,682][50933] Heartbeat connected on InferenceWorker_p0-w0 +[2024-07-03 22:05:06,686][50933] Heartbeat connected on RolloutWorker_w0 +[2024-07-03 22:05:06,688][50933] Heartbeat connected on RolloutWorker_w1 +[2024-07-03 22:05:06,691][50933] Heartbeat connected on RolloutWorker_w2 +[2024-07-03 22:05:06,695][50933] Heartbeat connected on RolloutWorker_w3 +[2024-07-03 22:05:06,698][50933] Heartbeat connected on RolloutWorker_w4 +[2024-07-03 22:05:06,701][50933] Heartbeat connected on RolloutWorker_w5 +[2024-07-03 22:05:06,704][50933] Heartbeat connected on RolloutWorker_w6 +[2024-07-03 22:05:06,708][50933] Heartbeat connected on RolloutWorker_w7 +[2024-07-03 22:05:08,025][53901] Updated weights for policy 0, policy_version 1068 (0.0007) +[2024-07-03 22:05:09,360][50933] Fps is (10 sec: 27853.1, 60 sec: 27033.9, 300 sec: 27033.9). Total num frames: 4411392. Throughput: 0: 5656.7. Samples: 84850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-07-03 22:05:09,361][50933] Avg episode reward: [(0, '4.608')] +[2024-07-03 22:05:09,473][53901] Updated weights for policy 0, policy_version 1078 (0.0007) +[2024-07-03 22:05:10,892][53901] Updated weights for policy 0, policy_version 1088 (0.0006) +[2024-07-03 22:05:12,363][53901] Updated weights for policy 0, policy_version 1098 (0.0007) +[2024-07-03 22:05:13,807][53901] Updated weights for policy 0, policy_version 1108 (0.0007) +[2024-07-03 22:05:14,360][50933] Fps is (10 sec: 27853.2, 60 sec: 27238.7, 300 sec: 27238.7). Total num frames: 4550656. Throughput: 0: 6346.6. Samples: 126930. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-07-03 22:05:14,361][50933] Avg episode reward: [(0, '4.786')] +[2024-07-03 22:05:15,243][53901] Updated weights for policy 0, policy_version 1118 (0.0006) +[2024-07-03 22:05:16,642][53901] Updated weights for policy 0, policy_version 1128 (0.0007) +[2024-07-03 22:05:18,053][53901] Updated weights for policy 0, policy_version 1138 (0.0007) +[2024-07-03 22:05:19,360][50933] Fps is (10 sec: 28671.3, 60 sec: 27688.9, 300 sec: 27688.9). Total num frames: 4698112. Throughput: 0: 6805.3. Samples: 170132. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-07-03 22:05:19,361][50933] Avg episode reward: [(0, '4.668')] +[2024-07-03 22:05:19,479][53901] Updated weights for policy 0, policy_version 1148 (0.0007) +[2024-07-03 22:05:20,871][53901] Updated weights for policy 0, policy_version 1158 (0.0007) +[2024-07-03 22:05:22,292][53901] Updated weights for policy 0, policy_version 1168 (0.0007) +[2024-07-03 22:05:23,685][53901] Updated weights for policy 0, policy_version 1178 (0.0006) +[2024-07-03 22:05:24,360][50933] Fps is (10 sec: 29081.6, 60 sec: 27853.0, 300 sec: 27853.0). Total num frames: 4841472. Throughput: 0: 6398.4. Samples: 191950. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:05:24,361][50933] Avg episode reward: [(0, '4.754')]
+[2024-07-03 22:05:25,082][53901] Updated weights for policy 0, policy_version 1188 (0.0007)
+[2024-07-03 22:05:26,578][53901] Updated weights for policy 0, policy_version 1198 (0.0007)
+[2024-07-03 22:05:27,975][53901] Updated weights for policy 0, policy_version 1208 (0.0007)
+[2024-07-03 22:05:29,360][50933] Fps is (10 sec: 28673.0, 60 sec: 27970.0, 300 sec: 27970.0). Total num frames: 4984832. Throughput: 0: 6723.5. Samples: 235322. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:05:29,361][50933] Avg episode reward: [(0, '4.999')]
+[2024-07-03 22:05:29,371][53901] Updated weights for policy 0, policy_version 1218 (0.0006)
+[2024-07-03 22:05:30,794][53901] Updated weights for policy 0, policy_version 1228 (0.0007)
+[2024-07-03 22:05:32,186][53901] Updated weights for policy 0, policy_version 1238 (0.0007)
+[2024-07-03 22:05:33,598][53901] Updated weights for policy 0, policy_version 1248 (0.0007)
+[2024-07-03 22:05:34,360][50933] Fps is (10 sec: 29081.4, 60 sec: 28160.1, 300 sec: 28160.1). Total num frames: 5132288. Throughput: 0: 6973.1. Samples: 278922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:05:34,361][50933] Avg episode reward: [(0, '4.714')]
+[2024-07-03 22:05:35,011][53901] Updated weights for policy 0, policy_version 1258 (0.0006)
+[2024-07-03 22:05:36,522][53901] Updated weights for policy 0, policy_version 1268 (0.0007)
+[2024-07-03 22:05:38,034][53901] Updated weights for policy 0, policy_version 1278 (0.0007)
+[2024-07-03 22:05:39,360][50933] Fps is (10 sec: 28671.8, 60 sec: 28126.0, 300 sec: 28126.0). Total num frames: 5271552. Throughput: 0: 6662.0. Samples: 299788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:05:39,361][50933] Avg episode reward: [(0, '4.738')]
+[2024-07-03 22:05:39,492][53901] Updated weights for policy 0, policy_version 1288 (0.0006)
+[2024-07-03 22:05:40,910][53901] Updated weights for policy 0, policy_version 1298 (0.0006)
+[2024-07-03 22:05:42,327][53901] Updated weights for policy 0, policy_version 1308 (0.0007)
+[2024-07-03 22:05:43,778][53901] Updated weights for policy 0, policy_version 1318 (0.0007)
+[2024-07-03 22:05:44,360][50933] Fps is (10 sec: 28262.6, 60 sec: 28180.6, 300 sec: 28180.6). Total num frames: 5414912. Throughput: 0: 7104.8. Samples: 342164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:05:44,361][50933] Avg episode reward: [(0, '4.645')]
+[2024-07-03 22:05:45,215][53901] Updated weights for policy 0, policy_version 1328 (0.0007)
+[2024-07-03 22:05:46,648][53901] Updated weights for policy 0, policy_version 1338 (0.0007)
+[2024-07-03 22:05:48,076][53901] Updated weights for policy 0, policy_version 1348 (0.0007)
+[2024-07-03 22:05:49,360][50933] Fps is (10 sec: 28671.9, 60 sec: 28225.2, 300 sec: 28225.2). Total num frames: 5558272. Throughput: 0: 7125.7. Samples: 384984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:05:49,361][50933] Avg episode reward: [(0, '4.433')]
+[2024-07-03 22:05:49,491][53901] Updated weights for policy 0, policy_version 1358 (0.0007)
+[2024-07-03 22:05:50,931][53901] Updated weights for policy 0, policy_version 1368 (0.0007)
+[2024-07-03 22:05:52,384][53901] Updated weights for policy 0, policy_version 1378 (0.0007)
+[2024-07-03 22:05:53,842][53901] Updated weights for policy 0, policy_version 1388 (0.0007)
+[2024-07-03 22:05:54,360][50933] Fps is (10 sec: 28262.2, 60 sec: 28194.2, 300 sec: 28194.2). Total num frames: 5697536. Throughput: 0: 7145.2. Samples: 406382. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:05:54,361][50933] Avg episode reward: [(0, '4.702')]
+[2024-07-03 22:05:55,271][53901] Updated weights for policy 0, policy_version 1398 (0.0007)
+[2024-07-03 22:05:56,713][53901] Updated weights for policy 0, policy_version 1408 (0.0007)
+[2024-07-03 22:05:58,146][53901] Updated weights for policy 0, policy_version 1418 (0.0006)
+[2024-07-03 22:05:59,360][50933] Fps is (10 sec: 28262.5, 60 sec: 28467.3, 300 sec: 28231.0). Total num frames: 5840896. Throughput: 0: 7158.9. Samples: 449082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:05:59,361][50933] Avg episode reward: [(0, '4.579')]
+[2024-07-03 22:05:59,575][53901] Updated weights for policy 0, policy_version 1428 (0.0006)
+[2024-07-03 22:06:01,009][53901] Updated weights for policy 0, policy_version 1438 (0.0007)
+[2024-07-03 22:06:02,455][53901] Updated weights for policy 0, policy_version 1448 (0.0007)
+[2024-07-03 22:06:03,876][53901] Updated weights for policy 0, policy_version 1458 (0.0007)
+[2024-07-03 22:06:04,360][50933] Fps is (10 sec: 28671.9, 60 sec: 28535.5, 300 sec: 28262.4). Total num frames: 5984256. Throughput: 0: 7148.5. Samples: 491812. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:06:04,361][50933] Avg episode reward: [(0, '4.630')]
+[2024-07-03 22:06:05,306][53901] Updated weights for policy 0, policy_version 1468 (0.0007)
+[2024-07-03 22:06:06,742][53901] Updated weights for policy 0, policy_version 1478 (0.0007)
+[2024-07-03 22:06:08,179][53901] Updated weights for policy 0, policy_version 1488 (0.0007)
+[2024-07-03 22:06:09,360][50933] Fps is (10 sec: 28672.0, 60 sec: 28603.8, 300 sec: 28289.8). Total num frames: 6127616. Throughput: 0: 7139.0. Samples: 513204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:06:09,361][50933] Avg episode reward: [(0, '4.589')]
+[2024-07-03 22:06:09,619][53901] Updated weights for policy 0, policy_version 1498 (0.0007)
+[2024-07-03 22:06:11,054][53901] Updated weights for policy 0, policy_version 1508 (0.0007)
+[2024-07-03 22:06:12,487][53901] Updated weights for policy 0, policy_version 1518 (0.0007)
+[2024-07-03 22:06:13,917][53901] Updated weights for policy 0, policy_version 1528 (0.0007)
+[2024-07-03 22:06:14,360][50933] Fps is (10 sec: 28672.1, 60 sec: 28672.0, 300 sec: 28313.6). Total num frames: 6270976. Throughput: 0: 7121.4. Samples: 555784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-03 22:06:14,361][50933] Avg episode reward: [(0, '4.532')]
+[2024-07-03 22:06:15,351][53901] Updated weights for policy 0, policy_version 1538 (0.0006)
+[2024-07-03 22:06:16,802][53901] Updated weights for policy 0, policy_version 1548 (0.0007)
+[2024-07-03 22:06:18,239][53901] Updated weights for policy 0, policy_version 1558 (0.0007)
+[2024-07-03 22:06:19,360][50933] Fps is (10 sec: 28262.4, 60 sec: 28535.6, 300 sec: 28286.6). Total num frames: 6410240. Throughput: 0: 7104.6. Samples: 598628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:06:19,361][50933] Avg episode reward: [(0, '4.511')]
+[2024-07-03 22:06:19,669][53901] Updated weights for policy 0, policy_version 1568 (0.0007)
+[2024-07-03 22:06:21,112][53901] Updated weights for policy 0, policy_version 1578 (0.0007)
+[2024-07-03 22:06:22,553][53901] Updated weights for policy 0, policy_version 1588 (0.0007)
+[2024-07-03 22:06:23,988][53901] Updated weights for policy 0, policy_version 1598 (0.0007)
+[2024-07-03 22:06:24,360][50933] Fps is (10 sec: 28262.6, 60 sec: 28535.5, 300 sec: 28308.0). Total num frames: 6553600. Throughput: 0: 7117.1. Samples: 620056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:06:24,361][50933] Avg episode reward: [(0, '4.566')]
+[2024-07-03 22:06:25,419][53901] Updated weights for policy 0, policy_version 1608 (0.0007)
+[2024-07-03 22:06:26,854][53901] Updated weights for policy 0, policy_version 1618 (0.0007)
+[2024-07-03 22:06:28,279][53901] Updated weights for policy 0, policy_version 1628 (0.0007)
+[2024-07-03 22:06:29,360][50933] Fps is (10 sec: 28671.8, 60 sec: 28535.4, 300 sec: 28327.1). Total num frames: 6696960. Throughput: 0: 7126.3. Samples: 662846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:06:29,361][50933] Avg episode reward: [(0, '4.545')]
+[2024-07-03 22:06:29,691][53901] Updated weights for policy 0, policy_version 1638 (0.0007)
+[2024-07-03 22:06:31,130][53901] Updated weights for policy 0, policy_version 1648 (0.0006)
+[2024-07-03 22:06:32,574][53901] Updated weights for policy 0, policy_version 1658 (0.0006)
+[2024-07-03 22:06:34,006][53901] Updated weights for policy 0, policy_version 1668 (0.0007)
+[2024-07-03 22:06:34,360][50933] Fps is (10 sec: 28672.0, 60 sec: 28467.2, 300 sec: 28344.4). Total num frames: 6840320. Throughput: 0: 7129.4. Samples: 705806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-03 22:06:34,361][50933] Avg episode reward: [(0, '4.574')]
+[2024-07-03 22:06:35,439][53901] Updated weights for policy 0, policy_version 1678 (0.0007)
+[2024-07-03 22:06:36,875][53901] Updated weights for policy 0, policy_version 1688 (0.0007)
+[2024-07-03 22:06:38,325][53901] Updated weights for policy 0, policy_version 1698 (0.0007)
+[2024-07-03 22:06:39,360][50933] Fps is (10 sec: 28672.2, 60 sec: 28535.5, 300 sec: 28360.0). Total num frames: 6983680. Throughput: 0: 7128.4. Samples: 727158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:06:39,361][50933] Avg episode reward: [(0, '4.571')]
+[2024-07-03 22:06:39,798][53901] Updated weights for policy 0, policy_version 1708 (0.0007)
+[2024-07-03 22:06:41,278][53901] Updated weights for policy 0, policy_version 1718 (0.0007)
+[2024-07-03 22:06:42,746][53901] Updated weights for policy 0, policy_version 1728 (0.0007)
+[2024-07-03 22:06:44,284][53901] Updated weights for policy 0, policy_version 1738 (0.0007)
+[2024-07-03 22:06:44,360][50933] Fps is (10 sec: 27852.6, 60 sec: 28398.9, 300 sec: 28299.7). Total num frames: 7118848. Throughput: 0: 7108.5. Samples: 768966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-03 22:06:44,361][50933] Avg episode reward: [(0, '4.881')]
+[2024-07-03 22:06:44,364][53888] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001738_7118848.pth...
+[2024-07-03 22:06:44,426][53888] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth
+[2024-07-03 22:06:45,738][53901] Updated weights for policy 0, policy_version 1748 (0.0007)
+[2024-07-03 22:06:47,181][53901] Updated weights for policy 0, policy_version 1758 (0.0007)
+[2024-07-03 22:06:48,619][53901] Updated weights for policy 0, policy_version 1768 (0.0007)
+[2024-07-03 22:06:49,360][50933] Fps is (10 sec: 27852.6, 60 sec: 28398.9, 300 sec: 28315.9). Total num frames: 7262208. Throughput: 0: 7089.4. Samples: 810834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-03 22:06:49,361][50933] Avg episode reward: [(0, '4.822')]
+[2024-07-03 22:06:50,080][53901] Updated weights for policy 0, policy_version 1778 (0.0006)
+[2024-07-03 22:06:51,521][53901] Updated weights for policy 0, policy_version 1788 (0.0007)
+[2024-07-03 22:06:52,959][53901] Updated weights for policy 0, policy_version 1798 (0.0007)
+[2024-07-03 22:06:54,360][50933] Fps is (10 sec: 28262.4, 60 sec: 28398.9, 300 sec: 28296.6). Total num frames: 7401472. Throughput: 0: 7088.8. Samples: 832200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:06:54,361][50933] Avg episode reward: [(0, '4.752')]
+[2024-07-03 22:06:54,401][53901] Updated weights for policy 0, policy_version 1808 (0.0007)
+[2024-07-03 22:06:55,836][53901] Updated weights for policy 0, policy_version 1818 (0.0007)
+[2024-07-03 22:06:57,263][53901] Updated weights for policy 0, policy_version 1828 (0.0007)
+[2024-07-03 22:06:58,692][53901] Updated weights for policy 0, policy_version 1838 (0.0006)
+[2024-07-03 22:06:59,360][50933] Fps is (10 sec: 28262.4, 60 sec: 28398.9, 300 sec: 28311.6). Total num frames: 7544832. Throughput: 0: 7095.3. Samples: 875074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:06:59,361][50933] Avg episode reward: [(0, '4.665')]
+[2024-07-03 22:07:00,139][53901] Updated weights for policy 0, policy_version 1848 (0.0006)
+[2024-07-03 22:07:01,635][53901] Updated weights for policy 0, policy_version 1858 (0.0007)
+[2024-07-03 22:07:03,303][53901] Updated weights for policy 0, policy_version 1868 (0.0007)
+[2024-07-03 22:07:04,360][50933] Fps is (10 sec: 27442.6, 60 sec: 28194.0, 300 sec: 28230.9). Total num frames: 7675904. Throughput: 0: 7037.5. Samples: 915316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:07:04,361][50933] Avg episode reward: [(0, '4.394')]
+[2024-07-03 22:07:04,864][53901] Updated weights for policy 0, policy_version 1878 (0.0007)
+[2024-07-03 22:07:06,415][53901] Updated weights for policy 0, policy_version 1888 (0.0007)
+[2024-07-03 22:07:07,927][53901] Updated weights for policy 0, policy_version 1898 (0.0007)
+[2024-07-03 22:07:09,360][50933] Fps is (10 sec: 26623.5, 60 sec: 28057.5, 300 sec: 28186.5). Total num frames: 7811072. Throughput: 0: 7005.7. Samples: 935312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:07:09,361][50933] Avg episode reward: [(0, '4.716')]
+[2024-07-03 22:07:09,418][53901] Updated weights for policy 0, policy_version 1908 (0.0007)
+[2024-07-03 22:07:10,924][53901] Updated weights for policy 0, policy_version 1918 (0.0007)
+[2024-07-03 22:07:12,418][53901] Updated weights for policy 0, policy_version 1928 (0.0007)
+[2024-07-03 22:07:13,924][53901] Updated weights for policy 0, policy_version 1938 (0.0007)
+[2024-07-03 22:07:14,360][50933] Fps is (10 sec: 27443.7, 60 sec: 27989.3, 300 sec: 28174.6). Total num frames: 7950336. Throughput: 0: 6964.6. Samples: 976254. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-07-03 22:07:14,361][50933] Avg episode reward: [(0, '4.781')]
+[2024-07-03 22:07:15,366][53901] Updated weights for policy 0, policy_version 1948 (0.0007)
+[2024-07-03 22:07:16,805][53901] Updated weights for policy 0, policy_version 1958 (0.0007)
+[2024-07-03 22:07:18,233][53901] Updated weights for policy 0, policy_version 1968 (0.0007)
+[2024-07-03 22:07:19,360][50933] Fps is (10 sec: 27852.8, 60 sec: 27989.2, 300 sec: 28163.5). Total num frames: 8089600. Throughput: 0: 6952.3. Samples: 1018662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:07:19,361][50933] Avg episode reward: [(0, '5.170')]
+[2024-07-03 22:07:19,704][53901] Updated weights for policy 0, policy_version 1978 (0.0006)
+[2024-07-03 22:07:21,141][53901] Updated weights for policy 0, policy_version 1988 (0.0007)
+[2024-07-03 22:07:22,582][53901] Updated weights for policy 0, policy_version 1998 (0.0007)
+[2024-07-03 22:07:24,020][53901] Updated weights for policy 0, policy_version 2008 (0.0007)
+[2024-07-03 22:07:24,360][50933] Fps is (10 sec: 28262.4, 60 sec: 27989.3, 300 sec: 28180.5). Total num frames: 8232960. Throughput: 0: 6947.0. Samples: 1039772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:07:24,361][50933] Avg episode reward: [(0, '5.006')]
+[2024-07-03 22:07:25,476][53901] Updated weights for policy 0, policy_version 2018 (0.0007)
+[2024-07-03 22:07:26,910][53901] Updated weights for policy 0, policy_version 2028 (0.0007)
+[2024-07-03 22:07:28,352][53901] Updated weights for policy 0, policy_version 2038 (0.0007)
+[2024-07-03 22:07:29,360][50933] Fps is (10 sec: 28262.9, 60 sec: 27921.1, 300 sec: 28169.9). Total num frames: 8372224. Throughput: 0: 6961.8. Samples: 1082246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-03 22:07:29,361][50933] Avg episode reward: [(0, '4.984')]
+[2024-07-03 22:07:29,804][53901] Updated weights for policy 0, policy_version 2048 (0.0007)
+[2024-07-03 22:07:31,258][53901] Updated weights for policy 0, policy_version 2058 (0.0007)
+[2024-07-03 22:07:32,715][53901] Updated weights for policy 0, policy_version 2068 (0.0007)
+[2024-07-03 22:07:34,186][53901] Updated weights for policy 0, policy_version 2078 (0.0007)
+[2024-07-03 22:07:34,360][50933] Fps is (10 sec: 28262.3, 60 sec: 27921.0, 300 sec: 28185.6). Total num frames: 8515584. Throughput: 0: 6970.2. Samples: 1124492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-03 22:07:34,361][50933] Avg episode reward: [(0, '4.577')]
+[2024-07-03 22:07:35,644][53901] Updated weights for policy 0, policy_version 2088 (0.0007)
+[2024-07-03 22:07:37,074][53901] Updated weights for policy 0, policy_version 2098 (0.0006)
+[2024-07-03 22:07:38,520][53901] Updated weights for policy 0, policy_version 2108 (0.0007)
+[2024-07-03 22:07:39,360][50933] Fps is (10 sec: 28262.5, 60 sec: 27852.8, 300 sec: 28175.5). Total num frames: 8654848. Throughput: 0: 6965.0. Samples: 1145624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:07:39,361][50933] Avg episode reward: [(0, '4.655')]
+[2024-07-03 22:07:39,970][53901] Updated weights for policy 0, policy_version 2118 (0.0007)
+[2024-07-03 22:07:41,412][53901] Updated weights for policy 0, policy_version 2128 (0.0007)
+[2024-07-03 22:07:42,878][53901] Updated weights for policy 0, policy_version 2138 (0.0007)
+[2024-07-03 22:07:44,333][53901] Updated weights for policy 0, policy_version 2148 (0.0006)
+[2024-07-03 22:07:44,360][50933] Fps is (10 sec: 28262.5, 60 sec: 27989.3, 300 sec: 28190.1). Total num frames: 8798208. Throughput: 0: 6952.9. Samples: 1187954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:07:44,361][50933] Avg episode reward: [(0, '4.542')]
+[2024-07-03 22:07:45,788][53901] Updated weights for policy 0, policy_version 2158 (0.0007)
+[2024-07-03 22:07:47,286][53901] Updated weights for policy 0, policy_version 2168 (0.0007)
+[2024-07-03 22:07:48,735][53901] Updated weights for policy 0, policy_version 2178 (0.0006)
+[2024-07-03 22:07:49,360][50933] Fps is (10 sec: 28262.1, 60 sec: 27921.0, 300 sec: 28180.5). Total num frames: 8937472. Throughput: 0: 6991.0. Samples: 1229910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-03 22:07:49,361][50933] Avg episode reward: [(0, '4.661')]
+[2024-07-03 22:07:50,202][53901] Updated weights for policy 0, policy_version 2188 (0.0007)
+[2024-07-03 22:07:51,670][53901] Updated weights for policy 0, policy_version 2198 (0.0007)
+[2024-07-03 22:07:53,155][53901] Updated weights for policy 0, policy_version 2208 (0.0007)
+[2024-07-03 22:07:54,360][50933] Fps is (10 sec: 27852.8, 60 sec: 27921.0, 300 sec: 28171.4). Total num frames: 9076736. Throughput: 0: 7010.2. Samples: 1250768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-03 22:07:54,361][50933] Avg episode reward: [(0, '4.719')]
+[2024-07-03 22:07:54,619][53901] Updated weights for policy 0, policy_version 2218 (0.0007)
+[2024-07-03 22:07:56,119][53901] Updated weights for policy 0, policy_version 2228 (0.0007)
+[2024-07-03 22:07:57,706][53901] Updated weights for policy 0, policy_version 2238 (0.0007)
+[2024-07-03 22:07:59,258][53901] Updated weights for policy 0, policy_version 2248 (0.0007)
+[2024-07-03 22:07:59,360][50933] Fps is (10 sec: 27033.7, 60 sec: 27716.3, 300 sec: 28118.5). Total num frames: 9207808. Throughput: 0: 7007.4. Samples: 1291586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-03 22:07:59,361][50933] Avg episode reward: [(0, '4.543')]
+[2024-07-03 22:08:00,799][53901] Updated weights for policy 0, policy_version 2258 (0.0007)
+[2024-07-03 22:08:02,303][53901] Updated weights for policy 0, policy_version 2268 (0.0007)
+[2024-07-03 22:08:03,822][53901] Updated weights for policy 0, policy_version 2278 (0.0007)
+[2024-07-03 22:08:04,360][50933] Fps is (10 sec: 26624.2, 60 sec: 27784.7, 300 sec: 28090.0). Total num frames: 9342976. Throughput: 0: 6960.0. Samples: 1331860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:08:04,361][50933] Avg episode reward: [(0, '4.616')]
+[2024-07-03 22:08:05,321][53901] Updated weights for policy 0, policy_version 2288 (0.0007)
+[2024-07-03 22:08:06,823][53901] Updated weights for policy 0, policy_version 2298 (0.0007)
+[2024-07-03 22:08:08,298][53901] Updated weights for policy 0, policy_version 2308 (0.0006)
+[2024-07-03 22:08:09,360][50933] Fps is (10 sec: 27443.3, 60 sec: 27852.9, 300 sec: 28083.9). Total num frames: 9482240. Throughput: 0: 6946.5. Samples: 1352364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:09,361][50933] Avg episode reward: [(0, '4.520')]
+[2024-07-03 22:08:09,762][53901] Updated weights for policy 0, policy_version 2318 (0.0007)
+[2024-07-03 22:08:11,214][53901] Updated weights for policy 0, policy_version 2328 (0.0007)
+[2024-07-03 22:08:12,662][53901] Updated weights for policy 0, policy_version 2338 (0.0007)
+[2024-07-03 22:08:14,121][53901] Updated weights for policy 0, policy_version 2348 (0.0006)
+[2024-07-03 22:08:14,360][50933] Fps is (10 sec: 27852.9, 60 sec: 27852.8, 300 sec: 28078.1). Total num frames: 9621504. Throughput: 0: 6938.2. Samples: 1394466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:14,361][50933] Avg episode reward: [(0, '4.566')]
+[2024-07-03 22:08:15,562][53901] Updated weights for policy 0, policy_version 2358 (0.0007)
+[2024-07-03 22:08:17,031][53901] Updated weights for policy 0, policy_version 2368 (0.0006)
+[2024-07-03 22:08:18,473][53901] Updated weights for policy 0, policy_version 2378 (0.0007)
+[2024-07-03 22:08:19,360][50933] Fps is (10 sec: 28262.5, 60 sec: 27921.2, 300 sec: 28092.6). Total num frames: 9764864. Throughput: 0: 6937.0. Samples: 1436658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:19,361][50933] Avg episode reward: [(0, '4.698')]
+[2024-07-03 22:08:19,939][53901] Updated weights for policy 0, policy_version 2388 (0.0007)
+[2024-07-03 22:08:21,394][53901] Updated weights for policy 0, policy_version 2398 (0.0006)
+[2024-07-03 22:08:22,838][53901] Updated weights for policy 0, policy_version 2408 (0.0007)
+[2024-07-03 22:08:24,293][53901] Updated weights for policy 0, policy_version 2418 (0.0007)
+[2024-07-03 22:08:24,360][50933] Fps is (10 sec: 28262.4, 60 sec: 27852.8, 300 sec: 28086.9). Total num frames: 9904128. Throughput: 0: 6938.0. Samples: 1457834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:24,361][50933] Avg episode reward: [(0, '4.596')]
+[2024-07-03 22:08:25,735][53901] Updated weights for policy 0, policy_version 2428 (0.0007)
+[2024-07-03 22:08:27,183][53901] Updated weights for policy 0, policy_version 2438 (0.0006)
+[2024-07-03 22:08:28,637][53901] Updated weights for policy 0, policy_version 2448 (0.0007)
+[2024-07-03 22:08:29,360][50933] Fps is (10 sec: 27852.7, 60 sec: 27852.8, 300 sec: 28081.4). Total num frames: 10043392. Throughput: 0: 6936.9. Samples: 1500114. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:29,361][50933] Avg episode reward: [(0, '4.762')]
+[2024-07-03 22:08:30,099][53901] Updated weights for policy 0, policy_version 2458 (0.0007)
+[2024-07-03 22:08:31,545][53901] Updated weights for policy 0, policy_version 2468 (0.0007)
+[2024-07-03 22:08:33,000][53901] Updated weights for policy 0, policy_version 2478 (0.0006)
+[2024-07-03 22:08:34,360][50933] Fps is (10 sec: 28262.5, 60 sec: 27852.9, 300 sec: 28094.9). Total num frames: 10186752. Throughput: 0: 6946.4. Samples: 1542498. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:34,361][50933] Avg episode reward: [(0, '4.576')]
+[2024-07-03 22:08:34,456][53901] Updated weights for policy 0, policy_version 2488 (0.0007)
+[2024-07-03 22:08:35,917][53901] Updated weights for policy 0, policy_version 2498 (0.0007)
+[2024-07-03 22:08:37,374][53901] Updated weights for policy 0, policy_version 2508 (0.0007)
+[2024-07-03 22:08:38,829][53901] Updated weights for policy 0, policy_version 2518 (0.0007)
+[2024-07-03 22:08:39,360][50933] Fps is (10 sec: 28262.3, 60 sec: 27852.8, 300 sec: 28089.5). Total num frames: 10326016. Throughput: 0: 6949.3. Samples: 1563484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:39,361][50933] Avg episode reward: [(0, '4.646')]
+[2024-07-03 22:08:40,281][53901] Updated weights for policy 0, policy_version 2528 (0.0007)
+[2024-07-03 22:08:41,739][53901] Updated weights for policy 0, policy_version 2538 (0.0006)
+[2024-07-03 22:08:43,197][53901] Updated weights for policy 0, policy_version 2548 (0.0007)
+[2024-07-03 22:08:44,360][50933] Fps is (10 sec: 27852.8, 60 sec: 27784.6, 300 sec: 28084.3). Total num frames: 10465280. Throughput: 0: 6982.2. Samples: 1605784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:44,361][50933] Avg episode reward: [(0, '4.676')]
+[2024-07-03 22:08:44,370][53888] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000002556_10469376.pth...
+[2024-07-03 22:08:44,425][53888] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
+[2024-07-03 22:08:44,675][53901] Updated weights for policy 0, policy_version 2558 (0.0007)
+[2024-07-03 22:08:46,121][53901] Updated weights for policy 0, policy_version 2568 (0.0007)
+[2024-07-03 22:08:47,583][53901] Updated weights for policy 0, policy_version 2578 (0.0007)
+[2024-07-03 22:08:49,033][53901] Updated weights for policy 0, policy_version 2588 (0.0007)
+[2024-07-03 22:08:49,360][50933] Fps is (10 sec: 28262.3, 60 sec: 27852.8, 300 sec: 28096.8). Total num frames: 10608640. Throughput: 0: 7020.1. Samples: 1647766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:49,361][50933] Avg episode reward: [(0, '4.727')]
+[2024-07-03 22:08:50,489][53901] Updated weights for policy 0, policy_version 2598 (0.0007)
+[2024-07-03 22:08:51,962][53901] Updated weights for policy 0, policy_version 2608 (0.0007)
+[2024-07-03 22:08:53,419][53901] Updated weights for policy 0, policy_version 2618 (0.0007)
+[2024-07-03 22:08:54,360][50933] Fps is (10 sec: 28262.3, 60 sec: 27852.8, 300 sec: 28091.8). Total num frames: 10747904. Throughput: 0: 7032.7. Samples: 1668836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:54,361][50933] Avg episode reward: [(0, '4.536')]
+[2024-07-03 22:08:54,872][53901] Updated weights for policy 0, policy_version 2628 (0.0007)
+[2024-07-03 22:08:56,333][53901] Updated weights for policy 0, policy_version 2638 (0.0007)
+[2024-07-03 22:08:57,789][53901] Updated weights for policy 0, policy_version 2648 (0.0007)
+[2024-07-03 22:08:59,256][53901] Updated weights for policy 0, policy_version 2658 (0.0007)
+[2024-07-03 22:08:59,360][50933] Fps is (10 sec: 27852.9, 60 sec: 27989.4, 300 sec: 28086.9). Total num frames: 10887168. Throughput: 0: 7031.8. Samples: 1710898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:08:59,361][50933] Avg episode reward: [(0, '4.709')]
+[2024-07-03 22:09:00,689][53901] Updated weights for policy 0, policy_version 2668 (0.0006)
+[2024-07-03 22:09:02,158][53901] Updated weights for policy 0, policy_version 2678 (0.0006)
+[2024-07-03 22:09:03,569][53901] Updated weights for policy 0, policy_version 2688 (0.0006)
+[2024-07-03 22:09:04,360][50933] Fps is (10 sec: 28262.3, 60 sec: 28125.9, 300 sec: 28098.6). Total num frames: 11030528. Throughput: 0: 7042.4. Samples: 1753566. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:09:04,361][50933] Avg episode reward: [(0, '4.831')]
+[2024-07-03 22:09:04,983][53901] Updated weights for policy 0, policy_version 2698 (0.0006)
+[2024-07-03 22:09:06,392][53901] Updated weights for policy 0, policy_version 2708 (0.0006)
+[2024-07-03 22:09:07,799][53901] Updated weights for policy 0, policy_version 2718 (0.0006)
+[2024-07-03 22:09:09,221][53901] Updated weights for policy 0, policy_version 2728 (0.0007)
+[2024-07-03 22:09:09,360][50933] Fps is (10 sec: 28672.0, 60 sec: 28194.1, 300 sec: 28109.8). Total num frames: 11173888. Throughput: 0: 7055.1. Samples: 1775312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:09:09,361][50933] Avg episode reward: [(0, '4.745')]
+[2024-07-03 22:09:10,626][53901] Updated weights for policy 0, policy_version 2738 (0.0007)
+[2024-07-03 22:09:12,049][53901] Updated weights for policy 0, policy_version 2748 (0.0006)
+[2024-07-03 22:09:13,484][53901] Updated weights for policy 0, policy_version 2758 (0.0006)
+[2024-07-03 22:09:14,360][50933] Fps is (10 sec: 29081.8, 60 sec: 28330.7, 300 sec: 28136.4). Total num frames: 11321344. Throughput: 0: 7075.4. Samples: 1818508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-03 22:09:14,361][50933] Avg episode reward: [(0, '4.723')]
+[2024-07-03 22:09:14,895][53901] Updated weights for policy 0, policy_version 2768 (0.0007)
+[2024-07-03 22:09:16,308][53901] Updated weights for policy 0, policy_version 2778 (0.0006)
+[2024-07-03 22:09:17,728][53901] Updated weights for policy 0, policy_version 2788 (0.0007)
+[2024-07-03 22:09:19,143][53901] Updated weights for policy 0, policy_version 2798 (0.0006)
+[2024-07-03 22:09:19,360][50933] Fps is (10 sec: 29081.6, 60 sec: 28330.7, 300 sec: 28146.5). Total num frames: 11464704. Throughput: 0: 7096.9. Samples: 1861860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-03 22:09:19,361][50933] Avg episode reward: [(0, '4.805')]
+[2024-07-03 22:09:20,564][53901] Updated weights for policy 0, policy_version 2808 (0.0007)
+[2024-07-03 22:09:21,984][53901] Updated weights for policy 0, policy_version 2818 (0.0006)
+[2024-07-03 22:09:23,442][53901] Updated weights for policy 0, policy_version 2828 (0.0006)
+[2024-07-03 22:09:24,360][50933] Fps is (10 sec: 28671.9, 60 sec: 28398.9, 300 sec: 28156.2). Total num frames: 11608064. Throughput: 0: 7112.5. Samples: 1883548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-03 22:09:24,361][50933] Avg episode reward: [(0, '4.574')]
+[2024-07-03 22:09:24,878][53901] Updated weights for policy 0, policy_version 2838 (0.0006)
+[2024-07-03 22:09:26,305][53901] Updated weights for policy 0, policy_version 2848 (0.0006)
+[2024-07-03 22:09:27,714][53901] Updated weights for policy 0, policy_version 2858 (0.0006)
+[2024-07-03 22:09:29,130][53901] Updated weights for policy 0, policy_version 2868 (0.0006)
+[2024-07-03 22:09:29,360][50933] Fps is (10 sec: 28671.8, 60 sec: 28467.2, 300 sec: 28165.6). Total num frames: 11751424. Throughput: 0: 7125.3. Samples: 1926424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-03 22:09:29,361][50933] Avg episode reward: [(0, '4.800')]
+[2024-07-03 22:09:30,552][53901] Updated weights for policy 0, policy_version 2878 (0.0006)
+[2024-07-03 22:09:31,966][53901] Updated weights for policy 0, policy_version 2888 (0.0006)
+[2024-07-03 22:09:33,373][53901] Updated weights for policy 0, policy_version 2898 (0.0006)
+[2024-07-03 22:09:34,360][50933] Fps is (10 sec: 28672.0, 60 sec: 28467.2, 300 sec: 28174.6). Total num frames: 11894784. Throughput: 0: 7156.5. Samples: 1969810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-03 22:09:34,361][50933] Avg episode reward: [(0, '4.568')]
+[2024-07-03 22:09:34,809][53901] Updated weights for policy 0, policy_version 2908 (0.0006)
+[2024-07-03 22:09:36,228][53901] Updated weights for policy 0, policy_version 2918 (0.0006)
+[2024-07-03 22:09:37,647][53901] Updated weights for policy 0, policy_version 2928 (0.0007)
+[2024-07-03 22:09:39,051][53901] Updated weights for policy 0, policy_version 2938 (0.0007)
+[2024-07-03 22:09:39,360][50933] Fps is (10 sec: 29081.7, 60 sec: 28603.7, 300 sec: 28197.7). Total num frames: 12042240. Throughput: 0: 7168.2. Samples: 1991406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:09:39,361][50933] Avg episode reward: [(0, '4.705')]
+[2024-07-03 22:09:40,469][53901] Updated weights for policy 0, policy_version 2948 (0.0007)
+[2024-07-03 22:09:41,878][53901] Updated weights for policy 0, policy_version 2958 (0.0006)
+[2024-07-03 22:09:43,315][53901] Updated weights for policy 0, policy_version 2968 (0.0007)
+[2024-07-03 22:09:44,360][50933] Fps is (10 sec: 29081.5, 60 sec: 28672.0, 300 sec: 28205.9). Total num frames: 12185600. Throughput: 0: 7200.7. Samples: 2034928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:09:44,361][50933] Avg episode reward: [(0, '4.761')]
+[2024-07-03 22:09:44,729][53901] Updated weights for policy 0, policy_version 2978 (0.0007)
+[2024-07-03 22:09:46,146][53901] Updated weights for policy 0, policy_version 2988 (0.0007)
+[2024-07-03 22:09:47,563][53901] Updated weights for policy 0, policy_version 2998 (0.0007)
+[2024-07-03 22:09:48,972][53901] Updated weights for policy 0, policy_version 3008 (0.0006)
+[2024-07-03 22:09:49,360][50933] Fps is (10 sec: 28672.0, 60 sec: 28672.0, 300 sec: 28213.8). Total num frames: 12328960. Throughput: 0: 7217.3. Samples: 2078346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:09:49,361][50933] Avg episode reward: [(0, '4.847')]
+[2024-07-03 22:09:50,398][53901] Updated weights for policy 0, policy_version 3018 (0.0007)
+[2024-07-03 22:09:51,827][53901] Updated weights for policy 0, policy_version 3028 (0.0007)
+[2024-07-03 22:09:53,240][53901] Updated weights for policy 0, policy_version 3038 (0.0007)
+[2024-07-03 22:09:54,360][50933] Fps is (10 sec: 28672.0, 60 sec: 28740.3, 300 sec: 28269.4). Total num frames: 12472320. Throughput: 0: 7213.1. Samples: 2099904. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:09:54,361][50933] Avg episode reward: [(0, '4.512')]
+[2024-07-03 22:09:54,672][53901] Updated weights for policy 0, policy_version 3048 (0.0007)
+[2024-07-03 22:09:56,103][53901] Updated weights for policy 0, policy_version 3058 (0.0006)
+[2024-07-03 22:09:57,530][53901] Updated weights for policy 0, policy_version 3068 (0.0006)
+[2024-07-03 22:09:58,967][53901] Updated weights for policy 0, policy_version 3078 (0.0007)
+[2024-07-03 22:09:59,360][50933] Fps is (10 sec: 28671.4, 60 sec: 28808.4, 300 sec: 28283.2). Total num frames: 12615680. Throughput: 0: 7207.3. Samples: 2142840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:09:59,361][50933] Avg episode reward: [(0, '4.928')]
+[2024-07-03 22:10:00,380][53901] Updated weights for policy 0, policy_version 3088 (0.0006)
+[2024-07-03 22:10:01,802][53901] Updated weights for policy 0, policy_version 3098 (0.0006)
+[2024-07-03 22:10:03,205][53901] Updated weights for policy 0, policy_version 3108 (0.0007)
+[2024-07-03 22:10:04,360][50933] Fps is (10 sec: 29081.6, 60 sec: 28876.8, 300 sec: 28311.0). Total num frames: 12763136. Throughput: 0: 7206.6. Samples: 2186156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:10:04,361][50933] Avg episode reward: [(0, '4.795')]
+[2024-07-03 22:10:04,611][53901] Updated weights for policy 0, policy_version 3118 (0.0006)
+[2024-07-03 22:10:06,013][53901] Updated weights for policy 0, policy_version 3128 (0.0006)
+[2024-07-03 22:10:07,439][53901] Updated weights for policy 0, policy_version 3138 (0.0006)
+[2024-07-03 22:10:08,838][53901] Updated weights for policy 0, policy_version 3148 (0.0007)
+[2024-07-03 22:10:09,360][50933] Fps is (10 sec: 29082.1, 60 sec: 28876.8, 300 sec: 28324.9). Total num frames: 12906496. Throughput: 0: 7211.2. Samples: 2208052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:09,361][50933] Avg episode reward: [(0, '4.961')]
+[2024-07-03 22:10:10,249][53901] Updated weights for policy 0, policy_version 3158 (0.0007)
+[2024-07-03 22:10:11,660][53901] Updated weights for policy 0, policy_version 3168 (0.0006)
+[2024-07-03 22:10:13,046][53901] Updated weights for policy 0, policy_version 3178 (0.0006)
+[2024-07-03 22:10:14,360][50933] Fps is (10 sec: 29081.4, 60 sec: 28876.7, 300 sec: 28324.9). Total num frames: 13053952. Throughput: 0: 7227.5. Samples: 2251664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:14,361][50933] Avg episode reward: [(0, '4.770')]
+[2024-07-03 22:10:14,467][53901] Updated weights for policy 0, policy_version 3188 (0.0007)
+[2024-07-03 22:10:15,866][53901] Updated weights for policy 0, policy_version 3198 (0.0007)
+[2024-07-03 22:10:17,277][53901] Updated weights for policy 0, policy_version 3208 (0.0007)
+[2024-07-03 22:10:18,686][53901] Updated weights for policy 0, policy_version 3218 (0.0006)
+[2024-07-03 22:10:19,360][50933] Fps is (10 sec: 29081.5, 60 sec: 28876.8, 300 sec: 28324.9). Total num frames: 13197312. Throughput: 0: 7233.4. Samples: 2295312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-07-03 22:10:19,361][50933] Avg episode reward: [(0, '4.484')]
+[2024-07-03 22:10:20,088][53901] Updated weights for policy 0, policy_version 3228 (0.0006)
+[2024-07-03 22:10:21,488][53901] Updated weights for policy 0, policy_version 3238 (0.0006)
+[2024-07-03 22:10:22,898][53901] Updated weights for policy 0, policy_version 3248 (0.0006)
+[2024-07-03 22:10:24,314][53901] Updated weights for policy 0, policy_version 3258 (0.0006)
+[2024-07-03 22:10:24,360][50933] Fps is (10 sec: 29081.4, 60 sec: 28945.0, 300 sec: 28338.7). Total num frames: 13344768. Throughput: 0: 7239.7. Samples: 2317192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:24,361][50933] Avg episode reward: [(0, '4.774')]
+[2024-07-03 22:10:25,735][53901] Updated weights for policy 0, policy_version 3268 (0.0006)
+[2024-07-03 22:10:27,134][53901] Updated weights for policy 0, policy_version 3278 (0.0007)
+[2024-07-03 22:10:28,553][53901] Updated weights for policy 0, policy_version 3288 (0.0006)
+[2024-07-03 22:10:29,360][50933] Fps is (10 sec: 29081.5, 60 sec: 28945.0, 300 sec: 28324.9). Total num frames: 13488128. Throughput: 0: 7242.4. Samples: 2360836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:29,361][50933] Avg episode reward: [(0, '4.653')]
+[2024-07-03 22:10:29,956][53901] Updated weights for policy 0, policy_version 3298 (0.0006)
+[2024-07-03 22:10:31,351][53901] Updated weights for policy 0, policy_version 3308 (0.0006)
+[2024-07-03 22:10:32,749][53901] Updated weights for policy 0, policy_version 3318 (0.0006)
+[2024-07-03 22:10:34,151][53901] Updated weights for policy 0, policy_version 3328 (0.0007)
+[2024-07-03 22:10:34,360][50933] Fps is (10 sec: 29082.0, 60 sec: 29013.3, 300 sec: 28352.6). Total num frames: 13635584. Throughput: 0: 7250.1. Samples: 2404602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:10:34,361][50933] Avg episode reward: [(0, '4.876')]
+[2024-07-03 22:10:35,592][53901] Updated weights for policy 0, policy_version 3338 (0.0006)
+[2024-07-03 22:10:37,071][53901] Updated weights for policy 0, policy_version 3348 (0.0007)
+[2024-07-03 22:10:38,530][53901] Updated weights for policy 0, policy_version 3358 (0.0006)
+[2024-07-03 22:10:39,360][50933] Fps is (10 sec: 28672.3, 60 sec: 28876.8, 300 sec: 28338.8). Total num frames: 13774848. Throughput: 0: 7243.7. Samples: 2425872. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:39,361][50933] Avg episode reward: [(0, '4.522')]
+[2024-07-03 22:10:39,986][53901] Updated weights for policy 0, policy_version 3368 (0.0007)
+[2024-07-03 22:10:41,454][53901] Updated weights for policy 0, policy_version 3378 (0.0007)
+[2024-07-03 22:10:42,916][53901] Updated weights for policy 0, policy_version 3388 (0.0007)
+[2024-07-03 22:10:44,360][50933] Fps is (10 sec: 27852.8, 60 sec: 28808.5, 300 sec: 28324.9). Total num frames: 13914112. Throughput: 0: 7221.9. Samples: 2467824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:44,361][50933] Avg episode reward: [(0, '4.540')]
+[2024-07-03 22:10:44,364][53888] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000003397_13914112.pth...
+[2024-07-03 22:10:44,422][53888] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001738_7118848.pth
+[2024-07-03 22:10:44,462][53901] Updated weights for policy 0, policy_version 3398 (0.0007)
+[2024-07-03 22:10:45,886][53901] Updated weights for policy 0, policy_version 3408 (0.0007)
+[2024-07-03 22:10:47,342][53901] Updated weights for policy 0, policy_version 3418 (0.0006)
+[2024-07-03 22:10:48,800][53901] Updated weights for policy 0, policy_version 3428 (0.0007)
+[2024-07-03 22:10:49,360][50933] Fps is (10 sec: 27852.9, 60 sec: 28740.3, 300 sec: 28324.9). Total num frames: 14053376. Throughput: 0: 7187.6. Samples: 2509596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:49,361][50933] Avg episode reward: [(0, '4.704')]
+[2024-07-03 22:10:50,259][53901] Updated weights for policy 0, policy_version 3438 (0.0007)
+[2024-07-03 22:10:51,729][53901] Updated weights for policy 0, policy_version 3448 (0.0007)
+[2024-07-03 22:10:53,194][53901] Updated weights for policy 0, policy_version 3458 (0.0007)
+[2024-07-03 22:10:54,360][50933] Fps is (10 sec: 28262.6, 60 sec: 28740.3, 300 sec: 28324.9). Total num frames: 14196736. Throughput: 0: 7167.9. Samples: 2530606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:54,361][50933] Avg episode reward: [(0, '4.860')]
+[2024-07-03 22:10:54,652][53901] Updated weights for policy 0, policy_version 3468 (0.0006)
+[2024-07-03 22:10:56,125][53901] Updated weights for policy 0, policy_version 3478 (0.0007)
+[2024-07-03 22:10:57,581][53901] Updated weights for policy 0, policy_version 3488 (0.0007)
+[2024-07-03 22:10:59,062][53901] Updated weights for policy 0, policy_version 3498 (0.0007)
+[2024-07-03 22:10:59,360][50933] Fps is (10 sec: 28262.3, 60 sec: 28672.1, 300 sec: 28311.0). Total num frames: 14336000. Throughput: 0: 7131.3. Samples: 2572574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:10:59,361][50933] Avg episode reward: [(0, '4.683')]
+[2024-07-03 22:11:00,534][53901] Updated weights for policy 0, policy_version 3508 (0.0007)
+[2024-07-03 22:11:02,011][53901] Updated weights for policy 0, policy_version 3518 (0.0007)
+[2024-07-03 22:11:03,481][53901] Updated weights for policy 0, policy_version 3528 (0.0007)
+[2024-07-03 22:11:04,360][50933] Fps is (10 sec: 27852.7, 60 sec: 28535.5, 300 sec: 28297.1). Total num frames: 14475264. Throughput: 0: 7085.5. Samples: 2614160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:04,361][50933] Avg episode reward: [(0, '4.896')]
+[2024-07-03 22:11:04,945][53901] Updated weights for policy 0, policy_version 3538 (0.0006)
+[2024-07-03 22:11:06,403][53901] Updated weights for policy 0, policy_version 3548 (0.0007)
+[2024-07-03 22:11:07,869][53901] Updated weights for policy 0, policy_version 3558 (0.0007)
+[2024-07-03 22:11:09,349][53901] Updated weights for policy 0, policy_version 3568 (0.0007)
+[2024-07-03 22:11:09,360][50933] Fps is (10 sec: 27852.6, 60 sec: 28467.2, 300 sec: 28283.2). Total num frames: 14614528. Throughput: 0: 7067.6. Samples: 2635232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:09,361][50933] Avg episode reward: [(0, '4.909')]
+[2024-07-03 22:11:10,810][53901] Updated weights for policy 0, policy_version 3578 (0.0006)
+[2024-07-03 22:11:12,271][53901] Updated weights for policy 0, policy_version 3588 (0.0007)
+[2024-07-03 22:11:13,724][53901] Updated weights for policy 0, policy_version 3598 (0.0007)
+[2024-07-03 22:11:14,360][50933] Fps is (10 sec: 27852.8, 60 sec: 28330.7, 300 sec: 28283.2). Total num frames: 14753792. Throughput: 0: 7030.9. Samples: 2677228. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:14,361][50933] Avg episode reward: [(0, '4.861')]
+[2024-07-03 22:11:15,186][53901] Updated weights for policy 0, policy_version 3608 (0.0006)
+[2024-07-03 22:11:16,647][53901] Updated weights for policy 0, policy_version 3618 (0.0007)
+[2024-07-03 22:11:18,098][53901] Updated weights for policy 0, policy_version 3628 (0.0007)
+[2024-07-03 22:11:19,360][50933] Fps is (10 sec: 27852.9, 60 sec: 28262.4, 300 sec: 28269.3). Total num frames: 14893056. Throughput: 0: 6996.0. Samples: 2719420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:19,361][50933] Avg episode reward: [(0, '5.235')]
+[2024-07-03 22:11:19,565][53901] Updated weights for policy 0, policy_version 3638 (0.0007)
+[2024-07-03 22:11:21,010][53901] Updated weights for policy 0, policy_version 3648 (0.0007)
+[2024-07-03 22:11:22,460][53901] Updated weights for policy 0, policy_version 3658 (0.0007)
+[2024-07-03 22:11:23,918][53901] Updated weights for policy 0, policy_version 3668 (0.0007)
+[2024-07-03 22:11:24,360][50933] Fps is (10 sec: 28262.4, 60 sec: 28194.2, 300 sec: 28269.3). Total num frames: 15036416. Throughput: 0: 6991.2. Samples: 2740474. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:24,361][50933] Avg episode reward: [(0, '4.706')]
+[2024-07-03 22:11:25,377][53901] Updated weights for policy 0, policy_version 3678 (0.0007)
+[2024-07-03 22:11:26,832][53901] Updated weights for policy 0, policy_version 3688 (0.0007)
+[2024-07-03 22:11:28,288][53901] Updated weights for policy 0, policy_version 3698 (0.0007)
+[2024-07-03 22:11:29,360][50933] Fps is (10 sec: 28262.4, 60 sec: 28125.9, 300 sec: 28255.5). Total num frames: 15175680. Throughput: 0: 6996.8. Samples: 2782680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:29,361][50933] Avg episode reward: [(0, '4.924')]
+[2024-07-03 22:11:29,747][53901] Updated weights for policy 0, policy_version 3708 (0.0007)
+[2024-07-03 22:11:31,213][53901] Updated weights for policy 0, policy_version 3718 (0.0007)
+[2024-07-03 22:11:32,673][53901] Updated weights for policy 0, policy_version 3728 (0.0007)
+[2024-07-03 22:11:34,131][53901] Updated weights for policy 0, policy_version 3738 (0.0007)
+[2024-07-03 22:11:34,360][50933] Fps is (10 sec: 27852.8, 60 sec: 27989.3, 300 sec: 28241.6). Total num frames: 15314944. Throughput: 0: 7004.1. Samples: 2824782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:34,361][50933] Avg episode reward: [(0, '4.807')]
+[2024-07-03 22:11:35,589][53901] Updated weights for policy 0, policy_version 3748 (0.0007)
+[2024-07-03 22:11:37,048][53901] Updated weights for policy 0, policy_version 3758 (0.0007)
+[2024-07-03 22:11:38,504][53901] Updated weights for policy 0, policy_version 3768 (0.0007)
+[2024-07-03 22:11:39,360][50933] Fps is (10 sec: 27853.0, 60 sec: 27989.4, 300 sec: 28255.5). Total num frames: 15454208. Throughput: 0: 7003.5. Samples: 2845764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:39,361][50933] Avg episode reward: [(0, '4.506')]
+[2024-07-03 22:11:39,957][53901] Updated weights for policy 0, policy_version 3778 (0.0007)
+[2024-07-03 22:11:41,407][53901] Updated weights for policy 0, policy_version 3788 (0.0006)
+[2024-07-03 22:11:42,870][53901] Updated weights for policy 0, policy_version 3798 (0.0007)
+[2024-07-03 22:11:44,328][53901] Updated weights for policy 0, policy_version 3808 (0.0007)
+[2024-07-03 22:11:44,360][50933] Fps is (10 sec: 28262.4, 60 sec: 28057.6, 300 sec: 28255.5). Total num frames: 15597568. Throughput: 0: 7010.7. Samples: 2888056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:44,361][50933] Avg episode reward: [(0, '4.636')]
+[2024-07-03 22:11:45,799][53901] Updated weights for policy 0, policy_version 3818 (0.0007)
+[2024-07-03 22:11:47,256][53901] Updated weights for policy 0, policy_version 3828 (0.0007)
+[2024-07-03 22:11:48,712][53901] Updated weights for policy 0, policy_version 3838 (0.0007)
+[2024-07-03 22:11:49,360][50933] Fps is (10 sec: 28262.1, 60 sec: 28057.6, 300 sec: 28255.5). Total num frames: 15736832. Throughput: 0: 7021.9. Samples: 2930146. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:49,361][50933] Avg episode reward: [(0, '4.829')]
+[2024-07-03 22:11:50,164][53901] Updated weights for policy 0, policy_version 3848 (0.0007)
+[2024-07-03 22:11:51,651][53901] Updated weights for policy 0, policy_version 3858 (0.0007)
+[2024-07-03 22:11:53,101][53901] Updated weights for policy 0, policy_version 3868 (0.0007)
+[2024-07-03 22:11:54,360][50933] Fps is (10 sec: 27852.9, 60 sec: 27989.3, 300 sec: 28241.6). Total num frames: 15876096. Throughput: 0: 7016.8. Samples: 2950986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:54,361][50933] Avg episode reward: [(0, '4.937')]
+[2024-07-03 22:11:54,546][53901] Updated weights for policy 0, policy_version 3878 (0.0007)
+[2024-07-03 22:11:55,992][53901] Updated weights for policy 0, policy_version 3888 (0.0007)
+[2024-07-03 22:11:57,435][53901] Updated weights for policy 0, policy_version 3898 (0.0007)
+[2024-07-03 22:11:58,908][53901] Updated weights for policy 0, policy_version 3908 (0.0007)
+[2024-07-03 22:11:59,360][50933] Fps is (10 sec: 28262.5, 60 sec: 28057.6, 300 sec: 28283.3). Total num frames: 16019456. Throughput: 0: 7026.5. Samples: 2993422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:11:59,361][50933] Avg episode reward: [(0, '5.231')]
+[2024-07-03 22:12:00,357][53901] Updated weights for policy 0, policy_version 3918 (0.0007)
+[2024-07-03 22:12:01,810][53901] Updated weights for policy 0, policy_version 3928 (0.0007)
+[2024-07-03 22:12:03,275][53901] Updated weights for policy 0, policy_version 3938 (0.0007)
+[2024-07-03 22:12:04,360][50933] Fps is (10 sec: 28262.3, 60 sec: 28057.6, 300 sec: 28297.1). Total num frames: 16158720. Throughput: 0: 7025.1. Samples: 3035548. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:12:04,361][50933] Avg episode reward: [(0, '4.582')]
+[2024-07-03 22:12:04,731][53901] Updated weights for policy 0, policy_version 3948 (0.0007)
+[2024-07-03 22:12:06,193][53901] Updated weights for policy 0, policy_version 3958 (0.0007)
+[2024-07-03 22:12:07,651][53901] Updated weights for policy 0, policy_version 3968 (0.0007)
+[2024-07-03 22:12:09,103][53901] Updated weights for policy 0, policy_version 3978 (0.0007)
+[2024-07-03 22:12:09,360][50933] Fps is (10 sec: 27852.9, 60 sec: 28057.6, 300 sec: 28297.1). Total num frames: 16297984. Throughput: 0: 7023.6. Samples: 3056536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:12:09,361][50933] Avg episode reward: [(0, '4.707')]
+[2024-07-03 22:12:10,574][53901] Updated weights for policy 0, policy_version 3988 (0.0007)
+[2024-07-03 22:12:12,043][53901] Updated weights for policy 0, policy_version 3998 (0.0007)
+[2024-07-03 22:12:13,503][53901] Updated weights for policy 0, policy_version 4008 (0.0007)
+[2024-07-03 22:12:14,360][50933] Fps is (10 sec: 27852.6, 60 sec: 28057.6, 300 sec: 28297.1). Total num frames: 16437248. Throughput: 0: 7019.0. Samples: 3098536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:12:14,361][50933] Avg episode reward: [(0, '4.517')]
+[2024-07-03 22:12:14,962][53901] Updated weights for policy 0, policy_version 4018 (0.0007)
+[2024-07-03 22:12:16,449][53901] Updated weights for policy 0, policy_version 4028 (0.0007)
+[2024-07-03 22:12:17,907][53901] Updated weights for policy 0, policy_version 4038 (0.0007)
+[2024-07-03 22:12:19,359][53901] Updated weights for policy 0, policy_version 4048 (0.0007)
+[2024-07-03 22:12:19,360][50933] Fps is (10 sec: 28262.6, 60 sec: 28125.9, 300 sec: 28297.1). Total num frames: 16580608. Throughput: 0: 7016.1. Samples: 3140504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:12:19,361][50933] Avg episode reward: [(0, '4.692')]
+[2024-07-03 22:12:20,818][53901] Updated weights for policy 0, policy_version 4058 (0.0007)
+[2024-07-03 22:12:22,279][53901] Updated weights for policy 0, policy_version 4068 (0.0007)
+[2024-07-03 22:12:23,748][53901] Updated weights for policy 0, policy_version 4078 (0.0007)
+[2024-07-03 22:12:24,360][50933] Fps is (10 sec: 28262.6, 60 sec: 28057.6, 300 sec: 28297.1). Total num frames: 16719872. Throughput: 0: 7019.5. Samples: 3161640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:12:24,361][50933] Avg episode reward: [(0, '4.727')]
+[2024-07-03 22:12:25,214][53901] Updated weights for policy 0, policy_version 4088 (0.0007)
+[2024-07-03 22:12:26,670][53901] Updated weights for policy 0, policy_version 4098 (0.0007)
+[2024-07-03 22:12:28,127][53901] Updated weights for policy 0, policy_version 4108 (0.0007)
+[2024-07-03 22:12:29,360][50933] Fps is (10 sec: 27852.7, 60 sec: 28057.6, 300 sec: 28283.2). Total num frames: 16859136. Throughput: 0: 7013.4. Samples: 3203658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:12:29,361][50933] Avg episode reward: [(0, '4.591')]
+[2024-07-03 22:12:29,588][53901] Updated weights for policy 0, policy_version 4118 (0.0007)
+[2024-07-03 22:12:31,075][53901] Updated weights for policy 0, policy_version 4128 (0.0007)
+[2024-07-03 22:12:32,532][53901] Updated weights for policy 0, policy_version 4138 (0.0007)
+[2024-07-03 22:12:33,993][53901] Updated weights for policy 0, policy_version 4148 (0.0007)
+[2024-07-03 22:12:34,360][50933] Fps is (10 sec: 27853.0, 60 sec: 28057.6, 300 sec: 28283.2). Total num frames: 16998400. Throughput: 0: 7008.3. Samples: 3245520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:12:34,361][50933] Avg episode reward: [(0, '4.962')]
+[2024-07-03 22:12:35,457][53901] Updated weights for policy 0, policy_version 4158 (0.0007)
+[2024-07-03 22:12:36,924][53901] Updated weights for policy 0, policy_version 4168 (0.0007)
+[2024-07-03 22:12:38,403][53901] Updated weights for policy 0, policy_version 4178 (0.0007)
+[2024-07-03 22:12:39,360][50933] Fps is (10 sec: 27852.6, 60 sec: 28057.6, 300 sec: 28269.3). Total num frames: 17137664. Throughput: 0: 7008.9. Samples: 3266388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:12:39,361][50933] Avg episode reward: [(0, '4.741')]
+[2024-07-03 22:12:39,872][53901] Updated weights for policy 0, policy_version 4188 (0.0007)
+[2024-07-03 22:12:41,333][53901] Updated weights for policy 0, policy_version 4198 (0.0007)
+[2024-07-03 22:12:42,797][53901] Updated weights for policy 0, policy_version 4208 (0.0007)
+[2024-07-03 22:12:44,221][53901] Updated weights for policy 0, policy_version 4218 (0.0006)
+[2024-07-03 22:12:44,360][50933] Fps is (10 sec: 27852.5, 60 sec: 27989.3, 300 sec: 28269.3). Total num frames: 17276928. Throughput: 0: 6996.9. Samples: 3308284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:12:44,361][50933] Avg episode reward: [(0, '4.433')]
+[2024-07-03 22:12:44,364][53888] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000004219_17281024.pth...
+[2024-07-03 22:12:44,421][53888] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000002556_10469376.pth
+[2024-07-03 22:12:45,643][53901] Updated weights for policy 0, policy_version 4228 (0.0007)
+[2024-07-03 22:12:47,050][53901] Updated weights for policy 0, policy_version 4238 (0.0006)
+[2024-07-03 22:12:48,481][53901] Updated weights for policy 0, policy_version 4248 (0.0006)
+[2024-07-03 22:12:49,360][50933] Fps is (10 sec: 28671.8, 60 sec: 28125.9, 300 sec: 28297.1). Total num frames: 17424384. Throughput: 0: 7019.4. Samples: 3351420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:12:49,361][50933] Avg episode reward: [(0, '4.856')]
+[2024-07-03 22:12:49,921][53901] Updated weights for policy 0, policy_version 4258 (0.0006)
+[2024-07-03 22:12:51,374][53901] Updated weights for policy 0, policy_version 4268 (0.0007)
+[2024-07-03 22:12:52,830][53901] Updated weights for policy 0, policy_version 4278 (0.0006)
+[2024-07-03 22:12:54,262][53901] Updated weights for policy 0, policy_version 4288 (0.0007)
+[2024-07-03 22:12:54,360][50933] Fps is (10 sec: 28672.0, 60 sec: 28125.8, 300 sec: 28324.9). Total num frames: 17563648. Throughput: 0: 7027.6. Samples: 3372776. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:12:54,361][50933] Avg episode reward: [(0, '4.827')]
+[2024-07-03 22:12:55,695][53901] Updated weights for policy 0, policy_version 4298 (0.0007)
+[2024-07-03 22:12:57,110][53901] Updated weights for policy 0, policy_version 4308 (0.0006)
+[2024-07-03 22:12:58,553][53901] Updated weights for policy 0, policy_version 4318 (0.0007)
+[2024-07-03 22:12:59,360][50933] Fps is (10 sec: 28262.6, 60 sec: 28125.9, 300 sec: 28352.6). Total num frames: 17707008. Throughput: 0: 7049.0. Samples: 3415740. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:12:59,361][50933] Avg episode reward: [(0, '4.578')]
+[2024-07-03 22:12:59,987][53901] Updated weights for policy 0, policy_version 4328 (0.0007)
+[2024-07-03 22:13:01,401][53901] Updated weights for policy 0, policy_version 4338 (0.0007)
+[2024-07-03 22:13:02,819][53901] Updated weights for policy 0, policy_version 4348 (0.0007)
+[2024-07-03 22:13:04,246][53901] Updated weights for policy 0, policy_version 4358 (0.0007)
+[2024-07-03 22:13:04,360][50933] Fps is (10 sec: 28672.1, 60 sec: 28194.1, 300 sec: 28366.5). Total num frames: 17850368. Throughput: 0: 7071.0. Samples: 3458698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:13:04,361][50933] Avg episode reward: [(0, '4.650')]
+[2024-07-03 22:13:05,655][53901] Updated weights for policy 0, policy_version 4368 (0.0007)
+[2024-07-03 22:13:07,055][53901] Updated weights for policy 0, policy_version 4378 (0.0006)
+[2024-07-03 22:13:08,472][53901] Updated weights for policy 0, policy_version 4388 (0.0007)
+[2024-07-03 22:13:09,360][50933] Fps is (10 sec: 29081.7, 60 sec: 28330.7, 300 sec: 28394.3). Total num frames: 17997824. Throughput: 0: 7083.5. Samples: 3480396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:13:09,361][50933] Avg episode reward: [(0, '4.697')]
+[2024-07-03 22:13:09,868][53901] Updated weights for policy 0, policy_version 4398 (0.0006)
+[2024-07-03 22:13:11,277][53901] Updated weights for policy 0, policy_version 4408 (0.0007)
+[2024-07-03 22:13:12,722][53901] Updated weights for policy 0, policy_version 4418 (0.0007)
+[2024-07-03 22:13:14,157][53901] Updated weights for policy 0, policy_version 4428 (0.0007)
+[2024-07-03 22:13:14,360][50933] Fps is (10 sec: 29081.6, 60 sec: 28399.0, 300 sec: 28394.3). Total num frames: 18141184. Throughput: 0: 7114.2. Samples: 3523796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:13:14,361][50933] Avg episode reward: [(0, '4.791')]
+[2024-07-03 22:13:15,595][53901] Updated weights for policy 0, policy_version 4438 (0.0007)
+[2024-07-03 22:13:17,018][53901] Updated weights for policy 0, policy_version 4448 (0.0007)
+[2024-07-03 22:13:18,474][53901] Updated weights for policy 0, policy_version 4458 (0.0006)
+[2024-07-03 22:13:19,360][50933] Fps is (10 sec: 28262.5, 60 sec: 28330.6, 300 sec: 28394.3). Total num frames: 18280448. Throughput: 0: 7128.9. Samples: 3566320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:13:19,361][50933] Avg episode reward: [(0, '4.766')]
+[2024-07-03 22:13:19,973][53901] Updated weights for policy 0, policy_version 4468 (0.0007)
+[2024-07-03 22:13:21,399][53901] Updated weights for policy 0, policy_version 4478 (0.0006)
+[2024-07-03 22:13:22,843][53901] Updated weights for policy 0, policy_version 4488 (0.0007)
+[2024-07-03 22:13:24,272][53901] Updated weights for policy 0, policy_version 4498 (0.0007)
+[2024-07-03 22:13:24,360][50933] Fps is (10 sec: 28262.5, 60 sec: 28398.9, 300 sec: 28408.2). Total num frames: 18423808. Throughput: 0: 7136.5. Samples: 3587528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-03 22:13:24,361][50933] Avg episode reward: [(0, '4.911')]
+[2024-07-03 22:13:25,692][53901] Updated weights for policy 0, policy_version 4508 (0.0006)
+[2024-07-03 22:13:27,108][53901] Updated weights for policy 0, policy_version 4518 (0.0007)
+[2024-07-03 22:13:28,515][53901] Updated weights for policy 0, policy_version 4528 (0.0006)
+[2024-07-03 22:13:29,360][50933] Fps is (10 sec: 29081.4, 60 sec: 28535.4, 300 sec: 28422.1). Total num frames: 18571264. Throughput: 0: 7166.2. Samples: 3630764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:13:29,361][50933] Avg episode reward: [(0, '5.014')]
+[2024-07-03 22:13:29,903][53901] Updated weights for policy 0, policy_version 4538 (0.0006)
+[2024-07-03 22:13:31,331][53901] Updated weights for policy 0, policy_version 4548 (0.0006)
+[2024-07-03 22:13:32,756][53901] Updated weights for policy 0, policy_version 4558 (0.0007)
+[2024-07-03 22:13:34,200][53901] Updated weights for policy 0, policy_version 4568 (0.0007)
+[2024-07-03 22:13:34,360][50933] Fps is (10 sec: 29081.1, 60 sec: 28603.6, 300 sec: 28435.9). Total num frames: 18714624. Throughput: 0: 7167.5. Samples: 3673956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-07-03 22:13:34,361][50933] Avg episode reward: [(0, '4.622')]
+[2024-07-03 22:13:35,617][53901] Updated weights for policy 0, policy_version 4578 (0.0007)
+[2024-07-03 22:13:37,026][53901] Updated weights for policy 0, policy_version 4588 (0.0006)
+[2024-07-03 22:13:38,433][53901] Updated weights for policy 0, policy_version 4598 (0.0006)
+[2024-07-03 22:13:39,360][50933] Fps is (10 sec: 28672.2, 60 sec: 28672.0, 300 sec: 28449.8). Total num frames: 18857984. Throughput: 0: 7174.3. Samples: 3695620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-03 22:13:39,361][50933] Avg episode reward: [(0, '4.836')]
+[2024-07-03 22:13:39,872][53901] Updated weights for policy 0, policy_version 4608 (0.0007)
+[2024-07-03 22:13:41,298][53901] Updated weights for policy 0, policy_version 4618 (0.0007)
+[2024-07-03 22:13:42,717][53901] Updated weights for policy 0, policy_version 4628 (0.0007)
+[2024-07-03 22:13:44,141][53901] Updated weights for policy 0, policy_version 4638 (0.0006)
+[2024-07-03 22:13:44,360][50933] Fps is (10 sec: 28672.3, 60 sec: 28740.3, 300 sec: 28449.8). Total num frames: 19001344. Throughput: 0: 7179.1. Samples: 3738800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-07-03 22:13:44,361][50933] Avg episode reward: [(0, '4.872')]
+[2024-07-03 22:13:45,564][53901] Updated weights for policy 0, policy_version 4648 (0.0007)
+[2024-07-03 22:13:46,968][53901] Updated weights for policy 0, policy_version 4658 (0.0007)
+[2024-07-03 22:13:48,380][53901] Updated weights for policy 0, policy_version 4668 (0.0006)
+[2024-07-03 22:13:49,360][50933] Fps is (10 sec: 29081.6, 60 sec: 28740.3, 300 sec: 28477.6). Total num frames: 19148800. Throughput: 0: 7191.4. Samples: 3782310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:13:49,361][50933] Avg episode reward: [(0, '4.625')]
+[2024-07-03 22:13:49,785][53901] Updated weights for policy 0, policy_version 4678 (0.0007)
+[2024-07-03 22:13:51,180][53901] Updated weights for policy 0, policy_version 4688 (0.0007)
+[2024-07-03 22:13:52,580][53901] Updated weights for policy 0, policy_version 4698 (0.0006)
+[2024-07-03 22:13:54,006][53901] Updated weights for policy 0, policy_version 4708 (0.0007)
+[2024-07-03 22:13:54,360][50933] Fps is (10 sec: 29081.7, 60 sec: 28808.6, 300 sec: 28491.5). Total num frames: 19292160. Throughput: 0: 7196.6. Samples: 3804244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:13:54,361][50933] Avg episode reward: [(0, '5.176')]
+[2024-07-03 22:13:55,400][53901] Updated weights for policy 0, policy_version 4718 (0.0006)
+[2024-07-03 22:13:56,804][53901] Updated weights for policy 0, policy_version 4728 (0.0006)
+[2024-07-03 22:13:58,213][53901] Updated weights for policy 0, policy_version 4738 (0.0006)
+[2024-07-03 22:13:59,360][50933] Fps is (10 sec: 29081.7, 60 sec: 28876.8, 300 sec: 28505.4). Total num frames: 19439616. Throughput: 0: 7202.5. Samples: 3847910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-07-03 22:13:59,361][50933] Avg episode reward: [(0, '5.372')]
+[2024-07-03 22:13:59,362][53888] Saving new best policy, reward=5.372!
+[2024-07-03 22:13:59,633][53901] Updated weights for policy 0, policy_version 4748 (0.0006)
+[2024-07-03 22:14:01,067][53901] Updated weights for policy 0, policy_version 4758 (0.0007)
+[2024-07-03 22:14:02,509][53901] Updated weights for policy 0, policy_version 4768 (0.0006)
+[2024-07-03 22:14:03,924][53901] Updated weights for policy 0, policy_version 4778 (0.0006)
+[2024-07-03 22:14:04,360][50933] Fps is (10 sec: 29081.5, 60 sec: 28876.8, 300 sec: 28505.4). Total num frames: 19582976. Throughput: 0: 7215.9. Samples: 3891036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:14:04,361][50933] Avg episode reward: [(0, '5.265')]
+[2024-07-03 22:14:05,315][53901] Updated weights for policy 0, policy_version 4788 (0.0006)
+[2024-07-03 22:14:06,733][53901] Updated weights for policy 0, policy_version 4798 (0.0006)
+[2024-07-03 22:14:08,141][53901] Updated weights for policy 0, policy_version 4808 (0.0006)
+[2024-07-03 22:14:09,360][50933] Fps is (10 sec: 28672.0, 60 sec: 28808.6, 300 sec: 28491.5). Total num frames: 19726336. Throughput: 0: 7231.6. Samples: 3912950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:14:09,361][50933] Avg episode reward: [(0, '5.192')]
+[2024-07-03 22:14:09,545][53901] Updated weights for policy 0, policy_version 4818 (0.0007)
+[2024-07-03 22:14:10,958][53901] Updated weights for policy 0, policy_version 4828 (0.0006)
+[2024-07-03 22:14:12,372][53901] Updated weights for policy 0, policy_version 4838 (0.0006)
+[2024-07-03 22:14:13,785][53901] Updated weights for policy 0, policy_version 4848 (0.0006)
+[2024-07-03 22:14:14,360][50933] Fps is (10 sec: 29081.6, 60 sec: 28876.8, 300 sec: 28505.4). Total num frames: 19873792. Throughput: 0: 7238.3. Samples: 3956486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-07-03 22:14:14,361][50933] Avg episode reward: [(0, '4.893')]
+[2024-07-03 22:14:15,187][53901] Updated weights for policy 0, policy_version 4858 (0.0006)
+[2024-07-03 22:14:16,600][53901] Updated weights for policy 0, policy_version 4868 (0.0006)
+[2024-07-03 22:14:18,006][53901] Updated weights for policy 0, policy_version 4878 (0.0007)
+[2024-07-03 22:14:18,847][53888] Stopping Batcher_0...
+[2024-07-03 22:14:18,847][53888] Loop batcher_evt_loop terminating...
+[2024-07-03 22:14:18,848][53888] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
+[2024-07-03 22:14:18,847][50933] Component Batcher_0 stopped!
+[2024-07-03 22:14:18,855][53904] Stopping RolloutWorker_w1...
+[2024-07-03 22:14:18,855][53904] Loop rollout_proc1_evt_loop terminating...
+[2024-07-03 22:14:18,855][50933] Component RolloutWorker_w1 stopped!
+[2024-07-03 22:14:18,855][53907] Stopping RolloutWorker_w6...
+[2024-07-03 22:14:18,855][53902] Stopping RolloutWorker_w0...
+[2024-07-03 22:14:18,856][53907] Loop rollout_proc6_evt_loop terminating...
+[2024-07-03 22:14:18,856][53903] Stopping RolloutWorker_w2...
+[2024-07-03 22:14:18,856][53902] Loop rollout_proc0_evt_loop terminating...
+[2024-07-03 22:14:18,856][53909] Stopping RolloutWorker_w7...
+[2024-07-03 22:14:18,856][53909] Loop rollout_proc7_evt_loop terminating...
+[2024-07-03 22:14:18,856][53903] Loop rollout_proc2_evt_loop terminating...
+[2024-07-03 22:14:18,856][53905] Stopping RolloutWorker_w3...
+[2024-07-03 22:14:18,856][50933] Component RolloutWorker_w6 stopped!
+[2024-07-03 22:14:18,856][53905] Loop rollout_proc3_evt_loop terminating...
+[2024-07-03 22:14:18,856][53908] Stopping RolloutWorker_w5...
+[2024-07-03 22:14:18,856][50933] Component RolloutWorker_w0 stopped!
+[2024-07-03 22:14:18,857][53906] Stopping RolloutWorker_w4...
+[2024-07-03 22:14:18,857][53908] Loop rollout_proc5_evt_loop terminating...
+[2024-07-03 22:14:18,857][53906] Loop rollout_proc4_evt_loop terminating...
+[2024-07-03 22:14:18,857][50933] Component RolloutWorker_w2 stopped!
+[2024-07-03 22:14:18,858][50933] Component RolloutWorker_w7 stopped!
+[2024-07-03 22:14:18,858][50933] Component RolloutWorker_w3 stopped!
+[2024-07-03 22:14:18,859][50933] Component RolloutWorker_w5 stopped!
+[2024-07-03 22:14:18,860][50933] Component RolloutWorker_w4 stopped!
+[2024-07-03 22:14:18,878][53901] Weights refcount: 2 0
+[2024-07-03 22:14:18,880][53901] Stopping InferenceWorker_p0-w0...
+[2024-07-03 22:14:18,880][53901] Loop inference_proc0-0_evt_loop terminating...
+[2024-07-03 22:14:18,880][50933] Component InferenceWorker_p0-w0 stopped!
+[2024-07-03 22:14:18,911][53888] Removing /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000003397_13914112.pth +[2024-07-03 22:14:18,921][53888] Saving /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-07-03 22:14:19,007][53888] Stopping LearnerWorker_p0... +[2024-07-03 22:14:19,008][53888] Loop learner_proc0_evt_loop terminating... +[2024-07-03 22:14:19,008][50933] Component LearnerWorker_p0 stopped! +[2024-07-03 22:14:19,009][50933] Waiting for process learner_proc0 to stop... +[2024-07-03 22:14:20,022][50933] Waiting for process inference_proc0-0 to join... +[2024-07-03 22:14:20,023][50933] Waiting for process rollout_proc0 to join... +[2024-07-03 22:14:20,023][50933] Waiting for process rollout_proc1 to join... +[2024-07-03 22:14:20,024][50933] Waiting for process rollout_proc2 to join... +[2024-07-03 22:14:20,024][50933] Waiting for process rollout_proc3 to join... +[2024-07-03 22:14:20,024][50933] Waiting for process rollout_proc4 to join... +[2024-07-03 22:14:20,025][50933] Waiting for process rollout_proc5 to join... +[2024-07-03 22:14:20,025][50933] Waiting for process rollout_proc6 to join... +[2024-07-03 22:14:20,026][50933] Waiting for process rollout_proc7 to join... +[2024-07-03 22:14:20,027][50933] Batcher 0 profile tree view: +batching: 22.9594, releasing_batches: 0.0789 +[2024-07-03 22:14:20,027][50933] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 7.0915 +update_model: 8.6613 + weight_update: 0.0006 +one_step: 0.0021 + handle_policy_step: 524.9799 + deserialize: 20.0910, stack: 2.8861, obs_to_device_normalize: 114.6728, forward: 288.2487, send_messages: 25.4504 + prepare_outputs: 54.5863 + to_cpu: 31.7156 +[2024-07-03 22:14:20,028][50933] Learner 0 profile tree view: +misc: 0.0160, prepare_batch: 33.3789 +train: 80.9591 + epoch_init: 0.0138, minibatch_init: 0.0148, losses_postprocess: 0.5249, kl_divergence: 0.5002, after_optimizer: 1.1878 + calculate_losses: 29.0180 + losses_init: 0.0073, forward_head: 1.6052, bptt_initial: 20.9158, tail: 1.4332, advantages_returns: 0.3681, losses: 1.8320 + bptt: 2.4858 + bptt_forward_core: 2.3782 + update: 48.7717 + clip: 1.6368 +[2024-07-03 22:14:20,028][50933] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.3002, enqueue_policy_requests: 16.7158, env_step: 215.2689, overhead: 18.1320, complete_rollouts: 0.5370 +save_policy_outputs: 15.6765 + split_output_tensors: 7.7140 +[2024-07-03 22:14:20,028][50933] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.2958, enqueue_policy_requests: 17.3155, env_step: 220.1950, overhead: 18.2353, complete_rollouts: 0.5271 +save_policy_outputs: 15.8013 + split_output_tensors: 7.6335 +[2024-07-03 22:14:20,029][50933] Loop Runner_EvtLoop terminating... +[2024-07-03 22:14:20,029][50933] Runner profile tree view: +main_loop: 573.3217 +[2024-07-03 22:14:20,030][50933] Collected {0: 20004864}, FPS: 27905.8 +[2024-07-03 22:15:10,016][50933] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json +[2024-07-03 22:15:10,017][50933] Overriding arg 'num_workers' with value 1 passed from command line +[2024-07-03 22:15:10,018][50933] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-07-03 22:15:10,018][50933] Adding new argument 'save_video'=True that is not in the saved config file! 
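The training run ends here: 20,004,864 frames collected at an overall 27,905.8 FPS, with the final checkpoint (policy_version 4884) written twice during shutdown. A minimal sketch for inspecting that checkpoint offline, assuming the usual sample-factory checkpoint layout (a dict whose 'model' entry holds the actor-critic state_dict; print the keys first to verify):

    import torch

    # Final checkpoint written by LearnerWorker_p0 above; policy_version 4884 and
    # 20,004,864 env frames are encoded in the filename.
    ckpt_path = ("/home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/"
                 "default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth")
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    print(sorted(checkpoint.keys()))  # verify the layout before relying on any key
    # Assumption: 'model' holds the state_dict saved by the learner.
    for name, tensor in checkpoint["model"].items():
        print(f"{name}: {tuple(tensor.shape)}")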
+[2024-07-03 22:15:10,016][50933] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json
+[2024-07-03 22:15:10,017][50933] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-07-03 22:15:10,018][50933] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-07-03 22:15:10,018][50933] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-07-03 22:15:10,018][50933] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-07-03 22:15:10,019][50933] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-07-03 22:15:10,019][50933] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-07-03 22:15:10,019][50933] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-07-03 22:15:10,020][50933] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-07-03 22:15:10,020][50933] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-07-03 22:15:10,020][50933] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-07-03 22:15:10,020][50933] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-07-03 22:15:10,021][50933] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-07-03 22:15:10,021][50933] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-07-03 22:15:10,021][50933] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-07-03 22:15:10,041][50933] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-07-03 22:15:10,043][50933] RunningMeanStd input shape: (3, 72, 128)
+[2024-07-03 22:15:10,044][50933] RunningMeanStd input shape: (1,)
+[2024-07-03 22:15:10,054][50933] ConvEncoder: input_channels=3
+[2024-07-03 22:15:10,117][50933] Conv encoder output size: 512
+[2024-07-03 22:15:10,118][50933] Policy head output size: 512
+[2024-07-03 22:15:11,787][50933] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
+[2024-07-03 22:15:12,594][50933] Num frames 100...
+[2024-07-03 22:15:12,659][50933] Num frames 200...
+[2024-07-03 22:15:12,723][50933] Num frames 300...
+[2024-07-03 22:15:12,788][50933] Num frames 400...
+[2024-07-03 22:15:12,895][50933] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800
+[2024-07-03 22:15:12,895][50933] Avg episode reward: 6.800, avg true_objective: 4.800
+[2024-07-03 22:15:12,917][50933] Num frames 500...
+[2024-07-03 22:15:12,981][50933] Num frames 600...
+[2024-07-03 22:15:13,044][50933] Num frames 700...
+[2024-07-03 22:15:13,109][50933] Num frames 800...
+[2024-07-03 22:15:13,205][50933] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320
+[2024-07-03 22:15:13,206][50933] Avg episode reward: 5.320, avg true_objective: 4.320
+[2024-07-03 22:15:13,235][50933] Num frames 900...
+[2024-07-03 22:15:13,298][50933] Num frames 1000...
+[2024-07-03 22:15:13,364][50933] Num frames 1100...
+[2024-07-03 22:15:13,430][50933] Num frames 1200...
+[2024-07-03 22:15:13,494][50933] Num frames 1300...
+[2024-07-03 22:15:13,556][50933] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373
+[2024-07-03 22:15:13,557][50933] Avg episode reward: 5.373, avg true_objective: 4.373
+[2024-07-03 22:15:13,616][50933] Num frames 1400...
+[2024-07-03 22:15:13,677][50933] Num frames 1500...
+[2024-07-03 22:15:13,738][50933] Num frames 1600...
+[2024-07-03 22:15:13,800][50933] Num frames 1700...
+[2024-07-03 22:15:13,872][50933] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320
+[2024-07-03 22:15:13,873][50933] Avg episode reward: 5.320, avg true_objective: 4.320
+[2024-07-03 22:15:13,923][50933] Num frames 1800...
+[2024-07-03 22:15:13,987][50933] Num frames 1900...
+[2024-07-03 22:15:14,049][50933] Num frames 2000...
+[2024-07-03 22:15:14,114][50933] Num frames 2100...
+[2024-07-03 22:15:14,175][50933] Avg episode rewards: #0: 5.024, true rewards: #0: 4.224
+[2024-07-03 22:15:14,176][50933] Avg episode reward: 5.024, avg true_objective: 4.224
+[2024-07-03 22:15:14,232][50933] Num frames 2200...
+[2024-07-03 22:15:14,292][50933] Num frames 2300...
+[2024-07-03 22:15:14,388][50933] Avg episode rewards: #0: 4.613, true rewards: #0: 3.947
+[2024-07-03 22:15:14,389][50933] Avg episode reward: 4.613, avg true_objective: 3.947
+[2024-07-03 22:15:14,416][50933] Num frames 2400...
+[2024-07-03 22:15:14,491][50933] Num frames 2500...
+[2024-07-03 22:15:14,555][50933] Num frames 2600...
+[2024-07-03 22:15:14,616][50933] Num frames 2700...
+[2024-07-03 22:15:14,677][50933] Num frames 2800...
+[2024-07-03 22:15:14,741][50933] Avg episode rewards: #0: 4.737, true rewards: #0: 4.023
+[2024-07-03 22:15:14,742][50933] Avg episode reward: 4.737, avg true_objective: 4.023
+[2024-07-03 22:15:14,801][50933] Num frames 2900...
+[2024-07-03 22:15:14,865][50933] Num frames 3000...
+[2024-07-03 22:15:14,929][50933] Num frames 3100...
+[2024-07-03 22:15:14,990][50933] Num frames 3200...
+[2024-07-03 22:15:15,064][50933] Avg episode rewards: #0: 4.790, true rewards: #0: 4.040
+[2024-07-03 22:15:15,065][50933] Avg episode reward: 4.790, avg true_objective: 4.040
+[2024-07-03 22:15:15,115][50933] Num frames 3300...
+[2024-07-03 22:15:15,176][50933] Num frames 3400...
+[2024-07-03 22:15:15,238][50933] Num frames 3500...
+[2024-07-03 22:15:15,301][50933] Num frames 3600...
+[2024-07-03 22:15:15,366][50933] Avg episode rewards: #0: 4.684, true rewards: #0: 4.018
+[2024-07-03 22:15:15,367][50933] Avg episode reward: 4.684, avg true_objective: 4.018
+[2024-07-03 22:15:15,424][50933] Num frames 3700...
+[2024-07-03 22:15:15,487][50933] Num frames 3800...
+[2024-07-03 22:15:15,549][50933] Num frames 3900...
+[2024-07-03 22:15:15,614][50933] Num frames 4000...
+[2024-07-03 22:15:15,666][50933] Avg episode rewards: #0: 4.600, true rewards: #0: 4.000
+[2024-07-03 22:15:15,667][50933] Avg episode reward: 4.600, avg true_objective: 4.000
+[2024-07-03 22:15:19,237][50933] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/replay.mp4!
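The 'Avg episode rewards' lines in the evaluation pass above are running means over the episodes finished so far, so the last pair (4.600 / 4.000) summarizes all 10 episodes. Per-episode values can be recovered from consecutive running means; the sketch below uses the true_rewards column transcribed from the run above (the logged means are rounded to three decimals, so the recovered values are approximate):

    # Running mean of true rewards after each of the 10 evaluation episodes above.
    running_means = [4.800, 4.320, 4.373, 4.320, 4.224, 3.947, 4.023, 4.040, 4.018, 4.000]

    # Invert the running mean: total_n = n * mean_n, episode_n = total_n - total_(n-1).
    totals = [(i + 1) * m for i, m in enumerate(running_means)]
    per_episode = [round(t - (totals[i - 1] if i else 0.0), 3) for i, t in enumerate(totals)]
    print(per_episode)  # [4.8, 3.84, 4.479, 4.161, 3.84, 2.562, 4.479, 4.159, 3.842, 3.838]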
+[2024-07-03 22:18:14,027][50933] Loading existing experiment configuration from /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/config.json
+[2024-07-03 22:18:14,028][50933] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-07-03 22:18:14,029][50933] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-07-03 22:18:14,029][50933] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-07-03 22:18:14,029][50933] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-07-03 22:18:14,029][50933] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-07-03 22:18:14,030][50933] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-07-03 22:18:14,030][50933] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-07-03 22:18:14,030][50933] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-07-03 22:18:14,030][50933] Adding new argument 'hf_repository'='ra9hu/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-07-03 22:18:14,030][50933] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-07-03 22:18:14,031][50933] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-07-03 22:18:14,031][50933] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-07-03 22:18:14,031][50933] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-07-03 22:18:14,031][50933] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-07-03 22:18:14,044][50933] RunningMeanStd input shape: (3, 72, 128)
+[2024-07-03 22:18:14,045][50933] RunningMeanStd input shape: (1,)
+[2024-07-03 22:18:14,051][50933] ConvEncoder: input_channels=3
+[2024-07-03 22:18:14,073][50933] Conv encoder output size: 512
+[2024-07-03 22:18:14,073][50933] Policy head output size: 512
+[2024-07-03 22:18:14,090][50933] Loading state from checkpoint /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
+[2024-07-03 22:18:14,664][50933] Num frames 100...
+[2024-07-03 22:18:14,725][50933] Num frames 200...
+[2024-07-03 22:18:14,787][50933] Num frames 300...
+[2024-07-03 22:18:14,849][50933] Num frames 400...
+[2024-07-03 22:18:14,912][50933] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
+[2024-07-03 22:18:14,914][50933] Avg episode reward: 5.160, avg true_objective: 4.160
+[2024-07-03 22:18:14,971][50933] Num frames 500...
+[2024-07-03 22:18:15,032][50933] Num frames 600...
+[2024-07-03 22:18:15,094][50933] Num frames 700...
+[2024-07-03 22:18:15,158][50933] Num frames 800...
+[2024-07-03 22:18:15,251][50933] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320
+[2024-07-03 22:18:15,252][50933] Avg episode reward: 5.320, avg true_objective: 4.320
+[2024-07-03 22:18:15,281][50933] Num frames 900...
+[2024-07-03 22:18:15,340][50933] Num frames 1000...
+[2024-07-03 22:18:15,400][50933] Num frames 1100...
+[2024-07-03 22:18:15,460][50933] Num frames 1200...
+[2024-07-03 22:18:15,523][50933] Num frames 1300...
+[2024-07-03 22:18:15,583][50933] Num frames 1400...
+[2024-07-03 22:18:15,664][50933] Avg episode rewards: #0: 6.467, true rewards: #0: 4.800
+[2024-07-03 22:18:15,665][50933] Avg episode reward: 6.467, avg true_objective: 4.800
+[2024-07-03 22:18:15,707][50933] Num frames 1500...
+[2024-07-03 22:18:15,767][50933] Num frames 1600...
+[2024-07-03 22:18:15,829][50933] Num frames 1700...
+[2024-07-03 22:18:15,890][50933] Num frames 1800...
+[2024-07-03 22:18:15,959][50933] Avg episode rewards: #0: 5.810, true rewards: #0: 4.560
+[2024-07-03 22:18:15,960][50933] Avg episode reward: 5.810, avg true_objective: 4.560
+[2024-07-03 22:18:16,011][50933] Num frames 1900...
+[2024-07-03 22:18:16,071][50933] Num frames 2000...
+[2024-07-03 22:18:16,143][50933] Num frames 2100...
+[2024-07-03 22:18:16,205][50933] Num frames 2200...
+[2024-07-03 22:18:16,264][50933] Avg episode rewards: #0: 5.416, true rewards: #0: 4.416
+[2024-07-03 22:18:16,266][50933] Avg episode reward: 5.416, avg true_objective: 4.416
+[2024-07-03 22:18:16,326][50933] Num frames 2300...
+[2024-07-03 22:18:16,390][50933] Num frames 2400...
+[2024-07-03 22:18:16,452][50933] Num frames 2500...
+[2024-07-03 22:18:16,517][50933] Num frames 2600...
+[2024-07-03 22:18:16,586][50933] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373
+[2024-07-03 22:18:16,587][50933] Avg episode reward: 5.373, avg true_objective: 4.373
+[2024-07-03 22:18:16,641][50933] Num frames 2700...
+[2024-07-03 22:18:16,705][50933] Num frames 2800...
+[2024-07-03 22:18:16,768][50933] Num frames 2900...
+[2024-07-03 22:18:16,832][50933] Num frames 3000...
+[2024-07-03 22:18:16,890][50933] Avg episode rewards: #0: 5.154, true rewards: #0: 4.297
+[2024-07-03 22:18:16,891][50933] Avg episode reward: 5.154, avg true_objective: 4.297
+[2024-07-03 22:18:16,952][50933] Num frames 3100...
+[2024-07-03 22:18:17,015][50933] Num frames 3200...
+[2024-07-03 22:18:17,077][50933] Num frames 3300...
+[2024-07-03 22:18:17,185][50933] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240
+[2024-07-03 22:18:17,186][50933] Avg episode reward: 4.990, avg true_objective: 4.240
+[2024-07-03 22:18:17,195][50933] Num frames 3400...
+[2024-07-03 22:18:17,256][50933] Num frames 3500...
+[2024-07-03 22:18:17,315][50933] Num frames 3600...
+[2024-07-03 22:18:17,374][50933] Num frames 3700...
+[2024-07-03 22:18:17,436][50933] Num frames 3800...
+[2024-07-03 22:18:17,494][50933] Avg episode rewards: #0: 5.009, true rewards: #0: 4.231
+[2024-07-03 22:18:17,496][50933] Avg episode reward: 5.009, avg true_objective: 4.231
+[2024-07-03 22:18:17,556][50933] Num frames 3900...
+[2024-07-03 22:18:17,617][50933] Num frames 4000...
+[2024-07-03 22:18:17,678][50933] Num frames 4100...
+[2024-07-03 22:18:17,758][50933] Avg episode rewards: #0: 4.839, true rewards: #0: 4.139
+[2024-07-03 22:18:17,759][50933] Avg episode reward: 4.839, avg true_objective: 4.139
+[2024-07-03 22:18:21,364][50933] Replay video saved to /home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir/default_experiment/replay.mp4!
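Both replay passes above come from Sample Factory's enjoy entry point: the 22:15 run only renders replay.mp4 locally (push_to_hub=False), while the 22:18 run repeats the evaluation with push_to_hub=True and uploads to ra9hu/rl_course_vizdoom_health_gathering_supreme. A minimal reconstruction of that second invocation is sketched below. Every flag mirrors an 'Overriding arg' / 'Adding new argument' line in the log; the module paths and the doom_health_gathering_supreme env name (implied by the repository name) are assumptions based on the sample-factory 2.x examples layout:

    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.enjoy import enjoy
    from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
    from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

    def main():
        register_vizdoom_components()  # register the Doom envs/models before parsing the config
        argv = [
            "--env=doom_health_gathering_supreme",  # assumed: env name stored in the saved config.json
            "--train_dir=/home/raghu/DL/topics/RL/unit8B-AsyncPPO-SampleFactory/train_dir",
            "--experiment=default_experiment",
            "--num_workers=1",                      # "Overriding arg 'num_workers' with value 1"
            "--no_render",
            "--save_video",
            "--max_num_frames=100000",
            "--max_num_episodes=10",
            "--push_to_hub",
            "--hf_repository=ra9hu/rl_course_vizdoom_health_gathering_supreme",
        ]
        parser, _ = parse_sf_args(argv=argv, evaluation=True)  # evaluation=True adds the enjoy-only args
        add_doom_env_args(parser)
        doom_override_defaults(parser)
        cfg = parse_full_cfg(parser, argv)
        return enjoy(cfg)  # loads the newest checkpoint, plays 10 episodes, writes replay.mp4

    if __name__ == "__main__":
        main()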