[2024-07-26 14:45:42,134][00197] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-07-26 14:45:42,139][00197] Rollout worker 0 uses device cpu
[2024-07-26 14:45:42,143][00197] Rollout worker 1 uses device cpu
[2024-07-26 14:45:42,146][00197] Rollout worker 2 uses device cpu
[2024-07-26 14:45:42,147][00197] Rollout worker 3 uses device cpu
[2024-07-26 14:45:42,151][00197] Rollout worker 4 uses device cpu
[2024-07-26 14:45:42,154][00197] Rollout worker 5 uses device cpu
[2024-07-26 14:45:42,158][00197] Rollout worker 6 uses device cpu
[2024-07-26 14:45:42,170][00197] Rollout worker 7 uses device cpu
[2024-07-26 14:45:42,420][00197] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-26 14:45:42,422][00197] InferenceWorker_p0-w0: min num requests: 2
[2024-07-26 14:45:42,467][00197] Starting all processes...
[2024-07-26 14:45:42,468][00197] Starting process learner_proc0
[2024-07-26 14:45:42,581][00197] Starting all processes...
[2024-07-26 14:45:42,635][00197] Starting process inference_proc0-0
[2024-07-26 14:45:42,636][00197] Starting process rollout_proc0
[2024-07-26 14:45:42,637][00197] Starting process rollout_proc1
[2024-07-26 14:45:42,638][00197] Starting process rollout_proc2
[2024-07-26 14:45:42,638][00197] Starting process rollout_proc3
[2024-07-26 14:45:42,638][00197] Starting process rollout_proc4
[2024-07-26 14:45:42,638][00197] Starting process rollout_proc5
[2024-07-26 14:45:42,638][00197] Starting process rollout_proc6
[2024-07-26 14:45:42,638][00197] Starting process rollout_proc7
[2024-07-26 14:45:55,476][05677] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-26 14:45:55,479][05677] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-07-26 14:45:55,561][05677] Num visible devices: 1
[2024-07-26 14:45:55,593][05677] Starting seed is not provided
[2024-07-26 14:45:55,594][05677] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-26 14:45:55,595][05677] Initializing actor-critic model on device cuda:0
[2024-07-26 14:45:55,595][05677] RunningMeanStd input shape: (3, 72, 128)
[2024-07-26 14:45:55,597][05677] RunningMeanStd input shape: (1,)
[2024-07-26 14:45:55,706][05677] ConvEncoder: input_channels=3
[2024-07-26 14:45:55,927][05695] Worker 4 uses CPU cores [0]
[2024-07-26 14:45:56,097][05697] Worker 6 uses CPU cores [0]
[2024-07-26 14:45:56,140][05698] Worker 7 uses CPU cores [1]
[2024-07-26 14:45:56,349][05694] Worker 3 uses CPU cores [1]
[2024-07-26 14:45:56,483][05693] Worker 2 uses CPU cores [0]
[2024-07-26 14:45:56,486][05691] Worker 0 uses CPU cores [0]
[2024-07-26 14:45:56,496][05696] Worker 5 uses CPU cores [1]
[2024-07-26 14:45:56,526][05677] Conv encoder output size: 512
[2024-07-26 14:45:56,527][05677] Policy head output size: 512
[2024-07-26 14:45:56,568][05690] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-26 14:45:56,568][05690] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-07-26 14:45:56,572][05692] Worker 1 uses CPU cores [1]
[2024-07-26 14:45:56,580][05677] Created Actor Critic model with architecture:
[2024-07-26 14:45:56,595][05690] Num visible devices: 1
[2024-07-26 14:45:56,584][05677] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-07-26 14:46:01,739][05677] Using optimizer
[2024-07-26 14:46:01,741][05677] No checkpoints found
[2024-07-26 14:46:01,741][05677] Did not load from checkpoint, starting from scratch!
[2024-07-26 14:46:01,741][05677] Initialized policy 0 weights for model version 0
[2024-07-26 14:46:01,744][05677] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-26 14:46:01,752][05677] LearnerWorker_p0 finished initialization!
[2024-07-26 14:46:01,946][05690] RunningMeanStd input shape: (3, 72, 128)
[2024-07-26 14:46:01,949][05690] RunningMeanStd input shape: (1,)
[2024-07-26 14:46:01,968][05690] ConvEncoder: input_channels=3
[2024-07-26 14:46:02,073][05690] Conv encoder output size: 512
[2024-07-26 14:46:02,073][05690] Policy head output size: 512
[2024-07-26 14:46:02,404][00197] Heartbeat connected on Batcher_0
[2024-07-26 14:46:02,414][00197] Heartbeat connected on LearnerWorker_p0
[2024-07-26 14:46:02,430][00197] Heartbeat connected on RolloutWorker_w0
[2024-07-26 14:46:02,438][00197] Heartbeat connected on RolloutWorker_w1
[2024-07-26 14:46:02,441][00197] Heartbeat connected on RolloutWorker_w2
[2024-07-26 14:46:02,446][00197] Heartbeat connected on RolloutWorker_w3
[2024-07-26 14:46:02,457][00197] Heartbeat connected on RolloutWorker_w4
[2024-07-26 14:46:02,460][00197] Heartbeat connected on RolloutWorker_w5
[2024-07-26 14:46:02,464][00197] Heartbeat connected on RolloutWorker_w6
[2024-07-26 14:46:02,467][00197] Heartbeat connected on RolloutWorker_w7
[2024-07-26 14:46:02,555][00197] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-26 14:46:03,697][00197] Inference worker 0-0 is ready!
[2024-07-26 14:46:03,699][00197] All inference workers are ready! Signal rollout workers to start!
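For readers following along, here is a minimal PyTorch sketch of the network the learner just printed: a three-layer conv head with ELU activations feeding a 512-unit MLP, a GRU(512, 512) recurrent core, a scalar value head, and 5 action logits. The conv kernel sizes and strides are assumptions (the log records only the layer types); the 512-dim encoder output, the GRU size, and the 1-dim/5-dim heads come from the log itself.

```python
import torch
from torch import nn

class ActorCriticSketch(nn.Module):
    """Approximation of the logged ActorCriticSharedWeights; not Sample Factory's code."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        self.conv_head = nn.Sequential(             # (3, 72, 128) observations
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),   # kernel/stride assumed
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():                        # infer flattened conv output size
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                 # recurrent core from the log
        self.critic_linear = nn.Linear(512, 1)       # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```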
[2024-07-26 14:46:03,702][00197] Heartbeat connected on InferenceWorker_p0-w0
[2024-07-26 14:46:03,836][05694] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-26 14:46:03,846][05696] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-26 14:46:03,883][05691] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-26 14:46:03,888][05698] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-26 14:46:03,885][05695] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-26 14:46:03,904][05692] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-26 14:46:03,933][05693] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-26 14:46:03,955][05697] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-26 14:46:05,959][05695] Decorrelating experience for 0 frames...
[2024-07-26 14:46:05,961][05691] Decorrelating experience for 0 frames...
[2024-07-26 14:46:05,958][05694] Decorrelating experience for 0 frames...
[2024-07-26 14:46:05,959][05692] Decorrelating experience for 0 frames...
[2024-07-26 14:46:05,961][05696] Decorrelating experience for 0 frames...
[2024-07-26 14:46:05,962][05698] Decorrelating experience for 0 frames...
[2024-07-26 14:46:07,465][05695] Decorrelating experience for 32 frames...
[2024-07-26 14:46:07,464][05693] Decorrelating experience for 0 frames...
[2024-07-26 14:46:07,471][05697] Decorrelating experience for 0 frames...
[2024-07-26 14:46:07,555][00197] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-26 14:46:07,829][05696] Decorrelating experience for 32 frames...
[2024-07-26 14:46:07,832][05692] Decorrelating experience for 32 frames...
[2024-07-26 14:46:07,835][05694] Decorrelating experience for 32 frames...
[2024-07-26 14:46:08,867][05698] Decorrelating experience for 32 frames...
[2024-07-26 14:46:09,141][05694] Decorrelating experience for 64 frames...
[2024-07-26 14:46:09,554][05691] Decorrelating experience for 32 frames...
[2024-07-26 14:46:09,830][05695] Decorrelating experience for 64 frames...
[2024-07-26 14:46:09,960][05697] Decorrelating experience for 32 frames...
[2024-07-26 14:46:09,963][05693] Decorrelating experience for 32 frames...
[2024-07-26 14:46:10,464][05698] Decorrelating experience for 64 frames...
[2024-07-26 14:46:11,013][05694] Decorrelating experience for 96 frames...
[2024-07-26 14:46:11,183][05695] Decorrelating experience for 96 frames...
[2024-07-26 14:46:11,258][05696] Decorrelating experience for 64 frames...
[2024-07-26 14:46:11,409][05697] Decorrelating experience for 64 frames...
[2024-07-26 14:46:11,411][05693] Decorrelating experience for 64 frames...
[2024-07-26 14:46:11,421][05692] Decorrelating experience for 64 frames...
[2024-07-26 14:46:12,397][05691] Decorrelating experience for 64 frames...
[2024-07-26 14:46:12,409][05696] Decorrelating experience for 96 frames...
[2024-07-26 14:46:12,535][05698] Decorrelating experience for 96 frames...
[2024-07-26 14:46:12,555][00197] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-26 14:46:12,656][05693] Decorrelating experience for 96 frames...
[2024-07-26 14:46:13,210][05692] Decorrelating experience for 96 frames...
[2024-07-26 14:46:13,538][05697] Decorrelating experience for 96 frames...
[2024-07-26 14:46:13,589][05691] Decorrelating experience for 96 frames...
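The "Decorrelating experience for N frames..." lines above come from the rollout workers warming up their environments by different amounts before regular collection starts, so the eight workers do not produce lock-stepped, correlated trajectories. A hedged sketch of the idea follows; `env`, `worker_idx`, and the Gymnasium-style step API are illustrative assumptions, not Sample Factory's actual implementation.

```python
# Sketch only: stagger each worker's environment with random-action frames.
def decorrelate(env, worker_idx: int, frames_per_step: int = 32, steps: int = 4):
    obs, _ = env.reset(seed=worker_idx)          # different seed per worker
    for step in range(steps):
        # Matches the 0/32/64/96-frame progression seen in the log above.
        print(f"Decorrelating experience for {step * frames_per_step} frames...")
        for _ in range(frames_per_step):
            obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                obs, _ = env.reset()
```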
[2024-07-26 14:46:16,927][05677] Signal inference workers to stop experience collection...
[2024-07-26 14:46:16,959][05690] InferenceWorker_p0-w0: stopping experience collection
[2024-07-26 14:46:17,555][00197] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 94.3. Samples: 1414. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-26 14:46:17,556][00197] Avg episode reward: [(0, '2.100')]
[2024-07-26 14:46:19,102][05677] Signal inference workers to resume experience collection...
[2024-07-26 14:46:19,103][05690] InferenceWorker_p0-w0: resuming experience collection
[2024-07-26 14:46:22,555][00197] Fps is (10 sec: 1228.7, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 12288. Throughput: 0: 168.1. Samples: 3362. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2024-07-26 14:46:22,560][00197] Avg episode reward: [(0, '3.041')]
[2024-07-26 14:46:27,555][00197] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 216.1. Samples: 5402. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-26 14:46:27,562][00197] Avg episode reward: [(0, '3.735')]
[2024-07-26 14:46:30,289][05690] Updated weights for policy 0, policy_version 10 (0.0032)
[2024-07-26 14:46:32,555][00197] Fps is (10 sec: 3686.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 368.3. Samples: 11048. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-07-26 14:46:32,557][00197] Avg episode reward: [(0, '4.456')]
[2024-07-26 14:46:37,555][00197] Fps is (10 sec: 4096.0, 60 sec: 1989.5, 300 sec: 1989.5). Total num frames: 69632. Throughput: 0: 499.2. Samples: 17472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:46:37,557][00197] Avg episode reward: [(0, '4.407')]
[2024-07-26 14:46:41,869][05690] Updated weights for policy 0, policy_version 20 (0.0019)
[2024-07-26 14:46:42,555][00197] Fps is (10 sec: 3276.8, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 81920. Throughput: 0: 491.5. Samples: 19660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:46:42,557][00197] Avg episode reward: [(0, '4.272')]
[2024-07-26 14:46:47,555][00197] Fps is (10 sec: 3276.8, 60 sec: 2275.6, 300 sec: 2275.6). Total num frames: 102400. Throughput: 0: 538.5. Samples: 24232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:46:47,558][00197] Avg episode reward: [(0, '4.298')]
[2024-07-26 14:46:47,567][05677] Saving new best policy, reward=4.298!
[2024-07-26 14:46:52,326][05690] Updated weights for policy 0, policy_version 30 (0.0021)
[2024-07-26 14:46:52,555][00197] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 122880. Throughput: 0: 677.6. Samples: 30490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:46:52,557][00197] Avg episode reward: [(0, '4.406')]
[2024-07-26 14:46:52,560][05677] Saving new best policy, reward=4.406!
[2024-07-26 14:46:57,559][00197] Fps is (10 sec: 3275.2, 60 sec: 2457.4, 300 sec: 2457.4). Total num frames: 135168. Throughput: 0: 746.0. Samples: 33572. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-26 14:46:57,566][00197] Avg episode reward: [(0, '4.312')]
[2024-07-26 14:47:02,555][00197] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2525.9). Total num frames: 151552. Throughput: 0: 791.3. Samples: 37024. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-26 14:47:02,557][00197] Avg episode reward: [(0, '4.375')]
[2024-07-26 14:47:05,008][05690] Updated weights for policy 0, policy_version 40 (0.0026)
[2024-07-26 14:47:07,555][00197] Fps is (10 sec: 3688.2, 60 sec: 2867.2, 300 sec: 2646.6). Total num frames: 172032. Throughput: 0: 885.2. Samples: 43194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-26 14:47:07,557][00197] Avg episode reward: [(0, '4.552')]
[2024-07-26 14:47:07,567][05677] Saving new best policy, reward=4.552!
[2024-07-26 14:47:12,558][00197] Fps is (10 sec: 4094.7, 60 sec: 3208.4, 300 sec: 2750.1). Total num frames: 192512. Throughput: 0: 910.8. Samples: 46392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:47:12,567][00197] Avg episode reward: [(0, '4.579')]
[2024-07-26 14:47:12,575][05677] Saving new best policy, reward=4.579!
[2024-07-26 14:47:16,717][05690] Updated weights for policy 0, policy_version 50 (0.0017)
[2024-07-26 14:47:17,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2730.7). Total num frames: 204800. Throughput: 0: 891.5. Samples: 51166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:47:17,558][00197] Avg episode reward: [(0, '4.500')]
[2024-07-26 14:47:22,555][00197] Fps is (10 sec: 3277.8, 60 sec: 3549.9, 300 sec: 2816.0). Total num frames: 225280. Throughput: 0: 860.1. Samples: 56178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:47:22,560][00197] Avg episode reward: [(0, '4.434')]
[2024-07-26 14:47:27,057][05690] Updated weights for policy 0, policy_version 60 (0.0023)
[2024-07-26 14:47:27,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 2891.3). Total num frames: 245760. Throughput: 0: 884.0. Samples: 59442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:47:27,560][00197] Avg episode reward: [(0, '4.427')]
[2024-07-26 14:47:32,555][00197] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 2912.7). Total num frames: 262144. Throughput: 0: 911.9. Samples: 65266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:47:32,558][00197] Avg episode reward: [(0, '4.460')]
[2024-07-26 14:47:37,557][00197] Fps is (10 sec: 3276.2, 60 sec: 3481.5, 300 sec: 2931.8). Total num frames: 278528. Throughput: 0: 866.9. Samples: 69504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:47:37,561][00197] Avg episode reward: [(0, '4.680')]
[2024-07-26 14:47:37,571][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth...
[2024-07-26 14:47:37,701][05677] Saving new best policy, reward=4.680!
[2024-07-26 14:47:39,356][05690] Updated weights for policy 0, policy_version 70 (0.0029)
[2024-07-26 14:47:42,555][00197] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 2990.1). Total num frames: 299008. Throughput: 0: 866.6. Samples: 72566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:47:42,557][00197] Avg episode reward: [(0, '4.490')]
[2024-07-26 14:47:47,555][00197] Fps is (10 sec: 4096.7, 60 sec: 3618.1, 300 sec: 3042.7). Total num frames: 319488. Throughput: 0: 935.3. Samples: 79112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:47:47,557][00197] Avg episode reward: [(0, '4.414')]
[2024-07-26 14:47:50,464][05690] Updated weights for policy 0, policy_version 80 (0.0017)
[2024-07-26 14:47:52,557][00197] Fps is (10 sec: 3276.1, 60 sec: 3481.5, 300 sec: 3016.1). Total num frames: 331776. Throughput: 0: 891.9. Samples: 83330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:47:52,558][00197] Avg episode reward: [(0, '4.356')]
[2024-07-26 14:47:57,555][00197] Fps is (10 sec: 3276.9, 60 sec: 3618.4, 300 sec: 3063.1). Total num frames: 352256. Throughput: 0: 871.7. Samples: 85616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:47:57,561][00197] Avg episode reward: [(0, '4.218')]
[2024-07-26 14:48:01,209][05690] Updated weights for policy 0, policy_version 90 (0.0024)
[2024-07-26 14:48:02,555][00197] Fps is (10 sec: 4096.8, 60 sec: 3686.4, 300 sec: 3106.1). Total num frames: 372736. Throughput: 0: 912.1. Samples: 92212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:48:02,559][00197] Avg episode reward: [(0, '4.299')]
[2024-07-26 14:48:07,557][00197] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3112.9). Total num frames: 389120. Throughput: 0: 921.5. Samples: 97648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:48:07,559][00197] Avg episode reward: [(0, '4.411')]
[2024-07-26 14:48:12,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3119.3). Total num frames: 405504. Throughput: 0: 894.9. Samples: 99714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:48:12,561][00197] Avg episode reward: [(0, '4.409')]
[2024-07-26 14:48:13,250][05690] Updated weights for policy 0, policy_version 100 (0.0016)
[2024-07-26 14:48:17,555][00197] Fps is (10 sec: 3687.1, 60 sec: 3686.4, 300 sec: 3155.4). Total num frames: 425984. Throughput: 0: 895.8. Samples: 105576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:48:17,559][00197] Avg episode reward: [(0, '4.510')]
[2024-07-26 14:48:22,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3189.0). Total num frames: 446464. Throughput: 0: 940.8. Samples: 111840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:48:22,561][00197] Avg episode reward: [(0, '4.530')]
[2024-07-26 14:48:23,922][05690] Updated weights for policy 0, policy_version 110 (0.0018)
[2024-07-26 14:48:27,558][00197] Fps is (10 sec: 3275.6, 60 sec: 3549.6, 300 sec: 3163.7). Total num frames: 458752. Throughput: 0: 917.5. Samples: 113858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:48:27,561][00197] Avg episode reward: [(0, '4.423')]
[2024-07-26 14:48:32,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3194.9). Total num frames: 479232. Throughput: 0: 877.7. Samples: 118610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:48:32,557][00197] Avg episode reward: [(0, '4.341')]
[2024-07-26 14:48:35,337][05690] Updated weights for policy 0, policy_version 120 (0.0018)
[2024-07-26 14:48:37,555][00197] Fps is (10 sec: 4097.5, 60 sec: 3686.5, 300 sec: 3223.9). Total num frames: 499712. Throughput: 0: 928.0. Samples: 125090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:48:37,557][00197] Avg episode reward: [(0, '4.455')]
[2024-07-26 14:48:42,556][00197] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3225.6). Total num frames: 516096. Throughput: 0: 941.9. Samples: 128004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:48:42,559][00197] Avg episode reward: [(0, '4.638')]
[2024-07-26 14:48:47,264][05690] Updated weights for policy 0, policy_version 130 (0.0013)
[2024-07-26 14:48:47,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3227.2). Total num frames: 532480. Throughput: 0: 886.9. Samples: 132122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:48:47,557][00197] Avg episode reward: [(0, '4.691')]
[2024-07-26 14:48:47,564][05677] Saving new best policy, reward=4.691!
[2024-07-26 14:48:52,557][00197] Fps is (10 sec: 3686.1, 60 sec: 3686.4, 300 sec: 3252.7). Total num frames: 552960. Throughput: 0: 903.0. Samples: 138282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-26 14:48:52,565][00197] Avg episode reward: [(0, '4.430')]
[2024-07-26 14:48:57,460][05690] Updated weights for policy 0, policy_version 140 (0.0032)
[2024-07-26 14:48:57,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 573440. Throughput: 0: 929.5. Samples: 141542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-07-26 14:48:57,559][00197] Avg episode reward: [(0, '4.564')]
[2024-07-26 14:49:02,560][00197] Fps is (10 sec: 3275.8, 60 sec: 3549.5, 300 sec: 3253.9). Total num frames: 585728. Throughput: 0: 901.5. Samples: 146148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:49:02,563][00197] Avg episode reward: [(0, '4.630')]
[2024-07-26 14:49:07,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3276.8). Total num frames: 606208. Throughput: 0: 885.3. Samples: 151680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:49:07,559][00197] Avg episode reward: [(0, '4.524')]
[2024-07-26 14:49:09,183][05690] Updated weights for policy 0, policy_version 150 (0.0025)
[2024-07-26 14:49:12,555][00197] Fps is (10 sec: 4098.3, 60 sec: 3686.4, 300 sec: 3298.4). Total num frames: 626688. Throughput: 0: 913.5. Samples: 154962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:49:12,557][00197] Avg episode reward: [(0, '4.444')]
[2024-07-26 14:49:17,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3297.8). Total num frames: 643072. Throughput: 0: 934.3. Samples: 160652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:49:17,559][00197] Avg episode reward: [(0, '4.523')]
[2024-07-26 14:49:20,902][05690] Updated weights for policy 0, policy_version 160 (0.0027)
[2024-07-26 14:49:22,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3297.3). Total num frames: 659456. Throughput: 0: 892.0. Samples: 165230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-26 14:49:22,557][00197] Avg episode reward: [(0, '4.492')]
[2024-07-26 14:49:27,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3316.8). Total num frames: 679936. Throughput: 0: 897.7. Samples: 168398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:49:27,557][00197] Avg episode reward: [(0, '4.424')]
[2024-07-26 14:49:30,285][05690] Updated weights for policy 0, policy_version 170 (0.0022)
[2024-07-26 14:49:32,555][00197] Fps is (10 sec: 4095.7, 60 sec: 3686.4, 300 sec: 3335.3). Total num frames: 700416. Throughput: 0: 952.2. Samples: 174972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:49:32,558][00197] Avg episode reward: [(0, '4.415')]
[2024-07-26 14:49:37,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3314.9). Total num frames: 712704. Throughput: 0: 908.2. Samples: 179148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-07-26 14:49:37,557][00197] Avg episode reward: [(0, '4.343')]
[2024-07-26 14:49:37,571][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000174_712704.pth...
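The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report environment frames per second over three sliding windows. A small self-contained sketch of how such windowed FPS numbers can be computed is below (illustrative helper, not Sample Factory's code); note the very first report is nan, exactly as in the log, because there is no earlier sample to diff against.

```python
import time
from collections import deque

class FpsMeter:
    """Windowed FPS from (wall_time, total_frames) samples; sketch only."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # ordered oldest -> newest

    def report(self, total_frames):
        now = time.time()
        self.samples.append((now, total_frames))
        # Drop samples older than the largest window we report.
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()
        fps = {}
        for w in self.windows:
            # Oldest sample still inside this window (falls back to newest).
            old = next(((t, f) for t, f in self.samples if now - t <= w),
                       self.samples[-1])
            dt = now - old[0]
            fps[w] = (total_frames - old[1]) / dt if dt > 0 else float("nan")
        return fps
```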
[2024-07-26 14:49:42,405][05690] Updated weights for policy 0, policy_version 180 (0.0028)
[2024-07-26 14:49:42,555][00197] Fps is (10 sec: 3686.7, 60 sec: 3686.5, 300 sec: 3351.3). Total num frames: 737280. Throughput: 0: 896.8. Samples: 181896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:49:42,558][00197] Avg episode reward: [(0, '4.294')]
[2024-07-26 14:49:47,558][00197] Fps is (10 sec: 4504.1, 60 sec: 3754.5, 300 sec: 3367.8). Total num frames: 757760. Throughput: 0: 941.9. Samples: 188532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-26 14:49:47,561][00197] Avg episode reward: [(0, '4.502')]
[2024-07-26 14:49:52,558][00197] Fps is (10 sec: 3685.2, 60 sec: 3686.4, 300 sec: 3365.8). Total num frames: 774144. Throughput: 0: 931.4. Samples: 193594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:49:52,560][00197] Avg episode reward: [(0, '4.553')]
[2024-07-26 14:49:53,919][05690] Updated weights for policy 0, policy_version 190 (0.0029)
[2024-07-26 14:49:57,555][00197] Fps is (10 sec: 3277.9, 60 sec: 3618.1, 300 sec: 3363.9). Total num frames: 790528. Throughput: 0: 905.1. Samples: 195690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:49:57,557][00197] Avg episode reward: [(0, '4.669')]
[2024-07-26 14:50:02,555][00197] Fps is (10 sec: 3687.6, 60 sec: 3755.0, 300 sec: 3379.2). Total num frames: 811008. Throughput: 0: 921.9. Samples: 202138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:50:02,557][00197] Avg episode reward: [(0, '4.706')]
[2024-07-26 14:50:02,591][05677] Saving new best policy, reward=4.706!
[2024-07-26 14:50:03,674][05690] Updated weights for policy 0, policy_version 200 (0.0032)
[2024-07-26 14:50:07,559][00197] Fps is (10 sec: 4094.3, 60 sec: 3754.4, 300 sec: 3393.8). Total num frames: 831488. Throughput: 0: 950.1. Samples: 207990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-26 14:50:07,562][00197] Avg episode reward: [(0, '4.431')]
[2024-07-26 14:50:12,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3375.1). Total num frames: 843776. Throughput: 0: 924.5. Samples: 210002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:50:12,559][00197] Avg episode reward: [(0, '4.516')]
[2024-07-26 14:50:15,677][05690] Updated weights for policy 0, policy_version 210 (0.0032)
[2024-07-26 14:50:17,555][00197] Fps is (10 sec: 3278.2, 60 sec: 3686.4, 300 sec: 3389.2). Total num frames: 864256. Throughput: 0: 901.7. Samples: 215546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:50:17,561][00197] Avg episode reward: [(0, '4.430')]
[2024-07-26 14:50:22,555][00197] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3418.6). Total num frames: 888832. Throughput: 0: 954.0. Samples: 222078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:50:22,559][00197] Avg episode reward: [(0, '4.435')]
[2024-07-26 14:50:26,646][05690] Updated weights for policy 0, policy_version 220 (0.0014)
[2024-07-26 14:50:27,556][00197] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3400.4). Total num frames: 901120. Throughput: 0: 944.6. Samples: 224406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-26 14:50:27,560][00197] Avg episode reward: [(0, '4.428')]
[2024-07-26 14:50:32,555][00197] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3398.2). Total num frames: 917504. Throughput: 0: 898.2. Samples: 228950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:50:32,557][00197] Avg episode reward: [(0, '4.340')]
[2024-07-26 14:50:37,147][05690] Updated weights for policy 0, policy_version 230 (0.0020)
[2024-07-26 14:50:37,555][00197] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3425.7). Total num frames: 942080. Throughput: 0: 932.8. Samples: 235566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:50:37,557][00197] Avg episode reward: [(0, '4.353')]
[2024-07-26 14:50:42,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3423.1). Total num frames: 958464. Throughput: 0: 959.6. Samples: 238870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-26 14:50:42,557][00197] Avg episode reward: [(0, '4.606')]
[2024-07-26 14:50:47,555][00197] Fps is (10 sec: 3276.7, 60 sec: 3618.3, 300 sec: 3420.5). Total num frames: 974848. Throughput: 0: 906.7. Samples: 242942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:50:47,563][00197] Avg episode reward: [(0, '4.656')]
[2024-07-26 14:50:48,960][05690] Updated weights for policy 0, policy_version 240 (0.0014)
[2024-07-26 14:50:52,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3432.2). Total num frames: 995328. Throughput: 0: 914.9. Samples: 249156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-07-26 14:50:52,559][00197] Avg episode reward: [(0, '4.520')]
[2024-07-26 14:50:57,558][00197] Fps is (10 sec: 4094.6, 60 sec: 3754.4, 300 sec: 3443.4). Total num frames: 1015808. Throughput: 0: 940.0. Samples: 252306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:50:57,561][00197] Avg episode reward: [(0, '4.359')]
[2024-07-26 14:50:59,522][05690] Updated weights for policy 0, policy_version 250 (0.0015)
[2024-07-26 14:51:02,555][00197] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 1032192. Throughput: 0: 929.2. Samples: 257360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:51:02,560][00197] Avg episode reward: [(0, '4.449')]
[2024-07-26 14:51:07,555][00197] Fps is (10 sec: 3278.0, 60 sec: 3618.4, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 899.8. Samples: 262568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:51:07,562][00197] Avg episode reward: [(0, '4.521')]
[2024-07-26 14:51:10,668][05690] Updated weights for policy 0, policy_version 260 (0.0026)
[2024-07-26 14:51:12,556][00197] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3623.9). Total num frames: 1069056. Throughput: 0: 919.6. Samples: 265788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:51:12,563][00197] Avg episode reward: [(0, '4.414')]
[2024-07-26 14:51:17,556][00197] Fps is (10 sec: 3686.1, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1085440. Throughput: 0: 949.1. Samples: 271660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:51:17,562][00197] Avg episode reward: [(0, '4.481')]
[2024-07-26 14:51:22,527][05690] Updated weights for policy 0, policy_version 270 (0.0018)
[2024-07-26 14:51:22,555][00197] Fps is (10 sec: 3686.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 1105920. Throughput: 0: 897.5. Samples: 275954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:51:22,557][00197] Avg episode reward: [(0, '4.643')]
[2024-07-26 14:51:27,556][00197] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1126400. Throughput: 0: 896.1. Samples: 279196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:51:27,558][00197] Avg episode reward: [(0, '4.637')]
[2024-07-26 14:51:32,557][00197] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3637.8). Total num frames: 1142784. Throughput: 0: 953.9. Samples: 285870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:51:32,560][00197] Avg episode reward: [(0, '4.643')]
[2024-07-26 14:51:32,794][05690] Updated weights for policy 0, policy_version 280 (0.0014)
[2024-07-26 14:51:37,557][00197] Fps is (10 sec: 3276.4, 60 sec: 3618.0, 300 sec: 3651.7). Total num frames: 1159168. Throughput: 0: 910.5. Samples: 290132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:51:37,561][00197] Avg episode reward: [(0, '4.719')]
[2024-07-26 14:51:37,572][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000283_1159168.pth...
[2024-07-26 14:51:37,759][05677] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth
[2024-07-26 14:51:37,778][05677] Saving new best policy, reward=4.719!
[2024-07-26 14:51:42,555][00197] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1179648. Throughput: 0: 896.2. Samples: 292630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:51:42,557][00197] Avg episode reward: [(0, '4.737')]
[2024-07-26 14:51:42,565][05677] Saving new best policy, reward=4.737!
[2024-07-26 14:51:44,270][05690] Updated weights for policy 0, policy_version 290 (0.0019)
[2024-07-26 14:51:47,555][00197] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1200128. Throughput: 0: 928.6. Samples: 299146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:51:47,562][00197] Avg episode reward: [(0, '4.613')]
[2024-07-26 14:51:52,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1216512. Throughput: 0: 928.3. Samples: 304342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:51:52,557][00197] Avg episode reward: [(0, '4.599')]
[2024-07-26 14:51:56,224][05690] Updated weights for policy 0, policy_version 300 (0.0030)
[2024-07-26 14:51:57,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3665.6). Total num frames: 1232896. Throughput: 0: 903.2. Samples: 306432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:51:57,556][00197] Avg episode reward: [(0, '4.914')]
[2024-07-26 14:51:57,572][05677] Saving new best policy, reward=4.914!
[2024-07-26 14:52:02,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1253376. Throughput: 0: 911.4. Samples: 312672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-26 14:52:02,557][00197] Avg episode reward: [(0, '4.883')]
[2024-07-26 14:52:05,529][05690] Updated weights for policy 0, policy_version 310 (0.0013)
[2024-07-26 14:52:07,555][00197] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1273856. Throughput: 0: 954.6. Samples: 318910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:52:07,563][00197] Avg episode reward: [(0, '5.048')]
[2024-07-26 14:52:07,573][05677] Saving new best policy, reward=5.048!
[2024-07-26 14:52:12,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3665.6). Total num frames: 1286144. Throughput: 0: 926.2. Samples: 320872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:52:12,561][00197] Avg episode reward: [(0, '4.916')]
[2024-07-26 14:52:17,291][05690] Updated weights for policy 0, policy_version 320 (0.0021)
[2024-07-26 14:52:17,555][00197] Fps is (10 sec: 3686.2, 60 sec: 3754.7, 300 sec: 3679.4). Total num frames: 1310720. Throughput: 0: 899.8. Samples: 326358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:52:17,559][00197] Avg episode reward: [(0, '5.049')]
[2024-07-26 14:52:22,557][00197] Fps is (10 sec: 4504.7, 60 sec: 3754.5, 300 sec: 3679.4). Total num frames: 1331200. Throughput: 0: 952.5. Samples: 332996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:52:22,563][00197] Avg episode reward: [(0, '5.030')]
[2024-07-26 14:52:27,555][00197] Fps is (10 sec: 3277.0, 60 sec: 3618.2, 300 sec: 3665.6). Total num frames: 1343488. Throughput: 0: 952.5. Samples: 335494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:52:27,560][00197] Avg episode reward: [(0, '4.997')]
[2024-07-26 14:52:29,127][05690] Updated weights for policy 0, policy_version 330 (0.0026)
[2024-07-26 14:52:32,555][00197] Fps is (10 sec: 3277.4, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 1363968. Throughput: 0: 905.7. Samples: 339904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:52:32,557][00197] Avg episode reward: [(0, '5.056')]
[2024-07-26 14:52:32,561][05677] Saving new best policy, reward=5.056!
[2024-07-26 14:52:37,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3679.5). Total num frames: 1384448. Throughput: 0: 936.2. Samples: 346472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:52:37,560][00197] Avg episode reward: [(0, '5.529')]
[2024-07-26 14:52:37,572][05677] Saving new best policy, reward=5.529!
[2024-07-26 14:52:38,850][05690] Updated weights for policy 0, policy_version 340 (0.0012)
[2024-07-26 14:52:42,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1404928. Throughput: 0: 963.9. Samples: 349806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:52:42,557][00197] Avg episode reward: [(0, '5.802')]
[2024-07-26 14:52:42,560][05677] Saving new best policy, reward=5.802!
[2024-07-26 14:52:47,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1417216. Throughput: 0: 916.7. Samples: 353924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-26 14:52:47,561][00197] Avg episode reward: [(0, '5.943')]
[2024-07-26 14:52:47,569][05677] Saving new best policy, reward=5.943!
[2024-07-26 14:52:50,801][05690] Updated weights for policy 0, policy_version 350 (0.0017)
[2024-07-26 14:52:52,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1437696. Throughput: 0: 912.0. Samples: 359950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:52:52,559][00197] Avg episode reward: [(0, '6.171')]
[2024-07-26 14:52:52,577][05677] Saving new best policy, reward=6.171!
[2024-07-26 14:52:57,555][00197] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1462272. Throughput: 0: 940.4. Samples: 363188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:52:57,560][00197] Avg episode reward: [(0, '6.373')]
[2024-07-26 14:52:57,572][05677] Saving new best policy, reward=6.373!
[2024-07-26 14:53:02,121][05690] Updated weights for policy 0, policy_version 360 (0.0027)
[2024-07-26 14:53:02,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1474560. Throughput: 0: 929.7. Samples: 368194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:53:02,560][00197] Avg episode reward: [(0, '6.333')]
[2024-07-26 14:53:07,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1495040. Throughput: 0: 899.4. Samples: 373468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:53:07,560][00197] Avg episode reward: [(0, '7.062')]
[2024-07-26 14:53:07,570][05677] Saving new best policy, reward=7.062!
[2024-07-26 14:53:12,220][05690] Updated weights for policy 0, policy_version 370 (0.0023)
[2024-07-26 14:53:12,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1515520. Throughput: 0: 916.5. Samples: 376736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:53:12,564][00197] Avg episode reward: [(0, '7.369')]
[2024-07-26 14:53:12,565][05677] Saving new best policy, reward=7.369!
[2024-07-26 14:53:17,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1531904. Throughput: 0: 952.7. Samples: 382774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:53:17,565][00197] Avg episode reward: [(0, '7.167')]
[2024-07-26 14:53:22,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3693.4). Total num frames: 1548288. Throughput: 0: 899.6. Samples: 386954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:53:22,557][00197] Avg episode reward: [(0, '6.833')]
[2024-07-26 14:53:24,244][05690] Updated weights for policy 0, policy_version 380 (0.0017)
[2024-07-26 14:53:27,555][00197] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1568768. Throughput: 0: 900.1. Samples: 390310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:53:27,562][00197] Avg episode reward: [(0, '6.958')]
[2024-07-26 14:53:32,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1589248. Throughput: 0: 952.8. Samples: 396800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:53:32,556][00197] Avg episode reward: [(0, '7.228')]
[2024-07-26 14:53:34,677][05690] Updated weights for policy 0, policy_version 390 (0.0015)
[2024-07-26 14:53:37,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1601536. Throughput: 0: 920.1. Samples: 401354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:53:37,559][00197] Avg episode reward: [(0, '7.569')]
[2024-07-26 14:53:37,591][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000392_1605632.pth...
[2024-07-26 14:53:37,769][05677] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000174_712704.pth
[2024-07-26 14:53:37,805][05677] Saving new best policy, reward=7.569!
[2024-07-26 14:53:42,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1622016. Throughput: 0: 900.9. Samples: 403728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:53:42,560][00197] Avg episode reward: [(0, '7.505')]
[2024-07-26 14:53:45,591][05690] Updated weights for policy 0, policy_version 400 (0.0019)
[2024-07-26 14:53:47,555][00197] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 1646592. Throughput: 0: 935.6. Samples: 410294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:53:47,558][00197] Avg episode reward: [(0, '8.179')]
[2024-07-26 14:53:47,569][05677] Saving new best policy, reward=8.179!
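By this point the log shows the learner's checkpoint discipline: periodic `checkpoint_<version>_<frames>.pth` files with the oldest one removed once more than two exist, plus a standalone "best policy" save whenever the average episode reward sets a new record. A hedged sketch of that bookkeeping follows (illustrative helper, not Sample Factory's implementation; only the file-naming pattern and retention behaviour are taken from the log).

```python
import os
import torch

class CheckpointManager:
    """Sketch of periodic + best-policy checkpointing as seen in the log."""

    def __init__(self, ckpt_dir: str, keep_last: int = 2):
        self.ckpt_dir = ckpt_dir
        self.keep_last = keep_last
        self.best_reward = float("-inf")
        os.makedirs(ckpt_dir, exist_ok=True)

    def save(self, model, policy_version: int, env_frames: int, avg_reward: float):
        # e.g. checkpoint_000000068_278528.pth: zero-padded version, raw frame count
        name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        torch.save(model.state_dict(), os.path.join(self.ckpt_dir, name))
        # Keep only the newest `keep_last` checkpoints ("Removing ..." lines).
        ckpts = sorted(f for f in os.listdir(self.ckpt_dir)
                       if f.startswith("checkpoint_"))
        for stale in ckpts[:-self.keep_last]:
            os.remove(os.path.join(self.ckpt_dir, stale))
        # "Saving new best policy, reward=..." whenever the average improves.
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            torch.save(model.state_dict(),
                       os.path.join(self.ckpt_dir, "best_policy.pth"))
```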
[2024-07-26 14:53:52,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1658880. Throughput: 0: 938.7. Samples: 415708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:53:52,563][00197] Avg episode reward: [(0, '8.183')]
[2024-07-26 14:53:52,572][05677] Saving new best policy, reward=8.183!
[2024-07-26 14:53:57,555][00197] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3693.4). Total num frames: 1675264. Throughput: 0: 911.7. Samples: 417762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:53:57,557][00197] Avg episode reward: [(0, '8.139')]
[2024-07-26 14:53:57,726][05690] Updated weights for policy 0, policy_version 410 (0.0037)
[2024-07-26 14:54:02,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1699840. Throughput: 0: 911.2. Samples: 423778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:54:02,560][00197] Avg episode reward: [(0, '8.351')]
[2024-07-26 14:54:02,563][05677] Saving new best policy, reward=8.351!
[2024-07-26 14:54:07,481][05690] Updated weights for policy 0, policy_version 420 (0.0018)
[2024-07-26 14:54:07,555][00197] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1720320. Throughput: 0: 958.8. Samples: 430098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:54:07,558][00197] Avg episode reward: [(0, '8.136')]
[2024-07-26 14:54:12,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1732608. Throughput: 0: 930.7. Samples: 432192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:54:12,561][00197] Avg episode reward: [(0, '8.183')]
[2024-07-26 14:54:17,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1753088. Throughput: 0: 902.4. Samples: 437410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:54:17,562][00197] Avg episode reward: [(0, '8.057')]
[2024-07-26 14:54:18,840][05690] Updated weights for policy 0, policy_version 430 (0.0022)
[2024-07-26 14:54:22,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1773568. Throughput: 0: 947.5. Samples: 443990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:54:22,560][00197] Avg episode reward: [(0, '8.146')]
[2024-07-26 14:54:27,555][00197] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1789952. Throughput: 0: 957.4. Samples: 446810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:54:27,561][00197] Avg episode reward: [(0, '8.266')]
[2024-07-26 14:54:31,047][05690] Updated weights for policy 0, policy_version 440 (0.0017)
[2024-07-26 14:54:32,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1806336. Throughput: 0: 902.8. Samples: 450918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:54:32,560][00197] Avg episode reward: [(0, '8.553')]
[2024-07-26 14:54:32,563][05677] Saving new best policy, reward=8.553!
[2024-07-26 14:54:37,555][00197] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1830912. Throughput: 0: 927.6. Samples: 457448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:54:37,559][00197] Avg episode reward: [(0, '9.073')]
[2024-07-26 14:54:37,567][05677] Saving new best policy, reward=9.073!
[2024-07-26 14:54:40,349][05690] Updated weights for policy 0, policy_version 450 (0.0012)
[2024-07-26 14:54:42,555][00197] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3693.4). Total num frames: 1847296. Throughput: 0: 951.5. Samples: 460578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:54:42,560][00197] Avg episode reward: [(0, '10.099')]
[2024-07-26 14:54:42,565][05677] Saving new best policy, reward=10.099!
[2024-07-26 14:54:47,564][00197] Fps is (10 sec: 3273.8, 60 sec: 3617.6, 300 sec: 3693.3). Total num frames: 1863680. Throughput: 0: 916.2. Samples: 465016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:54:47,568][00197] Avg episode reward: [(0, '10.372')]
[2024-07-26 14:54:47,586][05677] Saving new best policy, reward=10.372!
[2024-07-26 14:54:52,416][05690] Updated weights for policy 0, policy_version 460 (0.0080)
[2024-07-26 14:54:52,555][00197] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1884160. Throughput: 0: 903.2. Samples: 470742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:54:52,561][00197] Avg episode reward: [(0, '10.243')]
[2024-07-26 14:54:57,555][00197] Fps is (10 sec: 4099.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1904640. Throughput: 0: 928.4. Samples: 473972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:54:57,564][00197] Avg episode reward: [(0, '9.524')]
[2024-07-26 14:55:02,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1916928. Throughput: 0: 930.8. Samples: 479296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:55:02,560][00197] Avg episode reward: [(0, '9.713')]
[2024-07-26 14:55:04,393][05690] Updated weights for policy 0, policy_version 470 (0.0012)
[2024-07-26 14:55:07,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1937408. Throughput: 0: 890.0. Samples: 484038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:55:07,560][00197] Avg episode reward: [(0, '10.001')]
[2024-07-26 14:55:12,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1957888. Throughput: 0: 900.3. Samples: 487324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:55:12,559][00197] Avg episode reward: [(0, '10.363')]
[2024-07-26 14:55:14,020][05690] Updated weights for policy 0, policy_version 480 (0.0021)
[2024-07-26 14:55:17,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1978368. Throughput: 0: 953.6. Samples: 493828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:55:17,561][00197] Avg episode reward: [(0, '11.120')]
[2024-07-26 14:55:17,570][05677] Saving new best policy, reward=11.120!
[2024-07-26 14:55:22,555][00197] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 1990656. Throughput: 0: 901.9. Samples: 498034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:55:22,557][00197] Avg episode reward: [(0, '11.222')]
[2024-07-26 14:55:22,561][05677] Saving new best policy, reward=11.222!
[2024-07-26 14:55:25,930][05690] Updated weights for policy 0, policy_version 490 (0.0020)
[2024-07-26 14:55:27,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2011136. Throughput: 0: 898.5. Samples: 501010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:55:27,557][00197] Avg episode reward: [(0, '11.903')]
[2024-07-26 14:55:27,566][05677] Saving new best policy, reward=11.903!
[2024-07-26 14:55:32,555][00197] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2031616. Throughput: 0: 942.5. Samples: 507420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:55:32,564][00197] Avg episode reward: [(0, '12.969')]
[2024-07-26 14:55:32,569][05677] Saving new best policy, reward=12.969!
[2024-07-26 14:55:37,471][05690] Updated weights for policy 0, policy_version 500 (0.0016)
[2024-07-26 14:55:37,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2048000. Throughput: 0: 917.4. Samples: 512026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:55:37,559][00197] Avg episode reward: [(0, '13.239')]
[2024-07-26 14:55:37,571][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000500_2048000.pth...
[2024-07-26 14:55:37,762][05677] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000283_1159168.pth
[2024-07-26 14:55:37,785][05677] Saving new best policy, reward=13.239!
[2024-07-26 14:55:42,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2064384. Throughput: 0: 891.4. Samples: 514084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:55:42,562][00197] Avg episode reward: [(0, '13.123')]
[2024-07-26 14:55:47,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3687.0, 300 sec: 3693.3). Total num frames: 2084864. Throughput: 0: 918.7. Samples: 520638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:55:47,562][00197] Avg episode reward: [(0, '13.546')]
[2024-07-26 14:55:47,572][05677] Saving new best policy, reward=13.546!
[2024-07-26 14:55:47,827][05690] Updated weights for policy 0, policy_version 510 (0.0013)
[2024-07-26 14:55:52,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2101248. Throughput: 0: 936.4. Samples: 526176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:55:52,557][00197] Avg episode reward: [(0, '13.614')]
[2024-07-26 14:55:52,559][05677] Saving new best policy, reward=13.614!
[2024-07-26 14:55:57,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 2117632. Throughput: 0: 906.7. Samples: 528124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:55:57,557][00197] Avg episode reward: [(0, '13.615')]
[2024-07-26 14:56:00,165][05690] Updated weights for policy 0, policy_version 520 (0.0015)
[2024-07-26 14:56:02,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2138112. Throughput: 0: 884.7. Samples: 533638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:56:02,557][00197] Avg episode reward: [(0, '13.841')]
[2024-07-26 14:56:02,560][05677] Saving new best policy, reward=13.841!
[2024-07-26 14:56:07,557][00197] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3693.3). Total num frames: 2158592. Throughput: 0: 930.7. Samples: 539918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:56:07,559][00197] Avg episode reward: [(0, '13.290')]
[2024-07-26 14:56:11,835][05690] Updated weights for policy 0, policy_version 530 (0.0033)
[2024-07-26 14:56:12,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 2170880. Throughput: 0: 909.9. Samples: 541956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:56:12,560][00197] Avg episode reward: [(0, '13.162')]
[2024-07-26 14:56:17,555][00197] Fps is (10 sec: 2867.8, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 2187264. Throughput: 0: 865.3. Samples: 546358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:56:17,557][00197] Avg episode reward: [(0, '13.048')]
[2024-07-26 14:56:22,555][00197] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2207744. Throughput: 0: 901.8. Samples: 552608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:56:22,562][00197] Avg episode reward: [(0, '14.188')]
[2024-07-26 14:56:22,567][05677] Saving new best policy, reward=14.188!
[2024-07-26 14:56:22,571][05690] Updated weights for policy 0, policy_version 540 (0.0029)
[2024-07-26 14:56:27,555][00197] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3665.6). Total num frames: 2224128. Throughput: 0: 919.6. Samples: 555468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:56:27,562][00197] Avg episode reward: [(0, '14.215')]
[2024-07-26 14:56:27,575][05677] Saving new best policy, reward=14.215!
[2024-07-26 14:56:32,555][00197] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3651.7). Total num frames: 2236416. Throughput: 0: 856.1. Samples: 559162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:56:32,562][00197] Avg episode reward: [(0, '14.902')]
[2024-07-26 14:56:32,568][05677] Saving new best policy, reward=14.902!
[2024-07-26 14:56:35,819][05690] Updated weights for policy 0, policy_version 550 (0.0012)
[2024-07-26 14:56:37,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 2256896. Throughput: 0: 860.1. Samples: 564880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:56:37,559][00197] Avg episode reward: [(0, '16.317')]
[2024-07-26 14:56:37,575][05677] Saving new best policy, reward=16.317!
[2024-07-26 14:56:42,560][00197] Fps is (10 sec: 4093.7, 60 sec: 3549.5, 300 sec: 3651.6). Total num frames: 2277376. Throughput: 0: 888.5. Samples: 568110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:56:42,565][00197] Avg episode reward: [(0, '15.609')]
[2024-07-26 14:56:47,041][05690] Updated weights for policy 0, policy_version 560 (0.0025)
[2024-07-26 14:56:47,555][00197] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 2293760. Throughput: 0: 877.1. Samples: 573108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:56:47,560][00197] Avg episode reward: [(0, '15.111')]
[2024-07-26 14:56:52,555][00197] Fps is (10 sec: 3688.5, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 2314240. Throughput: 0: 858.4. Samples: 578542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:56:52,556][00197] Avg episode reward: [(0, '14.518')]
[2024-07-26 14:56:56,820][05690] Updated weights for policy 0, policy_version 570 (0.0014)
[2024-07-26 14:56:57,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2334720. Throughput: 0: 888.0. Samples: 581914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:56:57,562][00197] Avg episode reward: [(0, '13.546')]
[2024-07-26 14:57:02,556][00197] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3651.7). Total num frames: 2351104. Throughput: 0: 922.1. Samples: 587854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:57:02,563][00197] Avg episode reward: [(0, '14.035')]
[2024-07-26 14:57:07,555][00197] Fps is (10 sec: 3276.9, 60 sec: 3481.7, 300 sec: 3665.6). Total num frames: 2367488. Throughput: 0: 881.7. Samples: 592286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:57:07,556][00197] Avg episode reward: [(0, '14.725')]
[2024-07-26 14:57:08,732][05690] Updated weights for policy 0, policy_version 580 (0.0012)
[2024-07-26 14:57:12,555][00197] Fps is (10 sec: 4096.7, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2392064. Throughput: 0: 892.4. Samples: 595626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:57:12,560][00197] Avg episode reward: [(0, '15.636')]
[2024-07-26 14:57:17,555][00197] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2412544. Throughput: 0: 959.9. Samples: 602358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:57:17,561][00197] Avg episode reward: [(0, '15.084')]
[2024-07-26 14:57:18,766][05690] Updated weights for policy 0, policy_version 590 (0.0024)
[2024-07-26 14:57:22,557][00197] Fps is (10 sec: 3275.9, 60 sec: 3618.0, 300 sec: 3665.5). Total num frames: 2424832. Throughput: 0: 930.5. Samples: 606754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:57:22,562][00197] Avg episode reward: [(0, '14.236')]
[2024-07-26 14:57:27,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2445312. Throughput: 0: 920.6. Samples: 609532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-26 14:57:27,561][00197] Avg episode reward: [(0, '14.800')]
[2024-07-26 14:57:29,667][05690] Updated weights for policy 0, policy_version 600 (0.0013)
[2024-07-26 14:57:32,555][00197] Fps is (10 sec: 4506.8, 60 sec: 3891.2, 300 sec: 3679.5). Total num frames: 2469888. Throughput: 0: 958.8. Samples: 616254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:57:32,558][00197] Avg episode reward: [(0, '15.556')]
[2024-07-26 14:57:37,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2482176. Throughput: 0: 952.5. Samples: 621406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:57:37,561][00197] Avg episode reward: [(0, '15.975')]
[2024-07-26 14:57:37,635][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000607_2486272.pth...
[2024-07-26 14:57:37,822][05677] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000392_1605632.pth
[2024-07-26 14:57:41,676][05690] Updated weights for policy 0, policy_version 610 (0.0020)
[2024-07-26 14:57:42,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3679.5). Total num frames: 2502656. Throughput: 0: 922.1. Samples: 623410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-07-26 14:57:42,556][00197] Avg episode reward: [(0, '17.187')]
[2024-07-26 14:57:42,562][05677] Saving new best policy, reward=17.187!
[2024-07-26 14:57:47,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 2523136. Throughput: 0: 930.8. Samples: 629740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:57:47,558][00197] Avg episode reward: [(0, '17.879')]
[2024-07-26 14:57:47,578][05677] Saving new best policy, reward=17.879!
[2024-07-26 14:57:51,129][05690] Updated weights for policy 0, policy_version 620 (0.0017)
[2024-07-26 14:57:52,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2543616. Throughput: 0: 966.1. Samples: 635760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:57:52,557][00197] Avg episode reward: [(0, '17.375')]
[2024-07-26 14:57:57,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2555904. Throughput: 0: 937.9. Samples: 637832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-26 14:57:57,560][00197] Avg episode reward: [(0, '18.282')]
[2024-07-26 14:57:57,569][05677] Saving new best policy, reward=18.282!
[2024-07-26 14:58:02,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3665.6). Total num frames: 2576384. Throughput: 0: 910.6. Samples: 643334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:58:02,563][00197] Avg episode reward: [(0, '17.092')]
[2024-07-26 14:58:02,932][05690] Updated weights for policy 0, policy_version 630 (0.0024)
[2024-07-26 14:58:07,557][00197] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3679.4). Total num frames: 2600960. Throughput: 0: 958.3. Samples: 649878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:58:07,561][00197] Avg episode reward: [(0, '17.562')]
[2024-07-26 14:58:12,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2613248. Throughput: 0: 950.4. Samples: 652302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:58:12,560][00197] Avg episode reward: [(0, '17.318')]
[2024-07-26 14:58:14,721][05690] Updated weights for policy 0, policy_version 640 (0.0018)
[2024-07-26 14:58:17,555][00197] Fps is (10 sec: 2867.6, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2629632. Throughput: 0: 903.6. Samples: 656918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:58:17,561][00197] Avg episode reward: [(0, '16.745')]
[2024-07-26 14:58:22,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3679.5). Total num frames: 2654208. Throughput: 0: 937.6. Samples: 663596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:58:22,562][00197] Avg episode reward: [(0, '16.349')]
[2024-07-26 14:58:23,926][05690] Updated weights for policy 0, policy_version 650 (0.0023)
[2024-07-26 14:58:27,556][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 2670592. Throughput: 0: 966.1. Samples: 666884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:58:27,561][00197] Avg episode reward: [(0, '15.997')]
[2024-07-26 14:58:32,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2686976. Throughput: 0: 920.1. Samples: 671146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:58:32,560][00197] Avg episode reward: [(0, '16.109')]
[2024-07-26 14:58:35,831][05690] Updated weights for policy 0, policy_version 660 (0.0021)
[2024-07-26 14:58:37,555][00197] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2707456. Throughput: 0: 923.0. Samples: 677294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:58:37,557][00197] Avg episode reward: [(0, '15.282')]
[2024-07-26 14:58:42,556][00197] Fps is (10 sec: 4505.0, 60 sec: 3822.8, 300 sec: 3679.4). Total num frames: 2732032. Throughput: 0: 951.6. Samples: 680656. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-26 14:58:42,562][00197] Avg episode reward: [(0, '16.336')]
[2024-07-26 14:58:47,067][05690] Updated weights for policy 0, policy_version 670 (0.0017)
[2024-07-26 14:58:47,556][00197] Fps is (10 sec: 3685.8, 60 sec: 3686.3, 300 sec: 3679.4). Total num frames: 2744320. Throughput: 0: 939.3. Samples: 685602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:58:47,559][00197] Avg episode reward: [(0, '15.303')]
[2024-07-26 14:58:52,555][00197] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2764800. Throughput: 0: 913.1. Samples: 690968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:58:52,560][00197] Avg episode reward: [(0, '16.502')]
[2024-07-26 14:58:57,192][05690] Updated weights for policy 0, policy_version 680 (0.0032)
[2024-07-26 14:58:57,555][00197] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 2785280. Throughput: 0: 932.8. Samples: 694280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:58:57,559][00197] Avg episode reward: [(0, '18.767')]
[2024-07-26 14:58:57,569][05677] Saving new best policy, reward=18.767!
[2024-07-26 14:59:02,555][00197] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2801664. Throughput: 0: 958.8. Samples: 700064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-26 14:59:02,560][00197] Avg episode reward: [(0, '18.884')]
[2024-07-26 14:59:02,565][05677] Saving new best policy, reward=18.884!
[2024-07-26 14:59:07,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3679.5). Total num frames: 2818048. Throughput: 0: 902.2. Samples: 704194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:59:07,556][00197] Avg episode reward: [(0, '20.010')]
[2024-07-26 14:59:07,568][05677] Saving new best policy, reward=20.010!
[2024-07-26 14:59:09,395][05690] Updated weights for policy 0, policy_version 690 (0.0016)
[2024-07-26 14:59:12,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2838528. Throughput: 0: 899.3. Samples: 707350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:59:12,556][00197] Avg episode reward: [(0, '20.541')]
[2024-07-26 14:59:12,560][05677] Saving new best policy, reward=20.541!
[2024-07-26 14:59:17,556][00197] Fps is (10 sec: 4095.5, 60 sec: 3822.9, 300 sec: 3679.4). Total num frames: 2859008. Throughput: 0: 948.8. Samples: 713844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:59:17,558][00197] Avg episode reward: [(0, '20.114')]
[2024-07-26 14:59:20,650][05690] Updated weights for policy 0, policy_version 700 (0.0012)
[2024-07-26 14:59:22,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2871296. Throughput: 0: 904.5. Samples: 717996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-26 14:59:22,559][00197] Avg episode reward: [(0, '19.418')]
[2024-07-26 14:59:27,555][00197] Fps is (10 sec: 3277.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2891776. Throughput: 0: 887.1. Samples: 720574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:59:27,558][00197] Avg episode reward: [(0, '19.051')]
[2024-07-26 14:59:31,434][05690] Updated weights for policy 0, policy_version 710 (0.0017)
[2024-07-26 14:59:32,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2912256. Throughput: 0: 917.2. Samples: 726876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-26 14:59:32,563][00197] Avg episode reward: [(0, '17.364')]
[2024-07-26 14:59:37,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2924544. Throughput: 0: 911.4. Samples: 731982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-26 14:59:37,557][00197] Avg episode reward: [(0, '18.144')]
[2024-07-26 14:59:37,571][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000714_2924544.pth...
[2024-07-26 14:59:37,720][05677] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000500_2048000.pth [2024-07-26 14:59:42,555][00197] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3651.8). Total num frames: 2940928. Throughput: 0: 881.7. Samples: 733958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 14:59:42,557][00197] Avg episode reward: [(0, '17.905')] [2024-07-26 14:59:43,762][05690] Updated weights for policy 0, policy_version 720 (0.0014) [2024-07-26 14:59:47,557][00197] Fps is (10 sec: 4095.1, 60 sec: 3686.4, 300 sec: 3665.5). Total num frames: 2965504. Throughput: 0: 887.6. Samples: 740006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 14:59:47,559][00197] Avg episode reward: [(0, '18.088')] [2024-07-26 14:59:52,555][00197] Fps is (10 sec: 4095.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2981888. Throughput: 0: 933.1. Samples: 746182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 14:59:52,565][00197] Avg episode reward: [(0, '18.261')] [2024-07-26 14:59:54,503][05690] Updated weights for policy 0, policy_version 730 (0.0015) [2024-07-26 14:59:57,557][00197] Fps is (10 sec: 3276.8, 60 sec: 3549.7, 300 sec: 3665.5). Total num frames: 2998272. Throughput: 0: 907.2. Samples: 748176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 14:59:57,563][00197] Avg episode reward: [(0, '18.139')] [2024-07-26 15:00:02,555][00197] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3018752. Throughput: 0: 878.0. Samples: 753354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:00:02,562][00197] Avg episode reward: [(0, '17.886')] [2024-07-26 15:00:05,378][05690] Updated weights for policy 0, policy_version 740 (0.0019) [2024-07-26 15:00:07,556][00197] Fps is (10 sec: 4096.4, 60 sec: 3686.3, 300 sec: 3665.6). Total num frames: 3039232. Throughput: 0: 928.2. Samples: 759766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:00:07,561][00197] Avg episode reward: [(0, '18.012')] [2024-07-26 15:00:12,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 3051520. Throughput: 0: 929.5. Samples: 762402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:00:12,557][00197] Avg episode reward: [(0, '18.855')] [2024-07-26 15:00:17,475][05690] Updated weights for policy 0, policy_version 750 (0.0012) [2024-07-26 15:00:17,555][00197] Fps is (10 sec: 3277.2, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3072000. Throughput: 0: 884.2. Samples: 766666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:00:17,562][00197] Avg episode reward: [(0, '18.525')] [2024-07-26 15:00:22,555][00197] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3092480. Throughput: 0: 914.1. Samples: 773116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:00:22,562][00197] Avg episode reward: [(0, '19.842')] [2024-07-26 15:00:27,555][00197] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3108864. Throughput: 0: 941.7. Samples: 776336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:00:27,560][00197] Avg episode reward: [(0, '20.023')] [2024-07-26 15:00:28,071][05690] Updated weights for policy 0, policy_version 760 (0.0014) [2024-07-26 15:00:32,560][00197] Fps is (10 sec: 3275.1, 60 sec: 3549.5, 300 sec: 3651.6). Total num frames: 3125248. Throughput: 0: 903.1. Samples: 780650. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:00:32,565][00197] Avg episode reward: [(0, '20.369')] [2024-07-26 15:00:37,555][00197] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3145728. Throughput: 0: 889.7. Samples: 786220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:00:37,557][00197] Avg episode reward: [(0, '21.101')] [2024-07-26 15:00:37,567][05677] Saving new best policy, reward=21.101! [2024-07-26 15:00:39,373][05690] Updated weights for policy 0, policy_version 770 (0.0026) [2024-07-26 15:00:42,557][00197] Fps is (10 sec: 4097.1, 60 sec: 3754.5, 300 sec: 3665.5). Total num frames: 3166208. Throughput: 0: 916.2. Samples: 789404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:00:42,560][00197] Avg episode reward: [(0, '21.175')] [2024-07-26 15:00:42,566][05677] Saving new best policy, reward=21.175! [2024-07-26 15:00:47,556][00197] Fps is (10 sec: 3276.5, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 3178496. Throughput: 0: 911.8. Samples: 794388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-26 15:00:47,557][00197] Avg episode reward: [(0, '20.212')] [2024-07-26 15:00:52,038][05690] Updated weights for policy 0, policy_version 780 (0.0019) [2024-07-26 15:00:52,555][00197] Fps is (10 sec: 2868.0, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 3194880. Throughput: 0: 871.8. Samples: 798996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:00:52,557][00197] Avg episode reward: [(0, '20.310')] [2024-07-26 15:00:57,555][00197] Fps is (10 sec: 3686.7, 60 sec: 3618.3, 300 sec: 3651.7). Total num frames: 3215360. Throughput: 0: 884.4. Samples: 802202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:00:57,557][00197] Avg episode reward: [(0, '19.974')] [2024-07-26 15:01:02,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 3231744. Throughput: 0: 921.4. Samples: 808130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:01:02,557][00197] Avg episode reward: [(0, '19.436')] [2024-07-26 15:01:03,099][05690] Updated weights for policy 0, policy_version 790 (0.0033) [2024-07-26 15:01:07,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3651.7). Total num frames: 3248128. Throughput: 0: 864.2. Samples: 812006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:01:07,557][00197] Avg episode reward: [(0, '19.351')] [2024-07-26 15:01:12,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3268608. Throughput: 0: 860.2. Samples: 815044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:01:12,557][00197] Avg episode reward: [(0, '21.352')] [2024-07-26 15:01:12,559][05677] Saving new best policy, reward=21.352! [2024-07-26 15:01:14,463][05690] Updated weights for policy 0, policy_version 800 (0.0020) [2024-07-26 15:01:17,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3289088. Throughput: 0: 903.0. Samples: 821278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:01:17,556][00197] Avg episode reward: [(0, '22.342')] [2024-07-26 15:01:17,567][05677] Saving new best policy, reward=22.342! [2024-07-26 15:01:22,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 3301376. Throughput: 0: 880.8. Samples: 825858. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:01:22,560][00197] Avg episode reward: [(0, '22.830')] [2024-07-26 15:01:22,562][05677] Saving new best policy, reward=22.830! [2024-07-26 15:01:26,720][05690] Updated weights for policy 0, policy_version 810 (0.0012) [2024-07-26 15:01:27,555][00197] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 3317760. Throughput: 0: 858.3. Samples: 828024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:01:27,561][00197] Avg episode reward: [(0, '22.798')] [2024-07-26 15:01:32,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3618.5, 300 sec: 3679.5). Total num frames: 3342336. Throughput: 0: 889.9. Samples: 834434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:01:32,557][00197] Avg episode reward: [(0, '22.165')] [2024-07-26 15:01:37,295][05690] Updated weights for policy 0, policy_version 820 (0.0022) [2024-07-26 15:01:37,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3358720. Throughput: 0: 909.1. Samples: 839906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-26 15:01:37,557][00197] Avg episode reward: [(0, '21.401')] [2024-07-26 15:01:37,574][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth... [2024-07-26 15:01:37,736][05677] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000607_2486272.pth [2024-07-26 15:01:42,555][00197] Fps is (10 sec: 2867.2, 60 sec: 3413.5, 300 sec: 3651.7). Total num frames: 3371008. Throughput: 0: 880.0. Samples: 841800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:01:42,559][00197] Avg episode reward: [(0, '21.761')] [2024-07-26 15:01:47,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 3391488. Throughput: 0: 874.1. Samples: 847464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:01:47,561][00197] Avg episode reward: [(0, '20.118')] [2024-07-26 15:01:48,562][05690] Updated weights for policy 0, policy_version 830 (0.0035) [2024-07-26 15:01:52,555][00197] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3416064. Throughput: 0: 931.7. Samples: 853934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:01:52,557][00197] Avg episode reward: [(0, '20.434')] [2024-07-26 15:01:57,557][00197] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3651.7). Total num frames: 3428352. Throughput: 0: 909.9. Samples: 855992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:01:57,558][00197] Avg episode reward: [(0, '20.887')] [2024-07-26 15:02:00,699][05690] Updated weights for policy 0, policy_version 840 (0.0014) [2024-07-26 15:02:02,555][00197] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 3444736. Throughput: 0: 876.2. Samples: 860706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:02:02,562][00197] Avg episode reward: [(0, '21.255')] [2024-07-26 15:02:07,555][00197] Fps is (10 sec: 4096.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3469312. Throughput: 0: 915.1. Samples: 867038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:02:07,561][00197] Avg episode reward: [(0, '22.122')] [2024-07-26 15:02:11,104][05690] Updated weights for policy 0, policy_version 850 (0.0021) [2024-07-26 15:02:12,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3481600. Throughput: 0: 933.7. Samples: 870040. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:02:12,557][00197] Avg episode reward: [(0, '21.686')] [2024-07-26 15:02:17,560][00197] Fps is (10 sec: 2865.6, 60 sec: 3481.3, 300 sec: 3637.8). Total num frames: 3497984. Throughput: 0: 880.6. Samples: 874064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:02:17,565][00197] Avg episode reward: [(0, '22.327')] [2024-07-26 15:02:22,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3518464. Throughput: 0: 897.5. Samples: 880294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:02:22,560][00197] Avg episode reward: [(0, '22.096')] [2024-07-26 15:02:22,641][05690] Updated weights for policy 0, policy_version 860 (0.0018) [2024-07-26 15:02:27,561][00197] Fps is (10 sec: 4095.6, 60 sec: 3686.0, 300 sec: 3623.8). Total num frames: 3538944. Throughput: 0: 926.5. Samples: 883500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:02:27,564][00197] Avg episode reward: [(0, '21.759')] [2024-07-26 15:02:32,558][00197] Fps is (10 sec: 3275.9, 60 sec: 3481.4, 300 sec: 3623.9). Total num frames: 3551232. Throughput: 0: 901.8. Samples: 888046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:02:32,560][00197] Avg episode reward: [(0, '22.087')] [2024-07-26 15:02:34,904][05690] Updated weights for policy 0, policy_version 870 (0.0021) [2024-07-26 15:02:37,555][00197] Fps is (10 sec: 3278.8, 60 sec: 3549.8, 300 sec: 3623.9). Total num frames: 3571712. Throughput: 0: 874.2. Samples: 893274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:02:37,558][00197] Avg episode reward: [(0, '21.904')] [2024-07-26 15:02:42,555][00197] Fps is (10 sec: 4097.2, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3592192. Throughput: 0: 898.9. Samples: 896440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:02:42,560][00197] Avg episode reward: [(0, '20.535')] [2024-07-26 15:02:45,386][05690] Updated weights for policy 0, policy_version 880 (0.0013) [2024-07-26 15:02:47,555][00197] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3608576. Throughput: 0: 913.4. Samples: 901808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:02:47,559][00197] Avg episode reward: [(0, '21.436')] [2024-07-26 15:02:52,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 3624960. Throughput: 0: 869.9. Samples: 906184. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-07-26 15:02:52,560][00197] Avg episode reward: [(0, '20.249')] [2024-07-26 15:02:56,918][05690] Updated weights for policy 0, policy_version 890 (0.0012) [2024-07-26 15:02:57,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 3645440. Throughput: 0: 873.8. Samples: 909362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:02:57,560][00197] Avg episode reward: [(0, '20.291')] [2024-07-26 15:03:02,581][00197] Fps is (10 sec: 3676.9, 60 sec: 3616.6, 300 sec: 3595.9). Total num frames: 3661824. Throughput: 0: 926.6. Samples: 915778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:03:02,583][00197] Avg episode reward: [(0, '21.762')] [2024-07-26 15:03:07,561][00197] Fps is (10 sec: 3274.6, 60 sec: 3481.2, 300 sec: 3610.0). Total num frames: 3678208. Throughput: 0: 880.3. Samples: 919914. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:03:07,564][00197] Avg episode reward: [(0, '21.698')] [2024-07-26 15:03:09,417][05690] Updated weights for policy 0, policy_version 900 (0.0035) [2024-07-26 15:03:12,555][00197] Fps is (10 sec: 3695.9, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3698688. Throughput: 0: 870.8. Samples: 922678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:03:12,556][00197] Avg episode reward: [(0, '21.553')] [2024-07-26 15:03:17,555][00197] Fps is (10 sec: 4098.8, 60 sec: 3686.8, 300 sec: 3610.0). Total num frames: 3719168. Throughput: 0: 918.1. Samples: 929360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:03:17,557][00197] Avg episode reward: [(0, '22.326')] [2024-07-26 15:03:18,636][05690] Updated weights for policy 0, policy_version 910 (0.0012) [2024-07-26 15:03:22,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3735552. Throughput: 0: 915.7. Samples: 934478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:03:22,561][00197] Avg episode reward: [(0, '22.323')] [2024-07-26 15:03:27,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3550.3, 300 sec: 3610.0). Total num frames: 3751936. Throughput: 0: 893.9. Samples: 936664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-26 15:03:27,557][00197] Avg episode reward: [(0, '22.115')] [2024-07-26 15:03:30,228][05690] Updated weights for policy 0, policy_version 920 (0.0017) [2024-07-26 15:03:32,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3754.9, 300 sec: 3623.9). Total num frames: 3776512. Throughput: 0: 919.4. Samples: 943182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:03:32,556][00197] Avg episode reward: [(0, '21.122')] [2024-07-26 15:03:37,555][00197] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 3796992. Throughput: 0: 957.9. Samples: 949288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:03:37,560][00197] Avg episode reward: [(0, '19.709')] [2024-07-26 15:03:37,575][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000927_3796992.pth... [2024-07-26 15:03:37,763][05677] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000714_2924544.pth [2024-07-26 15:03:42,207][05690] Updated weights for policy 0, policy_version 930 (0.0019) [2024-07-26 15:03:42,557][00197] Fps is (10 sec: 3275.9, 60 sec: 3618.0, 300 sec: 3610.0). Total num frames: 3809280. Throughput: 0: 929.4. Samples: 951186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:03:42,565][00197] Avg episode reward: [(0, '19.396')] [2024-07-26 15:03:47,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3829760. Throughput: 0: 910.7. Samples: 956734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:03:47,562][00197] Avg episode reward: [(0, '19.310')] [2024-07-26 15:03:51,463][05690] Updated weights for policy 0, policy_version 940 (0.0012) [2024-07-26 15:03:52,555][00197] Fps is (10 sec: 4097.1, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 3850240. Throughput: 0: 965.6. Samples: 963360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:03:52,562][00197] Avg episode reward: [(0, '20.098')] [2024-07-26 15:03:57,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3866624. Throughput: 0: 960.7. Samples: 965908. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:03:57,559][00197] Avg episode reward: [(0, '20.729')] [2024-07-26 15:04:02,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3756.3, 300 sec: 3623.9). Total num frames: 3887104. Throughput: 0: 915.1. Samples: 970538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:04:02,562][00197] Avg episode reward: [(0, '20.713')] [2024-07-26 15:04:03,263][05690] Updated weights for policy 0, policy_version 950 (0.0020) [2024-07-26 15:04:07,555][00197] Fps is (10 sec: 4096.0, 60 sec: 3823.4, 300 sec: 3623.9). Total num frames: 3907584. Throughput: 0: 950.8. Samples: 977264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:04:07,562][00197] Avg episode reward: [(0, '21.595')] [2024-07-26 15:04:12,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 3923968. Throughput: 0: 971.1. Samples: 980362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:04:12,558][00197] Avg episode reward: [(0, '22.943')] [2024-07-26 15:04:12,561][05677] Saving new best policy, reward=22.943! [2024-07-26 15:04:14,376][05690] Updated weights for policy 0, policy_version 960 (0.0020) [2024-07-26 15:04:17,555][00197] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3940352. Throughput: 0: 919.2. Samples: 984548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:04:17,557][00197] Avg episode reward: [(0, '22.379')] [2024-07-26 15:04:22,555][00197] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 3960832. Throughput: 0: 921.6. Samples: 990758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:04:22,561][00197] Avg episode reward: [(0, '21.723')] [2024-07-26 15:04:24,452][05690] Updated weights for policy 0, policy_version 970 (0.0019) [2024-07-26 15:04:27,555][00197] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3637.8). Total num frames: 3985408. Throughput: 0: 953.7. Samples: 994098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:04:27,563][00197] Avg episode reward: [(0, '21.429')] [2024-07-26 15:04:32,556][00197] Fps is (10 sec: 3685.8, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 3997696. Throughput: 0: 941.0. Samples: 999080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:04:32,559][00197] Avg episode reward: [(0, '20.115')] [2024-07-26 15:04:34,559][05677] Stopping Batcher_0... [2024-07-26 15:04:34,560][05677] Loop batcher_evt_loop terminating... [2024-07-26 15:04:34,560][00197] Component Batcher_0 stopped! [2024-07-26 15:04:34,566][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-26 15:04:34,626][00197] Component RolloutWorker_w7 stopped! [2024-07-26 15:04:34,639][00197] Component RolloutWorker_w6 stopped! [2024-07-26 15:04:34,645][05690] Weights refcount: 2 0 [2024-07-26 15:04:34,626][05698] Stopping RolloutWorker_w7... [2024-07-26 15:04:34,646][05698] Loop rollout_proc7_evt_loop terminating... [2024-07-26 15:04:34,648][05690] Stopping InferenceWorker_p0-w0... [2024-07-26 15:04:34,654][05690] Loop inference_proc0-0_evt_loop terminating... [2024-07-26 15:04:34,657][05696] Stopping RolloutWorker_w5... [2024-07-26 15:04:34,649][00197] Component InferenceWorker_p0-w0 stopped! [2024-07-26 15:04:34,659][05696] Loop rollout_proc5_evt_loop terminating... [2024-07-26 15:04:34,661][05694] Stopping RolloutWorker_w3... [2024-07-26 15:04:34,659][00197] Component RolloutWorker_w5 stopped! 
[2024-07-26 15:04:34,665][05692] Stopping RolloutWorker_w1... [2024-07-26 15:04:34,664][00197] Component RolloutWorker_w3 stopped! [2024-07-26 15:04:34,666][00197] Component RolloutWorker_w1 stopped! [2024-07-26 15:04:34,670][05692] Loop rollout_proc1_evt_loop terminating... [2024-07-26 15:04:34,662][05694] Loop rollout_proc3_evt_loop terminating... [2024-07-26 15:04:34,644][05697] Stopping RolloutWorker_w6... [2024-07-26 15:04:34,685][00197] Component RolloutWorker_w4 stopped! [2024-07-26 15:04:34,686][05695] Stopping RolloutWorker_w4... [2024-07-26 15:04:34,678][05697] Loop rollout_proc6_evt_loop terminating... [2024-07-26 15:04:34,691][05695] Loop rollout_proc4_evt_loop terminating... [2024-07-26 15:04:34,698][00197] Component RolloutWorker_w2 stopped! [2024-07-26 15:04:34,706][05693] Stopping RolloutWorker_w2... [2024-07-26 15:04:34,707][05693] Loop rollout_proc2_evt_loop terminating... [2024-07-26 15:04:34,722][00197] Component RolloutWorker_w0 stopped! [2024-07-26 15:04:34,723][05691] Stopping RolloutWorker_w0... [2024-07-26 15:04:34,729][05691] Loop rollout_proc0_evt_loop terminating... [2024-07-26 15:04:34,751][05677] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth [2024-07-26 15:04:34,766][05677] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-26 15:04:35,074][00197] Component LearnerWorker_p0 stopped! [2024-07-26 15:04:35,078][00197] Waiting for process learner_proc0 to stop... [2024-07-26 15:04:35,088][05677] Stopping LearnerWorker_p0... [2024-07-26 15:04:35,089][05677] Loop learner_proc0_evt_loop terminating... [2024-07-26 15:04:36,641][00197] Waiting for process inference_proc0-0 to join... [2024-07-26 15:04:36,773][00197] Waiting for process rollout_proc0 to join... [2024-07-26 15:04:37,983][00197] Waiting for process rollout_proc1 to join... [2024-07-26 15:04:37,987][00197] Waiting for process rollout_proc2 to join... [2024-07-26 15:04:37,992][00197] Waiting for process rollout_proc3 to join... [2024-07-26 15:04:37,995][00197] Waiting for process rollout_proc4 to join... [2024-07-26 15:04:37,999][00197] Waiting for process rollout_proc5 to join... [2024-07-26 15:04:38,003][00197] Waiting for process rollout_proc6 to join... [2024-07-26 15:04:38,006][00197] Waiting for process rollout_proc7 to join... 
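The shutdown sequence above ends a training run that, per the runner summary just below, collected roughly 4.0M environment frames under /content/train_dir/default_experiment. For orientation, a minimal sketch of how such a Sample Factory APPO run on the VizDoom health-gathering scenario is typically launched follows; the helper names come from the sf_examples package bundled with sample-factory, the env name is inferred from the Hub repository referenced later in this log, and every flag value not corroborated by the log is an assumption, not a record of the actual command:

import functools

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec


def register_vizdoom_envs():
    # Make the example Doom scenarios resolvable via --env=<name>.
    for env_spec in DOOM_ENVS:
        register_env(env_spec.name, functools.partial(make_doom_env_from_spec, env_spec))


def parse_vizdoom_cfg(argv=None, evaluation=False):
    # Standard two-stage Sample Factory config parsing, extended with
    # the Doom-specific arguments and defaults from sf_examples.
    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
    add_doom_env_args(parser)
    doom_override_defaults(parser)
    return parse_full_cfg(parser, argv)


if __name__ == "__main__":
    register_vizdoom_envs()
    cfg = parse_vizdoom_cfg(
        argv=[
            "--env=doom_health_gathering_supreme",  # assumed: inferred from the Hub repo name
            "--num_workers=8",                      # assumed worker count
            "--num_envs_per_worker=4",              # assumed; not visible in this log
            "--train_for_env_steps=4000000",        # consistent with the ~4,005,888 frames collected
        ]
    )
    # Trains until train_for_env_steps is reached, rotating checkpoints under
    # train_dir/default_experiment/checkpoint_p0 as seen in the log above.
    status = run_rl(cfg)

With no --experiment flag, Sample Factory falls back to the experiment name "default_experiment", which matches the checkpoint and config paths logged throughout this run.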
[2024-07-26 15:04:38,010][00197] Batcher 0 profile tree view: batching: 26.3545, releasing_batches: 0.0229 [2024-07-26 15:04:38,014][00197] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 509.5901 update_model: 7.6597 weight_update: 0.0019 one_step: 0.0059 handle_policy_step: 548.2509 deserialize: 14.5852, stack: 3.0100, obs_to_device_normalize: 117.5415, forward: 270.9579, send_messages: 27.8157 prepare_outputs: 86.0504 to_cpu: 53.4687 [2024-07-26 15:04:38,017][00197] Learner 0 profile tree view: misc: 0.0052, prepare_batch: 16.0774 train: 74.3934 epoch_init: 0.0056, minibatch_init: 0.0063, losses_postprocess: 0.5856, kl_divergence: 0.6033, after_optimizer: 33.5106 calculate_losses: 24.9382 losses_init: 0.0055, forward_head: 1.6807, bptt_initial: 16.2334, tail: 1.0725, advantages_returns: 0.3047, losses: 3.1566 bptt: 2.1519 bptt_forward_core: 2.0945 update: 14.1179 clip: 1.4540 [2024-07-26 15:04:38,020][00197] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3883, enqueue_policy_requests: 139.0994, env_step: 849.0450, overhead: 15.9474, complete_rollouts: 6.9152 save_policy_outputs: 25.3255 split_output_tensors: 8.8427 [2024-07-26 15:04:38,021][00197] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3280, enqueue_policy_requests: 139.3807, env_step: 850.8359, overhead: 15.1062, complete_rollouts: 6.7905 save_policy_outputs: 24.4695 split_output_tensors: 8.3511 [2024-07-26 15:04:38,023][00197] Loop Runner_EvtLoop terminating... [2024-07-26 15:04:38,025][00197] Runner profile tree view: main_loop: 1135.5590 [2024-07-26 15:04:38,027][00197] Collected {0: 4005888}, FPS: 3527.7 [2024-07-26 15:04:38,249][00197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-26 15:04:38,251][00197] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-26 15:04:38,253][00197] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-26 15:04:38,256][00197] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-26 15:04:38,258][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-26 15:04:38,261][00197] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-26 15:04:38,262][00197] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-26 15:04:38,265][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-26 15:04:38,266][00197] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-26 15:04:38,267][00197] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-26 15:04:38,268][00197] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-26 15:04:38,270][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-26 15:04:38,271][00197] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-26 15:04:38,272][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
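The "Adding new argument" overrides above re-parse the saved training config in evaluation mode; the frame counts and episode rewards that follow come from Sample Factory's enjoy entry point. Below is a hedged sketch of the two evaluation invocations consistent with this log: the first records a local replay video, the second (logged further below with max_num_frames=100000) additionally pushes the model to the thomaspalomares/rl_course_vizdoom_health_gathering_supreme repository. parse_vizdoom_cfg is the helper from the earlier sketch, and the env name remains an assumption:

from sample_factory.enjoy import enjoy

# First evaluation pass: roll out 10 episodes with rendering disabled and
# save a replay video locally, mirroring the added arguments above
# (push_to_hub stays False here).
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",  # assumed env name
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
    ],
    evaluation=True,
)
status = enjoy(cfg)

# Second pass (its config load is logged further below): the same entry
# point, now also uploading the checkpoint and replay to the Hugging Face Hub.
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=thomaspalomares/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)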
[2024-07-26 15:04:38,273][00197] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-26 15:04:38,289][00197] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:04:38,291][00197] RunningMeanStd input shape: (3, 72, 128) [2024-07-26 15:04:38,294][00197] RunningMeanStd input shape: (1,) [2024-07-26 15:04:38,308][00197] ConvEncoder: input_channels=3 [2024-07-26 15:04:38,434][00197] Conv encoder output size: 512 [2024-07-26 15:04:38,436][00197] Policy head output size: 512 [2024-07-26 15:04:40,160][00197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-26 15:04:41,044][00197] Num frames 100... [2024-07-26 15:04:41,170][00197] Num frames 200... [2024-07-26 15:04:41,298][00197] Num frames 300... [2024-07-26 15:04:41,428][00197] Num frames 400... [2024-07-26 15:04:41,551][00197] Num frames 500... [2024-07-26 15:04:41,663][00197] Avg episode rewards: #0: 9.440, true rewards: #0: 5.440 [2024-07-26 15:04:41,664][00197] Avg episode reward: 9.440, avg true_objective: 5.440 [2024-07-26 15:04:41,735][00197] Num frames 600... [2024-07-26 15:04:41,857][00197] Num frames 700... [2024-07-26 15:04:41,999][00197] Num frames 800... [2024-07-26 15:04:42,051][00197] Avg episode rewards: #0: 6.000, true rewards: #0: 4.000 [2024-07-26 15:04:42,054][00197] Avg episode reward: 6.000, avg true_objective: 4.000 [2024-07-26 15:04:42,184][00197] Num frames 900... [2024-07-26 15:04:42,307][00197] Num frames 1000... [2024-07-26 15:04:42,433][00197] Num frames 1100... [2024-07-26 15:04:42,559][00197] Num frames 1200... [2024-07-26 15:04:42,683][00197] Num frames 1300... [2024-07-26 15:04:42,811][00197] Num frames 1400... [2024-07-26 15:04:42,934][00197] Num frames 1500... [2024-07-26 15:04:43,074][00197] Num frames 1600... [2024-07-26 15:04:43,213][00197] Num frames 1700... [2024-07-26 15:04:43,341][00197] Num frames 1800... [2024-07-26 15:04:43,467][00197] Num frames 1900... [2024-07-26 15:04:43,594][00197] Num frames 2000... [2024-07-26 15:04:43,722][00197] Num frames 2100... [2024-07-26 15:04:43,848][00197] Num frames 2200... [2024-07-26 15:04:43,915][00197] Avg episode rewards: #0: 15.693, true rewards: #0: 7.360 [2024-07-26 15:04:43,917][00197] Avg episode reward: 15.693, avg true_objective: 7.360 [2024-07-26 15:04:44,098][00197] Num frames 2300... [2024-07-26 15:04:44,274][00197] Num frames 2400... [2024-07-26 15:04:44,452][00197] Num frames 2500... [2024-07-26 15:04:44,631][00197] Num frames 2600... [2024-07-26 15:04:44,807][00197] Num frames 2700... [2024-07-26 15:04:44,982][00197] Num frames 2800... [2024-07-26 15:04:45,177][00197] Num frames 2900... [2024-07-26 15:04:45,315][00197] Avg episode rewards: #0: 15.110, true rewards: #0: 7.360 [2024-07-26 15:04:45,317][00197] Avg episode reward: 15.110, avg true_objective: 7.360 [2024-07-26 15:04:45,420][00197] Num frames 3000... [2024-07-26 15:04:45,600][00197] Num frames 3100... [2024-07-26 15:04:45,787][00197] Num frames 3200... [2024-07-26 15:04:45,972][00197] Num frames 3300... [2024-07-26 15:04:46,223][00197] Avg episode rewards: #0: 13.584, true rewards: #0: 6.784 [2024-07-26 15:04:46,224][00197] Avg episode reward: 13.584, avg true_objective: 6.784 [2024-07-26 15:04:46,241][00197] Num frames 3400... [2024-07-26 15:04:46,367][00197] Num frames 3500... [2024-07-26 15:04:46,490][00197] Num frames 3600... [2024-07-26 15:04:46,614][00197] Num frames 3700... [2024-07-26 15:04:46,738][00197] Num frames 3800... 
[2024-07-26 15:04:46,865][00197] Num frames 3900... [2024-07-26 15:04:46,996][00197] Num frames 4000... [2024-07-26 15:04:47,125][00197] Avg episode rewards: #0: 13.258, true rewards: #0: 6.758 [2024-07-26 15:04:47,127][00197] Avg episode reward: 13.258, avg true_objective: 6.758 [2024-07-26 15:04:47,189][00197] Num frames 4100... [2024-07-26 15:04:47,309][00197] Num frames 4200... [2024-07-26 15:04:47,434][00197] Num frames 4300... [2024-07-26 15:04:47,556][00197] Num frames 4400... [2024-07-26 15:04:47,689][00197] Num frames 4500... [2024-07-26 15:04:47,814][00197] Num frames 4600... [2024-07-26 15:04:47,937][00197] Num frames 4700... [2024-07-26 15:04:48,069][00197] Num frames 4800... [2024-07-26 15:04:48,203][00197] Num frames 4900... [2024-07-26 15:04:48,367][00197] Avg episode rewards: #0: 14.119, true rewards: #0: 7.119 [2024-07-26 15:04:48,368][00197] Avg episode reward: 14.119, avg true_objective: 7.119 [2024-07-26 15:04:48,394][00197] Num frames 5000... [2024-07-26 15:04:48,518][00197] Num frames 5100... [2024-07-26 15:04:48,644][00197] Num frames 5200... [2024-07-26 15:04:48,766][00197] Num frames 5300... [2024-07-26 15:04:48,892][00197] Num frames 5400... [2024-07-26 15:04:48,986][00197] Avg episode rewards: #0: 13.039, true rewards: #0: 6.789 [2024-07-26 15:04:48,987][00197] Avg episode reward: 13.039, avg true_objective: 6.789 [2024-07-26 15:04:49,074][00197] Num frames 5500... [2024-07-26 15:04:49,207][00197] Num frames 5600... [2024-07-26 15:04:49,337][00197] Num frames 5700... [2024-07-26 15:04:49,467][00197] Num frames 5800... [2024-07-26 15:04:49,597][00197] Num frames 5900... [2024-07-26 15:04:49,724][00197] Num frames 6000... [2024-07-26 15:04:49,848][00197] Num frames 6100... [2024-07-26 15:04:49,983][00197] Num frames 6200... [2024-07-26 15:04:50,117][00197] Num frames 6300... [2024-07-26 15:04:50,253][00197] Num frames 6400... [2024-07-26 15:04:50,378][00197] Num frames 6500... [2024-07-26 15:04:50,502][00197] Num frames 6600... [2024-07-26 15:04:50,632][00197] Num frames 6700... [2024-07-26 15:04:50,757][00197] Num frames 6800... [2024-07-26 15:04:50,882][00197] Num frames 6900... [2024-07-26 15:04:51,017][00197] Num frames 7000... [2024-07-26 15:04:51,145][00197] Num frames 7100... [2024-07-26 15:04:51,280][00197] Num frames 7200... [2024-07-26 15:04:51,423][00197] Num frames 7300... [2024-07-26 15:04:51,482][00197] Avg episode rewards: #0: 16.114, true rewards: #0: 8.114 [2024-07-26 15:04:51,484][00197] Avg episode reward: 16.114, avg true_objective: 8.114 [2024-07-26 15:04:51,607][00197] Num frames 7400... [2024-07-26 15:04:51,737][00197] Num frames 7500... [2024-07-26 15:04:51,856][00197] Num frames 7600... [2024-07-26 15:04:51,984][00197] Num frames 7700... [2024-07-26 15:04:52,110][00197] Num frames 7800... [2024-07-26 15:04:52,236][00197] Num frames 7900... [2024-07-26 15:04:52,370][00197] Num frames 8000... [2024-07-26 15:04:52,495][00197] Num frames 8100... [2024-07-26 15:04:52,617][00197] Num frames 8200... [2024-07-26 15:04:52,743][00197] Num frames 8300... [2024-07-26 15:04:52,870][00197] Num frames 8400... [2024-07-26 15:04:53,000][00197] Num frames 8500... [2024-07-26 15:04:53,127][00197] Num frames 8600... [2024-07-26 15:04:53,252][00197] Num frames 8700... [2024-07-26 15:04:53,384][00197] Num frames 8800... [2024-07-26 15:04:53,509][00197] Num frames 8900... [2024-07-26 15:04:53,633][00197] Num frames 9000... 
[2024-07-26 15:04:53,808][00197] Avg episode rewards: #0: 18.695, true rewards: #0: 9.095 [2024-07-26 15:04:53,810][00197] Avg episode reward: 18.695, avg true_objective: 9.095 [2024-07-26 15:05:45,768][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-26 15:10:18,830][00197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-26 15:10:18,832][00197] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-26 15:10:18,833][00197] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-26 15:10:18,834][00197] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-26 15:10:18,835][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-26 15:10:18,837][00197] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-26 15:10:18,838][00197] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-07-26 15:10:18,839][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-26 15:10:18,840][00197] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-26 15:10:18,842][00197] Adding new argument 'hf_repository'='thomaspalomares/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-26 15:10:18,843][00197] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-26 15:10:18,844][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-26 15:10:18,845][00197] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-26 15:10:18,846][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-26 15:10:18,847][00197] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-26 15:10:18,859][00197] RunningMeanStd input shape: (3, 72, 128) [2024-07-26 15:10:18,861][00197] RunningMeanStd input shape: (1,) [2024-07-26 15:10:18,881][00197] ConvEncoder: input_channels=3 [2024-07-26 15:10:18,920][00197] Conv encoder output size: 512 [2024-07-26 15:10:18,921][00197] Policy head output size: 512 [2024-07-26 15:10:18,941][00197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-26 15:10:19,446][00197] Num frames 100... [2024-07-26 15:10:19,571][00197] Num frames 200... [2024-07-26 15:10:19,698][00197] Num frames 300... [2024-07-26 15:10:19,828][00197] Num frames 400... [2024-07-26 15:10:19,962][00197] Num frames 500... [2024-07-26 15:10:20,096][00197] Num frames 600... [2024-07-26 15:10:20,229][00197] Num frames 700... [2024-07-26 15:10:20,354][00197] Num frames 800... [2024-07-26 15:10:20,480][00197] Num frames 900... [2024-07-26 15:10:20,609][00197] Num frames 1000... [2024-07-26 15:10:20,736][00197] Num frames 1100... [2024-07-26 15:10:20,860][00197] Num frames 1200... [2024-07-26 15:10:20,993][00197] Num frames 1300... [2024-07-26 15:10:21,128][00197] Num frames 1400... [2024-07-26 15:10:21,252][00197] Num frames 1500... [2024-07-26 15:10:21,385][00197] Num frames 1600... [2024-07-26 15:10:21,515][00197] Num frames 1700... [2024-07-26 15:10:21,656][00197] Num frames 1800... [2024-07-26 15:10:21,845][00197] Num frames 1900... [2024-07-26 15:10:22,036][00197] Num frames 2000... 
[2024-07-26 15:10:22,236][00197] Num frames 2100... [2024-07-26 15:10:22,291][00197] Avg episode rewards: #0: 53.999, true rewards: #0: 21.000 [2024-07-26 15:10:22,293][00197] Avg episode reward: 53.999, avg true_objective: 21.000 [2024-07-26 15:10:22,476][00197] Num frames 2200... [2024-07-26 15:10:22,649][00197] Num frames 2300... [2024-07-26 15:10:22,825][00197] Num frames 2400... [2024-07-26 15:10:23,016][00197] Num frames 2500... [2024-07-26 15:10:23,160][00197] Avg episode rewards: #0: 29.739, true rewards: #0: 12.740 [2024-07-26 15:10:23,161][00197] Avg episode reward: 29.739, avg true_objective: 12.740 [2024-07-26 15:10:23,276][00197] Num frames 2600... [2024-07-26 15:10:23,464][00197] Num frames 2700... [2024-07-26 15:10:23,652][00197] Num frames 2800... [2024-07-26 15:10:23,853][00197] Num frames 2900... [2024-07-26 15:10:24,048][00197] Num frames 3000... [2024-07-26 15:10:24,182][00197] Num frames 3100... [2024-07-26 15:10:24,316][00197] Avg episode rewards: #0: 23.853, true rewards: #0: 10.520 [2024-07-26 15:10:24,317][00197] Avg episode reward: 23.853, avg true_objective: 10.520 [2024-07-26 15:10:24,374][00197] Num frames 3200... [2024-07-26 15:10:24,514][00197] Num frames 3300... [2024-07-26 15:10:24,638][00197] Num frames 3400... [2024-07-26 15:10:24,768][00197] Num frames 3500... [2024-07-26 15:10:24,896][00197] Num frames 3600... [2024-07-26 15:10:25,029][00197] Num frames 3700... [2024-07-26 15:10:25,156][00197] Num frames 3800... [2024-07-26 15:10:25,293][00197] Num frames 3900... [2024-07-26 15:10:25,419][00197] Num frames 4000... [2024-07-26 15:10:25,546][00197] Num frames 4100... [2024-07-26 15:10:25,671][00197] Num frames 4200... [2024-07-26 15:10:25,803][00197] Num frames 4300... [2024-07-26 15:10:25,932][00197] Num frames 4400... [2024-07-26 15:10:26,065][00197] Num frames 4500... [2024-07-26 15:10:26,195][00197] Num frames 4600... [2024-07-26 15:10:26,331][00197] Num frames 4700... [2024-07-26 15:10:26,457][00197] Num frames 4800... [2024-07-26 15:10:26,582][00197] Num frames 4900... [2024-07-26 15:10:26,709][00197] Num frames 5000... [2024-07-26 15:10:26,835][00197] Num frames 5100... [2024-07-26 15:10:26,942][00197] Avg episode rewards: #0: 30.100, true rewards: #0: 12.850 [2024-07-26 15:10:26,943][00197] Avg episode reward: 30.100, avg true_objective: 12.850 [2024-07-26 15:10:27,028][00197] Num frames 5200... [2024-07-26 15:10:27,157][00197] Num frames 5300... [2024-07-26 15:10:27,281][00197] Num frames 5400... [2024-07-26 15:10:27,413][00197] Num frames 5500... [2024-07-26 15:10:27,591][00197] Avg episode rewards: #0: 25.798, true rewards: #0: 11.198 [2024-07-26 15:10:27,592][00197] Avg episode reward: 25.798, avg true_objective: 11.198 [2024-07-26 15:10:27,597][00197] Num frames 5600... [2024-07-26 15:10:27,726][00197] Num frames 5700... [2024-07-26 15:10:27,855][00197] Num frames 5800... [2024-07-26 15:10:27,987][00197] Num frames 5900... [2024-07-26 15:10:28,111][00197] Num frames 6000... [2024-07-26 15:10:28,243][00197] Num frames 6100... [2024-07-26 15:10:28,381][00197] Num frames 6200... [2024-07-26 15:10:28,506][00197] Num frames 6300... [2024-07-26 15:10:28,636][00197] Num frames 6400... [2024-07-26 15:10:28,767][00197] Num frames 6500... [2024-07-26 15:10:28,894][00197] Num frames 6600... [2024-07-26 15:10:29,029][00197] Num frames 6700... [2024-07-26 15:10:29,156][00197] Num frames 6800... [2024-07-26 15:10:29,291][00197] Num frames 6900... [2024-07-26 15:10:29,432][00197] Num frames 7000... [2024-07-26 15:10:29,559][00197] Num frames 7100... 
[2024-07-26 15:10:29,691][00197] Num frames 7200... [2024-07-26 15:10:29,821][00197] Num frames 7300... [2024-07-26 15:10:29,968][00197] Num frames 7400... [2024-07-26 15:10:30,104][00197] Num frames 7500... [2024-07-26 15:10:30,236][00197] Num frames 7600... [2024-07-26 15:10:30,420][00197] Avg episode rewards: #0: 30.831, true rewards: #0: 12.832 [2024-07-26 15:10:30,422][00197] Avg episode reward: 30.831, avg true_objective: 12.832 [2024-07-26 15:10:30,426][00197] Num frames 7700... [2024-07-26 15:10:30,558][00197] Num frames 7800... [2024-07-26 15:10:30,690][00197] Num frames 7900... [2024-07-26 15:10:30,822][00197] Num frames 8000... [2024-07-26 15:10:30,955][00197] Num frames 8100... [2024-07-26 15:10:31,092][00197] Num frames 8200... [2024-07-26 15:10:31,222][00197] Num frames 8300... [2024-07-26 15:10:31,352][00197] Num frames 8400... [2024-07-26 15:10:31,488][00197] Num frames 8500... [2024-07-26 15:10:31,620][00197] Num frames 8600... [2024-07-26 15:10:31,751][00197] Num frames 8700... [2024-07-26 15:10:31,914][00197] Avg episode rewards: #0: 29.267, true rewards: #0: 12.553 [2024-07-26 15:10:31,916][00197] Avg episode reward: 29.267, avg true_objective: 12.553 [2024-07-26 15:10:31,936][00197] Num frames 8800... [2024-07-26 15:10:32,071][00197] Num frames 8900... [2024-07-26 15:10:32,198][00197] Num frames 9000... [2024-07-26 15:10:32,328][00197] Num frames 9100... [2024-07-26 15:10:32,468][00197] Num frames 9200... [2024-07-26 15:10:32,609][00197] Avg episode rewards: #0: 26.583, true rewards: #0: 11.584 [2024-07-26 15:10:32,610][00197] Avg episode reward: 26.583, avg true_objective: 11.584 [2024-07-26 15:10:32,658][00197] Num frames 9300... [2024-07-26 15:10:32,785][00197] Num frames 9400... [2024-07-26 15:10:32,911][00197] Num frames 9500... [2024-07-26 15:10:33,043][00197] Num frames 9600... [2024-07-26 15:10:33,172][00197] Num frames 9700... [2024-07-26 15:10:33,302][00197] Num frames 9800... [2024-07-26 15:10:33,426][00197] Num frames 9900... [2024-07-26 15:10:33,562][00197] Num frames 10000... [2024-07-26 15:10:33,622][00197] Avg episode rewards: #0: 25.114, true rewards: #0: 11.114 [2024-07-26 15:10:33,624][00197] Avg episode reward: 25.114, avg true_objective: 11.114 [2024-07-26 15:10:33,747][00197] Num frames 10100... [2024-07-26 15:10:33,878][00197] Num frames 10200... [2024-07-26 15:10:34,022][00197] Num frames 10300... [2024-07-26 15:10:34,183][00197] Num frames 10400... [2024-07-26 15:10:34,365][00197] Num frames 10500... [2024-07-26 15:10:34,549][00197] Num frames 10600... [2024-07-26 15:10:34,722][00197] Num frames 10700... [2024-07-26 15:10:34,909][00197] Num frames 10800... [2024-07-26 15:10:35,108][00197] Num frames 10900... [2024-07-26 15:10:35,284][00197] Num frames 11000... [2024-07-26 15:10:35,468][00197] Num frames 11100... [2024-07-26 15:10:35,685][00197] Num frames 11200... [2024-07-26 15:10:35,886][00197] Num frames 11300... [2024-07-26 15:10:36,080][00197] Num frames 11400... [2024-07-26 15:10:36,266][00197] Num frames 11500... [2024-07-26 15:10:36,454][00197] Num frames 11600... [2024-07-26 15:10:36,651][00197] Num frames 11700... [2024-07-26 15:10:36,780][00197] Num frames 11800... [2024-07-26 15:10:36,910][00197] Num frames 11900... [2024-07-26 15:10:37,044][00197] Num frames 12000... [2024-07-26 15:10:37,173][00197] Num frames 12100... 
[2024-07-26 15:10:37,233][00197] Avg episode rewards: #0: 27.203, true rewards: #0: 12.103 [2024-07-26 15:10:37,235][00197] Avg episode reward: 27.203, avg true_objective: 12.103 [2024-07-26 15:11:48,220][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-26 15:12:00,516][00197] The model has been pushed to https://huggingface.co/thomaspalomares/rl_course_vizdoom_health_gathering_supreme [2024-07-26 15:12:42,865][00197] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json [2024-07-26 15:12:42,867][00197] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json [2024-07-26 15:12:42,869][00197] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line [2024-07-26 15:12:42,871][00197] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2024-07-26 15:12:42,873][00197] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-26 15:12:42,875][00197] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2024-07-26 15:12:42,877][00197] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2024-07-26 15:12:42,879][00197] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! [2024-07-26 15:12:42,880][00197] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-26 15:12:42,882][00197] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-26 15:12:42,883][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-26 15:12:42,884][00197] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-26 15:12:42,886][00197] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-26 15:12:42,887][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-26 15:12:42,889][00197] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-26 15:12:42,890][00197] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-26 15:12:42,892][00197] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-26 15:12:42,899][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-26 15:12:42,900][00197] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-26 15:12:42,901][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-26 15:12:42,903][00197] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-26 15:12:42,913][00197] RunningMeanStd input shape: (3, 72, 128) [2024-07-26 15:12:42,915][00197] RunningMeanStd input shape: (1,) [2024-07-26 15:12:42,927][00197] ConvEncoder: input_channels=3 [2024-07-26 15:12:42,979][00197] Conv encoder output size: 512 [2024-07-26 15:12:42,982][00197] Policy head output size: 512 [2024-07-26 15:12:43,009][00197] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... [2024-07-26 15:12:43,519][00197] Num frames 100... [2024-07-26 15:12:43,648][00197] Num frames 200... [2024-07-26 15:12:43,776][00197] Num frames 300... 
[2024-07-26 15:12:43,904][00197] Num frames 400... [2024-07-26 15:12:44,033][00197] Num frames 500... [2024-07-26 15:12:44,170][00197] Num frames 600... [2024-07-26 15:12:44,299][00197] Num frames 700... [2024-07-26 15:12:44,428][00197] Num frames 800... [2024-07-26 15:12:44,557][00197] Num frames 900... [2024-07-26 15:12:44,688][00197] Num frames 1000... [2024-07-26 15:12:44,819][00197] Num frames 1100... [2024-07-26 15:12:44,948][00197] Num frames 1200... [2024-07-26 15:12:45,080][00197] Num frames 1300... [2024-07-26 15:12:45,215][00197] Num frames 1400... [2024-07-26 15:12:45,345][00197] Num frames 1500... [2024-07-26 15:12:45,494][00197] Num frames 1600... [2024-07-26 15:12:45,636][00197] Num frames 1700... [2024-07-26 15:12:45,764][00197] Num frames 1800... [2024-07-26 15:12:45,892][00197] Num frames 1900... [2024-07-26 15:12:46,029][00197] Num frames 2000... [2024-07-26 15:12:46,159][00197] Num frames 2100... [2024-07-26 15:12:46,211][00197] Avg episode rewards: #0: 64.999, true rewards: #0: 21.000 [2024-07-26 15:12:46,213][00197] Avg episode reward: 64.999, avg true_objective: 21.000 [2024-07-26 15:12:46,349][00197] Num frames 2200... [2024-07-26 15:12:46,480][00197] Num frames 2300... [2024-07-26 15:12:46,610][00197] Num frames 2400... [2024-07-26 15:12:46,737][00197] Num frames 2500... [2024-07-26 15:12:46,868][00197] Num frames 2600... [2024-07-26 15:12:47,006][00197] Num frames 2700... [2024-07-26 15:12:47,132][00197] Num frames 2800... [2024-07-26 15:12:47,269][00197] Num frames 2900... [2024-07-26 15:12:47,400][00197] Num frames 3000... [2024-07-26 15:12:47,526][00197] Num frames 3100... [2024-07-26 15:12:47,656][00197] Num frames 3200... [2024-07-26 15:12:47,782][00197] Num frames 3300... [2024-07-26 15:12:47,907][00197] Num frames 3400... [2024-07-26 15:12:48,041][00197] Num frames 3500... [2024-07-26 15:12:48,169][00197] Num frames 3600... [2024-07-26 15:12:48,313][00197] Num frames 3700... [2024-07-26 15:12:48,444][00197] Num frames 3800... [2024-07-26 15:12:48,574][00197] Num frames 3900... [2024-07-26 15:12:48,702][00197] Num frames 4000... [2024-07-26 15:12:48,827][00197] Num frames 4100... [2024-07-26 15:12:48,962][00197] Num frames 4200... [2024-07-26 15:12:49,013][00197] Avg episode rewards: #0: 65.999, true rewards: #0: 21.000 [2024-07-26 15:12:49,015][00197] Avg episode reward: 65.999, avg true_objective: 21.000 [2024-07-26 15:12:49,143][00197] Num frames 4300... [2024-07-26 15:12:49,277][00197] Num frames 4400... [2024-07-26 15:12:49,405][00197] Num frames 4500... [2024-07-26 15:12:49,532][00197] Num frames 4600... [2024-07-26 15:12:49,664][00197] Num frames 4700... [2024-07-26 15:12:49,792][00197] Num frames 4800... [2024-07-26 15:12:49,920][00197] Num frames 4900... [2024-07-26 15:12:50,057][00197] Num frames 5000... [2024-07-26 15:12:50,189][00197] Num frames 5100... [2024-07-26 15:12:50,327][00197] Num frames 5200... [2024-07-26 15:12:50,457][00197] Num frames 5300... [2024-07-26 15:12:50,586][00197] Num frames 5400... [2024-07-26 15:12:50,724][00197] Num frames 5500... [2024-07-26 15:12:50,905][00197] Num frames 5600... [2024-07-26 15:12:51,104][00197] Num frames 5700... [2024-07-26 15:12:51,293][00197] Num frames 5800... [2024-07-26 15:12:51,487][00197] Num frames 5900... [2024-07-26 15:12:51,680][00197] Num frames 6000... [2024-07-26 15:12:51,858][00197] Num frames 6100... [2024-07-26 15:12:52,048][00197] Num frames 6200... [2024-07-26 15:12:52,238][00197] Num frames 6300... 
[2024-07-26 15:12:52,292][00197] Avg episode rewards: #0: 64.999, true rewards: #0: 21.000 [2024-07-26 15:12:52,293][00197] Avg episode reward: 64.999, avg true_objective: 21.000 [2024-07-26 15:12:52,487][00197] Num frames 6400... [2024-07-26 15:12:52,674][00197] Num frames 6500... [2024-07-26 15:12:52,860][00197] Num frames 6600... [2024-07-26 15:12:53,042][00197] Num frames 6700... [2024-07-26 15:12:53,246][00197] Num frames 6800... [2024-07-26 15:12:53,417][00197] Num frames 6900... [2024-07-26 15:12:53,541][00197] Num frames 7000... [2024-07-26 15:12:53,667][00197] Num frames 7100... [2024-07-26 15:12:53,795][00197] Num frames 7200... [2024-07-26 15:12:53,921][00197] Num frames 7300... [2024-07-26 15:12:54,054][00197] Num frames 7400... [2024-07-26 15:12:54,182][00197] Num frames 7500... [2024-07-26 15:12:54,315][00197] Num frames 7600... [2024-07-26 15:12:54,453][00197] Num frames 7700... [2024-07-26 15:12:54,580][00197] Num frames 7800... [2024-07-26 15:12:54,711][00197] Num frames 7900... [2024-07-26 15:12:54,837][00197] Num frames 8000... [2024-07-26 15:12:54,974][00197] Num frames 8100... [2024-07-26 15:12:55,117][00197] Num frames 8200... [2024-07-26 15:12:55,250][00197] Num frames 8300... [2024-07-26 15:12:55,382][00197] Num frames 8400... [2024-07-26 15:12:55,435][00197] Avg episode rewards: #0: 63.999, true rewards: #0: 21.000 [2024-07-26 15:12:55,437][00197] Avg episode reward: 63.999, avg true_objective: 21.000 [2024-07-26 15:12:55,562][00197] Num frames 8500... [2024-07-26 15:12:55,687][00197] Num frames 8600... [2024-07-26 15:12:55,812][00197] Num frames 8700... [2024-07-26 15:12:55,940][00197] Num frames 8800... [2024-07-26 15:12:56,073][00197] Num frames 8900... [2024-07-26 15:12:56,200][00197] Num frames 9000... [2024-07-26 15:12:56,325][00197] Num frames 9100... [2024-07-26 15:12:56,465][00197] Num frames 9200... [2024-07-26 15:12:56,592][00197] Num frames 9300... [2024-07-26 15:12:56,719][00197] Num frames 9400... [2024-07-26 15:12:56,848][00197] Num frames 9500... [2024-07-26 15:12:56,980][00197] Num frames 9600... [2024-07-26 15:12:57,104][00197] Num frames 9700... [2024-07-26 15:12:57,232][00197] Num frames 9800... [2024-07-26 15:12:57,360][00197] Num frames 9900... [2024-07-26 15:12:57,497][00197] Num frames 10000... [2024-07-26 15:12:57,627][00197] Num frames 10100... [2024-07-26 15:12:57,758][00197] Num frames 10200... [2024-07-26 15:12:57,883][00197] Num frames 10300... [2024-07-26 15:12:58,014][00197] Num frames 10400... [2024-07-26 15:12:58,144][00197] Num frames 10500... [2024-07-26 15:12:58,196][00197] Avg episode rewards: #0: 64.599, true rewards: #0: 21.000 [2024-07-26 15:12:58,198][00197] Avg episode reward: 64.599, avg true_objective: 21.000 [2024-07-26 15:12:58,328][00197] Num frames 10600... [2024-07-26 15:12:58,453][00197] Num frames 10700... [2024-07-26 15:12:58,587][00197] Num frames 10800... [2024-07-26 15:12:58,709][00197] Num frames 10900... [2024-07-26 15:12:58,834][00197] Num frames 11000... [2024-07-26 15:12:58,966][00197] Num frames 11100... [2024-07-26 15:12:59,095][00197] Num frames 11200... [2024-07-26 15:12:59,223][00197] Num frames 11300... [2024-07-26 15:12:59,359][00197] Num frames 11400... [2024-07-26 15:12:59,486][00197] Num frames 11500... [2024-07-26 15:12:59,623][00197] Num frames 11600... [2024-07-26 15:12:59,755][00197] Num frames 11700... [2024-07-26 15:12:59,883][00197] Num frames 11800... [2024-07-26 15:13:00,014][00197] Num frames 11900... [2024-07-26 15:13:00,141][00197] Num frames 12000... 
[2024-07-26 15:13:00,270][00197] Num frames 12100... [2024-07-26 15:13:00,401][00197] Num frames 12200... [2024-07-26 15:13:00,528][00197] Num frames 12300... [2024-07-26 15:13:00,715][00197] Avg episode rewards: #0: 62.485, true rewards: #0: 20.653 [2024-07-26 15:13:00,717][00197] Avg episode reward: 62.485, avg true_objective: 20.653 [2024-07-26 15:13:00,731][00197] Num frames 12400... [2024-07-26 15:13:00,856][00197] Num frames 12500... [2024-07-26 15:13:00,988][00197] Num frames 12600... [2024-07-26 15:13:01,116][00197] Num frames 12700... [2024-07-26 15:13:01,246][00197] Num frames 12800... [2024-07-26 15:13:01,375][00197] Num frames 12900... [2024-07-26 15:13:01,502][00197] Num frames 13000... [2024-07-26 15:13:01,635][00197] Num frames 13100... [2024-07-26 15:13:01,761][00197] Num frames 13200... [2024-07-26 15:13:01,890][00197] Num frames 13300... [2024-07-26 15:13:02,026][00197] Num frames 13400... [2024-07-26 15:13:02,154][00197] Num frames 13500... [2024-07-26 15:13:02,282][00197] Num frames 13600... [2024-07-26 15:13:02,412][00197] Num frames 13700... [2024-07-26 15:13:02,538][00197] Num frames 13800... [2024-07-26 15:13:02,676][00197] Num frames 13900... [2024-07-26 15:13:02,804][00197] Num frames 14000... [2024-07-26 15:13:02,932][00197] Num frames 14100... [2024-07-26 15:13:03,070][00197] Num frames 14200... [2024-07-26 15:13:03,198][00197] Num frames 14300... [2024-07-26 15:13:03,328][00197] Num frames 14400... [2024-07-26 15:13:03,561][00197] Avg episode rewards: #0: 61.844, true rewards: #0: 20.703 [2024-07-26 15:13:03,563][00197] Avg episode reward: 61.844, avg true_objective: 20.703 [2024-07-26 15:13:03,589][00197] Num frames 14500... [2024-07-26 15:13:03,792][00197] Num frames 14600... [2024-07-26 15:13:03,974][00197] Num frames 14700... [2024-07-26 15:13:04,159][00197] Num frames 14800... [2024-07-26 15:13:04,341][00197] Num frames 14900... [2024-07-26 15:13:04,522][00197] Num frames 15000... [2024-07-26 15:13:04,712][00197] Num frames 15100... [2024-07-26 15:13:04,902][00197] Num frames 15200... [2024-07-26 15:13:05,094][00197] Num frames 15300... [2024-07-26 15:13:05,286][00197] Num frames 15400... [2024-07-26 15:13:05,478][00197] Num frames 15500... [2024-07-26 15:13:05,684][00197] Num frames 15600... [2024-07-26 15:13:05,870][00197] Num frames 15700... [2024-07-26 15:13:06,009][00197] Num frames 15800... [2024-07-26 15:13:06,141][00197] Num frames 15900... [2024-07-26 15:13:06,269][00197] Num frames 16000... [2024-07-26 15:13:06,402][00197] Num frames 16100... [2024-07-26 15:13:06,530][00197] Num frames 16200... [2024-07-26 15:13:06,661][00197] Num frames 16300... [2024-07-26 15:13:06,799][00197] Num frames 16400... [2024-07-26 15:13:06,928][00197] Num frames 16500... [2024-07-26 15:13:07,107][00197] Avg episode rewards: #0: 61.864, true rewards: #0: 20.740 [2024-07-26 15:13:07,110][00197] Avg episode reward: 61.864, avg true_objective: 20.740 [2024-07-26 15:13:07,125][00197] Num frames 16600... [2024-07-26 15:13:07,255][00197] Num frames 16700... [2024-07-26 15:13:07,384][00197] Num frames 16800... [2024-07-26 15:13:07,513][00197] Num frames 16900... [2024-07-26 15:13:07,645][00197] Num frames 17000... [2024-07-26 15:13:07,771][00197] Num frames 17100... [2024-07-26 15:13:07,906][00197] Num frames 17200... [2024-07-26 15:13:08,041][00197] Num frames 17300... [2024-07-26 15:13:08,170][00197] Num frames 17400... [2024-07-26 15:13:08,303][00197] Num frames 17500... [2024-07-26 15:13:08,431][00197] Num frames 17600... 
[2024-07-26 15:13:08,559][00197] Num frames 17700... [2024-07-26 15:13:08,689][00197] Num frames 17800... [2024-07-26 15:13:08,828][00197] Num frames 17900... [2024-07-26 15:13:08,969][00197] Num frames 18000... [2024-07-26 15:13:09,098][00197] Num frames 18100... [2024-07-26 15:13:09,226][00197] Num frames 18200... [2024-07-26 15:13:09,355][00197] Num frames 18300... [2024-07-26 15:13:09,485][00197] Num frames 18400... [2024-07-26 15:13:09,618][00197] Num frames 18500... [2024-07-26 15:13:09,749][00197] Num frames 18600... [2024-07-26 15:13:09,940][00197] Avg episode rewards: #0: 62.212, true rewards: #0: 20.769 [2024-07-26 15:13:09,943][00197] Avg episode reward: 62.212, avg true_objective: 20.769 [2024-07-26 15:13:09,959][00197] Num frames 18700... [2024-07-26 15:13:10,089][00197] Num frames 18800... [2024-07-26 15:13:10,220][00197] Num frames 18900... [2024-07-26 15:13:10,352][00197] Num frames 19000... [2024-07-26 15:13:10,481][00197] Num frames 19100... [2024-07-26 15:13:10,610][00197] Num frames 19200... [2024-07-26 15:13:10,737][00197] Num frames 19300... [2024-07-26 15:13:10,866][00197] Num frames 19400... [2024-07-26 15:13:11,014][00197] Num frames 19500... [2024-07-26 15:13:11,143][00197] Num frames 19600... [2024-07-26 15:13:11,274][00197] Num frames 19700... [2024-07-26 15:13:11,408][00197] Num frames 19800... [2024-07-26 15:13:11,542][00197] Num frames 19900... [2024-07-26 15:13:11,672][00197] Num frames 20000... [2024-07-26 15:13:11,801][00197] Num frames 20100... [2024-07-26 15:13:11,937][00197] Num frames 20200... [2024-07-26 15:13:12,074][00197] Num frames 20300... [2024-07-26 15:13:12,205][00197] Num frames 20400... [2024-07-26 15:13:12,336][00197] Num frames 20500... [2024-07-26 15:13:12,469][00197] Num frames 20600... [2024-07-26 15:13:12,600][00197] Num frames 20700... [2024-07-26 15:13:12,776][00197] Avg episode rewards: #0: 62.391, true rewards: #0: 20.792 [2024-07-26 15:13:12,778][00197] Avg episode reward: 62.391, avg true_objective: 20.792 [2024-07-26 15:15:13,473][00197] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! [2024-07-26 15:25:29,277][18919] Saving configuration to /content/train_dir/default_experiment/config.json... [2024-07-26 15:25:29,282][18919] Rollout worker 0 uses device cpu [2024-07-26 15:25:29,284][18919] Rollout worker 1 uses device cpu [2024-07-26 15:25:29,285][18919] Rollout worker 2 uses device cpu [2024-07-26 15:25:29,287][18919] Rollout worker 3 uses device cpu [2024-07-26 15:25:29,291][18919] Rollout worker 4 uses device cpu [2024-07-26 15:25:29,293][18919] Rollout worker 5 uses device cpu [2024-07-26 15:25:29,295][18919] Rollout worker 6 uses device cpu [2024-07-26 15:25:29,296][18919] Rollout worker 7 uses device cpu [2024-07-26 15:25:29,451][18919] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-26 15:25:29,454][18919] InferenceWorker_p0-w0: min num requests: 2 [2024-07-26 15:25:29,488][18919] Starting all processes... [2024-07-26 15:25:29,489][18919] Starting process learner_proc0 [2024-07-26 15:25:29,537][18919] Starting all processes... 
[2024-07-26 15:25:29,546][18919] Starting process inference_proc0-0
[2024-07-26 15:25:29,548][18919] Starting process rollout_proc0
[2024-07-26 15:25:29,548][18919] Starting process rollout_proc1
[2024-07-26 15:25:29,548][18919] Starting process rollout_proc2
[2024-07-26 15:25:29,548][18919] Starting process rollout_proc3
[2024-07-26 15:25:29,548][18919] Starting process rollout_proc4
[2024-07-26 15:25:29,548][18919] Starting process rollout_proc5
[2024-07-26 15:25:29,548][18919] Starting process rollout_proc6
[2024-07-26 15:25:29,548][18919] Starting process rollout_proc7
[2024-07-26 15:25:40,537][19456] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-26 15:25:40,542][19456] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-07-26 15:25:40,644][19456] Num visible devices: 1
[2024-07-26 15:25:40,989][19458] Worker 1 uses CPU cores [1]
[2024-07-26 15:25:40,996][19443] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-26 15:25:40,997][19443] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-07-26 15:25:41,071][19443] Num visible devices: 1
[2024-07-26 15:25:41,133][19443] Starting seed is not provided
[2024-07-26 15:25:41,134][19443] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-26 15:25:41,134][19443] Initializing actor-critic model on device cuda:0
[2024-07-26 15:25:41,134][19443] RunningMeanStd input shape: (3, 72, 128)
[2024-07-26 15:25:41,136][19443] RunningMeanStd input shape: (1,)
[2024-07-26 15:25:41,222][19460] Worker 3 uses CPU cores [1]
[2024-07-26 15:25:41,233][19443] ConvEncoder: input_channels=3
[2024-07-26 15:25:41,323][19462] Worker 5 uses CPU cores [1]
[2024-07-26 15:25:41,335][19457] Worker 0 uses CPU cores [0]
[2024-07-26 15:25:41,377][19463] Worker 6 uses CPU cores [0]
[2024-07-26 15:25:41,418][19464] Worker 7 uses CPU cores [1]
[2024-07-26 15:25:41,439][19461] Worker 4 uses CPU cores [0]
[2024-07-26 15:25:41,458][19459] Worker 2 uses CPU cores [0]
[2024-07-26 15:25:41,499][19443] Conv encoder output size: 512
[2024-07-26 15:25:41,500][19443] Policy head output size: 512
[2024-07-26 15:25:41,515][19443] Created Actor Critic model with architecture:
[2024-07-26 15:25:41,515][19443] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-07-26 15:25:43,190][19443] Using optimizer
[2024-07-26 15:25:43,191][19443]
Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-26 15:25:43,228][19443] Loading model from checkpoint [2024-07-26 15:25:43,232][19443] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2024-07-26 15:25:43,233][19443] Initialized policy 0 weights for model version 978 [2024-07-26 15:25:43,236][19443] LearnerWorker_p0 finished initialization! [2024-07-26 15:25:43,237][19443] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-26 15:25:43,433][19456] RunningMeanStd input shape: (3, 72, 128) [2024-07-26 15:25:43,434][19456] RunningMeanStd input shape: (1,) [2024-07-26 15:25:43,447][19456] ConvEncoder: input_channels=3 [2024-07-26 15:25:43,547][19456] Conv encoder output size: 512 [2024-07-26 15:25:43,548][19456] Policy head output size: 512 [2024-07-26 15:25:45,019][18919] Inference worker 0-0 is ready! [2024-07-26 15:25:45,020][18919] All inference workers are ready! Signal rollout workers to start! [2024-07-26 15:25:45,135][19460] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:25:45,141][19464] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:25:45,162][19458] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:25:45,161][19462] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:25:45,158][19463] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:25:45,186][19457] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:25:45,183][19459] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:25:45,203][19461] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:25:46,267][18919] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-26 15:25:46,418][19461] Decorrelating experience for 0 frames... [2024-07-26 15:25:46,419][19457] Decorrelating experience for 0 frames... [2024-07-26 15:25:46,423][19459] Decorrelating experience for 0 frames... [2024-07-26 15:25:46,773][19464] Decorrelating experience for 0 frames... [2024-07-26 15:25:46,787][19458] Decorrelating experience for 0 frames... [2024-07-26 15:25:46,796][19460] Decorrelating experience for 0 frames... [2024-07-26 15:25:46,805][19462] Decorrelating experience for 0 frames... [2024-07-26 15:25:47,542][19457] Decorrelating experience for 32 frames... [2024-07-26 15:25:47,545][19461] Decorrelating experience for 32 frames... [2024-07-26 15:25:47,960][19464] Decorrelating experience for 32 frames... [2024-07-26 15:25:47,978][19460] Decorrelating experience for 32 frames... [2024-07-26 15:25:47,981][19458] Decorrelating experience for 32 frames... [2024-07-26 15:25:48,180][19459] Decorrelating experience for 32 frames... [2024-07-26 15:25:48,185][19463] Decorrelating experience for 0 frames... [2024-07-26 15:25:48,731][19457] Decorrelating experience for 64 frames... [2024-07-26 15:25:48,933][19461] Decorrelating experience for 64 frames... [2024-07-26 15:25:49,010][19462] Decorrelating experience for 32 frames... [2024-07-26 15:25:49,267][19460] Decorrelating experience for 64 frames... [2024-07-26 15:25:49,443][18919] Heartbeat connected on Batcher_0 [2024-07-26 15:25:49,446][18919] Heartbeat connected on LearnerWorker_p0 [2024-07-26 15:25:49,489][18919] Heartbeat connected on InferenceWorker_p0-w0 [2024-07-26 15:25:50,063][19464] Decorrelating experience for 64 frames... 
[2024-07-26 15:25:50,223][19463] Decorrelating experience for 32 frames... [2024-07-26 15:25:50,311][19461] Decorrelating experience for 96 frames... [2024-07-26 15:25:50,636][19458] Decorrelating experience for 64 frames... [2024-07-26 15:25:50,685][18919] Heartbeat connected on RolloutWorker_w4 [2024-07-26 15:25:51,267][18919] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-26 15:25:51,321][19460] Decorrelating experience for 96 frames... [2024-07-26 15:25:51,920][18919] Heartbeat connected on RolloutWorker_w3 [2024-07-26 15:25:52,238][19459] Decorrelating experience for 64 frames... [2024-07-26 15:25:52,765][19462] Decorrelating experience for 64 frames... [2024-07-26 15:25:53,003][19463] Decorrelating experience for 64 frames... [2024-07-26 15:25:54,524][19464] Decorrelating experience for 96 frames... [2024-07-26 15:25:54,888][19459] Decorrelating experience for 96 frames... [2024-07-26 15:25:55,146][18919] Heartbeat connected on RolloutWorker_w7 [2024-07-26 15:25:55,737][18919] Heartbeat connected on RolloutWorker_w2 [2024-07-26 15:25:55,837][19458] Decorrelating experience for 96 frames... [2024-07-26 15:25:56,267][18919] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 74.0. Samples: 740. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-26 15:25:56,270][18919] Avg episode reward: [(0, '2.270')] [2024-07-26 15:25:56,511][19463] Decorrelating experience for 96 frames... [2024-07-26 15:25:57,116][18919] Heartbeat connected on RolloutWorker_w1 [2024-07-26 15:25:57,289][18919] Heartbeat connected on RolloutWorker_w6 [2024-07-26 15:25:58,392][19457] Decorrelating experience for 96 frames... [2024-07-26 15:25:58,627][19443] Signal inference workers to stop experience collection... [2024-07-26 15:25:58,638][19456] InferenceWorker_p0-w0: stopping experience collection [2024-07-26 15:25:58,680][19443] Signal inference workers to resume experience collection... [2024-07-26 15:25:58,685][19456] InferenceWorker_p0-w0: resuming experience collection [2024-07-26 15:25:58,807][18919] Heartbeat connected on RolloutWorker_w0 [2024-07-26 15:25:59,273][19462] Decorrelating experience for 96 frames... [2024-07-26 15:26:00,100][18919] Heartbeat connected on RolloutWorker_w5 [2024-07-26 15:26:01,267][18919] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 4014080. Throughput: 0: 174.4. Samples: 2616. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-07-26 15:26:01,270][18919] Avg episode reward: [(0, '5.562')] [2024-07-26 15:26:06,268][18919] Fps is (10 sec: 2867.1, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 4034560. Throughput: 0: 400.2. Samples: 8004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-26 15:26:06,270][18919] Avg episode reward: [(0, '10.359')] [2024-07-26 15:26:09,890][19456] Updated weights for policy 0, policy_version 988 (0.0715) [2024-07-26 15:26:11,267][18919] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 4046848. Throughput: 0: 411.6. Samples: 10290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:26:11,271][18919] Avg episode reward: [(0, '12.329')] [2024-07-26 15:26:16,267][18919] Fps is (10 sec: 2867.3, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 4063232. Throughput: 0: 484.1. Samples: 14522. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-26 15:26:16,270][18919] Avg episode reward: [(0, '14.364')] [2024-07-26 15:26:21,049][19456] Updated weights for policy 0, policy_version 998 (0.0012) [2024-07-26 15:26:21,267][18919] Fps is (10 sec: 4096.0, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 4087808. Throughput: 0: 599.9. Samples: 20996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-26 15:26:21,270][18919] Avg episode reward: [(0, '16.321')] [2024-07-26 15:26:26,270][18919] Fps is (10 sec: 4094.8, 60 sec: 2457.4, 300 sec: 2457.4). Total num frames: 4104192. Throughput: 0: 605.0. Samples: 24200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:26:26,273][18919] Avg episode reward: [(0, '18.269')] [2024-07-26 15:26:31,268][18919] Fps is (10 sec: 3276.8, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 4120576. Throughput: 0: 632.9. Samples: 28482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:26:31,270][18919] Avg episode reward: [(0, '20.306')] [2024-07-26 15:26:33,587][19456] Updated weights for policy 0, policy_version 1008 (0.0016) [2024-07-26 15:26:36,267][18919] Fps is (10 sec: 3687.5, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 4141056. Throughput: 0: 756.8. Samples: 34056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:26:36,273][18919] Avg episode reward: [(0, '21.395')] [2024-07-26 15:26:41,267][18919] Fps is (10 sec: 4096.0, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 4161536. Throughput: 0: 813.2. Samples: 37336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:26:41,275][18919] Avg episode reward: [(0, '21.053')] [2024-07-26 15:26:43,251][19456] Updated weights for policy 0, policy_version 1018 (0.0019) [2024-07-26 15:26:46,267][18919] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 4173824. Throughput: 0: 888.8. Samples: 42614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-26 15:26:46,276][18919] Avg episode reward: [(0, '19.880')] [2024-07-26 15:26:51,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 4190208. Throughput: 0: 874.7. Samples: 47366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:26:51,272][18919] Avg episode reward: [(0, '20.791')] [2024-07-26 15:26:55,117][19456] Updated weights for policy 0, policy_version 1028 (0.0015) [2024-07-26 15:26:56,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 4214784. Throughput: 0: 895.6. Samples: 50590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:26:56,270][18919] Avg episode reward: [(0, '20.621')] [2024-07-26 15:27:01,268][18919] Fps is (10 sec: 4095.6, 60 sec: 3618.1, 300 sec: 3003.7). Total num frames: 4231168. Throughput: 0: 940.5. Samples: 56844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:27:01,277][18919] Avg episode reward: [(0, '21.283')] [2024-07-26 15:27:06,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3020.8). Total num frames: 4247552. Throughput: 0: 887.8. Samples: 60946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:27:06,273][18919] Avg episode reward: [(0, '21.632')] [2024-07-26 15:27:07,394][19456] Updated weights for policy 0, policy_version 1038 (0.0015) [2024-07-26 15:27:11,267][18919] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3084.0). Total num frames: 4268032. Throughput: 0: 881.8. Samples: 63878. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:27:11,273][18919] Avg episode reward: [(0, '21.233')] [2024-07-26 15:27:16,268][18919] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3140.3). Total num frames: 4288512. Throughput: 0: 931.3. Samples: 70390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:27:16,270][18919] Avg episode reward: [(0, '20.520')] [2024-07-26 15:27:16,741][19456] Updated weights for policy 0, policy_version 1048 (0.0017) [2024-07-26 15:27:21,269][18919] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3147.4). Total num frames: 4304896. Throughput: 0: 914.7. Samples: 75220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:27:21,275][18919] Avg episode reward: [(0, '21.541')] [2024-07-26 15:27:26,267][18919] Fps is (10 sec: 3276.9, 60 sec: 3618.3, 300 sec: 3153.9). Total num frames: 4321280. Throughput: 0: 887.7. Samples: 77282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:27:26,272][18919] Avg episode reward: [(0, '21.475')] [2024-07-26 15:27:26,281][19443] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001055_4321280.pth... [2024-07-26 15:27:26,418][19443] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000927_3796992.pth [2024-07-26 15:27:29,007][19456] Updated weights for policy 0, policy_version 1058 (0.0015) [2024-07-26 15:27:31,267][18919] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3198.8). Total num frames: 4341760. Throughput: 0: 909.2. Samples: 83530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:27:31,271][18919] Avg episode reward: [(0, '22.249')] [2024-07-26 15:27:36,269][18919] Fps is (10 sec: 3686.0, 60 sec: 3618.1, 300 sec: 3202.3). Total num frames: 4358144. Throughput: 0: 931.9. Samples: 89302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:27:36,273][18919] Avg episode reward: [(0, '23.435')] [2024-07-26 15:27:36,286][19443] Saving new best policy, reward=23.435! [2024-07-26 15:27:41,174][19456] Updated weights for policy 0, policy_version 1068 (0.0022) [2024-07-26 15:27:41,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3205.6). Total num frames: 4374528. Throughput: 0: 903.1. Samples: 91230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:27:41,274][18919] Avg episode reward: [(0, '24.767')] [2024-07-26 15:27:41,276][19443] Saving new best policy, reward=24.767! [2024-07-26 15:27:46,267][18919] Fps is (10 sec: 3686.9, 60 sec: 3686.4, 300 sec: 3242.7). Total num frames: 4395008. Throughput: 0: 881.0. Samples: 96490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:27:46,270][18919] Avg episode reward: [(0, '24.187')] [2024-07-26 15:27:51,055][19456] Updated weights for policy 0, policy_version 1078 (0.0013) [2024-07-26 15:27:51,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 4415488. Throughput: 0: 933.6. Samples: 102958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:27:51,270][18919] Avg episode reward: [(0, '21.910')] [2024-07-26 15:27:56,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3245.3). Total num frames: 4427776. Throughput: 0: 922.3. Samples: 105380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:27:56,274][18919] Avg episode reward: [(0, '20.468')] [2024-07-26 15:28:01,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3246.5). Total num frames: 4444160. Throughput: 0: 872.7. Samples: 109662. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:28:01,274][18919] Avg episode reward: [(0, '19.593')] [2024-07-26 15:28:03,349][19456] Updated weights for policy 0, policy_version 1088 (0.0021) [2024-07-26 15:28:06,267][18919] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 4464640. Throughput: 0: 908.8. Samples: 116116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:28:06,272][18919] Avg episode reward: [(0, '19.811')] [2024-07-26 15:28:11,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3305.0). Total num frames: 4485120. Throughput: 0: 935.2. Samples: 119364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:28:11,277][18919] Avg episode reward: [(0, '21.227')] [2024-07-26 15:28:14,870][19456] Updated weights for policy 0, policy_version 1098 (0.0019) [2024-07-26 15:28:16,268][18919] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 4497408. Throughput: 0: 890.2. Samples: 123588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:28:16,273][18919] Avg episode reward: [(0, '20.572')] [2024-07-26 15:28:21,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3303.2). Total num frames: 4517888. Throughput: 0: 892.0. Samples: 129440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:28:21,275][18919] Avg episode reward: [(0, '21.986')] [2024-07-26 15:28:25,049][19456] Updated weights for policy 0, policy_version 1108 (0.0016) [2024-07-26 15:28:26,267][18919] Fps is (10 sec: 4506.0, 60 sec: 3686.4, 300 sec: 3353.6). Total num frames: 4542464. Throughput: 0: 921.9. Samples: 132716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:28:26,273][18919] Avg episode reward: [(0, '22.223')] [2024-07-26 15:28:31,268][18919] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3326.4). Total num frames: 4554752. Throughput: 0: 922.5. Samples: 138004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:28:31,276][18919] Avg episode reward: [(0, '22.392')] [2024-07-26 15:28:36,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3325.0). Total num frames: 4571136. Throughput: 0: 883.2. Samples: 142700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:28:36,274][18919] Avg episode reward: [(0, '22.642')] [2024-07-26 15:28:37,274][19456] Updated weights for policy 0, policy_version 1118 (0.0015) [2024-07-26 15:28:41,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3370.4). Total num frames: 4595712. Throughput: 0: 902.2. Samples: 145978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:28:41,274][18919] Avg episode reward: [(0, '22.564')] [2024-07-26 15:28:46,268][18919] Fps is (10 sec: 4095.8, 60 sec: 3618.1, 300 sec: 3367.8). Total num frames: 4612096. Throughput: 0: 946.0. Samples: 152234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:28:46,281][18919] Avg episode reward: [(0, '22.004')] [2024-07-26 15:28:47,845][19456] Updated weights for policy 0, policy_version 1128 (0.0015) [2024-07-26 15:28:51,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3365.4). Total num frames: 4628480. Throughput: 0: 895.0. Samples: 156392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:28:51,273][18919] Avg episode reward: [(0, '21.631')] [2024-07-26 15:28:56,268][18919] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3384.6). Total num frames: 4648960. Throughput: 0: 889.5. Samples: 159390. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:28:56,270][18919] Avg episode reward: [(0, '20.860')] [2024-07-26 15:28:58,694][19456] Updated weights for policy 0, policy_version 1138 (0.0026) [2024-07-26 15:29:01,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3402.8). Total num frames: 4669440. Throughput: 0: 942.6. Samples: 166002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:29:01,274][18919] Avg episode reward: [(0, '20.520')] [2024-07-26 15:29:06,267][18919] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3399.7). Total num frames: 4685824. Throughput: 0: 919.0. Samples: 170794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:29:06,276][18919] Avg episode reward: [(0, '20.357')] [2024-07-26 15:29:10,689][19456] Updated weights for policy 0, policy_version 1148 (0.0012) [2024-07-26 15:29:11,268][18919] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3396.7). Total num frames: 4702208. Throughput: 0: 893.1. Samples: 172906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:29:11,273][18919] Avg episode reward: [(0, '20.009')] [2024-07-26 15:29:16,267][18919] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3413.3). Total num frames: 4722688. Throughput: 0: 918.9. Samples: 179354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:29:16,272][18919] Avg episode reward: [(0, '21.261')] [2024-07-26 15:29:20,846][19456] Updated weights for policy 0, policy_version 1158 (0.0018) [2024-07-26 15:29:21,269][18919] Fps is (10 sec: 4095.4, 60 sec: 3754.5, 300 sec: 3429.2). Total num frames: 4743168. Throughput: 0: 943.1. Samples: 185140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:29:21,274][18919] Avg episode reward: [(0, '21.688')] [2024-07-26 15:29:26,270][18919] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3407.1). Total num frames: 4755456. Throughput: 0: 915.3. Samples: 187170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-26 15:29:26,273][18919] Avg episode reward: [(0, '21.516')] [2024-07-26 15:29:26,292][19443] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001161_4755456.pth... [2024-07-26 15:29:26,458][19443] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2024-07-26 15:29:31,267][18919] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3422.4). Total num frames: 4775936. Throughput: 0: 898.5. Samples: 192668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:29:31,273][18919] Avg episode reward: [(0, '20.834')] [2024-07-26 15:29:32,489][19456] Updated weights for policy 0, policy_version 1168 (0.0013) [2024-07-26 15:29:36,267][18919] Fps is (10 sec: 4506.9, 60 sec: 3822.9, 300 sec: 3454.9). Total num frames: 4800512. Throughput: 0: 946.8. Samples: 198996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:29:36,273][18919] Avg episode reward: [(0, '21.189')] [2024-07-26 15:29:41,267][18919] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3433.7). Total num frames: 4812800. Throughput: 0: 930.9. Samples: 201282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:29:41,273][18919] Avg episode reward: [(0, '21.099')] [2024-07-26 15:29:44,675][19456] Updated weights for policy 0, policy_version 1178 (0.0018) [2024-07-26 15:29:46,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3430.4). Total num frames: 4829184. Throughput: 0: 883.3. Samples: 205750. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:29:46,270][18919] Avg episode reward: [(0, '23.069')] [2024-07-26 15:29:51,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3460.7). Total num frames: 4853760. Throughput: 0: 922.8. Samples: 212322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:29:51,270][18919] Avg episode reward: [(0, '22.822')] [2024-07-26 15:29:54,081][19456] Updated weights for policy 0, policy_version 1188 (0.0018) [2024-07-26 15:29:56,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3457.0). Total num frames: 4870144. Throughput: 0: 946.3. Samples: 215490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:29:56,270][18919] Avg episode reward: [(0, '22.195')] [2024-07-26 15:30:01,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3437.4). Total num frames: 4882432. Throughput: 0: 894.8. Samples: 219622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:30:01,272][18919] Avg episode reward: [(0, '21.692')] [2024-07-26 15:30:06,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3450.1). Total num frames: 4902912. Throughput: 0: 896.0. Samples: 225456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:30:06,274][18919] Avg episode reward: [(0, '23.163')] [2024-07-26 15:30:06,449][19456] Updated weights for policy 0, policy_version 1198 (0.0011) [2024-07-26 15:30:11,267][18919] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3477.7). Total num frames: 4927488. Throughput: 0: 923.9. Samples: 228742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:30:11,270][18919] Avg episode reward: [(0, '22.648')] [2024-07-26 15:30:16,268][18919] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3458.8). Total num frames: 4939776. Throughput: 0: 914.1. Samples: 233804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-26 15:30:16,272][18919] Avg episode reward: [(0, '22.139')] [2024-07-26 15:30:18,490][19456] Updated weights for policy 0, policy_version 1208 (0.0012) [2024-07-26 15:30:21,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3455.5). Total num frames: 4956160. Throughput: 0: 884.3. Samples: 238788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-26 15:30:21,270][18919] Avg episode reward: [(0, '23.052')] [2024-07-26 15:30:26,267][18919] Fps is (10 sec: 4096.1, 60 sec: 3754.8, 300 sec: 3481.6). Total num frames: 4980736. Throughput: 0: 905.3. Samples: 242020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-26 15:30:26,272][18919] Avg episode reward: [(0, '24.683')] [2024-07-26 15:30:28,027][19456] Updated weights for policy 0, policy_version 1218 (0.0012) [2024-07-26 15:30:31,269][18919] Fps is (10 sec: 4095.2, 60 sec: 3686.3, 300 sec: 3478.0). Total num frames: 4997120. Throughput: 0: 939.7. Samples: 248038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-26 15:30:31,272][18919] Avg episode reward: [(0, '25.353')] [2024-07-26 15:30:31,280][19443] Saving new best policy, reward=25.353! [2024-07-26 15:30:33,754][19443] Stopping Batcher_0... [2024-07-26 15:30:33,755][19443] Loop batcher_evt_loop terminating... [2024-07-26 15:30:33,756][18919] Component Batcher_0 stopped! [2024-07-26 15:30:33,767][19443] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-07-26 15:30:33,867][19456] Weights refcount: 2 0 [2024-07-26 15:30:33,879][18919] Component InferenceWorker_p0-w0 stopped! [2024-07-26 15:30:33,883][19456] Stopping InferenceWorker_p0-w0... 
[2024-07-26 15:30:33,884][19456] Loop inference_proc0-0_evt_loop terminating... [2024-07-26 15:30:33,904][18919] Component RolloutWorker_w2 stopped! [2024-07-26 15:30:33,906][19459] Stopping RolloutWorker_w2... [2024-07-26 15:30:33,919][18919] Component RolloutWorker_w4 stopped! [2024-07-26 15:30:33,907][19459] Loop rollout_proc2_evt_loop terminating... [2024-07-26 15:30:33,922][19461] Stopping RolloutWorker_w4... [2024-07-26 15:30:33,938][18919] Component RolloutWorker_w6 stopped! [2024-07-26 15:30:33,943][18919] Component RolloutWorker_w0 stopped! [2024-07-26 15:30:33,946][19457] Stopping RolloutWorker_w0... [2024-07-26 15:30:33,925][19461] Loop rollout_proc4_evt_loop terminating... [2024-07-26 15:30:33,942][19463] Stopping RolloutWorker_w6... [2024-07-26 15:30:33,951][19463] Loop rollout_proc6_evt_loop terminating... [2024-07-26 15:30:33,947][19457] Loop rollout_proc0_evt_loop terminating... [2024-07-26 15:30:33,989][19460] Stopping RolloutWorker_w3... [2024-07-26 15:30:33,994][19458] Stopping RolloutWorker_w1... [2024-07-26 15:30:33,993][18919] Component RolloutWorker_w3 stopped! [2024-07-26 15:30:34,002][19458] Loop rollout_proc1_evt_loop terminating... [2024-07-26 15:30:34,002][19460] Loop rollout_proc3_evt_loop terminating... [2024-07-26 15:30:33,997][18919] Component RolloutWorker_w1 stopped! [2024-07-26 15:30:34,027][19462] Stopping RolloutWorker_w5... [2024-07-26 15:30:34,028][19462] Loop rollout_proc5_evt_loop terminating... [2024-07-26 15:30:34,028][18919] Component RolloutWorker_w5 stopped! [2024-07-26 15:30:34,032][19464] Stopping RolloutWorker_w7... [2024-07-26 15:30:34,034][19464] Loop rollout_proc7_evt_loop terminating... [2024-07-26 15:30:34,034][18919] Component RolloutWorker_w7 stopped! [2024-07-26 15:30:34,068][19443] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001055_4321280.pth [2024-07-26 15:30:34,120][19443] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-07-26 15:30:34,519][19443] Stopping LearnerWorker_p0... [2024-07-26 15:30:34,520][19443] Loop learner_proc0_evt_loop terminating... [2024-07-26 15:30:34,529][18919] Component LearnerWorker_p0 stopped! [2024-07-26 15:30:34,535][18919] Waiting for process learner_proc0 to stop... [2024-07-26 15:30:36,249][18919] Waiting for process inference_proc0-0 to join... [2024-07-26 15:30:36,257][18919] Waiting for process rollout_proc0 to join... [2024-07-26 15:30:37,527][18919] Waiting for process rollout_proc1 to join... [2024-07-26 15:30:37,531][18919] Waiting for process rollout_proc2 to join... [2024-07-26 15:30:37,536][18919] Waiting for process rollout_proc3 to join... [2024-07-26 15:30:37,538][18919] Waiting for process rollout_proc4 to join... [2024-07-26 15:30:37,543][18919] Waiting for process rollout_proc5 to join... [2024-07-26 15:30:37,546][18919] Waiting for process rollout_proc6 to join... [2024-07-26 15:30:37,550][18919] Waiting for process rollout_proc7 to join... 
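A note on the checkpoint files saved and pruned in the entries above: the names encode the training step and the env-step count (so checkpoint_000001222_5005312.pth is train step 1222 at 5,005,312 env frames). A minimal sketch for inspecting such a file offline follows; it assumes, based on the earlier "Loaded experiment state at self.train_step=978, self.env_steps=4005888" entry, that the checkpoint is an ordinary torch-serialized dict. The exact key names are an assumption, not something this log confirms.

```python
# Hedged sketch: inspect a Sample Factory checkpoint named in the log above.
# The filename encodes train_step and env_steps; the dict keys queried below
# are assumptions inferred from the "Loaded experiment state ..." log entry.
import torch

ckpt_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")

print(sorted(ckpt.keys()))                             # see what the file actually carries
print(ckpt.get("train_step"), ckpt.get("env_steps"))   # expected: 1222 5005312
```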
[2024-07-26 15:30:37,554][18919] Batcher 0 profile tree view:
batching: 6.4650, releasing_batches: 0.0073
[2024-07-26 15:30:37,555][18919] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0022
  wait_policy_total: 133.9347
update_model: 2.5475
  weight_update: 0.0012
one_step: 0.0023
  handle_policy_step: 141.1796
    deserialize: 3.7034, stack: 0.7731, obs_to_device_normalize: 30.0941, forward: 70.7096, send_messages: 7.0674
    prepare_outputs: 21.5497
      to_cpu: 13.4657
[2024-07-26 15:30:37,558][18919] Learner 0 profile tree view:
misc: 0.0014, prepare_batch: 7.9588
train: 20.3950
  epoch_init: 0.0016, minibatch_init: 0.0016, losses_postprocess: 0.1678, kl_divergence: 0.1982, after_optimizer: 0.9136
  calculate_losses: 6.2420
    losses_init: 0.0009, forward_head: 0.6769, bptt_initial: 3.8406, tail: 0.2414, advantages_returns: 0.0625, losses: 0.7417
    bptt: 0.6049
      bptt_forward_core: 0.5629
  update: 12.7197
    clip: 0.3748
[2024-07-26 15:30:37,560][18919] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0777, enqueue_policy_requests: 33.8361, env_step: 213.0039, overhead: 3.9676, complete_rollouts: 1.6830
save_policy_outputs: 6.3749
  split_output_tensors: 2.1586
[2024-07-26 15:30:37,561][18919] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0474, enqueue_policy_requests: 34.2331, env_step: 216.1595, overhead: 4.0057, complete_rollouts: 1.9941
save_policy_outputs: 6.2127
  split_output_tensors: 2.2198
[2024-07-26 15:30:37,563][18919] Loop Runner_EvtLoop terminating...
[2024-07-26 15:30:37,567][18919] Runner profile tree view:
main_loop: 308.0798
[2024-07-26 15:30:37,568][18919] Collected {0: 5005312}, FPS: 3244.0
[2024-07-26 15:31:19,219][18919] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-07-26 15:31:19,221][18919] Overriding arg 'num_workers' with value 1 passed from command line
[2024-07-26 15:31:19,224][18919] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-07-26 15:31:19,226][18919] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-07-26 15:31:19,227][18919] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-07-26 15:31:19,230][18919] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-07-26 15:31:19,232][18919] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-07-26 15:31:19,233][18919] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-07-26 15:31:19,234][18919] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-07-26 15:31:19,236][18919] Adding new argument 'hf_repository'='thomaspalomares/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-07-26 15:31:19,238][18919] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-07-26 15:31:19,239][18919] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-07-26 15:31:19,241][18919] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-07-26 15:31:19,243][18919] Adding new argument 'enjoy_script'=None that is not in the saved config file!
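The "Overriding arg" / "Adding new argument" entries above correspond to CLI flags of Sample Factory's evaluation ("enjoy") script. A minimal sketch of how the evaluation run below could be launched; the entry point (sf_examples.vizdoom.enjoy_vizdoom) and the env name doom_health_gathering_supreme are assumptions inferred from the standard ViZDoom examples and the experiment names in this log, not shown explicitly here.

```python
# Hedged sketch: launching an evaluation like the one logged below.
# Flag values mirror the overrides listed in the log; the module path and
# env name are assumed from the standard Sample Factory ViZDoom examples.
import sys
from sf_examples.vizdoom.enjoy_vizdoom import main

sys.argv = [
    "enjoy",
    "--env=doom_health_gathering_supreme",   # assumed env name
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_frames=100000",
    "--max_num_episodes=10",
    "--push_to_hub",
    "--hf_repository=thomaspalomares/rl_course_vizdoom_health_gathering_supreme",
]
main()  # loads the latest checkpoint, runs the episodes, saves replay.mp4, pushes to the Hub
```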
[2024-07-26 15:31:19,244][18919] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-26 15:31:19,262][18919] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-26 15:31:19,264][18919] RunningMeanStd input shape: (3, 72, 128) [2024-07-26 15:31:19,266][18919] RunningMeanStd input shape: (1,) [2024-07-26 15:31:19,281][18919] ConvEncoder: input_channels=3 [2024-07-26 15:31:19,406][18919] Conv encoder output size: 512 [2024-07-26 15:31:19,408][18919] Policy head output size: 512 [2024-07-26 15:31:21,533][18919] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-07-26 15:31:22,685][18919] Num frames 100... [2024-07-26 15:31:22,810][18919] Num frames 200... [2024-07-26 15:31:22,937][18919] Num frames 300... [2024-07-26 15:31:23,067][18919] Num frames 400... [2024-07-26 15:31:23,206][18919] Num frames 500... [2024-07-26 15:31:23,383][18919] Avg episode rewards: #0: 8.960, true rewards: #0: 5.960 [2024-07-26 15:31:23,386][18919] Avg episode reward: 8.960, avg true_objective: 5.960 [2024-07-26 15:31:23,394][18919] Num frames 600... [2024-07-26 15:31:23,517][18919] Num frames 700... [2024-07-26 15:31:23,650][18919] Num frames 800... [2024-07-26 15:31:23,775][18919] Num frames 900... [2024-07-26 15:31:23,899][18919] Num frames 1000... [2024-07-26 15:31:24,029][18919] Num frames 1100... [2024-07-26 15:31:24,181][18919] Avg episode rewards: #0: 9.390, true rewards: #0: 5.890 [2024-07-26 15:31:24,183][18919] Avg episode reward: 9.390, avg true_objective: 5.890 [2024-07-26 15:31:24,213][18919] Num frames 1200... [2024-07-26 15:31:24,340][18919] Num frames 1300... [2024-07-26 15:31:24,466][18919] Num frames 1400... [2024-07-26 15:31:24,596][18919] Num frames 1500... [2024-07-26 15:31:24,720][18919] Num frames 1600... [2024-07-26 15:31:24,845][18919] Num frames 1700... [2024-07-26 15:31:24,980][18919] Num frames 1800... [2024-07-26 15:31:25,113][18919] Num frames 1900... [2024-07-26 15:31:25,258][18919] Num frames 2000... [2024-07-26 15:31:25,396][18919] Num frames 2100... [2024-07-26 15:31:25,521][18919] Num frames 2200... [2024-07-26 15:31:25,580][18919] Avg episode rewards: #0: 13.673, true rewards: #0: 7.340 [2024-07-26 15:31:25,581][18919] Avg episode reward: 13.673, avg true_objective: 7.340 [2024-07-26 15:31:25,705][18919] Num frames 2300... [2024-07-26 15:31:25,828][18919] Num frames 2400... [2024-07-26 15:31:25,951][18919] Num frames 2500... [2024-07-26 15:31:26,116][18919] Avg episode rewards: #0: 11.215, true rewards: #0: 6.465 [2024-07-26 15:31:26,117][18919] Avg episode reward: 11.215, avg true_objective: 6.465 [2024-07-26 15:31:26,138][18919] Num frames 2600... [2024-07-26 15:31:26,274][18919] Num frames 2700... [2024-07-26 15:31:26,406][18919] Num frames 2800... [2024-07-26 15:31:26,532][18919] Num frames 2900... [2024-07-26 15:31:26,657][18919] Num frames 3000... [2024-07-26 15:31:26,781][18919] Num frames 3100... [2024-07-26 15:31:26,909][18919] Num frames 3200... [2024-07-26 15:31:27,083][18919] Avg episode rewards: #0: 11.980, true rewards: #0: 6.580 [2024-07-26 15:31:27,086][18919] Avg episode reward: 11.980, avg true_objective: 6.580 [2024-07-26 15:31:27,101][18919] Num frames 3300... [2024-07-26 15:31:27,234][18919] Num frames 3400... [2024-07-26 15:31:27,366][18919] Num frames 3500... [2024-07-26 15:31:27,495][18919] Num frames 3600... [2024-07-26 15:31:27,622][18919] Num frames 3700... [2024-07-26 15:31:27,747][18919] Num frames 3800... 
[2024-07-26 15:31:27,871][18919] Num frames 3900... [2024-07-26 15:31:28,006][18919] Num frames 4000... [2024-07-26 15:31:28,138][18919] Num frames 4100... [2024-07-26 15:31:28,274][18919] Num frames 4200... [2024-07-26 15:31:28,402][18919] Num frames 4300... [2024-07-26 15:31:28,528][18919] Num frames 4400... [2024-07-26 15:31:28,657][18919] Num frames 4500... [2024-07-26 15:31:28,779][18919] Num frames 4600... [2024-07-26 15:31:28,904][18919] Num frames 4700... [2024-07-26 15:31:29,037][18919] Num frames 4800... [2024-07-26 15:31:29,168][18919] Num frames 4900... [2024-07-26 15:31:29,350][18919] Avg episode rewards: #0: 17.325, true rewards: #0: 8.325 [2024-07-26 15:31:29,352][18919] Avg episode reward: 17.325, avg true_objective: 8.325 [2024-07-26 15:31:29,361][18919] Num frames 5000... [2024-07-26 15:31:29,489][18919] Num frames 5100... [2024-07-26 15:31:29,616][18919] Num frames 5200... [2024-07-26 15:31:29,744][18919] Num frames 5300... [2024-07-26 15:31:29,875][18919] Num frames 5400... [2024-07-26 15:31:30,013][18919] Num frames 5500... [2024-07-26 15:31:30,139][18919] Num frames 5600... [2024-07-26 15:31:30,266][18919] Num frames 5700... [2024-07-26 15:31:30,413][18919] Num frames 5800... [2024-07-26 15:31:30,542][18919] Num frames 5900... [2024-07-26 15:31:30,672][18919] Num frames 6000... [2024-07-26 15:31:30,799][18919] Num frames 6100... [2024-07-26 15:31:30,927][18919] Num frames 6200... [2024-07-26 15:31:31,044][18919] Avg episode rewards: #0: 18.204, true rewards: #0: 8.919 [2024-07-26 15:31:31,046][18919] Avg episode reward: 18.204, avg true_objective: 8.919 [2024-07-26 15:31:31,123][18919] Num frames 6300... [2024-07-26 15:31:31,248][18919] Num frames 6400... [2024-07-26 15:31:31,387][18919] Num frames 6500... [2024-07-26 15:31:31,518][18919] Num frames 6600... [2024-07-26 15:31:31,645][18919] Num frames 6700... [2024-07-26 15:31:31,773][18919] Num frames 6800... [2024-07-26 15:31:31,904][18919] Num frames 6900... [2024-07-26 15:31:31,979][18919] Avg episode rewards: #0: 17.644, true rewards: #0: 8.644 [2024-07-26 15:31:31,980][18919] Avg episode reward: 17.644, avg true_objective: 8.644 [2024-07-26 15:31:32,090][18919] Num frames 7000... [2024-07-26 15:31:32,223][18919] Num frames 7100... [2024-07-26 15:31:32,354][18919] Num frames 7200... [2024-07-26 15:31:32,543][18919] Num frames 7300... [2024-07-26 15:31:32,726][18919] Num frames 7400... [2024-07-26 15:31:32,900][18919] Num frames 7500... [2024-07-26 15:31:33,079][18919] Num frames 7600... [2024-07-26 15:31:33,260][18919] Num frames 7700... [2024-07-26 15:31:33,460][18919] Num frames 7800... [2024-07-26 15:31:33,641][18919] Num frames 7900... [2024-07-26 15:31:33,832][18919] Num frames 8000... [2024-07-26 15:31:34,017][18919] Num frames 8100... [2024-07-26 15:31:34,205][18919] Num frames 8200... [2024-07-26 15:31:34,374][18919] Avg episode rewards: #0: 19.288, true rewards: #0: 9.177 [2024-07-26 15:31:34,376][18919] Avg episode reward: 19.288, avg true_objective: 9.177 [2024-07-26 15:31:34,452][18919] Num frames 8300... [2024-07-26 15:31:34,652][18919] Num frames 8400... [2024-07-26 15:31:34,836][18919] Num frames 8500... [2024-07-26 15:31:34,991][18919] Num frames 8600... [2024-07-26 15:31:35,118][18919] Num frames 8700... [2024-07-26 15:31:35,247][18919] Num frames 8800... [2024-07-26 15:31:35,376][18919] Num frames 8900... [2024-07-26 15:31:35,504][18919] Num frames 9000... 
[2024-07-26 15:31:35,649][18919] Avg episode rewards: #0: 19.060, true rewards: #0: 9.060 [2024-07-26 15:31:35,650][18919] Avg episode reward: 19.060, avg true_objective: 9.060 [2024-07-26 15:32:28,310][18919] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
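A quick sanity check on the throughput reported at shutdown: the "FPS: 3244.0" figure is consistent with the env frames collected this session divided by the main-loop wall time, with every input read off entries earlier in the log.

```python
# All three inputs come from the log above; nothing is measured independently.
env_steps_start = 4_005_888   # "Loaded experiment state ... env_steps=4005888"
env_steps_end   = 5_005_312   # "Collected {0: 5005312}"
main_loop_s     = 308.0798    # "Runner profile tree view: main_loop: 308.0798"

fps = (env_steps_end - env_steps_start) / main_loop_s
print(f"{fps:.1f}")  # -> 3244.0, matching the reported "FPS: 3244.0"
```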