thomaspalomares committed
Commit: b069293
1 Parent(s): 3c970d9

Upload folder using huggingface_hub
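For context, a commit with this message is typically produced from the training notebook with `huggingface_hub`'s `upload_folder`. A minimal sketch, assuming the folder path and repo id taken from the sf_log.txt entries below (not stated in the commit itself):

```python
# Sketch of how a commit like this is produced; folder_path and repo_id are
# taken from the log below and are assumptions about this particular run.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="/content/train_dir/default_experiment",
    repo_id="thomaspalomares/rl_course_vizdoom_health_gathering_supreme",
    commit_message="Upload folder using huggingface_hub",
)
```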
.summary/0/events.out.tfevents.1722007526.ee43660ef243 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa692e6021a2407f4b36637e1fca3c8e25d502ce3c7b7a8772045602ffc319b6
+ size 223104
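The three added lines are a standard git-lfs pointer (spec v1): the binary itself lives in LFS storage, keyed by the sha256 oid, with its byte size recorded. A small sketch of reading such a pointer, using the values above:

```python
# Parse a git-lfs pointer file like the one added above.
# Spec v1: one "key value" pair per line (version, oid, size).
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:fa692e6021a2407f4b36637e1fca3c8e25d502ce3c7b7a8772045602ffc319b6\n"
    "size 223104\n"
)
print(parse_lfs_pointer(pointer)["size"])  # 223104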
README.md CHANGED
@@ -15,7 +15,7 @@ model-index:
   type: doom_health_gathering_supreme
   metrics:
   - type: mean_reward
- value: 12.10 +/- 7.25
+ value: 9.06 +/- 3.93
   name: mean_reward
   verified: false
  ---
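The changed metric is the mean ± standard deviation of the per-episode true reward over the evaluation episodes (10 episodes in this run, per `max_num_episodes=10` in sf_log.txt below). A worked sketch of how such a figure is formed; the episode values are illustrative (chosen to reproduce the README number), and population std is an assumption:

```python
# Sketch: the README metric is mean +/- std over evaluation episode rewards.
# These rewards are hypothetical, not the actual returns from this run.
import statistics

episode_rewards = [15.06, 3.06, 13.56, 4.56, 12.56, 5.56, 11.56, 6.56, 10.63, 7.49]
mean = statistics.mean(episode_rewards)
std = statistics.pstdev(episode_rewards)  # population std, as an assumption
print(f"{mean:.2f} +/- {std:.2f}")        # 9.06 +/- 3.93
```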
checkpoint_p0/best_000001220_4997120_reward_25.353.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4118bba3d25759f7a7321240e48caf2d72ca025cbd1c8589f558e776c3a0d2b
+ size 34928806
checkpoint_p0/checkpoint_000001161_4755456.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9449428144e39baee26e362f1394a4a2d988455879c4efbb753b6623791d8ef0
+ size 34929220
checkpoint_p0/checkpoint_000001222_5005312.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64365f28c2a01b018cd61456b82c6a65c089f6a68ad2ba16b82711834f517f0a
+ size 34929220
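These .pth files are Sample Factory checkpoints (plain torch saves); the log below shows fields such as `train_step` and `env_steps` being restored from one. A sketch of inspecting a checkpoint locally, assuming it has been fetched from LFS; the exact key set is an assumption inferred from the "Loaded experiment state at self.train_step=..., self.env_steps=..." log line:

```python
# Sketch: peek inside one of the checkpoints added above.
import torch

ckpt = torch.load(
    "checkpoint_p0/best_000001220_4997120_reward_25.353.pth",
    map_location="cpu",
)
# Expect entries like 'train_step', 'env_steps', and the model weights.
print(sorted(ckpt.keys()))
```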
config.json CHANGED
@@ -65,7 +65,7 @@
   "summaries_use_frameskip": true,
   "heartbeat_interval": 20,
   "heartbeat_reporting_interval": 600,
- "train_for_env_steps": 4000000,
+ "train_for_env_steps": 5000000,
   "train_for_seconds": 10000000000,
   "save_every_sec": 120,
   "keep_checkpoints": 2,
replay.mp4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6f3bb84cda526e9b0351d99bd1b17e8065a9f5935d758185fe5ebac47a93c561
- size 23455932
+ oid sha256:2cd6886ba88a1dfc0d50e49293d91cfba7e16fdfe76b086609916c0b2bdfd4bc
+ size 16881695
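Only the LFS oid and size changed, i.e. the replay video was re-rendered by the new evaluation run. Since the oid is the sha256 of the object, a fetched file can be checked against the pointer; a small sketch:

```python
# Sketch: verify a fetched LFS object against the sha256 oid in its pointer.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "2cd6886ba88a1dfc0d50e49293d91cfba7e16fdfe76b086609916c0b2bdfd4bc"
# assert sha256_of("replay.mp4") == expected  # after `git lfs pull`
```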
sf_log.txt CHANGED
@@ -1138,3 +1138,774 @@ main_loop: 1135.5590
  [2024-07-26 15:10:37,233][00197] Avg episode rewards: #0: 27.203, true rewards: #0: 12.103
  [2024-07-26 15:10:37,235][00197] Avg episode reward: 27.203, avg true_objective: 12.103
  [2024-07-26 15:11:48,220][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+ [2024-07-26 15:12:00,516][00197] The model has been pushed to https://huggingface.co/thomaspalomares/rl_course_vizdoom_health_gathering_supreme
+ [2024-07-26 15:12:42,865][00197] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json
+ [2024-07-26 15:12:42,867][00197] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json
+ [2024-07-26 15:12:42,869][00197] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line
+ [2024-07-26 15:12:42,871][00197] Overriding arg 'train_dir' with value 'train_dir' passed from command line
+ [2024-07-26 15:12:42,873][00197] Overriding arg 'num_workers' with value 1 passed from command line
+ [2024-07-26 15:12:42,875][00197] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file!
+ [2024-07-26 15:12:42,877][00197] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file!
+ [2024-07-26 15:12:42,879][00197] Adding new argument 'env_gpu_observations'=True that is not in the saved config file!
+ [2024-07-26 15:12:42,880][00197] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2024-07-26 15:12:42,882][00197] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2024-07-26 15:12:42,883][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2024-07-26 15:12:42,884][00197] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2024-07-26 15:12:42,886][00197] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+ [2024-07-26 15:12:42,887][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2024-07-26 15:12:42,889][00197] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+ [2024-07-26 15:12:42,890][00197] Adding new argument 'hf_repository'=None that is not in the saved config file!
+ [2024-07-26 15:12:42,892][00197] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2024-07-26 15:12:42,899][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2024-07-26 15:12:42,900][00197] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2024-07-26 15:12:42,901][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2024-07-26 15:12:42,903][00197] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2024-07-26 15:12:42,913][00197] RunningMeanStd input shape: (3, 72, 128)
+ [2024-07-26 15:12:42,915][00197] RunningMeanStd input shape: (1,)
+ [2024-07-26 15:12:42,927][00197] ConvEncoder: input_channels=3
+ [2024-07-26 15:12:42,979][00197] Conv encoder output size: 512
+ [2024-07-26 15:12:42,982][00197] Policy head output size: 512
+ [2024-07-26 15:12:43,009][00197] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth...
+ [2024-07-26 15:12:43,519][00197] Num frames 100...
+ [2024-07-26 15:12:43,648][00197] Num frames 200...
+ [2024-07-26 15:12:43,776][00197] Num frames 300...
+ [2024-07-26 15:12:43,904][00197] Num frames 400...
+ [2024-07-26 15:12:44,033][00197] Num frames 500...
+ [2024-07-26 15:12:44,170][00197] Num frames 600...
+ [2024-07-26 15:12:44,299][00197] Num frames 700...
+ [2024-07-26 15:12:44,428][00197] Num frames 800...
+ [2024-07-26 15:12:44,557][00197] Num frames 900...
+ [2024-07-26 15:12:44,688][00197] Num frames 1000...
+ [2024-07-26 15:12:44,819][00197] Num frames 1100...
+ [2024-07-26 15:12:44,948][00197] Num frames 1200...
+ [2024-07-26 15:12:45,080][00197] Num frames 1300...
+ [2024-07-26 15:12:45,215][00197] Num frames 1400...
+ [2024-07-26 15:12:45,345][00197] Num frames 1500...
+ [2024-07-26 15:12:45,494][00197] Num frames 1600...
+ [2024-07-26 15:12:45,636][00197] Num frames 1700...
+ [2024-07-26 15:12:45,764][00197] Num frames 1800...
+ [2024-07-26 15:12:45,892][00197] Num frames 1900...
+ [2024-07-26 15:12:46,029][00197] Num frames 2000...
+ [2024-07-26 15:12:46,159][00197] Num frames 2100...
+ [2024-07-26 15:12:46,211][00197] Avg episode rewards: #0: 64.999, true rewards: #0: 21.000
+ [2024-07-26 15:12:46,213][00197] Avg episode reward: 64.999, avg true_objective: 21.000
+ [2024-07-26 15:12:46,349][00197] Num frames 2200...
+ [2024-07-26 15:12:46,480][00197] Num frames 2300...
+ [2024-07-26 15:12:46,610][00197] Num frames 2400...
+ [2024-07-26 15:12:46,737][00197] Num frames 2500...
+ [2024-07-26 15:12:46,868][00197] Num frames 2600...
+ [2024-07-26 15:12:47,006][00197] Num frames 2700...
+ [2024-07-26 15:12:47,132][00197] Num frames 2800...
+ [2024-07-26 15:12:47,269][00197] Num frames 2900...
+ [2024-07-26 15:12:47,400][00197] Num frames 3000...
+ [2024-07-26 15:12:47,526][00197] Num frames 3100...
+ [2024-07-26 15:12:47,656][00197] Num frames 3200...
+ [2024-07-26 15:12:47,782][00197] Num frames 3300...
+ [2024-07-26 15:12:47,907][00197] Num frames 3400...
+ [2024-07-26 15:12:48,041][00197] Num frames 3500...
+ [2024-07-26 15:12:48,169][00197] Num frames 3600...
+ [2024-07-26 15:12:48,313][00197] Num frames 3700...
+ [2024-07-26 15:12:48,444][00197] Num frames 3800...
+ [2024-07-26 15:12:48,574][00197] Num frames 3900...
+ [2024-07-26 15:12:48,702][00197] Num frames 4000...
+ [2024-07-26 15:12:48,827][00197] Num frames 4100...
+ [2024-07-26 15:12:48,962][00197] Num frames 4200...
+ [2024-07-26 15:12:49,013][00197] Avg episode rewards: #0: 65.999, true rewards: #0: 21.000
+ [2024-07-26 15:12:49,015][00197] Avg episode reward: 65.999, avg true_objective: 21.000
+ [2024-07-26 15:12:49,143][00197] Num frames 4300...
+ [2024-07-26 15:12:49,277][00197] Num frames 4400...
+ [2024-07-26 15:12:49,405][00197] Num frames 4500...
+ [2024-07-26 15:12:49,532][00197] Num frames 4600...
+ [2024-07-26 15:12:49,664][00197] Num frames 4700...
+ [2024-07-26 15:12:49,792][00197] Num frames 4800...
+ [2024-07-26 15:12:49,920][00197] Num frames 4900...
+ [2024-07-26 15:12:50,057][00197] Num frames 5000...
+ [2024-07-26 15:12:50,189][00197] Num frames 5100...
+ [2024-07-26 15:12:50,327][00197] Num frames 5200...
+ [2024-07-26 15:12:50,457][00197] Num frames 5300...
+ [2024-07-26 15:12:50,586][00197] Num frames 5400...
+ [2024-07-26 15:12:50,724][00197] Num frames 5500...
+ [2024-07-26 15:12:50,905][00197] Num frames 5600...
+ [2024-07-26 15:12:51,104][00197] Num frames 5700...
+ [2024-07-26 15:12:51,293][00197] Num frames 5800...
+ [2024-07-26 15:12:51,487][00197] Num frames 5900...
+ [2024-07-26 15:12:51,680][00197] Num frames 6000...
+ [2024-07-26 15:12:51,858][00197] Num frames 6100...
+ [2024-07-26 15:12:52,048][00197] Num frames 6200...
+ [2024-07-26 15:12:52,238][00197] Num frames 6300...
+ [2024-07-26 15:12:52,292][00197] Avg episode rewards: #0: 64.999, true rewards: #0: 21.000
+ [2024-07-26 15:12:52,293][00197] Avg episode reward: 64.999, avg true_objective: 21.000
+ [2024-07-26 15:12:52,487][00197] Num frames 6400...
+ [2024-07-26 15:12:52,674][00197] Num frames 6500...
+ [2024-07-26 15:12:52,860][00197] Num frames 6600...
+ [2024-07-26 15:12:53,042][00197] Num frames 6700...
+ [2024-07-26 15:12:53,246][00197] Num frames 6800...
+ [2024-07-26 15:12:53,417][00197] Num frames 6900...
+ [2024-07-26 15:12:53,541][00197] Num frames 7000...
+ [2024-07-26 15:12:53,667][00197] Num frames 7100...
+ [2024-07-26 15:12:53,795][00197] Num frames 7200...
+ [2024-07-26 15:12:53,921][00197] Num frames 7300...
+ [2024-07-26 15:12:54,054][00197] Num frames 7400...
+ [2024-07-26 15:12:54,182][00197] Num frames 7500...
+ [2024-07-26 15:12:54,315][00197] Num frames 7600...
+ [2024-07-26 15:12:54,453][00197] Num frames 7700...
+ [2024-07-26 15:12:54,580][00197] Num frames 7800...
+ [2024-07-26 15:12:54,711][00197] Num frames 7900...
+ [2024-07-26 15:12:54,837][00197] Num frames 8000...
+ [2024-07-26 15:12:54,974][00197] Num frames 8100...
+ [2024-07-26 15:12:55,117][00197] Num frames 8200...
+ [2024-07-26 15:12:55,250][00197] Num frames 8300...
+ [2024-07-26 15:12:55,382][00197] Num frames 8400...
+ [2024-07-26 15:12:55,435][00197] Avg episode rewards: #0: 63.999, true rewards: #0: 21.000
+ [2024-07-26 15:12:55,437][00197] Avg episode reward: 63.999, avg true_objective: 21.000
+ [2024-07-26 15:12:55,562][00197] Num frames 8500...
+ [2024-07-26 15:12:55,687][00197] Num frames 8600...
+ [2024-07-26 15:12:55,812][00197] Num frames 8700...
+ [2024-07-26 15:12:55,940][00197] Num frames 8800...
+ [2024-07-26 15:12:56,073][00197] Num frames 8900...
+ [2024-07-26 15:12:56,200][00197] Num frames 9000...
+ [2024-07-26 15:12:56,325][00197] Num frames 9100...
+ [2024-07-26 15:12:56,465][00197] Num frames 9200...
+ [2024-07-26 15:12:56,592][00197] Num frames 9300...
+ [2024-07-26 15:12:56,719][00197] Num frames 9400...
+ [2024-07-26 15:12:56,848][00197] Num frames 9500...
+ [2024-07-26 15:12:56,980][00197] Num frames 9600...
+ [2024-07-26 15:12:57,104][00197] Num frames 9700...
+ [2024-07-26 15:12:57,232][00197] Num frames 9800...
+ [2024-07-26 15:12:57,360][00197] Num frames 9900...
+ [2024-07-26 15:12:57,497][00197] Num frames 10000...
+ [2024-07-26 15:12:57,627][00197] Num frames 10100...
+ [2024-07-26 15:12:57,758][00197] Num frames 10200...
+ [2024-07-26 15:12:57,883][00197] Num frames 10300...
+ [2024-07-26 15:12:58,014][00197] Num frames 10400...
+ [2024-07-26 15:12:58,144][00197] Num frames 10500...
+ [2024-07-26 15:12:58,196][00197] Avg episode rewards: #0: 64.599, true rewards: #0: 21.000
+ [2024-07-26 15:12:58,198][00197] Avg episode reward: 64.599, avg true_objective: 21.000
+ [2024-07-26 15:12:58,328][00197] Num frames 10600...
+ [2024-07-26 15:12:58,453][00197] Num frames 10700...
+ [2024-07-26 15:12:58,587][00197] Num frames 10800...
+ [2024-07-26 15:12:58,709][00197] Num frames 10900...
+ [2024-07-26 15:12:58,834][00197] Num frames 11000...
+ [2024-07-26 15:12:58,966][00197] Num frames 11100...
+ [2024-07-26 15:12:59,095][00197] Num frames 11200...
+ [2024-07-26 15:12:59,223][00197] Num frames 11300...
+ [2024-07-26 15:12:59,359][00197] Num frames 11400...
+ [2024-07-26 15:12:59,486][00197] Num frames 11500...
+ [2024-07-26 15:12:59,623][00197] Num frames 11600...
+ [2024-07-26 15:12:59,755][00197] Num frames 11700...
+ [2024-07-26 15:12:59,883][00197] Num frames 11800...
+ [2024-07-26 15:13:00,014][00197] Num frames 11900...
+ [2024-07-26 15:13:00,141][00197] Num frames 12000...
+ [2024-07-26 15:13:00,270][00197] Num frames 12100...
+ [2024-07-26 15:13:00,401][00197] Num frames 12200...
+ [2024-07-26 15:13:00,528][00197] Num frames 12300...
+ [2024-07-26 15:13:00,715][00197] Avg episode rewards: #0: 62.485, true rewards: #0: 20.653
+ [2024-07-26 15:13:00,717][00197] Avg episode reward: 62.485, avg true_objective: 20.653
+ [2024-07-26 15:13:00,731][00197] Num frames 12400...
+ [2024-07-26 15:13:00,856][00197] Num frames 12500...
+ [2024-07-26 15:13:00,988][00197] Num frames 12600...
+ [2024-07-26 15:13:01,116][00197] Num frames 12700...
+ [2024-07-26 15:13:01,246][00197] Num frames 12800...
+ [2024-07-26 15:13:01,375][00197] Num frames 12900...
+ [2024-07-26 15:13:01,502][00197] Num frames 13000...
+ [2024-07-26 15:13:01,635][00197] Num frames 13100...
+ [2024-07-26 15:13:01,761][00197] Num frames 13200...
+ [2024-07-26 15:13:01,890][00197] Num frames 13300...
+ [2024-07-26 15:13:02,026][00197] Num frames 13400...
+ [2024-07-26 15:13:02,154][00197] Num frames 13500...
+ [2024-07-26 15:13:02,282][00197] Num frames 13600...
+ [2024-07-26 15:13:02,412][00197] Num frames 13700...
+ [2024-07-26 15:13:02,538][00197] Num frames 13800...
+ [2024-07-26 15:13:02,676][00197] Num frames 13900...
+ [2024-07-26 15:13:02,804][00197] Num frames 14000...
+ [2024-07-26 15:13:02,932][00197] Num frames 14100...
+ [2024-07-26 15:13:03,070][00197] Num frames 14200...
+ [2024-07-26 15:13:03,198][00197] Num frames 14300...
+ [2024-07-26 15:13:03,328][00197] Num frames 14400...
+ [2024-07-26 15:13:03,561][00197] Avg episode rewards: #0: 61.844, true rewards: #0: 20.703
+ [2024-07-26 15:13:03,563][00197] Avg episode reward: 61.844, avg true_objective: 20.703
+ [2024-07-26 15:13:03,589][00197] Num frames 14500...
+ [2024-07-26 15:13:03,792][00197] Num frames 14600...
+ [2024-07-26 15:13:03,974][00197] Num frames 14700...
+ [2024-07-26 15:13:04,159][00197] Num frames 14800...
+ [2024-07-26 15:13:04,341][00197] Num frames 14900...
+ [2024-07-26 15:13:04,522][00197] Num frames 15000...
+ [2024-07-26 15:13:04,712][00197] Num frames 15100...
+ [2024-07-26 15:13:04,902][00197] Num frames 15200...
+ [2024-07-26 15:13:05,094][00197] Num frames 15300...
+ [2024-07-26 15:13:05,286][00197] Num frames 15400...
+ [2024-07-26 15:13:05,478][00197] Num frames 15500...
+ [2024-07-26 15:13:05,684][00197] Num frames 15600...
+ [2024-07-26 15:13:05,870][00197] Num frames 15700...
+ [2024-07-26 15:13:06,009][00197] Num frames 15800...
+ [2024-07-26 15:13:06,141][00197] Num frames 15900...
+ [2024-07-26 15:13:06,269][00197] Num frames 16000...
+ [2024-07-26 15:13:06,402][00197] Num frames 16100...
+ [2024-07-26 15:13:06,530][00197] Num frames 16200...
+ [2024-07-26 15:13:06,661][00197] Num frames 16300...
+ [2024-07-26 15:13:06,799][00197] Num frames 16400...
+ [2024-07-26 15:13:06,928][00197] Num frames 16500...
+ [2024-07-26 15:13:07,107][00197] Avg episode rewards: #0: 61.864, true rewards: #0: 20.740
+ [2024-07-26 15:13:07,110][00197] Avg episode reward: 61.864, avg true_objective: 20.740
+ [2024-07-26 15:13:07,125][00197] Num frames 16600...
+ [2024-07-26 15:13:07,255][00197] Num frames 16700...
+ [2024-07-26 15:13:07,384][00197] Num frames 16800...
+ [2024-07-26 15:13:07,513][00197] Num frames 16900...
+ [2024-07-26 15:13:07,645][00197] Num frames 17000...
+ [2024-07-26 15:13:07,771][00197] Num frames 17100...
+ [2024-07-26 15:13:07,906][00197] Num frames 17200...
+ [2024-07-26 15:13:08,041][00197] Num frames 17300...
+ [2024-07-26 15:13:08,170][00197] Num frames 17400...
+ [2024-07-26 15:13:08,303][00197] Num frames 17500...
+ [2024-07-26 15:13:08,431][00197] Num frames 17600...
+ [2024-07-26 15:13:08,559][00197] Num frames 17700...
+ [2024-07-26 15:13:08,689][00197] Num frames 17800...
+ [2024-07-26 15:13:08,828][00197] Num frames 17900...
+ [2024-07-26 15:13:08,969][00197] Num frames 18000...
+ [2024-07-26 15:13:09,098][00197] Num frames 18100...
+ [2024-07-26 15:13:09,226][00197] Num frames 18200...
+ [2024-07-26 15:13:09,355][00197] Num frames 18300...
+ [2024-07-26 15:13:09,485][00197] Num frames 18400...
+ [2024-07-26 15:13:09,618][00197] Num frames 18500...
+ [2024-07-26 15:13:09,749][00197] Num frames 18600...
+ [2024-07-26 15:13:09,940][00197] Avg episode rewards: #0: 62.212, true rewards: #0: 20.769
+ [2024-07-26 15:13:09,943][00197] Avg episode reward: 62.212, avg true_objective: 20.769
+ [2024-07-26 15:13:09,959][00197] Num frames 18700...
+ [2024-07-26 15:13:10,089][00197] Num frames 18800...
+ [2024-07-26 15:13:10,220][00197] Num frames 18900...
+ [2024-07-26 15:13:10,352][00197] Num frames 19000...
+ [2024-07-26 15:13:10,481][00197] Num frames 19100...
+ [2024-07-26 15:13:10,610][00197] Num frames 19200...
+ [2024-07-26 15:13:10,737][00197] Num frames 19300...
+ [2024-07-26 15:13:10,866][00197] Num frames 19400...
+ [2024-07-26 15:13:11,014][00197] Num frames 19500...
+ [2024-07-26 15:13:11,143][00197] Num frames 19600...
+ [2024-07-26 15:13:11,274][00197] Num frames 19700...
+ [2024-07-26 15:13:11,408][00197] Num frames 19800...
+ [2024-07-26 15:13:11,542][00197] Num frames 19900...
+ [2024-07-26 15:13:11,672][00197] Num frames 20000...
+ [2024-07-26 15:13:11,801][00197] Num frames 20100...
+ [2024-07-26 15:13:11,937][00197] Num frames 20200...
+ [2024-07-26 15:13:12,074][00197] Num frames 20300...
+ [2024-07-26 15:13:12,205][00197] Num frames 20400...
+ [2024-07-26 15:13:12,336][00197] Num frames 20500...
+ [2024-07-26 15:13:12,469][00197] Num frames 20600...
+ [2024-07-26 15:13:12,600][00197] Num frames 20700...
+ [2024-07-26 15:13:12,776][00197] Avg episode rewards: #0: 62.391, true rewards: #0: 20.792
+ [2024-07-26 15:13:12,778][00197] Avg episode reward: 62.391, avg true_objective: 20.792
+ [2024-07-26 15:15:13,473][00197] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4!
+ [2024-07-26 15:25:29,277][18919] Saving configuration to /content/train_dir/default_experiment/config.json...
+ [2024-07-26 15:25:29,282][18919] Rollout worker 0 uses device cpu
+ [2024-07-26 15:25:29,284][18919] Rollout worker 1 uses device cpu
+ [2024-07-26 15:25:29,285][18919] Rollout worker 2 uses device cpu
+ [2024-07-26 15:25:29,287][18919] Rollout worker 3 uses device cpu
+ [2024-07-26 15:25:29,291][18919] Rollout worker 4 uses device cpu
+ [2024-07-26 15:25:29,293][18919] Rollout worker 5 uses device cpu
+ [2024-07-26 15:25:29,295][18919] Rollout worker 6 uses device cpu
+ [2024-07-26 15:25:29,296][18919] Rollout worker 7 uses device cpu
+ [2024-07-26 15:25:29,451][18919] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-07-26 15:25:29,454][18919] InferenceWorker_p0-w0: min num requests: 2
+ [2024-07-26 15:25:29,488][18919] Starting all processes...
+ [2024-07-26 15:25:29,489][18919] Starting process learner_proc0
+ [2024-07-26 15:25:29,537][18919] Starting all processes...
+ [2024-07-26 15:25:29,546][18919] Starting process inference_proc0-0
+ [2024-07-26 15:25:29,548][18919] Starting process rollout_proc0
+ [2024-07-26 15:25:29,548][18919] Starting process rollout_proc1
+ [2024-07-26 15:25:29,548][18919] Starting process rollout_proc2
+ [2024-07-26 15:25:29,548][18919] Starting process rollout_proc3
+ [2024-07-26 15:25:29,548][18919] Starting process rollout_proc4
+ [2024-07-26 15:25:29,548][18919] Starting process rollout_proc5
+ [2024-07-26 15:25:29,548][18919] Starting process rollout_proc6
+ [2024-07-26 15:25:29,548][18919] Starting process rollout_proc7
+ [2024-07-26 15:25:40,537][19456] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-07-26 15:25:40,542][19456] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+ [2024-07-26 15:25:40,644][19456] Num visible devices: 1
+ [2024-07-26 15:25:40,989][19458] Worker 1 uses CPU cores [1]
+ [2024-07-26 15:25:40,996][19443] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-07-26 15:25:40,997][19443] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+ [2024-07-26 15:25:41,071][19443] Num visible devices: 1
+ [2024-07-26 15:25:41,133][19443] Starting seed is not provided
+ [2024-07-26 15:25:41,134][19443] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-07-26 15:25:41,134][19443] Initializing actor-critic model on device cuda:0
+ [2024-07-26 15:25:41,134][19443] RunningMeanStd input shape: (3, 72, 128)
+ [2024-07-26 15:25:41,136][19443] RunningMeanStd input shape: (1,)
+ [2024-07-26 15:25:41,222][19460] Worker 3 uses CPU cores [1]
+ [2024-07-26 15:25:41,233][19443] ConvEncoder: input_channels=3
+ [2024-07-26 15:25:41,323][19462] Worker 5 uses CPU cores [1]
+ [2024-07-26 15:25:41,335][19457] Worker 0 uses CPU cores [0]
+ [2024-07-26 15:25:41,377][19463] Worker 6 uses CPU cores [0]
+ [2024-07-26 15:25:41,418][19464] Worker 7 uses CPU cores [1]
+ [2024-07-26 15:25:41,439][19461] Worker 4 uses CPU cores [0]
+ [2024-07-26 15:25:41,458][19459] Worker 2 uses CPU cores [0]
+ [2024-07-26 15:25:41,499][19443] Conv encoder output size: 512
+ [2024-07-26 15:25:41,500][19443] Policy head output size: 512
+ [2024-07-26 15:25:41,515][19443] Created Actor Critic model with architecture:
+ [2024-07-26 15:25:41,515][19443] ActorCriticSharedWeights(
+   (obs_normalizer): ObservationNormalizer(
+     (running_mean_std): RunningMeanStdDictInPlace(
+       (running_mean_std): ModuleDict(
+         (obs): RunningMeanStdInPlace()
+       )
+     )
+   )
+   (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+   (encoder): VizdoomEncoder(
+     (basic_encoder): ConvEncoder(
+       (enc): RecursiveScriptModule(
+         original_name=ConvEncoderImpl
+         (conv_head): RecursiveScriptModule(
+           original_name=Sequential
+           (0): RecursiveScriptModule(original_name=Conv2d)
+           (1): RecursiveScriptModule(original_name=ELU)
+           (2): RecursiveScriptModule(original_name=Conv2d)
+           (3): RecursiveScriptModule(original_name=ELU)
+           (4): RecursiveScriptModule(original_name=Conv2d)
+           (5): RecursiveScriptModule(original_name=ELU)
+         )
+         (mlp_layers): RecursiveScriptModule(
+           original_name=Sequential
+           (0): RecursiveScriptModule(original_name=Linear)
+           (1): RecursiveScriptModule(original_name=ELU)
+         )
+       )
+     )
+   )
+   (core): ModelCoreRNN(
+     (core): GRU(512, 512)
+   )
+   (decoder): MlpDecoder(
+     (mlp): Identity()
+   )
+   (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+   (action_parameterization): ActionParameterizationDefault(
+     (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+   )
+ )
+ [2024-07-26 15:25:43,190][19443] Using optimizer <class 'torch.optim.adam.Adam'>
+ [2024-07-26 15:25:43,191][19443] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+ [2024-07-26 15:25:43,228][19443] Loading model from checkpoint
+ [2024-07-26 15:25:43,232][19443] Loaded experiment state at self.train_step=978, self.env_steps=4005888
+ [2024-07-26 15:25:43,233][19443] Initialized policy 0 weights for model version 978
+ [2024-07-26 15:25:43,236][19443] LearnerWorker_p0 finished initialization!
+ [2024-07-26 15:25:43,237][19443] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-07-26 15:25:43,433][19456] RunningMeanStd input shape: (3, 72, 128)
+ [2024-07-26 15:25:43,434][19456] RunningMeanStd input shape: (1,)
+ [2024-07-26 15:25:43,447][19456] ConvEncoder: input_channels=3
+ [2024-07-26 15:25:43,547][19456] Conv encoder output size: 512
+ [2024-07-26 15:25:43,548][19456] Policy head output size: 512
+ [2024-07-26 15:25:45,019][18919] Inference worker 0-0 is ready!
+ [2024-07-26 15:25:45,020][18919] All inference workers are ready! Signal rollout workers to start!
+ [2024-07-26 15:25:45,135][19460] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:25:45,141][19464] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:25:45,162][19458] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:25:45,161][19462] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:25:45,158][19463] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:25:45,186][19457] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:25:45,183][19459] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:25:45,203][19461] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:25:46,267][18919] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2024-07-26 15:25:46,418][19461] Decorrelating experience for 0 frames...
+ [2024-07-26 15:25:46,419][19457] Decorrelating experience for 0 frames...
+ [2024-07-26 15:25:46,423][19459] Decorrelating experience for 0 frames...
+ [2024-07-26 15:25:46,773][19464] Decorrelating experience for 0 frames...
+ [2024-07-26 15:25:46,787][19458] Decorrelating experience for 0 frames...
+ [2024-07-26 15:25:46,796][19460] Decorrelating experience for 0 frames...
+ [2024-07-26 15:25:46,805][19462] Decorrelating experience for 0 frames...
+ [2024-07-26 15:25:47,542][19457] Decorrelating experience for 32 frames...
+ [2024-07-26 15:25:47,545][19461] Decorrelating experience for 32 frames...
+ [2024-07-26 15:25:47,960][19464] Decorrelating experience for 32 frames...
+ [2024-07-26 15:25:47,978][19460] Decorrelating experience for 32 frames...
+ [2024-07-26 15:25:47,981][19458] Decorrelating experience for 32 frames...
+ [2024-07-26 15:25:48,180][19459] Decorrelating experience for 32 frames...
+ [2024-07-26 15:25:48,185][19463] Decorrelating experience for 0 frames...
+ [2024-07-26 15:25:48,731][19457] Decorrelating experience for 64 frames...
+ [2024-07-26 15:25:48,933][19461] Decorrelating experience for 64 frames...
+ [2024-07-26 15:25:49,010][19462] Decorrelating experience for 32 frames...
+ [2024-07-26 15:25:49,267][19460] Decorrelating experience for 64 frames...
+ [2024-07-26 15:25:49,443][18919] Heartbeat connected on Batcher_0
+ [2024-07-26 15:25:49,446][18919] Heartbeat connected on LearnerWorker_p0
+ [2024-07-26 15:25:49,489][18919] Heartbeat connected on InferenceWorker_p0-w0
+ [2024-07-26 15:25:50,063][19464] Decorrelating experience for 64 frames...
+ [2024-07-26 15:25:50,223][19463] Decorrelating experience for 32 frames...
+ [2024-07-26 15:25:50,311][19461] Decorrelating experience for 96 frames...
+ [2024-07-26 15:25:50,636][19458] Decorrelating experience for 64 frames...
+ [2024-07-26 15:25:50,685][18919] Heartbeat connected on RolloutWorker_w4
+ [2024-07-26 15:25:51,267][18919] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2024-07-26 15:25:51,321][19460] Decorrelating experience for 96 frames...
+ [2024-07-26 15:25:51,920][18919] Heartbeat connected on RolloutWorker_w3
+ [2024-07-26 15:25:52,238][19459] Decorrelating experience for 64 frames...
+ [2024-07-26 15:25:52,765][19462] Decorrelating experience for 64 frames...
+ [2024-07-26 15:25:53,003][19463] Decorrelating experience for 64 frames...
+ [2024-07-26 15:25:54,524][19464] Decorrelating experience for 96 frames...
+ [2024-07-26 15:25:54,888][19459] Decorrelating experience for 96 frames...
+ [2024-07-26 15:25:55,146][18919] Heartbeat connected on RolloutWorker_w7
+ [2024-07-26 15:25:55,737][18919] Heartbeat connected on RolloutWorker_w2
+ [2024-07-26 15:25:55,837][19458] Decorrelating experience for 96 frames...
+ [2024-07-26 15:25:56,267][18919] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 74.0. Samples: 740. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2024-07-26 15:25:56,270][18919] Avg episode reward: [(0, '2.270')]
+ [2024-07-26 15:25:56,511][19463] Decorrelating experience for 96 frames...
+ [2024-07-26 15:25:57,116][18919] Heartbeat connected on RolloutWorker_w1
+ [2024-07-26 15:25:57,289][18919] Heartbeat connected on RolloutWorker_w6
+ [2024-07-26 15:25:58,392][19457] Decorrelating experience for 96 frames...
+ [2024-07-26 15:25:58,627][19443] Signal inference workers to stop experience collection...
+ [2024-07-26 15:25:58,638][19456] InferenceWorker_p0-w0: stopping experience collection
+ [2024-07-26 15:25:58,680][19443] Signal inference workers to resume experience collection...
+ [2024-07-26 15:25:58,685][19456] InferenceWorker_p0-w0: resuming experience collection
+ [2024-07-26 15:25:58,807][18919] Heartbeat connected on RolloutWorker_w0
+ [2024-07-26 15:25:59,273][19462] Decorrelating experience for 96 frames...
+ [2024-07-26 15:26:00,100][18919] Heartbeat connected on RolloutWorker_w5
+ [2024-07-26 15:26:01,267][18919] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 4014080. Throughput: 0: 174.4. Samples: 2616. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+ [2024-07-26 15:26:01,270][18919] Avg episode reward: [(0, '5.562')]
+ [2024-07-26 15:26:06,268][18919] Fps is (10 sec: 2867.1, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 4034560. Throughput: 0: 400.2. Samples: 8004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+ [2024-07-26 15:26:06,270][18919] Avg episode reward: [(0, '10.359')]
+ [2024-07-26 15:26:09,890][19456] Updated weights for policy 0, policy_version 988 (0.0715)
+ [2024-07-26 15:26:11,267][18919] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 4046848. Throughput: 0: 411.6. Samples: 10290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-07-26 15:26:11,271][18919] Avg episode reward: [(0, '12.329')]
+ [2024-07-26 15:26:16,267][18919] Fps is (10 sec: 2867.3, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 4063232. Throughput: 0: 484.1. Samples: 14522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+ [2024-07-26 15:26:16,270][18919] Avg episode reward: [(0, '14.364')]
+ [2024-07-26 15:26:21,049][19456] Updated weights for policy 0, policy_version 998 (0.0012)
+ [2024-07-26 15:26:21,267][18919] Fps is (10 sec: 4096.0, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 4087808. Throughput: 0: 599.9. Samples: 20996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+ [2024-07-26 15:26:21,270][18919] Avg episode reward: [(0, '16.321')]
+ [2024-07-26 15:26:26,270][18919] Fps is (10 sec: 4094.8, 60 sec: 2457.4, 300 sec: 2457.4). Total num frames: 4104192. Throughput: 0: 605.0. Samples: 24200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:26:26,273][18919] Avg episode reward: [(0, '18.269')]
+ [2024-07-26 15:26:31,268][18919] Fps is (10 sec: 3276.8, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 4120576. Throughput: 0: 632.9. Samples: 28482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:26:31,270][18919] Avg episode reward: [(0, '20.306')]
+ [2024-07-26 15:26:33,587][19456] Updated weights for policy 0, policy_version 1008 (0.0016)
+ [2024-07-26 15:26:36,267][18919] Fps is (10 sec: 3687.5, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 4141056. Throughput: 0: 756.8. Samples: 34056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-07-26 15:26:36,273][18919] Avg episode reward: [(0, '21.395')]
+ [2024-07-26 15:26:41,267][18919] Fps is (10 sec: 4096.0, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 4161536. Throughput: 0: 813.2. Samples: 37336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:26:41,275][18919] Avg episode reward: [(0, '21.053')]
+ [2024-07-26 15:26:43,251][19456] Updated weights for policy 0, policy_version 1018 (0.0019)
+ [2024-07-26 15:26:46,267][18919] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 4173824. Throughput: 0: 888.8. Samples: 42614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+ [2024-07-26 15:26:46,276][18919] Avg episode reward: [(0, '19.880')]
+ [2024-07-26 15:26:51,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 4190208. Throughput: 0: 874.7. Samples: 47366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:26:51,272][18919] Avg episode reward: [(0, '20.791')]
+ [2024-07-26 15:26:55,117][19456] Updated weights for policy 0, policy_version 1028 (0.0015)
+ [2024-07-26 15:26:56,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 4214784. Throughput: 0: 895.6. Samples: 50590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-07-26 15:26:56,270][18919] Avg episode reward: [(0, '20.621')]
+ [2024-07-26 15:27:01,268][18919] Fps is (10 sec: 4095.6, 60 sec: 3618.1, 300 sec: 3003.7). Total num frames: 4231168. Throughput: 0: 940.5. Samples: 56844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:27:01,277][18919] Avg episode reward: [(0, '21.283')]
+ [2024-07-26 15:27:06,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3020.8). Total num frames: 4247552. Throughput: 0: 887.8. Samples: 60946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:27:06,273][18919] Avg episode reward: [(0, '21.632')]
+ [2024-07-26 15:27:07,394][19456] Updated weights for policy 0, policy_version 1038 (0.0015)
+ [2024-07-26 15:27:11,267][18919] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3084.0). Total num frames: 4268032. Throughput: 0: 881.8. Samples: 63878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:27:11,273][18919] Avg episode reward: [(0, '21.233')]
+ [2024-07-26 15:27:16,268][18919] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3140.3). Total num frames: 4288512. Throughput: 0: 931.3. Samples: 70390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:27:16,270][18919] Avg episode reward: [(0, '20.520')]
+ [2024-07-26 15:27:16,741][19456] Updated weights for policy 0, policy_version 1048 (0.0017)
+ [2024-07-26 15:27:21,269][18919] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3147.4). Total num frames: 4304896. Throughput: 0: 914.7. Samples: 75220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:27:21,275][18919] Avg episode reward: [(0, '21.541')]
+ [2024-07-26 15:27:26,267][18919] Fps is (10 sec: 3276.9, 60 sec: 3618.3, 300 sec: 3153.9). Total num frames: 4321280. Throughput: 0: 887.7. Samples: 77282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:27:26,272][18919] Avg episode reward: [(0, '21.475')]
+ [2024-07-26 15:27:26,281][19443] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001055_4321280.pth...
+ [2024-07-26 15:27:26,418][19443] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000927_3796992.pth
+ [2024-07-26 15:27:29,007][19456] Updated weights for policy 0, policy_version 1058 (0.0015)
+ [2024-07-26 15:27:31,267][18919] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3198.8). Total num frames: 4341760. Throughput: 0: 909.2. Samples: 83530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:27:31,271][18919] Avg episode reward: [(0, '22.249')]
+ [2024-07-26 15:27:36,269][18919] Fps is (10 sec: 3686.0, 60 sec: 3618.1, 300 sec: 3202.3). Total num frames: 4358144. Throughput: 0: 931.9. Samples: 89302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:27:36,273][18919] Avg episode reward: [(0, '23.435')]
+ [2024-07-26 15:27:36,286][19443] Saving new best policy, reward=23.435!
+ [2024-07-26 15:27:41,174][19456] Updated weights for policy 0, policy_version 1068 (0.0022)
+ [2024-07-26 15:27:41,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3205.6). Total num frames: 4374528. Throughput: 0: 903.1. Samples: 91230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:27:41,274][18919] Avg episode reward: [(0, '24.767')]
+ [2024-07-26 15:27:41,276][19443] Saving new best policy, reward=24.767!
+ [2024-07-26 15:27:46,267][18919] Fps is (10 sec: 3686.9, 60 sec: 3686.4, 300 sec: 3242.7). Total num frames: 4395008. Throughput: 0: 881.0. Samples: 96490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:27:46,270][18919] Avg episode reward: [(0, '24.187')]
+ [2024-07-26 15:27:51,055][19456] Updated weights for policy 0, policy_version 1078 (0.0013)
+ [2024-07-26 15:27:51,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 4415488. Throughput: 0: 933.6. Samples: 102958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:27:51,270][18919] Avg episode reward: [(0, '21.910')]
+ [2024-07-26 15:27:56,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3245.3). Total num frames: 4427776. Throughput: 0: 922.3. Samples: 105380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-07-26 15:27:56,274][18919] Avg episode reward: [(0, '20.468')]
+ [2024-07-26 15:28:01,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3246.5). Total num frames: 4444160. Throughput: 0: 872.7. Samples: 109662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:28:01,274][18919] Avg episode reward: [(0, '19.593')]
+ [2024-07-26 15:28:03,349][19456] Updated weights for policy 0, policy_version 1088 (0.0021)
+ [2024-07-26 15:28:06,267][18919] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 4464640. Throughput: 0: 908.8. Samples: 116116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:28:06,272][18919] Avg episode reward: [(0, '19.811')]
+ [2024-07-26 15:28:11,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3305.0). Total num frames: 4485120. Throughput: 0: 935.2. Samples: 119364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-07-26 15:28:11,277][18919] Avg episode reward: [(0, '21.227')]
+ [2024-07-26 15:28:14,870][19456] Updated weights for policy 0, policy_version 1098 (0.0019)
+ [2024-07-26 15:28:16,268][18919] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 4497408. Throughput: 0: 890.2. Samples: 123588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:28:16,273][18919] Avg episode reward: [(0, '20.572')]
+ [2024-07-26 15:28:21,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3303.2). Total num frames: 4517888. Throughput: 0: 892.0. Samples: 129440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-07-26 15:28:21,275][18919] Avg episode reward: [(0, '21.986')]
+ [2024-07-26 15:28:25,049][19456] Updated weights for policy 0, policy_version 1108 (0.0016)
+ [2024-07-26 15:28:26,267][18919] Fps is (10 sec: 4506.0, 60 sec: 3686.4, 300 sec: 3353.6). Total num frames: 4542464. Throughput: 0: 921.9. Samples: 132716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-07-26 15:28:26,273][18919] Avg episode reward: [(0, '22.223')]
+ [2024-07-26 15:28:31,268][18919] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3326.4). Total num frames: 4554752. Throughput: 0: 922.5. Samples: 138004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:28:31,276][18919] Avg episode reward: [(0, '22.392')]
+ [2024-07-26 15:28:36,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3325.0). Total num frames: 4571136. Throughput: 0: 883.2. Samples: 142700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-07-26 15:28:36,274][18919] Avg episode reward: [(0, '22.642')]
+ [2024-07-26 15:28:37,274][19456] Updated weights for policy 0, policy_version 1118 (0.0015)
+ [2024-07-26 15:28:41,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3370.4). Total num frames: 4595712. Throughput: 0: 902.2. Samples: 145978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:28:41,274][18919] Avg episode reward: [(0, '22.564')]
+ [2024-07-26 15:28:46,268][18919] Fps is (10 sec: 4095.8, 60 sec: 3618.1, 300 sec: 3367.8). Total num frames: 4612096. Throughput: 0: 946.0. Samples: 152234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-07-26 15:28:46,281][18919] Avg episode reward: [(0, '22.004')]
+ [2024-07-26 15:28:47,845][19456] Updated weights for policy 0, policy_version 1128 (0.0015)
+ [2024-07-26 15:28:51,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3365.4). Total num frames: 4628480. Throughput: 0: 895.0. Samples: 156392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:28:51,273][18919] Avg episode reward: [(0, '21.631')]
+ [2024-07-26 15:28:56,268][18919] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3384.6). Total num frames: 4648960. Throughput: 0: 889.5. Samples: 159390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-07-26 15:28:56,270][18919] Avg episode reward: [(0, '20.860')]
+ [2024-07-26 15:28:58,694][19456] Updated weights for policy 0, policy_version 1138 (0.0026)
+ [2024-07-26 15:29:01,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3402.8). Total num frames: 4669440. Throughput: 0: 942.6. Samples: 166002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:29:01,274][18919] Avg episode reward: [(0, '20.520')]
+ [2024-07-26 15:29:06,267][18919] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3399.7). Total num frames: 4685824. Throughput: 0: 919.0. Samples: 170794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:29:06,276][18919] Avg episode reward: [(0, '20.357')]
+ [2024-07-26 15:29:10,689][19456] Updated weights for policy 0, policy_version 1148 (0.0012)
+ [2024-07-26 15:29:11,268][18919] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3396.7). Total num frames: 4702208. Throughput: 0: 893.1. Samples: 172906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:29:11,273][18919] Avg episode reward: [(0, '20.009')]
+ [2024-07-26 15:29:16,267][18919] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3413.3). Total num frames: 4722688. Throughput: 0: 918.9. Samples: 179354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:29:16,272][18919] Avg episode reward: [(0, '21.261')]
+ [2024-07-26 15:29:20,846][19456] Updated weights for policy 0, policy_version 1158 (0.0018)
+ [2024-07-26 15:29:21,269][18919] Fps is (10 sec: 4095.4, 60 sec: 3754.5, 300 sec: 3429.2). Total num frames: 4743168. Throughput: 0: 943.1. Samples: 185140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:29:21,274][18919] Avg episode reward: [(0, '21.688')]
+ [2024-07-26 15:29:26,270][18919] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3407.1). Total num frames: 4755456. Throughput: 0: 915.3. Samples: 187170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-07-26 15:29:26,273][18919] Avg episode reward: [(0, '21.516')]
+ [2024-07-26 15:29:26,292][19443] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001161_4755456.pth...
+ [2024-07-26 15:29:26,458][19443] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
+ [2024-07-26 15:29:31,267][18919] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3422.4). Total num frames: 4775936. Throughput: 0: 898.5. Samples: 192668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-07-26 15:29:31,273][18919] Avg episode reward: [(0, '20.834')]
+ [2024-07-26 15:29:32,489][19456] Updated weights for policy 0, policy_version 1168 (0.0013)
+ [2024-07-26 15:29:36,267][18919] Fps is (10 sec: 4506.9, 60 sec: 3822.9, 300 sec: 3454.9). Total num frames: 4800512. Throughput: 0: 946.8. Samples: 198996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:29:36,273][18919] Avg episode reward: [(0, '21.189')]
+ [2024-07-26 15:29:41,267][18919] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3433.7). Total num frames: 4812800. Throughput: 0: 930.9. Samples: 201282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:29:41,273][18919] Avg episode reward: [(0, '21.099')]
+ [2024-07-26 15:29:44,675][19456] Updated weights for policy 0, policy_version 1178 (0.0018)
+ [2024-07-26 15:29:46,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3430.4). Total num frames: 4829184. Throughput: 0: 883.3. Samples: 205750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:29:46,270][18919] Avg episode reward: [(0, '23.069')]
+ [2024-07-26 15:29:51,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3460.7). Total num frames: 4853760. Throughput: 0: 922.8. Samples: 212322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:29:51,270][18919] Avg episode reward: [(0, '22.822')]
+ [2024-07-26 15:29:54,081][19456] Updated weights for policy 0, policy_version 1188 (0.0018)
+ [2024-07-26 15:29:56,267][18919] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3457.0). Total num frames: 4870144. Throughput: 0: 946.3. Samples: 215490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:29:56,270][18919] Avg episode reward: [(0, '22.195')]
+ [2024-07-26 15:30:01,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3437.4). Total num frames: 4882432. Throughput: 0: 894.8. Samples: 219622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:30:01,272][18919] Avg episode reward: [(0, '21.692')]
+ [2024-07-26 15:30:06,267][18919] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3450.1). Total num frames: 4902912. Throughput: 0: 896.0. Samples: 225456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:30:06,274][18919] Avg episode reward: [(0, '23.163')]
+ [2024-07-26 15:30:06,449][19456] Updated weights for policy 0, policy_version 1198 (0.0011)
+ [2024-07-26 15:30:11,267][18919] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3477.7). Total num frames: 4927488. Throughput: 0: 923.9. Samples: 228742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:30:11,270][18919] Avg episode reward: [(0, '22.648')]
+ [2024-07-26 15:30:16,268][18919] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3458.8). Total num frames: 4939776. Throughput: 0: 914.1. Samples: 233804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-07-26 15:30:16,272][18919] Avg episode reward: [(0, '22.139')]
+ [2024-07-26 15:30:18,490][19456] Updated weights for policy 0, policy_version 1208 (0.0012)
+ [2024-07-26 15:30:21,267][18919] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3455.5). Total num frames: 4956160. Throughput: 0: 884.3. Samples: 238788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-07-26 15:30:21,270][18919] Avg episode reward: [(0, '23.052')]
+ [2024-07-26 15:30:26,267][18919] Fps is (10 sec: 4096.1, 60 sec: 3754.8, 300 sec: 3481.6). Total num frames: 4980736. Throughput: 0: 905.3. Samples: 242020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-07-26 15:30:26,272][18919] Avg episode reward: [(0, '24.683')]
+ [2024-07-26 15:30:28,027][19456] Updated weights for policy 0, policy_version 1218 (0.0012)
+ [2024-07-26 15:30:31,269][18919] Fps is (10 sec: 4095.2, 60 sec: 3686.3, 300 sec: 3478.0). Total num frames: 4997120. Throughput: 0: 939.7. Samples: 248038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-07-26 15:30:31,272][18919] Avg episode reward: [(0, '25.353')]
+ [2024-07-26 15:30:31,280][19443] Saving new best policy, reward=25.353!
+ [2024-07-26 15:30:33,754][19443] Stopping Batcher_0...
+ [2024-07-26 15:30:33,755][19443] Loop batcher_evt_loop terminating...
+ [2024-07-26 15:30:33,756][18919] Component Batcher_0 stopped!
+ [2024-07-26 15:30:33,767][19443] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
+ [2024-07-26 15:30:33,867][19456] Weights refcount: 2 0
+ [2024-07-26 15:30:33,879][18919] Component InferenceWorker_p0-w0 stopped!
+ [2024-07-26 15:30:33,883][19456] Stopping InferenceWorker_p0-w0...
+ [2024-07-26 15:30:33,884][19456] Loop inference_proc0-0_evt_loop terminating...
+ [2024-07-26 15:30:33,904][18919] Component RolloutWorker_w2 stopped!
+ [2024-07-26 15:30:33,906][19459] Stopping RolloutWorker_w2...
+ [2024-07-26 15:30:33,919][18919] Component RolloutWorker_w4 stopped!
+ [2024-07-26 15:30:33,907][19459] Loop rollout_proc2_evt_loop terminating...
+ [2024-07-26 15:30:33,922][19461] Stopping RolloutWorker_w4...
+ [2024-07-26 15:30:33,938][18919] Component RolloutWorker_w6 stopped!
+ [2024-07-26 15:30:33,943][18919] Component RolloutWorker_w0 stopped!
+ [2024-07-26 15:30:33,946][19457] Stopping RolloutWorker_w0...
+ [2024-07-26 15:30:33,925][19461] Loop rollout_proc4_evt_loop terminating...
+ [2024-07-26 15:30:33,942][19463] Stopping RolloutWorker_w6...
+ [2024-07-26 15:30:33,951][19463] Loop rollout_proc6_evt_loop terminating...
+ [2024-07-26 15:30:33,947][19457] Loop rollout_proc0_evt_loop terminating...
+ [2024-07-26 15:30:33,989][19460] Stopping RolloutWorker_w3...
+ [2024-07-26 15:30:33,994][19458] Stopping RolloutWorker_w1...
+ [2024-07-26 15:30:33,993][18919] Component RolloutWorker_w3 stopped!
+ [2024-07-26 15:30:34,002][19458] Loop rollout_proc1_evt_loop terminating...
+ [2024-07-26 15:30:34,002][19460] Loop rollout_proc3_evt_loop terminating...
+ [2024-07-26 15:30:33,997][18919] Component RolloutWorker_w1 stopped!
+ [2024-07-26 15:30:34,027][19462] Stopping RolloutWorker_w5...
+ [2024-07-26 15:30:34,028][19462] Loop rollout_proc5_evt_loop terminating...
+ [2024-07-26 15:30:34,028][18919] Component RolloutWorker_w5 stopped!
+ [2024-07-26 15:30:34,032][19464] Stopping RolloutWorker_w7...
+ [2024-07-26 15:30:34,034][19464] Loop rollout_proc7_evt_loop terminating...
+ [2024-07-26 15:30:34,034][18919] Component RolloutWorker_w7 stopped!
+ [2024-07-26 15:30:34,068][19443] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001055_4321280.pth
+ [2024-07-26 15:30:34,120][19443] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
+ [2024-07-26 15:30:34,519][19443] Stopping LearnerWorker_p0...
+ [2024-07-26 15:30:34,520][19443] Loop learner_proc0_evt_loop terminating...
+ [2024-07-26 15:30:34,529][18919] Component LearnerWorker_p0 stopped!
+ [2024-07-26 15:30:34,535][18919] Waiting for process learner_proc0 to stop...
+ [2024-07-26 15:30:36,249][18919] Waiting for process inference_proc0-0 to join...
+ [2024-07-26 15:30:36,257][18919] Waiting for process rollout_proc0 to join...
+ [2024-07-26 15:30:37,527][18919] Waiting for process rollout_proc1 to join...
+ [2024-07-26 15:30:37,531][18919] Waiting for process rollout_proc2 to join...
+ [2024-07-26 15:30:37,536][18919] Waiting for process rollout_proc3 to join...
+ [2024-07-26 15:30:37,538][18919] Waiting for process rollout_proc4 to join...
+ [2024-07-26 15:30:37,543][18919] Waiting for process rollout_proc5 to join...
+ [2024-07-26 15:30:37,546][18919] Waiting for process rollout_proc6 to join...
+ [2024-07-26 15:30:37,550][18919] Waiting for process rollout_proc7 to join...
+ [2024-07-26 15:30:37,554][18919] Batcher 0 profile tree view:
+ batching: 6.4650, releasing_batches: 0.0073
+ [2024-07-26 15:30:37,555][18919] InferenceWorker_p0-w0 profile tree view:
+ wait_policy: 0.0022
+ wait_policy_total: 133.9347
+ update_model: 2.5475
+ weight_update: 0.0012
+ one_step: 0.0023
+ handle_policy_step: 141.1796
+ deserialize: 3.7034, stack: 0.7731, obs_to_device_normalize: 30.0941, forward: 70.7096, send_messages: 7.0674
+ prepare_outputs: 21.5497
+ to_cpu: 13.4657
+ [2024-07-26 15:30:37,558][18919] Learner 0 profile tree view:
+ misc: 0.0014, prepare_batch: 7.9588
+ train: 20.3950
+ epoch_init: 0.0016, minibatch_init: 0.0016, losses_postprocess: 0.1678, kl_divergence: 0.1982, after_optimizer: 0.9136
+ calculate_losses: 6.2420
+ losses_init: 0.0009, forward_head: 0.6769, bptt_initial: 3.8406, tail: 0.2414, advantages_returns: 0.0625, losses: 0.7417
+ bptt: 0.6049
+ bptt_forward_core: 0.5629
+ update: 12.7197
+ clip: 0.3748
+ [2024-07-26 15:30:37,560][18919] RolloutWorker_w0 profile tree view:
+ wait_for_trajectories: 0.0777, enqueue_policy_requests: 33.8361, env_step: 213.0039, overhead: 3.9676, complete_rollouts: 1.6830
+ save_policy_outputs: 6.3749
+ split_output_tensors: 2.1586
+ [2024-07-26 15:30:37,561][18919] RolloutWorker_w7 profile tree view:
+ wait_for_trajectories: 0.0474, enqueue_policy_requests: 34.2331, env_step: 216.1595, overhead: 4.0057, complete_rollouts: 1.9941
+ save_policy_outputs: 6.2127
+ split_output_tensors: 2.2198
+ [2024-07-26 15:30:37,563][18919] Loop Runner_EvtLoop terminating...
+ [2024-07-26 15:30:37,567][18919] Runner profile tree view:
+ main_loop: 308.0798
+ [2024-07-26 15:30:37,568][18919] Collected {0: 5005312}, FPS: 3244.0
+ [2024-07-26 15:31:19,219][18919] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+ [2024-07-26 15:31:19,221][18919] Overriding arg 'num_workers' with value 1 passed from command line
+ [2024-07-26 15:31:19,224][18919] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2024-07-26 15:31:19,226][18919] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2024-07-26 15:31:19,227][18919] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2024-07-26 15:31:19,230][18919] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2024-07-26 15:31:19,232][18919] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+ [2024-07-26 15:31:19,233][18919] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2024-07-26 15:31:19,234][18919] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+ [2024-07-26 15:31:19,236][18919] Adding new argument 'hf_repository'='thomaspalomares/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+ [2024-07-26 15:31:19,238][18919] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2024-07-26 15:31:19,239][18919] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2024-07-26 15:31:19,241][18919] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2024-07-26 15:31:19,243][18919] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2024-07-26 15:31:19,244][18919] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2024-07-26 15:31:19,262][18919] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-07-26 15:31:19,264][18919] RunningMeanStd input shape: (3, 72, 128)
+ [2024-07-26 15:31:19,266][18919] RunningMeanStd input shape: (1,)
+ [2024-07-26 15:31:19,281][18919] ConvEncoder: input_channels=3
+ [2024-07-26 15:31:19,406][18919] Conv encoder output size: 512
1799
+ [2024-07-26 15:31:19,408][18919] Policy head output size: 512
1800
+ [2024-07-26 15:31:21,533][18919] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
1801
+ [2024-07-26 15:31:22,685][18919] Num frames 100...
1802
+ [2024-07-26 15:31:22,810][18919] Num frames 200...
1803
+ [2024-07-26 15:31:22,937][18919] Num frames 300...
1804
+ [2024-07-26 15:31:23,067][18919] Num frames 400...
1805
+ [2024-07-26 15:31:23,206][18919] Num frames 500...
1806
+ [2024-07-26 15:31:23,383][18919] Avg episode rewards: #0: 8.960, true rewards: #0: 5.960
1807
+ [2024-07-26 15:31:23,386][18919] Avg episode reward: 8.960, avg true_objective: 5.960
1808
+ [2024-07-26 15:31:23,394][18919] Num frames 600...
1809
+ [2024-07-26 15:31:23,517][18919] Num frames 700...
1810
+ [2024-07-26 15:31:23,650][18919] Num frames 800...
1811
+ [2024-07-26 15:31:23,775][18919] Num frames 900...
1812
+ [2024-07-26 15:31:23,899][18919] Num frames 1000...
1813
+ [2024-07-26 15:31:24,029][18919] Num frames 1100...
1814
+ [2024-07-26 15:31:24,181][18919] Avg episode rewards: #0: 9.390, true rewards: #0: 5.890
1815
+ [2024-07-26 15:31:24,183][18919] Avg episode reward: 9.390, avg true_objective: 5.890
1816
+ [2024-07-26 15:31:24,213][18919] Num frames 1200...
1817
+ [2024-07-26 15:31:24,340][18919] Num frames 1300...
1818
+ [2024-07-26 15:31:24,466][18919] Num frames 1400...
1819
+ [2024-07-26 15:31:24,596][18919] Num frames 1500...
1820
+ [2024-07-26 15:31:24,720][18919] Num frames 1600...
1821
+ [2024-07-26 15:31:24,845][18919] Num frames 1700...
1822
+ [2024-07-26 15:31:24,980][18919] Num frames 1800...
1823
+ [2024-07-26 15:31:25,113][18919] Num frames 1900...
1824
+ [2024-07-26 15:31:25,258][18919] Num frames 2000...
1825
+ [2024-07-26 15:31:25,396][18919] Num frames 2100...
1826
+ [2024-07-26 15:31:25,521][18919] Num frames 2200...
1827
+ [2024-07-26 15:31:25,580][18919] Avg episode rewards: #0: 13.673, true rewards: #0: 7.340
1828
+ [2024-07-26 15:31:25,581][18919] Avg episode reward: 13.673, avg true_objective: 7.340
1829
+ [2024-07-26 15:31:25,705][18919] Num frames 2300...
1830
+ [2024-07-26 15:31:25,828][18919] Num frames 2400...
1831
+ [2024-07-26 15:31:25,951][18919] Num frames 2500...
1832
+ [2024-07-26 15:31:26,116][18919] Avg episode rewards: #0: 11.215, true rewards: #0: 6.465
1833
+ [2024-07-26 15:31:26,117][18919] Avg episode reward: 11.215, avg true_objective: 6.465
1834
+ [2024-07-26 15:31:26,138][18919] Num frames 2600...
1835
+ [2024-07-26 15:31:26,274][18919] Num frames 2700...
1836
+ [2024-07-26 15:31:26,406][18919] Num frames 2800...
1837
+ [2024-07-26 15:31:26,532][18919] Num frames 2900...
1838
+ [2024-07-26 15:31:26,657][18919] Num frames 3000...
1839
+ [2024-07-26 15:31:26,781][18919] Num frames 3100...
1840
+ [2024-07-26 15:31:26,909][18919] Num frames 3200...
1841
+ [2024-07-26 15:31:27,083][18919] Avg episode rewards: #0: 11.980, true rewards: #0: 6.580
1842
+ [2024-07-26 15:31:27,086][18919] Avg episode reward: 11.980, avg true_objective: 6.580
1843
+ [2024-07-26 15:31:27,101][18919] Num frames 3300...
1844
+ [2024-07-26 15:31:27,234][18919] Num frames 3400...
1845
+ [2024-07-26 15:31:27,366][18919] Num frames 3500...
1846
+ [2024-07-26 15:31:27,495][18919] Num frames 3600...
1847
+ [2024-07-26 15:31:27,622][18919] Num frames 3700...
1848
+ [2024-07-26 15:31:27,747][18919] Num frames 3800...
1849
+ [2024-07-26 15:31:27,871][18919] Num frames 3900...
1850
+ [2024-07-26 15:31:28,006][18919] Num frames 4000...
1851
+ [2024-07-26 15:31:28,138][18919] Num frames 4100...
1852
+ [2024-07-26 15:31:28,274][18919] Num frames 4200...
1853
+ [2024-07-26 15:31:28,402][18919] Num frames 4300...
1854
+ [2024-07-26 15:31:28,528][18919] Num frames 4400...
1855
+ [2024-07-26 15:31:28,657][18919] Num frames 4500...
1856
+ [2024-07-26 15:31:28,779][18919] Num frames 4600...
1857
+ [2024-07-26 15:31:28,904][18919] Num frames 4700...
1858
+ [2024-07-26 15:31:29,037][18919] Num frames 4800...
1859
+ [2024-07-26 15:31:29,168][18919] Num frames 4900...
1860
+ [2024-07-26 15:31:29,350][18919] Avg episode rewards: #0: 17.325, true rewards: #0: 8.325
1861
+ [2024-07-26 15:31:29,352][18919] Avg episode reward: 17.325, avg true_objective: 8.325
1862
+ [2024-07-26 15:31:29,361][18919] Num frames 5000...
1863
+ [2024-07-26 15:31:29,489][18919] Num frames 5100...
1864
+ [2024-07-26 15:31:29,616][18919] Num frames 5200...
1865
+ [2024-07-26 15:31:29,744][18919] Num frames 5300...
1866
+ [2024-07-26 15:31:29,875][18919] Num frames 5400...
1867
+ [2024-07-26 15:31:30,013][18919] Num frames 5500...
1868
+ [2024-07-26 15:31:30,139][18919] Num frames 5600...
1869
+ [2024-07-26 15:31:30,266][18919] Num frames 5700...
1870
+ [2024-07-26 15:31:30,413][18919] Num frames 5800...
1871
+ [2024-07-26 15:31:30,542][18919] Num frames 5900...
1872
+ [2024-07-26 15:31:30,672][18919] Num frames 6000...
1873
+ [2024-07-26 15:31:30,799][18919] Num frames 6100...
1874
+ [2024-07-26 15:31:30,927][18919] Num frames 6200...
1875
+ [2024-07-26 15:31:31,044][18919] Avg episode rewards: #0: 18.204, true rewards: #0: 8.919
1876
+ [2024-07-26 15:31:31,046][18919] Avg episode reward: 18.204, avg true_objective: 8.919
1877
+ [2024-07-26 15:31:31,123][18919] Num frames 6300...
1878
+ [2024-07-26 15:31:31,248][18919] Num frames 6400...
1879
+ [2024-07-26 15:31:31,387][18919] Num frames 6500...
1880
+ [2024-07-26 15:31:31,518][18919] Num frames 6600...
1881
+ [2024-07-26 15:31:31,645][18919] Num frames 6700...
1882
+ [2024-07-26 15:31:31,773][18919] Num frames 6800...
1883
+ [2024-07-26 15:31:31,904][18919] Num frames 6900...
1884
+ [2024-07-26 15:31:31,979][18919] Avg episode rewards: #0: 17.644, true rewards: #0: 8.644
1885
+ [2024-07-26 15:31:31,980][18919] Avg episode reward: 17.644, avg true_objective: 8.644
1886
+ [2024-07-26 15:31:32,090][18919] Num frames 7000...
1887
+ [2024-07-26 15:31:32,223][18919] Num frames 7100...
1888
+ [2024-07-26 15:31:32,354][18919] Num frames 7200...
1889
+ [2024-07-26 15:31:32,543][18919] Num frames 7300...
1890
+ [2024-07-26 15:31:32,726][18919] Num frames 7400...
1891
+ [2024-07-26 15:31:32,900][18919] Num frames 7500...
1892
+ [2024-07-26 15:31:33,079][18919] Num frames 7600...
1893
+ [2024-07-26 15:31:33,260][18919] Num frames 7700...
1894
+ [2024-07-26 15:31:33,460][18919] Num frames 7800...
1895
+ [2024-07-26 15:31:33,641][18919] Num frames 7900...
1896
+ [2024-07-26 15:31:33,832][18919] Num frames 8000...
1897
+ [2024-07-26 15:31:34,017][18919] Num frames 8100...
1898
+ [2024-07-26 15:31:34,205][18919] Num frames 8200...
1899
+ [2024-07-26 15:31:34,374][18919] Avg episode rewards: #0: 19.288, true rewards: #0: 9.177
1900
+ [2024-07-26 15:31:34,376][18919] Avg episode reward: 19.288, avg true_objective: 9.177
1901
+ [2024-07-26 15:31:34,452][18919] Num frames 8300...
1902
+ [2024-07-26 15:31:34,652][18919] Num frames 8400...
1903
+ [2024-07-26 15:31:34,836][18919] Num frames 8500...
1904
+ [2024-07-26 15:31:34,991][18919] Num frames 8600...
1905
+ [2024-07-26 15:31:35,118][18919] Num frames 8700...
1906
+ [2024-07-26 15:31:35,247][18919] Num frames 8800...
1907
+ [2024-07-26 15:31:35,376][18919] Num frames 8900...
1908
+ [2024-07-26 15:31:35,504][18919] Num frames 9000...
1909
+ [2024-07-26 15:31:35,649][18919] Avg episode rewards: #0: 19.060, true rewards: #0: 9.060
1910
+ [2024-07-26 15:31:35,650][18919] Avg episode reward: 19.060, avg true_objective: 9.060
1911
+ [2024-07-26 15:32:28,310][18919] Replay video saved to /content/train_dir/default_experiment/replay.mp4!