[2025-02-21 06:30:28,235][00633] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-21 06:30:28,239][00633] Rollout worker 0 uses device cpu
[2025-02-21 06:30:28,241][00633] Rollout worker 1 uses device cpu
[2025-02-21 06:30:28,242][00633] Rollout worker 2 uses device cpu
[2025-02-21 06:30:28,246][00633] Rollout worker 3 uses device cpu
[2025-02-21 06:30:28,247][00633] Rollout worker 4 uses device cpu
[2025-02-21 06:30:28,248][00633] Rollout worker 5 uses device cpu
[2025-02-21 06:30:28,248][00633] Rollout worker 6 uses device cpu
[2025-02-21 06:30:28,249][00633] Rollout worker 7 uses device cpu
[2025-02-21 06:30:28,436][00633] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:28,438][00633] InferenceWorker_p0-w0: min num requests: 2
[2025-02-21 06:30:28,479][00633] Starting all processes...
[2025-02-21 06:30:28,480][00633] Starting process learner_proc0
[2025-02-21 06:30:28,562][00633] Starting all processes...
[2025-02-21 06:30:28,690][00633] Starting process inference_proc0-0
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc0
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc1
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc2
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc3
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc4
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc5
[2025-02-21 06:30:28,692][00633] Starting process rollout_proc6
[2025-02-21 06:30:28,692][00633] Starting process rollout_proc7
[2025-02-21 06:30:44,797][03235] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:44,797][03235] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-21 06:30:44,873][03235] Num visible devices: 1
[2025-02-21 06:30:44,915][03235] Starting seed is not provided
[2025-02-21 06:30:44,916][03235] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:44,916][03235] Initializing actor-critic model on device cuda:0
[2025-02-21 06:30:44,917][03235] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:30:44,920][03235] RunningMeanStd input shape: (1,)
[2025-02-21 06:30:44,999][03235] ConvEncoder: input_channels=3
[2025-02-21 06:30:45,290][03256] Worker 7 uses CPU cores [1]
[2025-02-21 06:30:45,478][03253] Worker 4 uses CPU cores [0]
[2025-02-21 06:30:45,515][03255] Worker 6 uses CPU cores [0]
[2025-02-21 06:30:45,685][03251] Worker 1 uses CPU cores [1]
[2025-02-21 06:30:45,686][03250] Worker 2 uses CPU cores [0]
[2025-02-21 06:30:45,727][03248] Worker 0 uses CPU cores [0]
[2025-02-21 06:30:45,730][03249] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:45,730][03249] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-21 06:30:45,757][03254] Worker 5 uses CPU cores [1]
[2025-02-21 06:30:45,762][03249] Num visible devices: 1
[2025-02-21 06:30:45,812][03252] Worker 3 uses CPU cores [1]
[2025-02-21 06:30:45,820][03235] Conv encoder output size: 512
[2025-02-21 06:30:45,821][03235] Policy head output size: 512
[2025-02-21 06:30:45,878][03235] Created Actor Critic model with architecture:
[2025-02-21 06:30:45,878][03235] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-02-21 06:30:46,199][03235] Using optimizer
[2025-02-21 06:30:48,426][00633] Heartbeat connected on Batcher_0
[2025-02-21 06:30:48,437][00633] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-21 06:30:48,445][00633] Heartbeat connected on RolloutWorker_w0
[2025-02-21 06:30:48,451][00633] Heartbeat connected on RolloutWorker_w1
[2025-02-21 06:30:48,455][00633] Heartbeat connected on RolloutWorker_w2
[2025-02-21 06:30:48,465][00633] Heartbeat connected on RolloutWorker_w3
[2025-02-21 06:30:48,466][00633] Heartbeat connected on RolloutWorker_w4
[2025-02-21 06:30:48,472][00633] Heartbeat connected on RolloutWorker_w5
[2025-02-21 06:30:48,475][00633] Heartbeat connected on RolloutWorker_w6
[2025-02-21 06:30:48,479][00633] Heartbeat connected on RolloutWorker_w7
[2025-02-21 06:30:51,093][03235] No checkpoints found
[2025-02-21 06:30:51,093][03235] Did not load from checkpoint, starting from scratch!
[2025-02-21 06:30:51,093][03235] Initialized policy 0 weights for model version 0
[2025-02-21 06:30:51,096][03235] LearnerWorker_p0 finished initialization!
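The "Conv encoder output size: 512" entry can be reproduced by walking the convolution arithmetic for the (3, 72, 128) observation. A minimal sketch, assuming the (channels, kernel, stride) triples of Sample Factory's default three-layer VizDoom encoder — (32, 8, 4), (64, 4, 2), (128, 3, 2) — which the log itself does not print, so those numbers are an assumption; the final 512 comes from the Linear layer inside `mlp_layers`, not from the conv head:

```python
def conv2d_out(size, kernel, stride):
    # Conv2d output length with no padding: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

def encoder_flat_size(h, w, layers):
    """Walk (out_channels, kernel, stride) conv layers; return (flat_size, (c, h, w))."""
    channels = None
    for out_channels, kernel, stride in layers:
        h = conv2d_out(h, kernel, stride)
        w = conv2d_out(w, kernel, stride)
        channels = out_channels
    return channels * h * w, (channels, h, w)

# Assumed default encoder: three Conv2d+ELU blocks, matching the printout above.
LAYERS = [(32, 8, 4), (64, 4, 2), (128, 3, 2)]
flat, shape = encoder_flat_size(72, 128, LAYERS)
# Under these assumptions the conv head flattens to 128*3*6 = 2304 features,
# which mlp_layers' Linear then maps to the reported 512.
```

The 512-dim encoder output then feeds the GRU(512, 512) core and the 512-wide policy/value heads shown in the architecture dump.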
[2025-02-21 06:30:51,097][00633] Heartbeat connected on LearnerWorker_p0
[2025-02-21 06:30:51,097][03235] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:51,347][03249] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:30:51,348][03249] RunningMeanStd input shape: (1,)
[2025-02-21 06:30:51,360][03249] ConvEncoder: input_channels=3
[2025-02-21 06:30:51,475][03249] Conv encoder output size: 512
[2025-02-21 06:30:51,475][03249] Policy head output size: 512
[2025-02-21 06:30:51,510][00633] Inference worker 0-0 is ready!
[2025-02-21 06:30:51,511][00633] All inference workers are ready! Signal rollout workers to start!
[2025-02-21 06:30:51,629][00633] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-21 06:30:51,757][03254] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,763][03250] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,824][03248] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,831][03256] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,867][03251] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,887][03255] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,906][03252] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,933][03253] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:53,282][03254] Decorrelating experience for 0 frames...
[2025-02-21 06:30:53,283][03251] Decorrelating experience for 0 frames...
[2025-02-21 06:30:53,282][03255] Decorrelating experience for 0 frames...
[2025-02-21 06:30:53,284][03253] Decorrelating experience for 0 frames...
[2025-02-21 06:30:54,038][03255] Decorrelating experience for 32 frames...
[2025-02-21 06:30:54,041][03253] Decorrelating experience for 32 frames...
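The repeated "RunningMeanStd input shape" entries refer to online normalization of observations (shape (3, 72, 128)) and of returns (shape (1,)). A minimal scalar sketch of the underlying idea, using Welford's online algorithm — this illustrates the technique only, and is not Sample Factory's `RunningMeanStdInPlace` implementation:

```python
class RunningMeanStd:
    """Track the mean/variance of a value stream incrementally (Welford's algorithm)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance of everything seen so far; 1.0 before any data.
        return self._m2 / self.count if self.count else 1.0

    def normalize(self, x):
        # Shift and scale by the statistics accumulated so far.
        return (x - self.mean) / (self.var ** 0.5 + self.eps)

rms = RunningMeanStd()
for obs in [1.0, 2.0, 3.0, 4.0]:
    rms.update(obs)
```

In the real system the same update runs elementwise over image tensors, so early observations are normalized with crude statistics that sharpen as more frames arrive.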
[2025-02-21 06:30:54,055][03254] Decorrelating experience for 32 frames...
[2025-02-21 06:30:54,057][03251] Decorrelating experience for 32 frames...
[2025-02-21 06:30:55,051][03253] Decorrelating experience for 64 frames...
[2025-02-21 06:30:55,056][03255] Decorrelating experience for 64 frames...
[2025-02-21 06:30:55,174][03254] Decorrelating experience for 64 frames...
[2025-02-21 06:30:55,177][03251] Decorrelating experience for 64 frames...
[2025-02-21 06:30:55,915][03253] Decorrelating experience for 96 frames...
[2025-02-21 06:30:55,916][03255] Decorrelating experience for 96 frames...
[2025-02-21 06:30:56,011][03254] Decorrelating experience for 96 frames...
[2025-02-21 06:30:56,009][03251] Decorrelating experience for 96 frames...
[2025-02-21 06:30:56,629][00633] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-21 06:30:59,561][03235] Signal inference workers to stop experience collection...
[2025-02-21 06:30:59,576][03249] InferenceWorker_p0-w0: stopping experience collection
[2025-02-21 06:31:01,626][03235] Signal inference workers to resume experience collection...
[2025-02-21 06:31:01,627][03249] InferenceWorker_p0-w0: resuming experience collection
[2025-02-21 06:31:01,630][00633] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 216.6. Samples: 2166. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-21 06:31:01,631][00633] Avg episode reward: [(0, '2.959')]
[2025-02-21 06:31:06,629][00633] Fps is (10 sec: 2457.6, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 24576. Throughput: 0: 454.7. Samples: 6820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:06,633][00633] Avg episode reward: [(0, '3.932')]
[2025-02-21 06:31:09,833][03249] Updated weights for policy 0, policy_version 10 (0.0015)
[2025-02-21 06:31:11,629][00633] Fps is (10 sec: 4505.8, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 45056. Throughput: 0: 504.4. Samples: 10088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:11,631][00633] Avg episode reward: [(0, '4.303')]
[2025-02-21 06:31:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 617.4. Samples: 15436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:16,631][00633] Avg episode reward: [(0, '4.309')]
[2025-02-21 06:31:20,766][03249] Updated weights for policy 0, policy_version 20 (0.0015)
[2025-02-21 06:31:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 81920. Throughput: 0: 717.7. Samples: 21530. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:31:21,631][00633] Avg episode reward: [(0, '4.463')]
[2025-02-21 06:31:26,629][00633] Fps is (10 sec: 4505.5, 60 sec: 3042.7, 300 sec: 3042.7). Total num frames: 106496. Throughput: 0: 705.0. Samples: 24674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:31:26,631][00633] Avg episode reward: [(0, '4.528')]
[2025-02-21 06:31:26,632][03235] Saving new best policy, reward=4.528!
[2025-02-21 06:31:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 2969.6, 300 sec: 2969.6). Total num frames: 118784. Throughput: 0: 739.9. Samples: 29596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:31,631][00633] Avg episode reward: [(0, '4.334')]
[2025-02-21 06:31:32,010][03249] Updated weights for policy 0, policy_version 30 (0.0015)
[2025-02-21 06:31:36,629][00633] Fps is (10 sec: 3276.8, 60 sec: 3094.8, 300 sec: 3094.8). Total num frames: 139264. Throughput: 0: 800.2. Samples: 36008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:31:36,631][00633] Avg episode reward: [(0, '4.307')]
[2025-02-21 06:31:41,631][00633] Fps is (10 sec: 4095.5, 60 sec: 3194.8, 300 sec: 3194.8). Total num frames: 159744. Throughput: 0: 871.4. Samples: 39212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:41,632][00633] Avg episode reward: [(0, '4.462')]
[2025-02-21 06:31:41,827][03249] Updated weights for policy 0, policy_version 40 (0.0013)
[2025-02-21 06:31:46,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 180224. Throughput: 0: 934.2. Samples: 44206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:46,631][00633] Avg episode reward: [(0, '4.390')]
[2025-02-21 06:31:51,629][00633] Fps is (10 sec: 4096.6, 60 sec: 3345.1, 300 sec: 3345.1). Total num frames: 200704. Throughput: 0: 972.6. Samples: 50586. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:51,633][00633] Avg episode reward: [(0, '4.657')]
[2025-02-21 06:31:51,639][03235] Saving new best policy, reward=4.657!
[2025-02-21 06:31:52,400][03249] Updated weights for policy 0, policy_version 50 (0.0017)
[2025-02-21 06:31:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 217088. Throughput: 0: 966.8. Samples: 53594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:56,631][00633] Avg episode reward: [(0, '4.713')]
[2025-02-21 06:31:56,632][03235] Saving new best policy, reward=4.713!
[2025-02-21 06:32:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3393.8). Total num frames: 237568. Throughput: 0: 960.6. Samples: 58664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:01,836][00633] Avg episode reward: [(0, '4.500')]
[2025-02-21 06:32:03,474][03249] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-02-21 06:32:06,631][00633] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 966.4. Samples: 65020. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:06,632][00633] Avg episode reward: [(0, '4.527')]
[2025-02-21 06:32:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 953.5. Samples: 67580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:11,631][00633] Avg episode reward: [(0, '4.593')]
[2025-02-21 06:32:14,545][03249] Updated weights for policy 0, policy_version 70 (0.0016)
[2025-02-21 06:32:16,629][00633] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3469.6). Total num frames: 294912. Throughput: 0: 965.7. Samples: 73052. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:16,631][00633] Avg episode reward: [(0, '4.637')]
[2025-02-21 06:32:21,631][00633] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3504.3). Total num frames: 315392. Throughput: 0: 966.1. Samples: 79486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:21,634][00633] Avg episode reward: [(0, '4.670')]
[2025-02-21 06:32:21,642][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth...
[2025-02-21 06:32:25,626][03249] Updated weights for policy 0, policy_version 80 (0.0012)
[2025-02-21 06:32:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3492.4). Total num frames: 331776. Throughput: 0: 937.8. Samples: 81414. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:26,634][00633] Avg episode reward: [(0, '4.638')]
[2025-02-21 06:32:31,629][00633] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3522.6). Total num frames: 352256. Throughput: 0: 962.6. Samples: 87524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:31,633][00633] Avg episode reward: [(0, '4.596')]
[2025-02-21 06:32:35,108][03249] Updated weights for policy 0, policy_version 90 (0.0013)
[2025-02-21 06:32:36,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3549.9). Total num frames: 372736. Throughput: 0: 952.3. Samples: 93440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:36,637][00633] Avg episode reward: [(0, '4.415')]
[2025-02-21 06:32:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3537.5). Total num frames: 389120. Throughput: 0: 929.2. Samples: 95408. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:41,640][00633] Avg episode reward: [(0, '4.571')]
[2025-02-21 06:32:46,301][03249] Updated weights for policy 0, policy_version 100 (0.0014)
[2025-02-21 06:32:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3561.7). Total num frames: 409600. Throughput: 0: 959.8. Samples: 101856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:46,633][00633] Avg episode reward: [(0, '4.550')]
[2025-02-21 06:32:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 941.3. Samples: 107378. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:51,633][00633] Avg episode reward: [(0, '4.396')]
[2025-02-21 06:32:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3571.7). Total num frames: 446464. Throughput: 0: 943.8. Samples: 110050. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:56,633][00633] Avg episode reward: [(0, '4.550')]
[2025-02-21 06:32:57,216][03249] Updated weights for policy 0, policy_version 110 (0.0015)
[2025-02-21 06:33:01,636][00633] Fps is (10 sec: 4093.4, 60 sec: 3822.5, 300 sec: 3591.7). Total num frames: 466944. Throughput: 0: 967.4. Samples: 116590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:01,641][00633] Avg episode reward: [(0, '4.528')]
[2025-02-21 06:33:06,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 932.5. Samples: 121446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:06,631][00633] Avg episode reward: [(0, '4.442')]
[2025-02-21 06:33:08,018][03249] Updated weights for policy 0, policy_version 120 (0.0019)
[2025-02-21 06:33:11,630][00633] Fps is (10 sec: 3688.6, 60 sec: 3822.9, 300 sec: 3598.6). Total num frames: 503808. Throughput: 0: 962.3. Samples: 124716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:11,634][00633] Avg episode reward: [(0, '4.526')]
[2025-02-21 06:33:16,630][00633] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3644.0). Total num frames: 528384. Throughput: 0: 970.5. Samples: 131198. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:16,631][00633] Avg episode reward: [(0, '4.706')]
[2025-02-21 06:33:18,303][03249] Updated weights for policy 0, policy_version 130 (0.0022)
[2025-02-21 06:33:21,632][00633] Fps is (10 sec: 3685.6, 60 sec: 3754.6, 300 sec: 3604.4). Total num frames: 540672. Throughput: 0: 950.3. Samples: 136204. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:21,633][00633] Avg episode reward: [(0, '4.624')]
[2025-02-21 06:33:26,629][00633] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 979.2. Samples: 139472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:26,634][00633] Avg episode reward: [(0, '4.479')]
[2025-02-21 06:33:28,354][03249] Updated weights for policy 0, policy_version 140 (0.0012)
[2025-02-21 06:33:31,630][00633] Fps is (10 sec: 4096.8, 60 sec: 3822.9, 300 sec: 3635.2). Total num frames: 581632. Throughput: 0: 976.9. Samples: 145818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:31,631][00633] Avg episode reward: [(0, '4.680')]
[2025-02-21 06:33:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3649.2). Total num frames: 602112. Throughput: 0: 966.0. Samples: 150850. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:33:36,631][00633] Avg episode reward: [(0, '4.757')]
[2025-02-21 06:33:36,635][03235] Saving new best policy, reward=4.757!
[2025-02-21 06:33:39,238][03249] Updated weights for policy 0, policy_version 150 (0.0017)
[2025-02-21 06:33:41,629][00633] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3662.3). Total num frames: 622592. Throughput: 0: 978.4. Samples: 154080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:41,631][00633] Avg episode reward: [(0, '4.622')]
[2025-02-21 06:33:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3651.3). Total num frames: 638976. Throughput: 0: 963.0. Samples: 159920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:46,631][00633] Avg episode reward: [(0, '4.564')]
[2025-02-21 06:33:50,145][03249] Updated weights for policy 0, policy_version 160 (0.0015)
[2025-02-21 06:33:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3663.6). Total num frames: 659456. Throughput: 0: 981.7. Samples: 165622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:51,633][00633] Avg episode reward: [(0, '4.651')]
[2025-02-21 06:33:56,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3675.3). Total num frames: 679936. Throughput: 0: 982.9. Samples: 168944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:56,631][00633] Avg episode reward: [(0, '4.587')]
[2025-02-21 06:34:00,856][03249] Updated weights for policy 0, policy_version 170 (0.0017)
[2025-02-21 06:34:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3664.8). Total num frames: 696320. Throughput: 0: 955.4. Samples: 174190. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:01,631][00633] Avg episode reward: [(0, '4.596')]
[2025-02-21 06:34:06,631][00633] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3696.9). Total num frames: 720896. Throughput: 0: 980.9. Samples: 180344. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:34:06,632][00633] Avg episode reward: [(0, '4.581')]
[2025-02-21 06:34:10,468][03249] Updated weights for policy 0, policy_version 180 (0.0014)
[2025-02-21 06:34:11,636][00633] Fps is (10 sec: 4502.7, 60 sec: 3959.1, 300 sec: 3706.8). Total num frames: 741376. Throughput: 0: 980.0. Samples: 183580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:11,637][00633] Avg episode reward: [(0, '4.449')]
[2025-02-21 06:34:16,629][00633] Fps is (10 sec: 3686.9, 60 sec: 3823.0, 300 sec: 3696.4). Total num frames: 757760. Throughput: 0: 950.9. Samples: 188606. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:16,631][00633] Avg episode reward: [(0, '4.332')]
[2025-02-21 06:34:21,277][03249] Updated weights for policy 0, policy_version 190 (0.0013)
[2025-02-21 06:34:21,629][00633] Fps is (10 sec: 3688.7, 60 sec: 3959.6, 300 sec: 3705.9). Total num frames: 778240. Throughput: 0: 983.4. Samples: 195102. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:21,632][00633] Avg episode reward: [(0, '4.553')]
[2025-02-21 06:34:21,640][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_778240.pth...
[2025-02-21 06:34:26,631][00633] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3715.0). Total num frames: 798720. Throughput: 0: 983.4. Samples: 198334. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:34:26,632][00633] Avg episode reward: [(0, '4.699')]
[2025-02-21 06:34:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3705.0). Total num frames: 815104. Throughput: 0: 963.6. Samples: 203282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:31,634][00633] Avg episode reward: [(0, '4.610')]
[2025-02-21 06:34:32,245][03249] Updated weights for policy 0, policy_version 200 (0.0015)
[2025-02-21 06:34:36,630][00633] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3713.7). Total num frames: 835584. Throughput: 0: 979.3. Samples: 209692. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:36,631][00633] Avg episode reward: [(0, '4.508')]
[2025-02-21 06:34:41,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3704.2). Total num frames: 851968. Throughput: 0: 970.2. Samples: 212604. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:41,631][00633] Avg episode reward: [(0, '4.483')]
[2025-02-21 06:34:43,088][03249] Updated weights for policy 0, policy_version 210 (0.0013)
[2025-02-21 06:34:46,629][00633] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3712.5). Total num frames: 872448. Throughput: 0: 974.0. Samples: 218018. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:34:46,631][00633] Avg episode reward: [(0, '4.674')]
[2025-02-21 06:34:51,629][00633] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3737.6). Total num frames: 897024. Throughput: 0: 980.9. Samples: 224484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:34:51,632][00633] Avg episode reward: [(0, '4.643')]
[2025-02-21 06:34:52,605][03249] Updated weights for policy 0, policy_version 220 (0.0013)
[2025-02-21 06:34:56,630][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3711.5). Total num frames: 909312. Throughput: 0: 961.4. Samples: 226838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:56,633][00633] Avg episode reward: [(0, '4.506')]
[2025-02-21 06:35:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3735.6). Total num frames: 933888. Throughput: 0: 980.0. Samples: 232708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:01,633][00633] Avg episode reward: [(0, '4.380')]
[2025-02-21 06:35:03,501][03249] Updated weights for policy 0, policy_version 230 (0.0012)
[2025-02-21 06:35:06,630][00633] Fps is (10 sec: 4505.5, 60 sec: 3891.3, 300 sec: 3742.6). Total num frames: 954368. Throughput: 0: 974.3. Samples: 238944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:06,636][00633] Avg episode reward: [(0, '4.666')]
[2025-02-21 06:35:11,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3823.3, 300 sec: 3733.7). Total num frames: 970752. Throughput: 0: 946.9. Samples: 240942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:11,633][00633] Avg episode reward: [(0, '4.768')]
[2025-02-21 06:35:11,643][03235] Saving new best policy, reward=4.768!
[2025-02-21 06:35:14,509][03249] Updated weights for policy 0, policy_version 240 (0.0018)
[2025-02-21 06:35:16,630][00633] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3740.5). Total num frames: 991232. Throughput: 0: 976.5. Samples: 247224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:16,634][00633] Avg episode reward: [(0, '4.780')]
[2025-02-21 06:35:16,638][03235] Saving new best policy, reward=4.780!
[2025-02-21 06:35:21,629][00633] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3731.9). Total num frames: 1007616. Throughput: 0: 958.8. Samples: 252838. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:21,633][00633] Avg episode reward: [(0, '4.742')]
[2025-02-21 06:35:25,411][03249] Updated weights for policy 0, policy_version 250 (0.0012)
[2025-02-21 06:35:26,630][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3738.5). Total num frames: 1028096. Throughput: 0: 947.8. Samples: 255256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:26,631][00633] Avg episode reward: [(0, '4.606')]
[2025-02-21 06:35:31,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3744.9). Total num frames: 1048576. Throughput: 0: 971.9. Samples: 261752. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:31,630][00633] Avg episode reward: [(0, '4.508')]
[2025-02-21 06:35:35,941][03249] Updated weights for policy 0, policy_version 260 (0.0013)
[2025-02-21 06:35:36,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3736.7). Total num frames: 1064960. Throughput: 0: 941.1. Samples: 266834. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:36,631][00633] Avg episode reward: [(0, '4.514')]
[2025-02-21 06:35:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3742.9). Total num frames: 1085440. Throughput: 0: 954.5. Samples: 269792. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:41,631][00633] Avg episode reward: [(0, '4.657')]
[2025-02-21 06:35:45,989][03249] Updated weights for policy 0, policy_version 270 (0.0017)
[2025-02-21 06:35:46,629][00633] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 966.2. Samples: 276186. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:46,633][00633] Avg episode reward: [(0, '4.642')]
[2025-02-21 06:35:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 931.9. Samples: 280880. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:51,634][00633] Avg episode reward: [(0, '4.681')]
[2025-02-21 06:35:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 957.7. Samples: 284038. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:56,634][00633] Avg episode reward: [(0, '4.824')]
[2025-02-21 06:35:56,638][03235] Saving new best policy, reward=4.824!
[2025-02-21 06:35:57,360][03249] Updated weights for policy 0, policy_version 280 (0.0014)
[2025-02-21 06:36:01,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1163264. Throughput: 0: 957.5. Samples: 290310. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:01,631][00633] Avg episode reward: [(0, '4.679')]
[2025-02-21 06:36:06,629][00633] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1175552. Throughput: 0: 939.9. Samples: 295132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:06,633][00633] Avg episode reward: [(0, '4.552')]
[2025-02-21 06:36:08,526][03249] Updated weights for policy 0, policy_version 290 (0.0018)
[2025-02-21 06:36:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1200128. Throughput: 0: 956.8. Samples: 298310. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:36:11,636][00633] Avg episode reward: [(0, '4.470')]
[2025-02-21 06:36:16,631][00633] Fps is (10 sec: 4095.4, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 1216512. Throughput: 0: 948.8. Samples: 304448. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:36:16,632][00633] Avg episode reward: [(0, '4.591')]
[2025-02-21 06:36:19,391][03249] Updated weights for policy 0, policy_version 300 (0.0025)
[2025-02-21 06:36:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1236992. Throughput: 0: 953.8. Samples: 309754. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:21,631][00633] Avg episode reward: [(0, '4.792')]
[2025-02-21 06:36:21,639][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth...
[2025-02-21 06:36:21,736][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth
[2025-02-21 06:36:26,629][00633] Fps is (10 sec: 4096.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1257472. Throughput: 0: 959.1. Samples: 312952. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:36:26,635][00633] Avg episode reward: [(0, '4.834')]
[2025-02-21 06:36:26,639][03235] Saving new best policy, reward=4.834!
[2025-02-21 06:36:29,238][03249] Updated weights for policy 0, policy_version 310 (0.0013)
[2025-02-21 06:36:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1273856. Throughput: 0: 941.0. Samples: 318530. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:31,632][00633] Avg episode reward: [(0, '4.815')]
[2025-02-21 06:36:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 1294336. Throughput: 0: 964.0. Samples: 324260. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:36,634][00633] Avg episode reward: [(0, '4.625')]
[2025-02-21 06:36:39,964][03249] Updated weights for policy 0, policy_version 320 (0.0017)
[2025-02-21 06:36:41,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1314816. Throughput: 0: 966.2. Samples: 327518. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:41,631][00633] Avg episode reward: [(0, '4.799')]
[2025-02-21 06:36:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1331200. Throughput: 0: 938.9. Samples: 332560. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:36:46,634][00633] Avg episode reward: [(0, '4.976')]
[2025-02-21 06:36:46,638][03235] Saving new best policy, reward=4.976!
[2025-02-21 06:36:50,899][03249] Updated weights for policy 0, policy_version 330 (0.0016)
[2025-02-21 06:36:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1351680. Throughput: 0: 971.2. Samples: 338834. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:51,631][00633] Avg episode reward: [(0, '4.940')]
[2025-02-21 06:36:56,631][00633] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3846.1). Total num frames: 1372160. Throughput: 0: 971.5. Samples: 342030. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:56,633][00633] Avg episode reward: [(0, '4.915')]
[2025-02-21 06:37:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1388544. Throughput: 0: 943.3. Samples: 346896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:01,631][00633] Avg episode reward: [(0, '5.042')]
[2025-02-21 06:37:01,639][03235] Saving new best policy, reward=5.042!
[2025-02-21 06:37:02,016][03249] Updated weights for policy 0, policy_version 340 (0.0019)
[2025-02-21 06:37:06,629][00633] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1409024. Throughput: 0: 965.8. Samples: 353214. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:06,634][00633] Avg episode reward: [(0, '4.974')]
[2025-02-21 06:37:11,633][00633] Fps is (10 sec: 4094.4, 60 sec: 3822.7, 300 sec: 3846.0). Total num frames: 1429504. Throughput: 0: 966.4. Samples: 356444. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:11,635][00633] Avg episode reward: [(0, '4.855')]
[2025-02-21 06:37:13,002][03249] Updated weights for policy 0, policy_version 350 (0.0013)
[2025-02-21 06:37:16,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 1445888. Throughput: 0: 952.0. Samples: 361372. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:16,631][00633] Avg episode reward: [(0, '4.591')]
[2025-02-21 06:37:21,631][00633] Fps is (10 sec: 4096.8, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 1470464. Throughput: 0: 968.2. Samples: 367830. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:21,637][00633] Avg episode reward: [(0, '4.399')]
[2025-02-21 06:37:22,553][03249] Updated weights for policy 0, policy_version 360 (0.0013)
[2025-02-21 06:37:26,631][00633] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3832.2). Total num frames: 1482752. Throughput: 0: 954.5. Samples: 370474. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:26,636][00633] Avg episode reward: [(0, '4.529')]
[2025-02-21 06:37:31,629][00633] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1507328. Throughput: 0: 965.6. Samples: 376014. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:31,631][00633] Avg episode reward: [(0, '4.901')]
[2025-02-21 06:37:33,460][03249] Updated weights for policy 0, policy_version 370 (0.0013)
[2025-02-21 06:37:36,629][00633] Fps is (10 sec: 4506.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1527808. Throughput: 0: 967.8. Samples: 382386. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:36,633][00633] Avg episode reward: [(0, '4.919')]
[2025-02-21 06:37:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1544192. Throughput: 0: 943.4. Samples: 384482. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:41,631][00633] Avg episode reward: [(0, '4.980')]
[2025-02-21 06:37:44,420][03249] Updated weights for policy 0, policy_version 380 (0.0018)
[2025-02-21 06:37:46,632][00633] Fps is (10 sec: 3685.5, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 1564672. Throughput: 0: 970.9. Samples: 390590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:46,633][00633] Avg episode reward: [(0, '4.886')]
[2025-02-21 06:37:51,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1585152. Throughput: 0: 965.6. Samples: 396664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:51,631][00633] Avg episode reward: [(0, '4.887')]
[2025-02-21 06:37:55,262][03249] Updated weights for policy 0, policy_version 390 (0.0016)
[2025-02-21 06:37:56,630][00633] Fps is (10 sec: 3687.2, 60 sec: 3823.0, 300 sec: 3846.2). Total num frames: 1601536. Throughput: 0: 940.9. Samples: 398782. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:56,634][00633] Avg episode reward: [(0, '4.980')]
[2025-02-21 06:38:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1622016. Throughput: 0: 975.8. Samples: 405282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:38:01,634][00633] Avg episode reward: [(0, '5.000')]
[2025-02-21 06:38:05,249][03249] Updated weights for policy 0, policy_version 400 (0.0014)
[2025-02-21 06:38:06,631][00633] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1638400. Throughput: 0: 953.9. Samples: 410756.
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:38:06,638][00633] Avg episode reward: [(0, '5.036')] [2025-02-21 06:38:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 1658880. Throughput: 0: 955.2. Samples: 413458. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:38:11,633][00633] Avg episode reward: [(0, '4.884')] [2025-02-21 06:38:15,821][03249] Updated weights for policy 0, policy_version 410 (0.0013) [2025-02-21 06:38:16,629][00633] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1679360. Throughput: 0: 976.0. Samples: 419934. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:38:16,634][00633] Avg episode reward: [(0, '5.009')] [2025-02-21 06:38:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 1695744. Throughput: 0: 945.9. Samples: 424950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:38:21,634][00633] Avg episode reward: [(0, '5.097')] [2025-02-21 06:38:21,641][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth... [2025-02-21 06:38:21,736][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_778240.pth [2025-02-21 06:38:21,748][03235] Saving new best policy, reward=5.097! [2025-02-21 06:38:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 1716224. Throughput: 0: 967.4. Samples: 428014. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:38:26,633][00633] Avg episode reward: [(0, '5.292')] [2025-02-21 06:38:26,637][03235] Saving new best policy, reward=5.292! [2025-02-21 06:38:26,641][03249] Updated weights for policy 0, policy_version 420 (0.0018) [2025-02-21 06:38:31,631][00633] Fps is (10 sec: 4504.9, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 1740800. Throughput: 0: 973.9. Samples: 434416. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:38:31,635][00633] Avg episode reward: [(0, '5.204')] [2025-02-21 06:38:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1753088. Throughput: 0: 947.8. Samples: 439316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:38:36,633][00633] Avg episode reward: [(0, '5.187')] [2025-02-21 06:38:37,793][03249] Updated weights for policy 0, policy_version 430 (0.0024) [2025-02-21 06:38:41,629][00633] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1777664. Throughput: 0: 972.7. Samples: 442554. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-21 06:38:41,634][00633] Avg episode reward: [(0, '5.792')] [2025-02-21 06:38:41,641][03235] Saving new best policy, reward=5.792! [2025-02-21 06:38:46,632][00633] Fps is (10 sec: 4504.6, 60 sec: 3891.2, 300 sec: 3859.9). Total num frames: 1798144. Throughput: 0: 974.1. Samples: 449120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:38:46,634][00633] Avg episode reward: [(0, '5.581')] [2025-02-21 06:38:47,832][03249] Updated weights for policy 0, policy_version 440 (0.0016) [2025-02-21 06:38:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1814528. Throughput: 0: 964.2. Samples: 454142. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-21 06:38:51,634][00633] Avg episode reward: [(0, '5.581')] [2025-02-21 06:38:56,629][00633] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1835008. Throughput: 0: 978.1. Samples: 457472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:38:56,631][00633] Avg episode reward: [(0, '5.532')] [2025-02-21 06:38:57,702][03249] Updated weights for policy 0, policy_version 450 (0.0015) [2025-02-21 06:39:01,630][00633] Fps is (10 sec: 4095.7, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 1855488. Throughput: 0: 967.4. Samples: 463466. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:39:01,634][00633] Avg episode reward: [(0, '5.641')] [2025-02-21 06:39:06,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3832.3). Total num frames: 1871872. Throughput: 0: 978.4. Samples: 468976. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:39:06,631][00633] Avg episode reward: [(0, '6.045')] [2025-02-21 06:39:06,702][03235] Saving new best policy, reward=6.045! [2025-02-21 06:39:08,610][03249] Updated weights for policy 0, policy_version 460 (0.0013) [2025-02-21 06:39:11,629][00633] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1896448. Throughput: 0: 982.3. Samples: 472218. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-21 06:39:11,634][00633] Avg episode reward: [(0, '6.128')] [2025-02-21 06:39:11,644][03235] Saving new best policy, reward=6.128! [2025-02-21 06:39:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1908736. Throughput: 0: 959.7. Samples: 477600. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:39:16,631][00633] Avg episode reward: [(0, '6.306')] [2025-02-21 06:39:16,649][03235] Saving new best policy, reward=6.306! [2025-02-21 06:39:19,477][03249] Updated weights for policy 0, policy_version 470 (0.0012) [2025-02-21 06:39:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1933312. Throughput: 0: 987.3. Samples: 483744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:39:21,634][00633] Avg episode reward: [(0, '6.648')] [2025-02-21 06:39:21,646][03235] Saving new best policy, reward=6.648! [2025-02-21 06:39:26,632][00633] Fps is (10 sec: 4504.5, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 1953792. Throughput: 0: 986.6. Samples: 486954. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:39:26,635][00633] Avg episode reward: [(0, '7.316')] [2025-02-21 06:39:26,637][03235] Saving new best policy, reward=7.316! [2025-02-21 06:39:30,465][03249] Updated weights for policy 0, policy_version 480 (0.0012) [2025-02-21 06:39:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 1970176. Throughput: 0: 951.6. Samples: 491940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:39:31,634][00633] Avg episode reward: [(0, '7.820')] [2025-02-21 06:39:31,642][03235] Saving new best policy, reward=7.820! [2025-02-21 06:39:36,629][00633] Fps is (10 sec: 3687.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1990656. Throughput: 0: 984.7. Samples: 498454. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:39:36,633][00633] Avg episode reward: [(0, '7.787')] [2025-02-21 06:39:39,783][03249] Updated weights for policy 0, policy_version 490 (0.0015) [2025-02-21 06:39:41,634][00633] Fps is (10 sec: 4094.2, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 2011136. Throughput: 0: 984.2. Samples: 501764. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:39:41,637][00633] Avg episode reward: [(0, '7.556')] [2025-02-21 06:39:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 2027520. Throughput: 0: 962.4. Samples: 506772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:39:46,631][00633] Avg episode reward: [(0, '7.363')] [2025-02-21 06:39:50,532][03249] Updated weights for policy 0, policy_version 500 (0.0015) [2025-02-21 06:39:51,630][00633] Fps is (10 sec: 4097.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2052096. Throughput: 0: 986.9. Samples: 513386. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:39:51,635][00633] Avg episode reward: [(0, '7.740')] [2025-02-21 06:39:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). 
Total num frames: 2064384. Throughput: 0: 972.8. Samples: 515994. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:39:56,636][00633] Avg episode reward: [(0, '8.295')] [2025-02-21 06:39:56,684][03235] Saving new best policy, reward=8.295! [2025-02-21 06:40:01,629][00633] Fps is (10 sec: 3276.9, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 2084864. Throughput: 0: 967.4. Samples: 521134. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:40:01,634][00633] Avg episode reward: [(0, '8.511')] [2025-02-21 06:40:01,643][03235] Saving new best policy, reward=8.511! [2025-02-21 06:40:01,859][03249] Updated weights for policy 0, policy_version 510 (0.0013) [2025-02-21 06:40:06,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2105344. Throughput: 0: 971.2. Samples: 527448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:40:06,634][00633] Avg episode reward: [(0, '8.852')] [2025-02-21 06:40:06,636][03235] Saving new best policy, reward=8.852! [2025-02-21 06:40:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2121728. Throughput: 0: 951.7. Samples: 529778. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:40:11,631][00633] Avg episode reward: [(0, '9.330')] [2025-02-21 06:40:11,642][03235] Saving new best policy, reward=9.330! [2025-02-21 06:40:12,929][03249] Updated weights for policy 0, policy_version 520 (0.0015) [2025-02-21 06:40:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2142208. Throughput: 0: 969.2. Samples: 535554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:40:16,634][00633] Avg episode reward: [(0, '8.879')] [2025-02-21 06:40:21,633][00633] Fps is (10 sec: 4503.9, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 2166784. Throughput: 0: 966.8. Samples: 541962. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:40:21,637][00633] Avg episode reward: [(0, '9.237')] [2025-02-21 06:40:21,654][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000529_2166784.pth... [2025-02-21 06:40:21,770][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth [2025-02-21 06:40:23,330][03249] Updated weights for policy 0, policy_version 530 (0.0016) [2025-02-21 06:40:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 2179072. Throughput: 0: 936.3. Samples: 543894. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:40:26,630][00633] Avg episode reward: [(0, '9.508')] [2025-02-21 06:40:26,686][03235] Saving new best policy, reward=9.508! [2025-02-21 06:40:31,629][00633] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2203648. Throughput: 0: 962.8. Samples: 550098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:40:31,633][00633] Avg episode reward: [(0, '10.574')] [2025-02-21 06:40:31,640][03235] Saving new best policy, reward=10.574! [2025-02-21 06:40:33,509][03249] Updated weights for policy 0, policy_version 540 (0.0013) [2025-02-21 06:40:36,630][00633] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2220032. Throughput: 0: 939.9. Samples: 555682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:40:36,632][00633] Avg episode reward: [(0, '10.869')] [2025-02-21 06:40:36,633][03235] Saving new best policy, reward=10.869! [2025-02-21 06:40:41,629][00633] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3832.2). Total num frames: 2236416. Throughput: 0: 931.0. Samples: 557890. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-21 06:40:41,631][00633] Avg episode reward: [(0, '11.613')] [2025-02-21 06:40:41,638][03235] Saving new best policy, reward=11.613! 
[2025-02-21 06:40:44,852][03249] Updated weights for policy 0, policy_version 550 (0.0018) [2025-02-21 06:40:46,629][00633] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2260992. Throughput: 0: 959.2. Samples: 564298. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:40:46,631][00633] Avg episode reward: [(0, '10.681')] [2025-02-21 06:40:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2273280. Throughput: 0: 935.7. Samples: 569556. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:40:51,636][00633] Avg episode reward: [(0, '11.102')] [2025-02-21 06:40:55,704][03249] Updated weights for policy 0, policy_version 560 (0.0016) [2025-02-21 06:40:56,629][00633] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2293760. Throughput: 0: 947.7. Samples: 572426. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:40:56,630][00633] Avg episode reward: [(0, '10.829')] [2025-02-21 06:41:01,629][00633] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2318336. Throughput: 0: 966.3. Samples: 579036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:41:01,635][00633] Avg episode reward: [(0, '10.993')] [2025-02-21 06:41:06,472][03249] Updated weights for policy 0, policy_version 570 (0.0012) [2025-02-21 06:41:06,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2334720. Throughput: 0: 931.5. Samples: 583874. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-21 06:41:06,631][00633] Avg episode reward: [(0, '10.302')] [2025-02-21 06:41:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2355200. Throughput: 0: 962.0. Samples: 587182. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:41:11,631][00633] Avg episode reward: [(0, '10.627')] [2025-02-21 06:41:15,849][03249] Updated weights for policy 0, policy_version 580 (0.0012) [2025-02-21 06:41:16,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2375680. Throughput: 0: 970.2. Samples: 593758. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:41:16,631][00633] Avg episode reward: [(0, '11.790')] [2025-02-21 06:41:16,633][03235] Saving new best policy, reward=11.790! [2025-02-21 06:41:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3846.1). Total num frames: 2392064. Throughput: 0: 953.1. Samples: 598570. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:41:21,631][00633] Avg episode reward: [(0, '11.872')] [2025-02-21 06:41:21,644][03235] Saving new best policy, reward=11.872! [2025-02-21 06:41:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2412544. Throughput: 0: 975.0. Samples: 601766. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:41:26,634][00633] Avg episode reward: [(0, '10.751')] [2025-02-21 06:41:26,945][03249] Updated weights for policy 0, policy_version 590 (0.0021) [2025-02-21 06:41:31,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2433024. Throughput: 0: 970.7. Samples: 607978. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-21 06:41:31,632][00633] Avg episode reward: [(0, '10.148')] [2025-02-21 06:41:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 2449408. Throughput: 0: 964.1. Samples: 612940. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-21 06:41:36,634][00633] Avg episode reward: [(0, '11.326')] [2025-02-21 06:41:38,130][03249] Updated weights for policy 0, policy_version 600 (0.0023) [2025-02-21 06:41:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). 
Total num frames: 2469888. Throughput: 0: 971.6. Samples: 616146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:41:41,634][00633] Avg episode reward: [(0, '12.262')] [2025-02-21 06:41:41,640][03235] Saving new best policy, reward=12.262! [2025-02-21 06:41:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2486272. Throughput: 0: 947.7. Samples: 621684. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:41:46,635][00633] Avg episode reward: [(0, '12.884')] [2025-02-21 06:41:46,637][03235] Saving new best policy, reward=12.884! [2025-02-21 06:41:49,353][03249] Updated weights for policy 0, policy_version 610 (0.0015) [2025-02-21 06:41:51,632][00633] Fps is (10 sec: 3685.6, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 2506752. Throughput: 0: 962.3. Samples: 627178. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:41:51,633][00633] Avg episode reward: [(0, '13.514')] [2025-02-21 06:41:51,639][03235] Saving new best policy, reward=13.514! [2025-02-21 06:41:56,630][00633] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2527232. Throughput: 0: 960.3. Samples: 630398. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:41:56,631][00633] Avg episode reward: [(0, '14.491')] [2025-02-21 06:41:56,633][03235] Saving new best policy, reward=14.491! [2025-02-21 06:42:00,049][03249] Updated weights for policy 0, policy_version 620 (0.0012) [2025-02-21 06:42:01,629][00633] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2543616. Throughput: 0: 926.6. Samples: 635456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:42:01,635][00633] Avg episode reward: [(0, '15.208')] [2025-02-21 06:42:01,647][03235] Saving new best policy, reward=15.208! [2025-02-21 06:42:06,629][00633] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2564096. Throughput: 0: 955.7. Samples: 641576. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:06,634][00633] Avg episode reward: [(0, '15.257')] [2025-02-21 06:42:06,636][03235] Saving new best policy, reward=15.257! [2025-02-21 06:42:10,127][03249] Updated weights for policy 0, policy_version 630 (0.0017) [2025-02-21 06:42:11,632][00633] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 2584576. Throughput: 0: 955.5. Samples: 644764. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:11,633][00633] Avg episode reward: [(0, '14.354')] [2025-02-21 06:42:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2600960. Throughput: 0: 926.8. Samples: 649686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:16,630][00633] Avg episode reward: [(0, '13.374')] [2025-02-21 06:42:20,952][03249] Updated weights for policy 0, policy_version 640 (0.0012) [2025-02-21 06:42:21,629][00633] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2621440. Throughput: 0: 961.4. Samples: 656202. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:21,634][00633] Avg episode reward: [(0, '12.630')] [2025-02-21 06:42:21,644][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000640_2621440.pth... [2025-02-21 06:42:21,747][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth [2025-02-21 06:42:26,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2641920. Throughput: 0: 961.2. Samples: 659400. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:26,640][00633] Avg episode reward: [(0, '13.512')] [2025-02-21 06:42:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2658304. Throughput: 0: 947.4. Samples: 664318. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:31,634][00633] Avg episode reward: [(0, '14.127')] [2025-02-21 06:42:31,909][03249] Updated weights for policy 0, policy_version 650 (0.0017) [2025-02-21 06:42:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2678784. Throughput: 0: 970.4. Samples: 670844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-21 06:42:36,631][00633] Avg episode reward: [(0, '15.081')] [2025-02-21 06:42:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2695168. Throughput: 0: 962.9. Samples: 673728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:41,633][00633] Avg episode reward: [(0, '17.309')] [2025-02-21 06:42:41,712][03235] Saving new best policy, reward=17.309! [2025-02-21 06:42:42,991][03249] Updated weights for policy 0, policy_version 660 (0.0016) [2025-02-21 06:42:46,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2719744. Throughput: 0: 968.9. Samples: 679058. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:46,631][00633] Avg episode reward: [(0, '17.524')] [2025-02-21 06:42:46,635][03235] Saving new best policy, reward=17.524! [2025-02-21 06:42:51,630][00633] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 2740224. Throughput: 0: 976.4. Samples: 685516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:42:51,631][00633] Avg episode reward: [(0, '16.913')] [2025-02-21 06:42:52,234][03249] Updated weights for policy 0, policy_version 670 (0.0025) [2025-02-21 06:42:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 2756608. Throughput: 0: 957.1. Samples: 687830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:42:56,631][00633] Avg episode reward: [(0, '16.347')] [2025-02-21 06:43:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). 
Total num frames: 2777088. Throughput: 0: 980.1. Samples: 693792. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:43:01,640][00633] Avg episode reward: [(0, '14.886')] [2025-02-21 06:43:03,126][03249] Updated weights for policy 0, policy_version 680 (0.0019) [2025-02-21 06:43:06,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2797568. Throughput: 0: 973.1. Samples: 699992. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:43:06,633][00633] Avg episode reward: [(0, '15.890')] [2025-02-21 06:43:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3846.1). Total num frames: 2813952. Throughput: 0: 947.1. Samples: 702020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:43:11,634][00633] Avg episode reward: [(0, '15.895')] [2025-02-21 06:43:14,010][03249] Updated weights for policy 0, policy_version 690 (0.0022) [2025-02-21 06:43:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2834432. Throughput: 0: 981.6. Samples: 708490. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:43:16,635][00633] Avg episode reward: [(0, '16.401')] [2025-02-21 06:43:21,631][00633] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 2854912. Throughput: 0: 966.3. Samples: 714328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:43:21,635][00633] Avg episode reward: [(0, '16.124')] [2025-02-21 06:43:24,865][03249] Updated weights for policy 0, policy_version 700 (0.0021) [2025-02-21 06:43:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2871296. Throughput: 0: 955.5. Samples: 716724. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:43:26,631][00633] Avg episode reward: [(0, '16.295')] [2025-02-21 06:43:31,629][00633] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2895872. Throughput: 0: 982.6. Samples: 723276. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:43:31,631][00633] Avg episode reward: [(0, '17.191')] [2025-02-21 06:43:34,422][03249] Updated weights for policy 0, policy_version 710 (0.0013) [2025-02-21 06:43:36,632][00633] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3846.0). Total num frames: 2912256. Throughput: 0: 955.8. Samples: 728530. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-21 06:43:36,633][00633] Avg episode reward: [(0, '16.189')] [2025-02-21 06:43:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2932736. Throughput: 0: 969.7. Samples: 731468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:43:41,634][00633] Avg episode reward: [(0, '16.043')] [2025-02-21 06:43:45,129][03249] Updated weights for policy 0, policy_version 720 (0.0020) [2025-02-21 06:43:46,629][00633] Fps is (10 sec: 4096.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2953216. Throughput: 0: 981.4. Samples: 737956. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:43:46,631][00633] Avg episode reward: [(0, '15.850')] [2025-02-21 06:43:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2969600. Throughput: 0: 951.0. Samples: 742788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-21 06:43:51,631][00633] Avg episode reward: [(0, '15.300')] [2025-02-21 06:43:56,200][03249] Updated weights for policy 0, policy_version 730 (0.0014) [2025-02-21 06:43:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2990080. Throughput: 0: 977.5. Samples: 746008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-21 06:43:56,630][00633] Avg episode reward: [(0, '15.431')] [2025-02-21 06:44:01,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3010560. Throughput: 0: 976.7. Samples: 752442. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:01,631][00633] Avg episode reward: [(0, '15.857')]
[2025-02-21 06:44:06,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3026944. Throughput: 0: 954.7. Samples: 757286. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:06,636][00633] Avg episode reward: [(0, '16.030')]
[2025-02-21 06:44:07,104][03249] Updated weights for policy 0, policy_version 740 (0.0012)
[2025-02-21 06:44:11,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3047424. Throughput: 0: 975.5. Samples: 760622. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:11,634][00633] Avg episode reward: [(0, '15.656')]
[2025-02-21 06:44:16,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3067904. Throughput: 0: 969.9. Samples: 766922. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:16,631][00633] Avg episode reward: [(0, '15.856')]
[2025-02-21 06:44:17,369][03249] Updated weights for policy 0, policy_version 750 (0.0013)
[2025-02-21 06:44:21,629][00633] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 3088384. Throughput: 0: 973.7. Samples: 772344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:21,631][00633] Avg episode reward: [(0, '16.615')]
[2025-02-21 06:44:21,637][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000754_3088384.pth...
[2025-02-21 06:44:21,732][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000529_2166784.pth
[2025-02-21 06:44:26,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3108864. Throughput: 0: 980.2. Samples: 775578. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:26,631][00633] Avg episode reward: [(0, '16.065')]
[2025-02-21 06:44:27,070][03249] Updated weights for policy 0, policy_version 760 (0.0013)
[2025-02-21 06:44:31,632][00633] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3846.0). Total num frames: 3125248. Throughput: 0: 964.7. Samples: 781370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:31,637][00633] Avg episode reward: [(0, '16.002')]
[2025-02-21 06:44:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 3145728. Throughput: 0: 987.7. Samples: 787234. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:36,631][00633] Avg episode reward: [(0, '16.158')]
[2025-02-21 06:44:37,908][03249] Updated weights for policy 0, policy_version 770 (0.0016)
[2025-02-21 06:44:41,629][00633] Fps is (10 sec: 4096.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3166208. Throughput: 0: 989.9. Samples: 790554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:41,633][00633] Avg episode reward: [(0, '15.648')]
[2025-02-21 06:44:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3182592. Throughput: 0: 958.4. Samples: 795572. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:46,634][00633] Avg episode reward: [(0, '16.317')]
[2025-02-21 06:44:48,765][03249] Updated weights for policy 0, policy_version 780 (0.0013)
[2025-02-21 06:44:51,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3207168. Throughput: 0: 991.9. Samples: 801920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:51,635][00633] Avg episode reward: [(0, '16.032')]
[2025-02-21 06:44:56,629][00633] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3227648. Throughput: 0: 992.1. Samples: 805266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:56,634][00633] Avg episode reward: [(0, '15.715')]
[2025-02-21 06:44:59,565][03249] Updated weights for policy 0, policy_version 790 (0.0014)
[2025-02-21 06:45:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3244032. Throughput: 0: 964.0. Samples: 810300. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:45:01,634][00633] Avg episode reward: [(0, '16.717')]
[2025-02-21 06:45:06,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3264512. Throughput: 0: 988.5. Samples: 816826. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:06,633][00633] Avg episode reward: [(0, '15.446')]
[2025-02-21 06:45:09,032][03249] Updated weights for policy 0, policy_version 800 (0.0012)
[2025-02-21 06:45:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3280896. Throughput: 0: 987.2. Samples: 820004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:11,639][00633] Avg episode reward: [(0, '16.983')]
[2025-02-21 06:45:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3301376. Throughput: 0: 969.6. Samples: 824998. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:16,635][00633] Avg episode reward: [(0, '17.080')]
[2025-02-21 06:45:19,660][03249] Updated weights for policy 0, policy_version 810 (0.0023)
[2025-02-21 06:45:21,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3321856. Throughput: 0: 987.0. Samples: 831648. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:45:21,634][00633] Avg episode reward: [(0, '15.694')]
[2025-02-21 06:45:26,631][00633] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3846.1). Total num frames: 3338240. Throughput: 0: 972.4. Samples: 834312. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:45:26,635][00633] Avg episode reward: [(0, '16.521')]
[2025-02-21 06:45:30,401][03249] Updated weights for policy 0, policy_version 820 (0.0014)
[2025-02-21 06:45:31,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3873.9). Total num frames: 3362816. Throughput: 0: 986.0. Samples: 839942. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:31,631][00633] Avg episode reward: [(0, '16.527')]
[2025-02-21 06:45:36,629][00633] Fps is (10 sec: 4506.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3383296. Throughput: 0: 990.4. Samples: 846490. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:45:36,631][00633] Avg episode reward: [(0, '16.314')]
[2025-02-21 06:45:41,297][03249] Updated weights for policy 0, policy_version 830 (0.0016)
[2025-02-21 06:45:41,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3399680. Throughput: 0: 961.3. Samples: 848524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:45:41,634][00633] Avg episode reward: [(0, '16.597')]
[2025-02-21 06:45:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3420160. Throughput: 0: 987.0. Samples: 854716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:46,631][00633] Avg episode reward: [(0, '16.870')]
[2025-02-21 06:45:50,767][03249] Updated weights for policy 0, policy_version 840 (0.0012)
[2025-02-21 06:45:51,636][00633] Fps is (10 sec: 4093.5, 60 sec: 3890.8, 300 sec: 3887.6). Total num frames: 3440640. Throughput: 0: 977.5. Samples: 860820. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:45:51,637][00633] Avg episode reward: [(0, '17.907')]
[2025-02-21 06:45:51,644][03235] Saving new best policy, reward=17.907!
[2025-02-21 06:45:56,632][00633] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 3457024. Throughput: 0: 954.5. Samples: 862960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:56,633][00633] Avg episode reward: [(0, '18.825')]
[2025-02-21 06:45:56,639][03235] Saving new best policy, reward=18.825!
[2025-02-21 06:46:01,629][00633] Fps is (10 sec: 3688.7, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3477504. Throughput: 0: 987.5. Samples: 869434. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:01,633][00633] Avg episode reward: [(0, '18.699')]
[2025-02-21 06:46:01,680][03249] Updated weights for policy 0, policy_version 850 (0.0015)
[2025-02-21 06:46:06,632][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 3497984. Throughput: 0: 962.8. Samples: 874974. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:06,633][00633] Avg episode reward: [(0, '18.975')]
[2025-02-21 06:46:06,635][03235] Saving new best policy, reward=18.975!
[2025-02-21 06:46:11,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3518464. Throughput: 0: 962.5. Samples: 877624. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:46:11,634][00633] Avg episode reward: [(0, '19.914')]
[2025-02-21 06:46:11,646][03235] Saving new best policy, reward=19.914!
[2025-02-21 06:46:12,449][03249] Updated weights for policy 0, policy_version 860 (0.0017)
[2025-02-21 06:46:16,629][00633] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3538944. Throughput: 0: 981.3. Samples: 884100. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:46:16,631][00633] Avg episode reward: [(0, '20.320')]
[2025-02-21 06:46:16,634][03235] Saving new best policy, reward=20.320!
[2025-02-21 06:46:21,630][00633] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3555328. Throughput: 0: 945.5. Samples: 889040. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:46:21,634][00633] Avg episode reward: [(0, '19.895')]
[2025-02-21 06:46:21,646][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000868_3555328.pth...
[2025-02-21 06:46:21,739][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000640_2621440.pth
[2025-02-21 06:46:23,503][03249] Updated weights for policy 0, policy_version 870 (0.0012)
[2025-02-21 06:46:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 3575808. Throughput: 0: 970.7. Samples: 892206. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:46:26,635][00633] Avg episode reward: [(0, '20.598')]
[2025-02-21 06:46:26,638][03235] Saving new best policy, reward=20.598!
[2025-02-21 06:46:31,629][00633] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3596288. Throughput: 0: 975.2. Samples: 898602. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:46:31,631][00633] Avg episode reward: [(0, '19.860')]
[2025-02-21 06:46:33,931][03249] Updated weights for policy 0, policy_version 880 (0.0014)
[2025-02-21 06:46:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3612672. Throughput: 0: 951.2. Samples: 903620. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:46:36,634][00633] Avg episode reward: [(0, '19.158')]
[2025-02-21 06:46:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3633152. Throughput: 0: 975.3. Samples: 906846. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:41,632][00633] Avg episode reward: [(0, '18.844')]
[2025-02-21 06:46:43,932][03249] Updated weights for policy 0, policy_version 890 (0.0017)
[2025-02-21 06:46:46,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 3653632. Throughput: 0: 978.1. Samples: 913448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:46,634][00633] Avg episode reward: [(0, '18.061')]
[2025-02-21 06:46:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3873.8). Total num frames: 3670016. Throughput: 0: 968.8. Samples: 918566. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:46:51,640][00633] Avg episode reward: [(0, '18.716')]
[2025-02-21 06:46:54,460][03249] Updated weights for policy 0, policy_version 900 (0.0018)
[2025-02-21 06:46:56,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 3694592. Throughput: 0: 983.0. Samples: 921860. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:56,634][00633] Avg episode reward: [(0, '18.784')]
[2025-02-21 06:47:01,630][00633] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 3710976. Throughput: 0: 973.4. Samples: 927906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:01,632][00633] Avg episode reward: [(0, '19.546')]
[2025-02-21 06:47:05,156][03249] Updated weights for policy 0, policy_version 910 (0.0013)
[2025-02-21 06:47:06,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3887.8). Total num frames: 3731456. Throughput: 0: 989.2. Samples: 933554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:06,631][00633] Avg episode reward: [(0, '20.245')]
[2025-02-21 06:47:11,629][00633] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3751936. Throughput: 0: 989.6. Samples: 936738. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:47:11,631][00633] Avg episode reward: [(0, '20.760')]
[2025-02-21 06:47:11,642][03235] Saving new best policy, reward=20.760!
[2025-02-21 06:47:16,066][03249] Updated weights for policy 0, policy_version 920 (0.0021)
[2025-02-21 06:47:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3768320. Throughput: 0: 964.7. Samples: 942012. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:16,633][00633] Avg episode reward: [(0, '21.315')]
[2025-02-21 06:47:16,636][03235] Saving new best policy, reward=21.315!
[2025-02-21 06:47:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3788800. Throughput: 0: 986.5. Samples: 948014. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:47:21,636][00633] Avg episode reward: [(0, '21.805')]
[2025-02-21 06:47:21,652][03235] Saving new best policy, reward=21.805!
[2025-02-21 06:47:25,751][03249] Updated weights for policy 0, policy_version 930 (0.0013)
[2025-02-21 06:47:26,631][00633] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 3809280. Throughput: 0: 986.1. Samples: 951224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:47:26,634][00633] Avg episode reward: [(0, '20.612')]
[2025-02-21 06:47:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3825664. Throughput: 0: 948.7. Samples: 956140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:31,634][00633] Avg episode reward: [(0, '20.119')]
[2025-02-21 06:47:36,624][03249] Updated weights for policy 0, policy_version 940 (0.0016)
[2025-02-21 06:47:36,629][00633] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3850240. Throughput: 0: 981.8. Samples: 962748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:36,635][00633] Avg episode reward: [(0, '18.520')]
[2025-02-21 06:47:41,633][00633] Fps is (10 sec: 4094.7, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 3866624. Throughput: 0: 978.2. Samples: 965880. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:41,634][00633] Avg episode reward: [(0, '19.359')]
[2025-02-21 06:47:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3887104. Throughput: 0: 955.7. Samples: 970910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:46,634][00633] Avg episode reward: [(0, '18.524')]
[2025-02-21 06:47:47,593][03249] Updated weights for policy 0, policy_version 950 (0.0018)
[2025-02-21 06:47:51,629][00633] Fps is (10 sec: 4097.3, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3907584. Throughput: 0: 975.1. Samples: 977432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:47:51,634][00633] Avg episode reward: [(0, '18.558')]
[2025-02-21 06:47:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3923968. Throughput: 0: 972.9. Samples: 980520. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:56,635][00633] Avg episode reward: [(0, '19.072')]
[2025-02-21 06:47:58,294][03249] Updated weights for policy 0, policy_version 960 (0.0018)
[2025-02-21 06:48:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 3944448. Throughput: 0: 970.8. Samples: 985696. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:48:01,635][00633] Avg episode reward: [(0, '19.279')]
[2025-02-21 06:48:06,629][00633] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3964928. Throughput: 0: 984.3. Samples: 992306. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:48:06,635][00633] Avg episode reward: [(0, '18.612')]
[2025-02-21 06:48:07,555][03249] Updated weights for policy 0, policy_version 970 (0.0013)
[2025-02-21 06:48:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3981312. Throughput: 0: 967.5. Samples: 994760. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:48:11,633][00633] Avg episode reward: [(0, '18.726')]
[2025-02-21 06:48:16,545][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:48:16,547][03235] Stopping Batcher_0...
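Note: the checkpoint filenames in this log encode two numbers, the policy version and the cumulative environment frame count; in this particular run the frame count is always 4096 frames per policy version (e.g. 978 × 4096 = 4005888). A minimal sketch of parsing such a name (the helper function is hypothetical, not part of Sample Factory):

```python
import re

def parse_checkpoint_name(path: str) -> tuple[int, int]:
    """Extract (policy_version, env_frames) from names like
    checkpoint_000000978_4005888.pth. Hypothetical helper for illustration."""
    m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint_name(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
)
# In this run, frames == version * 4096 (978 * 4096 == 4005888).
```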
[2025-02-21 06:48:16,552][03235] Loop batcher_evt_loop terminating...
[2025-02-21 06:48:16,551][00633] Component Batcher_0 stopped!
[2025-02-21 06:48:16,557][00633] Component RolloutWorker_w0 process died already! Don't wait for it.
[2025-02-21 06:48:16,560][00633] Component RolloutWorker_w2 process died already! Don't wait for it.
[2025-02-21 06:48:16,564][00633] Component RolloutWorker_w3 process died already! Don't wait for it.
[2025-02-21 06:48:16,565][00633] Component RolloutWorker_w7 process died already! Don't wait for it.
[2025-02-21 06:48:16,620][03249] Weights refcount: 2 0
[2025-02-21 06:48:16,623][00633] Component InferenceWorker_p0-w0 stopped!
[2025-02-21 06:48:16,627][03249] Stopping InferenceWorker_p0-w0...
[2025-02-21 06:48:16,629][03249] Loop inference_proc0-0_evt_loop terminating...
[2025-02-21 06:48:16,638][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000754_3088384.pth
[2025-02-21 06:48:16,652][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:48:16,812][03235] Stopping LearnerWorker_p0...
[2025-02-21 06:48:16,818][03235] Loop learner_proc0_evt_loop terminating...
[2025-02-21 06:48:16,818][00633] Component LearnerWorker_p0 stopped!
[2025-02-21 06:48:16,859][00633] Component RolloutWorker_w1 stopped!
[2025-02-21 06:48:16,864][03251] Stopping RolloutWorker_w1...
[2025-02-21 06:48:16,865][03251] Loop rollout_proc1_evt_loop terminating...
[2025-02-21 06:48:16,873][00633] Component RolloutWorker_w5 stopped!
[2025-02-21 06:48:16,876][03254] Stopping RolloutWorker_w5...
[2025-02-21 06:48:16,877][03254] Loop rollout_proc5_evt_loop terminating...
[2025-02-21 06:48:16,952][03255] Stopping RolloutWorker_w6...
[2025-02-21 06:48:16,952][00633] Component RolloutWorker_w6 stopped!
[2025-02-21 06:48:16,954][03255] Loop rollout_proc6_evt_loop terminating...
[2025-02-21 06:48:16,960][03253] Stopping RolloutWorker_w4...
[2025-02-21 06:48:16,960][00633] Component RolloutWorker_w4 stopped!
[2025-02-21 06:48:16,962][00633] Waiting for process learner_proc0 to stop...
[2025-02-21 06:48:16,961][03253] Loop rollout_proc4_evt_loop terminating...
[2025-02-21 06:48:18,364][00633] Waiting for process inference_proc0-0 to join...
[2025-02-21 06:48:18,366][00633] Waiting for process rollout_proc0 to join...
[2025-02-21 06:48:18,368][00633] Waiting for process rollout_proc1 to join...
[2025-02-21 06:48:19,096][00633] Waiting for process rollout_proc2 to join...
[2025-02-21 06:48:19,097][00633] Waiting for process rollout_proc3 to join...
[2025-02-21 06:48:19,098][00633] Waiting for process rollout_proc4 to join...
[2025-02-21 06:48:19,101][00633] Waiting for process rollout_proc5 to join...
[2025-02-21 06:48:19,102][00633] Waiting for process rollout_proc6 to join...
[2025-02-21 06:48:19,103][00633] Waiting for process rollout_proc7 to join...
[2025-02-21 06:48:19,104][00633] Batcher 0 profile tree view:
batching: 22.1334, releasing_batches: 0.0303
[2025-02-21 06:48:19,105][00633] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 405.8358
update_model: 9.2132
  weight_update: 0.0036
one_step: 0.0102
  handle_policy_step: 591.4368
    deserialize: 14.4476, stack: 3.5130, obs_to_device_normalize: 132.5441, forward: 308.9790, send_messages: 22.1924
    prepare_outputs: 83.2722
      to_cpu: 52.2151
[2025-02-21 06:48:19,106][00633] Learner 0 profile tree view:
misc: 0.0046, prepare_batch: 12.3153
train: 66.1561
  epoch_init: 0.0057, minibatch_init: 0.0055, losses_postprocess: 0.5684, kl_divergence: 0.5593, after_optimizer: 32.1202
  calculate_losses: 22.0446
    losses_init: 0.0032, forward_head: 1.1831, bptt_initial: 15.1547, tail: 0.8670, advantages_returns: 0.1997, losses: 2.8600
    bptt: 1.5676
      bptt_forward_core: 1.5002
  update: 10.3855
    clip: 0.8488
[2025-02-21 06:48:19,108][00633] Loop Runner_EvtLoop terminating...
[2025-02-21 06:48:19,109][00633] Runner profile tree view:
main_loop: 1070.6298
[2025-02-21 06:48:19,110][00633] Collected {0: 4005888}, FPS: 3741.6
[2025-02-21 06:55:30,394][00633] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-21 06:55:30,397][00633] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-21 06:55:30,399][00633] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-21 06:55:30,400][00633] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-21 06:55:30,401][00633] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 06:55:30,402][00633] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-21 06:55:30,402][00633] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 06:55:30,403][00633] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-21 06:55:30,404][00633] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-21 06:55:30,405][00633] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-21 06:55:30,406][00633] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-21 06:55:30,406][00633] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-21 06:55:30,407][00633] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-21 06:55:30,408][00633] Adding new argument 'enjoy_script'=None that is not in the saved config file!
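Note: the overall training throughput reported above ("Collected {0: 4005888}, FPS: 3741.6") is simply total environment frames divided by the main-loop wall-clock time ("main_loop: 1070.6298"). A quick sanity check using those two numbers from the log:

```python
total_frames = 4_005_888      # from "Collected {0: 4005888}"
main_loop_seconds = 1070.6298  # from "main_loop: 1070.6298"

fps = total_frames / main_loop_seconds
print(f"FPS: {fps:.1f}")  # reproduces the reported 3741.6
```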
[2025-02-21 06:55:30,410][00633] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-21 06:55:30,458][00633] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:55:30,461][00633] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:55:30,463][00633] RunningMeanStd input shape: (1,)
[2025-02-21 06:55:30,491][00633] ConvEncoder: input_channels=3
[2025-02-21 06:55:30,652][00633] Conv encoder output size: 512
[2025-02-21 06:55:30,654][00633] Policy head output size: 512
[2025-02-21 06:55:30,971][00633] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:55:32,042][00633] Num frames 100...
[2025-02-21 06:55:32,220][00633] Num frames 200...
[2025-02-21 06:55:32,401][00633] Num frames 300...
[2025-02-21 06:55:32,546][00633] Num frames 400...
[2025-02-21 06:55:32,685][00633] Num frames 500...
[2025-02-21 06:55:32,813][00633] Num frames 600...
[2025-02-21 06:55:32,944][00633] Num frames 700...
[2025-02-21 06:55:33,073][00633] Num frames 800...
[2025-02-21 06:55:33,208][00633] Num frames 900...
[2025-02-21 06:55:33,341][00633] Num frames 1000...
[2025-02-21 06:55:33,470][00633] Num frames 1100...
[2025-02-21 06:55:33,556][00633] Avg episode rewards: #0: 23.210, true rewards: #0: 11.210
[2025-02-21 06:55:33,556][00633] Avg episode reward: 23.210, avg true_objective: 11.210
[2025-02-21 06:55:33,674][00633] Num frames 1200...
[2025-02-21 06:55:33,802][00633] Num frames 1300...
[2025-02-21 06:55:33,927][00633] Num frames 1400...
[2025-02-21 06:55:34,054][00633] Num frames 1500...
[2025-02-21 06:55:34,183][00633] Num frames 1600...
[2025-02-21 06:55:34,320][00633] Num frames 1700...
[2025-02-21 06:55:34,448][00633] Num frames 1800...
[2025-02-21 06:55:34,577][00633] Num frames 1900...
[2025-02-21 06:55:34,716][00633] Num frames 2000...
[2025-02-21 06:55:34,846][00633] Num frames 2100...
[2025-02-21 06:55:34,975][00633] Num frames 2200...
[2025-02-21 06:55:35,087][00633] Avg episode rewards: #0: 24.205, true rewards: #0: 11.205
[2025-02-21 06:55:35,087][00633] Avg episode reward: 24.205, avg true_objective: 11.205
[2025-02-21 06:55:35,163][00633] Num frames 2300...
[2025-02-21 06:55:35,299][00633] Num frames 2400...
[2025-02-21 06:55:35,426][00633] Num frames 2500...
[2025-02-21 06:55:35,554][00633] Num frames 2600...
[2025-02-21 06:55:35,679][00633] Num frames 2700...
[2025-02-21 06:55:35,810][00633] Num frames 2800...
[2025-02-21 06:55:35,935][00633] Num frames 2900...
[2025-02-21 06:55:36,061][00633] Num frames 3000...
[2025-02-21 06:55:36,189][00633] Num frames 3100...
[2025-02-21 06:55:36,298][00633] Avg episode rewards: #0: 21.790, true rewards: #0: 10.457
[2025-02-21 06:55:36,298][00633] Avg episode reward: 21.790, avg true_objective: 10.457
[2025-02-21 06:55:36,389][00633] Num frames 3200...
[2025-02-21 06:55:36,519][00633] Num frames 3300...
[2025-02-21 06:55:36,644][00633] Num frames 3400...
[2025-02-21 06:55:36,781][00633] Num frames 3500...
[2025-02-21 06:55:36,909][00633] Num frames 3600...
[2025-02-21 06:55:37,036][00633] Num frames 3700...
[2025-02-21 06:55:37,165][00633] Num frames 3800...
[2025-02-21 06:55:37,233][00633] Avg episode rewards: #0: 19.273, true rewards: #0: 9.522
[2025-02-21 06:55:37,234][00633] Avg episode reward: 19.273, avg true_objective: 9.522
[2025-02-21 06:55:37,348][00633] Num frames 3900...
[2025-02-21 06:55:37,477][00633] Num frames 4000...
[2025-02-21 06:55:37,606][00633] Num frames 4100...
[2025-02-21 06:55:37,733][00633] Num frames 4200...
[2025-02-21 06:55:37,870][00633] Num frames 4300...
[2025-02-21 06:55:38,003][00633] Num frames 4400...
[2025-02-21 06:55:38,131][00633] Num frames 4500...
[2025-02-21 06:55:38,263][00633] Num frames 4600...
[2025-02-21 06:55:38,394][00633] Num frames 4700...
[2025-02-21 06:55:38,528][00633] Num frames 4800...
[2025-02-21 06:55:38,708][00633] Avg episode rewards: #0: 20.394, true rewards: #0: 9.794
[2025-02-21 06:55:38,709][00633] Avg episode reward: 20.394, avg true_objective: 9.794
[2025-02-21 06:55:38,715][00633] Num frames 4900...
[2025-02-21 06:55:38,849][00633] Num frames 5000...
[2025-02-21 06:55:38,977][00633] Num frames 5100...
[2025-02-21 06:55:39,103][00633] Num frames 5200...
[2025-02-21 06:55:39,238][00633] Num frames 5300...
[2025-02-21 06:55:39,368][00633] Num frames 5400...
[2025-02-21 06:55:39,475][00633] Avg episode rewards: #0: 18.735, true rewards: #0: 9.068
[2025-02-21 06:55:39,476][00633] Avg episode reward: 18.735, avg true_objective: 9.068
[2025-02-21 06:55:39,554][00633] Num frames 5500...
[2025-02-21 06:55:39,691][00633] Num frames 5600...
[2025-02-21 06:55:39,844][00633] Num frames 5700...
[2025-02-21 06:55:39,973][00633] Num frames 5800...
[2025-02-21 06:55:40,103][00633] Num frames 5900...
[2025-02-21 06:55:40,241][00633] Num frames 6000...
[2025-02-21 06:55:40,369][00633] Num frames 6100...
[2025-02-21 06:55:40,498][00633] Num frames 6200...
[2025-02-21 06:55:40,627][00633] Num frames 6300...
[2025-02-21 06:55:40,758][00633] Num frames 6400...
[2025-02-21 06:55:40,809][00633] Avg episode rewards: #0: 19.000, true rewards: #0: 9.143
[2025-02-21 06:55:40,810][00633] Avg episode reward: 19.000, avg true_objective: 9.143
[2025-02-21 06:55:40,945][00633] Num frames 6500...
[2025-02-21 06:55:41,074][00633] Num frames 6600...
[2025-02-21 06:55:41,202][00633] Num frames 6700...
[2025-02-21 06:55:41,286][00633] Avg episode rewards: #0: 17.400, true rewards: #0: 8.400
[2025-02-21 06:55:41,286][00633] Avg episode reward: 17.400, avg true_objective: 8.400
[2025-02-21 06:55:41,391][00633] Num frames 6800...
[2025-02-21 06:55:41,520][00633] Num frames 6900...
[2025-02-21 06:55:41,647][00633] Num frames 7000...
[2025-02-21 06:55:41,773][00633] Num frames 7100...
[2025-02-21 06:55:41,911][00633] Num frames 7200...
[2025-02-21 06:55:42,038][00633] Num frames 7300...
[2025-02-21 06:55:42,164][00633] Num frames 7400...
[2025-02-21 06:55:42,301][00633] Num frames 7500...
[2025-02-21 06:55:42,427][00633] Num frames 7600...
[2025-02-21 06:55:42,491][00633] Avg episode rewards: #0: 17.563, true rewards: #0: 8.452
[2025-02-21 06:55:42,492][00633] Avg episode reward: 17.563, avg true_objective: 8.452
[2025-02-21 06:55:42,655][00633] Num frames 7700...
[2025-02-21 06:55:42,827][00633] Num frames 7800...
[2025-02-21 06:55:43,000][00633] Num frames 7900...
[2025-02-21 06:55:43,171][00633] Num frames 8000...
[2025-02-21 06:55:43,358][00633] Num frames 8100...
[2025-02-21 06:55:43,525][00633] Num frames 8200...
[2025-02-21 06:55:43,690][00633] Num frames 8300...
[2025-02-21 06:55:43,878][00633] Num frames 8400...
[2025-02-21 06:55:44,061][00633] Num frames 8500...
[2025-02-21 06:55:44,237][00633] Num frames 8600...
[2025-02-21 06:55:44,422][00633] Num frames 8700...
[2025-02-21 06:55:44,640][00633] Avg episode rewards: #0: 18.691, true rewards: #0: 8.791
[2025-02-21 06:55:44,641][00633] Avg episode reward: 18.691, avg true_objective: 8.791
[2025-02-21 06:56:32,957][00633] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-21 07:03:58,216][00633] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-21 07:03:58,217][00633] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-21 07:03:58,218][00633] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-21 07:03:58,219][00633] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-21 07:03:58,220][00633] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 07:03:58,220][00633] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-21 07:03:58,221][00633] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-21 07:03:58,222][00633] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-21 07:03:58,223][00633] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-21 07:03:58,224][00633] Adding new argument 'hf_repository'='mjkim0928/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-21 07:03:58,225][00633] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-21 07:03:58,225][00633] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-21 07:03:58,226][00633] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-21 07:03:58,228][00633] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-21 07:03:58,228][00633] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-21 07:03:58,254][00633] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 07:03:58,255][00633] RunningMeanStd input shape: (1,)
[2025-02-21 07:03:58,265][00633] ConvEncoder: input_channels=3
[2025-02-21 07:03:58,297][00633] Conv encoder output size: 512
[2025-02-21 07:03:58,298][00633] Policy head output size: 512
[2025-02-21 07:03:58,316][00633] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 07:03:58,737][00633] Num frames 100...
[2025-02-21 07:03:58,864][00633] Num frames 200...
[2025-02-21 07:03:58,990][00633] Num frames 300...
[2025-02-21 07:03:59,118][00633] Num frames 400...
[2025-02-21 07:03:59,259][00633] Num frames 500...
[2025-02-21 07:03:59,388][00633] Num frames 600...
[2025-02-21 07:03:59,516][00633] Num frames 700...
[2025-02-21 07:03:59,642][00633] Num frames 800...
[2025-02-21 07:03:59,739][00633] Avg episode rewards: #0: 14.320, true rewards: #0: 8.320
[2025-02-21 07:03:59,740][00633] Avg episode reward: 14.320, avg true_objective: 8.320
[2025-02-21 07:03:59,828][00633] Num frames 900...
[2025-02-21 07:03:59,953][00633] Num frames 1000...
[2025-02-21 07:04:00,078][00633] Num frames 1100...
[2025-02-21 07:04:00,222][00633] Num frames 1200...
[2025-02-21 07:04:00,349][00633] Num frames 1300...
[2025-02-21 07:04:00,476][00633] Num frames 1400...
[2025-02-21 07:04:00,604][00633] Num frames 1500...
[2025-02-21 07:04:00,735][00633] Num frames 1600...
[2025-02-21 07:04:00,787][00633] Avg episode rewards: #0: 15.000, true rewards: #0: 8.000
[2025-02-21 07:04:00,788][00633] Avg episode reward: 15.000, avg true_objective: 8.000
[2025-02-21 07:04:00,916][00633] Num frames 1700...
[2025-02-21 07:04:01,044][00633] Num frames 1800...
[2025-02-21 07:04:01,172][00633] Num frames 1900...
[2025-02-21 07:04:01,312][00633] Num frames 2000...
[2025-02-21 07:04:01,437][00633] Num frames 2100...
[2025-02-21 07:04:01,565][00633] Num frames 2200...
[2025-02-21 07:04:01,670][00633] Avg episode rewards: #0: 14.467, true rewards: #0: 7.467
[2025-02-21 07:04:01,671][00633] Avg episode reward: 14.467, avg true_objective: 7.467
[2025-02-21 07:04:01,746][00633] Num frames 2300...
[2025-02-21 07:04:01,870][00633] Num frames 2400...
[2025-02-21 07:04:01,994][00633] Num frames 2500...
[2025-02-21 07:04:02,122][00633] Num frames 2600...
[2025-02-21 07:04:02,256][00633] Num frames 2700...
[2025-02-21 07:04:02,421][00633] Avg episode rewards: #0: 13.460, true rewards: #0: 6.960
[2025-02-21 07:04:02,423][00633] Avg episode reward: 13.460, avg true_objective: 6.960
[2025-02-21 07:04:02,449][00633] Num frames 2800...
[2025-02-21 07:04:02,578][00633] Num frames 2900...
[2025-02-21 07:04:02,707][00633] Num frames 3000...
[2025-02-21 07:04:02,834][00633] Num frames 3100...
[2025-02-21 07:04:02,962][00633] Num frames 3200...
[2025-02-21 07:04:03,089][00633] Num frames 3300...
[2025-02-21 07:04:03,224][00633] Num frames 3400...
[2025-02-21 07:04:03,364][00633] Num frames 3500...
[2025-02-21 07:04:03,488][00633] Num frames 3600...
[2025-02-21 07:04:03,615][00633] Num frames 3700...
[2025-02-21 07:04:03,741][00633] Num frames 3800...
[2025-02-21 07:04:03,865][00633] Num frames 3900...
[2025-02-21 07:04:03,990][00633] Num frames 4000...
[2025-02-21 07:04:04,111][00633] Num frames 4100...
[2025-02-21 07:04:04,241][00633] Num frames 4200...
[2025-02-21 07:04:04,327][00633] Avg episode rewards: #0: 17.248, true rewards: #0: 8.448
[2025-02-21 07:04:04,328][00633] Avg episode reward: 17.248, avg true_objective: 8.448
[2025-02-21 07:04:04,426][00633] Num frames 4300...
[2025-02-21 07:04:04,552][00633] Num frames 4400...
[2025-02-21 07:04:04,679][00633] Num frames 4500...
[2025-02-21 07:04:04,806][00633] Num frames 4600...
[2025-02-21 07:04:04,932][00633] Num frames 4700...
[2025-02-21 07:04:05,059][00633] Num frames 4800...
[2025-02-21 07:04:05,186][00633] Num frames 4900...
[2025-02-21 07:04:05,325][00633] Num frames 5000...
[2025-02-21 07:04:05,458][00633] Num frames 5100...
[2025-02-21 07:04:05,588][00633] Num frames 5200...
[2025-02-21 07:04:05,743][00633] Num frames 5300...
[2025-02-21 07:04:05,920][00633] Num frames 5400...
[2025-02-21 07:04:05,990][00633] Avg episode rewards: #0: 19.180, true rewards: #0: 9.013
[2025-02-21 07:04:05,991][00633] Avg episode reward: 19.180, avg true_objective: 9.013
[2025-02-21 07:04:06,146][00633] Num frames 5500...
[2025-02-21 07:04:06,314][00633] Num frames 5600...
[2025-02-21 07:04:06,490][00633] Num frames 5700...
[2025-02-21 07:04:06,654][00633] Num frames 5800...
[2025-02-21 07:04:06,818][00633] Num frames 5900...
[2025-02-21 07:04:06,994][00633] Num frames 6000...
[2025-02-21 07:04:07,171][00633] Num frames 6100...
[2025-02-21 07:04:07,353][00633] Num frames 6200...
[2025-02-21 07:04:07,542][00633] Num frames 6300...
[2025-02-21 07:04:07,721][00633] Num frames 6400...
[2025-02-21 07:04:07,871][00633] Num frames 6500...
[2025-02-21 07:04:08,000][00633] Num frames 6600...
[2025-02-21 07:04:08,166][00633] Avg episode rewards: #0: 21.269, true rewards: #0: 9.554
[2025-02-21 07:04:08,167][00633] Avg episode reward: 21.269, avg true_objective: 9.554
[2025-02-21 07:04:08,184][00633] Num frames 6700...
[2025-02-21 07:04:08,311][00633] Num frames 6800...
[2025-02-21 07:04:08,438][00633] Num frames 6900...
[2025-02-21 07:04:08,573][00633] Num frames 7000...
[2025-02-21 07:04:08,703][00633] Num frames 7100...
[2025-02-21 07:04:08,829][00633] Num frames 7200...
[2025-02-21 07:04:08,958][00633] Num frames 7300...
[2025-02-21 07:04:09,088][00633] Num frames 7400...
[2025-02-21 07:04:09,258][00633] Avg episode rewards: #0: 20.489, true rewards: #0: 9.364
[2025-02-21 07:04:09,259][00633] Avg episode reward: 20.489, avg true_objective: 9.364
[2025-02-21 07:04:09,272][00633] Num frames 7500...
[2025-02-21 07:04:09,396][00633] Num frames 7600...
[2025-02-21 07:04:09,529][00633] Num frames 7700...
[2025-02-21 07:04:09,657][00633] Num frames 7800...
[2025-02-21 07:04:09,783][00633] Num frames 7900...
[2025-02-21 07:04:09,908][00633] Num frames 8000...
[2025-02-21 07:04:10,091][00633] Avg episode rewards: #0: 19.332, true rewards: #0: 8.999
[2025-02-21 07:04:10,092][00633] Avg episode reward: 19.332, avg true_objective: 8.999
[2025-02-21 07:04:10,094][00633] Num frames 8100...
[2025-02-21 07:04:10,225][00633] Num frames 8200...
[2025-02-21 07:04:10,353][00633] Num frames 8300...
[2025-02-21 07:04:10,478][00633] Num frames 8400...
[2025-02-21 07:04:10,615][00633] Num frames 8500...
[2025-02-21 07:04:10,693][00633] Avg episode rewards: #0: 18.017, true rewards: #0: 8.517
[2025-02-21 07:04:10,695][00633] Avg episode reward: 18.017, avg true_objective: 8.517
[2025-02-21 07:04:57,181][00633] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
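Note: the "Avg episode rewards" values printed during evaluation are cumulative means over the episodes finished so far, so e.g. 14.320 after one episode becomes (14.320 + 15.680) / 2 = 15.000 after two. A minimal sketch of that bookkeeping; the per-episode rewards below are reconstructed from the second evaluation run's cumulative averages, and the class name is hypothetical:

```python
class RunningMean:
    """Cumulative mean, like the "Avg episode rewards" lines in the log."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value: float) -> float:
        """Record one episode reward and return the mean so far."""
        self.total += value
        self.count += 1
        return self.total / self.count

# Episode rewards implied by the cumulative averages 14.320 -> 15.000 -> 14.467:
avg = RunningMean()
for reward in (14.320, 15.680, 13.401):
    print(f"Avg episode rewards: #0: {avg.add(reward):.3f}")
```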