[2025-02-21 06:30:28,235][00633] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-21 06:30:28,239][00633] Rollout worker 0 uses device cpu
[2025-02-21 06:30:28,241][00633] Rollout worker 1 uses device cpu
[2025-02-21 06:30:28,242][00633] Rollout worker 2 uses device cpu
[2025-02-21 06:30:28,246][00633] Rollout worker 3 uses device cpu
[2025-02-21 06:30:28,247][00633] Rollout worker 4 uses device cpu
[2025-02-21 06:30:28,248][00633] Rollout worker 5 uses device cpu
[2025-02-21 06:30:28,248][00633] Rollout worker 6 uses device cpu
[2025-02-21 06:30:28,249][00633] Rollout worker 7 uses device cpu
[2025-02-21 06:30:28,436][00633] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:28,438][00633] InferenceWorker_p0-w0: min num requests: 2
[2025-02-21 06:30:28,479][00633] Starting all processes...
[2025-02-21 06:30:28,480][00633] Starting process learner_proc0
[2025-02-21 06:30:28,562][00633] Starting all processes...
[2025-02-21 06:30:28,690][00633] Starting process inference_proc0-0
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc0
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc1
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc2
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc3
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc4
[2025-02-21 06:30:28,691][00633] Starting process rollout_proc5
[2025-02-21 06:30:28,692][00633] Starting process rollout_proc6
[2025-02-21 06:30:28,692][00633] Starting process rollout_proc7
[2025-02-21 06:30:44,797][03235] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:44,797][03235] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-21 06:30:44,873][03235] Num visible devices: 1
[2025-02-21 06:30:44,915][03235] Starting seed is not provided
[2025-02-21 06:30:44,916][03235] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:44,916][03235] Initializing actor-critic model on device cuda:0
[2025-02-21 06:30:44,917][03235] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:30:44,920][03235] RunningMeanStd input shape: (1,)
[2025-02-21 06:30:44,999][03235] ConvEncoder: input_channels=3
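The `RunningMeanStd` modules initialized above maintain running statistics used to normalize observations (shape `(3, 72, 128)`) and returns (shape `(1,)`). A minimal pure-Python sketch of the standard parallel mean/variance update such a normalizer typically uses (the actual `RunningMeanStdInPlace` is an in-place TorchScript module; this is an illustrative assumption, not a port):

```python
class RunningMeanStd:
    """Running mean/variance via the parallel (Chan et al.) update rule.

    Plays the role of Sample Factory's RunningMeanStdInPlace for a scalar
    stream; the combine formula is the standard algorithm, hedged here as
    an illustration rather than the library's exact code.
    """

    def __init__(self, epsilon: float = 1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon  # avoids division by zero on the first batch

    def update(self, batch):
        batch_mean = sum(batch) / len(batch)
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / len(batch)
        batch_count = len(batch)

        delta = batch_mean - self.mean
        total = self.count + batch_count

        # Merge the (mean, var, count) summary of the batch into ours.
        self.mean += delta * batch_count / total
        m2 = (self.var * self.count
              + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / (self.var ** 0.5 + 1e-8)
```

In the real model one such tracker runs per observation tensor (under the `ModuleDict` keyed by `obs`) and one for returns.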
[2025-02-21 06:30:45,290][03256] Worker 7 uses CPU cores [1]
[2025-02-21 06:30:45,478][03253] Worker 4 uses CPU cores [0]
[2025-02-21 06:30:45,515][03255] Worker 6 uses CPU cores [0]
[2025-02-21 06:30:45,685][03251] Worker 1 uses CPU cores [1]
[2025-02-21 06:30:45,686][03250] Worker 2 uses CPU cores [0]
[2025-02-21 06:30:45,727][03248] Worker 0 uses CPU cores [0]
[2025-02-21 06:30:45,730][03249] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:45,730][03249] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-21 06:30:45,757][03254] Worker 5 uses CPU cores [1]
[2025-02-21 06:30:45,762][03249] Num visible devices: 1
[2025-02-21 06:30:45,812][03252] Worker 3 uses CPU cores [1]
[2025-02-21 06:30:45,820][03235] Conv encoder output size: 512
[2025-02-21 06:30:45,821][03235] Policy head output size: 512
[2025-02-21 06:30:45,878][03235] Created Actor Critic model with architecture:
[2025-02-21 06:30:45,878][03235] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
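The log reports "Conv encoder output size: 512" for a `(3, 72, 128)` input but does not list kernel sizes. Assuming Sample Factory's default `convnet_simple` filter stack — `(32, 8, 4)`, `(64, 4, 2)`, `(128, 3, 2)` as `(out_channels, kernel, stride)`, which is an assumption, not stated in the log — the shape arithmetic can be sanity-checked without PyTorch:

```python
def conv2d_out(size: int, kernel: int, stride: int) -> int:
    """Spatial output size of an unpadded (VALID) Conv2d along one axis."""
    return (size - kernel) // stride + 1

# Assumed default Sample Factory "convnet_simple" stack: (out_channels, kernel, stride)
conv_filters = [(32, 8, 4), (64, 4, 2), (128, 3, 2)]

h, w = 72, 128          # resized Doom frame, per "RunningMeanStd input shape: (3, 72, 128)"
channels = 3
for out_ch, k, s in conv_filters:
    h, w = conv2d_out(h, k, s), conv2d_out(w, k, s)
    channels = out_ch

flat = channels * h * w  # flattened conv_head output feeding mlp_layers' Linear
```

Under these assumed filters the three ELU-activated convolutions reduce the frame to `128 × 3 × 6 = 2304` features, which the single `Linear` in `mlp_layers` would map to the logged 512-dim encoder output; the `distribution_linear` then maps 512 core features to the 5 discrete actions shown above.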
[2025-02-21 06:30:46,199][03235] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-02-21 06:30:48,426][00633] Heartbeat connected on Batcher_0
[2025-02-21 06:30:48,437][00633] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-21 06:30:48,445][00633] Heartbeat connected on RolloutWorker_w0
[2025-02-21 06:30:48,451][00633] Heartbeat connected on RolloutWorker_w1
[2025-02-21 06:30:48,455][00633] Heartbeat connected on RolloutWorker_w2
[2025-02-21 06:30:48,465][00633] Heartbeat connected on RolloutWorker_w3
[2025-02-21 06:30:48,466][00633] Heartbeat connected on RolloutWorker_w4
[2025-02-21 06:30:48,472][00633] Heartbeat connected on RolloutWorker_w5
[2025-02-21 06:30:48,475][00633] Heartbeat connected on RolloutWorker_w6
[2025-02-21 06:30:48,479][00633] Heartbeat connected on RolloutWorker_w7
[2025-02-21 06:30:51,093][03235] No checkpoints found
[2025-02-21 06:30:51,093][03235] Did not load from checkpoint, starting from scratch!
[2025-02-21 06:30:51,093][03235] Initialized policy 0 weights for model version 0
[2025-02-21 06:30:51,096][03235] LearnerWorker_p0 finished initialization!
[2025-02-21 06:30:51,097][00633] Heartbeat connected on LearnerWorker_p0
[2025-02-21 06:30:51,097][03235] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:30:51,347][03249] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:30:51,348][03249] RunningMeanStd input shape: (1,)
[2025-02-21 06:30:51,360][03249] ConvEncoder: input_channels=3
[2025-02-21 06:30:51,475][03249] Conv encoder output size: 512
[2025-02-21 06:30:51,475][03249] Policy head output size: 512
[2025-02-21 06:30:51,510][00633] Inference worker 0-0 is ready!
[2025-02-21 06:30:51,511][00633] All inference workers are ready! Signal rollout workers to start!
[2025-02-21 06:30:51,629][00633] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-21 06:30:51,757][03254] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,763][03250] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,824][03248] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,831][03256] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,867][03251] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,887][03255] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,906][03252] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:51,933][03253] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:30:53,282][03254] Decorrelating experience for 0 frames...
[2025-02-21 06:30:53,283][03251] Decorrelating experience for 0 frames...
[2025-02-21 06:30:53,282][03255] Decorrelating experience for 0 frames...
[2025-02-21 06:30:53,284][03253] Decorrelating experience for 0 frames...
[2025-02-21 06:30:54,038][03255] Decorrelating experience for 32 frames...
[2025-02-21 06:30:54,041][03253] Decorrelating experience for 32 frames...
[2025-02-21 06:30:54,055][03254] Decorrelating experience for 32 frames...
[2025-02-21 06:30:54,057][03251] Decorrelating experience for 32 frames...
[2025-02-21 06:30:55,051][03253] Decorrelating experience for 64 frames...
[2025-02-21 06:30:55,056][03255] Decorrelating experience for 64 frames...
[2025-02-21 06:30:55,174][03254] Decorrelating experience for 64 frames...
[2025-02-21 06:30:55,177][03251] Decorrelating experience for 64 frames...
[2025-02-21 06:30:55,915][03253] Decorrelating experience for 96 frames...
[2025-02-21 06:30:55,916][03255] Decorrelating experience for 96 frames...
[2025-02-21 06:30:56,011][03254] Decorrelating experience for 96 frames...
[2025-02-21 06:30:56,009][03251] Decorrelating experience for 96 frames...
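The "Decorrelating experience" phase warms up each rollout worker by stepping its environment before any samples are used for learning, so workers don't start episodes in lockstep; progress is reported in 32-frame increments (0, 32, 64, 96 above). A hypothetical sketch of that warm-up loop — the chunking, random policy, and chunk count are illustrative assumptions, not Sample Factory's exact schedule:

```python
import random

def decorrelate(env_step, worker_idx: int, frames_per_chunk: int = 32,
                num_chunks: int = 4, log=print) -> int:
    """Warm up one rollout worker by stepping its env with random actions.

    Logs progress at the start of each chunk, mirroring the
    'Decorrelating experience for N frames...' lines in the log.
    """
    total = 0
    for _ in range(num_chunks):
        log(f"Worker {worker_idx}: decorrelating experience for {total} frames...")
        for _ in range(frames_per_chunk):
            env_step(random.randint(0, 4))  # 5 discrete actions, per the policy head
            total += 1
    return total
```

With four 32-frame chunks a worker logs at 0, 32, 64, and 96 frames before regular experience collection begins.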
[2025-02-21 06:30:56,629][00633] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-21 06:30:59,561][03235] Signal inference workers to stop experience collection...
[2025-02-21 06:30:59,576][03249] InferenceWorker_p0-w0: stopping experience collection
[2025-02-21 06:31:01,626][03235] Signal inference workers to resume experience collection...
[2025-02-21 06:31:01,627][03249] InferenceWorker_p0-w0: resuming experience collection
[2025-02-21 06:31:01,630][00633] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 216.6. Samples: 2166. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-21 06:31:01,631][00633] Avg episode reward: [(0, '2.959')]
[2025-02-21 06:31:06,629][00633] Fps is (10 sec: 2457.6, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 24576. Throughput: 0: 454.7. Samples: 6820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:06,633][00633] Avg episode reward: [(0, '3.932')]
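The recurring status lines report throughput over sliding 10-, 60-, and 300-second windows, alongside a lifetime frame total, and show `nan` until a window contains at least two samples. A small deque-based sketch of such a windowed FPS meter (an illustrative assumption, not the Sample Factory implementation):

```python
from collections import deque

class FpsMeter:
    """Frames-per-second over a set of sliding time windows.

    Each sample is a (timestamp, total_frames) pair; a window's FPS is the
    frame delta divided by the time delta to the oldest sample still inside
    that window, mirroring the 'Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)'
    lines in the log.
    """

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames), oldest first

    def record(self, now: float, total_frames: int) -> None:
        self.samples.append((now, total_frames))
        # Drop samples older than the largest window.
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self) -> dict:
        now, frames = self.samples[-1]
        out = {}
        for w in self.windows:
            # Oldest sample still inside this window.
            old = next((s for s in self.samples if now - s[0] <= w), None)
            if old is None or old[0] == now:
                out[w] = float("nan")  # matches the 'nan' shown before data arrives
            else:
                out[w] = (frames - old[1]) / (now - old[0])
        return out
```

Recording `(time, total_frames)` on each report interval and querying `fps()` reproduces the shape of these lines.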
[2025-02-21 06:31:09,833][03249] Updated weights for policy 0, policy_version 10 (0.0015)
[2025-02-21 06:31:11,629][00633] Fps is (10 sec: 4505.8, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 45056. Throughput: 0: 504.4. Samples: 10088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:11,631][00633] Avg episode reward: [(0, '4.303')]
[2025-02-21 06:31:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 617.4. Samples: 15436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:16,631][00633] Avg episode reward: [(0, '4.309')]
[2025-02-21 06:31:20,766][03249] Updated weights for policy 0, policy_version 20 (0.0015)
[2025-02-21 06:31:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 81920. Throughput: 0: 717.7. Samples: 21530. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:31:21,631][00633] Avg episode reward: [(0, '4.463')]
[2025-02-21 06:31:26,629][00633] Fps is (10 sec: 4505.5, 60 sec: 3042.7, 300 sec: 3042.7). Total num frames: 106496. Throughput: 0: 705.0. Samples: 24674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:31:26,631][00633] Avg episode reward: [(0, '4.528')]
[2025-02-21 06:31:26,632][03235] Saving new best policy, reward=4.528!
[2025-02-21 06:31:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 2969.6, 300 sec: 2969.6). Total num frames: 118784. Throughput: 0: 739.9. Samples: 29596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:31,631][00633] Avg episode reward: [(0, '4.334')]
[2025-02-21 06:31:32,010][03249] Updated weights for policy 0, policy_version 30 (0.0015)
[2025-02-21 06:31:36,629][00633] Fps is (10 sec: 3276.8, 60 sec: 3094.8, 300 sec: 3094.8). Total num frames: 139264. Throughput: 0: 800.2. Samples: 36008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:31:36,631][00633] Avg episode reward: [(0, '4.307')]
[2025-02-21 06:31:41,631][00633] Fps is (10 sec: 4095.5, 60 sec: 3194.8, 300 sec: 3194.8). Total num frames: 159744. Throughput: 0: 871.4. Samples: 39212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:41,632][00633] Avg episode reward: [(0, '4.462')]
[2025-02-21 06:31:41,827][03249] Updated weights for policy 0, policy_version 40 (0.0013)
[2025-02-21 06:31:46,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 180224. Throughput: 0: 934.2. Samples: 44206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:46,631][00633] Avg episode reward: [(0, '4.390')]
[2025-02-21 06:31:51,629][00633] Fps is (10 sec: 4096.6, 60 sec: 3345.1, 300 sec: 3345.1). Total num frames: 200704. Throughput: 0: 972.6. Samples: 50586. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:51,633][00633] Avg episode reward: [(0, '4.657')]
[2025-02-21 06:31:51,639][03235] Saving new best policy, reward=4.657!
[2025-02-21 06:31:52,400][03249] Updated weights for policy 0, policy_version 50 (0.0017)
[2025-02-21 06:31:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 217088. Throughput: 0: 966.8. Samples: 53594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:31:56,631][00633] Avg episode reward: [(0, '4.713')]
[2025-02-21 06:31:56,632][03235] Saving new best policy, reward=4.713!
[2025-02-21 06:32:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3393.8). Total num frames: 237568. Throughput: 0: 960.6. Samples: 58664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:01,836][00633] Avg episode reward: [(0, '4.500')]
[2025-02-21 06:32:03,474][03249] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-02-21 06:32:06,631][00633] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 966.4. Samples: 65020. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:06,632][00633] Avg episode reward: [(0, '4.527')]
[2025-02-21 06:32:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 953.5. Samples: 67580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:11,631][00633] Avg episode reward: [(0, '4.593')]
[2025-02-21 06:32:14,545][03249] Updated weights for policy 0, policy_version 70 (0.0016)
[2025-02-21 06:32:16,629][00633] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3469.6). Total num frames: 294912. Throughput: 0: 965.7. Samples: 73052. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:16,631][00633] Avg episode reward: [(0, '4.637')]
[2025-02-21 06:32:21,631][00633] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3504.3). Total num frames: 315392. Throughput: 0: 966.1. Samples: 79486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:21,634][00633] Avg episode reward: [(0, '4.670')]
[2025-02-21 06:32:21,642][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth...
[2025-02-21 06:32:25,626][03249] Updated weights for policy 0, policy_version 80 (0.0012)
[2025-02-21 06:32:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3492.4). Total num frames: 331776. Throughput: 0: 937.8. Samples: 81414. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:26,634][00633] Avg episode reward: [(0, '4.638')]
[2025-02-21 06:32:31,629][00633] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3522.6). Total num frames: 352256. Throughput: 0: 962.6. Samples: 87524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:31,633][00633] Avg episode reward: [(0, '4.596')]
[2025-02-21 06:32:35,108][03249] Updated weights for policy 0, policy_version 90 (0.0013)
[2025-02-21 06:32:36,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3549.9). Total num frames: 372736. Throughput: 0: 952.3. Samples: 93440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:36,637][00633] Avg episode reward: [(0, '4.415')]
[2025-02-21 06:32:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3537.5). Total num frames: 389120. Throughput: 0: 929.2. Samples: 95408. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:41,640][00633] Avg episode reward: [(0, '4.571')]
[2025-02-21 06:32:46,301][03249] Updated weights for policy 0, policy_version 100 (0.0014)
[2025-02-21 06:32:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3561.7). Total num frames: 409600. Throughput: 0: 959.8. Samples: 101856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:46,633][00633] Avg episode reward: [(0, '4.550')]
[2025-02-21 06:32:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 941.3. Samples: 107378. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:32:51,633][00633] Avg episode reward: [(0, '4.396')]
[2025-02-21 06:32:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3571.7). Total num frames: 446464. Throughput: 0: 943.8. Samples: 110050. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:32:56,633][00633] Avg episode reward: [(0, '4.550')]
[2025-02-21 06:32:57,216][03249] Updated weights for policy 0, policy_version 110 (0.0015)
[2025-02-21 06:33:01,636][00633] Fps is (10 sec: 4093.4, 60 sec: 3822.5, 300 sec: 3591.7). Total num frames: 466944. Throughput: 0: 967.4. Samples: 116590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:01,641][00633] Avg episode reward: [(0, '4.528')]
[2025-02-21 06:33:06,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 932.5. Samples: 121446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:06,631][00633] Avg episode reward: [(0, '4.442')]
[2025-02-21 06:33:08,018][03249] Updated weights for policy 0, policy_version 120 (0.0019)
[2025-02-21 06:33:11,630][00633] Fps is (10 sec: 3688.6, 60 sec: 3822.9, 300 sec: 3598.6). Total num frames: 503808. Throughput: 0: 962.3. Samples: 124716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:11,634][00633] Avg episode reward: [(0, '4.526')]
[2025-02-21 06:33:16,630][00633] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3644.0). Total num frames: 528384. Throughput: 0: 970.5. Samples: 131198. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:16,631][00633] Avg episode reward: [(0, '4.706')]
[2025-02-21 06:33:18,303][03249] Updated weights for policy 0, policy_version 130 (0.0022)
[2025-02-21 06:33:21,632][00633] Fps is (10 sec: 3685.6, 60 sec: 3754.6, 300 sec: 3604.4). Total num frames: 540672. Throughput: 0: 950.3. Samples: 136204. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:21,633][00633] Avg episode reward: [(0, '4.624')]
[2025-02-21 06:33:26,629][00633] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 979.2. Samples: 139472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:26,634][00633] Avg episode reward: [(0, '4.479')]
[2025-02-21 06:33:28,354][03249] Updated weights for policy 0, policy_version 140 (0.0012)
[2025-02-21 06:33:31,630][00633] Fps is (10 sec: 4096.8, 60 sec: 3822.9, 300 sec: 3635.2). Total num frames: 581632. Throughput: 0: 976.9. Samples: 145818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:31,631][00633] Avg episode reward: [(0, '4.680')]
[2025-02-21 06:33:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3649.2). Total num frames: 602112. Throughput: 0: 966.0. Samples: 150850. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:33:36,631][00633] Avg episode reward: [(0, '4.757')]
[2025-02-21 06:33:36,635][03235] Saving new best policy, reward=4.757!
[2025-02-21 06:33:39,238][03249] Updated weights for policy 0, policy_version 150 (0.0017)
[2025-02-21 06:33:41,629][00633] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3662.3). Total num frames: 622592. Throughput: 0: 978.4. Samples: 154080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:41,631][00633] Avg episode reward: [(0, '4.622')]
[2025-02-21 06:33:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3651.3). Total num frames: 638976. Throughput: 0: 963.0. Samples: 159920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:46,631][00633] Avg episode reward: [(0, '4.564')]
[2025-02-21 06:33:50,145][03249] Updated weights for policy 0, policy_version 160 (0.0015)
[2025-02-21 06:33:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3663.6). Total num frames: 659456. Throughput: 0: 981.7. Samples: 165622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:33:51,633][00633] Avg episode reward: [(0, '4.651')]
[2025-02-21 06:33:56,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3675.3). Total num frames: 679936. Throughput: 0: 982.9. Samples: 168944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:33:56,631][00633] Avg episode reward: [(0, '4.587')]
[2025-02-21 06:34:00,856][03249] Updated weights for policy 0, policy_version 170 (0.0017)
[2025-02-21 06:34:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3664.8). Total num frames: 696320. Throughput: 0: 955.4. Samples: 174190. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:01,631][00633] Avg episode reward: [(0, '4.596')]
[2025-02-21 06:34:06,631][00633] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3696.9). Total num frames: 720896. Throughput: 0: 980.9. Samples: 180344. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:34:06,632][00633] Avg episode reward: [(0, '4.581')]
[2025-02-21 06:34:10,468][03249] Updated weights for policy 0, policy_version 180 (0.0014)
[2025-02-21 06:34:11,636][00633] Fps is (10 sec: 4502.7, 60 sec: 3959.1, 300 sec: 3706.8). Total num frames: 741376. Throughput: 0: 980.0. Samples: 183580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:11,637][00633] Avg episode reward: [(0, '4.449')]
[2025-02-21 06:34:16,629][00633] Fps is (10 sec: 3686.9, 60 sec: 3823.0, 300 sec: 3696.4). Total num frames: 757760. Throughput: 0: 950.9. Samples: 188606. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:16,631][00633] Avg episode reward: [(0, '4.332')]
[2025-02-21 06:34:21,277][03249] Updated weights for policy 0, policy_version 190 (0.0013)
[2025-02-21 06:34:21,629][00633] Fps is (10 sec: 3688.7, 60 sec: 3959.6, 300 sec: 3705.9). Total num frames: 778240. Throughput: 0: 983.4. Samples: 195102. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:21,632][00633] Avg episode reward: [(0, '4.553')]
[2025-02-21 06:34:21,640][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_778240.pth...
[2025-02-21 06:34:26,631][00633] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3715.0). Total num frames: 798720. Throughput: 0: 983.4. Samples: 198334. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:34:26,632][00633] Avg episode reward: [(0, '4.699')]
[2025-02-21 06:34:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3705.0). Total num frames: 815104. Throughput: 0: 963.6. Samples: 203282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:31,634][00633] Avg episode reward: [(0, '4.610')]
[2025-02-21 06:34:32,245][03249] Updated weights for policy 0, policy_version 200 (0.0015)
[2025-02-21 06:34:36,630][00633] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3713.7). Total num frames: 835584. Throughput: 0: 979.3. Samples: 209692. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:36,631][00633] Avg episode reward: [(0, '4.508')]
[2025-02-21 06:34:41,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3704.2). Total num frames: 851968. Throughput: 0: 970.2. Samples: 212604. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:41,631][00633] Avg episode reward: [(0, '4.483')]
[2025-02-21 06:34:43,088][03249] Updated weights for policy 0, policy_version 210 (0.0013)
[2025-02-21 06:34:46,629][00633] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3712.5). Total num frames: 872448. Throughput: 0: 974.0. Samples: 218018. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:34:46,631][00633] Avg episode reward: [(0, '4.674')]
[2025-02-21 06:34:51,629][00633] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3737.6). Total num frames: 897024. Throughput: 0: 980.9. Samples: 224484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:34:51,632][00633] Avg episode reward: [(0, '4.643')]
[2025-02-21 06:34:52,605][03249] Updated weights for policy 0, policy_version 220 (0.0013)
[2025-02-21 06:34:56,630][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3711.5). Total num frames: 909312. Throughput: 0: 961.4. Samples: 226838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:34:56,633][00633] Avg episode reward: [(0, '4.506')]
[2025-02-21 06:35:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3735.6). Total num frames: 933888. Throughput: 0: 980.0. Samples: 232708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:01,633][00633] Avg episode reward: [(0, '4.380')]
[2025-02-21 06:35:03,501][03249] Updated weights for policy 0, policy_version 230 (0.0012)
[2025-02-21 06:35:06,630][00633] Fps is (10 sec: 4505.5, 60 sec: 3891.3, 300 sec: 3742.6). Total num frames: 954368. Throughput: 0: 974.3. Samples: 238944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:06,636][00633] Avg episode reward: [(0, '4.666')]
[2025-02-21 06:35:11,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3823.3, 300 sec: 3733.7). Total num frames: 970752. Throughput: 0: 946.9. Samples: 240942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:11,633][00633] Avg episode reward: [(0, '4.768')]
[2025-02-21 06:35:11,643][03235] Saving new best policy, reward=4.768!
[2025-02-21 06:35:14,509][03249] Updated weights for policy 0, policy_version 240 (0.0018)
[2025-02-21 06:35:16,630][00633] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3740.5). Total num frames: 991232. Throughput: 0: 976.5. Samples: 247224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:16,634][00633] Avg episode reward: [(0, '4.780')]
[2025-02-21 06:35:16,638][03235] Saving new best policy, reward=4.780!
[2025-02-21 06:35:21,629][00633] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3731.9). Total num frames: 1007616. Throughput: 0: 958.8. Samples: 252838. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:21,633][00633] Avg episode reward: [(0, '4.742')]
[2025-02-21 06:35:25,411][03249] Updated weights for policy 0, policy_version 250 (0.0012)
[2025-02-21 06:35:26,630][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3738.5). Total num frames: 1028096. Throughput: 0: 947.8. Samples: 255256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:26,631][00633] Avg episode reward: [(0, '4.606')]
[2025-02-21 06:35:31,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3744.9). Total num frames: 1048576. Throughput: 0: 971.9. Samples: 261752. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:31,630][00633] Avg episode reward: [(0, '4.508')]
[2025-02-21 06:35:35,941][03249] Updated weights for policy 0, policy_version 260 (0.0013)
[2025-02-21 06:35:36,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3736.7). Total num frames: 1064960. Throughput: 0: 941.1. Samples: 266834. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:36,631][00633] Avg episode reward: [(0, '4.514')]
[2025-02-21 06:35:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3742.9). Total num frames: 1085440. Throughput: 0: 954.5. Samples: 269792. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:35:41,631][00633] Avg episode reward: [(0, '4.657')]
[2025-02-21 06:35:45,989][03249] Updated weights for policy 0, policy_version 270 (0.0017)
[2025-02-21 06:35:46,629][00633] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 966.2. Samples: 276186. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:46,633][00633] Avg episode reward: [(0, '4.642')]
[2025-02-21 06:35:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 931.9. Samples: 280880. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:51,634][00633] Avg episode reward: [(0, '4.681')]
[2025-02-21 06:35:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 957.7. Samples: 284038. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:35:56,634][00633] Avg episode reward: [(0, '4.824')]
[2025-02-21 06:35:56,638][03235] Saving new best policy, reward=4.824!
[2025-02-21 06:35:57,360][03249] Updated weights for policy 0, policy_version 280 (0.0014)
[2025-02-21 06:36:01,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1163264. Throughput: 0: 957.5. Samples: 290310. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:01,631][00633] Avg episode reward: [(0, '4.679')]
[2025-02-21 06:36:06,629][00633] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1175552. Throughput: 0: 939.9. Samples: 295132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:06,633][00633] Avg episode reward: [(0, '4.552')]
[2025-02-21 06:36:08,526][03249] Updated weights for policy 0, policy_version 290 (0.0018)
[2025-02-21 06:36:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1200128. Throughput: 0: 956.8. Samples: 298310. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:36:11,636][00633] Avg episode reward: [(0, '4.470')]
[2025-02-21 06:36:16,631][00633] Fps is (10 sec: 4095.4, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 1216512. Throughput: 0: 948.8. Samples: 304448. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:36:16,632][00633] Avg episode reward: [(0, '4.591')]
[2025-02-21 06:36:19,391][03249] Updated weights for policy 0, policy_version 300 (0.0025)
[2025-02-21 06:36:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1236992. Throughput: 0: 953.8. Samples: 309754. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:21,631][00633] Avg episode reward: [(0, '4.792')]
[2025-02-21 06:36:21,639][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth...
[2025-02-21 06:36:21,736][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth
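The Saving/Removing pair above shows the learner's checkpoint rotation: each save writes `checkpoint_<version>_<env_frames>.pth` and then prunes the oldest file. A sketch of that keep-last-N policy — the name pattern matches the paths in the log, but the retention count of 2 is an assumption inferred from the observed delete:

```python
import os
import re

def save_with_rotation(checkpoint_dir: str, policy_version: int, env_frames: int,
                       data: bytes, keep_last: int = 2) -> str:
    """Write a checkpoint named like the log's files and prune old ones.

    Names follow checkpoint_{version:09d}_{frames}.pth (e.g.
    checkpoint_000000077_315392.pth); zero-padded versions make a plain
    lexical sort equal to a numeric sort, so pruning keeps the newest files.
    """
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path = os.path.join(checkpoint_dir, name)
    with open(path, "wb") as f:
        f.write(data)

    pattern = re.compile(r"checkpoint_\d+_\d+\.pth$")
    found = sorted(p for p in os.listdir(checkpoint_dir) if pattern.match(p))
    for old in found[:-keep_last]:
        os.remove(os.path.join(checkpoint_dir, old))
    return path
```

With `keep_last=2`, saving version 302 after versions 77 and 190 deletes the version-77 file, matching the sequence in the log.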
[2025-02-21 06:36:26,629][00633] Fps is (10 sec: 4096.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1257472. Throughput: 0: 959.1. Samples: 312952. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:36:26,635][00633] Avg episode reward: [(0, '4.834')]
[2025-02-21 06:36:26,639][03235] Saving new best policy, reward=4.834!
[2025-02-21 06:36:29,238][03249] Updated weights for policy 0, policy_version 310 (0.0013)
[2025-02-21 06:36:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1273856. Throughput: 0: 941.0. Samples: 318530. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:31,632][00633] Avg episode reward: [(0, '4.815')]
[2025-02-21 06:36:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 1294336. Throughput: 0: 964.0. Samples: 324260. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:36,634][00633] Avg episode reward: [(0, '4.625')]
[2025-02-21 06:36:39,964][03249] Updated weights for policy 0, policy_version 320 (0.0017)
[2025-02-21 06:36:41,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1314816. Throughput: 0: 966.2. Samples: 327518. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:41,631][00633] Avg episode reward: [(0, '4.799')]
[2025-02-21 06:36:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1331200. Throughput: 0: 938.9. Samples: 332560. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:36:46,634][00633] Avg episode reward: [(0, '4.976')]
[2025-02-21 06:36:46,638][03235] Saving new best policy, reward=4.976!
[2025-02-21 06:36:50,899][03249] Updated weights for policy 0, policy_version 330 (0.0016)
[2025-02-21 06:36:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1351680. Throughput: 0: 971.2. Samples: 338834. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:51,631][00633] Avg episode reward: [(0, '4.940')]
[2025-02-21 06:36:56,631][00633] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3846.1). Total num frames: 1372160. Throughput: 0: 971.5. Samples: 342030. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:36:56,633][00633] Avg episode reward: [(0, '4.915')]
[2025-02-21 06:37:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1388544. Throughput: 0: 943.3. Samples: 346896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:01,631][00633] Avg episode reward: [(0, '5.042')]
[2025-02-21 06:37:01,639][03235] Saving new best policy, reward=5.042!
[2025-02-21 06:37:02,016][03249] Updated weights for policy 0, policy_version 340 (0.0019)
[2025-02-21 06:37:06,629][00633] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1409024. Throughput: 0: 965.8. Samples: 353214. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:06,634][00633] Avg episode reward: [(0, '4.974')]
[2025-02-21 06:37:11,633][00633] Fps is (10 sec: 4094.4, 60 sec: 3822.7, 300 sec: 3846.0). Total num frames: 1429504. Throughput: 0: 966.4. Samples: 356444. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:11,635][00633] Avg episode reward: [(0, '4.855')]
[2025-02-21 06:37:13,002][03249] Updated weights for policy 0, policy_version 350 (0.0013)
[2025-02-21 06:37:16,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 1445888. Throughput: 0: 952.0. Samples: 361372. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:16,631][00633] Avg episode reward: [(0, '4.591')]
[2025-02-21 06:37:21,631][00633] Fps is (10 sec: 4096.8, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 1470464. Throughput: 0: 968.2. Samples: 367830. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:21,637][00633] Avg episode reward: [(0, '4.399')]
[2025-02-21 06:37:22,553][03249] Updated weights for policy 0, policy_version 360 (0.0013)
[2025-02-21 06:37:26,631][00633] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3832.2). Total num frames: 1482752. Throughput: 0: 954.5. Samples: 370474. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:26,636][00633] Avg episode reward: [(0, '4.529')]
[2025-02-21 06:37:31,629][00633] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1507328. Throughput: 0: 965.6. Samples: 376014. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:31,631][00633] Avg episode reward: [(0, '4.901')]
[2025-02-21 06:37:33,460][03249] Updated weights for policy 0, policy_version 370 (0.0013)
[2025-02-21 06:37:36,629][00633] Fps is (10 sec: 4506.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1527808. Throughput: 0: 967.8. Samples: 382386. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:36,633][00633] Avg episode reward: [(0, '4.919')]
[2025-02-21 06:37:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1544192. Throughput: 0: 943.4. Samples: 384482. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:41,631][00633] Avg episode reward: [(0, '4.980')]
[2025-02-21 06:37:44,420][03249] Updated weights for policy 0, policy_version 380 (0.0018)
[2025-02-21 06:37:46,632][00633] Fps is (10 sec: 3685.5, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 1564672. Throughput: 0: 970.9. Samples: 390590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:37:46,633][00633] Avg episode reward: [(0, '4.886')]
[2025-02-21 06:37:51,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1585152. Throughput: 0: 965.6. Samples: 396664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:51,631][00633] Avg episode reward: [(0, '4.887')]
[2025-02-21 06:37:55,262][03249] Updated weights for policy 0, policy_version 390 (0.0016)
[2025-02-21 06:37:56,630][00633] Fps is (10 sec: 3687.2, 60 sec: 3823.0, 300 sec: 3846.2). Total num frames: 1601536. Throughput: 0: 940.9. Samples: 398782. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:37:56,634][00633] Avg episode reward: [(0, '4.980')]
[2025-02-21 06:38:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1622016. Throughput: 0: 975.8. Samples: 405282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:38:01,634][00633] Avg episode reward: [(0, '5.000')]
[2025-02-21 06:38:05,249][03249] Updated weights for policy 0, policy_version 400 (0.0014)
[2025-02-21 06:38:06,631][00633] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1638400. Throughput: 0: 953.9. Samples: 410756. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:38:06,638][00633] Avg episode reward: [(0, '5.036')]
[2025-02-21 06:38:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 1658880. Throughput: 0: 955.2. Samples: 413458. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:38:11,633][00633] Avg episode reward: [(0, '4.884')]
[2025-02-21 06:38:15,821][03249] Updated weights for policy 0, policy_version 410 (0.0013)
[2025-02-21 06:38:16,629][00633] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1679360. Throughput: 0: 976.0. Samples: 419934. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:38:16,634][00633] Avg episode reward: [(0, '5.009')]
[2025-02-21 06:38:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 1695744. Throughput: 0: 945.9. Samples: 424950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:38:21,634][00633] Avg episode reward: [(0, '5.097')]
[2025-02-21 06:38:21,641][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth...
[2025-02-21 06:38:21,736][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_778240.pth
[2025-02-21 06:38:21,748][03235] Saving new best policy, reward=5.097!
[2025-02-21 06:38:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 1716224. Throughput: 0: 967.4. Samples: 428014. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:38:26,633][00633] Avg episode reward: [(0, '5.292')]
[2025-02-21 06:38:26,637][03235] Saving new best policy, reward=5.292!
[2025-02-21 06:38:26,641][03249] Updated weights for policy 0, policy_version 420 (0.0018)
[2025-02-21 06:38:31,631][00633] Fps is (10 sec: 4504.9, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 1740800. Throughput: 0: 973.9. Samples: 434416. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:38:31,635][00633] Avg episode reward: [(0, '5.204')]
[2025-02-21 06:38:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1753088. Throughput: 0: 947.8. Samples: 439316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:38:36,633][00633] Avg episode reward: [(0, '5.187')]
[2025-02-21 06:38:37,793][03249] Updated weights for policy 0, policy_version 430 (0.0024)
[2025-02-21 06:38:41,629][00633] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1777664. Throughput: 0: 972.7. Samples: 442554. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:38:41,634][00633] Avg episode reward: [(0, '5.792')]
[2025-02-21 06:38:41,641][03235] Saving new best policy, reward=5.792!
[2025-02-21 06:38:46,632][00633] Fps is (10 sec: 4504.6, 60 sec: 3891.2, 300 sec: 3859.9). Total num frames: 1798144. Throughput: 0: 974.1. Samples: 449120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:38:46,634][00633] Avg episode reward: [(0, '5.581')]
[2025-02-21 06:38:47,832][03249] Updated weights for policy 0, policy_version 440 (0.0016)
[2025-02-21 06:38:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1814528. Throughput: 0: 964.2. Samples: 454142. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:38:51,634][00633] Avg episode reward: [(0, '5.581')]
[2025-02-21 06:38:56,629][00633] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1835008. Throughput: 0: 978.1. Samples: 457472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:38:56,631][00633] Avg episode reward: [(0, '5.532')]
[2025-02-21 06:38:57,702][03249] Updated weights for policy 0, policy_version 450 (0.0015)
[2025-02-21 06:39:01,630][00633] Fps is (10 sec: 4095.7, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 1855488. Throughput: 0: 967.4. Samples: 463466. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:39:01,634][00633] Avg episode reward: [(0, '5.641')]
[2025-02-21 06:39:06,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3832.3). Total num frames: 1871872. Throughput: 0: 978.4. Samples: 468976. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:39:06,631][00633] Avg episode reward: [(0, '6.045')]
[2025-02-21 06:39:06,702][03235] Saving new best policy, reward=6.045!
[2025-02-21 06:39:08,610][03249] Updated weights for policy 0, policy_version 460 (0.0013)
[2025-02-21 06:39:11,629][00633] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1896448. Throughput: 0: 982.3. Samples: 472218. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:39:11,634][00633] Avg episode reward: [(0, '6.128')]
[2025-02-21 06:39:11,644][03235] Saving new best policy, reward=6.128!
[2025-02-21 06:39:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1908736. Throughput: 0: 959.7. Samples: 477600. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:39:16,631][00633] Avg episode reward: [(0, '6.306')]
[2025-02-21 06:39:16,649][03235] Saving new best policy, reward=6.306!
[2025-02-21 06:39:19,477][03249] Updated weights for policy 0, policy_version 470 (0.0012)
[2025-02-21 06:39:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1933312. Throughput: 0: 987.3. Samples: 483744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:39:21,634][00633] Avg episode reward: [(0, '6.648')]
[2025-02-21 06:39:21,646][03235] Saving new best policy, reward=6.648!
[2025-02-21 06:39:26,632][00633] Fps is (10 sec: 4504.5, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 1953792. Throughput: 0: 986.6. Samples: 486954. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:39:26,635][00633] Avg episode reward: [(0, '7.316')]
[2025-02-21 06:39:26,637][03235] Saving new best policy, reward=7.316!
[2025-02-21 06:39:30,465][03249] Updated weights for policy 0, policy_version 480 (0.0012)
[2025-02-21 06:39:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 1970176. Throughput: 0: 951.6. Samples: 491940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:39:31,634][00633] Avg episode reward: [(0, '7.820')]
[2025-02-21 06:39:31,642][03235] Saving new best policy, reward=7.820!
[2025-02-21 06:39:36,629][00633] Fps is (10 sec: 3687.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1990656. Throughput: 0: 984.7. Samples: 498454. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:39:36,633][00633] Avg episode reward: [(0, '7.787')]
[2025-02-21 06:39:39,783][03249] Updated weights for policy 0, policy_version 490 (0.0015)
[2025-02-21 06:39:41,634][00633] Fps is (10 sec: 4094.2, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 2011136. Throughput: 0: 984.2. Samples: 501764. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:39:41,637][00633] Avg episode reward: [(0, '7.556')]
[2025-02-21 06:39:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 2027520. Throughput: 0: 962.4. Samples: 506772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:39:46,631][00633] Avg episode reward: [(0, '7.363')]
[2025-02-21 06:39:50,532][03249] Updated weights for policy 0, policy_version 500 (0.0015)
[2025-02-21 06:39:51,630][00633] Fps is (10 sec: 4097.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2052096. Throughput: 0: 986.9. Samples: 513386. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:39:51,635][00633] Avg episode reward: [(0, '7.740')]
[2025-02-21 06:39:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2064384. Throughput: 0: 972.8. Samples: 515994. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:39:56,636][00633] Avg episode reward: [(0, '8.295')]
[2025-02-21 06:39:56,684][03235] Saving new best policy, reward=8.295!
[2025-02-21 06:40:01,629][00633] Fps is (10 sec: 3276.9, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 2084864. Throughput: 0: 967.4. Samples: 521134. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:40:01,634][00633] Avg episode reward: [(0, '8.511')]
[2025-02-21 06:40:01,643][03235] Saving new best policy, reward=8.511!
[2025-02-21 06:40:01,859][03249] Updated weights for policy 0, policy_version 510 (0.0013)
[2025-02-21 06:40:06,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2105344. Throughput: 0: 971.2. Samples: 527448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:40:06,634][00633] Avg episode reward: [(0, '8.852')]
[2025-02-21 06:40:06,636][03235] Saving new best policy, reward=8.852!
[2025-02-21 06:40:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2121728. Throughput: 0: 951.7. Samples: 529778. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:40:11,631][00633] Avg episode reward: [(0, '9.330')]
[2025-02-21 06:40:11,642][03235] Saving new best policy, reward=9.330!
[2025-02-21 06:40:12,929][03249] Updated weights for policy 0, policy_version 520 (0.0015)
[2025-02-21 06:40:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2142208. Throughput: 0: 969.2. Samples: 535554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:40:16,634][00633] Avg episode reward: [(0, '8.879')]
[2025-02-21 06:40:21,633][00633] Fps is (10 sec: 4503.9, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 2166784. Throughput: 0: 966.8. Samples: 541962. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:40:21,637][00633] Avg episode reward: [(0, '9.237')]
[2025-02-21 06:40:21,654][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000529_2166784.pth...
[2025-02-21 06:40:21,770][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth
[2025-02-21 06:40:23,330][03249] Updated weights for policy 0, policy_version 530 (0.0016)
[2025-02-21 06:40:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 2179072. Throughput: 0: 936.3. Samples: 543894. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:40:26,630][00633] Avg episode reward: [(0, '9.508')]
[2025-02-21 06:40:26,686][03235] Saving new best policy, reward=9.508!
[2025-02-21 06:40:31,629][00633] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2203648. Throughput: 0: 962.8. Samples: 550098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:40:31,633][00633] Avg episode reward: [(0, '10.574')]
[2025-02-21 06:40:31,640][03235] Saving new best policy, reward=10.574!
[2025-02-21 06:40:33,509][03249] Updated weights for policy 0, policy_version 540 (0.0013)
[2025-02-21 06:40:36,630][00633] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2220032. Throughput: 0: 939.9. Samples: 555682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:40:36,632][00633] Avg episode reward: [(0, '10.869')]
[2025-02-21 06:40:36,633][03235] Saving new best policy, reward=10.869!
[2025-02-21 06:40:41,629][00633] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3832.2). Total num frames: 2236416. Throughput: 0: 931.0. Samples: 557890. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:40:41,631][00633] Avg episode reward: [(0, '11.613')]
[2025-02-21 06:40:41,638][03235] Saving new best policy, reward=11.613!
[2025-02-21 06:40:44,852][03249] Updated weights for policy 0, policy_version 550 (0.0018)
[2025-02-21 06:40:46,629][00633] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2260992. Throughput: 0: 959.2. Samples: 564298. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:40:46,631][00633] Avg episode reward: [(0, '10.681')]
[2025-02-21 06:40:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2273280. Throughput: 0: 935.7. Samples: 569556. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:40:51,636][00633] Avg episode reward: [(0, '11.102')]
[2025-02-21 06:40:55,704][03249] Updated weights for policy 0, policy_version 560 (0.0016)
[2025-02-21 06:40:56,629][00633] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2293760. Throughput: 0: 947.7. Samples: 572426. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:40:56,630][00633] Avg episode reward: [(0, '10.829')]
[2025-02-21 06:41:01,629][00633] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2318336. Throughput: 0: 966.3. Samples: 579036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:41:01,635][00633] Avg episode reward: [(0, '10.993')]
[2025-02-21 06:41:06,472][03249] Updated weights for policy 0, policy_version 570 (0.0012)
[2025-02-21 06:41:06,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2334720. Throughput: 0: 931.5. Samples: 583874. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:41:06,631][00633] Avg episode reward: [(0, '10.302')]
[2025-02-21 06:41:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2355200. Throughput: 0: 962.0. Samples: 587182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:41:11,631][00633] Avg episode reward: [(0, '10.627')]
[2025-02-21 06:41:15,849][03249] Updated weights for policy 0, policy_version 580 (0.0012)
[2025-02-21 06:41:16,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2375680. Throughput: 0: 970.2. Samples: 593758. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:41:16,631][00633] Avg episode reward: [(0, '11.790')]
[2025-02-21 06:41:16,633][03235] Saving new best policy, reward=11.790!
[2025-02-21 06:41:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3846.1). Total num frames: 2392064. Throughput: 0: 953.1. Samples: 598570. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:41:21,631][00633] Avg episode reward: [(0, '11.872')]
[2025-02-21 06:41:21,644][03235] Saving new best policy, reward=11.872!
[2025-02-21 06:41:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2412544. Throughput: 0: 975.0. Samples: 601766. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:41:26,634][00633] Avg episode reward: [(0, '10.751')]
[2025-02-21 06:41:26,945][03249] Updated weights for policy 0, policy_version 590 (0.0021)
[2025-02-21 06:41:31,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2433024. Throughput: 0: 970.7. Samples: 607978. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:41:31,632][00633] Avg episode reward: [(0, '10.148')]
[2025-02-21 06:41:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 2449408. Throughput: 0: 964.1. Samples: 612940. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:41:36,634][00633] Avg episode reward: [(0, '11.326')]
[2025-02-21 06:41:38,130][03249] Updated weights for policy 0, policy_version 600 (0.0023)
[2025-02-21 06:41:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2469888. Throughput: 0: 971.6. Samples: 616146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:41:41,634][00633] Avg episode reward: [(0, '12.262')]
[2025-02-21 06:41:41,640][03235] Saving new best policy, reward=12.262!
[2025-02-21 06:41:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2486272. Throughput: 0: 947.7. Samples: 621684. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:41:46,635][00633] Avg episode reward: [(0, '12.884')]
[2025-02-21 06:41:46,637][03235] Saving new best policy, reward=12.884!
[2025-02-21 06:41:49,353][03249] Updated weights for policy 0, policy_version 610 (0.0015)
[2025-02-21 06:41:51,632][00633] Fps is (10 sec: 3685.6, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 2506752. Throughput: 0: 962.3. Samples: 627178. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:41:51,633][00633] Avg episode reward: [(0, '13.514')]
[2025-02-21 06:41:51,639][03235] Saving new best policy, reward=13.514!
[2025-02-21 06:41:56,630][00633] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2527232. Throughput: 0: 960.3. Samples: 630398. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:41:56,631][00633] Avg episode reward: [(0, '14.491')]
[2025-02-21 06:41:56,633][03235] Saving new best policy, reward=14.491!
[2025-02-21 06:42:00,049][03249] Updated weights for policy 0, policy_version 620 (0.0012)
[2025-02-21 06:42:01,629][00633] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2543616. Throughput: 0: 926.6. Samples: 635456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:42:01,635][00633] Avg episode reward: [(0, '15.208')]
[2025-02-21 06:42:01,647][03235] Saving new best policy, reward=15.208!
[2025-02-21 06:42:06,629][00633] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2564096. Throughput: 0: 955.7. Samples: 641576. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:06,634][00633] Avg episode reward: [(0, '15.257')]
[2025-02-21 06:42:06,636][03235] Saving new best policy, reward=15.257!
[2025-02-21 06:42:10,127][03249] Updated weights for policy 0, policy_version 630 (0.0017)
[2025-02-21 06:42:11,632][00633] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 2584576. Throughput: 0: 955.5. Samples: 644764. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:11,633][00633] Avg episode reward: [(0, '14.354')]
[2025-02-21 06:42:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2600960. Throughput: 0: 926.8. Samples: 649686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:16,630][00633] Avg episode reward: [(0, '13.374')]
[2025-02-21 06:42:20,952][03249] Updated weights for policy 0, policy_version 640 (0.0012)
[2025-02-21 06:42:21,629][00633] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2621440. Throughput: 0: 961.4. Samples: 656202. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:21,634][00633] Avg episode reward: [(0, '12.630')]
[2025-02-21 06:42:21,644][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000640_2621440.pth...
[2025-02-21 06:42:21,747][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth
[2025-02-21 06:42:26,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2641920. Throughput: 0: 961.2. Samples: 659400. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:26,640][00633] Avg episode reward: [(0, '13.512')]
[2025-02-21 06:42:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2658304. Throughput: 0: 947.4. Samples: 664318. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:31,634][00633] Avg episode reward: [(0, '14.127')]
[2025-02-21 06:42:31,909][03249] Updated weights for policy 0, policy_version 650 (0.0017)
[2025-02-21 06:42:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2678784. Throughput: 0: 970.4. Samples: 670844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:42:36,631][00633] Avg episode reward: [(0, '15.081')]
[2025-02-21 06:42:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2695168. Throughput: 0: 962.9. Samples: 673728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:41,633][00633] Avg episode reward: [(0, '17.309')]
[2025-02-21 06:42:41,712][03235] Saving new best policy, reward=17.309!
[2025-02-21 06:42:42,991][03249] Updated weights for policy 0, policy_version 660 (0.0016)
[2025-02-21 06:42:46,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2719744. Throughput: 0: 968.9. Samples: 679058. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:46,631][00633] Avg episode reward: [(0, '17.524')]
[2025-02-21 06:42:46,635][03235] Saving new best policy, reward=17.524!
[2025-02-21 06:42:51,630][00633] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 2740224. Throughput: 0: 976.4. Samples: 685516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:42:51,631][00633] Avg episode reward: [(0, '16.913')]
[2025-02-21 06:42:52,234][03249] Updated weights for policy 0, policy_version 670 (0.0025)
[2025-02-21 06:42:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 2756608. Throughput: 0: 957.1. Samples: 687830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:42:56,631][00633] Avg episode reward: [(0, '16.347')]
[2025-02-21 06:43:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2777088. Throughput: 0: 980.1. Samples: 693792. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:43:01,640][00633] Avg episode reward: [(0, '14.886')]
[2025-02-21 06:43:03,126][03249] Updated weights for policy 0, policy_version 680 (0.0019)
[2025-02-21 06:43:06,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2797568. Throughput: 0: 973.1. Samples: 699992. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:43:06,633][00633] Avg episode reward: [(0, '15.890')]
[2025-02-21 06:43:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3846.1). Total num frames: 2813952. Throughput: 0: 947.1. Samples: 702020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:43:11,634][00633] Avg episode reward: [(0, '15.895')]
[2025-02-21 06:43:14,010][03249] Updated weights for policy 0, policy_version 690 (0.0022)
[2025-02-21 06:43:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2834432. Throughput: 0: 981.6. Samples: 708490. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:43:16,635][00633] Avg episode reward: [(0, '16.401')]
[2025-02-21 06:43:21,631][00633] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 2854912. Throughput: 0: 966.3. Samples: 714328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:43:21,635][00633] Avg episode reward: [(0, '16.124')]
[2025-02-21 06:43:24,865][03249] Updated weights for policy 0, policy_version 700 (0.0021)
[2025-02-21 06:43:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2871296. Throughput: 0: 955.5. Samples: 716724. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:43:26,631][00633] Avg episode reward: [(0, '16.295')]
[2025-02-21 06:43:31,629][00633] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2895872. Throughput: 0: 982.6. Samples: 723276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:43:31,631][00633] Avg episode reward: [(0, '17.191')]
[2025-02-21 06:43:34,422][03249] Updated weights for policy 0, policy_version 710 (0.0013)
[2025-02-21 06:43:36,632][00633] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3846.0). Total num frames: 2912256. Throughput: 0: 955.8. Samples: 728530. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:43:36,633][00633] Avg episode reward: [(0, '16.189')]
[2025-02-21 06:43:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2932736. Throughput: 0: 969.7. Samples: 731468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:43:41,634][00633] Avg episode reward: [(0, '16.043')]
[2025-02-21 06:43:45,129][03249] Updated weights for policy 0, policy_version 720 (0.0020)
[2025-02-21 06:43:46,629][00633] Fps is (10 sec: 4096.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2953216. Throughput: 0: 981.4. Samples: 737956. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:43:46,631][00633] Avg episode reward: [(0, '15.850')]
[2025-02-21 06:43:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2969600. Throughput: 0: 951.0. Samples: 742788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:43:51,631][00633] Avg episode reward: [(0, '15.300')]
[2025-02-21 06:43:56,200][03249] Updated weights for policy 0, policy_version 730 (0.0014)
[2025-02-21 06:43:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2990080. Throughput: 0: 977.5. Samples: 746008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:43:56,630][00633] Avg episode reward: [(0, '15.431')]
[2025-02-21 06:44:01,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3010560. Throughput: 0: 976.7. Samples: 752442. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:01,631][00633] Avg episode reward: [(0, '15.857')]
[2025-02-21 06:44:06,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3026944. Throughput: 0: 954.7. Samples: 757286. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:06,636][00633] Avg episode reward: [(0, '16.030')]
[2025-02-21 06:44:07,104][03249] Updated weights for policy 0, policy_version 740 (0.0012)
[2025-02-21 06:44:11,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3047424. Throughput: 0: 975.5. Samples: 760622. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:11,634][00633] Avg episode reward: [(0, '15.656')]
[2025-02-21 06:44:16,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3067904. Throughput: 0: 969.9. Samples: 766922. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:16,631][00633] Avg episode reward: [(0, '15.856')]
[2025-02-21 06:44:17,369][03249] Updated weights for policy 0, policy_version 750 (0.0013)
[2025-02-21 06:44:21,629][00633] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 3088384. Throughput: 0: 973.7. Samples: 772344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:21,631][00633] Avg episode reward: [(0, '16.615')]
[2025-02-21 06:44:21,637][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000754_3088384.pth...
[2025-02-21 06:44:21,732][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000529_2166784.pth
[2025-02-21 06:44:26,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3108864. Throughput: 0: 980.2. Samples: 775578. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:26,631][00633] Avg episode reward: [(0, '16.065')]
[2025-02-21 06:44:27,070][03249] Updated weights for policy 0, policy_version 760 (0.0013)
[2025-02-21 06:44:31,632][00633] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3846.0). Total num frames: 3125248. Throughput: 0: 964.7. Samples: 781370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:31,637][00633] Avg episode reward: [(0, '16.002')]
[2025-02-21 06:44:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 3145728. Throughput: 0: 987.7. Samples: 787234. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:36,631][00633] Avg episode reward: [(0, '16.158')]
[2025-02-21 06:44:37,908][03249] Updated weights for policy 0, policy_version 770 (0.0016)
[2025-02-21 06:44:41,629][00633] Fps is (10 sec: 4096.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3166208. Throughput: 0: 989.9. Samples: 790554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:41,633][00633] Avg episode reward: [(0, '15.648')]
[2025-02-21 06:44:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3182592. Throughput: 0: 958.4. Samples: 795572. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:44:46,634][00633] Avg episode reward: [(0, '16.317')]
[2025-02-21 06:44:48,765][03249] Updated weights for policy 0, policy_version 780 (0.0013)
[2025-02-21 06:44:51,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3207168. Throughput: 0: 991.9. Samples: 801920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:51,635][00633] Avg episode reward: [(0, '16.032')]
[2025-02-21 06:44:56,629][00633] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3227648. Throughput: 0: 992.1. Samples: 805266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:44:56,634][00633] Avg episode reward: [(0, '15.715')]
[2025-02-21 06:44:59,565][03249] Updated weights for policy 0, policy_version 790 (0.0014)
[2025-02-21 06:45:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3244032. Throughput: 0: 964.0. Samples: 810300. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:45:01,634][00633] Avg episode reward: [(0, '16.717')]
[2025-02-21 06:45:06,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3264512. Throughput: 0: 988.5. Samples: 816826. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:06,633][00633] Avg episode reward: [(0, '15.446')]
[2025-02-21 06:45:09,032][03249] Updated weights for policy 0, policy_version 800 (0.0012)
[2025-02-21 06:45:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3280896. Throughput: 0: 987.2. Samples: 820004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:11,639][00633] Avg episode reward: [(0, '16.983')]
[2025-02-21 06:45:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3301376. Throughput: 0: 969.6. Samples: 824998. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:16,635][00633] Avg episode reward: [(0, '17.080')]
[2025-02-21 06:45:19,660][03249] Updated weights for policy 0, policy_version 810 (0.0023)
[2025-02-21 06:45:21,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3321856. Throughput: 0: 987.0. Samples: 831648. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:45:21,634][00633] Avg episode reward: [(0, '15.694')]
[2025-02-21 06:45:26,631][00633] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3846.1). Total num frames: 3338240. Throughput: 0: 972.4. Samples: 834312. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:45:26,635][00633] Avg episode reward: [(0, '16.521')]
[2025-02-21 06:45:30,401][03249] Updated weights for policy 0, policy_version 820 (0.0014)
[2025-02-21 06:45:31,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3873.9). Total num frames: 3362816. Throughput: 0: 986.0. Samples: 839942. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:31,631][00633] Avg episode reward: [(0, '16.527')]
[2025-02-21 06:45:36,629][00633] Fps is (10 sec: 4506.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3383296. Throughput: 0: 990.4. Samples: 846490. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:45:36,631][00633] Avg episode reward: [(0, '16.314')]
[2025-02-21 06:45:41,297][03249] Updated weights for policy 0, policy_version 830 (0.0016)
[2025-02-21 06:45:41,630][00633] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3399680. Throughput: 0: 961.3. Samples: 848524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:45:41,634][00633] Avg episode reward: [(0, '16.597')]
[2025-02-21 06:45:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3420160. Throughput: 0: 987.0. Samples: 854716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:46,631][00633] Avg episode reward: [(0, '16.870')]
[2025-02-21 06:45:50,767][03249] Updated weights for policy 0, policy_version 840 (0.0012)
[2025-02-21 06:45:51,636][00633] Fps is (10 sec: 4093.5, 60 sec: 3890.8, 300 sec: 3887.6). Total num frames: 3440640. Throughput: 0: 977.5. Samples: 860820. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:45:51,637][00633] Avg episode reward: [(0, '17.907')]
[2025-02-21 06:45:51,644][03235] Saving new best policy, reward=17.907!
[2025-02-21 06:45:56,632][00633] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 3457024. Throughput: 0: 954.5. Samples: 862960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:45:56,633][00633] Avg episode reward: [(0, '18.825')]
[2025-02-21 06:45:56,639][03235] Saving new best policy, reward=18.825!
[2025-02-21 06:46:01,629][00633] Fps is (10 sec: 3688.7, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3477504. Throughput: 0: 987.5. Samples: 869434. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:01,633][00633] Avg episode reward: [(0, '18.699')]
[2025-02-21 06:46:01,680][03249] Updated weights for policy 0, policy_version 850 (0.0015)
[2025-02-21 06:46:06,632][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 3497984. Throughput: 0: 962.8. Samples: 874974. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:06,633][00633] Avg episode reward: [(0, '18.975')]
[2025-02-21 06:46:06,635][03235] Saving new best policy, reward=18.975!
[2025-02-21 06:46:11,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3518464. Throughput: 0: 962.5. Samples: 877624. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:46:11,634][00633] Avg episode reward: [(0, '19.914')]
[2025-02-21 06:46:11,646][03235] Saving new best policy, reward=19.914!
[2025-02-21 06:46:12,449][03249] Updated weights for policy 0, policy_version 860 (0.0017)
[2025-02-21 06:46:16,629][00633] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3538944. Throughput: 0: 981.3. Samples: 884100. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:46:16,631][00633] Avg episode reward: [(0, '20.320')]
[2025-02-21 06:46:16,634][03235] Saving new best policy, reward=20.320!
[2025-02-21 06:46:21,630][00633] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3555328. Throughput: 0: 945.5. Samples: 889040. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:46:21,634][00633] Avg episode reward: [(0, '19.895')]
[2025-02-21 06:46:21,646][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000868_3555328.pth...
[2025-02-21 06:46:21,739][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000640_2621440.pth
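The save/remove pair above shows the rolling checkpoint scheme: filenames encode the policy version and the total environment frames, and an older checkpoint is deleted each time a new one is written. In this run the frame count is exactly 4096 times the policy version (868 × 4096 = 3,555,328) — that factor is inferred from this log, not a guaranteed constant. A minimal sketch of parsing the naming convention:

```python
import re

def parse_checkpoint_name(path: str) -> tuple[int, int]:
    """Extract (policy_version, total_frames) from a checkpoint
    filename like 'checkpoint_000000868_3555328.pth'."""
    m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
    if m is None:
        raise ValueError(f"not a checkpoint path: {path}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint_name(
    "/content/train_dir/default_experiment/checkpoint_p0/"
    "checkpoint_000000868_3555328.pth"
)
# In this particular run, frames advance 4096 per policy version:
# 868 * 4096 == 3555328
```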
[2025-02-21 06:46:23,503][03249] Updated weights for policy 0, policy_version 870 (0.0012)
[2025-02-21 06:46:26,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 3575808. Throughput: 0: 970.7. Samples: 892206. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:46:26,635][00633] Avg episode reward: [(0, '20.598')]
[2025-02-21 06:46:26,638][03235] Saving new best policy, reward=20.598!
[2025-02-21 06:46:31,629][00633] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3596288. Throughput: 0: 975.2. Samples: 898602. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:46:31,631][00633] Avg episode reward: [(0, '19.860')]
[2025-02-21 06:46:33,931][03249] Updated weights for policy 0, policy_version 880 (0.0014)
[2025-02-21 06:46:36,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3612672. Throughput: 0: 951.2. Samples: 903620. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:46:36,634][00633] Avg episode reward: [(0, '19.158')]
[2025-02-21 06:46:41,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3633152. Throughput: 0: 975.3. Samples: 906846. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:41,632][00633] Avg episode reward: [(0, '18.844')]
[2025-02-21 06:46:43,932][03249] Updated weights for policy 0, policy_version 890 (0.0017)
[2025-02-21 06:46:46,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 3653632. Throughput: 0: 978.1. Samples: 913448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:46,634][00633] Avg episode reward: [(0, '18.061')]
[2025-02-21 06:46:51,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3873.8). Total num frames: 3670016. Throughput: 0: 968.8. Samples: 918566. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:46:51,640][00633] Avg episode reward: [(0, '18.716')]
[2025-02-21 06:46:54,460][03249] Updated weights for policy 0, policy_version 900 (0.0018)
[2025-02-21 06:46:56,629][00633] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 3694592. Throughput: 0: 983.0. Samples: 921860. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:46:56,634][00633] Avg episode reward: [(0, '18.784')]
[2025-02-21 06:47:01,630][00633] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 3710976. Throughput: 0: 973.4. Samples: 927906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:01,632][00633] Avg episode reward: [(0, '19.546')]
[2025-02-21 06:47:05,156][03249] Updated weights for policy 0, policy_version 910 (0.0013)
[2025-02-21 06:47:06,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3887.8). Total num frames: 3731456. Throughput: 0: 989.2. Samples: 933554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:06,631][00633] Avg episode reward: [(0, '20.245')]
[2025-02-21 06:47:11,629][00633] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3751936. Throughput: 0: 989.6. Samples: 936738. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:47:11,631][00633] Avg episode reward: [(0, '20.760')]
[2025-02-21 06:47:11,642][03235] Saving new best policy, reward=20.760!
[2025-02-21 06:47:16,066][03249] Updated weights for policy 0, policy_version 920 (0.0021)
[2025-02-21 06:47:16,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3768320. Throughput: 0: 964.7. Samples: 942012. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:16,633][00633] Avg episode reward: [(0, '21.315')]
[2025-02-21 06:47:16,636][03235] Saving new best policy, reward=21.315!
[2025-02-21 06:47:21,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3788800. Throughput: 0: 986.5. Samples: 948014. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:47:21,636][00633] Avg episode reward: [(0, '21.805')]
[2025-02-21 06:47:21,652][03235] Saving new best policy, reward=21.805!
[2025-02-21 06:47:25,751][03249] Updated weights for policy 0, policy_version 930 (0.0013)
[2025-02-21 06:47:26,631][00633] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 3809280. Throughput: 0: 986.1. Samples: 951224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:47:26,634][00633] Avg episode reward: [(0, '20.612')]
[2025-02-21 06:47:31,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3825664. Throughput: 0: 948.7. Samples: 956140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:31,634][00633] Avg episode reward: [(0, '20.119')]
[2025-02-21 06:47:36,624][03249] Updated weights for policy 0, policy_version 940 (0.0016)
[2025-02-21 06:47:36,629][00633] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3850240. Throughput: 0: 981.8. Samples: 962748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:36,635][00633] Avg episode reward: [(0, '18.520')]
[2025-02-21 06:47:41,633][00633] Fps is (10 sec: 4094.7, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 3866624. Throughput: 0: 978.2. Samples: 965880. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:41,634][00633] Avg episode reward: [(0, '19.359')]
[2025-02-21 06:47:46,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3887104. Throughput: 0: 955.7. Samples: 970910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:46,634][00633] Avg episode reward: [(0, '18.524')]
[2025-02-21 06:47:47,593][03249] Updated weights for policy 0, policy_version 950 (0.0018)
[2025-02-21 06:47:51,629][00633] Fps is (10 sec: 4097.3, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3907584. Throughput: 0: 975.1. Samples: 977432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:47:51,634][00633] Avg episode reward: [(0, '18.558')]
[2025-02-21 06:47:56,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3923968. Throughput: 0: 972.9. Samples: 980520. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:47:56,635][00633] Avg episode reward: [(0, '19.072')]
[2025-02-21 06:47:58,294][03249] Updated weights for policy 0, policy_version 960 (0.0018)
[2025-02-21 06:48:01,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 3944448. Throughput: 0: 970.8. Samples: 985696. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:48:01,635][00633] Avg episode reward: [(0, '19.279')]
[2025-02-21 06:48:06,629][00633] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3964928. Throughput: 0: 984.3. Samples: 992306. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:48:06,635][00633] Avg episode reward: [(0, '18.612')]
[2025-02-21 06:48:07,555][03249] Updated weights for policy 0, policy_version 970 (0.0013)
[2025-02-21 06:48:11,629][00633] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3981312. Throughput: 0: 967.5. Samples: 994760. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:48:11,633][00633] Avg episode reward: [(0, '18.726')]
[2025-02-21 06:48:16,545][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:48:16,547][03235] Stopping Batcher_0...
[2025-02-21 06:48:16,552][03235] Loop batcher_evt_loop terminating...
[2025-02-21 06:48:16,551][00633] Component Batcher_0 stopped!
[2025-02-21 06:48:16,557][00633] Component RolloutWorker_w0 process died already! Don't wait for it.
[2025-02-21 06:48:16,560][00633] Component RolloutWorker_w2 process died already! Don't wait for it.
[2025-02-21 06:48:16,564][00633] Component RolloutWorker_w3 process died already! Don't wait for it.
[2025-02-21 06:48:16,565][00633] Component RolloutWorker_w7 process died already! Don't wait for it.
[2025-02-21 06:48:16,620][03249] Weights refcount: 2 0
[2025-02-21 06:48:16,623][00633] Component InferenceWorker_p0-w0 stopped!
[2025-02-21 06:48:16,627][03249] Stopping InferenceWorker_p0-w0...
[2025-02-21 06:48:16,629][03249] Loop inference_proc0-0_evt_loop terminating...
[2025-02-21 06:48:16,638][03235] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000754_3088384.pth
[2025-02-21 06:48:16,652][03235] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:48:16,812][03235] Stopping LearnerWorker_p0...
[2025-02-21 06:48:16,818][03235] Loop learner_proc0_evt_loop terminating...
[2025-02-21 06:48:16,818][00633] Component LearnerWorker_p0 stopped!
[2025-02-21 06:48:16,859][00633] Component RolloutWorker_w1 stopped!
[2025-02-21 06:48:16,864][03251] Stopping RolloutWorker_w1...
[2025-02-21 06:48:16,865][03251] Loop rollout_proc1_evt_loop terminating...
[2025-02-21 06:48:16,873][00633] Component RolloutWorker_w5 stopped!
[2025-02-21 06:48:16,876][03254] Stopping RolloutWorker_w5...
[2025-02-21 06:48:16,877][03254] Loop rollout_proc5_evt_loop terminating...
[2025-02-21 06:48:16,952][03255] Stopping RolloutWorker_w6...
[2025-02-21 06:48:16,952][00633] Component RolloutWorker_w6 stopped!
[2025-02-21 06:48:16,954][03255] Loop rollout_proc6_evt_loop terminating...
[2025-02-21 06:48:16,960][03253] Stopping RolloutWorker_w4...
[2025-02-21 06:48:16,960][00633] Component RolloutWorker_w4 stopped!
[2025-02-21 06:48:16,962][00633] Waiting for process learner_proc0 to stop...
[2025-02-21 06:48:16,961][03253] Loop rollout_proc4_evt_loop terminating...
[2025-02-21 06:48:18,364][00633] Waiting for process inference_proc0-0 to join...
[2025-02-21 06:48:18,366][00633] Waiting for process rollout_proc0 to join...
[2025-02-21 06:48:18,368][00633] Waiting for process rollout_proc1 to join...
[2025-02-21 06:48:19,096][00633] Waiting for process rollout_proc2 to join...
[2025-02-21 06:48:19,097][00633] Waiting for process rollout_proc3 to join...
[2025-02-21 06:48:19,098][00633] Waiting for process rollout_proc4 to join...
[2025-02-21 06:48:19,101][00633] Waiting for process rollout_proc5 to join...
[2025-02-21 06:48:19,102][00633] Waiting for process rollout_proc6 to join...
[2025-02-21 06:48:19,103][00633] Waiting for process rollout_proc7 to join...
[2025-02-21 06:48:19,104][00633] Batcher 0 profile tree view:
batching: 22.1334, releasing_batches: 0.0303
[2025-02-21 06:48:19,105][00633] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 405.8358
update_model: 9.2132
weight_update: 0.0036
one_step: 0.0102
handle_policy_step: 591.4368
deserialize: 14.4476, stack: 3.5130, obs_to_device_normalize: 132.5441, forward: 308.9790, send_messages: 22.1924
prepare_outputs: 83.2722
to_cpu: 52.2151
[2025-02-21 06:48:19,106][00633] Learner 0 profile tree view:
misc: 0.0046, prepare_batch: 12.3153
train: 66.1561
epoch_init: 0.0057, minibatch_init: 0.0055, losses_postprocess: 0.5684, kl_divergence: 0.5593, after_optimizer: 32.1202
calculate_losses: 22.0446
losses_init: 0.0032, forward_head: 1.1831, bptt_initial: 15.1547, tail: 0.8670, advantages_returns: 0.1997, losses: 2.8600
bptt: 1.5676
bptt_forward_core: 1.5002
update: 10.3855
clip: 0.8488
[2025-02-21 06:48:19,108][00633] Loop Runner_EvtLoop terminating...
[2025-02-21 06:48:19,109][00633] Runner profile tree view:
main_loop: 1070.6298
[2025-02-21 06:48:19,110][00633] Collected {0: 4005888}, FPS: 3741.6
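The final summary line is internally consistent with the profile tree above it: the reported FPS is simply the total collected frames divided by the runner's main-loop wall time. A quick check, with both numbers taken directly from the log:

```python
total_frames = 4_005_888        # from "Collected {0: 4005888}"
main_loop_seconds = 1070.6298   # from the Runner profile tree view

fps = total_frames / main_loop_seconds
print(round(fps, 1))  # matches the logged "FPS: 3741.6"
```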
[2025-02-21 06:55:30,394][00633] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-21 06:55:30,397][00633] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-21 06:55:30,399][00633] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-21 06:55:30,400][00633] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-21 06:55:30,401][00633] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 06:55:30,402][00633] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-21 06:55:30,402][00633] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 06:55:30,403][00633] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-21 06:55:30,404][00633] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-21 06:55:30,405][00633] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-21 06:55:30,406][00633] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-21 06:55:30,406][00633] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-21 06:55:30,407][00633] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-21 06:55:30,408][00633] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-21 06:55:30,410][00633] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-21 06:55:30,458][00633] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:55:30,461][00633] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:55:30,463][00633] RunningMeanStd input shape: (1,)
[2025-02-21 06:55:30,491][00633] ConvEncoder: input_channels=3
[2025-02-21 06:55:30,652][00633] Conv encoder output size: 512
[2025-02-21 06:55:30,654][00633] Policy head output size: 512
[2025-02-21 06:55:30,971][00633] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:55:32,042][00633] Num frames 100...
[2025-02-21 06:55:32,220][00633] Num frames 200...
[2025-02-21 06:55:32,401][00633] Num frames 300...
[2025-02-21 06:55:32,546][00633] Num frames 400...
[2025-02-21 06:55:32,685][00633] Num frames 500...
[2025-02-21 06:55:32,813][00633] Num frames 600...
[2025-02-21 06:55:32,944][00633] Num frames 700...
[2025-02-21 06:55:33,073][00633] Num frames 800...
[2025-02-21 06:55:33,208][00633] Num frames 900...
[2025-02-21 06:55:33,341][00633] Num frames 1000...
[2025-02-21 06:55:33,470][00633] Num frames 1100...
[2025-02-21 06:55:33,556][00633] Avg episode rewards: #0: 23.210, true rewards: #0: 11.210
[2025-02-21 06:55:33,556][00633] Avg episode reward: 23.210, avg true_objective: 11.210
[2025-02-21 06:55:33,674][00633] Num frames 1200...
[2025-02-21 06:55:33,802][00633] Num frames 1300...
[2025-02-21 06:55:33,927][00633] Num frames 1400...
[2025-02-21 06:55:34,054][00633] Num frames 1500...
[2025-02-21 06:55:34,183][00633] Num frames 1600...
[2025-02-21 06:55:34,320][00633] Num frames 1700...
[2025-02-21 06:55:34,448][00633] Num frames 1800...
[2025-02-21 06:55:34,577][00633] Num frames 1900...
[2025-02-21 06:55:34,716][00633] Num frames 2000...
[2025-02-21 06:55:34,846][00633] Num frames 2100...
[2025-02-21 06:55:34,975][00633] Num frames 2200...
[2025-02-21 06:55:35,087][00633] Avg episode rewards: #0: 24.205, true rewards: #0: 11.205
[2025-02-21 06:55:35,087][00633] Avg episode reward: 24.205, avg true_objective: 11.205
[2025-02-21 06:55:35,163][00633] Num frames 2300...
[2025-02-21 06:55:35,299][00633] Num frames 2400...
[2025-02-21 06:55:35,426][00633] Num frames 2500...
[2025-02-21 06:55:35,554][00633] Num frames 2600...
[2025-02-21 06:55:35,679][00633] Num frames 2700...
[2025-02-21 06:55:35,810][00633] Num frames 2800...
[2025-02-21 06:55:35,935][00633] Num frames 2900...
[2025-02-21 06:55:36,061][00633] Num frames 3000...
[2025-02-21 06:55:36,189][00633] Num frames 3100...
[2025-02-21 06:55:36,298][00633] Avg episode rewards: #0: 21.790, true rewards: #0: 10.457
[2025-02-21 06:55:36,298][00633] Avg episode reward: 21.790, avg true_objective: 10.457
[2025-02-21 06:55:36,389][00633] Num frames 3200...
[2025-02-21 06:55:36,519][00633] Num frames 3300...
[2025-02-21 06:55:36,644][00633] Num frames 3400...
[2025-02-21 06:55:36,781][00633] Num frames 3500...
[2025-02-21 06:55:36,909][00633] Num frames 3600...
[2025-02-21 06:55:37,036][00633] Num frames 3700...
[2025-02-21 06:55:37,165][00633] Num frames 3800...
[2025-02-21 06:55:37,233][00633] Avg episode rewards: #0: 19.273, true rewards: #0: 9.522
[2025-02-21 06:55:37,234][00633] Avg episode reward: 19.273, avg true_objective: 9.522
[2025-02-21 06:55:37,348][00633] Num frames 3900...
[2025-02-21 06:55:37,477][00633] Num frames 4000...
[2025-02-21 06:55:37,606][00633] Num frames 4100...
[2025-02-21 06:55:37,733][00633] Num frames 4200...
[2025-02-21 06:55:37,870][00633] Num frames 4300...
[2025-02-21 06:55:38,003][00633] Num frames 4400...
[2025-02-21 06:55:38,131][00633] Num frames 4500...
[2025-02-21 06:55:38,263][00633] Num frames 4600...
[2025-02-21 06:55:38,394][00633] Num frames 4700...
[2025-02-21 06:55:38,528][00633] Num frames 4800...
[2025-02-21 06:55:38,708][00633] Avg episode rewards: #0: 20.394, true rewards: #0: 9.794
[2025-02-21 06:55:38,709][00633] Avg episode reward: 20.394, avg true_objective: 9.794
[2025-02-21 06:55:38,715][00633] Num frames 4900...
[2025-02-21 06:55:38,849][00633] Num frames 5000...
[2025-02-21 06:55:38,977][00633] Num frames 5100...
[2025-02-21 06:55:39,103][00633] Num frames 5200...
[2025-02-21 06:55:39,238][00633] Num frames 5300...
[2025-02-21 06:55:39,368][00633] Num frames 5400...
[2025-02-21 06:55:39,475][00633] Avg episode rewards: #0: 18.735, true rewards: #0: 9.068
[2025-02-21 06:55:39,476][00633] Avg episode reward: 18.735, avg true_objective: 9.068
[2025-02-21 06:55:39,554][00633] Num frames 5500...
[2025-02-21 06:55:39,691][00633] Num frames 5600...
[2025-02-21 06:55:39,844][00633] Num frames 5700...
[2025-02-21 06:55:39,973][00633] Num frames 5800...
[2025-02-21 06:55:40,103][00633] Num frames 5900...
[2025-02-21 06:55:40,241][00633] Num frames 6000...
[2025-02-21 06:55:40,369][00633] Num frames 6100...
[2025-02-21 06:55:40,498][00633] Num frames 6200...
[2025-02-21 06:55:40,627][00633] Num frames 6300...
[2025-02-21 06:55:40,758][00633] Num frames 6400...
[2025-02-21 06:55:40,809][00633] Avg episode rewards: #0: 19.000, true rewards: #0: 9.143
[2025-02-21 06:55:40,810][00633] Avg episode reward: 19.000, avg true_objective: 9.143
[2025-02-21 06:55:40,945][00633] Num frames 6500...
[2025-02-21 06:55:41,074][00633] Num frames 6600...
[2025-02-21 06:55:41,202][00633] Num frames 6700...
[2025-02-21 06:55:41,286][00633] Avg episode rewards: #0: 17.400, true rewards: #0: 8.400
[2025-02-21 06:55:41,286][00633] Avg episode reward: 17.400, avg true_objective: 8.400
[2025-02-21 06:55:41,391][00633] Num frames 6800...
[2025-02-21 06:55:41,520][00633] Num frames 6900...
[2025-02-21 06:55:41,647][00633] Num frames 7000...
[2025-02-21 06:55:41,773][00633] Num frames 7100...
[2025-02-21 06:55:41,911][00633] Num frames 7200...
[2025-02-21 06:55:42,038][00633] Num frames 7300...
[2025-02-21 06:55:42,164][00633] Num frames 7400...
[2025-02-21 06:55:42,301][00633] Num frames 7500...
[2025-02-21 06:55:42,427][00633] Num frames 7600...
[2025-02-21 06:55:42,491][00633] Avg episode rewards: #0: 17.563, true rewards: #0: 8.452
[2025-02-21 06:55:42,492][00633] Avg episode reward: 17.563, avg true_objective: 8.452
[2025-02-21 06:55:42,655][00633] Num frames 7700...
[2025-02-21 06:55:42,827][00633] Num frames 7800...
[2025-02-21 06:55:43,000][00633] Num frames 7900...
[2025-02-21 06:55:43,171][00633] Num frames 8000...
[2025-02-21 06:55:43,358][00633] Num frames 8100...
[2025-02-21 06:55:43,525][00633] Num frames 8200...
[2025-02-21 06:55:43,690][00633] Num frames 8300...
[2025-02-21 06:55:43,878][00633] Num frames 8400...
[2025-02-21 06:55:44,061][00633] Num frames 8500...
[2025-02-21 06:55:44,237][00633] Num frames 8600...
[2025-02-21 06:55:44,422][00633] Num frames 8700...
[2025-02-21 06:55:44,640][00633] Avg episode rewards: #0: 18.691, true rewards: #0: 8.791
[2025-02-21 06:55:44,641][00633] Avg episode reward: 18.691, avg true_objective: 8.791
[2025-02-21 06:56:32,957][00633] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
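The evaluation loop above prints a cumulative average after each episode, so the individual episode rewards can be recovered from consecutive averages via r_n = n·avg_n − (n−1)·avg_{n−1}. For instance, the jump from 23.210 to 24.205 between the first two printouts implies episode 2 scored 25.20. A sketch of that recovery, using the ten averages logged above:

```python
def per_episode_rewards(cumulative_avgs):
    """Recover per-episode rewards from the running averages
    printed after each evaluation episode."""
    rewards = []
    prev_sum = 0.0
    for n, avg in enumerate(cumulative_avgs, start=1):
        total = n * avg          # sum of rewards through episode n
        rewards.append(round(total - prev_sum, 2))
        prev_sum = total
    return rewards

# Cumulative "Avg episode rewards" from the 10 evaluation episodes above
avgs = [23.210, 24.205, 21.790, 19.273, 20.394,
        18.735, 19.000, 17.400, 17.563, 18.691]
episode_rewards = per_episode_rewards(avgs)
```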
[2025-02-21 07:03:58,216][00633] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-21 07:03:58,217][00633] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-21 07:03:58,218][00633] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-21 07:03:58,219][00633] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-21 07:03:58,220][00633] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 07:03:58,220][00633] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-21 07:03:58,221][00633] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-21 07:03:58,222][00633] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-21 07:03:58,223][00633] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-21 07:03:58,224][00633] Adding new argument 'hf_repository'='mjkim0928/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-21 07:03:58,225][00633] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-21 07:03:58,225][00633] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-21 07:03:58,226][00633] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-21 07:03:58,228][00633] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-21 07:03:58,228][00633] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-21 07:03:58,254][00633] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 07:03:58,255][00633] RunningMeanStd input shape: (1,)
[2025-02-21 07:03:58,265][00633] ConvEncoder: input_channels=3
[2025-02-21 07:03:58,297][00633] Conv encoder output size: 512
[2025-02-21 07:03:58,298][00633] Policy head output size: 512
[2025-02-21 07:03:58,316][00633] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 07:03:58,737][00633] Num frames 100...
[2025-02-21 07:03:58,864][00633] Num frames 200...
[2025-02-21 07:03:58,990][00633] Num frames 300...
[2025-02-21 07:03:59,118][00633] Num frames 400...
[2025-02-21 07:03:59,259][00633] Num frames 500...
[2025-02-21 07:03:59,388][00633] Num frames 600...
[2025-02-21 07:03:59,516][00633] Num frames 700...
[2025-02-21 07:03:59,642][00633] Num frames 800...
[2025-02-21 07:03:59,739][00633] Avg episode rewards: #0: 14.320, true rewards: #0: 8.320
[2025-02-21 07:03:59,740][00633] Avg episode reward: 14.320, avg true_objective: 8.320
[2025-02-21 07:03:59,828][00633] Num frames 900...
[2025-02-21 07:03:59,953][00633] Num frames 1000...
[2025-02-21 07:04:00,078][00633] Num frames 1100...
[2025-02-21 07:04:00,222][00633] Num frames 1200...
[2025-02-21 07:04:00,349][00633] Num frames 1300...
[2025-02-21 07:04:00,476][00633] Num frames 1400...
[2025-02-21 07:04:00,604][00633] Num frames 1500...
[2025-02-21 07:04:00,735][00633] Num frames 1600...
[2025-02-21 07:04:00,787][00633] Avg episode rewards: #0: 15.000, true rewards: #0: 8.000
[2025-02-21 07:04:00,788][00633] Avg episode reward: 15.000, avg true_objective: 8.000
[2025-02-21 07:04:00,916][00633] Num frames 1700...
[2025-02-21 07:04:01,044][00633] Num frames 1800...
[2025-02-21 07:04:01,172][00633] Num frames 1900...
[2025-02-21 07:04:01,312][00633] Num frames 2000...
[2025-02-21 07:04:01,437][00633] Num frames 2100...
[2025-02-21 07:04:01,565][00633] Num frames 2200...
[2025-02-21 07:04:01,670][00633] Avg episode rewards: #0: 14.467, true rewards: #0: 7.467
[2025-02-21 07:04:01,671][00633] Avg episode reward: 14.467, avg true_objective: 7.467
[2025-02-21 07:04:01,746][00633] Num frames 2300...
[2025-02-21 07:04:01,870][00633] Num frames 2400...
[2025-02-21 07:04:01,994][00633] Num frames 2500...
[2025-02-21 07:04:02,122][00633] Num frames 2600...
[2025-02-21 07:04:02,256][00633] Num frames 2700...
[2025-02-21 07:04:02,421][00633] Avg episode rewards: #0: 13.460, true rewards: #0: 6.960
[2025-02-21 07:04:02,423][00633] Avg episode reward: 13.460, avg true_objective: 6.960
[2025-02-21 07:04:02,449][00633] Num frames 2800...
[2025-02-21 07:04:02,578][00633] Num frames 2900...
[2025-02-21 07:04:02,707][00633] Num frames 3000...
[2025-02-21 07:04:02,834][00633] Num frames 3100...
[2025-02-21 07:04:02,962][00633] Num frames 3200...
[2025-02-21 07:04:03,089][00633] Num frames 3300...
[2025-02-21 07:04:03,224][00633] Num frames 3400...
[2025-02-21 07:04:03,364][00633] Num frames 3500...
[2025-02-21 07:04:03,488][00633] Num frames 3600...
[2025-02-21 07:04:03,615][00633] Num frames 3700...
[2025-02-21 07:04:03,741][00633] Num frames 3800...
[2025-02-21 07:04:03,865][00633] Num frames 3900...
[2025-02-21 07:04:03,990][00633] Num frames 4000...
[2025-02-21 07:04:04,111][00633] Num frames 4100...
[2025-02-21 07:04:04,241][00633] Num frames 4200...
[2025-02-21 07:04:04,327][00633] Avg episode rewards: #0: 17.248, true rewards: #0: 8.448
[2025-02-21 07:04:04,328][00633] Avg episode reward: 17.248, avg true_objective: 8.448
[2025-02-21 07:04:04,426][00633] Num frames 4300...
[2025-02-21 07:04:04,552][00633] Num frames 4400...
[2025-02-21 07:04:04,679][00633] Num frames 4500...
[2025-02-21 07:04:04,806][00633] Num frames 4600...
[2025-02-21 07:04:04,932][00633] Num frames 4700...
[2025-02-21 07:04:05,059][00633] Num frames 4800...
[2025-02-21 07:04:05,186][00633] Num frames 4900...
[2025-02-21 07:04:05,325][00633] Num frames 5000...
[2025-02-21 07:04:05,458][00633] Num frames 5100...
[2025-02-21 07:04:05,588][00633] Num frames 5200...
[2025-02-21 07:04:05,743][00633] Num frames 5300...
[2025-02-21 07:04:05,920][00633] Num frames 5400...
[2025-02-21 07:04:05,990][00633] Avg episode rewards: #0: 19.180, true rewards: #0: 9.013
[2025-02-21 07:04:05,991][00633] Avg episode reward: 19.180, avg true_objective: 9.013
[2025-02-21 07:04:06,146][00633] Num frames 5500...
[2025-02-21 07:04:06,314][00633] Num frames 5600...
[2025-02-21 07:04:06,490][00633] Num frames 5700...
[2025-02-21 07:04:06,654][00633] Num frames 5800...
[2025-02-21 07:04:06,818][00633] Num frames 5900...
[2025-02-21 07:04:06,994][00633] Num frames 6000...
[2025-02-21 07:04:07,171][00633] Num frames 6100...
[2025-02-21 07:04:07,353][00633] Num frames 6200...
[2025-02-21 07:04:07,542][00633] Num frames 6300...
[2025-02-21 07:04:07,721][00633] Num frames 6400...
[2025-02-21 07:04:07,871][00633] Num frames 6500...
[2025-02-21 07:04:08,000][00633] Num frames 6600...
[2025-02-21 07:04:08,166][00633] Avg episode rewards: #0: 21.269, true rewards: #0: 9.554
[2025-02-21 07:04:08,167][00633] Avg episode reward: 21.269, avg true_objective: 9.554
[2025-02-21 07:04:08,184][00633] Num frames 6700...
[2025-02-21 07:04:08,311][00633] Num frames 6800...
[2025-02-21 07:04:08,438][00633] Num frames 6900...
[2025-02-21 07:04:08,573][00633] Num frames 7000...
[2025-02-21 07:04:08,703][00633] Num frames 7100...
[2025-02-21 07:04:08,829][00633] Num frames 7200...
[2025-02-21 07:04:08,958][00633] Num frames 7300...
[2025-02-21 07:04:09,088][00633] Num frames 7400...
[2025-02-21 07:04:09,258][00633] Avg episode rewards: #0: 20.489, true rewards: #0: 9.364
[2025-02-21 07:04:09,259][00633] Avg episode reward: 20.489, avg true_objective: 9.364
[2025-02-21 07:04:09,272][00633] Num frames 7500...
[2025-02-21 07:04:09,396][00633] Num frames 7600...
[2025-02-21 07:04:09,529][00633] Num frames 7700...
[2025-02-21 07:04:09,657][00633] Num frames 7800...
[2025-02-21 07:04:09,783][00633] Num frames 7900...
[2025-02-21 07:04:09,908][00633] Num frames 8000...
[2025-02-21 07:04:10,091][00633] Avg episode rewards: #0: 19.332, true rewards: #0: 8.999
[2025-02-21 07:04:10,092][00633] Avg episode reward: 19.332, avg true_objective: 8.999
[2025-02-21 07:04:10,094][00633] Num frames 8100...
[2025-02-21 07:04:10,225][00633] Num frames 8200...
[2025-02-21 07:04:10,353][00633] Num frames 8300...
[2025-02-21 07:04:10,478][00633] Num frames 8400...
[2025-02-21 07:04:10,615][00633] Num frames 8500...
[2025-02-21 07:04:10,693][00633] Avg episode rewards: #0: 18.017, true rewards: #0: 8.517
[2025-02-21 07:04:10,695][00633] Avg episode reward: 18.017, avg true_objective: 8.517
[2025-02-21 07:04:57,181][00633] Replay video saved to /content/train_dir/default_experiment/replay.mp4!