[2025-02-21 06:04:34,856][00162] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-21 06:04:34,859][00162] Rollout worker 0 uses device cpu
[2025-02-21 06:04:34,861][00162] Rollout worker 1 uses device cpu
[2025-02-21 06:04:34,861][00162] Rollout worker 2 uses device cpu
[2025-02-21 06:04:34,863][00162] Rollout worker 3 uses device cpu
[2025-02-21 06:04:34,865][00162] Rollout worker 4 uses device cpu
[2025-02-21 06:04:34,866][00162] Rollout worker 5 uses device cpu
[2025-02-21 06:04:34,868][00162] Rollout worker 6 uses device cpu
[2025-02-21 06:04:34,869][00162] Rollout worker 7 uses device cpu
[2025-02-21 06:04:35,127][00162] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:04:35,128][00162] InferenceWorker_p0-w0: min num requests: 2
[2025-02-21 06:04:35,174][00162] Starting all processes...
[2025-02-21 06:04:35,175][00162] Starting process learner_proc0
[2025-02-21 06:04:35,273][00162] Starting all processes...
[2025-02-21 06:04:35,359][00162] Starting process inference_proc0-0
[2025-02-21 06:04:35,360][00162] Starting process rollout_proc0
[2025-02-21 06:04:35,363][00162] Starting process rollout_proc1
[2025-02-21 06:04:35,363][00162] Starting process rollout_proc2
[2025-02-21 06:04:35,363][00162] Starting process rollout_proc3
[2025-02-21 06:04:35,363][00162] Starting process rollout_proc4
[2025-02-21 06:04:35,363][00162] Starting process rollout_proc5
[2025-02-21 06:04:35,363][00162] Starting process rollout_proc6
[2025-02-21 06:04:35,363][00162] Starting process rollout_proc7
[2025-02-21 06:04:50,264][04898] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:04:50,266][04898] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-21 06:04:50,345][04898] Num visible devices: 1
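A minimal sketch of the GPU mapping these learner lines describe: the subprocess pins itself to one physical GPU via CUDA_VISIBLE_DEVICES before CUDA initializes, so `cuda:0` inside the process refers to that GPU and only one device is visible. This is illustrative, not Sample Factory's actual code.
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must be set before CUDA initializes

import torch

# Inside this process, cuda:0 now maps to physical GPU 0 and it is the
# only visible device, matching "Num visible devices: 1" above.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Num visible devices:", torch.cuda.device_count())
```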
[2025-02-21 06:04:50,344][04922] Worker 6 uses CPU cores [0]
[2025-02-21 06:04:50,398][04898] Starting seed is not provided
[2025-02-21 06:04:50,398][04898] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:04:50,399][04898] Initializing actor-critic model on device cuda:0
[2025-02-21 06:04:50,399][04898] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:04:50,402][04898] RunningMeanStd input shape: (1,)
[2025-02-21 06:04:50,512][04898] ConvEncoder: input_channels=3
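A hedged sketch of the running-mean/std normalizer instantiated above for image observations of shape (3, 72, 128) and scalar returns of shape (1,). It uses the standard parallel-variance update; this illustrates what RunningMeanStdInPlace does rather than reproducing its exact implementation.
```python
import torch

class RunningMeanStd:
    """Running per-element mean/variance, updated batch by batch."""

    def __init__(self, shape, eps: float = 1e-4):
        self.mean = torch.zeros(shape)
        self.var = torch.ones(shape)
        self.count = eps

    def update(self, batch: torch.Tensor):
        # batch: (N, *shape); merge batch statistics into the running ones
        b_mean = batch.mean(dim=0)
        b_var = batch.var(dim=0, unbiased=False)
        b_count = batch.shape[0]
        delta = b_mean - self.mean
        tot = self.count + b_count
        m2 = self.var * self.count + b_var * b_count \
            + delta.pow(2) * self.count * b_count / tot
        self.mean = self.mean + delta * b_count / tot
        self.var = m2 / tot
        self.count = tot

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / torch.sqrt(self.var + 1e-8)
```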
[2025-02-21 06:04:50,987][04919] Worker 3 uses CPU cores [1]
[2025-02-21 06:04:50,988][04917] Worker 1 uses CPU cores [1]
[2025-02-21 06:04:51,018][04915] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:04:51,018][04915] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-21 06:04:51,023][04918] Worker 2 uses CPU cores [0]
[2025-02-21 06:04:51,040][04923] Worker 5 uses CPU cores [1]
[2025-02-21 06:04:51,093][04916] Worker 0 uses CPU cores [0]
[2025-02-21 06:04:51,104][04921] Worker 7 uses CPU cores [1]
[2025-02-21 06:04:51,105][04920] Worker 4 uses CPU cores [0]
[2025-02-21 06:04:51,111][04915] Num visible devices: 1
[2025-02-21 06:04:51,157][04898] Conv encoder output size: 512
[2025-02-21 06:04:51,158][04898] Policy head output size: 512
[2025-02-21 06:04:51,210][04898] Created Actor Critic model with architecture:
[2025-02-21 06:04:51,210][04898] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-02-21 06:04:51,543][04898] Using optimizer <class 'torch.optim.adam.Adam'>
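For readers who want to reproduce the printed model outside Sample Factory, here is a minimal PyTorch sketch. The input shape (3, 72, 128), the 512-unit encoder and policy-head sizes, GRU(512, 512), the 1-unit critic head, and the 5-action head all come from the log; the conv kernel sizes/strides and the Adam learning rate are assumptions the log does not show.
```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Illustrative reconstruction of the logged architecture."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv_head: three Conv2d/ELU pairs, as in the repr above
        # (kernel sizes and strides here are assumptions)
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # probe the flattened conv output size
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
        self.core = nn.GRU(512, 512)            # ModelCoreRNN
        self.critic_linear = nn.Linear(512, 1)  # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = ActorCriticSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is an assumption
```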
[2025-02-21 06:04:55,119][00162] Heartbeat connected on Batcher_0
[2025-02-21 06:04:55,128][00162] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-21 06:04:55,137][00162] Heartbeat connected on RolloutWorker_w0
[2025-02-21 06:04:55,142][00162] Heartbeat connected on RolloutWorker_w1
[2025-02-21 06:04:55,148][00162] Heartbeat connected on RolloutWorker_w2
[2025-02-21 06:04:55,152][00162] Heartbeat connected on RolloutWorker_w3
[2025-02-21 06:04:55,157][00162] Heartbeat connected on RolloutWorker_w4
[2025-02-21 06:04:55,166][00162] Heartbeat connected on RolloutWorker_w5
[2025-02-21 06:04:55,168][00162] Heartbeat connected on RolloutWorker_w6
[2025-02-21 06:04:55,173][00162] Heartbeat connected on RolloutWorker_w7
[2025-02-21 06:04:55,518][04898] No checkpoints found
[2025-02-21 06:04:55,520][04898] Did not load from checkpoint, starting from scratch!
[2025-02-21 06:04:55,521][04898] Initialized policy 0 weights for model version 0
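A hedged sketch of the resume logic behind "No checkpoints found": look for the newest checkpoint in the experiment directory and fall back to fresh weights (model version 0) when there is none. The dict key names are assumptions, not Sample Factory's actual layout.
```python
import glob
import os
import torch

def init_policy(model: torch.nn.Module, ckpt_dir: str) -> int:
    """Return the policy version to resume from (0 when starting fresh)."""
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    if not ckpts:
        print("Did not load from checkpoint, starting from scratch!")
        return 0
    state = torch.load(ckpts[-1], map_location="cpu")
    model.load_state_dict(state["model"])      # "model" key is an assumption
    return int(state.get("policy_version", 0))  # key name is an assumption
```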
[2025-02-21 06:04:55,524][04898] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-21 06:04:55,534][04898] LearnerWorker_p0 finished initialization!
[2025-02-21 06:04:55,535][00162] Heartbeat connected on LearnerWorker_p0
[2025-02-21 06:04:55,744][04915] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:04:55,746][04915] RunningMeanStd input shape: (1,)
[2025-02-21 06:04:55,759][04915] ConvEncoder: input_channels=3
[2025-02-21 06:04:55,858][04915] Conv encoder output size: 512
[2025-02-21 06:04:55,858][04915] Policy head output size: 512
[2025-02-21 06:04:55,894][00162] Inference worker 0-0 is ready!
[2025-02-21 06:04:55,901][00162] All inference workers are ready! Signal rollout workers to start!
[2025-02-21 06:04:56,241][04917] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:04:56,320][04921] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:04:56,366][04923] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:04:56,365][04920] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:04:56,404][04918] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:04:56,406][04922] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:04:56,472][04916] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:04:56,523][04919] Doom resolution: 160x120, resize resolution: (128, 72)
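The resize these workers log (native 160x120 frames down to width 128, height 72, matching the encoder's (3, 72, 128) input) can be sketched as below. Using OpenCV for the resize is an assumption; the log only shows the two resolutions.
```python
import numpy as np
import cv2

# Native Doom frame is H x W x C = 120 x 160 x 3; the encoder wants
# channels-first (3, 72, 128). Note cv2.resize takes (width, height).
frame = np.zeros((120, 160, 3), dtype=np.uint8)  # placeholder native frame
resized = cv2.resize(frame, (128, 72), interpolation=cv2.INTER_AREA)
obs = np.transpose(resized, (2, 0, 1))  # -> (3, 72, 128)
```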
[2025-02-21 06:04:57,911][00162] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-21 06:04:58,058][04917] Decorrelating experience for 0 frames...
[2025-02-21 06:04:58,059][04916] Decorrelating experience for 0 frames...
[2025-02-21 06:04:58,642][04921] Decorrelating experience for 0 frames...
[2025-02-21 06:04:58,652][04919] Decorrelating experience for 0 frames...
[2025-02-21 06:04:59,076][04917] Decorrelating experience for 32 frames...
[2025-02-21 06:04:59,120][04916] Decorrelating experience for 32 frames...
[2025-02-21 06:04:59,509][04921] Decorrelating experience for 32 frames...
[2025-02-21 06:04:59,878][04919] Decorrelating experience for 32 frames...
[2025-02-21 06:04:59,997][04917] Decorrelating experience for 64 frames...
[2025-02-21 06:05:00,290][04916] Decorrelating experience for 64 frames...
[2025-02-21 06:05:00,733][04916] Decorrelating experience for 96 frames...
[2025-02-21 06:05:01,500][04921] Decorrelating experience for 64 frames...
[2025-02-21 06:05:01,666][04917] Decorrelating experience for 96 frames...
[2025-02-21 06:05:01,696][04919] Decorrelating experience for 64 frames...
[2025-02-21 06:05:02,601][04921] Decorrelating experience for 96 frames...
[2025-02-21 06:05:02,697][04919] Decorrelating experience for 96 frames...
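A sketch of what "Decorrelating experience for N frames" is doing: before collection starts, each rollout worker advances its environments by a different number of frames (0, 32, 64, 96 above) so that episodes across workers start out of phase and the training batch is less correlated. A Gymnasium-style env API and random actions are assumptions; this is not Sample Factory's actual code.
```python
def decorrelate_experience(env, num_frames: int):
    """Advance env by num_frames random-action steps before real collection."""
    obs, info = env.reset()
    for _ in range(num_frames):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, info = env.reset()
    return obs
```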
[2025-02-21 06:05:02,911][00162] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-21 06:05:02,915][00162] Avg episode reward: [(0, '0.470')]
[2025-02-21 06:05:05,029][04898] Signal inference workers to stop experience collection...
[2025-02-21 06:05:05,042][04915] InferenceWorker_p0-w0: stopping experience collection
[2025-02-21 06:05:06,937][04898] Signal inference workers to resume experience collection...
[2025-02-21 06:05:06,942][04915] InferenceWorker_p0-w0: resuming experience collection
[2025-02-21 06:05:07,911][00162] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8192. Throughput: 0: 306.2. Samples: 3062. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-02-21 06:05:07,912][00162] Avg episode reward: [(0, '3.510')]
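The throughput figures above can be reproduced from (timestamp, total frames) pairs: FPS over a window is the frame delta divided by the time delta against the oldest sample still inside that window, e.g. 8192 frames over the first 10-second window gives exactly 819.2. The 10/60/300 s windows match the log; the exact Sample Factory implementation may differ from this sketch.
```python
import math
from collections import deque

class FpsTracker:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp_seconds, total_num_frames)

    def record(self, t: float, total_frames: int):
        self.samples.append((t, total_frames))
        # drop samples older than the largest window
        while self.samples and t - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self, window: float) -> float:
        if len(self.samples) < 2:
            return math.nan  # matches the initial "Fps is (10 sec: nan, ...)"
        t_now, f_now = self.samples[-1]
        in_window = [(t, f) for t, f in self.samples if t_now - t <= window]
        t0, f0 = in_window[0]
        return (f_now - f0) / (t_now - t0) if t_now > t0 else math.nan
```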
[2025-02-21 06:05:12,911][00162] Fps is (10 sec: 2457.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 24576. Throughput: 0: 372.5. Samples: 5588. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:05:12,912][00162] Avg episode reward: [(0, '4.105')]
[2025-02-21 06:05:16,289][04915] Updated weights for policy 0, policy_version 10 (0.0122)
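A hedged sketch of the handoff behind "Updated weights for policy 0, policy_version N": the learner publishes a versioned copy of the parameters and the inference worker adopts it once the version advances. The in-process object here is illustrative; Sample Factory's actual mechanism is a cross-process one.
```python
import torch

class PolicyVersioner:
    def __init__(self):
        self.version = 0
        self.published = None

    def publish(self, model: torch.nn.Module):
        # learner side: snapshot weights under a new version number
        self.version += 1
        self.published = {k: v.detach().cpu().clone()
                          for k, v in model.state_dict().items()}

    def sync(self, model: torch.nn.Module, seen_version: int) -> int:
        # inference side: adopt the snapshot if a newer version exists
        if self.version > seen_version and self.published is not None:
            model.load_state_dict(self.published)
        return self.version
```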
[2025-02-21 06:05:17,914][00162] Fps is (10 sec: 3685.3, 60 sec: 2252.5, 300 sec: 2252.5). Total num frames: 45056. Throughput: 0: 558.7. Samples: 11176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:05:17,915][00162] Avg episode reward: [(0, '4.342')]
[2025-02-21 06:05:22,912][00162] Fps is (10 sec: 4095.4, 60 sec: 2621.3, 300 sec: 2621.3). Total num frames: 65536. Throughput: 0: 698.8. Samples: 17472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:05:22,918][00162] Avg episode reward: [(0, '4.386')]
[2025-02-21 06:05:27,403][04915] Updated weights for policy 0, policy_version 20 (0.0016)
[2025-02-21 06:05:27,911][00162] Fps is (10 sec: 3687.6, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 81920. Throughput: 0: 647.5. Samples: 19426. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:05:27,912][00162] Avg episode reward: [(0, '4.470')]
[2025-02-21 06:05:32,911][00162] Fps is (10 sec: 3686.8, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 102400. Throughput: 0: 728.4. Samples: 25494. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:05:32,916][00162] Avg episode reward: [(0, '4.422')]
[2025-02-21 06:05:32,918][04898] Saving new best policy, reward=4.422!
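The "Saving new best policy" lines follow a simple rule visible throughout this log: persist the weights whenever the average episode reward beats the previous best. A minimal sketch, with the use of state_dict() as an assumption:
```python
import torch

class BestPolicySaver:
    def __init__(self):
        self.best_reward = float("-inf")

    def maybe_save(self, avg_reward: float, model: torch.nn.Module, path: str) -> bool:
        if avg_reward <= self.best_reward:
            return False
        self.best_reward = avg_reward
        torch.save(model.state_dict(), path)
        print(f"Saving new best policy, reward={avg_reward:.3f}!")
        return True
```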
[2025-02-21 06:05:37,911][00162] Fps is (10 sec: 3686.2, 60 sec: 2969.6, 300 sec: 2969.6). Total num frames: 118784. Throughput: 0: 703.0. Samples: 28122. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:05:37,913][00162] Avg episode reward: [(0, '4.394')]
[2025-02-21 06:05:38,753][04915] Updated weights for policy 0, policy_version 30 (0.0018)
[2025-02-21 06:05:42,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3094.8, 300 sec: 3094.8). Total num frames: 139264. Throughput: 0: 734.3. Samples: 33044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:05:42,912][00162] Avg episode reward: [(0, '4.434')]
[2025-02-21 06:05:42,916][04898] Saving new best policy, reward=4.434!
[2025-02-21 06:05:47,911][00162] Fps is (10 sec: 4096.2, 60 sec: 3194.9, 300 sec: 3194.9). Total num frames: 159744. Throughput: 0: 872.0. Samples: 39250. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:05:47,913][00162] Avg episode reward: [(0, '4.446')]
[2025-02-21 06:05:47,917][04898] Saving new best policy, reward=4.446!
[2025-02-21 06:05:48,893][04915] Updated weights for policy 0, policy_version 40 (0.0023)
[2025-02-21 06:05:52,911][00162] Fps is (10 sec: 3276.8, 60 sec: 3127.9, 300 sec: 3127.9). Total num frames: 172032. Throughput: 0: 911.8. Samples: 44094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:05:52,912][00162] Avg episode reward: [(0, '4.280')]
[2025-02-21 06:05:57,911][00162] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3208.5). Total num frames: 192512. Throughput: 0: 923.9. Samples: 47164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:05:57,916][00162] Avg episode reward: [(0, '4.395')]
[2025-02-21 06:06:00,081][04915] Updated weights for policy 0, policy_version 50 (0.0012)
[2025-02-21 06:06:02,911][00162] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 212992. Throughput: 0: 936.6. Samples: 53322. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:02,916][00162] Avg episode reward: [(0, '4.456')]
[2025-02-21 06:06:02,924][04898] Saving new best policy, reward=4.456!
[2025-02-21 06:06:07,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 229376. Throughput: 0: 898.3. Samples: 57896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:07,914][00162] Avg episode reward: [(0, '4.422')]
[2025-02-21 06:06:11,498][04915] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-02-21 06:06:12,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3331.4). Total num frames: 249856. Throughput: 0: 924.0. Samples: 61004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:12,912][00162] Avg episode reward: [(0, '4.434')]
[2025-02-21 06:06:17,912][00162] Fps is (10 sec: 3686.0, 60 sec: 3686.5, 300 sec: 3328.0). Total num frames: 266240. Throughput: 0: 919.7. Samples: 66882. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:17,915][00162] Avg episode reward: [(0, '4.409')]
[2025-02-21 06:06:22,911][00162] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3325.0). Total num frames: 282624. Throughput: 0: 972.5. Samples: 71884. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:22,915][00162] Avg episode reward: [(0, '4.449')]
[2025-02-21 06:06:23,030][04915] Updated weights for policy 0, policy_version 70 (0.0013)
[2025-02-21 06:06:27,911][00162] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3413.3). Total num frames: 307200. Throughput: 0: 932.0. Samples: 74986. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:06:27,913][00162] Avg episode reward: [(0, '4.423')]
[2025-02-21 06:06:27,919][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth...
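The checkpoint written above (name pattern checkpoint_{policy_version}_{env_frames}.pth) is an ordinary torch pickle and can be inspected offline. The log does not show the dict layout, so this sketch only prints the keys rather than assuming them:
```python
import torch

path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth"
ckpt = torch.load(path, map_location="cpu")
print(sorted(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```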
[2025-02-21 06:06:32,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 912.8. Samples: 80328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:32,912][00162] Avg episode reward: [(0, '4.534')]
[2025-02-21 06:06:32,914][04898] Saving new best policy, reward=4.534!
[2025-02-21 06:06:34,409][04915] Updated weights for policy 0, policy_version 80 (0.0012)
[2025-02-21 06:06:37,911][00162] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 928.7. Samples: 85884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:06:37,912][00162] Avg episode reward: [(0, '4.592')]
[2025-02-21 06:06:37,917][04898] Saving new best policy, reward=4.592!
[2025-02-21 06:06:42,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3432.8). Total num frames: 360448. Throughput: 0: 928.3. Samples: 88938. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:42,912][00162] Avg episode reward: [(0, '4.549')]
[2025-02-21 06:06:44,827][04915] Updated weights for policy 0, policy_version 90 (0.0012)
[2025-02-21 06:06:47,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3425.7). Total num frames: 376832. Throughput: 0: 900.4. Samples: 93838. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:47,912][00162] Avg episode reward: [(0, '4.632')]
[2025-02-21 06:06:47,917][04898] Saving new best policy, reward=4.632!
[2025-02-21 06:06:52,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 937.9. Samples: 100102. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:06:52,918][00162] Avg episode reward: [(0, '4.743')]
[2025-02-21 06:06:52,921][04898] Saving new best policy, reward=4.743!
[2025-02-21 06:06:55,522][04915] Updated weights for policy 0, policy_version 100 (0.0012)
[2025-02-21 06:06:57,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 417792. Throughput: 0: 937.2. Samples: 103176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:06:57,915][00162] Avg episode reward: [(0, '4.543')]
[2025-02-21 06:07:02,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3473.4). Total num frames: 434176. Throughput: 0: 916.8. Samples: 108138. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:07:02,915][00162] Avg episode reward: [(0, '4.550')]
[2025-02-21 06:07:06,622][04915] Updated weights for policy 0, policy_version 110 (0.0014)
[2025-02-21 06:07:07,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3497.4). Total num frames: 454656. Throughput: 0: 942.8. Samples: 114308. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:07:07,912][00162] Avg episode reward: [(0, '4.655')]
[2025-02-21 06:07:12,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3489.2). Total num frames: 471040. Throughput: 0: 934.3. Samples: 117030. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:07:12,915][00162] Avg episode reward: [(0, '4.601')]
[2025-02-21 06:07:17,645][04915] Updated weights for policy 0, policy_version 120 (0.0013)
[2025-02-21 06:07:17,913][00162] Fps is (10 sec: 3685.6, 60 sec: 3754.6, 300 sec: 3510.8). Total num frames: 491520. Throughput: 0: 933.2. Samples: 122324. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:07:17,914][00162] Avg episode reward: [(0, '4.605')]
[2025-02-21 06:07:22,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3531.0). Total num frames: 512000. Throughput: 0: 949.2. Samples: 128600. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:07:22,912][00162] Avg episode reward: [(0, '4.502')]
[2025-02-21 06:07:27,911][00162] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3522.6). Total num frames: 528384. Throughput: 0: 929.6. Samples: 130772. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:07:27,912][00162] Avg episode reward: [(0, '4.428')]
[2025-02-21 06:07:28,842][04915] Updated weights for policy 0, policy_version 130 (0.0014)
[2025-02-21 06:07:32,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3541.1). Total num frames: 548864. Throughput: 0: 950.1. Samples: 136592. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:07:32,915][00162] Avg episode reward: [(0, '4.561')]
[2025-02-21 06:07:37,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3532.8). Total num frames: 565248. Throughput: 0: 940.7. Samples: 142432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:07:37,917][00162] Avg episode reward: [(0, '4.703')]
[2025-02-21 06:07:39,736][04915] Updated weights for policy 0, policy_version 140 (0.0013)
[2025-02-21 06:07:42,911][00162] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 920.3. Samples: 144590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:07:42,915][00162] Avg episode reward: [(0, '4.796')]
[2025-02-21 06:07:42,962][04898] Saving new best policy, reward=4.796!
[2025-02-21 06:07:47,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3541.8). Total num frames: 602112. Throughput: 0: 945.6. Samples: 150690. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:07:47,919][00162] Avg episode reward: [(0, '4.874')]
[2025-02-21 06:07:48,013][04898] Saving new best policy, reward=4.874!
[2025-02-21 06:07:50,018][04915] Updated weights for policy 0, policy_version 150 (0.0013)
[2025-02-21 06:07:52,912][00162] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3557.6). Total num frames: 622592. Throughput: 0: 925.8. Samples: 155972. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:07:52,914][00162] Avg episode reward: [(0, '5.018')]
[2025-02-21 06:07:52,918][04898] Saving new best policy, reward=5.018!
[2025-02-21 06:07:57,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 638976. Throughput: 0: 922.6. Samples: 158548. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:07:57,916][00162] Avg episode reward: [(0, '5.030')]
[2025-02-21 06:07:57,923][04898] Saving new best policy, reward=5.030!
[2025-02-21 06:08:01,316][04915] Updated weights for policy 0, policy_version 160 (0.0012)
[2025-02-21 06:08:02,911][00162] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 943.5. Samples: 164780. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:08:02,912][00162] Avg episode reward: [(0, '4.927')]
[2025-02-21 06:08:07,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3557.1). Total num frames: 675840. Throughput: 0: 909.7. Samples: 169538. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:08:07,915][00162] Avg episode reward: [(0, '4.855')]
[2025-02-21 06:08:12,516][04915] Updated weights for policy 0, policy_version 170 (0.0014)
[2025-02-21 06:08:12,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3570.9). Total num frames: 696320. Throughput: 0: 931.2. Samples: 172676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:08:12,912][00162] Avg episode reward: [(0, '4.913')]
[2025-02-21 06:08:17,913][00162] Fps is (10 sec: 4095.4, 60 sec: 3754.7, 300 sec: 3584.0). Total num frames: 716800. Throughput: 0: 943.4. Samples: 179046. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:08:17,917][00162] Avg episode reward: [(0, '5.207')]
[2025-02-21 06:08:17,924][04898] Saving new best policy, reward=5.207!
[2025-02-21 06:08:22,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 920.9. Samples: 183872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:08:22,913][00162] Avg episode reward: [(0, '5.065')]
[2025-02-21 06:08:23,644][04915] Updated weights for policy 0, policy_version 180 (0.0017)
[2025-02-21 06:08:27,911][00162] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3588.9). Total num frames: 753664. Throughput: 0: 942.5. Samples: 187002. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:08:27,912][00162] Avg episode reward: [(0, '5.035')]
[2025-02-21 06:08:27,918][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000184_753664.pth...
[2025-02-21 06:08:32,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 937.0. Samples: 192856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:08:32,914][00162] Avg episode reward: [(0, '5.223')]
[2025-02-21 06:08:32,915][04898] Saving new best policy, reward=5.223!
[2025-02-21 06:08:34,952][04915] Updated weights for policy 0, policy_version 190 (0.0012)
[2025-02-21 06:08:37,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3593.3). Total num frames: 790528. Throughput: 0: 930.3. Samples: 197834. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:08:37,912][00162] Avg episode reward: [(0, '5.232')]
[2025-02-21 06:08:37,925][04898] Saving new best policy, reward=5.232!
[2025-02-21 06:08:42,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3586.3). Total num frames: 806912. Throughput: 0: 940.0. Samples: 200848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:08:42,912][00162] Avg episode reward: [(0, '4.959')]
[2025-02-21 06:08:45,077][04915] Updated weights for policy 0, policy_version 200 (0.0013)
[2025-02-21 06:08:47,911][00162] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3579.5). Total num frames: 823296. Throughput: 0: 916.8. Samples: 206034. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:08:47,918][00162] Avg episode reward: [(0, '5.142')]
[2025-02-21 06:08:52,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3590.5). Total num frames: 843776. Throughput: 0: 939.8. Samples: 211830. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:08:52,915][00162] Avg episode reward: [(0, '4.983')]
[2025-02-21 06:08:56,089][04915] Updated weights for policy 0, policy_version 210 (0.0013)
[2025-02-21 06:08:57,913][00162] Fps is (10 sec: 4095.4, 60 sec: 3754.6, 300 sec: 3601.0). Total num frames: 864256. Throughput: 0: 940.7. Samples: 215010. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:08:57,915][00162] Avg episode reward: [(0, '5.006')]
[2025-02-21 06:09:02,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3594.4). Total num frames: 880640. Throughput: 0: 909.4. Samples: 219968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:09:02,916][00162] Avg episode reward: [(0, '5.371')]
[2025-02-21 06:09:02,919][04898] Saving new best policy, reward=5.371!
[2025-02-21 06:09:07,511][04915] Updated weights for policy 0, policy_version 220 (0.0015)
[2025-02-21 06:09:07,911][00162] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3604.5). Total num frames: 901120. Throughput: 0: 936.7. Samples: 226022. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:09:07,913][00162] Avg episode reward: [(0, '5.912')]
[2025-02-21 06:09:07,920][04898] Saving new best policy, reward=5.912!
[2025-02-21 06:09:12,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3598.1). Total num frames: 917504. Throughput: 0: 934.0. Samples: 229030. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:09:12,912][00162] Avg episode reward: [(0, '6.440')]
[2025-02-21 06:09:12,914][04898] Saving new best policy, reward=6.440!
[2025-02-21 06:09:17,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3607.6). Total num frames: 937984. Throughput: 0: 912.5. Samples: 233920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:09:17,915][00162] Avg episode reward: [(0, '6.264')]
[2025-02-21 06:09:18,702][04915] Updated weights for policy 0, policy_version 230 (0.0014)
[2025-02-21 06:09:22,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3616.8). Total num frames: 958464. Throughput: 0: 942.4. Samples: 240242. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:09:22,914][00162] Avg episode reward: [(0, '6.891')]
[2025-02-21 06:09:22,917][04898] Saving new best policy, reward=6.891!
[2025-02-21 06:09:27,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3610.5). Total num frames: 974848. Throughput: 0: 934.9. Samples: 242920. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:09:27,912][00162] Avg episode reward: [(0, '6.714')]
[2025-02-21 06:09:29,854][04915] Updated weights for policy 0, policy_version 240 (0.0018)
[2025-02-21 06:09:32,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3619.4). Total num frames: 995328. Throughput: 0: 939.3. Samples: 248302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:09:32,916][00162] Avg episode reward: [(0, '6.629')]
[2025-02-21 06:09:37,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3627.9). Total num frames: 1015808. Throughput: 0: 947.8. Samples: 254480. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:09:37,912][00162] Avg episode reward: [(0, '6.497')]
[2025-02-21 06:09:40,489][04915] Updated weights for policy 0, policy_version 250 (0.0013)
[2025-02-21 06:09:42,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3621.7). Total num frames: 1032192. Throughput: 0: 920.8. Samples: 256446. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:09:42,912][00162] Avg episode reward: [(0, '6.816')]
[2025-02-21 06:09:47,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3629.9). Total num frames: 1052672. Throughput: 0: 948.1. Samples: 262634. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:09:47,915][00162] Avg episode reward: [(0, '7.219')]
[2025-02-21 06:09:47,921][04898] Saving new best policy, reward=7.219!
[2025-02-21 06:09:50,619][04915] Updated weights for policy 0, policy_version 260 (0.0014)
[2025-02-21 06:09:52,913][00162] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3623.9). Total num frames: 1069056. Throughput: 0: 939.6. Samples: 268306. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:09:52,916][00162] Avg episode reward: [(0, '7.808')]
[2025-02-21 06:09:52,923][04898] Saving new best policy, reward=7.808!
[2025-02-21 06:09:57,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 923.7. Samples: 270596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:09:57,914][00162] Avg episode reward: [(0, '8.609')]
[2025-02-21 06:09:57,920][04898] Saving new best policy, reward=8.609!
[2025-02-21 06:10:01,868][04915] Updated weights for policy 0, policy_version 270 (0.0012)
[2025-02-21 06:10:02,911][00162] Fps is (10 sec: 4096.8, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1110016. Throughput: 0: 952.2. Samples: 276768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:10:02,912][00162] Avg episode reward: [(0, '9.376')]
[2025-02-21 06:10:02,914][04898] Saving new best policy, reward=9.376!
[2025-02-21 06:10:07,911][00162] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1122304. Throughput: 0: 921.4. Samples: 281704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:10:07,912][00162] Avg episode reward: [(0, '9.507')]
[2025-02-21 06:10:07,919][04898] Saving new best policy, reward=9.507!
[2025-02-21 06:10:12,913][00162] Fps is (10 sec: 3276.1, 60 sec: 3754.5, 300 sec: 3721.1). Total num frames: 1142784. Throughput: 0: 923.9. Samples: 284498. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:10:12,915][00162] Avg episode reward: [(0, '8.930')]
[2025-02-21 06:10:13,439][04915] Updated weights for policy 0, policy_version 280 (0.0014)
[2025-02-21 06:10:17,911][00162] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1163264. Throughput: 0: 940.2. Samples: 290612. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:10:17,915][00162] Avg episode reward: [(0, '9.368')]
[2025-02-21 06:10:22,911][00162] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1179648. Throughput: 0: 910.1. Samples: 295436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:10:22,912][00162] Avg episode reward: [(0, '10.402')]
[2025-02-21 06:10:22,918][04898] Saving new best policy, reward=10.402!
[2025-02-21 06:10:24,737][04915] Updated weights for policy 0, policy_version 290 (0.0014)
[2025-02-21 06:10:27,913][00162] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3721.1). Total num frames: 1200128. Throughput: 0: 933.0. Samples: 298434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:10:27,915][00162] Avg episode reward: [(0, '10.897')]
[2025-02-21 06:10:27,923][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000293_1200128.pth...
[2025-02-21 06:10:28,032][04898] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth
[2025-02-21 06:10:28,050][04898] Saving new best policy, reward=10.897!
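The save/remove pair above shows a keep-last-N checkpoint rotation: after a new checkpoint is written, the oldest one beyond the retention limit is deleted. A retention count of 2 is inferred from this log; a minimal sketch:
```python
import glob
import os

def rotate_checkpoints(ckpt_dir: str, keep: int = 2):
    """Delete all but the most recent `keep` checkpoints in ckpt_dir."""
    # zero-padded names sort chronologically under lexicographic order
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for stale in ckpts[:-keep]:
        os.remove(stale)
```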
[2025-02-21 06:10:32,912][00162] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3721.1). Total num frames: 1216512. Throughput: 0: 927.6. Samples: 304378. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:10:32,916][00162] Avg episode reward: [(0, '11.775')]
[2025-02-21 06:10:32,920][04898] Saving new best policy, reward=11.775!
[2025-02-21 06:10:36,402][04915] Updated weights for policy 0, policy_version 300 (0.0014)
[2025-02-21 06:10:37,911][00162] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1232896. Throughput: 0: 905.3. Samples: 309042. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:10:37,912][00162] Avg episode reward: [(0, '11.857')]
[2025-02-21 06:10:37,919][04898] Saving new best policy, reward=11.857!
[2025-02-21 06:10:42,911][00162] Fps is (10 sec: 3686.9, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1253376. Throughput: 0: 920.8. Samples: 312034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:10:42,912][00162] Avg episode reward: [(0, '11.677')]
[2025-02-21 06:10:46,932][04915] Updated weights for policy 0, policy_version 310 (0.0012)
[2025-02-21 06:10:47,913][00162] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3721.1). Total num frames: 1269760. Throughput: 0: 907.8. Samples: 317622. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:10:47,914][00162] Avg episode reward: [(0, '10.330')]
[2025-02-21 06:10:52,913][00162] Fps is (10 sec: 3685.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1290240. Throughput: 0: 919.0. Samples: 323060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:10:52,914][00162] Avg episode reward: [(0, '10.195')]
[2025-02-21 06:10:57,572][04915] Updated weights for policy 0, policy_version 320 (0.0013)
[2025-02-21 06:10:57,912][00162] Fps is (10 sec: 4096.3, 60 sec: 3686.3, 300 sec: 3721.1). Total num frames: 1310720. Throughput: 0: 927.0. Samples: 326214. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:10:57,916][00162] Avg episode reward: [(0, '10.557')]
[2025-02-21 06:11:02,911][00162] Fps is (10 sec: 3687.2, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1327104. Throughput: 0: 906.4. Samples: 331398. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:11:02,916][00162] Avg episode reward: [(0, '10.792')]
[2025-02-21 06:11:07,911][00162] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1347584. Throughput: 0: 935.2. Samples: 337518. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:11:07,912][00162] Avg episode reward: [(0, '11.643')]
[2025-02-21 06:11:08,494][04915] Updated weights for policy 0, policy_version 330 (0.0014)
[2025-02-21 06:11:12,913][00162] Fps is (10 sec: 4095.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1368064. Throughput: 0: 939.0. Samples: 340690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:11:12,917][00162] Avg episode reward: [(0, '11.579')]
[2025-02-21 06:11:17,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1384448. Throughput: 0: 918.7. Samples: 345716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:11:17,912][00162] Avg episode reward: [(0, '12.669')]
[2025-02-21 06:11:17,919][04898] Saving new best policy, reward=12.669!
[2025-02-21 06:11:19,540][04915] Updated weights for policy 0, policy_version 340 (0.0016)
[2025-02-21 06:11:22,911][00162] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1404928. Throughput: 0: 954.8. Samples: 352006. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:11:22,913][00162] Avg episode reward: [(0, '13.708')]
[2025-02-21 06:11:22,914][04898] Saving new best policy, reward=13.708!
[2025-02-21 06:11:27,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 1421312. Throughput: 0: 957.5. Samples: 355122. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:11:27,912][00162] Avg episode reward: [(0, '14.472')]
[2025-02-21 06:11:27,921][04898] Saving new best policy, reward=14.472!
[2025-02-21 06:11:30,611][04915] Updated weights for policy 0, policy_version 350 (0.0013)
[2025-02-21 06:11:32,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 1441792. Throughput: 0: 944.4. Samples: 360118. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:11:32,912][00162] Avg episode reward: [(0, '13.921')]
[2025-02-21 06:11:37,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1462272. Throughput: 0: 960.8. Samples: 366292. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:11:37,912][00162] Avg episode reward: [(0, '13.731')]
[2025-02-21 06:11:41,052][04915] Updated weights for policy 0, policy_version 360 (0.0012)
[2025-02-21 06:11:42,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1478656. Throughput: 0: 947.4. Samples: 368844. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:11:42,915][00162] Avg episode reward: [(0, '13.586')]
[2025-02-21 06:11:47,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3735.0). Total num frames: 1499136. Throughput: 0: 957.1. Samples: 374466. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:11:47,915][00162] Avg episode reward: [(0, '13.231')]
[2025-02-21 06:11:51,353][04915] Updated weights for policy 0, policy_version 370 (0.0013)
[2025-02-21 06:11:52,911][00162] Fps is (10 sec: 4096.1, 60 sec: 3823.1, 300 sec: 3735.0). Total num frames: 1519616. Throughput: 0: 891.4. Samples: 377630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:11:52,912][00162] Avg episode reward: [(0, '13.835')]
[2025-02-21 06:11:57,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1536000. Throughput: 0: 932.4. Samples: 382646. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:11:57,915][00162] Avg episode reward: [(0, '14.209')]
[2025-02-21 06:12:02,272][04915] Updated weights for policy 0, policy_version 380 (0.0014)
[2025-02-21 06:12:02,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1556480. Throughput: 0: 962.5. Samples: 389028. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:02,915][00162] Avg episode reward: [(0, '14.667')]
[2025-02-21 06:12:02,918][04898] Saving new best policy, reward=14.667!
[2025-02-21 06:12:07,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1576960. Throughput: 0: 943.2. Samples: 394448. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:07,916][00162] Avg episode reward: [(0, '15.627')]
[2025-02-21 06:12:07,921][04898] Saving new best policy, reward=15.627!
[2025-02-21 06:12:12,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 1593344. Throughput: 0: 929.8. Samples: 396962. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:12:12,915][00162] Avg episode reward: [(0, '15.631')]
[2025-02-21 06:12:12,918][04898] Saving new best policy, reward=15.631!
[2025-02-21 06:12:13,378][04915] Updated weights for policy 0, policy_version 390 (0.0015)
[2025-02-21 06:12:17,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1613824. Throughput: 0: 959.8. Samples: 403310. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:17,912][00162] Avg episode reward: [(0, '15.918')]
[2025-02-21 06:12:17,920][04898] Saving new best policy, reward=15.918!
[2025-02-21 06:12:22,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1630208. Throughput: 0: 933.5. Samples: 408300. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:12:22,912][00162] Avg episode reward: [(0, '15.418')]
[2025-02-21 06:12:24,528][04915] Updated weights for policy 0, policy_version 400 (0.0015)
[2025-02-21 06:12:27,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1650688. Throughput: 0: 944.8. Samples: 411360. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:12:27,913][00162] Avg episode reward: [(0, '17.030')]
[2025-02-21 06:12:27,921][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth...
[2025-02-21 06:12:28,039][04898] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000184_753664.pth
[2025-02-21 06:12:28,057][04898] Saving new best policy, reward=17.030!
[2025-02-21 06:12:32,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1671168. Throughput: 0: 957.4. Samples: 417548. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:32,915][00162] Avg episode reward: [(0, '17.513')]
[2025-02-21 06:12:32,919][04898] Saving new best policy, reward=17.513!
[2025-02-21 06:12:35,222][04915] Updated weights for policy 0, policy_version 410 (0.0013)
[2025-02-21 06:12:37,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1687552. Throughput: 0: 995.8. Samples: 422442. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:37,912][00162] Avg episode reward: [(0, '17.927')]
[2025-02-21 06:12:37,918][04898] Saving new best policy, reward=17.927!
[2025-02-21 06:12:42,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1708032. Throughput: 0: 954.5. Samples: 425598. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:42,912][00162] Avg episode reward: [(0, '18.115')]
[2025-02-21 06:12:42,914][04898] Saving new best policy, reward=18.115!
[2025-02-21 06:12:45,378][04915] Updated weights for policy 0, policy_version 420 (0.0014)
[2025-02-21 06:12:47,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1728512. Throughput: 0: 952.4. Samples: 431888. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:47,912][00162] Avg episode reward: [(0, '17.476')]
[2025-02-21 06:12:52,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1744896. Throughput: 0: 945.5. Samples: 436996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:52,915][00162] Avg episode reward: [(0, '17.393')]
[2025-02-21 06:12:56,380][04915] Updated weights for policy 0, policy_version 430 (0.0014)
[2025-02-21 06:12:57,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1765376. Throughput: 0: 960.0. Samples: 440164. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:12:57,914][00162] Avg episode reward: [(0, '18.556')]
[2025-02-21 06:12:57,923][04898] Saving new best policy, reward=18.556!
[2025-02-21 06:13:02,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1781760. Throughput: 0: 943.9. Samples: 445784. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:13:02,914][00162] Avg episode reward: [(0, '19.543')]
[2025-02-21 06:13:02,918][04898] Saving new best policy, reward=19.543!
[2025-02-21 06:13:07,526][04915] Updated weights for policy 0, policy_version 440 (0.0014)
[2025-02-21 06:13:07,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1802240. Throughput: 0: 952.5. Samples: 451162. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:13:07,912][00162] Avg episode reward: [(0, '19.440')]
[2025-02-21 06:13:12,911][00162] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1822720. Throughput: 0: 955.2. Samples: 454342. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:13:12,914][00162] Avg episode reward: [(0, '20.462')]
[2025-02-21 06:13:12,917][04898] Saving new best policy, reward=20.462!
[2025-02-21 06:13:17,913][00162] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3748.9). Total num frames: 1839104. Throughput: 0: 927.6. Samples: 459290. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:13:17,915][00162] Avg episode reward: [(0, '20.670')]
[2025-02-21 06:13:17,928][04898] Saving new best policy, reward=20.670!
[2025-02-21 06:13:18,740][04915] Updated weights for policy 0, policy_version 450 (0.0013)
[2025-02-21 06:13:22,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1859584. Throughput: 0: 956.3. Samples: 465476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:13:22,913][00162] Avg episode reward: [(0, '20.471')]
[2025-02-21 06:13:27,915][00162] Fps is (10 sec: 4095.3, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 1880064. Throughput: 0: 956.9. Samples: 468662. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:13:27,916][00162] Avg episode reward: [(0, '20.056')]
[2025-02-21 06:13:28,966][04915] Updated weights for policy 0, policy_version 460 (0.0017)
[2025-02-21 06:13:32,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1896448. Throughput: 0: 929.6. Samples: 473722. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:13:32,913][00162] Avg episode reward: [(0, '20.842')]
[2025-02-21 06:13:32,917][04898] Saving new best policy, reward=20.842!
[2025-02-21 06:13:37,911][00162] Fps is (10 sec: 3687.9, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1916928. Throughput: 0: 953.4. Samples: 479898. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:13:37,912][00162] Avg episode reward: [(0, '19.477')]
[2025-02-21 06:13:39,532][04915] Updated weights for policy 0, policy_version 470 (0.0016)
[2025-02-21 06:13:42,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1933312. Throughput: 0: 950.6. Samples: 482940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:13:42,915][00162] Avg episode reward: [(0, '18.512')]
[2025-02-21 06:13:47,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1953792. Throughput: 0: 940.9. Samples: 488126. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:13:47,916][00162] Avg episode reward: [(0, '17.982')]
[2025-02-21 06:13:50,333][04915] Updated weights for policy 0, policy_version 480 (0.0019)
[2025-02-21 06:13:52,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1974272. Throughput: 0: 962.2. Samples: 494460. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:13:52,916][00162] Avg episode reward: [(0, '19.309')]
[2025-02-21 06:13:57,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1990656. Throughput: 0: 945.8. Samples: 496904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:13:57,915][00162] Avg episode reward: [(0, '19.092')]
[2025-02-21 06:14:01,391][04915] Updated weights for policy 0, policy_version 490 (0.0012)
[2025-02-21 06:14:02,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2011136. Throughput: 0: 962.4. Samples: 502596. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:14:02,912][00162] Avg episode reward: [(0, '20.129')]
[2025-02-21 06:14:07,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2031616. Throughput: 0: 961.9. Samples: 508762. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:14:07,914][00162] Avg episode reward: [(0, '20.677')]
[2025-02-21 06:14:12,413][04915] Updated weights for policy 0, policy_version 500 (0.0019)
[2025-02-21 06:14:12,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2048000. Throughput: 0: 933.1. Samples: 510646. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:14:12,912][00162] Avg episode reward: [(0, '20.798')]
[2025-02-21 06:14:17,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3823.1, 300 sec: 3762.8). Total num frames: 2068480. Throughput: 0: 961.1. Samples: 516970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:14:17,913][00162] Avg episode reward: [(0, '21.034')]
[2025-02-21 06:14:17,919][04898] Saving new best policy, reward=21.034!
[2025-02-21 06:14:22,911][00162] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2088960. Throughput: 0: 893.6. Samples: 520112. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:14:22,913][00162] Avg episode reward: [(0, '21.884')]
[2025-02-21 06:14:22,915][04898] Saving new best policy, reward=21.884!
[2025-02-21 06:14:22,917][04915] Updated weights for policy 0, policy_version 510 (0.0013)
[2025-02-21 06:14:27,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3754.9, 300 sec: 3762.8). Total num frames: 2105344. Throughput: 0: 934.0. Samples: 524972. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:14:27,913][00162] Avg episode reward: [(0, '22.097')]
[2025-02-21 06:14:27,919][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000514_2105344.pth...
[2025-02-21 06:14:28,002][04898] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000293_1200128.pth
[2025-02-21 06:14:28,013][04898] Saving new best policy, reward=22.097!
[2025-02-21 06:14:32,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2125824. Throughput: 0: 956.4. Samples: 531166. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:14:32,912][00162] Avg episode reward: [(0, '21.456')]
[2025-02-21 06:14:33,381][04915] Updated weights for policy 0, policy_version 520 (0.0014)
[2025-02-21 06:14:37,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2142208. Throughput: 0: 924.9. Samples: 536080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:14:37,912][00162] Avg episode reward: [(0, '21.114')]
[2025-02-21 06:14:42,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2162688. Throughput: 0: 941.3. Samples: 539262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:14:42,916][00162] Avg episode reward: [(0, '21.725')]
[2025-02-21 06:14:44,469][04915] Updated weights for policy 0, policy_version 530 (0.0013)
[2025-02-21 06:14:47,913][00162] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 2183168. Throughput: 0: 956.8. Samples: 545652. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:14:47,917][00162] Avg episode reward: [(0, '21.580')]
[2025-02-21 06:14:52,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2199552. Throughput: 0: 931.3. Samples: 550670. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:14:52,914][00162] Avg episode reward: [(0, '22.220')]
[2025-02-21 06:14:52,917][04898] Saving new best policy, reward=22.220!
[2025-02-21 06:14:55,310][04915] Updated weights for policy 0, policy_version 540 (0.0013)
[2025-02-21 06:14:57,911][00162] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2220032. Throughput: 0: 958.8. Samples: 553792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:14:57,915][00162] Avg episode reward: [(0, '22.985')]
[2025-02-21 06:14:57,925][04898] Saving new best policy, reward=22.985!
[2025-02-21 06:15:02,917][00162] Fps is (10 sec: 4093.5, 60 sec: 3822.5, 300 sec: 3790.5). Total num frames: 2240512. Throughput: 0: 953.1. Samples: 559866. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:15:02,918][00162] Avg episode reward: [(0, '23.632')]
[2025-02-21 06:15:02,922][04898] Saving new best policy, reward=23.632!
[2025-02-21 06:15:06,532][04915] Updated weights for policy 0, policy_version 550 (0.0013)
[2025-02-21 06:15:07,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2256896. Throughput: 0: 994.9. Samples: 564884. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:15:07,912][00162] Avg episode reward: [(0, '23.807')]
[2025-02-21 06:15:07,922][04898] Saving new best policy, reward=23.807!
[2025-02-21 06:15:12,911][00162] Fps is (10 sec: 3688.7, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2277376. Throughput: 0: 955.5. Samples: 567968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:15:12,916][00162] Avg episode reward: [(0, '21.713')]
[2025-02-21 06:15:17,053][04915] Updated weights for policy 0, policy_version 560 (0.0012)
[2025-02-21 06:15:17,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2293760. Throughput: 0: 941.2. Samples: 573518. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:15:17,915][00162] Avg episode reward: [(0, '21.307')]
[2025-02-21 06:15:22,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2314240. Throughput: 0: 961.2. Samples: 579332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:15:22,913][00162] Avg episode reward: [(0, '21.186')]
[2025-02-21 06:15:27,264][04915] Updated weights for policy 0, policy_version 570 (0.0014)
[2025-02-21 06:15:27,913][00162] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 2334720. Throughput: 0: 961.3. Samples: 582524. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-21 06:15:27,917][00162] Avg episode reward: [(0, '19.721')]
[2025-02-21 06:15:32,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2351104. Throughput: 0: 930.2. Samples: 587510. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:15:32,912][00162] Avg episode reward: [(0, '18.982')]
[2025-02-21 06:15:37,911][00162] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2371584. Throughput: 0: 958.8. Samples: 593818. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:15:37,914][00162] Avg episode reward: [(0, '19.531')]
[2025-02-21 06:15:38,247][04915] Updated weights for policy 0, policy_version 580 (0.0019)
[2025-02-21 06:15:42,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2392064. Throughput: 0: 960.2. Samples: 597002. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:15:42,912][00162] Avg episode reward: [(0, '19.161')]
[2025-02-21 06:15:47,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3790.6). Total num frames: 2408448. Throughput: 0: 935.0. Samples: 601934. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:15:47,914][00162] Avg episode reward: [(0, '18.760')]
[2025-02-21 06:15:49,319][04915] Updated weights for policy 0, policy_version 590 (0.0013)
[2025-02-21 06:15:52,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 2428928. Throughput: 0: 964.1. Samples: 608268. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:15:52,912][00162] Avg episode reward: [(0, '19.575')]
[2025-02-21 06:15:57,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2445312. Throughput: 0: 958.4. Samples: 611096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:15:57,914][00162] Avg episode reward: [(0, '20.418')]
[2025-02-21 06:16:00,134][04915] Updated weights for policy 0, policy_version 600 (0.0012)
[2025-02-21 06:16:02,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3755.0, 300 sec: 3790.5). Total num frames: 2465792. Throughput: 0: 954.4. Samples: 616468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:02,915][00162] Avg episode reward: [(0, '20.718')]
[2025-02-21 06:16:07,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 2486272. Throughput: 0: 964.2. Samples: 622722. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:07,914][00162] Avg episode reward: [(0, '22.543')]
[2025-02-21 06:16:10,847][04915] Updated weights for policy 0, policy_version 610 (0.0015)
[2025-02-21 06:16:12,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2502656. Throughput: 0: 942.5. Samples: 624936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:12,912][00162] Avg episode reward: [(0, '22.806')]
[2025-02-21 06:16:17,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2523136. Throughput: 0: 962.9. Samples: 630840. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:17,917][00162] Avg episode reward: [(0, '24.000')]
[2025-02-21 06:16:17,924][04898] Saving new best policy, reward=24.000!
[2025-02-21 06:16:21,007][04915] Updated weights for policy 0, policy_version 620 (0.0012)
[2025-02-21 06:16:22,911][00162] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2543616. Throughput: 0: 955.9. Samples: 636834. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:22,918][00162] Avg episode reward: [(0, '23.302')]
[2025-02-21 06:16:27,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 2560000. Throughput: 0: 932.8. Samples: 638976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:27,915][00162] Avg episode reward: [(0, '21.795')]
[2025-02-21 06:16:27,924][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000625_2560000.pth...
[2025-02-21 06:16:28,049][04898] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth
[2025-02-21 06:16:31,988][04915] Updated weights for policy 0, policy_version 630 (0.0016)
[2025-02-21 06:16:32,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2580480. Throughput: 0: 963.1. Samples: 645274. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:32,915][00162] Avg episode reward: [(0, '20.717')]
[2025-02-21 06:16:37,911][00162] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2600960. Throughput: 0: 940.6. Samples: 650594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:37,913][00162] Avg episode reward: [(0, '20.081')]
[2025-02-21 06:16:42,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2617344. Throughput: 0: 938.3. Samples: 653318. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:42,917][00162] Avg episode reward: [(0, '19.794')]
[2025-02-21 06:16:43,131][04915] Updated weights for policy 0, policy_version 640 (0.0012)
[2025-02-21 06:16:47,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2637824. Throughput: 0: 959.2. Samples: 659632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:47,914][00162] Avg episode reward: [(0, '19.928')]
[2025-02-21 06:16:52,912][00162] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 2654208. Throughput: 0: 931.4. Samples: 664636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:52,916][00162] Avg episode reward: [(0, '20.329')]
[2025-02-21 06:16:54,037][04915] Updated weights for policy 0, policy_version 650 (0.0012)
[2025-02-21 06:16:57,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2674688. Throughput: 0: 952.9. Samples: 667818. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:16:57,912][00162] Avg episode reward: [(0, '20.301')]
[2025-02-21 06:17:02,911][00162] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2699264. Throughput: 0: 962.5. Samples: 674154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:17:02,914][00162] Avg episode reward: [(0, '20.420')]
[2025-02-21 06:17:04,350][04915] Updated weights for policy 0, policy_version 660 (0.0019)
[2025-02-21 06:17:07,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2711552. Throughput: 0: 938.5. Samples: 679066. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:17:07,913][00162] Avg episode reward: [(0, '22.750')]
[2025-02-21 06:17:12,911][00162] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2736128. Throughput: 0: 960.8. Samples: 682212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:17:12,915][00162] Avg episode reward: [(0, '21.662')]
[2025-02-21 06:17:14,870][04915] Updated weights for policy 0, policy_version 670 (0.0019)
[2025-02-21 06:17:17,911][00162] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2752512. Throughput: 0: 953.0. Samples: 688158. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:17:17,912][00162] Avg episode reward: [(0, '22.231')]
[2025-02-21 06:17:22,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2772992. Throughput: 0: 952.8. Samples: 693468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:17:22,912][00162] Avg episode reward: [(0, '22.076')]
[2025-02-21 06:17:25,820][04915] Updated weights for policy 0, policy_version 680 (0.0014)
[2025-02-21 06:17:27,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2793472. Throughput: 0: 962.3. Samples: 696622. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:17:27,912][00162] Avg episode reward: [(0, '23.550')]
[2025-02-21 06:17:32,911][00162] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2805760. Throughput: 0: 938.0. Samples: 701840. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:17:32,916][00162] Avg episode reward: [(0, '23.997')]
[2025-02-21 06:17:37,012][04915] Updated weights for policy 0, policy_version 690 (0.0014)
[2025-02-21 06:17:37,911][00162] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2826240. Throughput: 0: 957.7. Samples: 707732. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:17:37,913][00162] Avg episode reward: [(0, '23.787')]
[2025-02-21 06:17:42,911][00162] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2846720. Throughput: 0: 957.0. Samples: 710884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-21 06:17:42,914][00162] Avg episode reward: [(0, '24.180')]
[2025-02-21 06:17:42,917][04898] Saving new best policy, reward=24.180!
[2025-02-21 06:17:47,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2863104. Throughput: 0: 925.8. Samples: 715816. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:17:47,915][00162] Avg episode reward: [(0, '23.677')]
[2025-02-21 06:17:48,159][04915] Updated weights for policy 0, policy_version 700 (0.0013)
[2025-02-21 06:17:52,915][00162] Fps is (10 sec: 4094.4, 60 sec: 3891.0, 300 sec: 3804.4). Total num frames: 2887680. Throughput: 0: 957.3. Samples: 722148. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:17:52,916][00162] Avg episode reward: [(0, '24.172')]
[2025-02-21 06:17:57,911][00162] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2904064. Throughput: 0: 958.0. Samples: 725322. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:17:57,915][00162] Avg episode reward: [(0, '23.502')]
[2025-02-21 06:17:58,819][04915] Updated weights for policy 0, policy_version 710 (0.0012)
[2025-02-21 06:18:02,911][00162] Fps is (10 sec: 3278.1, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2920448. Throughput: 0: 937.3. Samples: 730338. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:18:02,912][00162] Avg episode reward: [(0, '23.868')]
[2025-02-21 06:18:07,911][00162] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2945024. Throughput: 0: 959.2. Samples: 736630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:18:07,915][00162] Avg episode reward: [(0, '22.898')]
[2025-02-21 06:18:08,819][04915] Updated weights for policy 0, policy_version 720 (0.0015)
[2025-02-21 06:18:12,912][00162] Fps is (10 sec: 3685.8, 60 sec: 3686.3, 300 sec: 3790.5). Total num frames: 2957312. Throughput: 0: 949.1. Samples: 739332. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:18:12,914][00162] Avg episode reward: [(0, '22.769')]
[2025-02-21 06:18:17,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2981888. Throughput: 0: 953.7. Samples: 744756. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:18:17,916][00162] Avg episode reward: [(0, '23.682')]
[2025-02-21 06:18:19,733][04915] Updated weights for policy 0, policy_version 730 (0.0020)
[2025-02-21 06:18:22,911][00162] Fps is (10 sec: 4506.3, 60 sec: 3822.9, 300 sec: 3804.5). Total num frames: 3002368. Throughput: 0: 964.5. Samples: 751136. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:18:22,916][00162] Avg episode reward: [(0, '23.075')]
[2025-02-21 06:18:27,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3018752. Throughput: 0: 939.2. Samples: 753150. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:18:27,912][00162] Avg episode reward: [(0, '21.576')]
[2025-02-21 06:18:27,925][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000737_3018752.pth...
[2025-02-21 06:18:28,033][04898] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000514_2105344.pth
[2025-02-21 06:18:30,820][04915] Updated weights for policy 0, policy_version 740 (0.0021)
[2025-02-21 06:18:32,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3039232. Throughput: 0: 963.8. Samples: 759188. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:18:32,917][00162] Avg episode reward: [(0, '21.646')]
[2025-02-21 06:18:37,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3055616. Throughput: 0: 950.0. Samples: 764894. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:18:37,916][00162] Avg episode reward: [(0, '21.359')]
[2025-02-21 06:18:41,869][04915] Updated weights for policy 0, policy_version 750 (0.0016)
[2025-02-21 06:18:42,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3076096. Throughput: 0: 931.4. Samples: 767236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:18:42,916][00162] Avg episode reward: [(0, '20.278')]
[2025-02-21 06:18:47,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3096576. Throughput: 0: 960.0. Samples: 773538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:18:47,912][00162] Avg episode reward: [(0, '20.317')]
[2025-02-21 06:18:52,589][04915] Updated weights for policy 0, policy_version 760 (0.0016)
[2025-02-21 06:18:52,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3804.4). Total num frames: 3112960. Throughput: 0: 934.5. Samples: 778684. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:18:52,914][00162] Avg episode reward: [(0, '20.457')]
[2025-02-21 06:18:57,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 3133440. Throughput: 0: 942.4. Samples: 781738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:18:57,915][00162] Avg episode reward: [(0, '20.499')]
[2025-02-21 06:19:02,441][04915] Updated weights for policy 0, policy_version 770 (0.0016)
[2025-02-21 06:19:02,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3153920. Throughput: 0: 964.0. Samples: 788138. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:19:02,912][00162] Avg episode reward: [(0, '20.177')]
[2025-02-21 06:19:07,913][00162] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 3170304. Throughput: 0: 932.5. Samples: 793102. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:19:07,914][00162] Avg episode reward: [(0, '20.601')]
[2025-02-21 06:19:12,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 3190784. Throughput: 0: 959.0. Samples: 796304. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:19:12,912][00162] Avg episode reward: [(0, '21.150')]
[2025-02-21 06:19:13,466][04915] Updated weights for policy 0, policy_version 780 (0.0014)
[2025-02-21 06:19:17,911][00162] Fps is (10 sec: 4096.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3211264. Throughput: 0: 962.3. Samples: 802490. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:19:17,912][00162] Avg episode reward: [(0, '20.723')]
[2025-02-21 06:19:22,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3227648. Throughput: 0: 950.9. Samples: 807686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:19:22,917][00162] Avg episode reward: [(0, '20.604')]
[2025-02-21 06:19:24,334][04915] Updated weights for policy 0, policy_version 790 (0.0012)
[2025-02-21 06:19:27,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3248128. Throughput: 0: 970.6. Samples: 810914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:19:27,915][00162] Avg episode reward: [(0, '20.930')]
[2025-02-21 06:19:32,916][00162] Fps is (10 sec: 3684.6, 60 sec: 3754.3, 300 sec: 3804.4). Total num frames: 3264512. Throughput: 0: 954.6. Samples: 816502. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:19:32,918][00162] Avg episode reward: [(0, '20.612')]
[2025-02-21 06:19:35,181][04915] Updated weights for policy 0, policy_version 800 (0.0019)
[2025-02-21 06:19:37,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3284992. Throughput: 0: 969.8. Samples: 822326. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:19:37,912][00162] Avg episode reward: [(0, '20.388')]
[2025-02-21 06:19:42,911][00162] Fps is (10 sec: 4098.1, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3305472. Throughput: 0: 971.5. Samples: 825456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:19:42,912][00162] Avg episode reward: [(0, '21.440')]
[2025-02-21 06:19:45,875][04915] Updated weights for policy 0, policy_version 810 (0.0015)
[2025-02-21 06:19:47,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3321856. Throughput: 0: 940.9. Samples: 830480. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:19:47,916][00162] Avg episode reward: [(0, '22.168')]
[2025-02-21 06:19:52,915][00162] Fps is (10 sec: 3684.9, 60 sec: 3822.7, 300 sec: 3804.4). Total num frames: 3342336. Throughput: 0: 971.4. Samples: 836818. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:19:52,917][00162] Avg episode reward: [(0, '21.821')]
[2025-02-21 06:19:55,829][04915] Updated weights for policy 0, policy_version 820 (0.0012)
[2025-02-21 06:19:57,911][00162] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3804.5). Total num frames: 3362816. Throughput: 0: 970.8. Samples: 839992. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:19:57,912][00162] Avg episode reward: [(0, '22.122')]
[2025-02-21 06:20:02,911][00162] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3379200. Throughput: 0: 943.6. Samples: 844952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:20:02,916][00162] Avg episode reward: [(0, '22.621')]
[2025-02-21 06:20:06,852][04915] Updated weights for policy 0, policy_version 830 (0.0012)
[2025-02-21 06:20:07,913][00162] Fps is (10 sec: 4095.1, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3403776. Throughput: 0: 968.4. Samples: 851268. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:20:07,918][00162] Avg episode reward: [(0, '23.861')]
[2025-02-21 06:20:12,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3420160. Throughput: 0: 960.8. Samples: 854150. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:20:12,917][00162] Avg episode reward: [(0, '24.868')]
[2025-02-21 06:20:12,922][04898] Saving new best policy, reward=24.868!
[2025-02-21 06:20:17,911][00162] Fps is (10 sec: 3277.5, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3436544. Throughput: 0: 949.6. Samples: 859230. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:20:17,913][00162] Avg episode reward: [(0, '24.162')]
[2025-02-21 06:20:18,101][04915] Updated weights for policy 0, policy_version 840 (0.0014)
[2025-02-21 06:20:22,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3457024. Throughput: 0: 890.0. Samples: 862374. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:20:22,916][00162] Avg episode reward: [(0, '24.239')]
[2025-02-21 06:20:27,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3473408. Throughput: 0: 946.7. Samples: 868058. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:20:27,912][00162] Avg episode reward: [(0, '24.274')]
[2025-02-21 06:20:27,974][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000849_3477504.pth...
[2025-02-21 06:20:28,076][04898] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000625_2560000.pth
[2025-02-21 06:20:29,023][04915] Updated weights for policy 0, policy_version 850 (0.0012)
[2025-02-21 06:20:32,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3804.4). Total num frames: 3493888. Throughput: 0: 959.2. Samples: 873642. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:20:32,912][00162] Avg episode reward: [(0, '23.772')]
[2025-02-21 06:20:37,916][00162] Fps is (10 sec: 4503.4, 60 sec: 3890.9, 300 sec: 3818.2). Total num frames: 3518464. Throughput: 0: 957.3. Samples: 879896. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:20:37,922][00162] Avg episode reward: [(0, '23.427')]
[2025-02-21 06:20:39,374][04915] Updated weights for policy 0, policy_version 860 (0.0013)
[2025-02-21 06:20:42,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3530752. Throughput: 0: 931.8. Samples: 881924. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:20:42,912][00162] Avg episode reward: [(0, '22.570')]
[2025-02-21 06:20:47,911][00162] Fps is (10 sec: 3688.2, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3555328. Throughput: 0: 960.3. Samples: 888166. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:20:47,915][00162] Avg episode reward: [(0, '23.385')]
[2025-02-21 06:20:49,633][04915] Updated weights for policy 0, policy_version 870 (0.0016)
[2025-02-21 06:20:52,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3818.3). Total num frames: 3571712. Throughput: 0: 947.3. Samples: 893894. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:20:52,913][00162] Avg episode reward: [(0, '24.057')]
[2025-02-21 06:20:57,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3592192. Throughput: 0: 938.2. Samples: 896370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:20:57,916][00162] Avg episode reward: [(0, '24.739')]
[2025-02-21 06:21:00,587][04915] Updated weights for policy 0, policy_version 880 (0.0012)
[2025-02-21 06:21:02,911][00162] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3612672. Throughput: 0: 965.8. Samples: 902692. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:21:02,914][00162] Avg episode reward: [(0, '23.536')]
[2025-02-21 06:21:07,913][00162] Fps is (10 sec: 3685.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3629056. Throughput: 0: 1007.2. Samples: 907700. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:21:07,914][00162] Avg episode reward: [(0, '25.098')]
[2025-02-21 06:21:07,923][04898] Saving new best policy, reward=25.098!
[2025-02-21 06:21:11,790][04915] Updated weights for policy 0, policy_version 890 (0.0015)
[2025-02-21 06:21:12,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3649536. Throughput: 0: 947.1. Samples: 910676. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:21:12,916][00162] Avg episode reward: [(0, '25.342')]
[2025-02-21 06:21:12,920][04898] Saving new best policy, reward=25.342!
[2025-02-21 06:21:17,911][00162] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3670016. Throughput: 0: 962.2. Samples: 916940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:21:17,912][00162] Avg episode reward: [(0, '23.333')]
[2025-02-21 06:21:22,780][04915] Updated weights for policy 0, policy_version 900 (0.0017)
[2025-02-21 06:21:22,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3686400. Throughput: 0: 933.9. Samples: 921916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:21:22,912][00162] Avg episode reward: [(0, '23.502')]
[2025-02-21 06:21:27,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3706880. Throughput: 0: 960.1. Samples: 925130. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:21:27,916][00162] Avg episode reward: [(0, '22.954')]
[2025-02-21 06:21:32,895][04915] Updated weights for policy 0, policy_version 910 (0.0012)
[2025-02-21 06:21:32,912][00162] Fps is (10 sec: 4095.7, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 3727360. Throughput: 0: 960.1. Samples: 931372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:21:32,913][00162] Avg episode reward: [(0, '22.520')]
[2025-02-21 06:21:37,913][00162] Fps is (10 sec: 3685.6, 60 sec: 3754.8, 300 sec: 3818.3). Total num frames: 3743744. Throughput: 0: 947.2. Samples: 936522. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:21:37,915][00162] Avg episode reward: [(0, '23.402')]
[2025-02-21 06:21:42,911][00162] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3764224. Throughput: 0: 960.7. Samples: 939602. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:21:42,912][00162] Avg episode reward: [(0, '24.440')]
[2025-02-21 06:21:43,495][04915] Updated weights for policy 0, policy_version 920 (0.0012)
[2025-02-21 06:21:47,914][00162] Fps is (10 sec: 3686.2, 60 sec: 3754.5, 300 sec: 3818.3). Total num frames: 3780608. Throughput: 0: 946.7. Samples: 945296. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:21:47,915][00162] Avg episode reward: [(0, '23.875')]
[2025-02-21 06:21:52,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3801088. Throughput: 0: 963.1. Samples: 951038. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:21:52,912][00162] Avg episode reward: [(0, '24.213')]
[2025-02-21 06:21:54,338][04915] Updated weights for policy 0, policy_version 930 (0.0015)
[2025-02-21 06:21:57,911][00162] Fps is (10 sec: 4097.1, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3821568. Throughput: 0: 968.2. Samples: 954244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:21:57,916][00162] Avg episode reward: [(0, '24.555')]
[2025-02-21 06:22:02,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3837952. Throughput: 0: 941.5. Samples: 959306. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:22:02,916][00162] Avg episode reward: [(0, '25.856')]
[2025-02-21 06:22:02,924][04898] Saving new best policy, reward=25.856!
[2025-02-21 06:22:05,412][04915] Updated weights for policy 0, policy_version 940 (0.0013)
[2025-02-21 06:22:07,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 3858432. Throughput: 0: 969.7. Samples: 965552. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:22:07,915][00162] Avg episode reward: [(0, '25.457')]
[2025-02-21 06:22:12,911][00162] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3878912. Throughput: 0: 965.7. Samples: 968588. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:22:12,914][00162] Avg episode reward: [(0, '25.574')]
[2025-02-21 06:22:16,445][04915] Updated weights for policy 0, policy_version 950 (0.0012)
[2025-02-21 06:22:17,911][00162] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3895296. Throughput: 0: 938.4. Samples: 973598. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:22:17,916][00162] Avg episode reward: [(0, '24.517')]
[2025-02-21 06:22:22,911][00162] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3915776. Throughput: 0: 965.4. Samples: 979964. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:22:22,912][00162] Avg episode reward: [(0, '24.497')]
[2025-02-21 06:22:26,684][04915] Updated weights for policy 0, policy_version 960 (0.0023)
[2025-02-21 06:22:27,915][00162] Fps is (10 sec: 3684.8, 60 sec: 3754.4, 300 sec: 3818.3). Total num frames: 3932160. Throughput: 0: 961.5. Samples: 982872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:22:27,917][00162] Avg episode reward: [(0, '22.593')]
[2025-02-21 06:22:27,928][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000960_3932160.pth...
[2025-02-21 06:22:28,047][04898] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000737_3018752.pth
[2025-02-21 06:22:32,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3952640. Throughput: 0: 950.0. Samples: 988044. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-21 06:22:32,913][00162] Avg episode reward: [(0, '21.825')]
[2025-02-21 06:22:37,189][04915] Updated weights for policy 0, policy_version 970 (0.0013)
[2025-02-21 06:22:37,911][00162] Fps is (10 sec: 4097.7, 60 sec: 3823.1, 300 sec: 3818.3). Total num frames: 3973120. Throughput: 0: 964.3. Samples: 994430. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:22:37,915][00162] Avg episode reward: [(0, '22.928')]
[2025-02-21 06:22:42,911][00162] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3989504. Throughput: 0: 941.9. Samples: 996630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-21 06:22:42,916][00162] Avg episode reward: [(0, '22.727')]
[2025-02-21 06:22:46,216][04898] Stopping Batcher_0...
[2025-02-21 06:22:46,218][04898] Loop batcher_evt_loop terminating...
[2025-02-21 06:22:46,219][00162] Component Batcher_0 stopped!
[2025-02-21 06:22:46,224][00162] Component RolloutWorker_w2 process died already! Don't wait for it.
[2025-02-21 06:22:46,225][00162] Component RolloutWorker_w4 process died already! Don't wait for it.
[2025-02-21 06:22:46,228][00162] Component RolloutWorker_w5 process died already! Don't wait for it.
[2025-02-21 06:22:46,229][00162] Component RolloutWorker_w6 process died already! Don't wait for it.
[2025-02-21 06:22:46,233][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:22:46,250][04915] Weights refcount: 2 0
[2025-02-21 06:22:46,252][04915] Stopping InferenceWorker_p0-w0...
[2025-02-21 06:22:46,253][04915] Loop inference_proc0-0_evt_loop terminating...
[2025-02-21 06:22:46,393][04898] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000849_3477504.pth
[2025-02-21 06:22:46,412][04898] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:22:46,253][00162] Component InferenceWorker_p0-w0 stopped!
[2025-02-21 06:22:46,574][04917] Stopping RolloutWorker_w1...
[2025-02-21 06:22:46,578][00162] Component RolloutWorker_w1 stopped!
[2025-02-21 06:22:46,577][04917] Loop rollout_proc1_evt_loop terminating...
[2025-02-21 06:22:46,603][00162] Component RolloutWorker_w7 stopped!
[2025-02-21 06:22:46,599][04921] Stopping RolloutWorker_w7...
[2025-02-21 06:22:46,608][04898] Stopping LearnerWorker_p0...
[2025-02-21 06:22:46,609][04898] Loop learner_proc0_evt_loop terminating...
[2025-02-21 06:22:46,608][00162] Component LearnerWorker_p0 stopped!
[2025-02-21 06:22:46,607][04921] Loop rollout_proc7_evt_loop terminating...
[2025-02-21 06:22:46,631][00162] Component RolloutWorker_w3 stopped!
[2025-02-21 06:22:46,631][04919] Stopping RolloutWorker_w3...
[2025-02-21 06:22:46,636][04919] Loop rollout_proc3_evt_loop terminating...
[2025-02-21 06:22:46,656][00162] Component RolloutWorker_w0 stopped!
[2025-02-21 06:22:46,657][04916] Stopping RolloutWorker_w0...
[2025-02-21 06:22:46,658][04916] Loop rollout_proc0_evt_loop terminating...
[2025-02-21 06:22:46,657][00162] Waiting for process learner_proc0 to stop...
[2025-02-21 06:22:47,955][00162] Waiting for process inference_proc0-0 to join...
[2025-02-21 06:22:47,956][00162] Waiting for process rollout_proc0 to join...
[2025-02-21 06:22:48,460][00162] Waiting for process rollout_proc1 to join...
[2025-02-21 06:22:49,113][00162] Waiting for process rollout_proc2 to join...
[2025-02-21 06:22:49,114][00162] Waiting for process rollout_proc3 to join...
[2025-02-21 06:22:49,115][00162] Waiting for process rollout_proc4 to join...
[2025-02-21 06:22:49,116][00162] Waiting for process rollout_proc5 to join...
[2025-02-21 06:22:49,117][00162] Waiting for process rollout_proc6 to join...
[2025-02-21 06:22:49,117][00162] Waiting for process rollout_proc7 to join...
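Shutdown stops each component's event loop and then joins the worker processes; rollout workers 2, 4, 5 and 6 had already died earlier (the log does not record why), so the runner skips them instead of blocking. A generic sketch of such a join-with-liveness-check, not the library's actual runner code:

```python
import multiprocessing as mp

def join_all(processes: dict[str, mp.Process], timeout: float = 5.0) -> None:
    """Join worker processes, skipping any that have already exited."""
    for name, proc in processes.items():
        if not proc.is_alive() and proc.exitcode is not None:
            print(f"Component {name} process died already! Don't wait for it.")
            continue
        print(f"Waiting for process {name} to join...")
        proc.join(timeout=timeout)
```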
[2025-02-21 06:22:49,118][00162] Batcher 0 profile tree view:
batching: 22.1837, releasing_batches: 0.0295
[2025-02-21 06:22:49,119][00162] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0025
  wait_policy_total: 399.2861
update_model: 9.8035
  weight_update: 0.0015
one_step: 0.0024
  handle_policy_step: 622.9071
    deserialize: 14.8725, stack: 3.8476, obs_to_device_normalize: 141.7672, forward: 329.6422, send_messages: 20.2580
    prepare_outputs: 85.6114
      to_cpu: 54.1046
[2025-02-21 06:22:49,120][00162] Learner 0 profile tree view:
misc: 0.0047, prepare_batch: 12.2618
train: 65.6558
  epoch_init: 0.0048, minibatch_init: 0.0059, losses_postprocess: 0.5576, kl_divergence: 0.5127, after_optimizer: 31.8119
  calculate_losses: 22.1192
    losses_init: 0.0032, forward_head: 1.1898, bptt_initial: 15.0649, tail: 0.9236, advantages_returns: 0.2267, losses: 2.7859
    bptt: 1.7130
      bptt_forward_core: 1.6376
  update: 10.1753
    clip: 0.8003
[2025-02-21 06:22:49,121][00162] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4560, enqueue_policy_requests: 343.6811, env_step: 590.8806, overhead: 22.1007, complete_rollouts: 3.7423
save_policy_outputs: 27.3974
  split_output_tensors: 10.4551
[2025-02-21 06:22:49,123][00162] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3784, enqueue_policy_requests: 104.8702, env_step: 853.3712, overhead: 15.8739, complete_rollouts: 8.7588
save_policy_outputs: 24.3665
  split_output_tensors: 9.5815
[2025-02-21 06:22:49,124][00162] Loop Runner_EvtLoop terminating...
[2025-02-21 06:22:49,126][00162] Runner profile tree view:
main_loop: 1093.9531
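The profile tree views above are nested wall-clock timers: each labeled scope accumulates the time spent inside it, and child scopes indent under their parent (e.g. to_cpu under prepare_outputs). A compact sketch of a hierarchical timer in this spirit; it is illustrative, not the profiler Sample Factory ships:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class TreeProfiler:
    def __init__(self):
        self.totals = defaultdict(float)  # path tuple -> accumulated seconds
        self._stack = []

    @contextmanager
    def timeit(self, name):
        self._stack.append(name)
        path = tuple(self._stack)
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[path] += time.perf_counter() - start
            self._stack.pop()

    def report(self):
        # tuple sort puts each parent directly before its children
        for path in sorted(self.totals):
            indent = "  " * (len(path) - 1)
            print(f"{indent}{path[-1]}: {self.totals[path]:.4f}")

prof = TreeProfiler()
with prof.timeit("train"):
    with prof.timeit("calculate_losses"):
        time.sleep(0.01)
prof.report()
```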
[2025-02-21 06:22:49,128][00162] Collected {0: 4005888}, FPS: 3661.8
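The closing figure is plain arithmetic: the 4005888 frames collected for policy 0, divided by the 1093.9531 s main loop, give roughly 3661.8 FPS.

```python
total_frames = 4_005_888
main_loop_seconds = 1093.9531
print(f"FPS: {total_frames / main_loop_seconds:.1f}")  # FPS: 3661.8
```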
[2025-02-21 06:23:02,480][00162] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-21 06:23:02,481][00162] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-21 06:23:02,483][00162] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-21 06:23:02,484][00162] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-21 06:23:02,484][00162] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 06:23:02,486][00162] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-21 06:23:02,486][00162] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 06:23:02,487][00162] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-21 06:23:02,488][00162] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-21 06:23:02,489][00162] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-21 06:23:02,490][00162] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-21 06:23:02,491][00162] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-21 06:23:02,492][00162] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-21 06:23:02,493][00162] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-21 06:23:02,494][00162] Using frameskip 1 and render_action_repeat=4 for evaluation
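Evaluation reloads the training config.json and layers the command-line flags on top, warning when a flag overrides a saved value and noting when it introduces a key the saved config never had. A sketch of that merge, assuming the config file is a flat JSON dict:

```python
import json

def load_config_with_overrides(path: str, cli_args: dict) -> dict:
    """Load a saved experiment config and apply command-line overrides."""
    with open(path) as f:
        cfg = json.load(f)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg
```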
[2025-02-21 06:23:02,523][00162] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-21 06:23:02,526][00162] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:23:02,528][00162] RunningMeanStd input shape: (1,)
[2025-02-21 06:23:02,541][00162] ConvEncoder: input_channels=3
[2025-02-21 06:23:02,634][00162] Conv encoder output size: 512
[2025-02-21 06:23:02,635][00162] Policy head output size: 512
[2025-02-21 06:23:02,898][00162] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
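The evaluator rebuilds the same actor-critic (conv encoder and policy head both 512-wide, observations resized to 128x72) and restores the newest checkpoint. A generic restore sketch with torch; the model argument and the 'model' key are assumptions carried over from the save sketch above:

```python
import glob

import torch

def load_latest_checkpoint(model, ckpt_dir: str, device: str = "cpu"):
    """Restore the newest checkpoint_*.pth into a freshly built model."""
    paths = sorted(glob.glob(f"{ckpt_dir}/checkpoint_*.pth"))
    assert paths, f"no checkpoints under {ckpt_dir}"
    latest = paths[-1]
    print(f"Loading state from checkpoint {latest}...")
    state = torch.load(latest, map_location=device)
    model.load_state_dict(state["model"])  # key assumed from the save sketch above
    return model
```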
[2025-02-21 06:23:03,792][00162] Num frames 100...
[2025-02-21 06:23:03,961][00162] Num frames 200...
[2025-02-21 06:23:04,129][00162] Num frames 300...
[2025-02-21 06:23:04,295][00162] Num frames 400...
[2025-02-21 06:23:04,429][00162] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2025-02-21 06:23:04,432][00162] Avg episode reward: 5.480, avg true_objective: 4.480
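After each finished episode the harness logs the running mean of the raw episode reward and of the "true" objective over all episodes completed so far (so the first episode's averages equal its own scores). A minimal sketch of that accumulator; the raw-vs-true distinction is taken from the log, not reconstructed from code:

```python
class EpisodeStats:
    """Running averages of raw and true episode rewards."""

    def __init__(self):
        self.rewards, self.true_rewards = [], []

    def finish_episode(self, reward: float, true_reward: float) -> None:
        self.rewards.append(reward)
        self.true_rewards.append(true_reward)
        avg_r = sum(self.rewards) / len(self.rewards)
        avg_t = sum(self.true_rewards) / len(self.true_rewards)
        print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")
```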
[2025-02-21 06:23:04,520][00162] Num frames 500...
[2025-02-21 06:23:04,693][00162] Num frames 600...
[2025-02-21 06:23:04,875][00162] Num frames 700...
[2025-02-21 06:23:05,061][00162] Num frames 800...
[2025-02-21 06:23:05,245][00162] Num frames 900...
[2025-02-21 06:23:05,403][00162] Num frames 1000...
[2025-02-21 06:23:05,533][00162] Num frames 1100...
[2025-02-21 06:23:05,659][00162] Num frames 1200...
[2025-02-21 06:23:05,788][00162] Num frames 1300...
[2025-02-21 06:23:05,922][00162] Num frames 1400...
[2025-02-21 06:23:06,051][00162] Num frames 1500...
[2025-02-21 06:23:06,179][00162] Num frames 1600...
[2025-02-21 06:23:06,304][00162] Num frames 1700...
[2025-02-21 06:23:06,394][00162] Avg episode rewards: #0: 18.140, true rewards: #0: 8.640
[2025-02-21 06:23:06,395][00162] Avg episode reward: 18.140, avg true_objective: 8.640
[2025-02-21 06:23:06,485][00162] Num frames 1800...
[2025-02-21 06:23:06,613][00162] Num frames 1900...
[2025-02-21 06:23:06,745][00162] Num frames 2000...
[2025-02-21 06:23:06,870][00162] Num frames 2100...
[2025-02-21 06:23:07,006][00162] Num frames 2200...
[2025-02-21 06:23:07,136][00162] Num frames 2300...
[2025-02-21 06:23:07,265][00162] Num frames 2400...
[2025-02-21 06:23:07,395][00162] Num frames 2500...
[2025-02-21 06:23:07,525][00162] Num frames 2600...
[2025-02-21 06:23:07,650][00162] Avg episode rewards: #0: 18.520, true rewards: #0: 8.853
[2025-02-21 06:23:07,651][00162] Avg episode reward: 18.520, avg true_objective: 8.853
[2025-02-21 06:23:07,710][00162] Num frames 2700...
[2025-02-21 06:23:07,839][00162] Num frames 2800...
[2025-02-21 06:23:07,972][00162] Num frames 2900...
[2025-02-21 06:23:08,102][00162] Num frames 3000...
[2025-02-21 06:23:08,233][00162] Num frames 3100...
[2025-02-21 06:23:08,302][00162] Avg episode rewards: #0: 16.273, true rewards: #0: 7.772
[2025-02-21 06:23:08,303][00162] Avg episode reward: 16.273, avg true_objective: 7.772
[2025-02-21 06:23:08,418][00162] Num frames 3200...
[2025-02-21 06:23:08,545][00162] Num frames 3300...
[2025-02-21 06:23:08,671][00162] Num frames 3400...
[2025-02-21 06:23:08,797][00162] Num frames 3500...
[2025-02-21 06:23:08,921][00162] Num frames 3600...
[2025-02-21 06:23:09,060][00162] Num frames 3700...
[2025-02-21 06:23:09,179][00162] Avg episode rewards: #0: 15.698, true rewards: #0: 7.498
[2025-02-21 06:23:09,180][00162] Avg episode reward: 15.698, avg true_objective: 7.498
[2025-02-21 06:23:09,245][00162] Num frames 3800...
[2025-02-21 06:23:09,371][00162] Num frames 3900...
[2025-02-21 06:23:09,506][00162] Num frames 4000...
[2025-02-21 06:23:09,651][00162] Num frames 4100...
[2025-02-21 06:23:09,780][00162] Num frames 4200...
[2025-02-21 06:23:09,904][00162] Num frames 4300...
[2025-02-21 06:23:10,040][00162] Num frames 4400...
[2025-02-21 06:23:10,172][00162] Num frames 4500...
[2025-02-21 06:23:10,301][00162] Num frames 4600...
[2025-02-21 06:23:10,428][00162] Num frames 4700...
[2025-02-21 06:23:10,557][00162] Num frames 4800...
[2025-02-21 06:23:10,687][00162] Num frames 4900...
[2025-02-21 06:23:10,824][00162] Avg episode rewards: #0: 17.775, true rewards: #0: 8.275
[2025-02-21 06:23:10,825][00162] Avg episode reward: 17.775, avg true_objective: 8.275
[2025-02-21 06:23:10,869][00162] Num frames 5000...
[2025-02-21 06:23:11,003][00162] Num frames 5100...
[2025-02-21 06:23:11,143][00162] Num frames 5200...
[2025-02-21 06:23:11,272][00162] Num frames 5300...
[2025-02-21 06:23:11,399][00162] Num frames 5400...
[2025-02-21 06:23:11,528][00162] Num frames 5500...
[2025-02-21 06:23:11,656][00162] Num frames 5600...
[2025-02-21 06:23:11,815][00162] Avg episode rewards: #0: 17.259, true rewards: #0: 8.116
[2025-02-21 06:23:11,816][00162] Avg episode reward: 17.259, avg true_objective: 8.116
[2025-02-21 06:23:11,841][00162] Num frames 5700...
[2025-02-21 06:23:11,967][00162] Num frames 5800...
[2025-02-21 06:23:12,108][00162] Num frames 5900...
[2025-02-21 06:23:12,235][00162] Num frames 6000...
[2025-02-21 06:23:12,362][00162] Num frames 6100...
[2025-02-21 06:23:12,488][00162] Num frames 6200...
[2025-02-21 06:23:12,613][00162] Num frames 6300...
[2025-02-21 06:23:12,737][00162] Num frames 6400...
[2025-02-21 06:23:12,863][00162] Num frames 6500...
[2025-02-21 06:23:12,991][00162] Num frames 6600...
[2025-02-21 06:23:13,134][00162] Num frames 6700...
[2025-02-21 06:23:13,268][00162] Num frames 6800...
[2025-02-21 06:23:13,398][00162] Num frames 6900...
[2025-02-21 06:23:13,528][00162] Num frames 7000...
[2025-02-21 06:23:13,657][00162] Num frames 7100...
[2025-02-21 06:23:13,785][00162] Num frames 7200...
[2025-02-21 06:23:13,911][00162] Num frames 7300...
[2025-02-21 06:23:14,048][00162] Num frames 7400...
[2025-02-21 06:23:14,190][00162] Num frames 7500...
[2025-02-21 06:23:14,319][00162] Num frames 7600...
[2025-02-21 06:23:14,444][00162] Num frames 7700...
[2025-02-21 06:23:14,604][00162] Avg episode rewards: #0: 22.226, true rewards: #0: 9.726
[2025-02-21 06:23:14,605][00162] Avg episode reward: 22.226, avg true_objective: 9.726
[2025-02-21 06:23:14,630][00162] Num frames 7800...
[2025-02-21 06:23:14,756][00162] Num frames 7900...
[2025-02-21 06:23:14,880][00162] Num frames 8000...
[2025-02-21 06:23:15,008][00162] Num frames 8100...
[2025-02-21 06:23:15,148][00162] Num frames 8200...
[2025-02-21 06:23:15,277][00162] Num frames 8300...
[2025-02-21 06:23:15,430][00162] Num frames 8400...
[2025-02-21 06:23:15,605][00162] Num frames 8500...
[2025-02-21 06:23:15,749][00162] Avg episode rewards: #0: 21.721, true rewards: #0: 9.499
[2025-02-21 06:23:15,750][00162] Avg episode reward: 21.721, avg true_objective: 9.499
[2025-02-21 06:23:15,834][00162] Num frames 8600...
[2025-02-21 06:23:15,997][00162] Num frames 8700...
[2025-02-21 06:23:16,172][00162] Num frames 8800...
[2025-02-21 06:23:16,342][00162] Num frames 8900...
[2025-02-21 06:23:16,505][00162] Num frames 9000...
[2025-02-21 06:23:16,732][00162] Avg episode rewards: #0: 20.699, true rewards: #0: 9.099
[2025-02-21 06:23:16,733][00162] Avg episode reward: 20.699, avg true_objective: 9.099
[2025-02-21 06:23:16,735][00162] Num frames 9100...
[2025-02-21 06:24:06,104][00162] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
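The frames rendered during evaluation are finally encoded to replay.mp4. The log does not say which writer is used; a sketch with imageio (requires the imageio-ffmpeg backend), where 35 FPS is an assumption matching Doom's native tick rate:

```python
import imageio.v2 as imageio
import numpy as np

def save_replay(frames, path="replay.mp4", fps=35):
    """Write a list of HxWx3 uint8 frames to an mp4 file."""
    imageio.mimwrite(path, frames, fps=fps)
    print(f"Replay video saved to {path}!")

# toy usage with random 128x72 frames matching the resize resolution above
save_replay([np.random.randint(0, 255, (72, 128, 3), dtype=np.uint8) for _ in range(10)])
```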
[2025-02-21 06:29:18,376][00162] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-21 06:29:18,377][00162] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-21 06:29:18,378][00162] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-21 06:29:18,379][00162] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-21 06:29:18,380][00162] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-21 06:29:18,381][00162] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-21 06:29:18,381][00162] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-21 06:29:18,382][00162] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-21 06:29:18,383][00162] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-21 06:29:18,384][00162] Adding new argument 'hf_repository'='u79jm/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-21 06:29:18,385][00162] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-21 06:29:18,385][00162] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-21 06:29:18,386][00162] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-21 06:29:18,387][00162] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-21 06:29:18,388][00162] Using frameskip 1 and render_action_repeat=4 for evaluation
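This second evaluation runs with push_to_hub=True and the hf_repository above, so after recording a fresh replay the experiment folder is uploaded to the Hub. One way to do that upload with huggingface_hub (an illustration, not necessarily the exact call path Sample Factory uses):

```python
from huggingface_hub import HfApi

def push_experiment(train_dir: str,
                    repo_id: str = "u79jm/rl_course_vizdoom_health_gathering_supreme"):
    """Upload the whole experiment directory (checkpoints, config, replay.mp4)."""
    api = HfApi()  # assumes a prior `huggingface-cli login`
    api.create_repo(repo_id, exist_ok=True)
    api.upload_folder(repo_id=repo_id, folder_path=train_dir)
```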
[2025-02-21 06:29:18,414][00162] RunningMeanStd input shape: (3, 72, 128)
[2025-02-21 06:29:18,417][00162] RunningMeanStd input shape: (1,)
[2025-02-21 06:29:18,427][00162] ConvEncoder: input_channels=3
[2025-02-21 06:29:18,458][00162] Conv encoder output size: 512
[2025-02-21 06:29:18,459][00162] Policy head output size: 512
[2025-02-21 06:29:18,476][00162] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-21 06:29:18,894][00162] Num frames 100...
[2025-02-21 06:29:19,019][00162] Num frames 200...
[2025-02-21 06:29:19,174][00162] Num frames 300...
[2025-02-21 06:29:19,304][00162] Num frames 400...
[2025-02-21 06:29:19,430][00162] Num frames 500...
[2025-02-21 06:29:19,557][00162] Num frames 600...
[2025-02-21 06:29:19,684][00162] Num frames 700...
[2025-02-21 06:29:19,744][00162] Avg episode rewards: #0: 15.040, true rewards: #0: 7.040
[2025-02-21 06:29:19,745][00162] Avg episode reward: 15.040, avg true_objective: 7.040
[2025-02-21 06:29:19,866][00162] Num frames 800...
[2025-02-21 06:29:19,989][00162] Num frames 900...
[2025-02-21 06:29:20,119][00162] Num frames 1000...
[2025-02-21 06:29:20,258][00162] Num frames 1100...
[2025-02-21 06:29:20,396][00162] Num frames 1200...
[2025-02-21 06:29:20,523][00162] Num frames 1300...
[2025-02-21 06:29:20,669][00162] Num frames 1400...
[2025-02-21 06:29:20,752][00162] Avg episode rewards: #0: 13.040, true rewards: #0: 7.040
[2025-02-21 06:29:20,754][00162] Avg episode reward: 13.040, avg true_objective: 7.040
[2025-02-21 06:29:20,873][00162] Num frames 1500...
[2025-02-21 06:29:20,998][00162] Num frames 1600...
[2025-02-21 06:29:21,133][00162] Num frames 1700...
[2025-02-21 06:29:21,273][00162] Num frames 1800...
[2025-02-21 06:29:21,416][00162] Num frames 1900...
[2025-02-21 06:29:21,586][00162] Num frames 2000...
[2025-02-21 06:29:21,753][00162] Num frames 2100...
[2025-02-21 06:29:21,920][00162] Num frames 2200...
[2025-02-21 06:29:22,088][00162] Num frames 2300...
[2025-02-21 06:29:22,268][00162] Num frames 2400...
[2025-02-21 06:29:22,433][00162] Num frames 2500...
[2025-02-21 06:29:22,602][00162] Num frames 2600...
[2025-02-21 06:29:22,771][00162] Num frames 2700...
[2025-02-21 06:29:22,947][00162] Num frames 2800...
[2025-02-21 06:29:23,124][00162] Num frames 2900...
[2025-02-21 06:29:23,235][00162] Avg episode rewards: #0: 21.093, true rewards: #0: 9.760
[2025-02-21 06:29:23,236][00162] Avg episode reward: 21.093, avg true_objective: 9.760
[2025-02-21 06:29:23,377][00162] Num frames 3000...
[2025-02-21 06:29:23,530][00162] Num frames 3100...
[2025-02-21 06:29:23,656][00162] Num frames 3200...
[2025-02-21 06:29:23,782][00162] Num frames 3300...
[2025-02-21 06:29:23,906][00162] Num frames 3400...
[2025-02-21 06:29:24,029][00162] Num frames 3500...
[2025-02-21 06:29:24,157][00162] Num frames 3600...
[2025-02-21 06:29:24,285][00162] Num frames 3700...
[2025-02-21 06:29:24,379][00162] Avg episode rewards: #0: 19.820, true rewards: #0: 9.320
[2025-02-21 06:29:24,380][00162] Avg episode reward: 19.820, avg true_objective: 9.320
[2025-02-21 06:29:24,469][00162] Num frames 3800...
[2025-02-21 06:29:24,593][00162] Num frames 3900...
[2025-02-21 06:29:24,723][00162] Num frames 4000...
[2025-02-21 06:29:24,849][00162] Num frames 4100...
[2025-02-21 06:29:24,972][00162] Num frames 4200...
[2025-02-21 06:29:25,101][00162] Num frames 4300...
[2025-02-21 06:29:25,262][00162] Avg episode rewards: #0: 18.362, true rewards: #0: 8.762
[2025-02-21 06:29:25,263][00162] Avg episode reward: 18.362, avg true_objective: 8.762
[2025-02-21 06:29:25,286][00162] Num frames 4400...
[2025-02-21 06:29:25,417][00162] Num frames 4500...
[2025-02-21 06:29:25,543][00162] Num frames 4600...
[2025-02-21 06:29:25,667][00162] Num frames 4700...
[2025-02-21 06:29:25,790][00162] Num frames 4800...
[2025-02-21 06:29:25,915][00162] Num frames 4900...
[2025-02-21 06:29:26,039][00162] Num frames 5000...
[2025-02-21 06:29:26,169][00162] Num frames 5100...
[2025-02-21 06:29:26,284][00162] Avg episode rewards: #0: 17.745, true rewards: #0: 8.578
[2025-02-21 06:29:26,284][00162] Avg episode reward: 17.745, avg true_objective: 8.578
[2025-02-21 06:29:26,362][00162] Num frames 5200...
[2025-02-21 06:29:26,486][00162] Num frames 5300...
[2025-02-21 06:29:26,612][00162] Num frames 5400...
[2025-02-21 06:29:26,738][00162] Num frames 5500...
[2025-02-21 06:29:26,862][00162] Num frames 5600...
[2025-02-21 06:29:26,986][00162] Num frames 5700...
[2025-02-21 06:29:27,118][00162] Num frames 5800...
[2025-02-21 06:29:27,244][00162] Num frames 5900...
[2025-02-21 06:29:27,382][00162] Num frames 6000...
[2025-02-21 06:29:27,507][00162] Num frames 6100...
[2025-02-21 06:29:27,632][00162] Num frames 6200...
[2025-02-21 06:29:27,811][00162] Avg episode rewards: #0: 19.284, true rewards: #0: 8.999
[2025-02-21 06:29:27,812][00162] Avg episode reward: 19.284, avg true_objective: 8.999
[2025-02-21 06:29:27,815][00162] Num frames 6300...
[2025-02-21 06:29:27,939][00162] Num frames 6400...
[2025-02-21 06:29:28,069][00162] Num frames 6500...
[2025-02-21 06:29:28,196][00162] Num frames 6600...
[2025-02-21 06:29:28,321][00162] Num frames 6700...
[2025-02-21 06:29:28,456][00162] Num frames 6800...
[2025-02-21 06:29:28,584][00162] Num frames 6900...
[2025-02-21 06:29:28,712][00162] Num frames 7000...
[2025-02-21 06:29:28,812][00162] Avg episode rewards: #0: 18.919, true rewards: #0: 8.794
[2025-02-21 06:29:28,813][00162] Avg episode reward: 18.919, avg true_objective: 8.794
[2025-02-21 06:29:28,895][00162] Num frames 7100...
[2025-02-21 06:29:29,023][00162] Num frames 7200...
[2025-02-21 06:29:29,156][00162] Num frames 7300...
[2025-02-21 06:29:29,284][00162] Num frames 7400...
[2025-02-21 06:29:29,422][00162] Num frames 7500...
[2025-02-21 06:29:29,550][00162] Num frames 7600...
[2025-02-21 06:29:29,677][00162] Num frames 7700...
[2025-02-21 06:29:29,816][00162] Avg episode rewards: #0: 18.518, true rewards: #0: 8.629
[2025-02-21 06:29:29,817][00162] Avg episode reward: 18.518, avg true_objective: 8.629
[2025-02-21 06:29:29,862][00162] Num frames 7800...
[2025-02-21 06:29:29,990][00162] Num frames 7900...
[2025-02-21 06:29:30,120][00162] Num frames 8000...
[2025-02-21 06:29:30,247][00162] Num frames 8100...
[2025-02-21 06:29:30,379][00162] Num frames 8200...
[2025-02-21 06:29:30,513][00162] Num frames 8300...
[2025-02-21 06:29:30,642][00162] Num frames 8400...
[2025-02-21 06:29:30,771][00162] Num frames 8500...
[2025-02-21 06:29:30,898][00162] Num frames 8600...
[2025-02-21 06:29:31,023][00162] Num frames 8700...
[2025-02-21 06:29:31,152][00162] Num frames 8800...
[2025-02-21 06:29:31,281][00162] Num frames 8900...
[2025-02-21 06:29:31,409][00162] Num frames 9000...
[2025-02-21 06:29:31,549][00162] Num frames 9100...
[2025-02-21 06:29:31,623][00162] Avg episode rewards: #0: 19.915, true rewards: #0: 9.115
[2025-02-21 06:29:31,624][00162] Avg episode reward: 19.915, avg true_objective: 9.115
[2025-02-21 06:30:22,179][00162] Replay video saved to /content/train_dir/default_experiment/replay.mp4!