diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1242 @@ +[2025-02-20 15:17:34,093][00180] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-02-20 15:17:34,097][00180] Rollout worker 0 uses device cpu +[2025-02-20 15:17:34,101][00180] Rollout worker 1 uses device cpu +[2025-02-20 15:17:34,101][00180] Rollout worker 2 uses device cpu +[2025-02-20 15:17:34,102][00180] Rollout worker 3 uses device cpu +[2025-02-20 15:17:34,103][00180] Rollout worker 4 uses device cpu +[2025-02-20 15:17:34,104][00180] Rollout worker 5 uses device cpu +[2025-02-20 15:17:34,106][00180] Rollout worker 6 uses device cpu +[2025-02-20 15:17:34,107][00180] Rollout worker 7 uses device cpu +[2025-02-20 15:17:34,849][00180] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-02-20 15:17:34,862][00180] InferenceWorker_p0-w0: min num requests: 2 +[2025-02-20 15:17:35,051][00180] Starting all processes... +[2025-02-20 15:17:35,053][00180] Starting process learner_proc0 +[2025-02-20 15:17:35,306][00180] Starting all processes... +[2025-02-20 15:17:35,315][00180] Starting process inference_proc0-0 +[2025-02-20 15:17:35,316][00180] Starting process rollout_proc0 +[2025-02-20 15:17:35,316][00180] Starting process rollout_proc1 +[2025-02-20 15:17:35,317][00180] Starting process rollout_proc2 +[2025-02-20 15:17:35,317][00180] Starting process rollout_proc3 +[2025-02-20 15:17:35,317][00180] Starting process rollout_proc4 +[2025-02-20 15:17:35,318][00180] Starting process rollout_proc5 +[2025-02-20 15:17:35,318][00180] Starting process rollout_proc6 +[2025-02-20 15:17:35,318][00180] Starting process rollout_proc7 +[2025-02-20 15:17:52,032][02584] Worker 3 uses CPU cores [1] +[2025-02-20 15:17:52,033][02586] Worker 5 uses CPU cores [1] +[2025-02-20 15:17:52,082][02589] Worker 7 uses CPU cores [1] +[2025-02-20 15:17:52,114][02585] Worker 4 uses CPU cores [0] +[2025-02-20 15:17:52,139][02581] Worker 0 uses CPU cores [0] +[2025-02-20 15:17:52,154][02588] Worker 6 uses CPU cores [0] +[2025-02-20 15:17:52,187][02568] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-02-20 15:17:52,187][02568] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-02-20 15:17:52,199][02582] Worker 1 uses CPU cores [1] +[2025-02-20 15:17:52,240][02568] Num visible devices: 1 +[2025-02-20 15:17:52,271][02568] Starting seed is not provided +[2025-02-20 15:17:52,271][02568] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-02-20 15:17:52,272][02568] Initializing actor-critic model on device cuda:0 +[2025-02-20 15:17:52,273][02568] RunningMeanStd input shape: (3, 72, 128) +[2025-02-20 15:17:52,277][02568] RunningMeanStd input shape: (1,) +[2025-02-20 15:17:52,308][02568] ConvEncoder: input_channels=3 +[2025-02-20 15:17:52,324][02583] Worker 2 uses CPU cores [0] +[2025-02-20 15:17:52,332][02587] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-02-20 15:17:52,333][02587] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-02-20 15:17:52,351][02587] Num visible devices: 1 +[2025-02-20 15:17:52,620][02568] Conv encoder output size: 512 +[2025-02-20 15:17:52,620][02568] Policy head output size: 512 +[2025-02-20 15:17:52,685][02568] Created Actor Critic model with architecture: +[2025-02-20 15:17:52,685][02568] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
(running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-02-20 15:17:52,944][02568] Using optimizer +[2025-02-20 15:17:54,781][00180] Heartbeat connected on Batcher_0 +[2025-02-20 15:17:54,850][00180] Heartbeat connected on InferenceWorker_p0-w0 +[2025-02-20 15:17:54,887][00180] Heartbeat connected on RolloutWorker_w0 +[2025-02-20 15:17:54,930][00180] Heartbeat connected on RolloutWorker_w1 +[2025-02-20 15:17:54,956][00180] Heartbeat connected on RolloutWorker_w2 +[2025-02-20 15:17:54,964][00180] Heartbeat connected on RolloutWorker_w3 +[2025-02-20 15:17:54,982][00180] Heartbeat connected on RolloutWorker_w4 +[2025-02-20 15:17:54,989][00180] Heartbeat connected on RolloutWorker_w5 +[2025-02-20 15:17:55,027][00180] Heartbeat connected on RolloutWorker_w6 +[2025-02-20 15:17:55,051][00180] Heartbeat connected on RolloutWorker_w7 +[2025-02-20 15:17:57,436][02568] No checkpoints found +[2025-02-20 15:17:57,436][02568] Did not load from checkpoint, starting from scratch! +[2025-02-20 15:17:57,436][02568] Initialized policy 0 weights for model version 0 +[2025-02-20 15:17:57,440][02568] LearnerWorker_p0 finished initialization! +[2025-02-20 15:17:57,444][02568] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-02-20 15:17:57,444][00180] Heartbeat connected on LearnerWorker_p0 +[2025-02-20 15:17:57,609][02587] RunningMeanStd input shape: (3, 72, 128) +[2025-02-20 15:17:57,610][02587] RunningMeanStd input shape: (1,) +[2025-02-20 15:17:57,622][02587] ConvEncoder: input_channels=3 +[2025-02-20 15:17:57,731][02587] Conv encoder output size: 512 +[2025-02-20 15:17:57,732][02587] Policy head output size: 512 +[2025-02-20 15:17:57,770][00180] Inference worker 0-0 is ready! +[2025-02-20 15:17:57,771][00180] All inference workers are ready! Signal rollout workers to start! 
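The architecture dump above fixes the tensor shapes even though layer parameters are not printed: a three-conv ELU encoder that flattens a 3x72x128 observation into the logged 512-dim embedding, a GRU(512, 512) core, a 1-unit critic head, and a 5-logit action head. Below is a minimal plain-PyTorch sketch reproducing those printed shapes; the kernel sizes and strides are assumptions chosen so the flattened conv output matches "Conv encoder output size: 512", not Sample Factory's actual implementation.

```python
import torch
from torch import nn

class ActorCriticSketch(nn.Module):
    """Rough stand-in for the ActorCriticSharedWeights model printed in the log."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # Kernel sizes/strides are assumptions; with a 3x72x128 input the
        # flattened conv output is 128 * 3 * 6 = 2304, projected to 512.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                    # (core): GRU(512, 512)
        self.critic_linear = nn.Linear(512, 1)          # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # 5 action logits

    def forward(self, obs: torch.Tensor, rnn_state: torch.Tensor):
        # obs: (B, 3, 72, 128); rnn_state: (1, B, 512)
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        core_out, new_state = self.core(x.unsqueeze(0), rnn_state)
        core_out = core_out.squeeze(0)
        return self.distribution_linear(core_out), self.critic_linear(core_out), new_state

model = ActorCriticSketch()
logits, value, state = model(torch.zeros(4, 3, 72, 128), torch.zeros(1, 4, 512))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```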
+[2025-02-20 15:17:58,079][02586] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-20 15:17:58,093][02584] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-20 15:17:58,095][02588] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-20 15:17:58,100][02583] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-20 15:17:58,096][02582] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-20 15:17:58,211][02585] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-20 15:17:58,213][02581] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-20 15:17:58,267][02589] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-20 15:17:59,636][02581] Decorrelating experience for 0 frames...
+[2025-02-20 15:17:59,636][02589] Decorrelating experience for 0 frames...
+[2025-02-20 15:17:59,637][02583] Decorrelating experience for 0 frames...
+[2025-02-20 15:17:59,638][02582] Decorrelating experience for 0 frames...
+[2025-02-20 15:18:00,510][02583] Decorrelating experience for 32 frames...
+[2025-02-20 15:18:00,525][02581] Decorrelating experience for 32 frames...
+[2025-02-20 15:18:00,932][02589] Decorrelating experience for 32 frames...
+[2025-02-20 15:18:00,934][02582] Decorrelating experience for 32 frames...
+[2025-02-20 15:18:01,822][02581] Decorrelating experience for 64 frames...
+[2025-02-20 15:18:01,827][02583] Decorrelating experience for 64 frames...
+[2025-02-20 15:18:02,148][00180] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-20 15:18:02,626][02589] Decorrelating experience for 64 frames...
+[2025-02-20 15:18:02,910][02581] Decorrelating experience for 96 frames...
+[2025-02-20 15:18:02,923][02583] Decorrelating experience for 96 frames...
+[2025-02-20 15:18:03,420][02582] Decorrelating experience for 64 frames...
+[2025-02-20 15:18:04,248][02589] Decorrelating experience for 96 frames...
+[2025-02-20 15:18:04,317][02582] Decorrelating experience for 96 frames...
+[2025-02-20 15:18:07,087][02568] Signal inference workers to stop experience collection...
+[2025-02-20 15:18:07,095][02587] InferenceWorker_p0-w0: stopping experience collection
+[2025-02-20 15:18:07,147][00180] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 385.2. Samples: 1926. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-20 15:18:07,151][00180] Avg episode reward: [(0, '2.887')]
+[2025-02-20 15:18:09,018][02568] Signal inference workers to resume experience collection...
+[2025-02-20 15:18:09,018][02587] InferenceWorker_p0-w0: resuming experience collection
+[2025-02-20 15:18:12,148][00180] Fps is (10 sec: 1638.3, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 16384. Throughput: 0: 284.6. Samples: 2846. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:18:12,158][00180] Avg episode reward: [(0, '3.816')]
+[2025-02-20 15:18:17,147][00180] Fps is (10 sec: 3276.8, 60 sec: 2184.6, 300 sec: 2184.6). Total num frames: 32768. Throughput: 0: 553.2. Samples: 8298. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:18:17,151][00180] Avg episode reward: [(0, '4.133')]
+[2025-02-20 15:18:18,427][02587] Updated weights for policy 0, policy_version 10 (0.0013)
+[2025-02-20 15:18:22,147][00180] Fps is (10 sec: 3686.7, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 53248. Throughput: 0: 708.5. Samples: 14170. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:18:22,151][00180] Avg episode reward: [(0, '4.501')]
+[2025-02-20 15:18:27,151][00180] Fps is (10 sec: 4504.0, 60 sec: 3112.6, 300 sec: 3112.6). Total num frames: 77824. Throughput: 0: 692.4. Samples: 17312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:18:27,153][00180] Avg episode reward: [(0, '4.472')]
+[2025-02-20 15:18:28,588][02587] Updated weights for policy 0, policy_version 20 (0.0015)
+[2025-02-20 15:18:32,148][00180] Fps is (10 sec: 3686.3, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 90112. Throughput: 0: 736.1. Samples: 22082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:18:32,151][00180] Avg episode reward: [(0, '4.424')]
+[2025-02-20 15:18:37,147][00180] Fps is (10 sec: 3277.9, 60 sec: 3159.8, 300 sec: 3159.8). Total num frames: 110592. Throughput: 0: 794.5. Samples: 27806. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:18:37,151][00180] Avg episode reward: [(0, '4.369')]
+[2025-02-20 15:18:37,155][02568] Saving new best policy, reward=4.369!
+[2025-02-20 15:18:39,720][02587] Updated weights for policy 0, policy_version 30 (0.0014)
+[2025-02-20 15:18:42,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3174.4, 300 sec: 3174.4). Total num frames: 126976. Throughput: 0: 772.1. Samples: 30884. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:18:42,150][00180] Avg episode reward: [(0, '4.469')]
+[2025-02-20 15:18:42,163][02568] Saving new best policy, reward=4.469!
+[2025-02-20 15:18:47,147][00180] Fps is (10 sec: 3276.8, 60 sec: 3185.8, 300 sec: 3185.8). Total num frames: 143360. Throughput: 0: 788.6. Samples: 35488. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:18:47,151][00180] Avg episode reward: [(0, '4.467')]
+[2025-02-20 15:18:51,296][02587] Updated weights for policy 0, policy_version 40 (0.0017)
+[2025-02-20 15:18:52,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 163840. Throughput: 0: 885.1. Samples: 41754. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:18:52,149][00180] Avg episode reward: [(0, '4.356')]
+[2025-02-20 15:18:57,148][00180] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 180224. Throughput: 0: 932.0. Samples: 44786. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:18:57,151][00180] Avg episode reward: [(0, '4.319')]
+[2025-02-20 15:19:02,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3345.1). Total num frames: 200704. Throughput: 0: 914.4. Samples: 49446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:19:02,150][00180] Avg episode reward: [(0, '4.231')]
+[2025-02-20 15:19:02,684][02587] Updated weights for policy 0, policy_version 50 (0.0013)
+[2025-02-20 15:19:07,148][00180] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3402.8). Total num frames: 221184. Throughput: 0: 911.9. Samples: 55206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:19:07,149][00180] Avg episode reward: [(0, '4.399')]
+[2025-02-20 15:19:12,153][00180] Fps is (10 sec: 3275.0, 60 sec: 3617.8, 300 sec: 3335.1). Total num frames: 233472. Throughput: 0: 901.7. Samples: 57892. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:19:12,160][00180] Avg episode reward: [(0, '4.475')]
+[2025-02-20 15:19:12,166][02568] Saving new best policy, reward=4.475!
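The "Saving new best policy, reward=..." entries above are emitted only when the average episode reward improves on the best value seen so far, which is why they appear for 4.369, then 4.469, then 4.475, and go quiet whenever the average dips. A minimal sketch of that gating, assuming the learner tracks a single scalar best reward (class and method names are hypothetical, not Sample Factory's API):

```python
import math
import torch

class BestPolicySaver:
    """Write a 'best policy' checkpoint only when the avg episode reward improves."""

    def __init__(self):
        self.best_reward = -math.inf

    def maybe_save(self, model: torch.nn.Module, avg_episode_reward: float, path: str) -> bool:
        # Gate on strict improvement over the best average seen so far.
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            torch.save(model.state_dict(), path)
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
            return True
        return False
```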
+[2025-02-20 15:19:15,268][02587] Updated weights for policy 0, policy_version 60 (0.0018)
+[2025-02-20 15:19:17,148][00180] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3386.0). Total num frames: 253952. Throughput: 0: 893.8. Samples: 62304. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:19:17,151][00180] Avg episode reward: [(0, '4.401')]
+[2025-02-20 15:19:22,148][00180] Fps is (10 sec: 3688.4, 60 sec: 3618.1, 300 sec: 3379.2). Total num frames: 270336. Throughput: 0: 903.2. Samples: 68450. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:19:22,152][00180] Avg episode reward: [(0, '4.261')]
+[2025-02-20 15:19:22,158][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000066_270336.pth...
+[2025-02-20 15:19:26,710][02587] Updated weights for policy 0, policy_version 70 (0.0013)
+[2025-02-20 15:19:27,148][00180] Fps is (10 sec: 3276.9, 60 sec: 3481.8, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 880.2. Samples: 70494. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:19:27,151][00180] Avg episode reward: [(0, '4.365')]
+[2025-02-20 15:19:32,148][00180] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3413.3). Total num frames: 307200. Throughput: 0: 906.1. Samples: 76262. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:19:32,149][00180] Avg episode reward: [(0, '4.264')]
+[2025-02-20 15:19:36,738][02587] Updated weights for policy 0, policy_version 80 (0.0013)
+[2025-02-20 15:19:37,148][00180] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3449.3). Total num frames: 327680. Throughput: 0: 897.9. Samples: 82162. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:19:37,156][00180] Avg episode reward: [(0, '4.260')]
+[2025-02-20 15:19:42,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3440.6). Total num frames: 344064. Throughput: 0: 873.6. Samples: 84098. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:19:42,150][00180] Avg episode reward: [(0, '4.243')]
+[2025-02-20 15:19:47,147][00180] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3471.9). Total num frames: 364544. Throughput: 0: 912.8. Samples: 90524. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:19:47,152][00180] Avg episode reward: [(0, '4.367')]
+[2025-02-20 15:19:47,520][02587] Updated weights for policy 0, policy_version 90 (0.0013)
+[2025-02-20 15:19:52,148][00180] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3463.0). Total num frames: 380928. Throughput: 0: 912.4. Samples: 96262. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:19:52,153][00180] Avg episode reward: [(0, '4.601')]
+[2025-02-20 15:19:52,157][02568] Saving new best policy, reward=4.601!
+[2025-02-20 15:19:57,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3490.5). Total num frames: 401408. Throughput: 0: 905.1. Samples: 98616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-02-20 15:19:57,151][00180] Avg episode reward: [(0, '4.629')]
+[2025-02-20 15:19:57,159][02568] Saving new best policy, reward=4.629!
+[2025-02-20 15:19:58,582][02587] Updated weights for policy 0, policy_version 100 (0.0017)
+[2025-02-20 15:20:02,147][00180] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3515.7). Total num frames: 421888. Throughput: 0: 948.9. Samples: 105002. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:20:02,152][00180] Avg episode reward: [(0, '4.491')]
+[2025-02-20 15:20:07,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3506.2). Total num frames: 438272. Throughput: 0: 925.2. Samples: 110084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:20:07,149][00180] Avg episode reward: [(0, '4.287')]
+[2025-02-20 15:20:09,643][02587] Updated weights for policy 0, policy_version 110 (0.0015)
+[2025-02-20 15:20:12,148][00180] Fps is (10 sec: 3686.2, 60 sec: 3755.0, 300 sec: 3528.8). Total num frames: 458752. Throughput: 0: 944.2. Samples: 112984. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:20:12,153][00180] Avg episode reward: [(0, '4.344')]
+[2025-02-20 15:20:17,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 479232. Throughput: 0: 960.4. Samples: 119478. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:20:17,152][00180] Avg episode reward: [(0, '4.492')]
+[2025-02-20 15:20:20,206][02587] Updated weights for policy 0, policy_version 120 (0.0013)
+[2025-02-20 15:20:22,147][00180] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3540.1). Total num frames: 495616. Throughput: 0: 938.0. Samples: 124372. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:20:22,152][00180] Avg episode reward: [(0, '4.406')]
+[2025-02-20 15:20:27,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3559.3). Total num frames: 516096. Throughput: 0: 966.0. Samples: 127566. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:20:27,151][00180] Avg episode reward: [(0, '4.453')]
+[2025-02-20 15:20:30,200][02587] Updated weights for policy 0, policy_version 130 (0.0013)
+[2025-02-20 15:20:32,152][00180] Fps is (10 sec: 4094.2, 60 sec: 3822.7, 300 sec: 3577.1). Total num frames: 536576. Throughput: 0: 965.6. Samples: 133978. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:20:32,156][00180] Avg episode reward: [(0, '4.612')]
+[2025-02-20 15:20:37,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3567.5). Total num frames: 552960. Throughput: 0: 942.9. Samples: 138692. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:20:37,152][00180] Avg episode reward: [(0, '4.635')]
+[2025-02-20 15:20:37,155][02568] Saving new best policy, reward=4.635!
+[2025-02-20 15:20:41,383][02587] Updated weights for policy 0, policy_version 140 (0.0017)
+[2025-02-20 15:20:42,147][00180] Fps is (10 sec: 3688.0, 60 sec: 3822.9, 300 sec: 3584.0). Total num frames: 573440. Throughput: 0: 960.7. Samples: 141848. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:20:42,150][00180] Avg episode reward: [(0, '4.640')]
+[2025-02-20 15:20:42,158][02568] Saving new best policy, reward=4.640!
+[2025-02-20 15:20:47,148][00180] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3574.7). Total num frames: 589824. Throughput: 0: 951.6. Samples: 147822. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:20:47,152][00180] Avg episode reward: [(0, '4.612')]
+[2025-02-20 15:20:52,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3590.0). Total num frames: 610304. Throughput: 0: 954.6. Samples: 153040. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:20:52,149][00180] Avg episode reward: [(0, '4.432')]
+[2025-02-20 15:20:52,554][02587] Updated weights for policy 0, policy_version 150 (0.0017)
+[2025-02-20 15:20:57,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 961.7. Samples: 156262. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:20:57,151][00180] Avg episode reward: [(0, '4.412')]
+[2025-02-20 15:21:02,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3595.4). Total num frames: 647168. Throughput: 0: 942.3. Samples: 161882. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:21:02,149][00180] Avg episode reward: [(0, '4.565')]
+[2025-02-20 15:21:03,587][02587] Updated weights for policy 0, policy_version 160 (0.0014)
+[2025-02-20 15:21:07,148][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3608.9). Total num frames: 667648. Throughput: 0: 957.2. Samples: 167444. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:21:07,152][00180] Avg episode reward: [(0, '4.583')]
+[2025-02-20 15:21:12,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3621.7). Total num frames: 688128. Throughput: 0: 957.4. Samples: 170648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:21:12,151][00180] Avg episode reward: [(0, '4.490')]
+[2025-02-20 15:21:13,680][02587] Updated weights for policy 0, policy_version 170 (0.0016)
+[2025-02-20 15:21:17,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3612.9). Total num frames: 704512. Throughput: 0: 928.0. Samples: 175736. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:21:17,152][00180] Avg episode reward: [(0, '4.492')]
+[2025-02-20 15:21:22,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3625.0). Total num frames: 724992. Throughput: 0: 965.6. Samples: 182146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:21:22,152][00180] Avg episode reward: [(0, '4.498')]
+[2025-02-20 15:21:22,221][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000178_729088.pth...
+[2025-02-20 15:21:24,113][02587] Updated weights for policy 0, policy_version 180 (0.0013)
+[2025-02-20 15:21:27,148][00180] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3636.5). Total num frames: 745472. Throughput: 0: 966.6. Samples: 185344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:21:27,149][00180] Avg episode reward: [(0, '4.551')]
+[2025-02-20 15:21:32,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3647.4). Total num frames: 765952. Throughput: 0: 944.0. Samples: 190302. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:21:32,152][00180] Avg episode reward: [(0, '4.520')]
+[2025-02-20 15:21:34,912][02587] Updated weights for policy 0, policy_version 190 (0.0017)
+[2025-02-20 15:21:37,147][00180] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3657.8). Total num frames: 786432. Throughput: 0: 972.3. Samples: 196792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:21:37,150][00180] Avg episode reward: [(0, '4.428')]
+[2025-02-20 15:21:42,148][00180] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3649.2). Total num frames: 802816. Throughput: 0: 973.6. Samples: 200076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:21:42,149][00180] Avg episode reward: [(0, '4.351')]
+[2025-02-20 15:21:45,785][02587] Updated weights for policy 0, policy_version 200 (0.0015)
+[2025-02-20 15:21:47,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3659.1). Total num frames: 823296. Throughput: 0: 960.0. Samples: 205084. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:21:47,149][00180] Avg episode reward: [(0, '4.345')]
+[2025-02-20 15:21:52,147][00180] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3668.6). Total num frames: 843776. Throughput: 0: 982.3. Samples: 211648. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:21:52,152][00180] Avg episode reward: [(0, '4.404')]
+[2025-02-20 15:21:55,958][02587] Updated weights for policy 0, policy_version 210 (0.0015)
+[2025-02-20 15:21:57,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3660.3). Total num frames: 860160. Throughput: 0: 971.9. Samples: 214384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:21:57,152][00180] Avg episode reward: [(0, '4.515')]
+[2025-02-20 15:22:02,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 884736. Throughput: 0: 982.0. Samples: 219926. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:22:02,148][00180] Avg episode reward: [(0, '4.685')]
+[2025-02-20 15:22:02,156][02568] Saving new best policy, reward=4.685!
+[2025-02-20 15:22:05,959][02587] Updated weights for policy 0, policy_version 220 (0.0016)
+[2025-02-20 15:22:07,154][00180] Fps is (10 sec: 4502.5, 60 sec: 3959.0, 300 sec: 3694.7). Total num frames: 905216. Throughput: 0: 983.8. Samples: 226422. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:22:07,156][00180] Avg episode reward: [(0, '4.499')]
+[2025-02-20 15:22:12,153][00180] Fps is (10 sec: 3274.8, 60 sec: 3822.6, 300 sec: 3669.9). Total num frames: 917504. Throughput: 0: 958.4. Samples: 228476. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:22:12,155][00180] Avg episode reward: [(0, '4.456')]
+[2025-02-20 15:22:16,974][02587] Updated weights for policy 0, policy_version 230 (0.0023)
+[2025-02-20 15:22:17,147][00180] Fps is (10 sec: 3688.9, 60 sec: 3959.5, 300 sec: 3694.4). Total num frames: 942080. Throughput: 0: 984.0. Samples: 234580. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:22:17,149][00180] Avg episode reward: [(0, '4.635')]
+[2025-02-20 15:22:22,147][00180] Fps is (10 sec: 4508.3, 60 sec: 3959.5, 300 sec: 3702.2). Total num frames: 962560. Throughput: 0: 976.4. Samples: 240728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:22:22,151][00180] Avg episode reward: [(0, '4.624')]
+[2025-02-20 15:22:27,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3694.1). Total num frames: 978944. Throughput: 0: 948.8. Samples: 242772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:22:27,149][00180] Avg episode reward: [(0, '4.628')]
+[2025-02-20 15:22:27,755][02587] Updated weights for policy 0, policy_version 240 (0.0014)
+[2025-02-20 15:22:32,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3701.6). Total num frames: 999424. Throughput: 0: 982.9. Samples: 249314. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:22:32,149][00180] Avg episode reward: [(0, '4.942')]
+[2025-02-20 15:22:32,159][02568] Saving new best policy, reward=4.942!
+[2025-02-20 15:22:37,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3708.7). Total num frames: 1019904. Throughput: 0: 961.6. Samples: 254918. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:22:37,149][00180] Avg episode reward: [(0, '5.046')]
+[2025-02-20 15:22:37,151][02568] Saving new best policy, reward=5.046!
+[2025-02-20 15:22:38,787][02587] Updated weights for policy 0, policy_version 250 (0.0013)
+[2025-02-20 15:22:42,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3701.0). Total num frames: 1036288. Throughput: 0: 955.6. Samples: 257386. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:22:42,149][00180] Avg episode reward: [(0, '4.775')]
+[2025-02-20 15:22:47,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3708.0). Total num frames: 1056768. Throughput: 0: 979.2. Samples: 263992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:22:47,151][00180] Avg episode reward: [(0, '4.687')]
+[2025-02-20 15:22:48,167][02587] Updated weights for policy 0, policy_version 260 (0.0013)
+[2025-02-20 15:22:52,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3700.5). Total num frames: 1073152. Throughput: 0: 949.7. Samples: 269150. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:22:52,150][00180] Avg episode reward: [(0, '4.787')]
+[2025-02-20 15:22:57,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 974.2. Samples: 272308. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:22:57,150][00180] Avg episode reward: [(0, '4.680')]
+[2025-02-20 15:22:58,847][02587] Updated weights for policy 0, policy_version 270 (0.0017)
+[2025-02-20 15:23:02,148][00180] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 984.2. Samples: 278868. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:23:02,151][00180] Avg episode reward: [(0, '4.644')]
+[2025-02-20 15:23:07,148][00180] Fps is (10 sec: 3686.3, 60 sec: 3823.4, 300 sec: 3790.5). Total num frames: 1134592. Throughput: 0: 959.3. Samples: 283896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:23:07,151][00180] Avg episode reward: [(0, '4.763')]
+[2025-02-20 15:23:09,701][02587] Updated weights for policy 0, policy_version 280 (0.0016)
+[2025-02-20 15:23:12,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3959.9, 300 sec: 3804.4). Total num frames: 1155072. Throughput: 0: 985.3. Samples: 287110. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:23:12,152][00180] Avg episode reward: [(0, '5.168')]
+[2025-02-20 15:23:12,161][02568] Saving new best policy, reward=5.168!
+[2025-02-20 15:23:17,147][00180] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1175552. Throughput: 0: 986.2. Samples: 293692. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:23:17,153][00180] Avg episode reward: [(0, '5.178')]
+[2025-02-20 15:23:17,154][02568] Saving new best policy, reward=5.178!
+[2025-02-20 15:23:20,328][02587] Updated weights for policy 0, policy_version 290 (0.0013)
+[2025-02-20 15:23:22,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1191936. Throughput: 0: 972.7. Samples: 298688. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:23:22,152][00180] Avg episode reward: [(0, '4.973')]
+[2025-02-20 15:23:22,162][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth...
+[2025-02-20 15:23:22,254][02568] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000066_270336.pth
+[2025-02-20 15:23:27,148][00180] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1216512. Throughput: 0: 988.6. Samples: 301874. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:23:27,151][00180] Avg episode reward: [(0, '4.901')]
+[2025-02-20 15:23:29,922][02587] Updated weights for policy 0, policy_version 300 (0.0013)
+[2025-02-20 15:23:32,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1232896. Throughput: 0: 976.1. Samples: 307916. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:23:32,150][00180] Avg episode reward: [(0, '4.932')]
+[2025-02-20 15:23:37,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1253376. Throughput: 0: 985.8. Samples: 313512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:23:37,148][00180] Avg episode reward: [(0, '5.091')]
+[2025-02-20 15:23:40,786][02587] Updated weights for policy 0, policy_version 310 (0.0025)
+[2025-02-20 15:23:42,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1273856. Throughput: 0: 985.6. Samples: 316658. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:23:42,152][00180] Avg episode reward: [(0, '5.153')]
+[2025-02-20 15:23:47,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1290240. Throughput: 0: 962.0. Samples: 322156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:23:47,151][00180] Avg episode reward: [(0, '5.314')]
+[2025-02-20 15:23:47,155][02568] Saving new best policy, reward=5.314!
+[2025-02-20 15:23:51,719][02587] Updated weights for policy 0, policy_version 320 (0.0013)
+[2025-02-20 15:23:52,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1310720. Throughput: 0: 981.4. Samples: 328058. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:23:52,152][00180] Avg episode reward: [(0, '5.467')]
+[2025-02-20 15:23:52,159][02568] Saving new best policy, reward=5.467!
+[2025-02-20 15:23:57,148][00180] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1331200. Throughput: 0: 980.9. Samples: 331252. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:23:57,149][00180] Avg episode reward: [(0, '5.680')]
+[2025-02-20 15:23:57,150][02568] Saving new best policy, reward=5.680!
+[2025-02-20 15:24:02,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1347584. Throughput: 0: 943.1. Samples: 336132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:24:02,149][00180] Avg episode reward: [(0, '6.087')]
+[2025-02-20 15:24:02,157][02568] Saving new best policy, reward=6.087!
+[2025-02-20 15:24:02,735][02587] Updated weights for policy 0, policy_version 330 (0.0012)
+[2025-02-20 15:24:07,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1368064. Throughput: 0: 969.7. Samples: 342324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:24:07,149][00180] Avg episode reward: [(0, '6.087')]
+[2025-02-20 15:24:12,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1388544. Throughput: 0: 967.8. Samples: 345426. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:24:12,151][00180] Avg episode reward: [(0, '6.422')]
+[2025-02-20 15:24:12,160][02568] Saving new best policy, reward=6.422!
+[2025-02-20 15:24:13,743][02587] Updated weights for policy 0, policy_version 340 (0.0018)
+[2025-02-20 15:24:17,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1404928. Throughput: 0: 941.2. Samples: 350272. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:24:17,149][00180] Avg episode reward: [(0, '5.997')]
+[2025-02-20 15:24:22,149][00180] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 1425408. Throughput: 0: 960.0. Samples: 356712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:24:22,150][00180] Avg episode reward: [(0, '6.110')]
+[2025-02-20 15:24:23,484][02587] Updated weights for policy 0, policy_version 350 (0.0013)
+[2025-02-20 15:24:27,147][00180] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1441792. Throughput: 0: 960.0. Samples: 359856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:24:27,153][00180] Avg episode reward: [(0, '6.063')]
+[2025-02-20 15:24:32,147][00180] Fps is (10 sec: 3686.9, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1462272. Throughput: 0: 950.3. Samples: 364920. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:24:32,153][00180] Avg episode reward: [(0, '6.165')]
+[2025-02-20 15:24:34,425][02587] Updated weights for policy 0, policy_version 360 (0.0012)
+[2025-02-20 15:24:37,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1482752. Throughput: 0: 964.5. Samples: 371460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:24:37,150][00180] Avg episode reward: [(0, '6.644')]
+[2025-02-20 15:24:37,239][02568] Saving new best policy, reward=6.644!
+[2025-02-20 15:24:42,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1499136. Throughput: 0: 948.5. Samples: 373934. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:24:42,149][00180] Avg episode reward: [(0, '6.742')]
+[2025-02-20 15:24:42,155][02568] Saving new best policy, reward=6.742!
+[2025-02-20 15:24:45,447][02587] Updated weights for policy 0, policy_version 370 (0.0017)
+[2025-02-20 15:24:47,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1519616. Throughput: 0: 963.0. Samples: 379466. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:24:47,152][00180] Avg episode reward: [(0, '6.845')]
+[2025-02-20 15:24:47,155][02568] Saving new best policy, reward=6.845!
+[2025-02-20 15:24:52,149][00180] Fps is (10 sec: 4504.6, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 1544192. Throughput: 0: 970.7. Samples: 386008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:24:52,151][00180] Avg episode reward: [(0, '7.654')]
+[2025-02-20 15:24:52,163][02568] Saving new best policy, reward=7.654!
+[2025-02-20 15:24:56,448][02587] Updated weights for policy 0, policy_version 380 (0.0019)
+[2025-02-20 15:24:57,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1556480. Throughput: 0: 946.1. Samples: 388000. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:24:57,152][00180] Avg episode reward: [(0, '8.002')]
+[2025-02-20 15:24:57,156][02568] Saving new best policy, reward=8.002!
+[2025-02-20 15:25:02,147][00180] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1581056. Throughput: 0: 974.8. Samples: 394138. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:25:02,148][00180] Avg episode reward: [(0, '8.842')]
+[2025-02-20 15:25:02,156][02568] Saving new best policy, reward=8.842!
+[2025-02-20 15:25:06,040][02587] Updated weights for policy 0, policy_version 390 (0.0022)
+[2025-02-20 15:25:07,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1597440. Throughput: 0: 965.0. Samples: 400136. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:25:07,151][00180] Avg episode reward: [(0, '9.165')]
+[2025-02-20 15:25:07,156][02568] Saving new best policy, reward=9.165!
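Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry above reports frame throughput over three trailing windows, which is why the very first report shows nan: there is no earlier sample to difference against. A small sketch of that multi-window bookkeeping, assuming a (timestamp, total frame count) pair is recorded at every report tick, roughly every five seconds as in this log:

```python
import time
from collections import deque

class FpsTracker:
    """Frame rate over trailing windows, like the log's (10 sec / 60 sec / 300 sec) triple."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames), oldest first

    def record(self, total_frames: int):
        now = time.time()
        self.samples.append((now, total_frames))
        # Drop samples too old for even the largest window.
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self) -> dict:
        now, frames = self.samples[-1]
        result = {}
        for w in self.windows:
            # Oldest sample still inside the window; nan until history exists.
            old = next(((t, f) for t, f in self.samples if now - t <= w), None)
            if old is None or old[0] == now:
                result[w] = float("nan")
            else:
                result[w] = (frames - old[1]) / (now - old[0])
        return result
```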
+[2025-02-20 15:25:12,147][00180] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1613824. Throughput: 0: 940.1. Samples: 402162. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:25:12,149][00180] Avg episode reward: [(0, '9.173')]
+[2025-02-20 15:25:12,200][02568] Saving new best policy, reward=9.173!
+[2025-02-20 15:25:16,963][02587] Updated weights for policy 0, policy_version 400 (0.0021)
+[2025-02-20 15:25:17,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1638400. Throughput: 0: 970.3. Samples: 408584. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:25:17,151][00180] Avg episode reward: [(0, '8.589')]
+[2025-02-20 15:25:22,148][00180] Fps is (10 sec: 4095.9, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 1654784. Throughput: 0: 947.9. Samples: 414114. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:25:22,155][00180] Avg episode reward: [(0, '9.582')]
+[2025-02-20 15:25:22,168][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000404_1654784.pth...
+[2025-02-20 15:25:22,294][02568] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000178_729088.pth
+[2025-02-20 15:25:22,309][02568] Saving new best policy, reward=9.582!
+[2025-02-20 15:25:27,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1675264. Throughput: 0: 948.8. Samples: 416628. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:25:27,152][00180] Avg episode reward: [(0, '9.821')]
+[2025-02-20 15:25:27,156][02568] Saving new best policy, reward=9.821!
+[2025-02-20 15:25:28,016][02587] Updated weights for policy 0, policy_version 410 (0.0020)
+[2025-02-20 15:25:32,147][00180] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1695744. Throughput: 0: 967.5. Samples: 423002. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:25:32,154][00180] Avg episode reward: [(0, '10.054')]
+[2025-02-20 15:25:32,163][02568] Saving new best policy, reward=10.054!
+[2025-02-20 15:25:37,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1712128. Throughput: 0: 932.0. Samples: 427948. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:25:37,149][00180] Avg episode reward: [(0, '9.251')]
+[2025-02-20 15:25:39,072][02587] Updated weights for policy 0, policy_version 420 (0.0015)
+[2025-02-20 15:25:42,148][00180] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1732608. Throughput: 0: 959.4. Samples: 431172. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:25:42,149][00180] Avg episode reward: [(0, '8.515')]
+[2025-02-20 15:25:47,151][00180] Fps is (10 sec: 4094.6, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 1753088. Throughput: 0: 964.7. Samples: 437554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:25:47,152][00180] Avg episode reward: [(0, '8.126')]
+[2025-02-20 15:25:49,783][02587] Updated weights for policy 0, policy_version 430 (0.0017)
+[2025-02-20 15:25:52,148][00180] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3860.0). Total num frames: 1769472. Throughput: 0: 946.5. Samples: 442730. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:25:52,152][00180] Avg episode reward: [(0, '8.698')]
+[2025-02-20 15:25:57,148][00180] Fps is (10 sec: 3687.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1789952. Throughput: 0: 974.3. Samples: 446008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:25:57,152][00180] Avg episode reward: [(0, '9.901')]
+[2025-02-20 15:25:59,067][02587] Updated weights for policy 0, policy_version 440 (0.0013)
+[2025-02-20 15:26:02,147][00180] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1810432. Throughput: 0: 972.3. Samples: 452336. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:26:02,149][00180] Avg episode reward: [(0, '10.740')]
+[2025-02-20 15:26:02,160][02568] Saving new best policy, reward=10.740!
+[2025-02-20 15:26:07,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1826816. Throughput: 0: 967.3. Samples: 457642. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:26:07,152][00180] Avg episode reward: [(0, '11.759')]
+[2025-02-20 15:26:07,164][02568] Saving new best policy, reward=11.759!
+[2025-02-20 15:26:10,022][02587] Updated weights for policy 0, policy_version 450 (0.0016)
+[2025-02-20 15:26:12,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1851392. Throughput: 0: 983.4. Samples: 460880. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:26:12,150][00180] Avg episode reward: [(0, '12.330')]
+[2025-02-20 15:26:12,157][02568] Saving new best policy, reward=12.330!
+[2025-02-20 15:26:17,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1867776. Throughput: 0: 959.9. Samples: 466196. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:26:17,151][00180] Avg episode reward: [(0, '12.987')]
+[2025-02-20 15:26:17,155][02568] Saving new best policy, reward=12.987!
+[2025-02-20 15:26:20,843][02587] Updated weights for policy 0, policy_version 460 (0.0012)
+[2025-02-20 15:26:22,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1888256. Throughput: 0: 987.0. Samples: 472364. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:26:22,152][00180] Avg episode reward: [(0, '12.269')]
+[2025-02-20 15:26:27,148][00180] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 1908736. Throughput: 0: 988.7. Samples: 475664. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:26:27,150][00180] Avg episode reward: [(0, '12.773')]
+[2025-02-20 15:26:31,434][02587] Updated weights for policy 0, policy_version 470 (0.0013)
+[2025-02-20 15:26:32,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1925120. Throughput: 0: 962.3. Samples: 480856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:26:32,152][00180] Avg episode reward: [(0, '13.387')]
+[2025-02-20 15:26:32,162][02568] Saving new best policy, reward=13.387!
+[2025-02-20 15:26:37,147][00180] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1949696. Throughput: 0: 992.9. Samples: 487412. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:26:37,149][00180] Avg episode reward: [(0, '14.073')]
+[2025-02-20 15:26:37,150][02568] Saving new best policy, reward=14.073!
+[2025-02-20 15:26:41,339][02587] Updated weights for policy 0, policy_version 480 (0.0013)
+[2025-02-20 15:26:42,148][00180] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1966080. Throughput: 0: 992.6. Samples: 490676. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:26:42,157][00180] Avg episode reward: [(0, '14.569')]
+[2025-02-20 15:26:42,163][02568] Saving new best policy, reward=14.569!
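The checkpoint entries above pair each save of checkpoint_{policy_version:09d}_{env_frames}.pth with removal of the oldest file, so only the newest few checkpoints survive (checkpoint_000000404_1654784.pth replaces checkpoint_000000178_729088.pth, and so on). A sketch of that keep-last-N rotation; the limit of two retained files and the function name are assumptions:

```python
import os
import torch

def save_with_rotation(state_dict, ckpt_dir: str, policy_version: int,
                       env_frames: int, keep_last: int = 2) -> str:
    """Write checkpoint_{version:09d}_{frames}.pth, then delete all but the newest keep_last."""
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_frames}.pth")
    torch.save(state_dict, path)
    # Zero-padded versions make lexicographic order equal to chronological order.
    ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.startswith("checkpoint_"))
    for stale in ckpts[:-keep_last]:
        print(f"Removing {os.path.join(ckpt_dir, stale)}")
        os.remove(os.path.join(ckpt_dir, stale))
    return path
```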
+[2025-02-20 15:26:47,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 1986560. Throughput: 0: 959.7. Samples: 495524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:26:47,149][00180] Avg episode reward: [(0, '14.224')]
+[2025-02-20 15:26:51,605][02587] Updated weights for policy 0, policy_version 490 (0.0014)
+[2025-02-20 15:26:52,148][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2007040. Throughput: 0: 988.5. Samples: 502124. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:26:52,154][00180] Avg episode reward: [(0, '13.910')]
+[2025-02-20 15:26:57,151][00180] Fps is (10 sec: 3685.1, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 2023424. Throughput: 0: 978.4. Samples: 504910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:26:57,152][00180] Avg episode reward: [(0, '13.442')]
+[2025-02-20 15:27:02,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2043904. Throughput: 0: 985.1. Samples: 510524. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:02,149][00180] Avg episode reward: [(0, '13.568')]
+[2025-02-20 15:27:02,394][02587] Updated weights for policy 0, policy_version 500 (0.0016)
+[2025-02-20 15:27:07,148][00180] Fps is (10 sec: 4507.1, 60 sec: 4027.7, 300 sec: 3901.7). Total num frames: 2068480. Throughput: 0: 995.8. Samples: 517176. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:27:07,149][00180] Avg episode reward: [(0, '13.395')]
+[2025-02-20 15:27:12,148][00180] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2084864. Throughput: 0: 972.3. Samples: 519418. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:12,150][00180] Avg episode reward: [(0, '13.067')]
+[2025-02-20 15:27:13,006][02587] Updated weights for policy 0, policy_version 510 (0.0015)
+[2025-02-20 15:27:17,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2105344. Throughput: 0: 994.1. Samples: 525590. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:17,150][00180] Avg episode reward: [(0, '13.113')]
+[2025-02-20 15:27:22,147][00180] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2125824. Throughput: 0: 984.6. Samples: 531720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:22,150][00180] Avg episode reward: [(0, '12.646')]
+[2025-02-20 15:27:22,159][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth...
+[2025-02-20 15:27:22,282][02568] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth
+[2025-02-20 15:27:23,367][02587] Updated weights for policy 0, policy_version 520 (0.0020)
+[2025-02-20 15:27:27,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 2142208. Throughput: 0: 961.9. Samples: 533960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:27,152][00180] Avg episode reward: [(0, '13.363')]
+[2025-02-20 15:27:32,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2166784. Throughput: 0: 1000.2. Samples: 540534. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:32,152][00180] Avg episode reward: [(0, '15.526')]
+[2025-02-20 15:27:32,159][02568] Saving new best policy, reward=15.526!
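"Policy #0 lag: (min: ..., avg: ..., max: ...)" measures how far behind the learner the collected experience is: the learner's current policy_version minus the version that generated each sample in the batch. With the inference worker pulling fresh weights every ten or so versions ("Updated weights for policy 0, policy_version N"), the lag in this run stays within one version. A toy illustration, with illustrative names only:

```python
def policy_lag_stats(current_version: int, sample_versions: list[int]):
    """Lag of each sample = learner's policy_version minus the version that produced it."""
    lags = [current_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# Learner at version 30; rollouts collected under versions 29-30:
print(policy_lag_stats(30, [30, 30, 29, 30, 30]))  # (0, 0.2, 1)
```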
+[2025-02-20 15:27:33,170][02587] Updated weights for policy 0, policy_version 530 (0.0016)
+[2025-02-20 15:27:37,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2183168. Throughput: 0: 974.5. Samples: 545978. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:37,149][00180] Avg episode reward: [(0, '16.515')]
+[2025-02-20 15:27:37,151][02568] Saving new best policy, reward=16.515!
+[2025-02-20 15:27:42,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2203648. Throughput: 0: 978.4. Samples: 548934. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:27:42,151][00180] Avg episode reward: [(0, '18.135')]
+[2025-02-20 15:27:42,162][02568] Saving new best policy, reward=18.135!
+[2025-02-20 15:27:43,846][02587] Updated weights for policy 0, policy_version 540 (0.0019)
+[2025-02-20 15:27:47,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2224128. Throughput: 0: 996.4. Samples: 555364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:47,153][00180] Avg episode reward: [(0, '18.242')]
+[2025-02-20 15:27:47,159][02568] Saving new best policy, reward=18.242!
+[2025-02-20 15:27:52,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2240512. Throughput: 0: 961.4. Samples: 560440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:27:52,153][00180] Avg episode reward: [(0, '17.308')]
+[2025-02-20 15:27:54,569][02587] Updated weights for policy 0, policy_version 550 (0.0017)
+[2025-02-20 15:27:57,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3873.8). Total num frames: 2260992. Throughput: 0: 985.4. Samples: 563760. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:27:57,152][00180] Avg episode reward: [(0, '16.516')]
+[2025-02-20 15:28:02,150][00180] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 2281472. Throughput: 0: 998.8. Samples: 570538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:28:02,153][00180] Avg episode reward: [(0, '15.679')]
+[2025-02-20 15:28:04,989][02587] Updated weights for policy 0, policy_version 560 (0.0023)
+[2025-02-20 15:28:07,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2301952. Throughput: 0: 976.3. Samples: 575652. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:28:07,153][00180] Avg episode reward: [(0, '15.343')]
+[2025-02-20 15:28:12,148][00180] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2322432. Throughput: 0: 1001.4. Samples: 579022. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:28:12,150][00180] Avg episode reward: [(0, '15.721')]
+[2025-02-20 15:28:14,360][02587] Updated weights for policy 0, policy_version 570 (0.0017)
+[2025-02-20 15:28:17,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2338816. Throughput: 0: 987.0. Samples: 584948. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:28:17,151][00180] Avg episode reward: [(0, '14.711')]
+[2025-02-20 15:28:22,147][00180] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2359296. Throughput: 0: 994.1. Samples: 590714. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:28:22,152][00180] Avg episode reward: [(0, '15.131')]
+[2025-02-20 15:28:25,077][02587] Updated weights for policy 0, policy_version 580 (0.0014)
+[2025-02-20 15:28:27,147][00180] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2383872. Throughput: 0: 1000.1. Samples: 593940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:28:27,154][00180] Avg episode reward: [(0, '15.950')]
+[2025-02-20 15:28:32,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2400256. Throughput: 0: 975.0. Samples: 599238. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:28:32,149][00180] Avg episode reward: [(0, '16.633')]
+[2025-02-20 15:28:35,679][02587] Updated weights for policy 0, policy_version 590 (0.0013)
+[2025-02-20 15:28:37,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2420736. Throughput: 0: 1005.2. Samples: 605674. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:28:37,149][00180] Avg episode reward: [(0, '18.237')]
+[2025-02-20 15:28:42,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2441216. Throughput: 0: 1006.0. Samples: 609030. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:28:42,149][00180] Avg episode reward: [(0, '18.677')]
+[2025-02-20 15:28:42,160][02568] Saving new best policy, reward=18.677!
+[2025-02-20 15:28:46,452][02587] Updated weights for policy 0, policy_version 600 (0.0012)
+[2025-02-20 15:28:47,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2457600. Throughput: 0: 961.8. Samples: 613816. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:28:47,150][00180] Avg episode reward: [(0, '19.585')]
+[2025-02-20 15:28:47,157][02568] Saving new best policy, reward=19.585!
+[2025-02-20 15:28:52,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2478080. Throughput: 0: 993.6. Samples: 620364. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:28:52,151][00180] Avg episode reward: [(0, '18.923')]
+[2025-02-20 15:28:56,560][02587] Updated weights for policy 0, policy_version 610 (0.0013)
+[2025-02-20 15:28:57,149][00180] Fps is (10 sec: 4095.2, 60 sec: 3959.3, 300 sec: 3901.6). Total num frames: 2498560. Throughput: 0: 991.8. Samples: 623654. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:28:57,153][00180] Avg episode reward: [(0, '17.445')]
+[2025-02-20 15:29:02,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3901.6). Total num frames: 2519040. Throughput: 0: 974.5. Samples: 628802. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:29:02,152][00180] Avg episode reward: [(0, '17.667')]
+[2025-02-20 15:29:06,471][02587] Updated weights for policy 0, policy_version 620 (0.0023)
+[2025-02-20 15:29:07,147][00180] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2539520. Throughput: 0: 992.9. Samples: 635396. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:29:07,153][00180] Avg episode reward: [(0, '18.454')]
+[2025-02-20 15:29:12,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2555904. Throughput: 0: 982.8. Samples: 638168. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:29:12,153][00180] Avg episode reward: [(0, '18.642')]
+[2025-02-20 15:29:17,152][00180] Fps is (10 sec: 3684.8, 60 sec: 3959.2, 300 sec: 3901.6). Total num frames: 2576384. Throughput: 0: 992.2. Samples: 643892. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:29:17,153][00180] Avg episode reward: [(0, '19.956')]
+[2025-02-20 15:29:17,182][02587] Updated weights for policy 0, policy_version 630 (0.0025)
+[2025-02-20 15:29:17,189][02568] Saving new best policy, reward=19.956!
+[2025-02-20 15:29:22,147][00180] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2600960. Throughput: 0: 995.6. Samples: 650476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:29:22,153][00180] Avg episode reward: [(0, '19.754')]
+[2025-02-20 15:29:22,160][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth...
+[2025-02-20 15:29:22,299][02568] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000404_1654784.pth
+[2025-02-20 15:29:27,147][00180] Fps is (10 sec: 4097.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2617344. Throughput: 0: 966.7. Samples: 652532. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:29:27,149][00180] Avg episode reward: [(0, '19.298')]
+[2025-02-20 15:29:27,733][02587] Updated weights for policy 0, policy_version 640 (0.0015)
+[2025-02-20 15:29:32,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2637824. Throughput: 0: 1001.9. Samples: 658902. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:29:32,149][00180] Avg episode reward: [(0, '19.042')]
+[2025-02-20 15:29:37,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2658304. Throughput: 0: 984.8. Samples: 664682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:29:37,149][00180] Avg episode reward: [(0, '19.184')]
+[2025-02-20 15:29:38,590][02587] Updated weights for policy 0, policy_version 650 (0.0013)
+[2025-02-20 15:29:42,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2674688. Throughput: 0: 967.8. Samples: 667202. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:29:42,153][00180] Avg episode reward: [(0, '19.194')]
+[2025-02-20 15:29:47,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2699264. Throughput: 0: 998.7. Samples: 673742. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:29:47,155][00180] Avg episode reward: [(0, '19.127')]
+[2025-02-20 15:29:47,912][02587] Updated weights for policy 0, policy_version 660 (0.0017)
+[2025-02-20 15:29:52,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2715648. Throughput: 0: 968.7. Samples: 678988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:29:52,152][00180] Avg episode reward: [(0, '19.179')]
+[2025-02-20 15:29:57,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 2736128. Throughput: 0: 977.6. Samples: 682162. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:29:57,148][00180] Avg episode reward: [(0, '19.172')]
+[2025-02-20 15:29:58,469][02587] Updated weights for policy 0, policy_version 670 (0.0014)
+[2025-02-20 15:30:02,149][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2756608. Throughput: 0: 999.7. Samples: 688874. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:30:02,151][00180] Avg episode reward: [(0, '19.676')]
+[2025-02-20 15:30:07,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2777088. Throughput: 0: 968.5. Samples: 694060. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:30:07,148][00180] Avg episode reward: [(0, '19.876')]
+[2025-02-20 15:30:09,042][02587] Updated weights for policy 0, policy_version 680 (0.0012)
+[2025-02-20 15:30:12,147][00180] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2797568. Throughput: 0: 998.4. Samples: 697458. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:30:12,152][00180] Avg episode reward: [(0, '20.916')]
+[2025-02-20 15:30:12,162][02568] Saving new best policy, reward=20.916!
+[2025-02-20 15:30:17,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.8, 300 sec: 3929.4). Total num frames: 2813952. Throughput: 0: 996.8. Samples: 703760. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:30:17,152][00180] Avg episode reward: [(0, '20.338')]
+[2025-02-20 15:30:19,659][02587] Updated weights for policy 0, policy_version 690 (0.0013)
+[2025-02-20 15:30:22,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2834432. Throughput: 0: 988.8. Samples: 709180. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:30:22,151][00180] Avg episode reward: [(0, '19.911')]
+[2025-02-20 15:30:27,147][00180] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2859008. Throughput: 0: 1007.1. Samples: 712522. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:30:27,148][00180] Avg episode reward: [(0, '19.865')]
+[2025-02-20 15:30:29,041][02587] Updated weights for policy 0, policy_version 700 (0.0015)
+[2025-02-20 15:30:32,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2875392. Throughput: 0: 989.9. Samples: 718286. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:30:32,152][00180] Avg episode reward: [(0, '19.514')]
+[2025-02-20 15:30:37,148][00180] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2895872. Throughput: 0: 1009.7. Samples: 724424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-02-20 15:30:37,149][00180] Avg episode reward: [(0, '18.526')]
+[2025-02-20 15:30:39,343][02587] Updated weights for policy 0, policy_version 710 (0.0014)
+[2025-02-20 15:30:42,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2916352. Throughput: 0: 1013.7. Samples: 727780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:30:42,150][00180] Avg episode reward: [(0, '17.901')]
+[2025-02-20 15:30:47,147][00180] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2932736. Throughput: 0: 979.1. Samples: 732934. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:30:47,157][00180] Avg episode reward: [(0, '16.286')]
+[2025-02-20 15:30:49,931][02587] Updated weights for policy 0, policy_version 720 (0.0020)
+[2025-02-20 15:30:52,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2957312. Throughput: 0: 1013.0. Samples: 739644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:30:52,148][00180] Avg episode reward: [(0, '15.584')]
+[2025-02-20 15:30:57,147][00180] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2977792. Throughput: 0: 1009.7. Samples: 742894. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:30:57,148][00180] Avg episode reward: [(0, '16.262')]
+[2025-02-20 15:31:00,473][02587] Updated weights for policy 0, policy_version 730 (0.0015)
+[2025-02-20 15:31:02,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2994176. Throughput: 0: 986.1. Samples: 748136. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:31:02,151][00180] Avg episode reward: [(0, '16.799')]
+[2025-02-20 15:31:07,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3018752. Throughput: 0: 1015.7. Samples: 754886. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:31:07,156][00180] Avg episode reward: [(0, '17.483')]
+[2025-02-20 15:31:10,033][02587] Updated weights for policy 0, policy_version 740 (0.0014)
+[2025-02-20 15:31:12,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3035136. Throughput: 0: 1005.6. Samples: 757774. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-02-20 15:31:12,152][00180] Avg episode reward: [(0, '18.125')]
+[2025-02-20 15:31:17,147][00180] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3055616. Throughput: 0: 1000.4. Samples: 763306. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:31:17,148][00180] Avg episode reward: [(0, '18.876')]
+[2025-02-20 15:31:20,199][02587] Updated weights for policy 0, policy_version 750 (0.0012)
+[2025-02-20 15:31:22,151][00180] Fps is (10 sec: 4504.0, 60 sec: 4095.8, 300 sec: 3971.0). Total num frames: 3080192. Throughput: 0: 1015.0. Samples: 770104. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:31:22,152][00180] Avg episode reward: [(0, '17.721')]
+[2025-02-20 15:31:22,164][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000752_3080192.pth...
+[2025-02-20 15:31:22,284][02568] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth
+[2025-02-20 15:31:27,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3096576. Throughput: 0: 988.6. Samples: 772266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:31:27,152][00180] Avg episode reward: [(0, '18.293')]
+[2025-02-20 15:31:30,649][02587] Updated weights for policy 0, policy_version 760 (0.0016)
+[2025-02-20 15:31:32,147][00180] Fps is (10 sec: 3687.7, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3117056. Throughput: 0: 1014.2. Samples: 778574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:31:32,152][00180] Avg episode reward: [(0, '18.111')]
+[2025-02-20 15:31:37,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3137536. Throughput: 0: 1004.0. Samples: 784822. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-02-20 15:31:37,151][00180] Avg episode reward: [(0, '18.205')]
+[2025-02-20 15:31:41,106][02587] Updated weights for policy 0, policy_version 770 (0.0013)
+[2025-02-20 15:31:42,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3158016. Throughput: 0: 985.8. Samples: 787256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-02-20 15:31:42,153][00180] Avg episode reward: [(0, '18.499')]
+[2025-02-20 15:31:47,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3178496. Throughput: 0: 1016.2. Samples: 793864.
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:31:47,150][00180] Avg episode reward: [(0, '17.418')] +[2025-02-20 15:31:51,112][02587] Updated weights for policy 0, policy_version 780 (0.0017) +[2025-02-20 15:31:52,150][00180] Fps is (10 sec: 3685.5, 60 sec: 3959.3, 300 sec: 3971.1). Total num frames: 3194880. Throughput: 0: 989.4. Samples: 799410. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:31:52,151][00180] Avg episode reward: [(0, '18.595')] +[2025-02-20 15:31:57,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3219456. Throughput: 0: 994.2. Samples: 802512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:31:57,148][00180] Avg episode reward: [(0, '19.223')] +[2025-02-20 15:32:00,852][02587] Updated weights for policy 0, policy_version 790 (0.0015) +[2025-02-20 15:32:02,147][00180] Fps is (10 sec: 4506.7, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3239936. Throughput: 0: 1020.5. Samples: 809230. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:32:02,153][00180] Avg episode reward: [(0, '19.062')] +[2025-02-20 15:32:07,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3256320. Throughput: 0: 984.8. Samples: 814418. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:32:07,152][00180] Avg episode reward: [(0, '19.556')] +[2025-02-20 15:32:11,225][02587] Updated weights for policy 0, policy_version 800 (0.0012) +[2025-02-20 15:32:12,147][00180] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3276800. Throughput: 0: 1012.1. Samples: 817812. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:32:12,153][00180] Avg episode reward: [(0, '20.351')] +[2025-02-20 15:32:17,148][00180] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3297280. Throughput: 0: 1017.8. Samples: 824376. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:32:17,153][00180] Avg episode reward: [(0, '21.801')] +[2025-02-20 15:32:17,155][02568] Saving new best policy, reward=21.801! +[2025-02-20 15:32:21,930][02587] Updated weights for policy 0, policy_version 810 (0.0012) +[2025-02-20 15:32:22,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3984.9). Total num frames: 3317760. Throughput: 0: 995.3. Samples: 829612. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:32:22,152][00180] Avg episode reward: [(0, '21.944')] +[2025-02-20 15:32:22,157][02568] Saving new best policy, reward=21.944! +[2025-02-20 15:32:27,147][00180] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3338240. Throughput: 0: 1012.6. Samples: 832824. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:32:27,154][00180] Avg episode reward: [(0, '22.909')] +[2025-02-20 15:32:27,158][02568] Saving new best policy, reward=22.909! +[2025-02-20 15:32:32,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3354624. Throughput: 0: 996.4. Samples: 838702. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:32:32,154][00180] Avg episode reward: [(0, '22.709')] +[2025-02-20 15:32:32,454][02587] Updated weights for policy 0, policy_version 820 (0.0014) +[2025-02-20 15:32:37,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3379200. Throughput: 0: 1007.9. Samples: 844762. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:32:37,151][00180] Avg episode reward: [(0, '22.280')] +[2025-02-20 15:32:41,690][02587] Updated weights for policy 0, policy_version 830 (0.0016) +[2025-02-20 15:32:42,147][00180] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3399680. Throughput: 0: 1012.3. Samples: 848066. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:32:42,151][00180] Avg episode reward: [(0, '19.997')] +[2025-02-20 15:32:47,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3416064. Throughput: 0: 977.5. Samples: 853216. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:32:47,149][00180] Avg episode reward: [(0, '19.494')] +[2025-02-20 15:32:52,147][00180] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 3984.9). Total num frames: 3436544. Throughput: 0: 1009.5. Samples: 859846. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:32:52,154][00180] Avg episode reward: [(0, '19.199')] +[2025-02-20 15:32:52,274][02587] Updated weights for policy 0, policy_version 840 (0.0013) +[2025-02-20 15:32:57,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3985.0). Total num frames: 3457024. Throughput: 0: 1008.8. Samples: 863208. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:32:57,151][00180] Avg episode reward: [(0, '19.821')] +[2025-02-20 15:33:02,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3477504. Throughput: 0: 979.5. Samples: 868454. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:33:02,152][00180] Avg episode reward: [(0, '19.310')] +[2025-02-20 15:33:02,914][02587] Updated weights for policy 0, policy_version 850 (0.0017) +[2025-02-20 15:33:07,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3497984. Throughput: 0: 1008.8. Samples: 875008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:33:07,149][00180] Avg episode reward: [(0, '18.429')] +[2025-02-20 15:33:12,149][00180] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 3514368. Throughput: 0: 1005.2. Samples: 878060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:33:12,150][00180] Avg episode reward: [(0, '18.436')] +[2025-02-20 15:33:13,314][02587] Updated weights for policy 0, policy_version 860 (0.0014) +[2025-02-20 15:33:17,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3998.8). Total num frames: 3538944. Throughput: 0: 1000.4. Samples: 883720. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:33:17,149][00180] Avg episode reward: [(0, '18.480')] +[2025-02-20 15:33:22,147][00180] Fps is (10 sec: 4506.3, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3559424. Throughput: 0: 1010.8. Samples: 890248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:33:22,149][00180] Avg episode reward: [(0, '18.876')] +[2025-02-20 15:33:22,156][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000869_3559424.pth... +[2025-02-20 15:33:22,256][02568] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth +[2025-02-20 15:33:22,867][02587] Updated weights for policy 0, policy_version 870 (0.0013) +[2025-02-20 15:33:27,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3575808. Throughput: 0: 987.0. Samples: 892480. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:33:27,149][00180] Avg episode reward: [(0, '20.257')] +[2025-02-20 15:33:32,147][00180] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3596288. Throughput: 0: 1011.2. Samples: 898718. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:33:32,153][00180] Avg episode reward: [(0, '20.144')] +[2025-02-20 15:33:33,232][02587] Updated weights for policy 0, policy_version 880 (0.0015) +[2025-02-20 15:33:37,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3616768. Throughput: 0: 1002.5. Samples: 904960. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:33:37,149][00180] Avg episode reward: [(0, '21.560')] +[2025-02-20 15:33:42,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3637248. Throughput: 0: 981.5. Samples: 907374. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:33:42,149][00180] Avg episode reward: [(0, '22.569')] +[2025-02-20 15:33:43,542][02587] Updated weights for policy 0, policy_version 890 (0.0019) +[2025-02-20 15:33:47,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3657728. Throughput: 0: 1014.4. Samples: 914102. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:33:47,150][00180] Avg episode reward: [(0, '21.789')] +[2025-02-20 15:33:52,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3678208. Throughput: 0: 991.0. Samples: 919604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:33:52,149][00180] Avg episode reward: [(0, '22.052')] +[2025-02-20 15:33:54,065][02587] Updated weights for policy 0, policy_version 900 (0.0017) +[2025-02-20 15:33:57,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3698688. Throughput: 0: 991.1. Samples: 922658. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:33:57,149][00180] Avg episode reward: [(0, '23.886')] +[2025-02-20 15:33:57,151][02568] Saving new best policy, reward=23.886! +[2025-02-20 15:34:02,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3719168. Throughput: 0: 1011.6. Samples: 929240. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:34:02,148][00180] Avg episode reward: [(0, '23.928')] +[2025-02-20 15:34:02,157][02568] Saving new best policy, reward=23.928! +[2025-02-20 15:34:04,151][02587] Updated weights for policy 0, policy_version 910 (0.0015) +[2025-02-20 15:34:07,147][00180] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3735552. Throughput: 0: 981.1. Samples: 934396. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:34:07,153][00180] Avg episode reward: [(0, '23.537')] +[2025-02-20 15:34:12,149][00180] Fps is (10 sec: 4095.4, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 3760128. Throughput: 0: 1006.2. Samples: 937762. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:34:12,154][00180] Avg episode reward: [(0, '23.987')] +[2025-02-20 15:34:12,162][02568] Saving new best policy, reward=23.987! +[2025-02-20 15:34:13,920][02587] Updated weights for policy 0, policy_version 920 (0.0014) +[2025-02-20 15:34:17,147][00180] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3776512. Throughput: 0: 1013.8. Samples: 944338. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:34:17,151][00180] Avg episode reward: [(0, '25.309')] +[2025-02-20 15:34:17,153][02568] Saving new best policy, reward=25.309! +[2025-02-20 15:34:22,147][00180] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3796992. Throughput: 0: 990.2. Samples: 949518. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:34:22,150][00180] Avg episode reward: [(0, '23.941')] +[2025-02-20 15:34:24,576][02587] Updated weights for policy 0, policy_version 930 (0.0017) +[2025-02-20 15:34:27,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3817472. Throughput: 0: 1010.3. Samples: 952836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:34:27,150][00180] Avg episode reward: [(0, '22.992')] +[2025-02-20 15:34:32,147][00180] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3837952. Throughput: 0: 994.6. Samples: 958858. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:34:32,152][00180] Avg episode reward: [(0, '23.529')] +[2025-02-20 15:34:35,038][02587] Updated weights for policy 0, policy_version 940 (0.0020) +[2025-02-20 15:34:37,149][00180] Fps is (10 sec: 4095.4, 60 sec: 4027.6, 300 sec: 4012.7). Total num frames: 3858432. Throughput: 0: 1004.2. Samples: 964796. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:34:37,152][00180] Avg episode reward: [(0, '22.178')] +[2025-02-20 15:34:42,149][00180] Fps is (10 sec: 4095.2, 60 sec: 4027.6, 300 sec: 3998.8). Total num frames: 3878912. Throughput: 0: 1009.1. Samples: 968068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:34:42,151][00180] Avg episode reward: [(0, '22.369')] +[2025-02-20 15:34:45,280][02587] Updated weights for policy 0, policy_version 950 (0.0020) +[2025-02-20 15:34:47,148][00180] Fps is (10 sec: 3686.7, 60 sec: 3959.4, 300 sec: 3998.8). Total num frames: 3895296. Throughput: 0: 981.3. Samples: 973398. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:34:47,150][00180] Avg episode reward: [(0, '22.357')] +[2025-02-20 15:34:52,147][00180] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3915776. Throughput: 0: 996.8. Samples: 979252. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:34:52,150][00180] Avg episode reward: [(0, '23.583')] +[2025-02-20 15:34:55,700][02587] Updated weights for policy 0, policy_version 960 (0.0014) +[2025-02-20 15:34:57,147][00180] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3936256. Throughput: 0: 991.3. Samples: 982368. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:34:57,155][00180] Avg episode reward: [(0, '22.617')] +[2025-02-20 15:35:02,147][00180] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 3948544. Throughput: 0: 942.1. Samples: 986734. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:35:02,149][00180] Avg episode reward: [(0, '23.423')] +[2025-02-20 15:35:07,147][00180] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3969024. Throughput: 0: 959.7. Samples: 992704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-02-20 15:35:07,153][00180] Avg episode reward: [(0, '23.601')] +[2025-02-20 15:35:07,479][02587] Updated weights for policy 0, policy_version 970 (0.0023) +[2025-02-20 15:35:12,148][00180] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3971.0). Total num frames: 3985408. Throughput: 0: 955.2. Samples: 995822. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-02-20 15:35:12,149][00180] Avg episode reward: [(0, '24.387')] +[2025-02-20 15:35:16,748][02568] Stopping Batcher_0... +[2025-02-20 15:35:16,750][02568] Loop batcher_evt_loop terminating... +[2025-02-20 15:35:16,751][00180] Component Batcher_0 stopped! +[2025-02-20 15:35:16,756][00180] Component RolloutWorker_w3 process died already! Don't wait for it. +[2025-02-20 15:35:16,759][00180] Component RolloutWorker_w4 process died already! Don't wait for it. +[2025-02-20 15:35:16,763][00180] Component RolloutWorker_w5 process died already! Don't wait for it. +[2025-02-20 15:35:16,768][00180] Component RolloutWorker_w6 process died already! Don't wait for it. +[2025-02-20 15:35:16,770][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-02-20 15:35:16,820][02587] Weights refcount: 2 0 +[2025-02-20 15:35:16,823][00180] Component InferenceWorker_p0-w0 stopped! +[2025-02-20 15:35:16,827][02587] Stopping InferenceWorker_p0-w0... +[2025-02-20 15:35:16,827][02587] Loop inference_proc0-0_evt_loop terminating... +[2025-02-20 15:35:16,882][02568] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000752_3080192.pth +[2025-02-20 15:35:16,893][02568] Saving new best policy, reward=25.548! +[2025-02-20 15:35:17,004][02568] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-02-20 15:35:17,059][00180] Component RolloutWorker_w7 stopped! +[2025-02-20 15:35:17,068][02589] Stopping RolloutWorker_w7... +[2025-02-20 15:35:17,068][02589] Loop rollout_proc7_evt_loop terminating... +[2025-02-20 15:35:17,142][00180] Component RolloutWorker_w1 stopped! +[2025-02-20 15:35:17,150][02582] Stopping RolloutWorker_w1... +[2025-02-20 15:35:17,154][02582] Loop rollout_proc1_evt_loop terminating... +[2025-02-20 15:35:17,225][00180] Component LearnerWorker_p0 stopped! +[2025-02-20 15:35:17,226][02568] Stopping LearnerWorker_p0... +[2025-02-20 15:35:17,227][02568] Loop learner_proc0_evt_loop terminating... +[2025-02-20 15:35:17,290][00180] Component RolloutWorker_w2 stopped! +[2025-02-20 15:35:17,292][02583] Stopping RolloutWorker_w2... +[2025-02-20 15:35:17,295][02583] Loop rollout_proc2_evt_loop terminating... +[2025-02-20 15:35:17,318][00180] Component RolloutWorker_w0 stopped! +[2025-02-20 15:35:17,319][00180] Waiting for process learner_proc0 to stop... +[2025-02-20 15:35:17,321][02581] Stopping RolloutWorker_w0... +[2025-02-20 15:35:17,328][02581] Loop rollout_proc0_evt_loop terminating... +[2025-02-20 15:35:18,775][00180] Waiting for process inference_proc0-0 to join... +[2025-02-20 15:35:18,777][00180] Waiting for process rollout_proc0 to join... +[2025-02-20 15:35:19,532][00180] Waiting for process rollout_proc1 to join... +[2025-02-20 15:35:19,533][00180] Waiting for process rollout_proc2 to join... +[2025-02-20 15:35:19,535][00180] Waiting for process rollout_proc3 to join... +[2025-02-20 15:35:19,535][00180] Waiting for process rollout_proc4 to join... +[2025-02-20 15:35:19,536][00180] Waiting for process rollout_proc5 to join... +[2025-02-20 15:35:19,537][00180] Waiting for process rollout_proc6 to join... +[2025-02-20 15:35:19,538][00180] Waiting for process rollout_proc7 to join... 
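The final checkpoint saved above is checkpoint_000000978_4005888.pth. Throughout this log the two numbers in every checkpoint filename keep a fixed ratio: 2600960/635, 3080192/752, and 4005888/978 all come out to exactly 4096, which suggests the first field is the policy version, the second is the total environment frame count, and 4096 frames are collected per policy version. A minimal sketch checking that relationship, using only filenames that appear in this log (the frames-per-version figure is inferred from those filenames, not taken from Sample Factory documentation):

    # Checkpoint filenames copied from this log; the naming scheme
    # checkpoint_{policy_version}_{env_steps}.pth is inferred, not documented here.
    checkpoints = [
        ("checkpoint_000000635_2600960.pth", 635, 2600960),
        ("checkpoint_000000752_3080192.pth", 752, 3080192),
        ("checkpoint_000000978_4005888.pth", 978, 4005888),
    ]
    for name, version, env_steps in checkpoints:
        # 4096 env frames per policy version, as inferred above
        assert env_steps == version * 4096, name
        print(f"{name}: {env_steps / version:.0f} frames per version")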
+[2025-02-20 15:35:19,538][00180] Batcher 0 profile tree view: +batching: 21.5800, releasing_batches: 0.0255 +[2025-02-20 15:35:19,539][00180] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0032 + wait_policy_total: 397.6986 +update_model: 9.1847 + weight_update: 0.0012 +one_step: 0.0031 + handle_policy_step: 593.7209 + deserialize: 14.4450, stack: 3.6127, obs_to_device_normalize: 133.5992, forward: 309.0722, send_messages: 22.6678 + prepare_outputs: 83.7121 + to_cpu: 52.6604 +[2025-02-20 15:35:19,540][00180] Learner 0 profile tree view: +misc: 0.0052, prepare_batch: 12.1769 +train: 65.6212 + epoch_init: 0.0066, minibatch_init: 0.0054, losses_postprocess: 0.6045, kl_divergence: 0.5382, after_optimizer: 31.9516 + calculate_losses: 21.9046 + losses_init: 0.0033, forward_head: 1.1552, bptt_initial: 15.0224, tail: 0.8407, advantages_returns: 0.2030, losses: 2.8100 + bptt: 1.6718 + bptt_forward_core: 1.6066 + update: 10.1410 + clip: 0.8594 +[2025-02-20 15:35:19,541][00180] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.3875, enqueue_policy_requests: 144.8687, env_step: 724.4136, overhead: 16.7270, complete_rollouts: 5.8104 +save_policy_outputs: 23.9879 + split_output_tensors: 9.2155 +[2025-02-20 15:35:19,542][00180] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.3865, enqueue_policy_requests: 139.8492, env_step: 722.1495, overhead: 17.7005, complete_rollouts: 7.3891 +save_policy_outputs: 25.0181 + split_output_tensors: 9.4871 +[2025-02-20 15:35:19,543][00180] Loop Runner_EvtLoop terminating... +[2025-02-20 15:35:19,544][00180] Runner profile tree view: +main_loop: 1064.4969 +[2025-02-20 15:35:19,545][00180] Collected {0: 4005888}, FPS: 3763.2 +[2025-02-20 15:35:20,068][00180] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-02-20 15:35:20,070][00180] Overriding arg 'num_workers' with value 1 passed from command line +[2025-02-20 15:35:20,070][00180] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-02-20 15:35:20,072][00180] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-02-20 15:35:20,073][00180] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-02-20 15:35:20,074][00180] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-02-20 15:35:20,075][00180] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-02-20 15:35:20,076][00180] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-02-20 15:35:20,077][00180] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-02-20 15:35:20,077][00180] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-02-20 15:35:20,079][00180] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-02-20 15:35:20,080][00180] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-02-20 15:35:20,082][00180] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-02-20 15:35:20,083][00180] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
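The runner summary above reports Collected {0: 4005888}, FPS: 3763.2 against a main_loop time of 1064.4969 seconds; the headline FPS is simply total environment frames divided by wall-clock time. A one-line check with the two numbers taken from that summary:

    total_frames = 4_005_888        # from "Collected {0: 4005888}"
    main_loop_seconds = 1064.4969   # from "main_loop: 1064.4969"
    print(f"{total_frames / main_loop_seconds:.1f}")  # 3763.2, matching the logged FPS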
+[2025-02-20 15:35:20,084][00180] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-02-20 15:35:20,141][00180] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-02-20 15:35:20,148][00180] RunningMeanStd input shape: (3, 72, 128) +[2025-02-20 15:35:20,153][00180] RunningMeanStd input shape: (1,) +[2025-02-20 15:35:20,190][00180] ConvEncoder: input_channels=3 +[2025-02-20 15:35:20,316][00180] Conv encoder output size: 512 +[2025-02-20 15:35:20,317][00180] Policy head output size: 512 +[2025-02-20 15:35:20,488][00180] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-02-20 15:35:21,215][00180] Num frames 100... +[2025-02-20 15:35:21,344][00180] Num frames 200... +[2025-02-20 15:35:21,483][00180] Num frames 300... +[2025-02-20 15:35:21,612][00180] Num frames 400... +[2025-02-20 15:35:21,740][00180] Num frames 500... +[2025-02-20 15:35:21,869][00180] Num frames 600... +[2025-02-20 15:35:21,999][00180] Num frames 700... +[2025-02-20 15:35:22,129][00180] Num frames 800... +[2025-02-20 15:35:22,257][00180] Num frames 900... +[2025-02-20 15:35:22,390][00180] Num frames 1000... +[2025-02-20 15:35:22,527][00180] Num frames 1100... +[2025-02-20 15:35:22,658][00180] Num frames 1200... +[2025-02-20 15:35:22,787][00180] Num frames 1300... +[2025-02-20 15:35:22,916][00180] Num frames 1400... +[2025-02-20 15:35:23,046][00180] Num frames 1500... +[2025-02-20 15:35:23,107][00180] Avg episode rewards: #0: 36.040, true rewards: #0: 15.040 +[2025-02-20 15:35:23,108][00180] Avg episode reward: 36.040, avg true_objective: 15.040 +[2025-02-20 15:35:23,298][00180] Num frames 1600... +[2025-02-20 15:35:23,496][00180] Num frames 1700... +[2025-02-20 15:35:23,677][00180] Num frames 1800... +[2025-02-20 15:35:23,847][00180] Num frames 1900... +[2025-02-20 15:35:24,020][00180] Num frames 2000... +[2025-02-20 15:35:24,194][00180] Num frames 2100... +[2025-02-20 15:35:24,360][00180] Num frames 2200... +[2025-02-20 15:35:24,537][00180] Num frames 2300... +[2025-02-20 15:35:24,723][00180] Num frames 2400... +[2025-02-20 15:35:24,908][00180] Num frames 2500... +[2025-02-20 15:35:25,079][00180] Num frames 2600... +[2025-02-20 15:35:25,208][00180] Num frames 2700... +[2025-02-20 15:35:25,339][00180] Num frames 2800... +[2025-02-20 15:35:25,468][00180] Num frames 2900... +[2025-02-20 15:35:25,597][00180] Num frames 3000... +[2025-02-20 15:35:25,732][00180] Num frames 3100... +[2025-02-20 15:35:25,897][00180] Avg episode rewards: #0: 38.935, true rewards: #0: 15.935 +[2025-02-20 15:35:25,898][00180] Avg episode reward: 38.935, avg true_objective: 15.935 +[2025-02-20 15:35:25,917][00180] Num frames 3200... +[2025-02-20 15:35:26,042][00180] Num frames 3300... +[2025-02-20 15:35:26,171][00180] Num frames 3400... +[2025-02-20 15:35:26,297][00180] Num frames 3500... +[2025-02-20 15:35:26,425][00180] Num frames 3600... +[2025-02-20 15:35:26,551][00180] Num frames 3700... +[2025-02-20 15:35:26,687][00180] Num frames 3800... +[2025-02-20 15:35:26,814][00180] Num frames 3900... +[2025-02-20 15:35:26,941][00180] Num frames 4000... +[2025-02-20 15:35:27,069][00180] Num frames 4100... +[2025-02-20 15:35:27,147][00180] Avg episode rewards: #0: 33.390, true rewards: #0: 13.723 +[2025-02-20 15:35:27,148][00180] Avg episode reward: 33.390, avg true_objective: 13.723 +[2025-02-20 15:35:27,254][00180] Num frames 4200... +[2025-02-20 15:35:27,383][00180] Num frames 4300... +[2025-02-20 15:35:27,512][00180] Num frames 4400... 
+[2025-02-20 15:35:27,641][00180] Num frames 4500... +[2025-02-20 15:35:27,780][00180] Num frames 4600... +[2025-02-20 15:35:27,911][00180] Num frames 4700... +[2025-02-20 15:35:28,039][00180] Num frames 4800... +[2025-02-20 15:35:28,172][00180] Num frames 4900... +[2025-02-20 15:35:28,302][00180] Num frames 5000... +[2025-02-20 15:35:28,434][00180] Num frames 5100... +[2025-02-20 15:35:28,565][00180] Num frames 5200... +[2025-02-20 15:35:28,736][00180] Avg episode rewards: #0: 31.220, true rewards: #0: 13.220 +[2025-02-20 15:35:28,737][00180] Avg episode reward: 31.220, avg true_objective: 13.220 +[2025-02-20 15:35:28,755][00180] Num frames 5300... +[2025-02-20 15:35:28,887][00180] Num frames 5400... +[2025-02-20 15:35:29,014][00180] Num frames 5500... +[2025-02-20 15:35:29,147][00180] Num frames 5600... +[2025-02-20 15:35:29,276][00180] Num frames 5700... +[2025-02-20 15:35:29,404][00180] Num frames 5800... +[2025-02-20 15:35:29,530][00180] Num frames 5900... +[2025-02-20 15:35:29,659][00180] Num frames 6000... +[2025-02-20 15:35:29,792][00180] Avg episode rewards: #0: 27.912, true rewards: #0: 12.112 +[2025-02-20 15:35:29,793][00180] Avg episode reward: 27.912, avg true_objective: 12.112 +[2025-02-20 15:35:29,855][00180] Num frames 6100... +[2025-02-20 15:35:29,992][00180] Num frames 6200... +[2025-02-20 15:35:30,127][00180] Num frames 6300... +[2025-02-20 15:35:30,258][00180] Num frames 6400... +[2025-02-20 15:35:30,385][00180] Num frames 6500... +[2025-02-20 15:35:30,514][00180] Num frames 6600... +[2025-02-20 15:35:30,645][00180] Num frames 6700... +[2025-02-20 15:35:30,790][00180] Num frames 6800... +[2025-02-20 15:35:30,923][00180] Num frames 6900... +[2025-02-20 15:35:31,063][00180] Num frames 7000... +[2025-02-20 15:35:31,148][00180] Avg episode rewards: #0: 27.372, true rewards: #0: 11.705 +[2025-02-20 15:35:31,149][00180] Avg episode reward: 27.372, avg true_objective: 11.705 +[2025-02-20 15:35:31,250][00180] Num frames 7100... +[2025-02-20 15:35:31,385][00180] Num frames 7200... +[2025-02-20 15:35:31,512][00180] Num frames 7300... +[2025-02-20 15:35:31,647][00180] Num frames 7400... +[2025-02-20 15:35:31,787][00180] Num frames 7500... +[2025-02-20 15:35:31,916][00180] Num frames 7600... +[2025-02-20 15:35:32,046][00180] Num frames 7700... +[2025-02-20 15:35:32,180][00180] Num frames 7800... +[2025-02-20 15:35:32,309][00180] Num frames 7900... +[2025-02-20 15:35:32,435][00180] Num frames 8000... +[2025-02-20 15:35:32,560][00180] Num frames 8100... +[2025-02-20 15:35:32,686][00180] Num frames 8200... +[2025-02-20 15:35:32,817][00180] Num frames 8300... +[2025-02-20 15:35:32,957][00180] Avg episode rewards: #0: 27.810, true rewards: #0: 11.953 +[2025-02-20 15:35:32,958][00180] Avg episode reward: 27.810, avg true_objective: 11.953 +[2025-02-20 15:35:33,002][00180] Num frames 8400... +[2025-02-20 15:35:33,137][00180] Num frames 8500... +[2025-02-20 15:35:33,264][00180] Num frames 8600... +[2025-02-20 15:35:33,394][00180] Num frames 8700... +[2025-02-20 15:35:33,524][00180] Num frames 8800... +[2025-02-20 15:35:33,655][00180] Num frames 8900... +[2025-02-20 15:35:33,786][00180] Num frames 9000... +[2025-02-20 15:35:33,925][00180] Num frames 9100... +[2025-02-20 15:35:34,056][00180] Num frames 9200... +[2025-02-20 15:35:34,188][00180] Num frames 9300... +[2025-02-20 15:35:34,322][00180] Num frames 9400... +[2025-02-20 15:35:34,456][00180] Num frames 9500... +[2025-02-20 15:35:34,586][00180] Num frames 9600... +[2025-02-20 15:35:34,717][00180] Num frames 9700... 
+[2025-02-20 15:35:34,855][00180] Num frames 9800... +[2025-02-20 15:35:34,983][00180] Num frames 9900... +[2025-02-20 15:35:35,148][00180] Num frames 10000... +[2025-02-20 15:35:35,342][00180] Num frames 10100... +[2025-02-20 15:35:35,520][00180] Num frames 10200... +[2025-02-20 15:35:35,695][00180] Num frames 10300... +[2025-02-20 15:35:35,873][00180] Num frames 10400... +[2025-02-20 15:35:36,058][00180] Avg episode rewards: #0: 31.709, true rewards: #0: 13.084 +[2025-02-20 15:35:36,059][00180] Avg episode reward: 31.709, avg true_objective: 13.084 +[2025-02-20 15:35:36,120][00180] Num frames 10500... +[2025-02-20 15:35:36,290][00180] Num frames 10600... +[2025-02-20 15:35:36,471][00180] Num frames 10700... +[2025-02-20 15:35:36,649][00180] Num frames 10800... +[2025-02-20 15:35:36,828][00180] Num frames 10900... +[2025-02-20 15:35:37,008][00180] Num frames 11000... +[2025-02-20 15:35:37,141][00180] Num frames 11100... +[2025-02-20 15:35:37,274][00180] Num frames 11200... +[2025-02-20 15:35:37,421][00180] Avg episode rewards: #0: 30.074, true rewards: #0: 12.519 +[2025-02-20 15:35:37,422][00180] Avg episode reward: 30.074, avg true_objective: 12.519 +[2025-02-20 15:35:37,467][00180] Num frames 11300... +[2025-02-20 15:35:37,593][00180] Num frames 11400... +[2025-02-20 15:35:37,724][00180] Num frames 11500... +[2025-02-20 15:35:37,854][00180] Num frames 11600... +[2025-02-20 15:35:37,992][00180] Num frames 11700... +[2025-02-20 15:35:38,124][00180] Num frames 11800... +[2025-02-20 15:35:38,255][00180] Num frames 11900... +[2025-02-20 15:35:38,385][00180] Num frames 12000... +[2025-02-20 15:35:38,486][00180] Avg episode rewards: #0: 28.535, true rewards: #0: 12.035 +[2025-02-20 15:35:38,487][00180] Avg episode reward: 28.535, avg true_objective: 12.035 +[2025-02-20 15:36:49,264][00180] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-02-20 15:36:49,870][00180] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-02-20 15:36:49,875][00180] Overriding arg 'num_workers' with value 1 passed from command line +[2025-02-20 15:36:49,876][00180] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-02-20 15:36:49,876][00180] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-02-20 15:36:49,877][00180] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-02-20 15:36:49,878][00180] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-02-20 15:36:49,878][00180] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-02-20 15:36:49,879][00180] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-02-20 15:36:49,880][00180] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-02-20 15:36:49,880][00180] Adding new argument 'hf_repository'='Pie33000/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-02-20 15:36:49,885][00180] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-02-20 15:36:49,886][00180] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-02-20 15:36:49,887][00180] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-02-20 15:36:49,887][00180] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
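In the evaluation pass that follows, each Avg episode rewards line appears to be a running mean over the episodes finished so far, so individual episode rewards can be recovered by differencing: reward_n = n * avg_n - (n-1) * avg_(n-1). For example, averages of 48.159 then 50.089 imply a second episode worth 2 * 50.089 - 48.159 = 52.019. A small sketch applying this to the ten averages printed below (the running-mean reading is an assumption based on how the values evolve, not something the log states):

    # Cumulative averages copied from the ten-episode evaluation pass below.
    avgs = [48.159, 50.089, 37.299, 29.435, 32.324,
            32.396, 34.664, 31.901, 31.400, 31.012]
    prev_total = 0.0
    for n, avg in enumerate(avgs, start=1):
        reward = n * avg - prev_total   # undo the running mean
        prev_total = n * avg
        print(f"episode {n}: reward {reward:.3f}")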
+[2025-02-20 15:36:49,888][00180] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-02-20 15:36:49,938][00180] RunningMeanStd input shape: (3, 72, 128) +[2025-02-20 15:36:49,940][00180] RunningMeanStd input shape: (1,) +[2025-02-20 15:36:49,962][00180] ConvEncoder: input_channels=3 +[2025-02-20 15:36:50,039][00180] Conv encoder output size: 512 +[2025-02-20 15:36:50,040][00180] Policy head output size: 512 +[2025-02-20 15:36:50,077][00180] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-02-20 15:36:50,842][00180] Num frames 100... +[2025-02-20 15:36:51,025][00180] Num frames 200... +[2025-02-20 15:36:51,225][00180] Num frames 300... +[2025-02-20 15:36:51,401][00180] Num frames 400... +[2025-02-20 15:36:51,598][00180] Num frames 500... +[2025-02-20 15:36:51,788][00180] Num frames 600... +[2025-02-20 15:36:52,000][00180] Num frames 700... +[2025-02-20 15:36:52,190][00180] Num frames 800... +[2025-02-20 15:36:52,406][00180] Num frames 900... +[2025-02-20 15:36:52,605][00180] Num frames 1000... +[2025-02-20 15:36:52,810][00180] Num frames 1100... +[2025-02-20 15:36:52,989][00180] Num frames 1200... +[2025-02-20 15:36:53,166][00180] Num frames 1300... +[2025-02-20 15:36:53,326][00180] Num frames 1400... +[2025-02-20 15:36:53,489][00180] Num frames 1500... +[2025-02-20 15:36:53,652][00180] Num frames 1600... +[2025-02-20 15:36:53,833][00180] Num frames 1700... +[2025-02-20 15:36:54,008][00180] Num frames 1800... +[2025-02-20 15:36:54,179][00180] Num frames 1900... +[2025-02-20 15:36:54,350][00180] Num frames 2000... +[2025-02-20 15:36:54,435][00180] Avg episode rewards: #0: 48.159, true rewards: #0: 20.160 +[2025-02-20 15:36:54,438][00180] Avg episode reward: 48.159, avg true_objective: 20.160 +[2025-02-20 15:36:54,583][00180] Num frames 2100... +[2025-02-20 15:36:54,752][00180] Num frames 2200... +[2025-02-20 15:36:54,943][00180] Num frames 2300... +[2025-02-20 15:36:55,094][00180] Num frames 2400... +[2025-02-20 15:36:55,240][00180] Num frames 2500... +[2025-02-20 15:36:55,370][00180] Num frames 2600... +[2025-02-20 15:36:55,497][00180] Num frames 2700... +[2025-02-20 15:36:55,626][00180] Num frames 2800... +[2025-02-20 15:36:55,756][00180] Num frames 2900... +[2025-02-20 15:36:55,892][00180] Num frames 3000... +[2025-02-20 15:36:56,018][00180] Num frames 3100... +[2025-02-20 15:36:56,150][00180] Num frames 3200... +[2025-02-20 15:36:56,280][00180] Num frames 3300... +[2025-02-20 15:36:56,412][00180] Num frames 3400... +[2025-02-20 15:36:56,544][00180] Num frames 3500... +[2025-02-20 15:36:56,674][00180] Num frames 3600... +[2025-02-20 15:36:56,801][00180] Num frames 3700... +[2025-02-20 15:36:56,936][00180] Num frames 3800... +[2025-02-20 15:36:57,064][00180] Num frames 3900... +[2025-02-20 15:36:57,142][00180] Avg episode rewards: #0: 50.089, true rewards: #0: 19.590 +[2025-02-20 15:36:57,143][00180] Avg episode reward: 50.089, avg true_objective: 19.590 +[2025-02-20 15:36:57,244][00180] Num frames 4000... +[2025-02-20 15:36:57,371][00180] Num frames 4100... +[2025-02-20 15:36:57,495][00180] Num frames 4200... +[2025-02-20 15:36:57,623][00180] Num frames 4300... +[2025-02-20 15:36:57,747][00180] Num frames 4400... +[2025-02-20 15:36:57,874][00180] Num frames 4500... +[2025-02-20 15:36:58,047][00180] Avg episode rewards: #0: 37.299, true rewards: #0: 15.300 +[2025-02-20 15:36:58,047][00180] Avg episode reward: 37.299, avg true_objective: 15.300 +[2025-02-20 15:36:58,061][00180] Num frames 4600... 
+[2025-02-20 15:36:58,193][00180] Num frames 4700... +[2025-02-20 15:36:58,319][00180] Num frames 4800... +[2025-02-20 15:36:58,448][00180] Num frames 4900... +[2025-02-20 15:36:58,596][00180] Avg episode rewards: #0: 29.435, true rewards: #0: 12.435 +[2025-02-20 15:36:58,597][00180] Avg episode reward: 29.435, avg true_objective: 12.435 +[2025-02-20 15:36:58,631][00180] Num frames 5000... +[2025-02-20 15:36:58,755][00180] Num frames 5100... +[2025-02-20 15:36:58,880][00180] Num frames 5200... +[2025-02-20 15:36:59,014][00180] Num frames 5300... +[2025-02-20 15:36:59,147][00180] Num frames 5400... +[2025-02-20 15:36:59,272][00180] Num frames 5500... +[2025-02-20 15:36:59,402][00180] Num frames 5600... +[2025-02-20 15:36:59,529][00180] Num frames 5700... +[2025-02-20 15:36:59,657][00180] Num frames 5800... +[2025-02-20 15:36:59,783][00180] Num frames 5900... +[2025-02-20 15:36:59,911][00180] Num frames 6000... +[2025-02-20 15:37:00,048][00180] Num frames 6100... +[2025-02-20 15:37:00,179][00180] Num frames 6200... +[2025-02-20 15:37:00,308][00180] Num frames 6300... +[2025-02-20 15:37:00,434][00180] Num frames 6400... +[2025-02-20 15:37:00,563][00180] Num frames 6500... +[2025-02-20 15:37:00,690][00180] Num frames 6600... +[2025-02-20 15:37:00,817][00180] Num frames 6700... +[2025-02-20 15:37:00,949][00180] Num frames 6800... +[2025-02-20 15:37:01,094][00180] Avg episode rewards: #0: 32.324, true rewards: #0: 13.724 +[2025-02-20 15:37:01,094][00180] Avg episode reward: 32.324, avg true_objective: 13.724 +[2025-02-20 15:37:01,150][00180] Num frames 6900... +[2025-02-20 15:37:01,279][00180] Num frames 7000... +[2025-02-20 15:37:01,405][00180] Num frames 7100... +[2025-02-20 15:37:01,533][00180] Num frames 7200... +[2025-02-20 15:37:01,660][00180] Num frames 7300... +[2025-02-20 15:37:01,788][00180] Num frames 7400... +[2025-02-20 15:37:01,914][00180] Num frames 7500... +[2025-02-20 15:37:02,044][00180] Num frames 7600... +[2025-02-20 15:37:02,172][00180] Num frames 7700... +[2025-02-20 15:37:02,299][00180] Num frames 7800... +[2025-02-20 15:37:02,432][00180] Num frames 7900... +[2025-02-20 15:37:02,559][00180] Num frames 8000... +[2025-02-20 15:37:02,687][00180] Num frames 8100... +[2025-02-20 15:37:02,812][00180] Num frames 8200... +[2025-02-20 15:37:02,917][00180] Avg episode rewards: #0: 32.396, true rewards: #0: 13.730 +[2025-02-20 15:37:02,918][00180] Avg episode reward: 32.396, avg true_objective: 13.730 +[2025-02-20 15:37:03,041][00180] Num frames 8300... +[2025-02-20 15:37:03,216][00180] Num frames 8400... +[2025-02-20 15:37:03,388][00180] Num frames 8500... +[2025-02-20 15:37:03,552][00180] Num frames 8600... +[2025-02-20 15:37:03,719][00180] Num frames 8700... +[2025-02-20 15:37:03,882][00180] Num frames 8800... +[2025-02-20 15:37:04,044][00180] Num frames 8900... +[2025-02-20 15:37:04,222][00180] Num frames 9000... +[2025-02-20 15:37:04,398][00180] Num frames 9100... +[2025-02-20 15:37:04,577][00180] Num frames 9200... +[2025-02-20 15:37:04,755][00180] Num frames 9300... +[2025-02-20 15:37:04,884][00180] Num frames 9400... +[2025-02-20 15:37:05,012][00180] Num frames 9500... +[2025-02-20 15:37:05,153][00180] Num frames 9600... +[2025-02-20 15:37:05,285][00180] Num frames 9700... +[2025-02-20 15:37:05,415][00180] Num frames 9800... +[2025-02-20 15:37:05,547][00180] Num frames 9900... 
+[2025-02-20 15:37:05,687][00180] Avg episode rewards: #0: 34.664, true rewards: #0: 14.236 +[2025-02-20 15:37:05,688][00180] Avg episode reward: 34.664, avg true_objective: 14.236 +[2025-02-20 15:37:05,733][00180] Num frames 10000... +[2025-02-20 15:37:05,858][00180] Num frames 10100... +[2025-02-20 15:37:05,994][00180] Num frames 10200... +[2025-02-20 15:37:06,129][00180] Num frames 10300... +[2025-02-20 15:37:06,273][00180] Num frames 10400... +[2025-02-20 15:37:06,405][00180] Num frames 10500... +[2025-02-20 15:37:06,488][00180] Avg episode rewards: #0: 31.901, true rewards: #0: 13.151 +[2025-02-20 15:37:06,489][00180] Avg episode reward: 31.901, avg true_objective: 13.151 +[2025-02-20 15:37:06,591][00180] Num frames 10600... +[2025-02-20 15:37:06,718][00180] Num frames 10700... +[2025-02-20 15:37:06,846][00180] Num frames 10800... +[2025-02-20 15:37:06,973][00180] Num frames 10900... +[2025-02-20 15:37:07,100][00180] Num frames 11000... +[2025-02-20 15:37:07,241][00180] Num frames 11100... +[2025-02-20 15:37:07,371][00180] Num frames 11200... +[2025-02-20 15:37:07,500][00180] Num frames 11300... +[2025-02-20 15:37:07,629][00180] Num frames 11400... +[2025-02-20 15:37:07,755][00180] Num frames 11500... +[2025-02-20 15:37:07,885][00180] Num frames 11600... +[2025-02-20 15:37:08,016][00180] Avg episode rewards: #0: 31.400, true rewards: #0: 12.956 +[2025-02-20 15:37:08,017][00180] Avg episode reward: 31.400, avg true_objective: 12.956 +[2025-02-20 15:37:08,071][00180] Num frames 11700... +[2025-02-20 15:37:08,207][00180] Num frames 11800... +[2025-02-20 15:37:08,335][00180] Num frames 11900... +[2025-02-20 15:37:08,464][00180] Num frames 12000... +[2025-02-20 15:37:08,591][00180] Num frames 12100... +[2025-02-20 15:37:08,717][00180] Num frames 12200... +[2025-02-20 15:37:08,847][00180] Num frames 12300... +[2025-02-20 15:37:08,977][00180] Num frames 12400... +[2025-02-20 15:37:09,104][00180] Num frames 12500... +[2025-02-20 15:37:09,242][00180] Num frames 12600... +[2025-02-20 15:37:09,371][00180] Num frames 12700... +[2025-02-20 15:37:09,498][00180] Num frames 12800... +[2025-02-20 15:37:09,569][00180] Avg episode rewards: #0: 31.012, true rewards: #0: 12.812 +[2025-02-20 15:37:09,570][00180] Avg episode reward: 31.012, avg true_objective: 12.812 +[2025-02-20 15:38:24,679][00180] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-02-20 16:43:52,336][00180] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-02-20 16:43:52,337][00180] Overriding arg 'num_workers' with value 1 passed from command line +[2025-02-20 16:43:52,338][00180] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-02-20 16:43:52,338][00180] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-02-20 16:43:52,339][00180] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-02-20 16:43:52,340][00180] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-02-20 16:43:52,341][00180] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-02-20 16:43:52,342][00180] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-02-20 16:43:52,342][00180] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
+[2025-02-20 16:43:52,343][00180] Adding new argument 'hf_repository'='Pie33000/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-02-20 16:43:52,345][00180] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-02-20 16:43:52,346][00180] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-02-20 16:43:52,348][00180] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-02-20 16:43:52,349][00180] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-02-20 16:43:52,350][00180] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-02-20 16:43:52,380][00180] RunningMeanStd input shape: (3, 72, 128) +[2025-02-20 16:43:52,382][00180] RunningMeanStd input shape: (1,) +[2025-02-20 16:43:52,394][00180] ConvEncoder: input_channels=3 +[2025-02-20 16:43:52,433][00180] Conv encoder output size: 512 +[2025-02-20 16:43:52,435][00180] Policy head output size: 512 +[2025-02-20 16:43:52,457][00180] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-02-20 16:43:52,906][00180] Num frames 100... +[2025-02-20 16:43:53,032][00180] Num frames 200... +[2025-02-20 16:43:53,167][00180] Num frames 300... +[2025-02-20 16:43:53,295][00180] Num frames 400... +[2025-02-20 16:43:53,431][00180] Num frames 500... +[2025-02-20 16:43:53,574][00180] Num frames 600... +[2025-02-20 16:43:53,703][00180] Num frames 700... +[2025-02-20 16:43:53,832][00180] Num frames 800... +[2025-02-20 16:43:53,963][00180] Num frames 900... +[2025-02-20 16:43:54,093][00180] Num frames 1000... +[2025-02-20 16:43:54,227][00180] Num frames 1100... +[2025-02-20 16:43:54,357][00180] Num frames 1200... +[2025-02-20 16:43:54,496][00180] Num frames 1300... +[2025-02-20 16:43:54,627][00180] Num frames 1400... +[2025-02-20 16:43:54,773][00180] Avg episode rewards: #0: 37.720, true rewards: #0: 14.720 +[2025-02-20 16:43:54,774][00180] Avg episode reward: 37.720, avg true_objective: 14.720 +[2025-02-20 16:43:54,812][00180] Num frames 1500... +[2025-02-20 16:43:54,940][00180] Num frames 1600... +[2025-02-20 16:43:55,068][00180] Num frames 1700... +[2025-02-20 16:43:55,201][00180] Num frames 1800... +[2025-02-20 16:43:55,329][00180] Num frames 1900... +[2025-02-20 16:43:55,455][00180] Num frames 2000... +[2025-02-20 16:43:55,622][00180] Avg episode rewards: #0: 26.400, true rewards: #0: 10.400 +[2025-02-20 16:43:55,623][00180] Avg episode reward: 26.400, avg true_objective: 10.400 +[2025-02-20 16:43:55,652][00180] Num frames 2100... +[2025-02-20 16:43:55,780][00180] Num frames 2200... +[2025-02-20 16:43:55,909][00180] Num frames 2300... +[2025-02-20 16:43:56,064][00180] Num frames 2400... +[2025-02-20 16:43:56,196][00180] Num frames 2500... +[2025-02-20 16:43:56,324][00180] Num frames 2600... +[2025-02-20 16:43:56,453][00180] Num frames 2700... +[2025-02-20 16:43:56,534][00180] Avg episode rewards: #0: 22.067, true rewards: #0: 9.067 +[2025-02-20 16:43:56,536][00180] Avg episode reward: 22.067, avg true_objective: 9.067 +[2025-02-20 16:43:56,647][00180] Num frames 2800... +[2025-02-20 16:43:56,780][00180] Num frames 2900... +[2025-02-20 16:43:56,908][00180] Num frames 3000... +[2025-02-20 16:43:57,035][00180] Num frames 3100... +[2025-02-20 16:43:57,166][00180] Num frames 3200... +[2025-02-20 16:43:57,298][00180] Num frames 3300... +[2025-02-20 16:43:57,428][00180] Num frames 3400... 
+[2025-02-20 16:43:57,566][00180] Num frames 3500... +[2025-02-20 16:43:57,697][00180] Num frames 3600... +[2025-02-20 16:43:57,826][00180] Num frames 3700... +[2025-02-20 16:43:57,938][00180] Avg episode rewards: #0: 22.360, true rewards: #0: 9.360 +[2025-02-20 16:43:57,939][00180] Avg episode reward: 22.360, avg true_objective: 9.360 +[2025-02-20 16:43:58,012][00180] Num frames 3800... +[2025-02-20 16:43:58,141][00180] Num frames 3900... +[2025-02-20 16:43:58,273][00180] Num frames 4000... +[2025-02-20 16:43:58,435][00180] Avg episode rewards: #0: 19.170, true rewards: #0: 8.170 +[2025-02-20 16:43:58,436][00180] Avg episode reward: 19.170, avg true_objective: 8.170 +[2025-02-20 16:43:58,457][00180] Num frames 4100... +[2025-02-20 16:43:58,590][00180] Num frames 4200... +[2025-02-20 16:43:58,722][00180] Num frames 4300... +[2025-02-20 16:43:58,849][00180] Num frames 4400... +[2025-02-20 16:43:58,978][00180] Num frames 4500... +[2025-02-20 16:43:59,105][00180] Num frames 4600... +[2025-02-20 16:43:59,238][00180] Num frames 4700... +[2025-02-20 16:43:59,368][00180] Num frames 4800... +[2025-02-20 16:43:59,492][00180] Avg episode rewards: #0: 19.255, true rewards: #0: 8.088 +[2025-02-20 16:43:59,494][00180] Avg episode reward: 19.255, avg true_objective: 8.088 +[2025-02-20 16:43:59,555][00180] Num frames 4900... +[2025-02-20 16:43:59,687][00180] Num frames 5000... +[2025-02-20 16:43:59,812][00180] Num frames 5100... +[2025-02-20 16:43:59,938][00180] Num frames 5200... +[2025-02-20 16:44:00,038][00180] Avg episode rewards: #0: 17.050, true rewards: #0: 7.479 +[2025-02-20 16:44:00,039][00180] Avg episode reward: 17.050, avg true_objective: 7.479 +[2025-02-20 16:44:00,127][00180] Num frames 5300... +[2025-02-20 16:44:00,255][00180] Num frames 5400... +[2025-02-20 16:44:00,383][00180] Num frames 5500... +[2025-02-20 16:44:00,510][00180] Num frames 5600... +[2025-02-20 16:44:00,643][00180] Num frames 5700... +[2025-02-20 16:44:00,771][00180] Num frames 5800... +[2025-02-20 16:44:00,897][00180] Num frames 5900... +[2025-02-20 16:44:01,038][00180] Num frames 6000... +[2025-02-20 16:44:01,217][00180] Num frames 6100... +[2025-02-20 16:44:01,394][00180] Num frames 6200... +[2025-02-20 16:44:01,520][00180] Avg episode rewards: #0: 17.424, true rewards: #0: 7.799 +[2025-02-20 16:44:01,524][00180] Avg episode reward: 17.424, avg true_objective: 7.799 +[2025-02-20 16:44:01,629][00180] Num frames 6300... +[2025-02-20 16:44:01,803][00180] Num frames 6400... +[2025-02-20 16:44:01,970][00180] Num frames 6500... +[2025-02-20 16:44:02,143][00180] Num frames 6600... +[2025-02-20 16:44:02,314][00180] Num frames 6700... +[2025-02-20 16:44:02,482][00180] Num frames 6800... +[2025-02-20 16:44:02,662][00180] Num frames 6900... +[2025-02-20 16:44:02,738][00180] Avg episode rewards: #0: 17.123, true rewards: #0: 7.679 +[2025-02-20 16:44:02,739][00180] Avg episode reward: 17.123, avg true_objective: 7.679 +[2025-02-20 16:44:02,895][00180] Num frames 7000... +[2025-02-20 16:44:03,023][00180] Num frames 7100... +[2025-02-20 16:44:03,159][00180] Num frames 7200... +[2025-02-20 16:44:03,293][00180] Num frames 7300... +[2025-02-20 16:44:03,419][00180] Num frames 7400... +[2025-02-20 16:44:03,547][00180] Num frames 7500... +[2025-02-20 16:44:03,673][00180] Num frames 7600... +[2025-02-20 16:44:03,806][00180] Num frames 7700... +[2025-02-20 16:44:03,932][00180] Num frames 7800... +[2025-02-20 16:44:04,058][00180] Num frames 7900... +[2025-02-20 16:44:04,189][00180] Num frames 8000... 
+[2025-02-20 16:44:04,269][00180] Avg episode rewards: #0: 18.219, true rewards: #0: 8.019 +[2025-02-20 16:44:04,270][00180] Avg episode reward: 18.219, avg true_objective: 8.019 +[2025-02-20 16:44:51,410][00180] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
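The log ends after the third evaluation pass writes its replay video. Each pass reloads the same two artifacts named throughout, /content/train_dir/default_experiment/config.json and the final checkpoint, so the evaluations are reproducible from those files alone. A minimal sketch for inspecting them offline (json.load and torch.load are standard library and PyTorch calls; the checkpoint's internal keys are never shown in this log, so the sketch prints them instead of assuming any):

    import json
    import torch

    # Paths copied verbatim from the log above.
    config_path = "/content/train_dir/default_experiment/config.json"
    ckpt_path = ("/content/train_dir/default_experiment/"
                 "checkpoint_p0/checkpoint_000000978_4005888.pth")

    with open(config_path) as f:
        config = json.load(f)
    print(len(config), "config entries")  # exact contents depend on the experiment

    state = torch.load(ckpt_path, map_location="cpu")
    print(sorted(state))                  # top-level checkpoint keys, assumed to be a dict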