[2025-02-19 01:29:57,919][00376] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-19 01:29:57,921][00376] Rollout worker 0 uses device cpu
[2025-02-19 01:29:57,922][00376] Rollout worker 1 uses device cpu
[2025-02-19 01:29:57,923][00376] Rollout worker 2 uses device cpu
[2025-02-19 01:29:57,924][00376] Rollout worker 3 uses device cpu
[2025-02-19 01:29:57,925][00376] Rollout worker 4 uses device cpu
[2025-02-19 01:29:57,926][00376] Rollout worker 5 uses device cpu
[2025-02-19 01:29:57,927][00376] Rollout worker 6 uses device cpu
[2025-02-19 01:29:57,927][00376] Rollout worker 7 uses device cpu
[2025-02-19 01:29:58,081][00376] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 01:29:58,084][00376] InferenceWorker_p0-w0: min num requests: 2
[2025-02-19 01:29:58,121][00376] Starting all processes...
[2025-02-19 01:29:58,122][00376] Starting process learner_proc0
[2025-02-19 01:29:58,180][00376] Starting all processes...
[2025-02-19 01:29:58,189][00376] Starting process inference_proc0-0
[2025-02-19 01:29:58,189][00376] Starting process rollout_proc0
[2025-02-19 01:29:58,189][00376] Starting process rollout_proc1
[2025-02-19 01:29:58,190][00376] Starting process rollout_proc2
[2025-02-19 01:29:58,190][00376] Starting process rollout_proc3
[2025-02-19 01:29:58,190][00376] Starting process rollout_proc4
[2025-02-19 01:29:58,190][00376] Starting process rollout_proc5
[2025-02-19 01:29:58,190][00376] Starting process rollout_proc6
[2025-02-19 01:29:58,190][00376] Starting process rollout_proc7
[2025-02-19 01:30:13,920][02884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 01:30:13,922][02884] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-19 01:30:14,023][02884] Num visible devices: 1
[2025-02-19 01:30:14,074][02884] Starting seed is not provided
[2025-02-19 01:30:14,074][02884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 01:30:14,074][02884] Initializing actor-critic model on device cuda:0
[2025-02-19 01:30:14,075][02884] RunningMeanStd input shape: (3, 72, 128)
[2025-02-19 01:30:14,083][02884] RunningMeanStd input shape: (1,)
[2025-02-19 01:30:14,216][02884] ConvEncoder: input_channels=3
[2025-02-19 01:30:14,613][02898] Worker 0 uses CPU cores [0]
[2025-02-19 01:30:14,685][02899] Worker 1 uses CPU cores [1]
[2025-02-19 01:30:14,707][02900] Worker 3 uses CPU cores [1]
[2025-02-19 01:30:14,731][02903] Worker 6 uses CPU cores [0]
[2025-02-19 01:30:14,749][02902] Worker 4 uses CPU cores [0]
[2025-02-19 01:30:14,784][02901] Worker 2 uses CPU cores [0]
[2025-02-19 01:30:14,795][02905] Worker 7 uses CPU cores [1]
[2025-02-19 01:30:14,856][02904] Worker 5 uses CPU cores [1]
[2025-02-19 01:30:14,860][02897] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 01:30:14,860][02897] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-19 01:30:14,891][02897] Num visible devices: 1
[2025-02-19 01:30:14,957][02884] Conv encoder output size: 512
[2025-02-19 01:30:14,958][02884] Policy head output size: 512
[2025-02-19 01:30:15,011][02884] Created Actor Critic model with architecture:
[2025-02-19 01:30:15,011][02884] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
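For reference, a minimal PyTorch sketch that mirrors the architecture printed above (shapes taken from the log: 3x72x128 observations, 512-dim encoder and policy head, GRU(512, 512), 5 discrete actions). The conv kernel sizes/strides and all names here are assumptions for illustration, not Sample Factory's actual implementation:

```python
import torch
from torch import nn


class ActorCriticSketch(nn.Module):
    """Rough stand-in for the ActorCriticSharedWeights model in the log."""

    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        # Conv head: three Conv2d+ELU blocks, as in the printed conv_head.
        # Exact kernels/strides are assumptions (classic Atari-style values).
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # One Linear+ELU layer projecting conv features to 512 dims,
        # matching the printed mlp_layers and "Conv encoder output size: 512".
        with torch.no_grad():
            conv_out = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        self.mlp = nn.Sequential(nn.Linear(conv_out, hidden), nn.ELU())
        # Recurrent core and shared heads, matching the printed sizes.
        self.core = nn.GRU(hidden, hidden)
        self.critic_linear = nn.Linear(hidden, 1)
        self.distribution_linear = nn.Linear(hidden, num_actions)

    def forward(self, obs, rnn_state):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

A forward pass takes a batch of observations plus the GRU hidden state (shape `(1, batch, 512)`) and returns action logits, a value estimate, and the new hidden state, which is what a shared-weights actor-critic like the one above computes.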
[2025-02-19 01:30:15,377][02884] Using optimizer
[2025-02-19 01:30:18,082][00376] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-19 01:30:18,094][00376] Heartbeat connected on RolloutWorker_w0
[2025-02-19 01:30:18,099][00376] Heartbeat connected on RolloutWorker_w1
[2025-02-19 01:30:18,102][00376] Heartbeat connected on RolloutWorker_w2
[2025-02-19 01:30:18,106][00376] Heartbeat connected on RolloutWorker_w3
[2025-02-19 01:30:18,110][00376] Heartbeat connected on RolloutWorker_w4
[2025-02-19 01:30:18,113][00376] Heartbeat connected on RolloutWorker_w5
[2025-02-19 01:30:18,117][00376] Heartbeat connected on RolloutWorker_w6
[2025-02-19 01:30:18,121][00376] Heartbeat connected on RolloutWorker_w7
[2025-02-19 01:30:18,154][00376] Heartbeat connected on Batcher_0
[2025-02-19 01:30:19,513][02884] No checkpoints found
[2025-02-19 01:30:19,513][02884] Did not load from checkpoint, starting from scratch!
[2025-02-19 01:30:19,514][02884] Initialized policy 0 weights for model version 0
[2025-02-19 01:30:19,517][02884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 01:30:19,524][02884] LearnerWorker_p0 finished initialization!
[2025-02-19 01:30:19,525][00376] Heartbeat connected on LearnerWorker_p0
[2025-02-19 01:30:19,745][02897] RunningMeanStd input shape: (3, 72, 128)
[2025-02-19 01:30:19,747][02897] RunningMeanStd input shape: (1,)
[2025-02-19 01:30:19,759][02897] ConvEncoder: input_channels=3
[2025-02-19 01:30:19,869][02897] Conv encoder output size: 512
[2025-02-19 01:30:19,869][02897] Policy head output size: 512
[2025-02-19 01:30:19,909][00376] Inference worker 0-0 is ready!
[2025-02-19 01:30:19,912][00376] All inference workers are ready! Signal rollout workers to start!
[2025-02-19 01:30:20,221][02900] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 01:30:20,229][02903] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 01:30:20,248][02899] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 01:30:20,251][02898] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 01:30:20,278][02905] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 01:30:20,276][02901] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 01:30:20,281][02902] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 01:30:20,289][02904] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 01:30:21,644][02902] Decorrelating experience for 0 frames...
[2025-02-19 01:30:21,644][02905] Decorrelating experience for 0 frames...
[2025-02-19 01:30:21,645][02899] Decorrelating experience for 0 frames...
[2025-02-19 01:30:22,030][02902] Decorrelating experience for 32 frames...
[2025-02-19 01:30:22,372][02905] Decorrelating experience for 32 frames...
[2025-02-19 01:30:22,374][02899] Decorrelating experience for 32 frames...
[2025-02-19 01:30:22,517][02902] Decorrelating experience for 64 frames...
[2025-02-19 01:30:22,965][02902] Decorrelating experience for 96 frames...
[2025-02-19 01:30:23,007][00376] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-19 01:30:23,597][02899] Decorrelating experience for 64 frames...
[2025-02-19 01:30:23,614][02905] Decorrelating experience for 64 frames...
[2025-02-19 01:30:24,374][02905] Decorrelating experience for 96 frames...
[2025-02-19 01:30:24,377][02899] Decorrelating experience for 96 frames...
[2025-02-19 01:30:28,007][00376] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 240.4. Samples: 1202. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-19 01:30:28,009][00376] Avg episode reward: [(0, '3.347')]
[2025-02-19 01:30:28,561][02884] Signal inference workers to stop experience collection...
[2025-02-19 01:30:28,570][02897] InferenceWorker_p0-w0: stopping experience collection
[2025-02-19 01:30:29,726][02884] Signal inference workers to resume experience collection...
[2025-02-19 01:30:29,728][02897] InferenceWorker_p0-w0: resuming experience collection
[2025-02-19 01:30:33,007][00376] Fps is (10 sec: 2048.0, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 20480. Throughput: 0: 430.4. Samples: 4304. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:30:33,010][00376] Avg episode reward: [(0, '3.833')]
[2025-02-19 01:30:37,853][02897] Updated weights for policy 0, policy_version 10 (0.0013)
[2025-02-19 01:30:38,008][00376] Fps is (10 sec: 4095.8, 60 sec: 2730.6, 300 sec: 2730.6). Total num frames: 40960. Throughput: 0: 698.5. Samples: 10478. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:30:38,009][00376] Avg episode reward: [(0, '4.294')]
[2025-02-19 01:30:43,007][00376] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 53248. Throughput: 0: 630.0. Samples: 12600. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:30:43,013][00376] Avg episode reward: [(0, '4.567')]
[2025-02-19 01:30:48,007][00376] Fps is (10 sec: 3277.0, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 73728. Throughput: 0: 742.2. Samples: 18554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:30:48,011][00376] Avg episode reward: [(0, '4.555')]
[2025-02-19 01:30:49,026][02897] Updated weights for policy 0, policy_version 20 (0.0013)
[2025-02-19 01:30:53,009][00376] Fps is (10 sec: 4095.4, 60 sec: 3140.1, 300 sec: 3140.1). Total num frames: 94208. Throughput: 0: 806.8. Samples: 24204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:30:53,015][00376] Avg episode reward: [(0, '4.574')]
[2025-02-19 01:30:58,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3159.8, 300 sec: 3159.8). Total num frames: 110592. Throughput: 0: 755.5. Samples: 26442. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:30:58,011][00376] Avg episode reward: [(0, '4.401')]
[2025-02-19 01:30:58,013][02884] Saving new best policy, reward=4.401!
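The "Decorrelating experience" lines above show each rollout worker stepping its environments for a different number of frames (0, 32, 64, 96) before real collection begins, so parallel environments do not start in lockstep. A toy sketch of the idea, assuming a Gym-style env API (hypothetical helper, not Sample Factory code):

```python
import random


def decorrelate_experience(envs, max_frames: int = 96, block: int = 32):
    """Warm up each env by a different multiple of `block` random-action frames."""
    for env in envs:
        env.reset()
        offset = random.randrange(0, max_frames + block, block)  # 0, 32, 64, or 96
        for _ in range(offset):
            _, _, done, _ = env.step(env.action_space.sample())
            if done:
                env.reset()
```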
[2025-02-19 01:31:00,290][02897] Updated weights for policy 0, policy_version 30 (0.0012)
[2025-02-19 01:31:03,007][00376] Fps is (10 sec: 3687.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 131072. Throughput: 0: 815.4. Samples: 32616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:31:03,011][00376] Avg episode reward: [(0, '4.360')]
[2025-02-19 01:31:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 147456. Throughput: 0: 840.3. Samples: 37814. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:31:08,009][00376] Avg episode reward: [(0, '4.448')]
[2025-02-19 01:31:08,015][02884] Saving new best policy, reward=4.448!
[2025-02-19 01:31:11,454][02897] Updated weights for policy 0, policy_version 40 (0.0012)
[2025-02-19 01:31:13,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3358.7, 300 sec: 3358.7). Total num frames: 167936. Throughput: 0: 876.8. Samples: 40660. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:31:13,012][00376] Avg episode reward: [(0, '4.362')]
[2025-02-19 01:31:18,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3425.8, 300 sec: 3425.8). Total num frames: 188416. Throughput: 0: 944.8. Samples: 46822. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:31:18,009][00376] Avg episode reward: [(0, '4.379')]
[2025-02-19 01:31:22,609][02897] Updated weights for policy 0, policy_version 50 (0.0013)
[2025-02-19 01:31:23,007][00376] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3413.3). Total num frames: 204800. Throughput: 0: 915.6. Samples: 51678. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:31:23,009][00376] Avg episode reward: [(0, '4.450')]
[2025-02-19 01:31:23,015][02884] Saving new best policy, reward=4.450!
[2025-02-19 01:31:28,059][00376] Fps is (10 sec: 3667.5, 60 sec: 3751.4, 300 sec: 3463.1). Total num frames: 225280. Throughput: 0: 933.7. Samples: 54664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:31:28,060][00376] Avg episode reward: [(0, '4.433')]
[2025-02-19 01:31:33,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3452.3). Total num frames: 241664. Throughput: 0: 934.4. Samples: 60604. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:31:33,009][00376] Avg episode reward: [(0, '4.494')]
[2025-02-19 01:31:33,022][02884] Saving new best policy, reward=4.494!
[2025-02-19 01:31:33,466][02897] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-02-19 01:31:38,007][00376] Fps is (10 sec: 3293.8, 60 sec: 3618.2, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 919.7. Samples: 65588. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:31:38,012][00376] Avg episode reward: [(0, '4.524')]
[2025-02-19 01:31:38,085][02884] Saving new best policy, reward=4.524!
[2025-02-19 01:31:43,007][00376] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 278528. Throughput: 0: 937.1. Samples: 68612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:31:43,009][00376] Avg episode reward: [(0, '4.581')]
[2025-02-19 01:31:43,017][02884] Saving new best policy, reward=4.581!
[2025-02-19 01:31:44,261][02897] Updated weights for policy 0, policy_version 70 (0.0014)
[2025-02-19 01:31:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3469.6). Total num frames: 294912. Throughput: 0: 919.2. Samples: 73978. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
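The periodic "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report frame throughput over three sliding windows; the very first report shows nan because fewer than two samples fall inside any window yet. A minimal sketch of such a windowed meter (illustrative only, not the framework's implementation):

```python
import time
from collections import deque


class FpsMeter:
    """Windowed FPS estimates like the '10 sec / 60 sec / 300 sec' readouts."""

    def __init__(self, max_window: float = 300.0):
        self.max_window = max_window
        self.samples = deque()  # (timestamp, total_num_frames) pairs

    def record(self, total_frames: int) -> None:
        now = time.monotonic()
        self.samples.append((now, total_frames))
        # Drop samples older than the largest window we report.
        while self.samples and now - self.samples[0][0] > self.max_window:
            self.samples.popleft()

    def fps(self, window: float) -> float:
        now = time.monotonic()
        recent = [(t, f) for t, f in self.samples if now - t <= window]
        if len(recent) < 2:
            return float("nan")  # matches the log's initial 'nan' readouts
        (t0, f0), (t1, f1) = recent[0], recent[-1]
        return (f1 - f0) / (t1 - t0) if t1 > t0 else float("nan")
```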
[2025-02-19 01:31:48,011][00376] Avg episode reward: [(0, '4.607')]
[2025-02-19 01:31:48,013][02884] Saving new best policy, reward=4.607!
[2025-02-19 01:31:53,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3686.5, 300 sec: 3504.4). Total num frames: 315392. Throughput: 0: 924.7. Samples: 79424. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:31:53,012][00376] Avg episode reward: [(0, '4.495')]
[2025-02-19 01:31:53,020][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth...
[2025-02-19 01:31:55,586][02897] Updated weights for policy 0, policy_version 80 (0.0013)
[2025-02-19 01:31:58,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3535.5). Total num frames: 335872. Throughput: 0: 927.9. Samples: 82414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:31:58,012][00376] Avg episode reward: [(0, '4.550')]
[2025-02-19 01:32:03,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3522.6). Total num frames: 352256. Throughput: 0: 898.9. Samples: 87274. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:32:03,010][00376] Avg episode reward: [(0, '4.596')]
[2025-02-19 01:32:06,819][02897] Updated weights for policy 0, policy_version 90 (0.0012)
[2025-02-19 01:32:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 372736. Throughput: 0: 930.1. Samples: 93532. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:32:08,010][00376] Avg episode reward: [(0, '4.425')]
[2025-02-19 01:32:13,007][00376] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3537.4). Total num frames: 389120. Throughput: 0: 933.2. Samples: 96610. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:32:13,010][00376] Avg episode reward: [(0, '4.438')]
[2025-02-19 01:32:18,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.1). Total num frames: 405504. Throughput: 0: 910.3. Samples: 101568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:32:18,012][00376] Avg episode reward: [(0, '4.668')]
[2025-02-19 01:32:18,023][02884] Saving new best policy, reward=4.668!
[2025-02-19 01:32:18,024][02897] Updated weights for policy 0, policy_version 100 (0.0012)
[2025-02-19 01:32:23,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 934.7. Samples: 107650. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:32:23,008][00376] Avg episode reward: [(0, '4.835')]
[2025-02-19 01:32:23,022][02884] Saving new best policy, reward=4.835!
[2025-02-19 01:32:28,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3621.2, 300 sec: 3538.9). Total num frames: 442368. Throughput: 0: 926.4. Samples: 110298. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:32:28,009][00376] Avg episode reward: [(0, '4.558')]
[2025-02-19 01:32:29,451][02897] Updated weights for policy 0, policy_version 110 (0.0015)
[2025-02-19 01:32:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3560.4). Total num frames: 462848. Throughput: 0: 920.5. Samples: 115402. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:32:33,008][00376] Avg episode reward: [(0, '4.381')]
[2025-02-19 01:32:38,007][00376] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 937.0. Samples: 121590. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
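Checkpoints like checkpoint_000000077_315392.pth above (policy version 77, 315392 frames) are ordinary PyTorch files; a quick way to inspect one after training, assuming it is a dict-style checkpoint:

```python
import torch

# Path taken from the log; map to CPU so no GPU is required for inspection.
ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth",
    map_location="cpu",
)
# Dict-style checkpoints expose their top-level keys (the exact keys are
# framework-specific, so printing them is a safe first step).
print(type(ckpt), list(ckpt.keys()) if isinstance(ckpt, dict) else None)
```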
[2025-02-19 01:32:38,008][00376] Avg episode reward: [(0, '4.498')]
[2025-02-19 01:32:40,010][02897] Updated weights for policy 0, policy_version 120 (0.0017)
[2025-02-19 01:32:43,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3569.4). Total num frames: 499712. Throughput: 0: 917.0. Samples: 123678. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:32:43,011][00376] Avg episode reward: [(0, '4.633')]
[2025-02-19 01:32:48,007][00376] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3531.0). Total num frames: 512000. Throughput: 0: 908.5. Samples: 128158. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:32:48,011][00376] Avg episode reward: [(0, '4.419')]
[2025-02-19 01:32:53,011][00376] Fps is (10 sec: 2866.0, 60 sec: 3549.6, 300 sec: 3522.5). Total num frames: 528384. Throughput: 0: 880.7. Samples: 133168. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:32:53,016][00376] Avg episode reward: [(0, '4.447')]
[2025-02-19 01:32:53,136][02897] Updated weights for policy 0, policy_version 130 (0.0016)
[2025-02-19 01:32:58,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3541.1). Total num frames: 548864. Throughput: 0: 862.6. Samples: 135426. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:32:58,009][00376] Avg episode reward: [(0, '4.491')]
[2025-02-19 01:33:03,007][00376] Fps is (10 sec: 4097.7, 60 sec: 3618.1, 300 sec: 3558.4). Total num frames: 569344. Throughput: 0: 888.4. Samples: 141544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:03,008][00376] Avg episode reward: [(0, '4.533')]
[2025-02-19 01:33:03,883][02897] Updated weights for policy 0, policy_version 140 (0.0012)
[2025-02-19 01:33:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3549.9). Total num frames: 585728. Throughput: 0: 865.3. Samples: 146588. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:08,011][00376] Avg episode reward: [(0, '4.395')]
[2025-02-19 01:33:13,009][00376] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3565.9). Total num frames: 606208. Throughput: 0: 874.1. Samples: 149636. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:33:13,012][00376] Avg episode reward: [(0, '4.436')]
[2025-02-19 01:33:14,918][02897] Updated weights for policy 0, policy_version 150 (0.0020)
[2025-02-19 01:33:18,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3581.1). Total num frames: 626688. Throughput: 0: 898.0. Samples: 155814. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:18,012][00376] Avg episode reward: [(0, '4.396')]
[2025-02-19 01:33:23,007][00376] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3549.9). Total num frames: 638976. Throughput: 0: 870.4. Samples: 160758. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:23,008][00376] Avg episode reward: [(0, '4.357')]
[2025-02-19 01:33:26,229][02897] Updated weights for policy 0, policy_version 160 (0.0014)
[2025-02-19 01:33:28,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 891.5. Samples: 163794. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:28,012][00376] Avg episode reward: [(0, '4.486')]
[2025-02-19 01:33:33,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3578.6). Total num frames: 679936. Throughput: 0: 922.7. Samples: 169680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:33,009][00376] Avg episode reward: [(0, '4.563')]
[2025-02-19 01:33:37,404][02897] Updated weights for policy 0, policy_version 170 (0.0012)
[2025-02-19 01:33:38,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3570.9). Total num frames: 696320. Throughput: 0: 925.9. Samples: 174828. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:38,011][00376] Avg episode reward: [(0, '4.536')]
[2025-02-19 01:33:43,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3584.0). Total num frames: 716800. Throughput: 0: 945.4. Samples: 177968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:43,009][00376] Avg episode reward: [(0, '4.439')]
[2025-02-19 01:33:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 928.9. Samples: 183344. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:33:48,009][00376] Avg episode reward: [(0, '4.434')]
[2025-02-19 01:33:48,488][02897] Updated weights for policy 0, policy_version 180 (0.0012)
[2025-02-19 01:33:53,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3588.9). Total num frames: 753664. Throughput: 0: 945.6. Samples: 189140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:33:53,008][00376] Avg episode reward: [(0, '4.466')]
[2025-02-19 01:33:53,016][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000184_753664.pth...
[2025-02-19 01:33:58,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3600.7). Total num frames: 774144. Throughput: 0: 945.9. Samples: 192202. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:33:58,009][00376] Avg episode reward: [(0, '4.371')]
[2025-02-19 01:33:59,277][02897] Updated weights for policy 0, policy_version 190 (0.0012)
[2025-02-19 01:34:03,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3574.7). Total num frames: 786432. Throughput: 0: 906.9. Samples: 196624. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:34:03,010][00376] Avg episode reward: [(0, '4.423')]
[2025-02-19 01:34:08,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3586.3). Total num frames: 806912. Throughput: 0: 935.7. Samples: 202866. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:34:08,008][00376] Avg episode reward: [(0, '4.541')]
[2025-02-19 01:34:09,963][02897] Updated weights for policy 0, policy_version 200 (0.0015)
[2025-02-19 01:34:13,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3597.4). Total num frames: 827392. Throughput: 0: 936.6. Samples: 205942. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:34:13,010][00376] Avg episode reward: [(0, '4.518')]
[2025-02-19 01:34:18,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3590.5). Total num frames: 843776. Throughput: 0: 916.4. Samples: 210918. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:34:18,009][00376] Avg episode reward: [(0, '4.351')]
[2025-02-19 01:34:21,282][02897] Updated weights for policy 0, policy_version 210 (0.0012)
[2025-02-19 01:34:23,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3601.1). Total num frames: 864256. Throughput: 0: 934.4. Samples: 216874. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:34:23,011][00376] Avg episode reward: [(0, '4.685')]
[2025-02-19 01:34:28,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3594.5). Total num frames: 880640. Throughput: 0: 923.6. Samples: 219532. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:34:28,011][00376] Avg episode reward: [(0, '4.818')]
[2025-02-19 01:34:32,570][02897] Updated weights for policy 0, policy_version 220 (0.0013)
[2025-02-19 01:34:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3604.5). Total num frames: 901120. Throughput: 0: 922.5. Samples: 224858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:34:33,012][00376] Avg episode reward: [(0, '4.507')]
[2025-02-19 01:34:38,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3614.1). Total num frames: 921600. Throughput: 0: 932.6. Samples: 231106. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:34:38,012][00376] Avg episode reward: [(0, '4.406')]
[2025-02-19 01:34:43,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3607.6). Total num frames: 937984. Throughput: 0: 908.2. Samples: 233072. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:34:43,011][00376] Avg episode reward: [(0, '4.533')]
[2025-02-19 01:34:43,640][02897] Updated weights for policy 0, policy_version 230 (0.0012)
[2025-02-19 01:34:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3616.8). Total num frames: 958464. Throughput: 0: 946.9. Samples: 239234. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:34:48,008][00376] Avg episode reward: [(0, '4.620')]
[2025-02-19 01:34:53,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3625.7). Total num frames: 978944. Throughput: 0: 932.0. Samples: 244804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:34:53,011][00376] Avg episode reward: [(0, '4.726')]
[2025-02-19 01:34:54,664][02897] Updated weights for policy 0, policy_version 240 (0.0017)
[2025-02-19 01:34:58,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3619.4). Total num frames: 995328. Throughput: 0: 917.4. Samples: 247226. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:34:58,012][00376] Avg episode reward: [(0, '4.870')]
[2025-02-19 01:34:58,014][02884] Saving new best policy, reward=4.870!
[2025-02-19 01:35:03,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3627.9). Total num frames: 1015808. Throughput: 0: 942.5. Samples: 253332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:35:03,011][00376] Avg episode reward: [(0, '4.800')]
[2025-02-19 01:35:04,766][02897] Updated weights for policy 0, policy_version 250 (0.0013)
[2025-02-19 01:35:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3621.7). Total num frames: 1032192. Throughput: 0: 919.2. Samples: 258238. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:35:08,012][00376] Avg episode reward: [(0, '4.738')]
[2025-02-19 01:35:13,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3629.9). Total num frames: 1052672. Throughput: 0: 929.8. Samples: 261374. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:35:13,012][00376] Avg episode reward: [(0, '5.016')]
[2025-02-19 01:35:13,018][02884] Saving new best policy, reward=5.016!
[2025-02-19 01:35:15,803][02897] Updated weights for policy 0, policy_version 260 (0.0013)
[2025-02-19 01:35:18,011][00376] Fps is (10 sec: 4094.3, 60 sec: 3822.7, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 949.5. Samples: 267590. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:35:18,013][00376] Avg episode reward: [(0, '4.994')]
[2025-02-19 01:35:23,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 921.4. Samples: 272570. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:35:23,009][00376] Avg episode reward: [(0, '4.747')]
[2025-02-19 01:35:26,805][02897] Updated weights for policy 0, policy_version 270 (0.0013)
[2025-02-19 01:35:28,007][00376] Fps is (10 sec: 3687.8, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1110016. Throughput: 0: 947.2. Samples: 275696. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:35:28,009][00376] Avg episode reward: [(0, '4.706')]
[2025-02-19 01:35:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1126400. Throughput: 0: 939.9. Samples: 281528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:35:33,011][00376] Avg episode reward: [(0, '4.866')]
[2025-02-19 01:35:38,007][00376] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1142784. Throughput: 0: 934.8. Samples: 286872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:35:38,008][00376] Avg episode reward: [(0, '4.853')]
[2025-02-19 01:35:38,020][02897] Updated weights for policy 0, policy_version 280 (0.0015)
[2025-02-19 01:35:43,007][00376] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1167360. Throughput: 0: 949.6. Samples: 289960. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:35:43,009][00376] Avg episode reward: [(0, '5.157')]
[2025-02-19 01:35:43,015][02884] Saving new best policy, reward=5.157!
[2025-02-19 01:35:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1179648. Throughput: 0: 929.0. Samples: 295138. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:35:48,009][00376] Avg episode reward: [(0, '4.957')]
[2025-02-19 01:35:49,165][02897] Updated weights for policy 0, policy_version 290 (0.0014)
[2025-02-19 01:35:53,007][00376] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1200128. Throughput: 0: 952.8. Samples: 301116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:35:53,012][00376] Avg episode reward: [(0, '5.018')]
[2025-02-19 01:35:53,029][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000294_1204224.pth...
[2025-02-19 01:35:53,117][02884] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth
[2025-02-19 01:35:58,008][00376] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 1220608. Throughput: 0: 949.8. Samples: 304114. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:35:58,010][00376] Avg episode reward: [(0, '4.983')]
[2025-02-19 01:35:59,804][02897] Updated weights for policy 0, policy_version 300 (0.0012)
[2025-02-19 01:36:03,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1236992. Throughput: 0: 920.7. Samples: 309016. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:36:03,009][00376] Avg episode reward: [(0, '4.602')]
[2025-02-19 01:36:08,007][00376] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1257472. Throughput: 0: 944.3. Samples: 315064. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:36:08,008][00376] Avg episode reward: [(0, '4.467')]
[2025-02-19 01:36:10,422][02897] Updated weights for policy 0, policy_version 310 (0.0011)
[2025-02-19 01:36:13,007][00376] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1273856. Throughput: 0: 943.3. Samples: 318146. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
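The paired "Saving .../checkpoint_000000294_1204224.pth" and "Removing .../checkpoint_000000077_315392.pth" lines above show a rolling retention policy: each new checkpoint displaces the oldest one. Because the version and frame counters in the filenames are zero-padded, lexicographic order equals chronological order, which makes the policy easy to reproduce (illustrative sketch, not Sample Factory's code):

```python
from pathlib import Path


def rotate_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
    """Delete all but the `keep` newest checkpoint_*.pth files.

    Zero-padded version/frame numbers mean a plain sort of the filenames
    is also a chronological sort, as in the log's checkpoint names.
    """
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
    for old in ckpts[:-keep]:
        old.unlink()
```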
[2025-02-19 01:36:13,009][00376] Avg episode reward: [(0, '4.546')]
[2025-02-19 01:36:18,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3693.3). Total num frames: 1294336. Throughput: 0: 923.0. Samples: 323064. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:36:18,008][00376] Avg episode reward: [(0, '4.916')]
[2025-02-19 01:36:21,592][02897] Updated weights for policy 0, policy_version 320 (0.0018)
[2025-02-19 01:36:23,007][00376] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3694.0). Total num frames: 1314816. Throughput: 0: 940.1. Samples: 329178. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:36:23,008][00376] Avg episode reward: [(0, '4.867')]
[2025-02-19 01:36:28,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1331200. Throughput: 0: 928.2. Samples: 331730. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:36:28,008][00376] Avg episode reward: [(0, '4.825')]
[2025-02-19 01:36:32,904][02897] Updated weights for policy 0, policy_version 330 (0.0015)
[2025-02-19 01:36:33,008][00376] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 1351680. Throughput: 0: 933.3. Samples: 337136. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:36:33,012][00376] Avg episode reward: [(0, '4.608')]
[2025-02-19 01:36:38,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1372160. Throughput: 0: 935.7. Samples: 343224. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:36:38,009][00376] Avg episode reward: [(0, '4.623')]
[2025-02-19 01:36:43,007][00376] Fps is (10 sec: 3277.1, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1384448. Throughput: 0: 913.3. Samples: 345210. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:36:43,008][00376] Avg episode reward: [(0, '4.723')]
[2025-02-19 01:36:44,030][02897] Updated weights for policy 0, policy_version 340 (0.0017)
[2025-02-19 01:36:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1409024. Throughput: 0: 940.9. Samples: 351356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:36:48,009][00376] Avg episode reward: [(0, '4.821')]
[2025-02-19 01:36:53,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1425408. Throughput: 0: 929.0. Samples: 356868. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:36:53,008][00376] Avg episode reward: [(0, '4.695')]
[2025-02-19 01:36:55,160][02897] Updated weights for policy 0, policy_version 350 (0.0013)
[2025-02-19 01:36:58,009][00376] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3693.3). Total num frames: 1441792. Throughput: 0: 916.1. Samples: 359372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:36:58,014][00376] Avg episode reward: [(0, '4.574')]
[2025-02-19 01:37:03,007][00376] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1462272. Throughput: 0: 945.7. Samples: 365622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:37:03,009][00376] Avg episode reward: [(0, '4.404')]
[2025-02-19 01:37:05,556][02897] Updated weights for policy 0, policy_version 360 (0.0012)
[2025-02-19 01:37:08,007][00376] Fps is (10 sec: 3687.1, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1478656. Throughput: 0: 916.0. Samples: 370398. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:37:08,009][00376] Avg episode reward: [(0, '4.375')]
[2025-02-19 01:37:13,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1499136. Throughput: 0: 926.3. Samples: 373414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:37:13,012][00376] Avg episode reward: [(0, '4.708')]
[2025-02-19 01:37:16,395][02897] Updated weights for policy 0, policy_version 370 (0.0015)
[2025-02-19 01:37:18,008][00376] Fps is (10 sec: 4095.8, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 1519616. Throughput: 0: 943.2. Samples: 379578. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:37:18,011][00376] Avg episode reward: [(0, '4.888')]
[2025-02-19 01:37:23,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1536000. Throughput: 0: 917.4. Samples: 384506. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:37:23,011][00376] Avg episode reward: [(0, '4.853')]
[2025-02-19 01:37:27,638][02897] Updated weights for policy 0, policy_version 380 (0.0014)
[2025-02-19 01:37:28,007][00376] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1556480. Throughput: 0: 942.9. Samples: 387640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:37:28,009][00376] Avg episode reward: [(0, '4.771')]
[2025-02-19 01:37:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 1572864. Throughput: 0: 935.7. Samples: 393464. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:37:33,013][00376] Avg episode reward: [(0, '4.788')]
[2025-02-19 01:37:38,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1593344. Throughput: 0: 929.2. Samples: 398682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:37:38,012][00376] Avg episode reward: [(0, '5.074')]
[2025-02-19 01:37:38,680][02897] Updated weights for policy 0, policy_version 390 (0.0013)
[2025-02-19 01:37:43,009][00376] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 1613824. Throughput: 0: 942.4. Samples: 401778. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:37:43,011][00376] Avg episode reward: [(0, '5.215')]
[2025-02-19 01:37:43,018][02884] Saving new best policy, reward=5.215!
[2025-02-19 01:37:48,008][00376] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 1630208. Throughput: 0: 918.3. Samples: 406946. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:37:48,009][00376] Avg episode reward: [(0, '5.178')]
[2025-02-19 01:37:49,931][02897] Updated weights for policy 0, policy_version 400 (0.0014)
[2025-02-19 01:37:53,007][00376] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1650688. Throughput: 0: 943.5. Samples: 412856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:37:53,012][00376] Avg episode reward: [(0, '5.465')]
[2025-02-19 01:37:53,019][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth...
[2025-02-19 01:37:53,105][02884] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000184_753664.pth
[2025-02-19 01:37:53,115][02884] Saving new best policy, reward=5.465!
[2025-02-19 01:37:58,007][00376] Fps is (10 sec: 3686.7, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 1667072. Throughput: 0: 943.3. Samples: 415864. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:37:58,008][00376] Avg episode reward: [(0, '5.399')]
[2025-02-19 01:38:01,087][02897] Updated weights for policy 0, policy_version 410 (0.0015)
[2025-02-19 01:38:03,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1687552. Throughput: 0: 917.3. Samples: 420858. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:38:03,009][00376] Avg episode reward: [(0, '5.214')]
[2025-02-19 01:38:08,010][00376] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3721.1). Total num frames: 1703936. Throughput: 0: 945.9. Samples: 427074. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:38:08,011][00376] Avg episode reward: [(0, '5.163')]
[2025-02-19 01:38:11,103][02897] Updated weights for policy 0, policy_version 420 (0.0013)
[2025-02-19 01:38:13,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1724416. Throughput: 0: 945.4. Samples: 430184. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:38:13,009][00376] Avg episode reward: [(0, '5.520')]
[2025-02-19 01:38:13,013][02884] Saving new best policy, reward=5.520!
[2025-02-19 01:38:18,007][00376] Fps is (10 sec: 3687.3, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1740800. Throughput: 0: 924.8. Samples: 435082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:38:18,009][00376] Avg episode reward: [(0, '5.478')]
[2025-02-19 01:38:22,226][02897] Updated weights for policy 0, policy_version 430 (0.0012)
[2025-02-19 01:38:23,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1761280. Throughput: 0: 945.9. Samples: 441248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:38:23,010][00376] Avg episode reward: [(0, '5.170')]
[2025-02-19 01:38:28,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1777664. Throughput: 0: 930.7. Samples: 443656. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:38:28,010][00376] Avg episode reward: [(0, '5.034')]
[2025-02-19 01:38:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1798144. Throughput: 0: 940.4. Samples: 449262. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:38:33,009][00376] Avg episode reward: [(0, '4.758')]
[2025-02-19 01:38:33,282][02897] Updated weights for policy 0, policy_version 440 (0.0017)
[2025-02-19 01:38:38,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1818624. Throughput: 0: 940.0. Samples: 455156. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:38:38,011][00376] Avg episode reward: [(0, '4.963')]
[2025-02-19 01:38:43,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 1835008. Throughput: 0: 919.6. Samples: 457248. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:38:43,011][00376] Avg episode reward: [(0, '5.434')]
[2025-02-19 01:38:44,507][02897] Updated weights for policy 0, policy_version 450 (0.0013)
[2025-02-19 01:38:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1855488. Throughput: 0: 948.5. Samples: 463540. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:38:48,009][00376] Avg episode reward: [(0, '5.624')]
[2025-02-19 01:38:48,015][02884] Saving new best policy, reward=5.624!
[2025-02-19 01:38:53,010][00376] Fps is (10 sec: 3685.3, 60 sec: 3686.2, 300 sec: 3721.1). Total num frames: 1871872. Throughput: 0: 926.0. Samples: 468746. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:38:53,011][00376] Avg episode reward: [(0, '5.608')]
[2025-02-19 01:38:55,720][02897] Updated weights for policy 0, policy_version 460 (0.0012)
[2025-02-19 01:38:58,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1892352. Throughput: 0: 918.3. Samples: 471508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:38:58,011][00376] Avg episode reward: [(0, '5.578')]
[2025-02-19 01:39:03,007][00376] Fps is (10 sec: 4097.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1912832. Throughput: 0: 948.8. Samples: 477780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:39:03,009][00376] Avg episode reward: [(0, '5.705')]
[2025-02-19 01:39:03,016][02884] Saving new best policy, reward=5.705!
[2025-02-19 01:39:06,397][02897] Updated weights for policy 0, policy_version 470 (0.0014)
[2025-02-19 01:39:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 1929216. Throughput: 0: 919.6. Samples: 482632. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:39:08,012][00376] Avg episode reward: [(0, '5.867')]
[2025-02-19 01:39:08,016][02884] Saving new best policy, reward=5.867!
[2025-02-19 01:39:13,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1949696. Throughput: 0: 934.5. Samples: 485708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:39:13,009][00376] Avg episode reward: [(0, '5.473')]
[2025-02-19 01:39:16,697][02897] Updated weights for policy 0, policy_version 480 (0.0012)
[2025-02-19 01:39:18,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1970176. Throughput: 0: 947.7. Samples: 491908. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:39:18,009][00376] Avg episode reward: [(0, '5.841')]
[2025-02-19 01:39:23,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1986560. Throughput: 0: 927.2. Samples: 496878. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:39:23,008][00376] Avg episode reward: [(0, '6.270')]
[2025-02-19 01:39:23,017][02884] Saving new best policy, reward=6.270!
[2025-02-19 01:39:27,841][02897] Updated weights for policy 0, policy_version 490 (0.0016)
[2025-02-19 01:39:28,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2007040. Throughput: 0: 948.2. Samples: 499916. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:39:28,011][00376] Avg episode reward: [(0, '6.727')]
[2025-02-19 01:39:28,012][02884] Saving new best policy, reward=6.727!
[2025-02-19 01:39:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2023424. Throughput: 0: 931.7. Samples: 505468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:39:33,008][00376] Avg episode reward: [(0, '6.980')]
[2025-02-19 01:39:33,013][02884] Saving new best policy, reward=6.980!
[2025-02-19 01:39:38,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2039808. Throughput: 0: 935.8. Samples: 510856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:39:38,012][00376] Avg episode reward: [(0, '7.162')]
[2025-02-19 01:39:38,017][02884] Saving new best policy, reward=7.162!
[2025-02-19 01:39:39,307][02897] Updated weights for policy 0, policy_version 500 (0.0013)
[2025-02-19 01:39:43,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2060288. Throughput: 0: 941.8. Samples: 513890. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:39:43,011][00376] Avg episode reward: [(0, '6.873')]
[2025-02-19 01:39:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2076672. Throughput: 0: 911.1. Samples: 518780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:39:48,012][00376] Avg episode reward: [(0, '6.875')]
[2025-02-19 01:39:50,598][02897] Updated weights for policy 0, policy_version 510 (0.0013)
[2025-02-19 01:39:53,008][00376] Fps is (10 sec: 3686.2, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 2097152. Throughput: 0: 940.7. Samples: 524962. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:39:53,013][00376] Avg episode reward: [(0, '7.076')]
[2025-02-19 01:39:53,024][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000512_2097152.pth...
[2025-02-19 01:39:53,110][02884] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000294_1204224.pth
[2025-02-19 01:39:58,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2117632. Throughput: 0: 939.9. Samples: 528002. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:39:58,010][00376] Avg episode reward: [(0, '7.321')]
[2025-02-19 01:39:58,013][02884] Saving new best policy, reward=7.321!
[2025-02-19 01:40:01,807][02897] Updated weights for policy 0, policy_version 520 (0.0012)
[2025-02-19 01:40:03,007][00376] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2134016. Throughput: 0: 909.5. Samples: 532836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:40:03,012][00376] Avg episode reward: [(0, '7.367')]
[2025-02-19 01:40:03,019][02884] Saving new best policy, reward=7.367!
[2025-02-19 01:40:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2154496. Throughput: 0: 934.9. Samples: 538948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:40:08,014][00376] Avg episode reward: [(0, '7.167')]
[2025-02-19 01:40:12,939][02897] Updated weights for policy 0, policy_version 530 (0.0013)
[2025-02-19 01:40:13,009][00376] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3721.1). Total num frames: 2170880. Throughput: 0: 928.3. Samples: 541690. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:40:13,011][00376] Avg episode reward: [(0, '7.355')]
[2025-02-19 01:40:18,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 2187264. Throughput: 0: 920.3. Samples: 546882. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:40:18,008][00376] Avg episode reward: [(0, '7.286')]
[2025-02-19 01:40:23,007][00376] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2207744. Throughput: 0: 938.7. Samples: 553096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:40:23,009][00376] Avg episode reward: [(0, '7.634')]
[2025-02-19 01:40:23,015][02884] Saving new best policy, reward=7.634!
[2025-02-19 01:40:23,226][02897] Updated weights for policy 0, policy_version 540 (0.0012)
[2025-02-19 01:40:28,008][00376] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 2224128. Throughput: 0: 917.9. Samples: 555196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:40:28,009][00376] Avg episode reward: [(0, '7.514')]
[2025-02-19 01:40:33,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2244608. Throughput: 0: 940.1. Samples: 561086. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:40:33,012][00376] Avg episode reward: [(0, '8.857')]
[2025-02-19 01:40:33,019][02884] Saving new best policy, reward=8.857!
[2025-02-19 01:40:34,319][02897] Updated weights for policy 0, policy_version 550 (0.0013)
[2025-02-19 01:40:38,007][00376] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2265088. Throughput: 0: 927.9. Samples: 566716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:40:38,009][00376] Avg episode reward: [(0, '9.593')]
[2025-02-19 01:40:38,010][02884] Saving new best policy, reward=9.593!
[2025-02-19 01:40:43,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2281472. Throughput: 0: 908.6. Samples: 568890. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:40:43,009][00376] Avg episode reward: [(0, '9.642')]
[2025-02-19 01:40:43,014][02884] Saving new best policy, reward=9.642!
[2025-02-19 01:40:45,468][02897] Updated weights for policy 0, policy_version 560 (0.0013)
[2025-02-19 01:40:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2301952. Throughput: 0: 939.2. Samples: 575100. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:40:48,008][00376] Avg episode reward: [(0, '8.746')]
[2025-02-19 01:40:53,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2318336. Throughput: 0: 918.9. Samples: 580300. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:40:53,014][00376] Avg episode reward: [(0, '8.747')]
[2025-02-19 01:40:56,682][02897] Updated weights for policy 0, policy_version 570 (0.0017)
[2025-02-19 01:40:58,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2338816. Throughput: 0: 921.9. Samples: 583172. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:40:58,012][00376] Avg episode reward: [(0, '10.106')]
[2025-02-19 01:40:58,015][02884] Saving new best policy, reward=10.106!
[2025-02-19 01:41:03,008][00376] Fps is (10 sec: 4095.8, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2359296. Throughput: 0: 944.7. Samples: 589396. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:41:03,013][00376] Avg episode reward: [(0, '11.148')]
[2025-02-19 01:41:03,023][02884] Saving new best policy, reward=11.148!
[2025-02-19 01:41:07,773][02897] Updated weights for policy 0, policy_version 580 (0.0012)
[2025-02-19 01:41:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2375680. Throughput: 0: 915.6. Samples: 594298. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
[2025-02-19 01:41:08,010][00376] Avg episode reward: [(0, '12.060')]
[2025-02-19 01:41:08,015][02884] Saving new best policy, reward=12.060!
[2025-02-19 01:41:13,007][00376] Fps is (10 sec: 3686.6, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 2396160. Throughput: 0: 935.6. Samples: 597298. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:41:13,008][00376] Avg episode reward: [(0, '13.486')]
[2025-02-19 01:41:13,015][02884] Saving new best policy, reward=13.486!
[2025-02-19 01:41:18,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2412544. Throughput: 0: 941.2. Samples: 603442. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:41:18,009][00376] Avg episode reward: [(0, '13.072')]
[2025-02-19 01:41:18,400][02897] Updated weights for policy 0, policy_version 590 (0.0012)
[2025-02-19 01:41:23,008][00376] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2433024. Throughput: 0: 926.5. Samples: 608408. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:41:23,016][00376] Avg episode reward: [(0, '13.455')]
[2025-02-19 01:41:28,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 2453504. Throughput: 0: 947.0. Samples: 611504. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:41:28,012][00376] Avg episode reward: [(0, '14.686')]
[2025-02-19 01:41:28,015][02884] Saving new best policy, reward=14.686!
[2025-02-19 01:41:28,878][02897] Updated weights for policy 0, policy_version 600 (0.0015)
[2025-02-19 01:41:33,008][00376] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2465792. Throughput: 0: 931.3. Samples: 617010. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:41:33,011][00376] Avg episode reward: [(0, '14.233')]
[2025-02-19 01:41:38,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2486272. Throughput: 0: 941.5. Samples: 622668. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-19 01:41:38,011][00376] Avg episode reward: [(0, '14.277')]
[2025-02-19 01:41:40,089][02897] Updated weights for policy 0, policy_version 610 (0.0012)
[2025-02-19 01:41:43,007][00376] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2506752. Throughput: 0: 945.1. Samples: 625700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:41:43,011][00376] Avg episode reward: [(0, '13.553')]
[2025-02-19 01:41:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2523136. Throughput: 0: 916.1. Samples: 630618. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:41:48,012][00376] Avg episode reward: [(0, '11.901')]
[2025-02-19 01:41:51,286][02897] Updated weights for policy 0, policy_version 620 (0.0012)
[2025-02-19 01:41:53,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2543616. Throughput: 0: 945.2. Samples: 636834. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:41:53,011][00376] Avg episode reward: [(0, '12.121')]
[2025-02-19 01:41:53,020][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth...
[2025-02-19 01:41:53,116][02884] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth
[2025-02-19 01:41:58,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2564096. Throughput: 0: 946.1. Samples: 639872. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:41:58,011][00376] Avg episode reward: [(0, '12.974')]
[2025-02-19 01:42:02,343][02897] Updated weights for policy 0, policy_version 630 (0.0012)
[2025-02-19 01:42:03,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2580480. Throughput: 0: 920.1. Samples: 644846. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:42:03,014][00376] Avg episode reward: [(0, '13.051')]
[2025-02-19 01:42:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2600960. Throughput: 0: 948.8. Samples: 651102. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:42:08,012][00376] Avg episode reward: [(0, '13.255')]
[2025-02-19 01:42:13,011][00376] Fps is (10 sec: 3684.9, 60 sec: 3686.1, 300 sec: 3721.1). Total num frames: 2617344. Throughput: 0: 940.0. Samples: 653808. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:42:13,013][00376] Avg episode reward: [(0, '13.065')]
[2025-02-19 01:42:13,411][02897] Updated weights for policy 0, policy_version 640 (0.0012)
[2025-02-19 01:42:18,008][00376] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2637824. Throughput: 0: 935.4. Samples: 659102. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:42:18,009][00376] Avg episode reward: [(0, '11.799')]
[2025-02-19 01:42:23,007][00376] Fps is (10 sec: 4097.7, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2658304. Throughput: 0: 948.5. Samples: 665352. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:42:23,011][00376] Avg episode reward: [(0, '12.866')]
[2025-02-19 01:42:23,491][02897] Updated weights for policy 0, policy_version 650 (0.0013)
[2025-02-19 01:42:28,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2674688. Throughput: 0: 927.5. Samples: 667436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:42:28,009][00376] Avg episode reward: [(0, '12.933')]
[2025-02-19 01:42:33,007][00376] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2695168. Throughput: 0: 951.1. Samples: 673416. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:42:33,013][00376] Avg episode reward: [(0, '13.598')]
[2025-02-19 01:42:34,350][02897] Updated weights for policy 0, policy_version 660 (0.0012)
[2025-02-19 01:42:38,007][00376] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2715648. Throughput: 0: 939.8. Samples: 679126. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:42:38,008][00376] Avg episode reward: [(0, '14.296')]
[2025-02-19 01:42:43,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2732032. Throughput: 0: 925.6. Samples: 681526. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:42:43,016][00376] Avg episode reward: [(0, '13.731')]
[2025-02-19 01:42:45,640][02897] Updated weights for policy 0, policy_version 670 (0.0013)
[2025-02-19 01:42:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2752512. Throughput: 0: 951.3. Samples: 687654. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-19 01:42:48,008][00376] Avg episode reward: [(0, '13.269')]
[2025-02-19 01:42:53,010][00376] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 2768896. Throughput: 0: 925.9. Samples: 692772. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
[2025-02-19 01:42:53,012][00376] Avg episode reward: [(0, '12.230')]
[2025-02-19 01:42:56,532][02897] Updated weights for policy 0, policy_version 680 (0.0018)
[2025-02-19 01:42:58,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2789376. Throughput: 0: 932.8. Samples: 695778. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-19 01:42:58,012][00376] Avg episode reward: [(0, '13.466')]
[2025-02-19 01:43:03,008][00376] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2809856. Throughput: 0: 953.6. Samples: 702014. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:43:03,014][00376] Avg episode reward: [(0, '15.130')] [2025-02-19 01:43:03,024][02884] Saving new best policy, reward=15.130! [2025-02-19 01:43:07,721][02897] Updated weights for policy 0, policy_version 690 (0.0013) [2025-02-19 01:43:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2826240. Throughput: 0: 923.2. Samples: 706898. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:43:08,009][00376] Avg episode reward: [(0, '16.334')] [2025-02-19 01:43:08,013][02884] Saving new best policy, reward=16.334! [2025-02-19 01:43:13,007][00376] Fps is (10 sec: 3686.8, 60 sec: 3823.2, 300 sec: 3748.9). Total num frames: 2846720. Throughput: 0: 944.8. Samples: 709952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:13,011][00376] Avg episode reward: [(0, '18.093')] [2025-02-19 01:43:13,020][02884] Saving new best policy, reward=18.093! [2025-02-19 01:43:18,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2863104. Throughput: 0: 944.5. Samples: 715916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:18,009][00376] Avg episode reward: [(0, '20.094')] [2025-02-19 01:43:18,016][02884] Saving new best policy, reward=20.094! [2025-02-19 01:43:18,663][02897] Updated weights for policy 0, policy_version 700 (0.0019) [2025-02-19 01:43:23,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2879488. Throughput: 0: 929.5. Samples: 720956. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-19 01:43:23,014][00376] Avg episode reward: [(0, '20.163')] [2025-02-19 01:43:23,021][02884] Saving new best policy, reward=20.163! [2025-02-19 01:43:28,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2899968. Throughput: 0: 943.7. Samples: 723994. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:28,010][00376] Avg episode reward: [(0, '19.237')] [2025-02-19 01:43:29,101][02897] Updated weights for policy 0, policy_version 710 (0.0012) [2025-02-19 01:43:33,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2916352. Throughput: 0: 927.7. Samples: 729400. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:33,009][00376] Avg episode reward: [(0, '18.211')] [2025-02-19 01:43:38,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2936832. Throughput: 0: 941.9. Samples: 735156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:38,009][00376] Avg episode reward: [(0, '16.223')] [2025-02-19 01:43:40,161][02897] Updated weights for policy 0, policy_version 720 (0.0012) [2025-02-19 01:43:43,007][00376] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2957312. Throughput: 0: 944.3. Samples: 738272. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:43,016][00376] Avg episode reward: [(0, '15.534')] [2025-02-19 01:43:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2973696. Throughput: 0: 913.5. Samples: 743120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:48,011][00376] Avg episode reward: [(0, '15.815')] [2025-02-19 01:43:51,398][02897] Updated weights for policy 0, policy_version 730 (0.0013) [2025-02-19 01:43:53,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 2994176. Throughput: 0: 941.8. Samples: 749280. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:53,012][00376] Avg episode reward: [(0, '17.740')] [2025-02-19 01:43:53,020][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000731_2994176.pth... [2025-02-19 01:43:53,105][02884] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000512_2097152.pth [2025-02-19 01:43:58,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3014656. Throughput: 0: 941.4. Samples: 752314. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:43:58,008][00376] Avg episode reward: [(0, '17.792')] [2025-02-19 01:44:02,505][02897] Updated weights for policy 0, policy_version 740 (0.0013) [2025-02-19 01:44:03,007][00376] Fps is (10 sec: 3686.5, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 3031040. Throughput: 0: 919.4. Samples: 757290. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:44:03,012][00376] Avg episode reward: [(0, '18.868')] [2025-02-19 01:44:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3051520. Throughput: 0: 947.3. Samples: 763584. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:44:08,013][00376] Avg episode reward: [(0, '17.370')] [2025-02-19 01:44:13,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3067904. Throughput: 0: 941.6. Samples: 766368. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:44:13,014][00376] Avg episode reward: [(0, '16.983')] [2025-02-19 01:44:13,469][02897] Updated weights for policy 0, policy_version 750 (0.0014) [2025-02-19 01:44:18,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3088384. Throughput: 0: 938.7. Samples: 771642. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:44:18,011][00376] Avg episode reward: [(0, '17.088')] [2025-02-19 01:44:23,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 3108864. Throughput: 0: 950.2. Samples: 777916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-19 01:44:23,011][00376] Avg episode reward: [(0, '17.613')] [2025-02-19 01:44:23,594][02897] Updated weights for policy 0, policy_version 760 (0.0012) [2025-02-19 01:44:28,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3125248. Throughput: 0: 928.5. Samples: 780052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:44:28,009][00376] Avg episode reward: [(0, '18.625')] [2025-02-19 01:44:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3145728. Throughput: 0: 954.1. Samples: 786054. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:44:33,012][00376] Avg episode reward: [(0, '18.896')] [2025-02-19 01:44:34,359][02897] Updated weights for policy 0, policy_version 770 (0.0016) [2025-02-19 01:44:38,012][00376] Fps is (10 sec: 4094.0, 60 sec: 3822.6, 300 sec: 3748.8). Total num frames: 3166208. Throughput: 0: 944.6. Samples: 791792. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:44:38,016][00376] Avg episode reward: [(0, '18.111')] [2025-02-19 01:44:43,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3182592. Throughput: 0: 930.8. Samples: 794202. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:44:43,008][00376] Avg episode reward: [(0, '17.782')] [2025-02-19 01:44:45,545][02897] Updated weights for policy 0, policy_version 780 (0.0012) [2025-02-19 01:44:48,007][00376] Fps is (10 sec: 3688.2, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3203072. Throughput: 0: 956.9. Samples: 800350. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:44:48,014][00376] Avg episode reward: [(0, '18.105')] [2025-02-19 01:44:53,009][00376] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 3219456. Throughput: 0: 928.6. Samples: 805374. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:44:53,013][00376] Avg episode reward: [(0, '17.145')] [2025-02-19 01:44:56,662][02897] Updated weights for policy 0, policy_version 790 (0.0013) [2025-02-19 01:44:58,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3239936. Throughput: 0: 933.7. Samples: 808384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:44:58,012][00376] Avg episode reward: [(0, '17.467')] [2025-02-19 01:45:03,007][00376] Fps is (10 sec: 4096.9, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3260416. Throughput: 0: 956.4. Samples: 814678. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:45:03,012][00376] Avg episode reward: [(0, '18.889')] [2025-02-19 01:45:07,631][02897] Updated weights for policy 0, policy_version 800 (0.0012) [2025-02-19 01:45:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3276800. Throughput: 0: 927.8. Samples: 819666. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:45:08,009][00376] Avg episode reward: [(0, '19.669')] [2025-02-19 01:45:13,009][00376] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3762.7). Total num frames: 3297280. Throughput: 0: 948.9. Samples: 822756. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:45:13,011][00376] Avg episode reward: [(0, '20.879')] [2025-02-19 01:45:13,020][02884] Saving new best policy, reward=20.879! [2025-02-19 01:45:18,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3313664. Throughput: 0: 946.3. Samples: 828638. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:45:18,011][00376] Avg episode reward: [(0, '21.949')] [2025-02-19 01:45:18,016][02884] Saving new best policy, reward=21.949! [2025-02-19 01:45:18,658][02897] Updated weights for policy 0, policy_version 810 (0.0013) [2025-02-19 01:45:23,007][00376] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3330048. Throughput: 0: 930.1. Samples: 833644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:45:23,008][00376] Avg episode reward: [(0, '22.263')] [2025-02-19 01:45:23,017][02884] Saving new best policy, reward=22.263! [2025-02-19 01:45:28,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3350528. Throughput: 0: 944.5. Samples: 836704. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-19 01:45:28,010][00376] Avg episode reward: [(0, '21.144')] [2025-02-19 01:45:29,078][02897] Updated weights for policy 0, policy_version 820 (0.0013) [2025-02-19 01:45:33,008][00376] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3366912. Throughput: 0: 927.6. Samples: 842094. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:45:33,010][00376] Avg episode reward: [(0, '20.119')] [2025-02-19 01:45:38,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3748.9). Total num frames: 3387392. Throughput: 0: 945.2. Samples: 847904. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-19 01:45:38,009][00376] Avg episode reward: [(0, '18.664')] [2025-02-19 01:45:40,110][02897] Updated weights for policy 0, policy_version 830 (0.0012) [2025-02-19 01:45:43,007][00376] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3407872. Throughput: 0: 947.3. Samples: 851014. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:45:43,009][00376] Avg episode reward: [(0, '16.536')] [2025-02-19 01:45:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3424256. Throughput: 0: 917.8. Samples: 855980. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:45:48,010][00376] Avg episode reward: [(0, '15.779')] [2025-02-19 01:45:51,176][02897] Updated weights for policy 0, policy_version 840 (0.0021) [2025-02-19 01:45:53,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3748.9). Total num frames: 3444736. Throughput: 0: 944.5. Samples: 862168. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:45:53,009][00376] Avg episode reward: [(0, '15.860')] [2025-02-19 01:45:53,018][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth... [2025-02-19 01:45:53,113][02884] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth [2025-02-19 01:45:58,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3465216. Throughput: 0: 943.5. Samples: 865210. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:45:58,008][00376] Avg episode reward: [(0, '16.664')] [2025-02-19 01:46:02,346][02897] Updated weights for policy 0, policy_version 850 (0.0025) [2025-02-19 01:46:03,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3481600. Throughput: 0: 923.9. Samples: 870214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:03,015][00376] Avg episode reward: [(0, '17.445')] [2025-02-19 01:46:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3502080. Throughput: 0: 951.5. Samples: 876462. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:08,012][00376] Avg episode reward: [(0, '18.507')] [2025-02-19 01:46:13,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3748.9). Total num frames: 3518464. Throughput: 0: 941.5. Samples: 879072. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:13,013][00376] Avg episode reward: [(0, '19.372')] [2025-02-19 01:46:13,362][02897] Updated weights for policy 0, policy_version 860 (0.0015) [2025-02-19 01:46:18,008][00376] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 3538944. Throughput: 0: 944.5. Samples: 884598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:18,009][00376] Avg episode reward: [(0, '20.485')] [2025-02-19 01:46:23,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3559424. Throughput: 0: 951.3. Samples: 890714. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:23,010][00376] Avg episode reward: [(0, '20.876')] [2025-02-19 01:46:23,470][02897] Updated weights for policy 0, policy_version 870 (0.0012) [2025-02-19 01:46:28,007][00376] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3575808. Throughput: 0: 925.8. Samples: 892674. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:28,010][00376] Avg episode reward: [(0, '21.800')] [2025-02-19 01:46:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 3596288. Throughput: 0: 951.8. Samples: 898812. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:46:33,008][00376] Avg episode reward: [(0, '21.973')] [2025-02-19 01:46:34,433][02897] Updated weights for policy 0, policy_version 880 (0.0015) [2025-02-19 01:46:38,011][00376] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 3616768. Throughput: 0: 937.6. Samples: 904364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:38,013][00376] Avg episode reward: [(0, '22.487')] [2025-02-19 01:46:38,016][02884] Saving new best policy, reward=22.487! [2025-02-19 01:46:43,008][00376] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 3633152. Throughput: 0: 924.3. Samples: 906806. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-19 01:46:43,010][00376] Avg episode reward: [(0, '21.558')] [2025-02-19 01:46:45,467][02897] Updated weights for policy 0, policy_version 890 (0.0014) [2025-02-19 01:46:48,007][00376] Fps is (10 sec: 3687.7, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3653632. Throughput: 0: 953.4. Samples: 913116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:48,009][00376] Avg episode reward: [(0, '22.206')] [2025-02-19 01:46:53,009][00376] Fps is (10 sec: 3685.8, 60 sec: 3754.5, 300 sec: 3748.9). Total num frames: 3670016. Throughput: 0: 922.6. Samples: 917982. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:53,014][00376] Avg episode reward: [(0, '22.140')] [2025-02-19 01:46:56,683][02897] Updated weights for policy 0, policy_version 900 (0.0013) [2025-02-19 01:46:58,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3690496. Throughput: 0: 933.5. Samples: 921080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:46:58,009][00376] Avg episode reward: [(0, '22.860')] [2025-02-19 01:46:58,011][02884] Saving new best policy, reward=22.860! [2025-02-19 01:47:03,014][00376] Fps is (10 sec: 4094.0, 60 sec: 3822.5, 300 sec: 3762.7). Total num frames: 3710976. Throughput: 0: 947.9. Samples: 927262. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:47:03,016][00376] Avg episode reward: [(0, '21.146')] [2025-02-19 01:47:07,799][02897] Updated weights for policy 0, policy_version 910 (0.0012) [2025-02-19 01:47:08,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3727360. Throughput: 0: 922.4. Samples: 932220. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:47:08,012][00376] Avg episode reward: [(0, '22.059')] [2025-02-19 01:47:13,007][00376] Fps is (10 sec: 3689.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3747840. Throughput: 0: 947.6. Samples: 935316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:47:13,013][00376] Avg episode reward: [(0, '20.197')] [2025-02-19 01:47:18,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3764224. 
Throughput: 0: 941.2. Samples: 941164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:47:18,012][00376] Avg episode reward: [(0, '19.519')] [2025-02-19 01:47:18,731][02897] Updated weights for policy 0, policy_version 920 (0.0013) [2025-02-19 01:47:23,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3784704. Throughput: 0: 934.7. Samples: 946422. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:47:23,011][00376] Avg episode reward: [(0, '18.776')] [2025-02-19 01:47:28,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3805184. Throughput: 0: 949.4. Samples: 949528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:47:28,013][00376] Avg episode reward: [(0, '19.144')] [2025-02-19 01:47:28,824][02897] Updated weights for policy 0, policy_version 930 (0.0012) [2025-02-19 01:47:33,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3821568. Throughput: 0: 922.2. Samples: 954616. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:47:33,011][00376] Avg episode reward: [(0, '19.810')] [2025-02-19 01:47:38,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3762.8). Total num frames: 3842048. Throughput: 0: 949.9. Samples: 960726. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:47:38,013][00376] Avg episode reward: [(0, '20.129')] [2025-02-19 01:47:39,824][02897] Updated weights for policy 0, policy_version 940 (0.0012) [2025-02-19 01:47:43,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3858432. Throughput: 0: 950.3. Samples: 963842. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:47:43,011][00376] Avg episode reward: [(0, '22.024')] [2025-02-19 01:47:48,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3878912. Throughput: 0: 924.6. Samples: 968862. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:47:48,012][00376] Avg episode reward: [(0, '21.914')] [2025-02-19 01:47:51,011][02897] Updated weights for policy 0, policy_version 950 (0.0012) [2025-02-19 01:47:53,007][00376] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3762.8). Total num frames: 3899392. Throughput: 0: 951.1. Samples: 975020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:47:53,013][00376] Avg episode reward: [(0, '22.084')] [2025-02-19 01:47:53,021][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000952_3899392.pth... [2025-02-19 01:47:53,114][02884] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000731_2994176.pth [2025-02-19 01:47:58,008][00376] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 3915776. Throughput: 0: 946.5. Samples: 977910. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-02-19 01:47:58,011][00376] Avg episode reward: [(0, '22.047')] [2025-02-19 01:48:02,098][02897] Updated weights for policy 0, policy_version 960 (0.0012) [2025-02-19 01:48:03,007][00376] Fps is (10 sec: 3276.8, 60 sec: 3686.8, 300 sec: 3748.9). Total num frames: 3932160. Throughput: 0: 931.6. Samples: 983088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:48:03,010][00376] Avg episode reward: [(0, '21.108')] [2025-02-19 01:48:08,007][00376] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3956736. Throughput: 0: 953.9. Samples: 989346. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:48:08,008][00376] Avg episode reward: [(0, '21.472')] [2025-02-19 01:48:13,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3969024. Throughput: 0: 936.1. Samples: 991652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:48:13,012][00376] Avg episode reward: [(0, '20.877')] [2025-02-19 01:48:13,136][02897] Updated weights for policy 0, policy_version 970 (0.0012) [2025-02-19 01:48:18,007][00376] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3993600. Throughput: 0: 952.7. Samples: 997488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-02-19 01:48:18,009][00376] Avg episode reward: [(0, '21.355')] [2025-02-19 01:48:20,898][02884] Stopping Batcher_0... [2025-02-19 01:48:20,898][02884] Loop batcher_evt_loop terminating... [2025-02-19 01:48:20,900][00376] Component Batcher_0 stopped! [2025-02-19 01:48:20,904][00376] Component RolloutWorker_w0 process died already! Don't wait for it. [2025-02-19 01:48:20,905][00376] Component RolloutWorker_w2 process died already! Don't wait for it. [2025-02-19 01:48:20,908][00376] Component RolloutWorker_w3 process died already! Don't wait for it. [2025-02-19 01:48:20,910][00376] Component RolloutWorker_w5 process died already! Don't wait for it. [2025-02-19 01:48:20,911][00376] Component RolloutWorker_w6 process died already! Don't wait for it. [2025-02-19 01:48:20,912][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-19 01:48:20,952][02897] Weights refcount: 2 0 [2025-02-19 01:48:20,955][02897] Stopping InferenceWorker_p0-w0... [2025-02-19 01:48:20,956][02897] Loop inference_proc0-0_evt_loop terminating... [2025-02-19 01:48:20,955][00376] Component InferenceWorker_p0-w0 stopped! [2025-02-19 01:48:21,006][02884] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth [2025-02-19 01:48:21,017][02884] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-19 01:48:21,147][00376] Component LearnerWorker_p0 stopped! [2025-02-19 01:48:21,149][02884] Stopping LearnerWorker_p0... [2025-02-19 01:48:21,150][02884] Loop learner_proc0_evt_loop terminating... [2025-02-19 01:48:21,168][02899] Stopping RolloutWorker_w1... [2025-02-19 01:48:21,169][02899] Loop rollout_proc1_evt_loop terminating... [2025-02-19 01:48:21,168][00376] Component RolloutWorker_w1 stopped! [2025-02-19 01:48:21,173][02905] Stopping RolloutWorker_w7... [2025-02-19 01:48:21,173][00376] Component RolloutWorker_w7 stopped! [2025-02-19 01:48:21,174][02905] Loop rollout_proc7_evt_loop terminating... [2025-02-19 01:48:21,378][00376] Component RolloutWorker_w4 stopped! [2025-02-19 01:48:21,380][00376] Waiting for process learner_proc0 to stop... [2025-02-19 01:48:21,382][02902] Stopping RolloutWorker_w4... [2025-02-19 01:48:21,382][02902] Loop rollout_proc4_evt_loop terminating... [2025-02-19 01:48:22,722][00376] Waiting for process inference_proc0-0 to join... [2025-02-19 01:48:22,723][00376] Waiting for process rollout_proc0 to join... [2025-02-19 01:48:22,724][00376] Waiting for process rollout_proc1 to join... [2025-02-19 01:48:23,795][00376] Waiting for process rollout_proc2 to join... [2025-02-19 01:48:23,797][00376] Waiting for process rollout_proc3 to join... [2025-02-19 01:48:23,797][00376] Waiting for process rollout_proc4 to join... 
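The checkpoint traffic in the run above ("Saving .../checkpoint_...", "Removing .../checkpoint_...", "Saving new best policy, reward=...") follows a simple rotation: keep the newest few rolling checkpoints, delete stale ones, and snapshot separately whenever the average episode reward sets a record. A minimal sketch of that bookkeeping, with illustrative names (CheckpointRotator, keep_n, save_fn are not Sample Factory's actual API):

    import os
    from collections import deque

    class CheckpointRotator:
        def __init__(self, ckpt_dir, keep_n=2):
            self.ckpt_dir = ckpt_dir
            self.keep_n = keep_n                  # rolling checkpoints to retain
            self.recent = deque()                 # oldest -> newest checkpoint paths
            self.best_reward = float("-inf")

        def save(self, policy_version, env_steps, avg_reward, save_fn):
            # filenames mirror the log, e.g. checkpoint_000000621_2543616.pth
            name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
            path = os.path.join(self.ckpt_dir, name)
            save_fn(path)                         # serialize model/optimizer state
            self.recent.append(path)
            while len(self.recent) > self.keep_n:
                os.remove(self.recent.popleft())  # the "Removing ..." lines above
            if avg_reward > self.best_reward:     # the "Saving new best policy" lines
                self.best_reward = avg_reward
                best = f"best_{policy_version:09d}_{env_steps}_reward_{avg_reward:.3f}.pth"
                save_fn(os.path.join(self.ckpt_dir, best))

Rotating on every milestone keeps disk usage bounded, while the separate best-reward snapshot guarantees the highest-scoring policy survives the rotation; the final "Saving .../checkpoint_000000978_4005888.pth" during teardown is the same mechanism flushing one last checkpoint before the learner exits.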
[2025-02-19 01:48:23,798][00376] Waiting for process rollout_proc5 to join...
[2025-02-19 01:48:23,799][00376] Waiting for process rollout_proc6 to join...
[2025-02-19 01:48:23,800][00376] Waiting for process rollout_proc7 to join...
[2025-02-19 01:48:23,801][00376] Batcher 0 profile tree view:
batching: 19.9471, releasing_batches: 0.0263
[2025-02-19 01:48:23,802][00376] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0023
  wait_policy_total: 402.6355
update_model: 10.1091
  weight_update: 0.0012
one_step: 0.0029
  handle_policy_step: 629.8392
    deserialize: 15.2385, stack: 3.9825, obs_to_device_normalize: 144.8192, forward: 331.1217, send_messages: 20.9884
    prepare_outputs: 86.0461
      to_cpu: 54.0486
[2025-02-19 01:48:23,803][00376] Learner 0 profile tree view:
misc: 0.0052, prepare_batch: 12.1290
train: 65.4400
  epoch_init: 0.0047, minibatch_init: 0.0057, losses_postprocess: 0.5672, kl_divergence: 0.5511, after_optimizer: 31.7019
  calculate_losses: 21.9930
    losses_init: 0.0032, forward_head: 1.2296, bptt_initial: 15.2279, tail: 0.8459, advantages_returns: 0.2261, losses: 2.6955
    bptt: 1.5793
      bptt_forward_core: 1.5069
  update: 10.2000
    clip: 0.8225
[2025-02-19 01:48:23,803][00376] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4994, enqueue_policy_requests: 132.5838, env_step: 764.1864, overhead: 22.1792, complete_rollouts: 7.7479
save_policy_outputs: 30.5912
  split_output_tensors: 11.7585
[2025-02-19 01:48:23,807][00376] Loop Runner_EvtLoop terminating...
[2025-02-19 01:48:23,808][00376] Runner profile tree view:
main_loop: 1105.6877
[2025-02-19 01:48:23,809][00376] Collected {0: 4005888}, FPS: 3623.0
[2025-02-19 01:48:24,467][00376] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-19 01:48:24,468][00376] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-19 01:48:24,469][00376] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-19 01:48:24,470][00376] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-19 01:48:24,471][00376] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-19 01:48:24,472][00376] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-19 01:48:24,473][00376] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-19 01:48:24,474][00376] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-19 01:48:24,475][00376] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-19 01:48:24,476][00376] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-19 01:48:24,477][00376] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-19 01:48:24,478][00376] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-19 01:48:24,479][00376] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-19 01:48:24,480][00376] Adding new argument 'enjoy_script'=None that is not in the saved config file!
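The runner's closing summary is consistent arithmetic: 4,005,888 collected frames over the 1105.6877 s main loop gives 4005888 / 1105.6877 ≈ 3623 FPS, the figure reported. The override block just above sets up evaluation; in the course notebook this log comes from, it corresponds roughly to the sketch below, where enjoy() is Sample Factory's real evaluation entry point and parse_vizdoom_cfg is assumed to be the notebook's own config helper (not part of sample_factory itself):

    from sample_factory.enjoy import enjoy

    # parse_vizdoom_cfg: assumed notebook helper wrapping Sample Factory's
    # argument parsing and VizDoom env registration.
    cfg = parse_vizdoom_cfg(
        argv=[
            "--env=doom_health_gathering_supreme",
            "--num_workers=1",        # the override logged above
            "--no_render",
            "--save_video",
            "--max_num_episodes=10",  # the ten episodes evaluated below
        ],
        evaluation=True,
    )
    status = enjoy(cfg)  # plays episodes, emits the "Num frames ..." / reward
                         # lines, and writes replay.mp4 into the experiment dir

The second evaluation pass further below repeats the same call with --push_to_hub and --hf_repository=kate0711/rl_course_vizdoom_health_gathering_supreme added, which is what produces the "model has been pushed" line after its replay video is saved.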
[2025-02-19 01:48:24,481][00376] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-19 01:48:24,528][00376] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 01:48:24,532][00376] RunningMeanStd input shape: (3, 72, 128) [2025-02-19 01:48:24,533][00376] RunningMeanStd input shape: (1,) [2025-02-19 01:48:24,551][00376] ConvEncoder: input_channels=3 [2025-02-19 01:48:24,647][00376] Conv encoder output size: 512 [2025-02-19 01:48:24,648][00376] Policy head output size: 512 [2025-02-19 01:48:24,906][00376] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-19 01:48:25,660][00376] Num frames 100... [2025-02-19 01:48:25,789][00376] Num frames 200... [2025-02-19 01:48:25,919][00376] Num frames 300... [2025-02-19 01:48:26,052][00376] Num frames 400... [2025-02-19 01:48:26,179][00376] Num frames 500... [2025-02-19 01:48:26,309][00376] Num frames 600... [2025-02-19 01:48:26,439][00376] Num frames 700... [2025-02-19 01:48:26,572][00376] Num frames 800... [2025-02-19 01:48:26,702][00376] Num frames 900... [2025-02-19 01:48:26,828][00376] Num frames 1000... [2025-02-19 01:48:26,957][00376] Num frames 1100... [2025-02-19 01:48:27,085][00376] Num frames 1200... [2025-02-19 01:48:27,211][00376] Num frames 1300... [2025-02-19 01:48:27,336][00376] Num frames 1400... [2025-02-19 01:48:27,463][00376] Num frames 1500... [2025-02-19 01:48:27,566][00376] Avg episode rewards: #0: 30.360, true rewards: #0: 15.360 [2025-02-19 01:48:27,567][00376] Avg episode reward: 30.360, avg true_objective: 15.360 [2025-02-19 01:48:27,657][00376] Num frames 1600... [2025-02-19 01:48:27,784][00376] Num frames 1700... [2025-02-19 01:48:27,910][00376] Num frames 1800... [2025-02-19 01:48:28,041][00376] Num frames 1900... [2025-02-19 01:48:28,172][00376] Num frames 2000... [2025-02-19 01:48:28,308][00376] Num frames 2100... [2025-02-19 01:48:28,443][00376] Num frames 2200... [2025-02-19 01:48:28,578][00376] Num frames 2300... [2025-02-19 01:48:28,712][00376] Num frames 2400... [2025-02-19 01:48:28,839][00376] Num frames 2500... [2025-02-19 01:48:28,931][00376] Avg episode rewards: #0: 25.640, true rewards: #0: 12.640 [2025-02-19 01:48:28,932][00376] Avg episode reward: 25.640, avg true_objective: 12.640 [2025-02-19 01:48:29,031][00376] Num frames 2600... [2025-02-19 01:48:29,160][00376] Num frames 2700... [2025-02-19 01:48:29,285][00376] Num frames 2800... [2025-02-19 01:48:29,413][00376] Num frames 2900... [2025-02-19 01:48:29,542][00376] Num frames 3000... [2025-02-19 01:48:29,679][00376] Num frames 3100... [2025-02-19 01:48:29,808][00376] Num frames 3200... [2025-02-19 01:48:29,934][00376] Num frames 3300... [2025-02-19 01:48:30,069][00376] Num frames 3400... [2025-02-19 01:48:30,200][00376] Num frames 3500... [2025-02-19 01:48:30,329][00376] Num frames 3600... [2025-02-19 01:48:30,459][00376] Num frames 3700... [2025-02-19 01:48:30,588][00376] Num frames 3800... [2025-02-19 01:48:30,728][00376] Num frames 3900... [2025-02-19 01:48:30,790][00376] Avg episode rewards: #0: 29.350, true rewards: #0: 13.017 [2025-02-19 01:48:30,791][00376] Avg episode reward: 29.350, avg true_objective: 13.017 [2025-02-19 01:48:30,914][00376] Num frames 4000... [2025-02-19 01:48:31,044][00376] Num frames 4100... [2025-02-19 01:48:31,171][00376] Num frames 4200... [2025-02-19 01:48:31,298][00376] Num frames 4300... 
[2025-02-19 01:48:31,421][00376] Avg episode rewards: #0: 23.383, true rewards: #0: 10.882 [2025-02-19 01:48:31,422][00376] Avg episode reward: 23.383, avg true_objective: 10.882 [2025-02-19 01:48:31,483][00376] Num frames 4400... [2025-02-19 01:48:31,612][00376] Num frames 4500... [2025-02-19 01:48:31,745][00376] Num frames 4600... [2025-02-19 01:48:31,872][00376] Num frames 4700... [2025-02-19 01:48:32,000][00376] Num frames 4800... [2025-02-19 01:48:32,130][00376] Num frames 4900... [2025-02-19 01:48:32,256][00376] Num frames 5000... [2025-02-19 01:48:32,388][00376] Num frames 5100... [2025-02-19 01:48:32,518][00376] Num frames 5200... [2025-02-19 01:48:32,647][00376] Num frames 5300... [2025-02-19 01:48:32,782][00376] Num frames 5400... [2025-02-19 01:48:32,911][00376] Num frames 5500... [2025-02-19 01:48:32,974][00376] Avg episode rewards: #0: 24.610, true rewards: #0: 11.010 [2025-02-19 01:48:32,975][00376] Avg episode reward: 24.610, avg true_objective: 11.010 [2025-02-19 01:48:33,100][00376] Num frames 5600... [2025-02-19 01:48:33,226][00376] Num frames 5700... [2025-02-19 01:48:33,358][00376] Num frames 5800... [2025-02-19 01:48:33,489][00376] Num frames 5900... [2025-02-19 01:48:33,619][00376] Num frames 6000... [2025-02-19 01:48:33,755][00376] Num frames 6100... [2025-02-19 01:48:33,882][00376] Num frames 6200... [2025-02-19 01:48:34,015][00376] Num frames 6300... [2025-02-19 01:48:34,143][00376] Num frames 6400... [2025-02-19 01:48:34,203][00376] Avg episode rewards: #0: 23.838, true rewards: #0: 10.672 [2025-02-19 01:48:34,204][00376] Avg episode reward: 23.838, avg true_objective: 10.672 [2025-02-19 01:48:34,332][00376] Num frames 6500... [2025-02-19 01:48:34,463][00376] Num frames 6600... [2025-02-19 01:48:34,614][00376] Num frames 6700... [2025-02-19 01:48:34,805][00376] Num frames 6800... [2025-02-19 01:48:35,014][00376] Avg episode rewards: #0: 21.976, true rewards: #0: 9.833 [2025-02-19 01:48:35,015][00376] Avg episode reward: 21.976, avg true_objective: 9.833 [2025-02-19 01:48:35,052][00376] Num frames 6900... [2025-02-19 01:48:35,227][00376] Num frames 7000... [2025-02-19 01:48:35,395][00376] Num frames 7100... [2025-02-19 01:48:35,562][00376] Num frames 7200... [2025-02-19 01:48:35,735][00376] Num frames 7300... [2025-02-19 01:48:35,918][00376] Num frames 7400... [2025-02-19 01:48:36,107][00376] Num frames 7500... [2025-02-19 01:48:36,284][00376] Num frames 7600... [2025-02-19 01:48:36,414][00376] Avg episode rewards: #0: 21.045, true rewards: #0: 9.545 [2025-02-19 01:48:36,415][00376] Avg episode reward: 21.045, avg true_objective: 9.545 [2025-02-19 01:48:36,535][00376] Num frames 7700... [2025-02-19 01:48:36,678][00376] Num frames 7800... [2025-02-19 01:48:36,808][00376] Num frames 7900... [2025-02-19 01:48:36,943][00376] Num frames 8000... [2025-02-19 01:48:37,076][00376] Num frames 8100... [2025-02-19 01:48:37,204][00376] Num frames 8200... [2025-02-19 01:48:37,317][00376] Avg episode rewards: #0: 19.716, true rewards: #0: 9.160 [2025-02-19 01:48:37,318][00376] Avg episode reward: 19.716, avg true_objective: 9.160 [2025-02-19 01:48:37,393][00376] Num frames 8300... [2025-02-19 01:48:37,522][00376] Num frames 8400... [2025-02-19 01:48:37,649][00376] Num frames 8500... [2025-02-19 01:48:37,777][00376] Num frames 8600... [2025-02-19 01:48:37,910][00376] Num frames 8700... [2025-02-19 01:48:38,042][00376] Num frames 8800... [2025-02-19 01:48:38,170][00376] Num frames 8900... [2025-02-19 01:48:38,298][00376] Num frames 9000... 
[2025-02-19 01:48:38,425][00376] Num frames 9100... [2025-02-19 01:48:38,540][00376] Avg episode rewards: #0: 19.846, true rewards: #0: 9.146 [2025-02-19 01:48:38,541][00376] Avg episode reward: 19.846, avg true_objective: 9.146 [2025-02-19 01:49:29,568][00376] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-02-19 01:54:55,349][00376] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-02-19 01:54:55,350][00376] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-19 01:54:55,351][00376] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-19 01:54:55,352][00376] Adding new argument 'save_video'=True that is not in the saved config file! [2025-02-19 01:54:55,353][00376] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-02-19 01:54:55,354][00376] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-19 01:54:55,354][00376] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-02-19 01:54:55,355][00376] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-02-19 01:54:55,356][00376] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-02-19 01:54:55,357][00376] Adding new argument 'hf_repository'='kate0711/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-02-19 01:54:55,358][00376] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-02-19 01:54:55,358][00376] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-19 01:54:55,359][00376] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-19 01:54:55,360][00376] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-02-19 01:54:55,361][00376] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-19 01:54:55,385][00376] RunningMeanStd input shape: (3, 72, 128) [2025-02-19 01:54:55,386][00376] RunningMeanStd input shape: (1,) [2025-02-19 01:54:55,398][00376] ConvEncoder: input_channels=3 [2025-02-19 01:54:55,432][00376] Conv encoder output size: 512 [2025-02-19 01:54:55,433][00376] Policy head output size: 512 [2025-02-19 01:54:55,450][00376] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-19 01:54:55,874][00376] Num frames 100... [2025-02-19 01:54:56,000][00376] Num frames 200... [2025-02-19 01:54:56,141][00376] Num frames 300... [2025-02-19 01:54:56,309][00376] Avg episode rewards: #0: 5.840, true rewards: #0: 3.840 [2025-02-19 01:54:56,310][00376] Avg episode reward: 5.840, avg true_objective: 3.840 [2025-02-19 01:54:56,333][00376] Num frames 400... [2025-02-19 01:54:56,464][00376] Num frames 500... [2025-02-19 01:54:56,591][00376] Num frames 600... [2025-02-19 01:54:56,714][00376] Num frames 700... [2025-02-19 01:54:56,840][00376] Num frames 800... [2025-02-19 01:54:56,964][00376] Num frames 900... [2025-02-19 01:54:57,055][00376] Avg episode rewards: #0: 7.640, true rewards: #0: 4.640 [2025-02-19 01:54:57,056][00376] Avg episode reward: 7.640, avg true_objective: 4.640 [2025-02-19 01:54:57,150][00376] Num frames 1000... [2025-02-19 01:54:57,283][00376] Num frames 1100... [2025-02-19 01:54:57,408][00376] Num frames 1200... [2025-02-19 01:54:57,535][00376] Num frames 1300... 
[2025-02-19 01:54:57,664][00376] Num frames 1400... [2025-02-19 01:54:57,789][00376] Num frames 1500... [2025-02-19 01:54:57,916][00376] Num frames 1600... [2025-02-19 01:54:58,043][00376] Num frames 1700... [2025-02-19 01:54:58,173][00376] Num frames 1800... [2025-02-19 01:54:58,311][00376] Num frames 1900... [2025-02-19 01:54:58,434][00376] Avg episode rewards: #0: 12.840, true rewards: #0: 6.507 [2025-02-19 01:54:58,435][00376] Avg episode reward: 12.840, avg true_objective: 6.507 [2025-02-19 01:54:58,498][00376] Num frames 2000... [2025-02-19 01:54:58,626][00376] Num frames 2100... [2025-02-19 01:54:58,753][00376] Num frames 2200... [2025-02-19 01:54:58,884][00376] Num frames 2300... [2025-02-19 01:54:59,012][00376] Num frames 2400... [2025-02-19 01:54:59,139][00376] Num frames 2500... [2025-02-19 01:54:59,273][00376] Num frames 2600... [2025-02-19 01:54:59,430][00376] Num frames 2700... [2025-02-19 01:54:59,561][00376] Avg episode rewards: #0: 13.643, true rewards: #0: 6.892 [2025-02-19 01:54:59,562][00376] Avg episode reward: 13.643, avg true_objective: 6.892 [2025-02-19 01:54:59,617][00376] Num frames 2800... [2025-02-19 01:54:59,741][00376] Num frames 2900... [2025-02-19 01:54:59,866][00376] Num frames 3000... [2025-02-19 01:54:59,991][00376] Num frames 3100... [2025-02-19 01:55:00,132][00376] Num frames 3200... [2025-02-19 01:55:00,259][00376] Num frames 3300... [2025-02-19 01:55:00,398][00376] Num frames 3400... [2025-02-19 01:55:00,525][00376] Num frames 3500... [2025-02-19 01:55:00,655][00376] Avg episode rewards: #0: 14.714, true rewards: #0: 7.114 [2025-02-19 01:55:00,656][00376] Avg episode reward: 14.714, avg true_objective: 7.114 [2025-02-19 01:55:00,712][00376] Num frames 3600... [2025-02-19 01:55:00,838][00376] Num frames 3700... [2025-02-19 01:55:00,967][00376] Num frames 3800... [2025-02-19 01:55:01,096][00376] Num frames 3900... [2025-02-19 01:55:01,222][00376] Num frames 4000... [2025-02-19 01:55:01,358][00376] Num frames 4100... [2025-02-19 01:55:01,484][00376] Num frames 4200... [2025-02-19 01:55:01,622][00376] Num frames 4300... [2025-02-19 01:55:01,788][00376] Avg episode rewards: #0: 14.982, true rewards: #0: 7.315 [2025-02-19 01:55:01,789][00376] Avg episode reward: 14.982, avg true_objective: 7.315 [2025-02-19 01:55:01,806][00376] Num frames 4400... [2025-02-19 01:55:01,932][00376] Num frames 4500... [2025-02-19 01:55:02,061][00376] Num frames 4600... [2025-02-19 01:55:02,191][00376] Num frames 4700... [2025-02-19 01:55:02,317][00376] Num frames 4800... [2025-02-19 01:55:02,452][00376] Num frames 4900... [2025-02-19 01:55:02,578][00376] Num frames 5000... [2025-02-19 01:55:02,704][00376] Num frames 5100... [2025-02-19 01:55:02,830][00376] Avg episode rewards: #0: 14.653, true rewards: #0: 7.367 [2025-02-19 01:55:02,831][00376] Avg episode reward: 14.653, avg true_objective: 7.367 [2025-02-19 01:55:02,886][00376] Num frames 5200... [2025-02-19 01:55:03,019][00376] Num frames 5300... [2025-02-19 01:55:03,148][00376] Num frames 5400... [2025-02-19 01:55:03,276][00376] Num frames 5500... [2025-02-19 01:55:03,414][00376] Num frames 5600... [2025-02-19 01:55:03,543][00376] Num frames 5700... [2025-02-19 01:55:03,669][00376] Num frames 5800... [2025-02-19 01:55:03,794][00376] Num frames 5900... [2025-02-19 01:55:03,918][00376] Num frames 6000... [2025-02-19 01:55:04,079][00376] Num frames 6100... [2025-02-19 01:55:04,257][00376] Num frames 6200... [2025-02-19 01:55:04,445][00376] Num frames 6300... [2025-02-19 01:55:04,613][00376] Num frames 6400... 
[2025-02-19 01:55:04,786][00376] Num frames 6500... [2025-02-19 01:55:04,953][00376] Num frames 6600... [2025-02-19 01:55:05,120][00376] Num frames 6700... [2025-02-19 01:55:05,294][00376] Num frames 6800... [2025-02-19 01:55:05,485][00376] Num frames 6900... [2025-02-19 01:55:05,662][00376] Num frames 7000... [2025-02-19 01:55:05,846][00376] Num frames 7100... [2025-02-19 01:55:06,025][00376] Num frames 7200... [2025-02-19 01:55:06,167][00376] Avg episode rewards: #0: 19.571, true rewards: #0: 9.071 [2025-02-19 01:55:06,168][00376] Avg episode reward: 19.571, avg true_objective: 9.071 [2025-02-19 01:55:06,226][00376] Num frames 7300... [2025-02-19 01:55:06,353][00376] Num frames 7400... [2025-02-19 01:55:06,484][00376] Num frames 7500... [2025-02-19 01:55:06,620][00376] Num frames 7600... [2025-02-19 01:55:06,748][00376] Num frames 7700... [2025-02-19 01:55:06,877][00376] Num frames 7800... [2025-02-19 01:55:07,009][00376] Num frames 7900... [2025-02-19 01:55:07,140][00376] Num frames 8000... [2025-02-19 01:55:07,269][00376] Num frames 8100... [2025-02-19 01:55:07,398][00376] Num frames 8200... [2025-02-19 01:55:07,565][00376] Avg episode rewards: #0: 19.979, true rewards: #0: 9.201 [2025-02-19 01:55:07,566][00376] Avg episode reward: 19.979, avg true_objective: 9.201 [2025-02-19 01:55:07,593][00376] Num frames 8300... [2025-02-19 01:55:07,720][00376] Num frames 8400... [2025-02-19 01:55:07,849][00376] Num frames 8500... [2025-02-19 01:55:07,975][00376] Num frames 8600... [2025-02-19 01:55:08,106][00376] Num frames 8700... [2025-02-19 01:55:08,236][00376] Num frames 8800... [2025-02-19 01:55:08,365][00376] Num frames 8900... [2025-02-19 01:55:08,494][00376] Num frames 9000... [2025-02-19 01:55:08,675][00376] Avg episode rewards: #0: 19.791, true rewards: #0: 9.091 [2025-02-19 01:55:08,676][00376] Avg episode reward: 19.791, avg true_objective: 9.091 [2025-02-19 01:55:58,265][00376] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-02-19 01:56:08,464][00376] The model has been pushed to https://huggingface.co/kate0711/rl_course_vizdoom_health_gathering_supreme [2025-02-19 01:57:04,385][00376] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json [2025-02-19 01:57:04,386][00376] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json [2025-02-19 01:57:04,386][00376] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line [2025-02-19 01:57:04,387][00376] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2025-02-19 01:57:04,388][00376] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-19 01:57:04,389][00376] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2025-02-19 01:57:04,390][00376] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2025-02-19 01:57:04,391][00376] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! [2025-02-19 01:57:04,392][00376] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-19 01:57:04,393][00376] Adding new argument 'save_video'=True that is not in the saved config file! [2025-02-19 01:57:04,394][00376] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
[2025-02-19 01:57:04,399][00376] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-19 01:57:04,400][00376] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-02-19 01:57:04,401][00376] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-02-19 01:57:04,402][00376] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-02-19 01:57:04,403][00376] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-02-19 01:57:04,403][00376] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-02-19 01:57:04,404][00376] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-19 01:57:04,407][00376] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-19 01:57:04,408][00376] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-02-19 01:57:04,409][00376] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-19 01:57:04,452][00376] RunningMeanStd input shape: (3, 72, 128) [2025-02-19 01:57:04,453][00376] RunningMeanStd input shape: (1,) [2025-02-19 01:57:04,472][00376] ConvEncoder: input_channels=3 [2025-02-19 01:57:04,532][00376] Conv encoder output size: 512 [2025-02-19 01:57:04,534][00376] Policy head output size: 512 [2025-02-19 01:57:04,568][00376] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... [2025-02-19 01:57:05,246][00376] Num frames 100... [2025-02-19 01:57:05,413][00376] Num frames 200... [2025-02-19 01:57:05,583][00376] Num frames 300... [2025-02-19 01:57:05,768][00376] Num frames 400... [2025-02-19 01:57:05,942][00376] Num frames 500... [2025-02-19 01:57:06,126][00376] Num frames 600... [2025-02-19 01:57:06,307][00376] Num frames 700... [2025-02-19 01:57:06,461][00376] Num frames 800... [2025-02-19 01:57:06,512][00376] Avg episode rewards: #0: 16.000, true rewards: #0: 8.000 [2025-02-19 01:57:06,514][00376] Avg episode reward: 16.000, avg true_objective: 8.000 [2025-02-19 01:57:06,646][00376] Num frames 900... [2025-02-19 01:57:06,784][00376] Num frames 1000... [2025-02-19 01:57:06,913][00376] Num frames 1100... [2025-02-19 01:57:07,053][00376] Num frames 1200... [2025-02-19 01:57:07,180][00376] Num frames 1300... [2025-02-19 01:57:07,315][00376] Num frames 1400... [2025-02-19 01:57:07,461][00376] Num frames 1500... [2025-02-19 01:57:07,598][00376] Num frames 1600... [2025-02-19 01:57:07,728][00376] Num frames 1700... [2025-02-19 01:57:07,869][00376] Num frames 1800... [2025-02-19 01:57:08,003][00376] Num frames 1900... [2025-02-19 01:57:08,136][00376] Num frames 2000... [2025-02-19 01:57:08,267][00376] Num frames 2100... [2025-02-19 01:57:08,402][00376] Num frames 2200... [2025-02-19 01:57:08,537][00376] Num frames 2300... [2025-02-19 01:57:08,666][00376] Num frames 2400... [2025-02-19 01:57:08,792][00376] Avg episode rewards: #0: 32.775, true rewards: #0: 12.275 [2025-02-19 01:57:08,794][00376] Avg episode reward: 32.775, avg true_objective: 12.275 [2025-02-19 01:57:08,854][00376] Num frames 2500... [2025-02-19 01:57:08,981][00376] Num frames 2600... [2025-02-19 01:57:09,111][00376] Num frames 2700... [2025-02-19 01:57:09,237][00376] Num frames 2800... [2025-02-19 01:57:09,369][00376] Num frames 2900... [2025-02-19 01:57:09,497][00376] Num frames 3000... [2025-02-19 01:57:09,629][00376] Num frames 3100... 
[2025-02-19 01:57:09,759][00376] Num frames 3200... [2025-02-19 01:57:09,895][00376] Num frames 3300... [2025-02-19 01:57:10,032][00376] Num frames 3400... [2025-02-19 01:57:10,161][00376] Num frames 3500... [2025-02-19 01:57:10,297][00376] Num frames 3600... [2025-02-19 01:57:10,429][00376] Num frames 3700... [2025-02-19 01:57:10,562][00376] Num frames 3800... [2025-02-19 01:57:10,694][00376] Num frames 3900... [2025-02-19 01:57:10,824][00376] Num frames 4000... [2025-02-19 01:57:10,963][00376] Num frames 4100... [2025-02-19 01:57:11,097][00376] Num frames 4200... [2025-02-19 01:57:11,230][00376] Num frames 4300... [2025-02-19 01:57:11,360][00376] Num frames 4400... [2025-02-19 01:57:11,494][00376] Num frames 4500... [2025-02-19 01:57:11,625][00376] Avg episode rewards: #0: 45.183, true rewards: #0: 15.183 [2025-02-19 01:57:11,626][00376] Avg episode reward: 45.183, avg true_objective: 15.183 [2025-02-19 01:57:11,685][00376] Num frames 4600... [2025-02-19 01:57:11,814][00376] Num frames 4700... [2025-02-19 01:57:11,950][00376] Num frames 4800... [2025-02-19 01:57:12,080][00376] Num frames 4900... [2025-02-19 01:57:12,208][00376] Num frames 5000... [2025-02-19 01:57:12,339][00376] Num frames 5100... [2025-02-19 01:57:12,471][00376] Num frames 5200... [2025-02-19 01:57:12,600][00376] Num frames 5300... [2025-02-19 01:57:12,728][00376] Num frames 5400... [2025-02-19 01:57:12,855][00376] Num frames 5500... [2025-02-19 01:57:12,991][00376] Num frames 5600... [2025-02-19 01:57:13,123][00376] Num frames 5700... [2025-02-19 01:57:13,251][00376] Num frames 5800... [2025-02-19 01:57:13,383][00376] Num frames 5900... [2025-02-19 01:57:13,516][00376] Num frames 6000... [2025-02-19 01:57:13,648][00376] Num frames 6100... [2025-02-19 01:57:13,778][00376] Num frames 6200... [2025-02-19 01:57:13,909][00376] Num frames 6300... [2025-02-19 01:57:14,050][00376] Num frames 6400... [2025-02-19 01:57:14,183][00376] Num frames 6500... [2025-02-19 01:57:14,312][00376] Num frames 6600... [2025-02-19 01:57:14,438][00376] Avg episode rewards: #0: 48.637, true rewards: #0: 16.638 [2025-02-19 01:57:14,439][00376] Avg episode reward: 48.637, avg true_objective: 16.638 [2025-02-19 01:57:14,499][00376] Num frames 6700... [2025-02-19 01:57:14,632][00376] Num frames 6800... [2025-02-19 01:57:14,762][00376] Num frames 6900... [2025-02-19 01:57:14,894][00376] Num frames 7000... [2025-02-19 01:57:15,037][00376] Num frames 7100... [2025-02-19 01:57:15,171][00376] Num frames 7200... [2025-02-19 01:57:15,304][00376] Num frames 7300... [2025-02-19 01:57:15,449][00376] Num frames 7400... [2025-02-19 01:57:15,585][00376] Num frames 7500... [2025-02-19 01:57:15,716][00376] Num frames 7600... [2025-02-19 01:57:15,847][00376] Num frames 7700... [2025-02-19 01:57:15,988][00376] Num frames 7800... [2025-02-19 01:57:16,121][00376] Num frames 7900... [2025-02-19 01:57:16,253][00376] Num frames 8000... [2025-02-19 01:57:16,395][00376] Num frames 8100... [2025-02-19 01:57:16,575][00376] Num frames 8200... [2025-02-19 01:57:16,753][00376] Num frames 8300... [2025-02-19 01:57:16,929][00376] Num frames 8400... [2025-02-19 01:57:17,123][00376] Num frames 8500... [2025-02-19 01:57:17,300][00376] Num frames 8600... [2025-02-19 01:57:17,472][00376] Num frames 8700... [2025-02-19 01:57:17,624][00376] Avg episode rewards: #0: 51.709, true rewards: #0: 17.510 [2025-02-19 01:57:17,625][00376] Avg episode reward: 51.709, avg true_objective: 17.510 [2025-02-19 01:57:17,715][00376] Num frames 8800... [2025-02-19 01:57:17,896][00376] Num frames 8900... 
[2025-02-19 01:57:18,085][00376] Num frames 9000... [2025-02-19 01:57:18,274][00376] Num frames 9100... [2025-02-19 01:57:18,459][00376] Num frames 9200... [2025-02-19 01:57:18,605][00376] Num frames 9300... [2025-02-19 01:57:18,734][00376] Num frames 9400... [2025-02-19 01:57:18,865][00376] Num frames 9500... [2025-02-19 01:57:18,994][00376] Num frames 9600... [2025-02-19 01:57:19,129][00376] Num frames 9700... [2025-02-19 01:57:19,267][00376] Num frames 9800... [2025-02-19 01:57:19,396][00376] Num frames 9900... [2025-02-19 01:57:19,528][00376] Num frames 10000... [2025-02-19 01:57:19,662][00376] Num frames 10100... [2025-02-19 01:57:19,793][00376] Num frames 10200... [2025-02-19 01:57:19,938][00376] Num frames 10300... [2025-02-19 01:57:20,081][00376] Num frames 10400... [2025-02-19 01:57:20,224][00376] Num frames 10500... [2025-02-19 01:57:20,357][00376] Num frames 10600... [2025-02-19 01:57:20,487][00376] Num frames 10700... [2025-02-19 01:57:20,622][00376] Num frames 10800... [2025-02-19 01:57:20,750][00376] Avg episode rewards: #0: 54.091, true rewards: #0: 18.092 [2025-02-19 01:57:20,751][00376] Avg episode reward: 54.091, avg true_objective: 18.092 [2025-02-19 01:57:20,811][00376] Num frames 10900... [2025-02-19 01:57:20,940][00376] Num frames 11000... [2025-02-19 01:57:21,073][00376] Num frames 11100... [2025-02-19 01:57:21,208][00376] Num frames 11200... [2025-02-19 01:57:21,340][00376] Num frames 11300... [2025-02-19 01:57:21,469][00376] Num frames 11400... [2025-02-19 01:57:21,597][00376] Num frames 11500... [2025-02-19 01:57:21,727][00376] Num frames 11600... [2025-02-19 01:57:21,854][00376] Num frames 11700... [2025-02-19 01:57:21,983][00376] Num frames 11800... [2025-02-19 01:57:22,116][00376] Num frames 11900... [2025-02-19 01:57:22,253][00376] Num frames 12000... [2025-02-19 01:57:22,385][00376] Num frames 12100... [2025-02-19 01:57:22,516][00376] Num frames 12200... [2025-02-19 01:57:22,646][00376] Num frames 12300... [2025-02-19 01:57:22,777][00376] Num frames 12400... [2025-02-19 01:57:22,908][00376] Num frames 12500... [2025-02-19 01:57:23,040][00376] Num frames 12600... [2025-02-19 01:57:23,169][00376] Num frames 12700... [2025-02-19 01:57:23,309][00376] Num frames 12800... [2025-02-19 01:57:23,443][00376] Num frames 12900... [2025-02-19 01:57:23,571][00376] Avg episode rewards: #0: 55.506, true rewards: #0: 18.507 [2025-02-19 01:57:23,572][00376] Avg episode reward: 55.506, avg true_objective: 18.507 [2025-02-19 01:57:23,632][00376] Num frames 13000... [2025-02-19 01:57:23,761][00376] Num frames 13100... [2025-02-19 01:57:23,892][00376] Num frames 13200... [2025-02-19 01:57:24,027][00376] Num frames 13300... [2025-02-19 01:57:24,157][00376] Num frames 13400... [2025-02-19 01:57:24,295][00376] Num frames 13500... [2025-02-19 01:57:24,427][00376] Num frames 13600... [2025-02-19 01:57:24,560][00376] Num frames 13700... [2025-02-19 01:57:24,693][00376] Num frames 13800... [2025-02-19 01:57:24,825][00376] Num frames 13900... [2025-02-19 01:57:24,960][00376] Num frames 14000... [2025-02-19 01:57:25,096][00376] Num frames 14100... [2025-02-19 01:57:25,227][00376] Num frames 14200... [2025-02-19 01:57:25,376][00376] Num frames 14300... [2025-02-19 01:57:25,510][00376] Num frames 14400... [2025-02-19 01:57:25,640][00376] Num frames 14500... [2025-02-19 01:57:25,772][00376] Num frames 14600... [2025-02-19 01:57:25,905][00376] Num frames 14700... [2025-02-19 01:57:26,038][00376] Num frames 14800... [2025-02-19 01:57:26,168][00376] Num frames 14900... 
[2025-02-19 01:57:26,306][00376] Num frames 15000... [2025-02-19 01:57:26,439][00376] Avg episode rewards: #0: 56.068, true rewards: #0: 18.819 [2025-02-19 01:57:26,440][00376] Avg episode reward: 56.068, avg true_objective: 18.819 [2025-02-19 01:57:26,500][00376] Num frames 15100... [2025-02-19 01:57:26,629][00376] Num frames 15200... [2025-02-19 01:57:26,761][00376] Num frames 15300... [2025-02-19 01:57:26,894][00376] Num frames 15400... [2025-02-19 01:57:27,031][00376] Num frames 15500... [2025-02-19 01:57:27,160][00376] Num frames 15600... [2025-02-19 01:57:27,304][00376] Num frames 15700... [2025-02-19 01:57:27,447][00376] Num frames 15800... [2025-02-19 01:57:27,580][00376] Num frames 15900... [2025-02-19 01:57:27,714][00376] Num frames 16000... [2025-02-19 01:57:27,845][00376] Num frames 16100... [2025-02-19 01:57:27,987][00376] Num frames 16200... [2025-02-19 01:57:28,126][00376] Num frames 16300... [2025-02-19 01:57:28,262][00376] Num frames 16400... [2025-02-19 01:57:28,404][00376] Num frames 16500... [2025-02-19 01:57:28,550][00376] Num frames 16600... [2025-02-19 01:57:28,734][00376] Num frames 16700... [2025-02-19 01:57:28,906][00376] Num frames 16800... [2025-02-19 01:57:29,083][00376] Num frames 16900... [2025-02-19 01:57:29,262][00376] Num frames 17000... [2025-02-19 01:57:29,457][00376] Num frames 17100... [2025-02-19 01:57:29,619][00376] Avg episode rewards: #0: 57.282, true rewards: #0: 19.061 [2025-02-19 01:57:29,620][00376] Avg episode reward: 57.282, avg true_objective: 19.061 [2025-02-19 01:57:29,698][00376] Num frames 17200... [2025-02-19 01:57:29,874][00376] Num frames 17300... [2025-02-19 01:57:30,055][00376] Num frames 17400... [2025-02-19 01:57:30,232][00376] Num frames 17500... [2025-02-19 01:57:30,418][00376] Num frames 17600... [2025-02-19 01:57:30,612][00376] Num frames 17700... [2025-02-19 01:57:30,765][00376] Num frames 17800... [2025-02-19 01:57:30,898][00376] Num frames 17900... [2025-02-19 01:57:31,032][00376] Num frames 18000... [2025-02-19 01:57:31,167][00376] Num frames 18100... [2025-02-19 01:57:31,301][00376] Num frames 18200... [2025-02-19 01:57:31,434][00376] Num frames 18300... [2025-02-19 01:57:31,573][00376] Num frames 18400... [2025-02-19 01:57:31,704][00376] Num frames 18500... [2025-02-19 01:57:31,867][00376] Num frames 18600... [2025-02-19 01:57:32,003][00376] Num frames 18700... [2025-02-19 01:57:32,144][00376] Num frames 18800... [2025-02-19 01:57:32,278][00376] Num frames 18900... [2025-02-19 01:57:32,410][00376] Num frames 19000... [2025-02-19 01:57:32,550][00376] Num frames 19100... [2025-02-19 01:57:32,688][00376] Num frames 19200... [2025-02-19 01:57:32,819][00376] Avg episode rewards: #0: 57.254, true rewards: #0: 19.255 [2025-02-19 01:57:32,820][00376] Avg episode reward: 57.254, avg true_objective: 19.255 [2025-02-19 01:59:18,991][00376] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! [2025-02-19 02:02:36,854][00376] Environment doom_basic already registered, overwriting... [2025-02-19 02:02:36,857][00376] Environment doom_two_colors_easy already registered, overwriting... [2025-02-19 02:02:36,858][00376] Environment doom_two_colors_hard already registered, overwriting... [2025-02-19 02:02:36,859][00376] Environment doom_dm already registered, overwriting... [2025-02-19 02:02:36,860][00376] Environment doom_dwango5 already registered, overwriting... [2025-02-19 02:02:36,861][00376] Environment doom_my_way_home_flat_actions already registered, overwriting... 
[2025-02-19 02:02:36,861][00376] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2025-02-19 02:02:36,862][00376] Environment doom_my_way_home already registered, overwriting... [2025-02-19 02:02:36,863][00376] Environment doom_deadly_corridor already registered, overwriting... [2025-02-19 02:02:36,864][00376] Environment doom_defend_the_center already registered, overwriting... [2025-02-19 02:02:36,865][00376] Environment doom_defend_the_line already registered, overwriting... [2025-02-19 02:02:36,867][00376] Environment doom_health_gathering already registered, overwriting... [2025-02-19 02:02:36,868][00376] Environment doom_health_gathering_supreme already registered, overwriting... [2025-02-19 02:02:36,869][00376] Environment doom_battle already registered, overwriting... [2025-02-19 02:02:36,870][00376] Environment doom_battle2 already registered, overwriting... [2025-02-19 02:02:36,871][00376] Environment doom_duel_bots already registered, overwriting... [2025-02-19 02:02:36,872][00376] Environment doom_deathmatch_bots already registered, overwriting... [2025-02-19 02:02:36,873][00376] Environment doom_duel already registered, overwriting... [2025-02-19 02:02:36,873][00376] Environment doom_deathmatch_full already registered, overwriting... [2025-02-19 02:02:36,874][00376] Environment doom_benchmark already registered, overwriting... [2025-02-19 02:02:36,876][00376] register_encoder_factory: [2025-02-19 02:02:36,894][00376] Loading legacy config file train_dir/doom_deathmatch_bots_2222/cfg.json instead of train_dir/doom_deathmatch_bots_2222/config.json [2025-02-19 02:02:36,896][00376] Loading existing experiment configuration from train_dir/doom_deathmatch_bots_2222/config.json [2025-02-19 02:02:36,897][00376] Overriding arg 'experiment' with value 'doom_deathmatch_bots_2222' passed from command line [2025-02-19 02:02:36,899][00376] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2025-02-19 02:02:36,900][00376] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-19 02:02:36,901][00376] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2025-02-19 02:02:36,902][00376] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2025-02-19 02:02:36,902][00376] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! [2025-02-19 02:02:36,903][00376] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-19 02:02:36,904][00376] Adding new argument 'save_video'=True that is not in the saved config file! [2025-02-19 02:02:36,905][00376] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-02-19 02:02:36,905][00376] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-19 02:02:36,906][00376] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-02-19 02:02:36,907][00376] Adding new argument 'max_num_episodes'=1 that is not in the saved config file! [2025-02-19 02:02:36,908][00376] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-02-19 02:02:36,909][00376] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-02-19 02:02:36,911][00376] Adding new argument 'policy_index'=0 that is not in the saved config file! 
[2025-02-19 02:02:36,913][00376] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-19 02:02:36,915][00376] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-19 02:02:36,916][00376] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-02-19 02:02:36,917][00376] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-19 02:02:36,971][00376] Port 40300 is available [2025-02-19 02:02:36,972][00376] Using port 40300 [2025-02-19 02:02:36,974][00376] RunningMeanStd input shape: (23,) [2025-02-19 02:02:36,975][00376] RunningMeanStd input shape: (3, 72, 128) [2025-02-19 02:02:36,976][00376] RunningMeanStd input shape: (1,) [2025-02-19 02:02:36,991][00376] ConvEncoder: input_channels=3 [2025-02-19 02:02:37,048][00376] Conv encoder output size: 512 [2025-02-19 02:02:37,050][00376] Policy head output size: 640 [2025-02-19 02:02:37,117][00376] Loading state from checkpoint train_dir/doom_deathmatch_bots_2222/checkpoint_p0/checkpoint_000282220_2311946240.pth... [2025-02-19 02:03:57,922][00376] Environment doom_basic already registered, overwriting... [2025-02-19 02:03:57,924][00376] Environment doom_two_colors_easy already registered, overwriting... [2025-02-19 02:03:57,925][00376] Environment doom_two_colors_hard already registered, overwriting... [2025-02-19 02:03:57,925][00376] Environment doom_dm already registered, overwriting... [2025-02-19 02:03:57,926][00376] Environment doom_dwango5 already registered, overwriting... [2025-02-19 02:03:57,927][00376] Environment doom_my_way_home_flat_actions already registered, overwriting... [2025-02-19 02:03:57,928][00376] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2025-02-19 02:03:57,929][00376] Environment doom_my_way_home already registered, overwriting... [2025-02-19 02:03:57,930][00376] Environment doom_deadly_corridor already registered, overwriting... [2025-02-19 02:03:57,932][00376] Environment doom_defend_the_center already registered, overwriting... [2025-02-19 02:03:57,933][00376] Environment doom_defend_the_line already registered, overwriting... [2025-02-19 02:03:57,934][00376] Environment doom_health_gathering already registered, overwriting... [2025-02-19 02:03:57,934][00376] Environment doom_health_gathering_supreme already registered, overwriting... [2025-02-19 02:03:57,935][00376] Environment doom_battle already registered, overwriting... [2025-02-19 02:03:57,936][00376] Environment doom_battle2 already registered, overwriting... [2025-02-19 02:03:57,937][00376] Environment doom_duel_bots already registered, overwriting... [2025-02-19 02:03:57,938][00376] Environment doom_deathmatch_bots already registered, overwriting... [2025-02-19 02:03:57,939][00376] Environment doom_duel already registered, overwriting... [2025-02-19 02:03:57,940][00376] Environment doom_deathmatch_full already registered, overwriting... [2025-02-19 02:03:57,940][00376] Environment doom_benchmark already registered, overwriting... [2025-02-19 02:03:57,941][00376] register_encoder_factory: [2025-02-19 02:03:57,954][00376] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-02-19 02:03:57,954][00376] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line [2025-02-19 02:03:57,960][00376] Experiment dir /content/train_dir/default_experiment already exists! 
[2025-02-19 02:03:57,960][00376] Resuming existing experiment from /content/train_dir/default_experiment... [2025-02-19 02:03:57,962][00376] Weights and Biases integration disabled [2025-02-19 02:03:57,964][00376] Environment var CUDA_VISIBLE_DEVICES is 0 [2025-02-19 02:04:00,119][00376] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=8000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2025-02-19 02:04:00,120][00376] Saving configuration to /content/train_dir/default_experiment/config.json... 
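
The block above captures a resume: the experiment directory already exists, restart_behavior=resume is in effect, and the only change is train_for_env_steps raised from the original 4,000,000 (still visible in command_line/cli_args) to 8,000,000. Below is a minimal sketch of launching such a resume through Sample Factory's Python API; the parse_vizdoom_cfg helper and the sf_examples import paths mirror the Hugging Face Deep RL course notebook and are assumptions, not something taken from this log.

    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.train import run_rl
    # Assumed module paths (course-notebook layout); adjust for your version.
    from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
    from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

    def parse_vizdoom_cfg(argv=None, evaluation=False):
        # First pass: Sample Factory's own arguments.
        parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
        add_doom_env_args(parser)        # Doom-specific env arguments
        doom_override_defaults(parser)   # Doom-tuned defaults for algo params
        return parse_full_cfg(parser, argv)  # second pass: final config

    # Re-running registration is what produces the "Environment ... already
    # registered, overwriting..." lines seen above.
    register_vizdoom_components()

    # Same flags as the command_line recorded in the config dump, with the
    # step budget doubled; because the experiment dir exists and
    # restart_behavior=resume, training continues from the latest checkpoint
    # (here: train_step=978, env_steps=4005888).
    cfg = parse_vizdoom_cfg(argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=8000000",
    ])
    status = run_rl(cfg)
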
[2025-02-19 02:04:00,122][00376] Rollout worker 0 uses device cpu [2025-02-19 02:04:00,123][00376] Rollout worker 1 uses device cpu [2025-02-19 02:04:00,124][00376] Rollout worker 2 uses device cpu [2025-02-19 02:04:00,125][00376] Rollout worker 3 uses device cpu [2025-02-19 02:04:00,126][00376] Rollout worker 4 uses device cpu [2025-02-19 02:04:00,127][00376] Rollout worker 5 uses device cpu [2025-02-19 02:04:00,128][00376] Rollout worker 6 uses device cpu [2025-02-19 02:04:00,129][00376] Rollout worker 7 uses device cpu [2025-02-19 02:04:00,206][00376] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-02-19 02:04:00,207][00376] InferenceWorker_p0-w0: min num requests: 2 [2025-02-19 02:04:00,237][00376] Starting all processes... [2025-02-19 02:04:00,238][00376] Starting process learner_proc0 [2025-02-19 02:04:00,288][00376] Starting all processes... [2025-02-19 02:04:00,293][00376] Starting process inference_proc0-0 [2025-02-19 02:04:00,294][00376] Starting process rollout_proc0 [2025-02-19 02:04:00,295][00376] Starting process rollout_proc1 [2025-02-19 02:04:00,295][00376] Starting process rollout_proc2 [2025-02-19 02:04:00,295][00376] Starting process rollout_proc3 [2025-02-19 02:04:00,295][00376] Starting process rollout_proc4 [2025-02-19 02:04:00,296][00376] Starting process rollout_proc5 [2025-02-19 02:04:00,296][00376] Starting process rollout_proc6 [2025-02-19 02:04:00,296][00376] Starting process rollout_proc7 [2025-02-19 02:04:15,443][12390] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-02-19 02:04:15,445][12390] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-02-19 02:04:15,569][12390] Num visible devices: 1 [2025-02-19 02:04:15,707][12393] Worker 2 uses CPU cores [0] [2025-02-19 02:04:15,854][12377] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-02-19 02:04:15,856][12377] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-02-19 02:04:15,924][12392] Worker 1 uses CPU cores [1] [2025-02-19 02:04:15,940][12377] Num visible devices: 1 [2025-02-19 02:04:15,956][12377] Starting seed is not provided [2025-02-19 02:04:15,957][12377] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-02-19 02:04:15,957][12377] Initializing actor-critic model on device cuda:0 [2025-02-19 02:04:15,958][12377] RunningMeanStd input shape: (3, 72, 128) [2025-02-19 02:04:15,960][12377] RunningMeanStd input shape: (1,) [2025-02-19 02:04:16,031][12377] ConvEncoder: input_channels=3 [2025-02-19 02:04:16,096][12397] Worker 6 uses CPU cores [0] [2025-02-19 02:04:16,320][12394] Worker 3 uses CPU cores [1] [2025-02-19 02:04:16,383][12395] Worker 4 uses CPU cores [0] [2025-02-19 02:04:16,393][12391] Worker 0 uses CPU cores [0] [2025-02-19 02:04:16,416][12396] Worker 5 uses CPU cores [1] [2025-02-19 02:04:16,472][12398] Worker 7 uses CPU cores [1] [2025-02-19 02:04:16,476][12377] Conv encoder output size: 512 [2025-02-19 02:04:16,476][12377] Policy head output size: 512 [2025-02-19 02:04:16,493][12377] Created Actor Critic model with architecture: [2025-02-19 02:04:16,494][12377] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): 
RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-02-19 02:04:16,713][12377] Using optimizer [2025-02-19 02:04:17,653][12377] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-19 02:04:17,692][12377] Loading model from checkpoint [2025-02-19 02:04:17,694][12377] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2025-02-19 02:04:17,694][12377] Initialized policy 0 weights for model version 978 [2025-02-19 02:04:17,697][12377] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-02-19 02:04:17,722][12377] LearnerWorker_p0 finished initialization! [2025-02-19 02:04:17,944][12390] RunningMeanStd input shape: (3, 72, 128) [2025-02-19 02:04:17,945][12390] RunningMeanStd input shape: (1,) [2025-02-19 02:04:17,957][12390] ConvEncoder: input_channels=3 [2025-02-19 02:04:17,965][00376] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-02-19 02:04:18,065][12390] Conv encoder output size: 512 [2025-02-19 02:04:18,066][12390] Policy head output size: 512 [2025-02-19 02:04:18,104][00376] Inference worker 0-0 is ready! [2025-02-19 02:04:18,105][00376] All inference workers are ready! Signal rollout workers to start! [2025-02-19 02:04:18,354][12392] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 02:04:18,361][12398] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 02:04:18,377][12395] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 02:04:18,395][12394] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 02:04:18,400][12391] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 02:04:18,424][12393] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 02:04:18,425][12397] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 02:04:18,431][12396] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-19 02:04:19,697][12392] Decorrelating experience for 0 frames... [2025-02-19 02:04:19,752][12395] Decorrelating experience for 0 frames... [2025-02-19 02:04:19,749][12394] Decorrelating experience for 0 frames... [2025-02-19 02:04:19,766][12391] Decorrelating experience for 0 frames... [2025-02-19 02:04:19,776][12396] Decorrelating experience for 0 frames... [2025-02-19 02:04:19,774][12393] Decorrelating experience for 0 frames... [2025-02-19 02:04:20,198][00376] Heartbeat connected on Batcher_0 [2025-02-19 02:04:20,207][00376] Heartbeat connected on LearnerWorker_p0 [2025-02-19 02:04:20,237][00376] Heartbeat connected on InferenceWorker_p0-w0 [2025-02-19 02:04:20,708][12398] Decorrelating experience for 0 frames... 
[2025-02-19 02:04:20,754][12396] Decorrelating experience for 32 frames... [2025-02-19 02:04:21,167][12391] Decorrelating experience for 32 frames... [2025-02-19 02:04:21,181][12393] Decorrelating experience for 32 frames... [2025-02-19 02:04:21,991][12398] Decorrelating experience for 32 frames... [2025-02-19 02:04:22,129][12392] Decorrelating experience for 32 frames... [2025-02-19 02:04:22,597][12397] Decorrelating experience for 0 frames... [2025-02-19 02:04:22,705][12396] Decorrelating experience for 64 frames... [2025-02-19 02:04:22,965][00376] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-02-19 02:04:23,763][12398] Decorrelating experience for 64 frames... [2025-02-19 02:04:23,790][12395] Decorrelating experience for 32 frames... [2025-02-19 02:04:23,859][12392] Decorrelating experience for 64 frames... [2025-02-19 02:04:24,401][12397] Decorrelating experience for 32 frames... [2025-02-19 02:04:24,404][12391] Decorrelating experience for 64 frames... [2025-02-19 02:04:24,565][12393] Decorrelating experience for 64 frames... [2025-02-19 02:04:24,875][12394] Decorrelating experience for 32 frames... [2025-02-19 02:04:25,024][12392] Decorrelating experience for 96 frames... [2025-02-19 02:04:25,211][00376] Heartbeat connected on RolloutWorker_w1 [2025-02-19 02:04:25,638][12395] Decorrelating experience for 64 frames... [2025-02-19 02:04:25,836][12396] Decorrelating experience for 96 frames... [2025-02-19 02:04:26,046][00376] Heartbeat connected on RolloutWorker_w5 [2025-02-19 02:04:26,120][12391] Decorrelating experience for 96 frames... [2025-02-19 02:04:26,319][12393] Decorrelating experience for 96 frames... [2025-02-19 02:04:26,452][00376] Heartbeat connected on RolloutWorker_w0 [2025-02-19 02:04:26,782][00376] Heartbeat connected on RolloutWorker_w2 [2025-02-19 02:04:26,894][12394] Decorrelating experience for 64 frames... [2025-02-19 02:04:27,725][12398] Decorrelating experience for 96 frames... [2025-02-19 02:04:27,965][00376] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 60.8. Samples: 608. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-02-19 02:04:27,968][00376] Avg episode reward: [(0, '3.133')] [2025-02-19 02:04:28,240][00376] Heartbeat connected on RolloutWorker_w7 [2025-02-19 02:04:29,203][12395] Decorrelating experience for 96 frames... [2025-02-19 02:04:29,610][12394] Decorrelating experience for 96 frames... [2025-02-19 02:04:29,731][00376] Heartbeat connected on RolloutWorker_w4 [2025-02-19 02:04:29,811][12397] Decorrelating experience for 64 frames... [2025-02-19 02:04:30,141][00376] Heartbeat connected on RolloutWorker_w3 [2025-02-19 02:04:30,398][12377] Signal inference workers to stop experience collection... [2025-02-19 02:04:30,408][12390] InferenceWorker_p0-w0: stopping experience collection [2025-02-19 02:04:30,729][12397] Decorrelating experience for 96 frames... [2025-02-19 02:04:30,843][00376] Heartbeat connected on RolloutWorker_w6 [2025-02-19 02:04:31,434][12377] Signal inference workers to resume experience collection... [2025-02-19 02:04:31,435][12390] InferenceWorker_p0-w0: resuming experience collection [2025-02-19 02:04:32,965][00376] Fps is (10 sec: 1228.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4018176. Throughput: 0: 153.7. Samples: 2306. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-02-19 02:04:32,969][00376] Avg episode reward: [(0, '5.898')] [2025-02-19 02:04:37,965][00376] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 4034560. Throughput: 0: 338.7. Samples: 6774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:04:37,970][00376] Avg episode reward: [(0, '11.782')] [2025-02-19 02:04:40,434][12390] Updated weights for policy 0, policy_version 988 (0.0092) [2025-02-19 02:04:42,965][00376] Fps is (10 sec: 3686.4, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 4055040. Throughput: 0: 495.0. Samples: 12374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:04:42,975][00376] Avg episode reward: [(0, '14.766')] [2025-02-19 02:04:47,969][00376] Fps is (10 sec: 4094.4, 60 sec: 2320.8, 300 sec: 2320.8). Total num frames: 4075520. Throughput: 0: 519.2. Samples: 15578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:04:47,973][00376] Avg episode reward: [(0, '18.135')] [2025-02-19 02:04:50,689][12390] Updated weights for policy 0, policy_version 998 (0.0017) [2025-02-19 02:04:52,967][00376] Fps is (10 sec: 3685.5, 60 sec: 2457.4, 300 sec: 2457.4). Total num frames: 4091904. Throughput: 0: 611.6. Samples: 21406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:04:52,968][00376] Avg episode reward: [(0, '18.600')] [2025-02-19 02:04:57,965][00376] Fps is (10 sec: 4097.6, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 4116480. Throughput: 0: 693.0. Samples: 27718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:04:57,967][00376] Avg episode reward: [(0, '19.955')] [2025-02-19 02:05:00,097][12390] Updated weights for policy 0, policy_version 1008 (0.0022) [2025-02-19 02:05:02,965][00376] Fps is (10 sec: 4916.4, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 4141056. Throughput: 0: 695.2. Samples: 31286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:05:02,966][00376] Avg episode reward: [(0, '24.133')] [2025-02-19 02:05:02,969][12377] Saving new best policy, reward=24.133! [2025-02-19 02:05:07,965][00376] Fps is (10 sec: 3686.4, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 4153344. Throughput: 0: 817.1. Samples: 36768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:05:07,968][00376] Avg episode reward: [(0, '24.281')] [2025-02-19 02:05:08,001][12377] Saving new best policy, reward=24.281! [2025-02-19 02:05:10,640][12390] Updated weights for policy 0, policy_version 1018 (0.0027) [2025-02-19 02:05:12,965][00376] Fps is (10 sec: 3686.2, 60 sec: 3127.8, 300 sec: 3127.8). Total num frames: 4177920. Throughput: 0: 947.1. Samples: 43230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:05:12,970][00376] Avg episode reward: [(0, '22.890')] [2025-02-19 02:05:17,966][00376] Fps is (10 sec: 4914.5, 60 sec: 3276.7, 300 sec: 3276.7). Total num frames: 4202496. Throughput: 0: 985.2. Samples: 46640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:05:17,967][00376] Avg episode reward: [(0, '22.318')] [2025-02-19 02:05:20,596][12390] Updated weights for policy 0, policy_version 1028 (0.0027) [2025-02-19 02:05:22,965][00376] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 3213.8). Total num frames: 4214784. Throughput: 0: 1004.6. Samples: 51980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:05:22,966][00376] Avg episode reward: [(0, '21.336')] [2025-02-19 02:05:27,965][00376] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 4239360. 
Throughput: 0: 1022.4. Samples: 58384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:05:27,968][00376] Avg episode reward: [(0, '21.090')] [2025-02-19 02:05:30,318][12390] Updated weights for policy 0, policy_version 1038 (0.0025) [2025-02-19 02:05:32,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3386.0). Total num frames: 4259840. Throughput: 0: 1025.1. Samples: 61704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:05:32,972][00376] Avg episode reward: [(0, '20.999')] [2025-02-19 02:05:37,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3379.2). Total num frames: 4276224. Throughput: 0: 1012.3. Samples: 66956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:05:37,966][00376] Avg episode reward: [(0, '21.297')] [2025-02-19 02:05:40,811][12390] Updated weights for policy 0, policy_version 1048 (0.0022) [2025-02-19 02:05:42,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3469.6). Total num frames: 4300800. Throughput: 0: 1026.9. Samples: 73928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:05:42,966][00376] Avg episode reward: [(0, '21.478')] [2025-02-19 02:05:47,967][00376] Fps is (10 sec: 4504.5, 60 sec: 4096.1, 300 sec: 3504.3). Total num frames: 4321280. Throughput: 0: 1024.4. Samples: 77386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:05:47,975][00376] Avg episode reward: [(0, '21.450')] [2025-02-19 02:05:51,229][12390] Updated weights for policy 0, policy_version 1058 (0.0015) [2025-02-19 02:05:52,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 3535.5). Total num frames: 4341760. Throughput: 0: 1015.3. Samples: 82458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:05:52,967][00376] Avg episode reward: [(0, '20.235')] [2025-02-19 02:05:57,965][00376] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 3563.5). Total num frames: 4362240. Throughput: 0: 1030.6. Samples: 89608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:05:57,968][00376] Avg episode reward: [(0, '20.847')] [2025-02-19 02:05:57,973][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001065_4362240.pth... [2025-02-19 02:05:58,115][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000952_3899392.pth [2025-02-19 02:05:59,811][12390] Updated weights for policy 0, policy_version 1068 (0.0028) [2025-02-19 02:06:02,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3588.9). Total num frames: 4382720. Throughput: 0: 1033.7. Samples: 93156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:06:02,966][00376] Avg episode reward: [(0, '20.259')] [2025-02-19 02:06:07,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3611.9). Total num frames: 4403200. Throughput: 0: 1023.0. Samples: 98014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:06:07,966][00376] Avg episode reward: [(0, '21.082')] [2025-02-19 02:06:10,607][12390] Updated weights for policy 0, policy_version 1078 (0.0018) [2025-02-19 02:06:12,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3633.0). Total num frames: 4423680. Throughput: 0: 1035.2. Samples: 104968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:06:12,970][00376] Avg episode reward: [(0, '21.170')] [2025-02-19 02:06:17,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3652.3). Total num frames: 4444160. Throughput: 0: 1037.6. Samples: 108394. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:06:17,967][00376] Avg episode reward: [(0, '22.114')] [2025-02-19 02:06:20,990][12390] Updated weights for policy 0, policy_version 1088 (0.0020) [2025-02-19 02:06:22,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3670.0). Total num frames: 4464640. Throughput: 0: 1030.8. Samples: 113340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:06:22,967][00376] Avg episode reward: [(0, '23.295')] [2025-02-19 02:06:27,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3717.9). Total num frames: 4489216. Throughput: 0: 1038.0. Samples: 120640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:06:27,968][00376] Avg episode reward: [(0, '22.890')] [2025-02-19 02:06:29,563][12390] Updated weights for policy 0, policy_version 1098 (0.0015) [2025-02-19 02:06:32,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3701.6). Total num frames: 4505600. Throughput: 0: 1040.9. Samples: 124222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:06:32,966][00376] Avg episode reward: [(0, '23.041')] [2025-02-19 02:06:37,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3715.7). Total num frames: 4526080. Throughput: 0: 1037.2. Samples: 129134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:06:37,968][00376] Avg episode reward: [(0, '21.871')] [2025-02-19 02:06:39,991][12390] Updated weights for policy 0, policy_version 1108 (0.0025) [2025-02-19 02:06:42,965][00376] Fps is (10 sec: 4505.5, 60 sec: 4164.2, 300 sec: 3757.0). Total num frames: 4550656. Throughput: 0: 1039.4. Samples: 136380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:06:42,968][00376] Avg episode reward: [(0, '22.345')] [2025-02-19 02:06:47,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 3768.3). Total num frames: 4571136. Throughput: 0: 1039.4. Samples: 139930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:06:47,968][00376] Avg episode reward: [(0, '22.270')] [2025-02-19 02:06:50,389][12390] Updated weights for policy 0, policy_version 1118 (0.0022) [2025-02-19 02:06:52,965][00376] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 3778.9). Total num frames: 4591616. Throughput: 0: 1039.3. Samples: 144782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:06:52,973][00376] Avg episode reward: [(0, '22.185')] [2025-02-19 02:06:57,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3788.8). Total num frames: 4612096. Throughput: 0: 1044.8. Samples: 151986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:06:57,969][00376] Avg episode reward: [(0, '24.378')] [2025-02-19 02:06:57,975][12377] Saving new best policy, reward=24.378! [2025-02-19 02:06:58,988][12390] Updated weights for policy 0, policy_version 1128 (0.0021) [2025-02-19 02:07:02,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3798.1). Total num frames: 4632576. Throughput: 0: 1042.5. Samples: 155306. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:07:02,968][00376] Avg episode reward: [(0, '24.558')] [2025-02-19 02:07:02,972][12377] Saving new best policy, reward=24.558! [2025-02-19 02:07:07,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3806.9). Total num frames: 4653056. Throughput: 0: 1044.4. Samples: 160336. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:07:07,966][00376] Avg episode reward: [(0, '25.198')] [2025-02-19 02:07:07,971][12377] Saving new best policy, reward=25.198! 
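
The interleaved records above show the steady-state training loop: the inference worker periodically logs the policy version it has synced from the learner, while the main process reports FPS over 10/60/300-second windows, throughput, policy lag, and the average episode reward over the last stats_avg=100 episodes. "Saving new best policy" fires whenever that average beats the previous best, but only after save_best_after=100000 env steps have elapsed (save_best_metric=reward in the config above). An illustrative sketch of that best-policy bookkeeping follows; it is not Sample Factory's actual implementation.

    best_reward = float("-inf")

    def maybe_save_best(avg_reward: float, env_steps: int,
                        save_best_after: int = 100_000) -> bool:
        """Mimics the 'Saving new best policy, reward=...' records in this log."""
        global best_reward
        if env_steps < save_best_after:
            return False              # too early to compare, per save_best_after
        if avg_reward <= best_reward:
            return False              # no improvement over the running best
        best_reward = avg_reward
        # A real learner would serialize weights here, e.g.
        # torch.save(model.state_dict(), best_checkpoint_path).
        print(f"Saving new best policy, reward={avg_reward:.3f}!")
        return True
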
[2025-02-19 02:07:09,421][12390] Updated weights for policy 0, policy_version 1138 (0.0030) [2025-02-19 02:07:12,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3838.5). Total num frames: 4677632. Throughput: 0: 1042.5. Samples: 167552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:07:12,966][00376] Avg episode reward: [(0, '24.330')] [2025-02-19 02:07:17,965][00376] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 3822.9). Total num frames: 4694016. Throughput: 0: 1034.1. Samples: 170756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:07:17,968][00376] Avg episode reward: [(0, '24.489')] [2025-02-19 02:07:19,986][12390] Updated weights for policy 0, policy_version 1148 (0.0025) [2025-02-19 02:07:22,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3830.3). Total num frames: 4714496. Throughput: 0: 1040.0. Samples: 175936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:07:22,968][00376] Avg episode reward: [(0, '21.695')] [2025-02-19 02:07:27,965][00376] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 3858.9). Total num frames: 4739072. Throughput: 0: 1041.0. Samples: 183224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:07:27,968][00376] Avg episode reward: [(0, '21.518')] [2025-02-19 02:07:28,535][12390] Updated weights for policy 0, policy_version 1158 (0.0014) [2025-02-19 02:07:32,968][00376] Fps is (10 sec: 4094.6, 60 sec: 4164.0, 300 sec: 3843.9). Total num frames: 4755456. Throughput: 0: 1030.6. Samples: 186310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:07:32,970][00376] Avg episode reward: [(0, '21.301')] [2025-02-19 02:07:37,965][00376] Fps is (10 sec: 3686.2, 60 sec: 4164.2, 300 sec: 3850.2). Total num frames: 4775936. Throughput: 0: 1042.3. Samples: 191686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:07:37,969][00376] Avg episode reward: [(0, '22.796')] [2025-02-19 02:07:39,193][12390] Updated weights for policy 0, policy_version 1168 (0.0015) [2025-02-19 02:07:42,965][00376] Fps is (10 sec: 4507.2, 60 sec: 4164.3, 300 sec: 3876.2). Total num frames: 4800512. Throughput: 0: 1039.8. Samples: 198778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:07:42,969][00376] Avg episode reward: [(0, '23.007')] [2025-02-19 02:07:47,965][00376] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3861.9). Total num frames: 4816896. Throughput: 0: 1032.3. Samples: 201760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:07:47,966][00376] Avg episode reward: [(0, '22.394')] [2025-02-19 02:07:49,504][12390] Updated weights for policy 0, policy_version 1178 (0.0041) [2025-02-19 02:07:52,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3867.4). Total num frames: 4837376. Throughput: 0: 1043.4. Samples: 207290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:07:52,968][00376] Avg episode reward: [(0, '22.855')] [2025-02-19 02:07:57,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3891.2). Total num frames: 4861952. Throughput: 0: 1043.4. Samples: 214506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:07:57,970][00376] Avg episode reward: [(0, '24.523')] [2025-02-19 02:07:57,976][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001187_4861952.pth... 
[2025-02-19 02:07:58,097][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2025-02-19 02:07:58,210][12390] Updated weights for policy 0, policy_version 1188 (0.0021) [2025-02-19 02:08:02,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3877.5). Total num frames: 4878336. Throughput: 0: 1034.3. Samples: 217298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:08:02,968][00376] Avg episode reward: [(0, '23.093')] [2025-02-19 02:08:07,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3900.1). Total num frames: 4902912. Throughput: 0: 1046.7. Samples: 223036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:08:07,966][00376] Avg episode reward: [(0, '22.450')] [2025-02-19 02:08:08,538][12390] Updated weights for policy 0, policy_version 1198 (0.0018) [2025-02-19 02:08:12,965][00376] Fps is (10 sec: 4915.1, 60 sec: 4164.3, 300 sec: 3921.7). Total num frames: 4927488. Throughput: 0: 1046.2. Samples: 230304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:08:12,966][00376] Avg episode reward: [(0, '23.105')] [2025-02-19 02:08:17,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3891.2). Total num frames: 4939776. Throughput: 0: 1037.3. Samples: 232984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:08:17,967][00376] Avg episode reward: [(0, '23.125')] [2025-02-19 02:08:18,896][12390] Updated weights for policy 0, policy_version 1208 (0.0042) [2025-02-19 02:08:22,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3912.1). Total num frames: 4964352. Throughput: 0: 1046.9. Samples: 238796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:08:22,969][00376] Avg episode reward: [(0, '23.116')] [2025-02-19 02:08:27,360][12390] Updated weights for policy 0, policy_version 1218 (0.0012) [2025-02-19 02:08:27,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3932.2). Total num frames: 4988928. Throughput: 0: 1049.7. Samples: 246016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:08:27,968][00376] Avg episode reward: [(0, '24.107')] [2025-02-19 02:08:32,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 3919.3). Total num frames: 5005312. Throughput: 0: 1041.7. Samples: 248638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:08:32,966][00376] Avg episode reward: [(0, '25.138')] [2025-02-19 02:08:37,744][12390] Updated weights for policy 0, policy_version 1228 (0.0024) [2025-02-19 02:08:37,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.6, 300 sec: 3938.5). Total num frames: 5029888. Throughput: 0: 1052.5. Samples: 254654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:08:37,966][00376] Avg episode reward: [(0, '25.822')] [2025-02-19 02:08:37,971][12377] Saving new best policy, reward=25.822! [2025-02-19 02:08:42,967][00376] Fps is (10 sec: 4504.6, 60 sec: 4164.1, 300 sec: 3941.4). Total num frames: 5050368. Throughput: 0: 1050.4. Samples: 261778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:08:42,973][00376] Avg episode reward: [(0, '25.099')] [2025-02-19 02:08:47,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3929.1). Total num frames: 5066752. Throughput: 0: 1041.8. Samples: 264180. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:08:47,969][00376] Avg episode reward: [(0, '24.366')] [2025-02-19 02:08:48,320][12390] Updated weights for policy 0, policy_version 1238 (0.0032) [2025-02-19 02:08:52,965][00376] Fps is (10 sec: 4096.9, 60 sec: 4232.5, 300 sec: 3947.1). Total num frames: 5091328. Throughput: 0: 1047.8. Samples: 270186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:08:52,969][00376] Avg episode reward: [(0, '23.195')] [2025-02-19 02:08:56,774][12390] Updated weights for policy 0, policy_version 1248 (0.0012) [2025-02-19 02:08:57,967][00376] Fps is (10 sec: 4914.0, 60 sec: 4232.4, 300 sec: 3964.3). Total num frames: 5115904. Throughput: 0: 1047.3. Samples: 277436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:08:57,970][00376] Avg episode reward: [(0, '23.004')] [2025-02-19 02:09:02,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3952.3). Total num frames: 5132288. Throughput: 0: 1038.1. Samples: 279700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:09:02,966][00376] Avg episode reward: [(0, '22.710')] [2025-02-19 02:09:07,050][12390] Updated weights for policy 0, policy_version 1258 (0.0024) [2025-02-19 02:09:07,965][00376] Fps is (10 sec: 4097.0, 60 sec: 4232.5, 300 sec: 3968.9). Total num frames: 5156864. Throughput: 0: 1051.0. Samples: 286092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:09:07,966][00376] Avg episode reward: [(0, '22.483')] [2025-02-19 02:09:12,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 5177344. Throughput: 0: 1046.7. Samples: 293116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:09:12,966][00376] Avg episode reward: [(0, '23.818')] [2025-02-19 02:09:17,464][12390] Updated weights for policy 0, policy_version 1268 (0.0028) [2025-02-19 02:09:17,965][00376] Fps is (10 sec: 3686.3, 60 sec: 4232.5, 300 sec: 4026.6). Total num frames: 5193728. Throughput: 0: 1036.8. Samples: 295296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:09:17,969][00376] Avg episode reward: [(0, '24.103')] [2025-02-19 02:09:22,966][00376] Fps is (10 sec: 4095.4, 60 sec: 4232.4, 300 sec: 4109.9). Total num frames: 5218304. Throughput: 0: 1045.1. Samples: 301686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:09:22,967][00376] Avg episode reward: [(0, '23.828')] [2025-02-19 02:09:26,076][12390] Updated weights for policy 0, policy_version 1278 (0.0019) [2025-02-19 02:09:27,965][00376] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 5238784. Throughput: 0: 1041.8. Samples: 308656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:09:27,968][00376] Avg episode reward: [(0, '23.502')] [2025-02-19 02:09:32,965][00376] Fps is (10 sec: 3687.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 5255168. Throughput: 0: 1035.6. Samples: 310784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:09:32,969][00376] Avg episode reward: [(0, '24.549')] [2025-02-19 02:09:36,599][12390] Updated weights for policy 0, policy_version 1288 (0.0014) [2025-02-19 02:09:37,965][00376] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 4151.5). Total num frames: 5279744. Throughput: 0: 1052.3. Samples: 317538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-19 02:09:37,969][00376] Avg episode reward: [(0, '22.238')] [2025-02-19 02:09:42,969][00376] Fps is (10 sec: 4503.8, 60 sec: 4164.1, 300 sec: 4151.5). Total num frames: 5300224. Throughput: 0: 1040.1. Samples: 324242. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:09:42,970][00376] Avg episode reward: [(0, '21.501')] [2025-02-19 02:09:46,785][12390] Updated weights for policy 0, policy_version 1298 (0.0020) [2025-02-19 02:09:47,965][00376] Fps is (10 sec: 4096.1, 60 sec: 4232.5, 300 sec: 4165.5). Total num frames: 5320704. Throughput: 0: 1036.9. Samples: 326360. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:09:47,966][00376] Avg episode reward: [(0, '22.439')] [2025-02-19 02:09:52,965][00376] Fps is (10 sec: 4507.4, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 5345280. Throughput: 0: 1047.2. Samples: 333216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:09:52,969][00376] Avg episode reward: [(0, '22.984')] [2025-02-19 02:09:55,401][12390] Updated weights for policy 0, policy_version 1308 (0.0022) [2025-02-19 02:09:57,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4137.7). Total num frames: 5361664. Throughput: 0: 1036.5. Samples: 339758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:09:57,968][00376] Avg episode reward: [(0, '22.665')] [2025-02-19 02:09:57,976][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001309_5361664.pth... [2025-02-19 02:09:58,165][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001065_4362240.pth [2025-02-19 02:10:02,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 5382144. Throughput: 0: 1033.9. Samples: 341822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:10:02,966][00376] Avg episode reward: [(0, '22.740')] [2025-02-19 02:10:05,794][12390] Updated weights for policy 0, policy_version 1318 (0.0015) [2025-02-19 02:10:07,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 5406720. Throughput: 0: 1052.2. Samples: 349032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:10:07,969][00376] Avg episode reward: [(0, '23.444')] [2025-02-19 02:10:12,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 5423104. Throughput: 0: 1033.8. Samples: 355178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:10:12,968][00376] Avg episode reward: [(0, '24.014')] [2025-02-19 02:10:16,444][12390] Updated weights for policy 0, policy_version 1328 (0.0019) [2025-02-19 02:10:17,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 5443584. Throughput: 0: 1037.6. Samples: 357474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:10:17,970][00376] Avg episode reward: [(0, '23.456')] [2025-02-19 02:10:22,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4165.4). Total num frames: 5468160. Throughput: 0: 1044.1. Samples: 364522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:10:22,966][00376] Avg episode reward: [(0, '23.983')] [2025-02-19 02:10:25,190][12390] Updated weights for policy 0, policy_version 1338 (0.0015) [2025-02-19 02:10:27,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 5484544. Throughput: 0: 1031.7. Samples: 370664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-19 02:10:27,967][00376] Avg episode reward: [(0, '23.519')] [2025-02-19 02:10:32,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 5509120. Throughput: 0: 1042.5. Samples: 373272. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:10:32,968][00376] Avg episode reward: [(0, '23.460')] [2025-02-19 02:10:35,396][12390] Updated weights for policy 0, policy_version 1348 (0.0014) [2025-02-19 02:10:37,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4232.6, 300 sec: 4179.3). Total num frames: 5533696. Throughput: 0: 1051.0. Samples: 380510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:10:37,966][00376] Avg episode reward: [(0, '23.360')] [2025-02-19 02:10:42,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 4165.5). Total num frames: 5550080. Throughput: 0: 1035.5. Samples: 386354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-19 02:10:42,966][00376] Avg episode reward: [(0, '24.438')] [2025-02-19 02:10:45,664][12390] Updated weights for policy 0, policy_version 1358 (0.0013) [2025-02-19 02:10:47,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 5570560. Throughput: 0: 1050.0. Samples: 389074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:10:47,969][00376] Avg episode reward: [(0, '23.478')] [2025-02-19 02:10:52,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 5595136. Throughput: 0: 1052.0. Samples: 396374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:10:52,966][00376] Avg episode reward: [(0, '22.292')] [2025-02-19 02:10:54,486][12390] Updated weights for policy 0, policy_version 1368 (0.0013) [2025-02-19 02:10:57,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 5611520. Throughput: 0: 1036.6. Samples: 401826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:10:57,970][00376] Avg episode reward: [(0, '21.055')] [2025-02-19 02:11:02,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 5636096. Throughput: 0: 1053.7. Samples: 404892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:11:02,966][00376] Avg episode reward: [(0, '19.232')] [2025-02-19 02:11:04,453][12390] Updated weights for policy 0, policy_version 1378 (0.0021) [2025-02-19 02:11:07,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 5660672. Throughput: 0: 1057.6. Samples: 412112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:11:07,966][00376] Avg episode reward: [(0, '18.267')] [2025-02-19 02:11:12,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 5672960. Throughput: 0: 1039.2. Samples: 417426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:11:12,966][00376] Avg episode reward: [(0, '19.553')] [2025-02-19 02:11:14,905][12390] Updated weights for policy 0, policy_version 1388 (0.0012) [2025-02-19 02:11:17,965][00376] Fps is (10 sec: 3686.3, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 5697536. Throughput: 0: 1052.3. Samples: 420624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:11:17,970][00376] Avg episode reward: [(0, '20.071')] [2025-02-19 02:11:22,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 5722112. Throughput: 0: 1051.9. Samples: 427846. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:11:22,966][00376] Avg episode reward: [(0, '21.096')] [2025-02-19 02:11:23,666][12390] Updated weights for policy 0, policy_version 1398 (0.0012) [2025-02-19 02:11:27,965][00376] Fps is (10 sec: 4096.1, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 5738496. Throughput: 0: 1033.7. Samples: 432872. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-19 02:11:27,968][00376] Avg episode reward: [(0, '21.192')] [2025-02-19 02:11:32,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 5758976. Throughput: 0: 1049.9. Samples: 436320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:11:32,970][00376] Avg episode reward: [(0, '23.016')] [2025-02-19 02:11:33,912][12390] Updated weights for policy 0, policy_version 1408 (0.0013) [2025-02-19 02:11:37,967][00376] Fps is (10 sec: 4504.5, 60 sec: 4164.1, 300 sec: 4179.3). Total num frames: 5783552. Throughput: 0: 1046.9. Samples: 443488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:11:37,969][00376] Avg episode reward: [(0, '22.423')] [2025-02-19 02:11:42,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 5799936. Throughput: 0: 1037.7. Samples: 448522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:11:42,968][00376] Avg episode reward: [(0, '22.680')] [2025-02-19 02:11:44,310][12390] Updated weights for policy 0, policy_version 1418 (0.0028) [2025-02-19 02:11:47,965][00376] Fps is (10 sec: 4097.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 5824512. Throughput: 0: 1048.7. Samples: 452084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:11:47,969][00376] Avg episode reward: [(0, '24.445')] [2025-02-19 02:11:52,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 5844992. Throughput: 0: 1048.1. Samples: 459278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:11:52,968][00376] Avg episode reward: [(0, '23.485')] [2025-02-19 02:11:53,496][12390] Updated weights for policy 0, policy_version 1428 (0.0020) [2025-02-19 02:11:57,965][00376] Fps is (10 sec: 3686.3, 60 sec: 4164.2, 300 sec: 4165.4). Total num frames: 5861376. Throughput: 0: 1039.5. Samples: 464202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:11:57,966][00376] Avg episode reward: [(0, '23.864')] [2025-02-19 02:11:57,974][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001431_5861376.pth... [2025-02-19 02:11:58,107][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001187_4861952.pth [2025-02-19 02:12:02,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 5885952. Throughput: 0: 1047.4. Samples: 467756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:12:02,966][00376] Avg episode reward: [(0, '24.399')] [2025-02-19 02:12:03,282][12390] Updated weights for policy 0, policy_version 1438 (0.0019) [2025-02-19 02:12:07,965][00376] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4165.4). Total num frames: 5906432. Throughput: 0: 1047.0. Samples: 474962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:12:07,967][00376] Avg episode reward: [(0, '25.342')] [2025-02-19 02:12:12,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 5926912. Throughput: 0: 1045.5. Samples: 479918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:12:12,966][00376] Avg episode reward: [(0, '24.563')] [2025-02-19 02:12:13,713][12390] Updated weights for policy 0, policy_version 1448 (0.0027) [2025-02-19 02:12:17,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4232.6, 300 sec: 4193.2). Total num frames: 5951488. Throughput: 0: 1049.8. Samples: 483560. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:12:17,966][00376] Avg episode reward: [(0, '24.834')] [2025-02-19 02:12:22,637][12390] Updated weights for policy 0, policy_version 1458 (0.0019) [2025-02-19 02:12:22,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 5971968. Throughput: 0: 1052.1. Samples: 490828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:12:22,969][00376] Avg episode reward: [(0, '24.359')] [2025-02-19 02:12:27,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.4). Total num frames: 5988352. Throughput: 0: 1048.2. Samples: 495690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:12:27,969][00376] Avg episode reward: [(0, '23.926')] [2025-02-19 02:12:32,633][12390] Updated weights for policy 0, policy_version 1468 (0.0021) [2025-02-19 02:12:32,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 6012928. Throughput: 0: 1048.0. Samples: 499242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:12:32,966][00376] Avg episode reward: [(0, '24.239')] [2025-02-19 02:12:37,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4179.3). Total num frames: 6033408. Throughput: 0: 1046.1. Samples: 506354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:12:37,970][00376] Avg episode reward: [(0, '22.894')] [2025-02-19 02:12:42,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6049792. Throughput: 0: 1051.7. Samples: 511528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:12:42,966][00376] Avg episode reward: [(0, '23.808')] [2025-02-19 02:12:43,063][12390] Updated weights for policy 0, policy_version 1478 (0.0023) [2025-02-19 02:12:47,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4193.2). Total num frames: 6074368. Throughput: 0: 1052.9. Samples: 515138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:12:47,969][00376] Avg episode reward: [(0, '25.526')] [2025-02-19 02:12:52,354][12390] Updated weights for policy 0, policy_version 1488 (0.0024) [2025-02-19 02:12:52,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6094848. Throughput: 0: 1045.2. Samples: 521996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:12:52,966][00376] Avg episode reward: [(0, '26.228')] [2025-02-19 02:12:52,969][12377] Saving new best policy, reward=26.228! [2025-02-19 02:12:57,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 6115328. Throughput: 0: 1050.3. Samples: 527180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:12:57,967][00376] Avg episode reward: [(0, '25.780')] [2025-02-19 02:13:01,959][12390] Updated weights for policy 0, policy_version 1498 (0.0015) [2025-02-19 02:13:02,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 6139904. Throughput: 0: 1049.5. Samples: 530788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:13:02,966][00376] Avg episode reward: [(0, '26.488')] [2025-02-19 02:13:02,971][12377] Saving new best policy, reward=26.488! [2025-02-19 02:13:07,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 6156288. Throughput: 0: 1033.3. Samples: 537328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:13:07,971][00376] Avg episode reward: [(0, '27.025')] [2025-02-19 02:13:07,981][12377] Saving new best policy, reward=27.025! 
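
Alongside the best-policy saves, the log shows periodic checkpoints written on a save_every_sec=120 cadence (checkpoint_000001309_5361664.pth and so on, encoding train_step and env_steps), each followed by removal of the oldest regular checkpoint so that only keep_checkpoints=2 remain; the best-policy checkpoint is stored separately and is not part of this rotation. A hedged sketch of that rotation, assuming the checkpoint_<train_step>_<env_steps>.pth naming visible in the log:

    from pathlib import Path

    def rotate_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
        # Names like checkpoint_000001309_5361664.pth sort oldest-first
        # lexicographically because the leading train_step field is zero-padded.
        ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
        for stale in ckpts[:-keep]:
            print(f"Removing {stale}")   # mirrors the "Removing ..." lines above
            stale.unlink()

With keep=2 this reproduces the pattern above: once checkpoint_000001431_5861376.pth lands, checkpoint_000001187_4861952.pth is deleted.
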
[2025-02-19 02:13:12,536][12390] Updated weights for policy 0, policy_version 1508 (0.0022) [2025-02-19 02:13:12,965][00376] Fps is (10 sec: 3686.3, 60 sec: 4164.2, 300 sec: 4193.2). Total num frames: 6176768. Throughput: 0: 1044.8. Samples: 542704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:13:12,969][00376] Avg episode reward: [(0, '24.666')] [2025-02-19 02:13:17,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4193.2). Total num frames: 6201344. Throughput: 0: 1045.8. Samples: 546304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:13:17,966][00376] Avg episode reward: [(0, '23.276')] [2025-02-19 02:13:22,269][12390] Updated weights for policy 0, policy_version 1518 (0.0027) [2025-02-19 02:13:22,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4165.4). Total num frames: 6217728. Throughput: 0: 1032.1. Samples: 552798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:13:22,970][00376] Avg episode reward: [(0, '23.110')] [2025-02-19 02:13:27,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6238208. Throughput: 0: 1039.9. Samples: 558322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:13:27,970][00376] Avg episode reward: [(0, '22.384')] [2025-02-19 02:13:31,846][12390] Updated weights for policy 0, policy_version 1528 (0.0020) [2025-02-19 02:13:32,965][00376] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6262784. Throughput: 0: 1036.9. Samples: 561800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:13:32,970][00376] Avg episode reward: [(0, '23.012')] [2025-02-19 02:13:37,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4165.5). Total num frames: 6279168. Throughput: 0: 1023.6. Samples: 568060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:13:37,970][00376] Avg episode reward: [(0, '22.807')] [2025-02-19 02:13:42,288][12390] Updated weights for policy 0, policy_version 1538 (0.0033) [2025-02-19 02:13:42,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6299648. Throughput: 0: 1040.4. Samples: 574000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:13:42,966][00376] Avg episode reward: [(0, '22.999')] [2025-02-19 02:13:47,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6324224. Throughput: 0: 1040.3. Samples: 577602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:13:47,966][00376] Avg episode reward: [(0, '22.426')] [2025-02-19 02:13:51,818][12390] Updated weights for policy 0, policy_version 1548 (0.0019) [2025-02-19 02:13:52,968][00376] Fps is (10 sec: 4094.5, 60 sec: 4095.8, 300 sec: 4151.5). Total num frames: 6340608. Throughput: 0: 1032.1. Samples: 583774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:13:52,970][00376] Avg episode reward: [(0, '22.620')] [2025-02-19 02:13:57,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6365184. Throughput: 0: 1042.7. Samples: 589626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:13:57,970][00376] Avg episode reward: [(0, '22.217')] [2025-02-19 02:13:57,976][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001554_6365184.pth... 
[2025-02-19 02:13:58,105][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001309_5361664.pth [2025-02-19 02:14:01,267][12390] Updated weights for policy 0, policy_version 1558 (0.0012) [2025-02-19 02:14:02,965][00376] Fps is (10 sec: 4507.2, 60 sec: 4096.0, 300 sec: 4165.4). Total num frames: 6385664. Throughput: 0: 1041.6. Samples: 593176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:14:02,971][00376] Avg episode reward: [(0, '21.956')] [2025-02-19 02:14:07,969][00376] Fps is (10 sec: 3684.8, 60 sec: 4095.7, 300 sec: 4151.5). Total num frames: 6402048. Throughput: 0: 1029.5. Samples: 599130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:14:07,971][00376] Avg episode reward: [(0, '22.565')] [2025-02-19 02:14:11,624][12390] Updated weights for policy 0, policy_version 1568 (0.0024) [2025-02-19 02:14:12,965][00376] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6426624. Throughput: 0: 1045.0. Samples: 605348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:14:12,969][00376] Avg episode reward: [(0, '23.384')] [2025-02-19 02:14:17,965][00376] Fps is (10 sec: 4917.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6451200. Throughput: 0: 1047.1. Samples: 608918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-19 02:14:17,966][00376] Avg episode reward: [(0, '24.667')] [2025-02-19 02:14:21,828][12390] Updated weights for policy 0, policy_version 1578 (0.0034) [2025-02-19 02:14:22,965][00376] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 6467584. Throughput: 0: 1035.8. Samples: 614672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:14:22,969][00376] Avg episode reward: [(0, '24.758')] [2025-02-19 02:14:27,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 6492160. Throughput: 0: 1044.7. Samples: 621012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:14:27,966][00376] Avg episode reward: [(0, '25.504')] [2025-02-19 02:14:30,603][12390] Updated weights for policy 0, policy_version 1588 (0.0027) [2025-02-19 02:14:32,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6512640. Throughput: 0: 1044.5. Samples: 624604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:14:32,966][00376] Avg episode reward: [(0, '25.627')] [2025-02-19 02:14:37,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4165.5). Total num frames: 6529024. Throughput: 0: 1029.5. Samples: 630096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:14:37,968][00376] Avg episode reward: [(0, '24.780')] [2025-02-19 02:14:41,083][12390] Updated weights for policy 0, policy_version 1598 (0.0014) [2025-02-19 02:14:42,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 6553600. Throughput: 0: 1050.2. Samples: 636886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:14:42,968][00376] Avg episode reward: [(0, '25.121')] [2025-02-19 02:14:47,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 6574080. Throughput: 0: 1052.1. Samples: 640520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:14:47,966][00376] Avg episode reward: [(0, '23.923')] [2025-02-19 02:14:50,961][12390] Updated weights for policy 0, policy_version 1608 (0.0019) [2025-02-19 02:14:52,965][00376] Fps is (10 sec: 4095.8, 60 sec: 4232.8, 300 sec: 4179.3). Total num frames: 6594560. Throughput: 0: 1035.5. 
Samples: 645722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:14:52,970][00376] Avg episode reward: [(0, '24.422')] [2025-02-19 02:14:57,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6615040. Throughput: 0: 1046.6. Samples: 652444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-19 02:14:57,969][00376] Avg episode reward: [(0, '23.990')] [2025-02-19 02:14:59,990][12390] Updated weights for policy 0, policy_version 1618 (0.0014) [2025-02-19 02:15:02,965][00376] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 6635520. Throughput: 0: 1046.7. Samples: 656018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:15:02,968][00376] Avg episode reward: [(0, '24.246')] [2025-02-19 02:15:07,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.8, 300 sec: 4179.3). Total num frames: 6656000. Throughput: 0: 1031.3. Samples: 661080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:15:07,969][00376] Avg episode reward: [(0, '23.460')] [2025-02-19 02:15:10,275][12390] Updated weights for policy 0, policy_version 1628 (0.0032) [2025-02-19 02:15:12,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4232.6, 300 sec: 4193.2). Total num frames: 6680576. Throughput: 0: 1052.0. Samples: 668354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:15:12,971][00376] Avg episode reward: [(0, '23.162')] [2025-02-19 02:15:17,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6701056. Throughput: 0: 1053.3. Samples: 672002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:15:17,970][00376] Avg episode reward: [(0, '22.396')] [2025-02-19 02:15:20,684][12390] Updated weights for policy 0, policy_version 1638 (0.0024) [2025-02-19 02:15:22,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6717440. Throughput: 0: 1042.2. Samples: 676996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:15:22,970][00376] Avg episode reward: [(0, '22.401')] [2025-02-19 02:15:27,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6742016. Throughput: 0: 1051.2. Samples: 684188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:15:27,970][00376] Avg episode reward: [(0, '21.572')] [2025-02-19 02:15:29,240][12390] Updated weights for policy 0, policy_version 1648 (0.0013) [2025-02-19 02:15:32,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 6762496. Throughput: 0: 1049.5. Samples: 687748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:15:32,969][00376] Avg episode reward: [(0, '22.000')] [2025-02-19 02:15:37,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 6782976. Throughput: 0: 1046.4. Samples: 692808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-19 02:15:37,966][00376] Avg episode reward: [(0, '23.770')] [2025-02-19 02:15:39,620][12390] Updated weights for policy 0, policy_version 1658 (0.0017) [2025-02-19 02:15:42,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 6807552. Throughput: 0: 1058.0. Samples: 700054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:15:42,970][00376] Avg episode reward: [(0, '23.304')] [2025-02-19 02:15:47,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 6823936. Throughput: 0: 1059.4. Samples: 703690. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:15:47,968][00376] Avg episode reward: [(0, '22.151')] [2025-02-19 02:15:49,862][12390] Updated weights for policy 0, policy_version 1668 (0.0021) [2025-02-19 02:15:52,971][00376] Fps is (10 sec: 3684.2, 60 sec: 4163.9, 300 sec: 4179.2). Total num frames: 6844416. Throughput: 0: 1057.7. Samples: 708684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:15:52,972][00376] Avg episode reward: [(0, '22.043')] [2025-02-19 02:15:57,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 6868992. Throughput: 0: 1058.6. Samples: 715992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:15:57,966][00376] Avg episode reward: [(0, '22.318')] [2025-02-19 02:15:57,973][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001677_6868992.pth... [2025-02-19 02:15:58,142][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001431_5861376.pth [2025-02-19 02:15:58,552][12390] Updated weights for policy 0, policy_version 1678 (0.0032) [2025-02-19 02:16:02,967][00376] Fps is (10 sec: 4097.6, 60 sec: 4164.1, 300 sec: 4151.5). Total num frames: 6885376. Throughput: 0: 1047.6. Samples: 719148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:16:02,968][00376] Avg episode reward: [(0, '22.572')] [2025-02-19 02:16:07,967][00376] Fps is (10 sec: 4095.3, 60 sec: 4232.4, 300 sec: 4193.2). Total num frames: 6909952. Throughput: 0: 1053.2. Samples: 724394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:16:07,968][00376] Avg episode reward: [(0, '22.470')] [2025-02-19 02:16:08,775][12390] Updated weights for policy 0, policy_version 1688 (0.0035) [2025-02-19 02:16:12,965][00376] Fps is (10 sec: 4506.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 6930432. Throughput: 0: 1055.3. Samples: 731676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:16:12,966][00376] Avg episode reward: [(0, '24.103')] [2025-02-19 02:16:17,965][00376] Fps is (10 sec: 4096.7, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 6950912. Throughput: 0: 1045.6. Samples: 734798. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:16:17,970][00376] Avg episode reward: [(0, '25.225')] [2025-02-19 02:16:19,113][12390] Updated weights for policy 0, policy_version 1698 (0.0015) [2025-02-19 02:16:22,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 6971392. Throughput: 0: 1049.2. Samples: 740022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:16:22,970][00376] Avg episode reward: [(0, '25.078')] [2025-02-19 02:16:27,843][12390] Updated weights for policy 0, policy_version 1708 (0.0019) [2025-02-19 02:16:27,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 6995968. Throughput: 0: 1046.5. Samples: 747148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:16:27,966][00376] Avg episode reward: [(0, '27.342')] [2025-02-19 02:16:27,972][12377] Saving new best policy, reward=27.342! [2025-02-19 02:16:32,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4151.6). Total num frames: 7008256. Throughput: 0: 1028.8. Samples: 749986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:16:32,966][00376] Avg episode reward: [(0, '26.749')] [2025-02-19 02:16:37,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7032832. Throughput: 0: 1042.5. Samples: 755590. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:16:37,969][00376] Avg episode reward: [(0, '26.039')] [2025-02-19 02:16:38,436][12390] Updated weights for policy 0, policy_version 1718 (0.0019) [2025-02-19 02:16:42,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7057408. Throughput: 0: 1040.5. Samples: 762814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:16:42,968][00376] Avg episode reward: [(0, '25.968')] [2025-02-19 02:16:47,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 7073792. Throughput: 0: 1034.8. Samples: 765714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:16:47,966][00376] Avg episode reward: [(0, '24.650')] [2025-02-19 02:16:48,722][12390] Updated weights for policy 0, policy_version 1728 (0.0018) [2025-02-19 02:16:52,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.7, 300 sec: 4179.3). Total num frames: 7094272. Throughput: 0: 1047.2. Samples: 771518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:16:52,966][00376] Avg episode reward: [(0, '22.191')] [2025-02-19 02:16:57,160][12390] Updated weights for policy 0, policy_version 1738 (0.0014) [2025-02-19 02:16:57,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7118848. Throughput: 0: 1045.5. Samples: 778722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:16:57,966][00376] Avg episode reward: [(0, '22.875')] [2025-02-19 02:17:02,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4165.4). Total num frames: 7135232. Throughput: 0: 1034.4. Samples: 781348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:17:02,966][00376] Avg episode reward: [(0, '22.923')] [2025-02-19 02:17:07,669][12390] Updated weights for policy 0, policy_version 1748 (0.0012) [2025-02-19 02:17:07,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4179.3). Total num frames: 7159808. Throughput: 0: 1049.7. Samples: 787258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:17:07,966][00376] Avg episode reward: [(0, '24.497')] [2025-02-19 02:17:12,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 7184384. Throughput: 0: 1049.8. Samples: 794388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:17:12,968][00376] Avg episode reward: [(0, '26.179')] [2025-02-19 02:17:17,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 7196672. Throughput: 0: 1042.2. Samples: 796886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:17:17,966][00376] Avg episode reward: [(0, '25.793')] [2025-02-19 02:17:18,100][12390] Updated weights for policy 0, policy_version 1758 (0.0025) [2025-02-19 02:17:22,965][00376] Fps is (10 sec: 3686.3, 60 sec: 4164.2, 300 sec: 4179.3). Total num frames: 7221248. Throughput: 0: 1054.0. Samples: 803018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:17:22,966][00376] Avg episode reward: [(0, '25.512')] [2025-02-19 02:17:26,593][12390] Updated weights for policy 0, policy_version 1768 (0.0012) [2025-02-19 02:17:27,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7245824. Throughput: 0: 1055.0. Samples: 810290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:17:27,969][00376] Avg episode reward: [(0, '26.690')] [2025-02-19 02:17:32,965][00376] Fps is (10 sec: 4096.1, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 7262208. Throughput: 0: 1040.7. Samples: 812544. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:17:32,971][00376] Avg episode reward: [(0, '26.200')] [2025-02-19 02:17:36,954][12390] Updated weights for policy 0, policy_version 1778 (0.0020) [2025-02-19 02:17:37,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 7286784. Throughput: 0: 1050.1. Samples: 818772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:17:37,969][00376] Avg episode reward: [(0, '24.696')] [2025-02-19 02:17:42,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7307264. Throughput: 0: 1050.9. Samples: 826012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:17:42,966][00376] Avg episode reward: [(0, '24.857')] [2025-02-19 02:17:47,320][12390] Updated weights for policy 0, policy_version 1788 (0.0018) [2025-02-19 02:17:47,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 7323648. Throughput: 0: 1040.2. Samples: 828156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:17:47,969][00376] Avg episode reward: [(0, '26.250')] [2025-02-19 02:17:52,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 7348224. Throughput: 0: 1052.5. Samples: 834620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:17:52,966][00376] Avg episode reward: [(0, '26.304')] [2025-02-19 02:17:55,789][12390] Updated weights for policy 0, policy_version 1798 (0.0013) [2025-02-19 02:17:57,967][00376] Fps is (10 sec: 4504.6, 60 sec: 4164.1, 300 sec: 4165.4). Total num frames: 7368704. Throughput: 0: 1051.1. Samples: 841692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:17:57,971][00376] Avg episode reward: [(0, '26.805')] [2025-02-19 02:17:57,979][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001799_7368704.pth... [2025-02-19 02:17:58,144][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001554_6365184.pth [2025-02-19 02:18:02,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 7389184. Throughput: 0: 1040.1. Samples: 843692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-19 02:18:02,966][00376] Avg episode reward: [(0, '27.408')] [2025-02-19 02:18:02,976][12377] Saving new best policy, reward=27.408! [2025-02-19 02:18:06,333][12390] Updated weights for policy 0, policy_version 1808 (0.0019) [2025-02-19 02:18:07,965][00376] Fps is (10 sec: 4096.8, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7409664. Throughput: 0: 1051.0. Samples: 850312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:18:07,966][00376] Avg episode reward: [(0, '27.819')] [2025-02-19 02:18:07,973][12377] Saving new best policy, reward=27.819! [2025-02-19 02:18:12,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7434240. Throughput: 0: 1038.5. Samples: 857022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:18:12,968][00376] Avg episode reward: [(0, '25.975')] [2025-02-19 02:18:16,667][12390] Updated weights for policy 0, policy_version 1818 (0.0026) [2025-02-19 02:18:17,965][00376] Fps is (10 sec: 4096.1, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 7450624. Throughput: 0: 1035.5. Samples: 859140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:18:17,969][00376] Avg episode reward: [(0, '24.214')] [2025-02-19 02:18:22,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 7475200. 
Throughput: 0: 1051.0. Samples: 866068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:18:22,966][00376] Avg episode reward: [(0, '24.153')] [2025-02-19 02:18:25,167][12390] Updated weights for policy 0, policy_version 1828 (0.0015) [2025-02-19 02:18:27,965][00376] Fps is (10 sec: 4505.4, 60 sec: 4164.2, 300 sec: 4179.3). Total num frames: 7495680. Throughput: 0: 1038.3. Samples: 872736. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:18:27,973][00376] Avg episode reward: [(0, '22.547')] [2025-02-19 02:18:32,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7512064. Throughput: 0: 1037.7. Samples: 874854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:18:32,966][00376] Avg episode reward: [(0, '21.107')] [2025-02-19 02:18:35,602][12390] Updated weights for policy 0, policy_version 1838 (0.0014) [2025-02-19 02:18:37,965][00376] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 4193.2). Total num frames: 7536640. Throughput: 0: 1048.8. Samples: 881818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-19 02:18:37,969][00376] Avg episode reward: [(0, '23.222')] [2025-02-19 02:18:42,966][00376] Fps is (10 sec: 4504.9, 60 sec: 4164.2, 300 sec: 4179.3). Total num frames: 7557120. Throughput: 0: 1034.7. Samples: 888254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:18:42,967][00376] Avg episode reward: [(0, '24.345')] [2025-02-19 02:18:45,894][12390] Updated weights for policy 0, policy_version 1848 (0.0017) [2025-02-19 02:18:47,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 7577600. Throughput: 0: 1039.9. Samples: 890486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:18:47,970][00376] Avg episode reward: [(0, '25.287')] [2025-02-19 02:18:52,965][00376] Fps is (10 sec: 4506.3, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 7602176. Throughput: 0: 1052.7. Samples: 897682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:18:52,969][00376] Avg episode reward: [(0, '25.405')] [2025-02-19 02:18:54,480][12390] Updated weights for policy 0, policy_version 1858 (0.0012) [2025-02-19 02:18:57,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4179.3). Total num frames: 7618560. Throughput: 0: 1039.9. Samples: 903818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:18:57,968][00376] Avg episode reward: [(0, '26.147')] [2025-02-19 02:19:02,966][00376] Fps is (10 sec: 3685.8, 60 sec: 4164.2, 300 sec: 4193.2). Total num frames: 7639040. Throughput: 0: 1048.1. Samples: 906304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:19:02,968][00376] Avg episode reward: [(0, '27.267')] [2025-02-19 02:19:05,112][12390] Updated weights for policy 0, policy_version 1868 (0.0021) [2025-02-19 02:19:07,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 7663616. Throughput: 0: 1047.5. Samples: 913204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:19:07,970][00376] Avg episode reward: [(0, '26.596')] [2025-02-19 02:19:12,965][00376] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 4165.4). Total num frames: 7680000. Throughput: 0: 1030.7. Samples: 919118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:19:12,966][00376] Avg episode reward: [(0, '26.165')] [2025-02-19 02:19:15,394][12390] Updated weights for policy 0, policy_version 1878 (0.0023) [2025-02-19 02:19:17,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 7704576. 
Throughput: 0: 1045.5. Samples: 921900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:19:17,966][00376] Avg episode reward: [(0, '25.119')] [2025-02-19 02:19:22,970][00376] Fps is (10 sec: 4503.1, 60 sec: 4163.9, 300 sec: 4179.2). Total num frames: 7725056. Throughput: 0: 1050.3. Samples: 929088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:19:22,971][00376] Avg episode reward: [(0, '26.601')] [2025-02-19 02:19:23,827][12390] Updated weights for policy 0, policy_version 1888 (0.0015) [2025-02-19 02:19:27,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4165.4). Total num frames: 7741440. Throughput: 0: 1034.3. Samples: 934796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:19:27,966][00376] Avg episode reward: [(0, '25.201')] [2025-02-19 02:19:32,965][00376] Fps is (10 sec: 4098.2, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 7766016. Throughput: 0: 1049.0. Samples: 937692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:19:32,970][00376] Avg episode reward: [(0, '24.662')] [2025-02-19 02:19:34,314][12390] Updated weights for policy 0, policy_version 1898 (0.0013) [2025-02-19 02:19:37,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 7790592. Throughput: 0: 1046.8. Samples: 944790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-19 02:19:37,966][00376] Avg episode reward: [(0, '24.566')] [2025-02-19 02:19:42,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4179.3). Total num frames: 7806976. Throughput: 0: 1034.4. Samples: 950366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:19:42,970][00376] Avg episode reward: [(0, '25.546')] [2025-02-19 02:19:44,521][12390] Updated weights for policy 0, policy_version 1908 (0.0023) [2025-02-19 02:19:47,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7827456. Throughput: 0: 1049.8. Samples: 953542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:19:47,969][00376] Avg episode reward: [(0, '27.207')] [2025-02-19 02:19:52,965][00376] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4207.1). Total num frames: 7856128. Throughput: 0: 1057.2. Samples: 960776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-19 02:19:52,967][00376] Avg episode reward: [(0, '27.000')] [2025-02-19 02:19:52,964][12390] Updated weights for policy 0, policy_version 1918 (0.0019) [2025-02-19 02:19:57,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7868416. Throughput: 0: 1043.4. Samples: 966072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-19 02:19:57,969][00376] Avg episode reward: [(0, '27.254')] [2025-02-19 02:19:57,976][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001921_7868416.pth... [2025-02-19 02:19:58,097][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001677_6868992.pth [2025-02-19 02:20:02,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4232.6, 300 sec: 4193.2). Total num frames: 7892992. Throughput: 0: 1054.1. Samples: 969334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:20:02,969][00376] Avg episode reward: [(0, '27.472')] [2025-02-19 02:20:03,432][12390] Updated weights for policy 0, policy_version 1928 (0.0029) [2025-02-19 02:20:07,965][00376] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4179.3). Total num frames: 7913472. Throughput: 0: 1052.5. Samples: 976444. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:20:07,971][00376] Avg episode reward: [(0, '27.714')] [2025-02-19 02:20:12,965][00376] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 7929856. Throughput: 0: 1037.3. Samples: 981474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:20:12,966][00376] Avg episode reward: [(0, '25.788')] [2025-02-19 02:20:13,971][12390] Updated weights for policy 0, policy_version 1938 (0.0020) [2025-02-19 02:20:17,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4193.2). Total num frames: 7954432. Throughput: 0: 1052.7. Samples: 985062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-19 02:20:17,966][00376] Avg episode reward: [(0, '25.549')] [2025-02-19 02:20:22,677][12390] Updated weights for policy 0, policy_version 1948 (0.0015) [2025-02-19 02:20:22,965][00376] Fps is (10 sec: 4914.9, 60 sec: 4232.9, 300 sec: 4193.2). Total num frames: 7979008. Throughput: 0: 1055.6. Samples: 992292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-19 02:20:22,967][00376] Avg episode reward: [(0, '25.274')] [2025-02-19 02:20:27,965][00376] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 7995392. Throughput: 0: 1043.8. Samples: 997338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-19 02:20:27,969][00376] Avg episode reward: [(0, '25.187')] [2025-02-19 02:20:30,316][00376] Component Batcher_0 stopped! [2025-02-19 02:20:30,315][12377] Stopping Batcher_0... [2025-02-19 02:20:30,321][12377] Loop batcher_evt_loop terminating... [2025-02-19 02:20:30,320][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2025-02-19 02:20:30,384][12390] Weights refcount: 2 0 [2025-02-19 02:20:30,389][12390] Stopping InferenceWorker_p0-w0... [2025-02-19 02:20:30,389][00376] Component InferenceWorker_p0-w0 stopped! [2025-02-19 02:20:30,390][12390] Loop inference_proc0-0_evt_loop terminating... [2025-02-19 02:20:30,448][12377] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001799_7368704.pth [2025-02-19 02:20:30,457][12377] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2025-02-19 02:20:30,672][00376] Component LearnerWorker_p0 stopped! [2025-02-19 02:20:30,673][12377] Stopping LearnerWorker_p0... [2025-02-19 02:20:30,673][12377] Loop learner_proc0_evt_loop terminating... [2025-02-19 02:20:30,702][00376] Component RolloutWorker_w7 stopped! [2025-02-19 02:20:30,708][00376] Component RolloutWorker_w3 stopped! [2025-02-19 02:20:30,708][12394] Stopping RolloutWorker_w3... [2025-02-19 02:20:30,702][12398] Stopping RolloutWorker_w7... [2025-02-19 02:20:30,716][12398] Loop rollout_proc7_evt_loop terminating... [2025-02-19 02:20:30,714][12394] Loop rollout_proc3_evt_loop terminating... [2025-02-19 02:20:30,726][00376] Component RolloutWorker_w1 stopped! [2025-02-19 02:20:30,726][12392] Stopping RolloutWorker_w1... [2025-02-19 02:20:30,731][00376] Component RolloutWorker_w5 stopped! [2025-02-19 02:20:30,731][12396] Stopping RolloutWorker_w5... [2025-02-19 02:20:30,736][12392] Loop rollout_proc1_evt_loop terminating... [2025-02-19 02:20:30,737][12396] Loop rollout_proc5_evt_loop terminating... [2025-02-19 02:20:30,900][00376] Component RolloutWorker_w4 stopped! [2025-02-19 02:20:30,901][12395] Stopping RolloutWorker_w4... [2025-02-19 02:20:30,902][12395] Loop rollout_proc4_evt_loop terminating... [2025-02-19 02:20:30,905][00376] Component RolloutWorker_w2 stopped! 
[2025-02-19 02:20:30,906][12393] Stopping RolloutWorker_w2...
[2025-02-19 02:20:30,907][12393] Loop rollout_proc2_evt_loop terminating...
[2025-02-19 02:20:30,939][00376] Component RolloutWorker_w0 stopped!
[2025-02-19 02:20:30,940][12391] Stopping RolloutWorker_w0...
[2025-02-19 02:20:30,941][12391] Loop rollout_proc0_evt_loop terminating...
[2025-02-19 02:20:30,966][00376] Component RolloutWorker_w6 stopped!
[2025-02-19 02:20:30,967][00376] Waiting for process learner_proc0 to stop...
[2025-02-19 02:20:30,968][12397] Stopping RolloutWorker_w6...
[2025-02-19 02:20:30,968][12397] Loop rollout_proc6_evt_loop terminating...
[2025-02-19 02:20:32,522][00376] Waiting for process inference_proc0-0 to join...
[2025-02-19 02:20:32,526][00376] Waiting for process rollout_proc0 to join...
[2025-02-19 02:20:34,769][00376] Waiting for process rollout_proc1 to join...
[2025-02-19 02:20:34,771][00376] Waiting for process rollout_proc2 to join...
[2025-02-19 02:20:34,772][00376] Waiting for process rollout_proc3 to join...
[2025-02-19 02:20:34,774][00376] Waiting for process rollout_proc4 to join...
[2025-02-19 02:20:34,777][00376] Waiting for process rollout_proc5 to join...
[2025-02-19 02:20:34,778][00376] Waiting for process rollout_proc6 to join...
[2025-02-19 02:20:34,779][00376] Waiting for process rollout_proc7 to join...
[2025-02-19 02:20:34,780][00376] Batcher 0 profile tree view:
batching: 25.4812, releasing_batches: 0.0270
[2025-02-19 02:20:34,781][00376] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 373.8903
update_model: 8.2712
  weight_update: 0.0014
one_step: 0.0048
  handle_policy_step: 554.3267
    deserialize: 13.6233, stack: 3.0441, obs_to_device_normalize: 117.7136, forward: 282.1877, send_messages: 27.1832
    prepare_outputs: 86.5312
      to_cpu: 52.8726
[2025-02-19 02:20:34,782][00376] Learner 0 profile tree view:
misc: 0.0049, prepare_batch: 11.9718
train: 73.2805
  epoch_init: 0.0082, minibatch_init: 0.0056, losses_postprocess: 0.7399, kl_divergence: 0.6606, after_optimizer: 2.9129
  calculate_losses: 24.5938
    losses_init: 0.0075, forward_head: 1.4408, bptt_initial: 16.3458, tail: 1.0334, advantages_returns: 0.2461, losses: 3.4087
    bptt: 1.8583
      bptt_forward_core: 1.7896
  update: 43.7646
    clip: 0.7813
[2025-02-19 02:20:34,783][00376] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2129, enqueue_policy_requests: 88.2364, env_step: 775.7673, overhead: 10.5578, complete_rollouts: 6.6741
save_policy_outputs: 16.6933
  split_output_tensors: 6.3778
[2025-02-19 02:20:34,784][00376] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2165, enqueue_policy_requests: 86.8633, env_step: 772.3320, overhead: 11.4619, complete_rollouts: 6.8690
save_policy_outputs: 17.3034
  split_output_tensors: 6.6137
[2025-02-19 02:20:34,787][00376] Loop Runner_EvtLoop terminating...
[2025-02-19 02:20:34,788][00376] Runner profile tree view:
main_loop: 994.5507
[2025-02-19 02:20:34,789][00376] Collected {0: 8007680}, FPS: 4023.7
[2025-02-19 02:52:19,838][24317] Saving configuration to /content/train_dir/default_experiment/config.json...
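(Editor's note: the profile trees printed at the end of the run above are nested wall-clock timers. For RolloutWorker_w0, env_step accounts for 775.8 s of the 994.6 s main loop, so environment simulation, not learning, dominates this run. A minimal nested-timer sketch in the same spirit, my own illustration rather than Sample Factory's Timing utilities:)

```python
# Minimal nested wall-clock timer in the spirit of the profile trees above.
import time
from collections import defaultdict
from contextlib import contextmanager

class Timing:
    def __init__(self):
        self.totals = defaultdict(float)   # "a/b/c" -> accumulated seconds
        self._stack = []

    @contextmanager
    def timeit(self, name):
        self._stack.append(name)
        key = "/".join(self._stack)
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[key] += time.perf_counter() - start
            self._stack.pop()

timing = Timing()
with timing.timeit("env_step"):
    time.sleep(0.01)                       # stand-in for env.step(actions)
print(dict(timing.totals))
```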
[2025-02-19 02:52:19,842][24317] Rollout worker 0 uses device cpu
[2025-02-19 02:52:19,843][24317] Rollout worker 1 uses device cpu
[2025-02-19 02:52:19,845][24317] Rollout worker 2 uses device cpu
[2025-02-19 02:52:19,846][24317] Rollout worker 3 uses device cpu
[2025-02-19 02:52:19,847][24317] Rollout worker 4 uses device cpu
[2025-02-19 02:52:19,848][24317] Rollout worker 5 uses device cpu
[2025-02-19 02:52:19,850][24317] Rollout worker 6 uses device cpu
[2025-02-19 02:52:19,851][24317] Rollout worker 7 uses device cpu
[2025-02-19 02:52:19,962][24317] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 02:52:19,963][24317] InferenceWorker_p0-w0: min num requests: 2
[2025-02-19 02:52:19,997][24317] Starting all processes...
[2025-02-19 02:52:19,997][24317] Starting process learner_proc0
[2025-02-19 02:52:20,059][24317] Starting all processes...
[2025-02-19 02:52:20,066][24317] Starting process inference_proc0-0
[2025-02-19 02:52:20,067][24317] Starting process rollout_proc0
[2025-02-19 02:52:20,067][24317] Starting process rollout_proc1
[2025-02-19 02:52:20,068][24317] Starting process rollout_proc2
[2025-02-19 02:52:20,068][24317] Starting process rollout_proc3
[2025-02-19 02:52:20,068][24317] Starting process rollout_proc4
[2025-02-19 02:52:20,068][24317] Starting process rollout_proc5
[2025-02-19 02:52:20,068][24317] Starting process rollout_proc6
[2025-02-19 02:52:20,068][24317] Starting process rollout_proc7
[2025-02-19 02:52:34,987][24695] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 02:52:34,989][24695] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-19 02:52:35,085][24695] Num visible devices: 1
[2025-02-19 02:52:35,127][24695] Starting seed is not provided
[2025-02-19 02:52:35,128][24695] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 02:52:35,129][24695] Initializing actor-critic model on device cuda:0
[2025-02-19 02:52:35,130][24695] RunningMeanStd input shape: (3, 72, 128)
[2025-02-19 02:52:35,131][24695] RunningMeanStd input shape: (1,)
[2025-02-19 02:52:35,211][24695] ConvEncoder: input_channels=3
[2025-02-19 02:52:35,227][24710] Worker 1 uses CPU cores [1]
[2025-02-19 02:52:35,425][24709] Worker 0 uses CPU cores [0]
[2025-02-19 02:52:35,533][24716] Worker 7 uses CPU cores [1]
[2025-02-19 02:52:35,559][24711] Worker 2 uses CPU cores [0]
[2025-02-19 02:52:35,564][24713] Worker 4 uses CPU cores [0]
[2025-02-19 02:52:35,574][24712] Worker 3 uses CPU cores [1]
[2025-02-19 02:52:35,593][24715] Worker 6 uses CPU cores [0]
[2025-02-19 02:52:35,615][24708] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 02:52:35,616][24708] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-19 02:52:35,638][24714] Worker 5 uses CPU cores [1]
[2025-02-19 02:52:35,647][24708] Num visible devices: 1
[2025-02-19 02:52:35,679][24695] Conv encoder output size: 512
[2025-02-19 02:52:35,679][24695] Policy head output size: 512
[2025-02-19 02:52:35,694][24695] Created Actor Critic model with architecture:
[2025-02-19 02:52:35,694][24695] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-02-19 02:52:35,946][24695] Using optimizer
[2025-02-19 02:52:37,006][24695] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-02-19 02:52:37,059][24695] Loading model from checkpoint
[2025-02-19 02:52:37,061][24695] Loaded experiment state at self.train_step=1955, self.env_steps=8007680
[2025-02-19 02:52:37,062][24695] Initialized policy 0 weights for model version 1955
[2025-02-19 02:52:37,066][24695] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 02:52:37,074][24695] LearnerWorker_p0 finished initialization!
[2025-02-19 02:52:37,336][24708] RunningMeanStd input shape: (3, 72, 128)
[2025-02-19 02:52:37,338][24708] RunningMeanStd input shape: (1,)
[2025-02-19 02:52:37,356][24708] ConvEncoder: input_channels=3
[2025-02-19 02:52:37,507][24708] Conv encoder output size: 512
[2025-02-19 02:52:37,508][24708] Policy head output size: 512
[2025-02-19 02:52:37,558][24317] Inference worker 0-0 is ready!
[2025-02-19 02:52:37,559][24317] All inference workers are ready! Signal rollout workers to start!
[2025-02-19 02:52:37,812][24710] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 02:52:37,810][24716] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 02:52:37,808][24712] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 02:52:37,815][24714] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 02:52:37,894][24715] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 02:52:37,898][24711] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 02:52:37,900][24709] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 02:52:37,893][24713] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 02:52:38,839][24709] Decorrelating experience for 0 frames...
[2025-02-19 02:52:39,677][24712] Decorrelating experience for 0 frames...
[2025-02-19 02:52:39,682][24716] Decorrelating experience for 0 frames...
[2025-02-19 02:52:39,684][24714] Decorrelating experience for 0 frames...
[2025-02-19 02:52:39,689][24710] Decorrelating experience for 0 frames...
[2025-02-19 02:52:39,953][24317] Heartbeat connected on Batcher_0
[2025-02-19 02:52:39,957][24317] Heartbeat connected on LearnerWorker_p0
[2025-02-19 02:52:40,013][24317] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-19 02:52:40,092][24709] Decorrelating experience for 32 frames...
[2025-02-19 02:52:40,621][24709] Decorrelating experience for 64 frames...
[2025-02-19 02:52:41,059][24709] Decorrelating experience for 96 frames...
[2025-02-19 02:52:41,113][24716] Decorrelating experience for 32 frames...
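(Editor's note: unlike the first launch, which logged "No checkpoints found ... starting from scratch!", this restart resumes: restart_behavior=resume and load_checkpoint_kind=latest make the learner pick the newest checkpoint and restore both counters, hence "Loaded experiment state at self.train_step=1955, self.env_steps=8007680" above. A hedged sketch of that resume path; the state-dict keys are assumptions for illustration:)

```python
# Sketch of resume-from-latest-checkpoint, matching the log lines above.
# The state-dict keys ("model", "optimizer", ...) are assumptions.
from pathlib import Path
import torch

def load_latest(ckpt_dir: Path, model, optimizer, device="cuda:0"):
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    if not ckpts:
        print("No checkpoints found")
        return 0, 0  # train_step, env_steps
    print(f"Loading state from checkpoint {ckpts[-1]}...")
    state = torch.load(ckpts[-1], map_location=device)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    print(f"Loaded experiment state at train_step={state['train_step']}, "
          f"env_steps={state['env_steps']}")
    return state["train_step"], state["env_steps"]
```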
[2025-02-19 02:52:41,116][24714] Decorrelating experience for 32 frames... [2025-02-19 02:52:41,118][24712] Decorrelating experience for 32 frames... [2025-02-19 02:52:41,124][24710] Decorrelating experience for 32 frames... [2025-02-19 02:52:41,157][24317] Heartbeat connected on RolloutWorker_w0 [2025-02-19 02:52:41,361][24317] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8007680. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-02-19 02:52:42,525][24714] Decorrelating experience for 64 frames... [2025-02-19 02:52:42,528][24712] Decorrelating experience for 64 frames... [2025-02-19 02:52:42,773][24711] Decorrelating experience for 0 frames... [2025-02-19 02:52:42,779][24715] Decorrelating experience for 0 frames... [2025-02-19 02:52:43,968][24710] Decorrelating experience for 64 frames... [2025-02-19 02:52:44,058][24712] Decorrelating experience for 96 frames... [2025-02-19 02:52:44,068][24714] Decorrelating experience for 96 frames... [2025-02-19 02:52:44,320][24317] Heartbeat connected on RolloutWorker_w3 [2025-02-19 02:52:44,327][24317] Heartbeat connected on RolloutWorker_w5 [2025-02-19 02:52:44,394][24716] Decorrelating experience for 64 frames... [2025-02-19 02:52:44,872][24715] Decorrelating experience for 32 frames... [2025-02-19 02:52:44,874][24711] Decorrelating experience for 32 frames... [2025-02-19 02:52:44,884][24713] Decorrelating experience for 0 frames... [2025-02-19 02:52:46,361][24317] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 93.6. Samples: 468. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-02-19 02:52:46,362][24317] Avg episode reward: [(0, '4.104')] [2025-02-19 02:52:47,542][24710] Decorrelating experience for 96 frames... [2025-02-19 02:52:48,399][24317] Heartbeat connected on RolloutWorker_w1 [2025-02-19 02:52:49,448][24695] Stopping Batcher_0... [2025-02-19 02:52:49,450][24695] Loop batcher_evt_loop terminating... [2025-02-19 02:52:49,450][24317] Component Batcher_0 stopped! [2025-02-19 02:52:49,463][24695] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001956_8011776.pth... [2025-02-19 02:52:49,608][24708] Weights refcount: 2 0 [2025-02-19 02:52:49,622][24317] Component InferenceWorker_p0-w0 stopped! [2025-02-19 02:52:49,622][24708] Stopping InferenceWorker_p0-w0... [2025-02-19 02:52:49,628][24708] Loop inference_proc0-0_evt_loop terminating... [2025-02-19 02:52:49,797][24695] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001921_7868416.pth [2025-02-19 02:52:49,840][24695] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001956_8011776.pth... [2025-02-19 02:52:50,312][24695] Stopping LearnerWorker_p0... [2025-02-19 02:52:50,314][24695] Loop learner_proc0_evt_loop terminating... [2025-02-19 02:52:50,312][24317] Component LearnerWorker_p0 stopped! [2025-02-19 02:52:50,418][24714] Stopping RolloutWorker_w5... [2025-02-19 02:52:50,418][24317] Component RolloutWorker_w5 stopped! [2025-02-19 02:52:50,419][24714] Loop rollout_proc5_evt_loop terminating... [2025-02-19 02:52:50,484][24317] Component RolloutWorker_w0 stopped! [2025-02-19 02:52:50,485][24709] Stopping RolloutWorker_w0... [2025-02-19 02:52:50,488][24709] Loop rollout_proc0_evt_loop terminating... [2025-02-19 02:52:50,531][24712] Stopping RolloutWorker_w3... [2025-02-19 02:52:50,532][24712] Loop rollout_proc3_evt_loop terminating... [2025-02-19 02:52:50,531][24317] Component RolloutWorker_w3 stopped! 
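(Editor's note: observe how quickly this resumed session shuts down. It loaded env_steps=8007680, which already exceeds the train_for_env_steps=8000000 budget in the experiment's saved configuration (dumped when the next restart begins, at 03:02 below), so the learner stops after a single 4096-frame batch, saving checkpoint_000001956_8011776 while several rollout workers are still decorrelating. The termination check amounts to a one-line condition, sketched here rather than quoted from Sample Factory:)

```python
# Why the resumed run stops almost immediately: the env-step budget is already met.
def should_stop(env_steps: int, train_for_env_steps: int = 8_000_000) -> bool:
    return env_steps >= train_for_env_steps

assert should_stop(8_007_680)  # resumed past the budget -> stop after one batch
```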
[2025-02-19 02:52:50,547][24317] Component RolloutWorker_w1 stopped! [2025-02-19 02:52:50,550][24710] Stopping RolloutWorker_w1... [2025-02-19 02:52:50,574][24710] Loop rollout_proc1_evt_loop terminating... [2025-02-19 02:52:50,774][24713] Decorrelating experience for 32 frames... [2025-02-19 02:52:53,079][24711] Decorrelating experience for 64 frames... [2025-02-19 02:52:53,076][24715] Decorrelating experience for 64 frames... [2025-02-19 02:52:53,647][24716] Decorrelating experience for 96 frames... [2025-02-19 02:52:54,019][24317] Component RolloutWorker_w7 stopped! [2025-02-19 02:52:54,020][24716] Stopping RolloutWorker_w7... [2025-02-19 02:52:54,021][24716] Loop rollout_proc7_evt_loop terminating... [2025-02-19 02:52:54,328][24713] Decorrelating experience for 64 frames... [2025-02-19 02:52:54,446][24715] Decorrelating experience for 96 frames... [2025-02-19 02:52:54,798][24317] Component RolloutWorker_w6 stopped! [2025-02-19 02:52:54,804][24715] Stopping RolloutWorker_w6... [2025-02-19 02:52:54,805][24715] Loop rollout_proc6_evt_loop terminating... [2025-02-19 02:52:55,721][24711] Decorrelating experience for 96 frames... [2025-02-19 02:52:55,803][24713] Decorrelating experience for 96 frames... [2025-02-19 02:52:55,942][24711] Stopping RolloutWorker_w2... [2025-02-19 02:52:55,943][24711] Loop rollout_proc2_evt_loop terminating... [2025-02-19 02:52:55,942][24317] Component RolloutWorker_w2 stopped! [2025-02-19 02:52:56,034][24713] Stopping RolloutWorker_w4... [2025-02-19 02:52:56,034][24713] Loop rollout_proc4_evt_loop terminating... [2025-02-19 02:52:56,034][24317] Component RolloutWorker_w4 stopped! [2025-02-19 02:52:56,035][24317] Waiting for process learner_proc0 to stop... [2025-02-19 02:52:56,036][24317] Waiting for process inference_proc0-0 to join... [2025-02-19 02:52:56,037][24317] Waiting for process rollout_proc0 to join... [2025-02-19 02:52:56,038][24317] Waiting for process rollout_proc1 to join... [2025-02-19 02:52:56,039][24317] Waiting for process rollout_proc2 to join... [2025-02-19 02:52:57,061][24317] Waiting for process rollout_proc3 to join... [2025-02-19 02:52:57,062][24317] Waiting for process rollout_proc4 to join... [2025-02-19 02:52:57,081][24317] Waiting for process rollout_proc5 to join... [2025-02-19 02:52:57,082][24317] Waiting for process rollout_proc6 to join... [2025-02-19 02:52:57,083][24317] Waiting for process rollout_proc7 to join... 
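(Editor's note: the shutdown above is ordered: each component's event loop is stopped first (Batcher, InferenceWorker, Learner, then the rollout workers), and only then does the runner join the child processes one by one. The same pattern in plain multiprocessing terms, an illustrative sketch rather than Sample Factory's event-loop machinery:)

```python
# Illustrative stop-then-join shutdown, mirroring the "Stopping ..." and
# "Waiting for process ... to join..." lines above.
import multiprocessing as mp
import time

def worker_loop(stop_event):
    while not stop_event.is_set():   # stand-in for the worker's event loop
        time.sleep(0.01)

if __name__ == "__main__":
    stop = mp.Event()
    procs = [mp.Process(target=worker_loop, args=(stop,)) for _ in range(8)]
    for p in procs:
        p.start()
    stop.set()                       # "Stopping RolloutWorker_w..."
    for i, p in enumerate(procs):
        print(f"Waiting for process rollout_proc{i} to join...")
        p.join()
```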
[2025-02-19 02:52:57,084][24317] Batcher 0 profile tree view:
batching: 0.0421, releasing_batches: 0.0000
[2025-02-19 02:52:57,085][24317] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 8.7231
update_model: 0.0209
  weight_update: 0.0013
one_step: 0.0024
  handle_policy_step: 3.0143
    deserialize: 0.0473, stack: 0.0082, obs_to_device_normalize: 0.5846, forward: 1.7970, send_messages: 0.0560
    prepare_outputs: 0.4054
      to_cpu: 0.3033
[2025-02-19 02:52:57,086][24317] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 1.1619
train: 3.2530
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0003, kl_divergence: 0.0491, after_optimizer: 0.1808
  calculate_losses: 1.1032
    losses_init: 0.0000, forward_head: 0.4122, bptt_initial: 0.4741, tail: 0.1010, advantages_returns: 0.0010, losses: 0.1081
    bptt: 0.0066
      bptt_forward_core: 0.0065
  update: 1.9188
    clip: 0.1257
[2025-02-19 02:52:57,088][24317] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0014, enqueue_policy_requests: 2.1785, env_step: 4.4155, overhead: 0.1251, complete_rollouts: 0.0500
save_policy_outputs: 0.1072
  split_output_tensors: 0.0379
[2025-02-19 02:52:57,089][24317] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0028
[2025-02-19 02:52:57,090][24317] Loop Runner_EvtLoop terminating...
[2025-02-19 02:52:57,091][24317] Runner profile tree view:
main_loop: 37.0944
[2025-02-19 02:52:57,092][24317] Collected {0: 8011776}, FPS: 110.4
[2025-02-19 03:02:06,573][24317] Environment doom_basic already registered, overwriting...
[2025-02-19 03:02:06,574][24317] Environment doom_two_colors_easy already registered, overwriting...
[2025-02-19 03:02:06,575][24317] Environment doom_two_colors_hard already registered, overwriting...
[2025-02-19 03:02:06,576][24317] Environment doom_dm already registered, overwriting...
[2025-02-19 03:02:06,576][24317] Environment doom_dwango5 already registered, overwriting...
[2025-02-19 03:02:06,577][24317] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-02-19 03:02:06,578][24317] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-02-19 03:02:06,578][24317] Environment doom_my_way_home already registered, overwriting...
[2025-02-19 03:02:06,582][24317] Environment doom_deadly_corridor already registered, overwriting...
[2025-02-19 03:02:06,583][24317] Environment doom_defend_the_center already registered, overwriting...
[2025-02-19 03:02:06,584][24317] Environment doom_defend_the_line already registered, overwriting...
[2025-02-19 03:02:06,585][24317] Environment doom_health_gathering already registered, overwriting...
[2025-02-19 03:02:06,586][24317] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-02-19 03:02:06,587][24317] Environment doom_battle already registered, overwriting...
[2025-02-19 03:02:06,588][24317] Environment doom_battle2 already registered, overwriting...
[2025-02-19 03:02:06,589][24317] Environment doom_duel_bots already registered, overwriting...
[2025-02-19 03:02:06,592][24317] Environment doom_deathmatch_bots already registered, overwriting...
[2025-02-19 03:02:06,593][24317] Environment doom_duel already registered, overwriting...
[2025-02-19 03:02:06,594][24317] Environment doom_deathmatch_full already registered, overwriting...
[2025-02-19 03:02:06,595][24317] Environment doom_benchmark already registered, overwriting...
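(Editor's note: the session summary above is internally consistent. This short resume collected 8011776 - 8007680 = 4096 env frames, exactly one policy update, and 4096 frames over the 37.0944 s main loop yields the reported 110.4 FPS. In other words, the closing FPS figure is frames collected this session divided by main_loop wall time:)

```python
# Reproducing the reported session FPS from the summary above.
frames_this_session = 8_011_776 - 8_007_680  # env_steps at exit minus at resume
main_loop_seconds = 37.0944                  # Runner profile tree: main_loop
print(f"FPS: {frames_this_session / main_loop_seconds:.1f}")  # -> FPS: 110.4
```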
[2025-02-19 03:02:06,595][24317] register_encoder_factory:
[2025-02-19 03:02:06,613][24317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-19 03:02:06,623][24317] Experiment dir /content/train_dir/default_experiment already exists!
[2025-02-19 03:02:06,624][24317] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-02-19 03:02:06,625][24317] Weights and Biases integration disabled
[2025-02-19 03:02:06,630][24317] Environment var CUDA_VISIBLE_DEVICES is 0
[2025-02-19 03:02:09,080][24317] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=8000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-02-19 03:02:09,081][24317] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-19 03:02:09,084][24317] Rollout worker 0 uses device cpu
[2025-02-19 03:02:09,085][24317] Rollout worker 1 uses device cpu
[2025-02-19 03:02:09,086][24317] Rollout worker 2 uses device cpu
[2025-02-19 03:02:09,087][24317] Rollout worker 3 uses device cpu
[2025-02-19 03:02:09,088][24317] Rollout worker 4 uses device cpu
[2025-02-19 03:02:09,089][24317] Rollout worker 5 uses device cpu
[2025-02-19 03:02:09,090][24317] Rollout worker 6 uses device cpu
[2025-02-19 03:02:09,090][24317] Rollout worker 7 uses device cpu
[2025-02-19 03:02:09,192][24317] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 03:02:09,193][24317] InferenceWorker_p0-w0: min num requests: 2
[2025-02-19 03:02:09,226][24317] Starting all processes...
[2025-02-19 03:02:09,226][24317] Starting process learner_proc0
[2025-02-19 03:02:09,277][24317] Starting all processes...
[2025-02-19 03:02:09,280][24317] Starting process inference_proc0-0
[2025-02-19 03:02:09,280][24317] Starting process rollout_proc0
[2025-02-19 03:02:09,281][24317] Starting process rollout_proc1
[2025-02-19 03:02:09,281][24317] Starting process rollout_proc2
[2025-02-19 03:02:09,281][24317] Starting process rollout_proc3
[2025-02-19 03:02:09,281][24317] Starting process rollout_proc4
[2025-02-19 03:02:09,281][24317] Starting process rollout_proc5
[2025-02-19 03:02:09,281][24317] Starting process rollout_proc6
[2025-02-19 03:02:09,281][24317] Starting process rollout_proc7
[2025-02-19 03:02:24,311][27448] Worker 6 uses CPU cores [0]
[2025-02-19 03:02:24,520][27444] Worker 3 uses CPU cores [1]
[2025-02-19 03:02:24,573][27428] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 03:02:24,574][27428] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-19 03:02:24,617][27446] Worker 4 uses CPU cores [0]
[2025-02-19 03:02:24,630][27441] Worker 0 uses CPU cores [0]
[2025-02-19 03:02:24,631][27428] Num visible devices: 1
[2025-02-19 03:02:24,657][27445] Worker 5 uses CPU cores [1]
[2025-02-19 03:02:24,662][27428] Starting seed is not provided
[2025-02-19 03:02:24,662][27428] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 03:02:24,663][27428] Initializing actor-critic model on device cuda:0
[2025-02-19 03:02:24,663][27428] RunningMeanStd input shape: (3, 72, 128)
[2025-02-19 03:02:24,664][27428] RunningMeanStd input shape: (1,)
[2025-02-19 03:02:24,704][27428] ConvEncoder: input_channels=3
[2025-02-19 03:02:24,775][27449] Worker 7 uses CPU cores [1]
[2025-02-19 03:02:24,814][27442] Worker 1 uses CPU cores [1]
[2025-02-19 03:02:24,836][27447] Worker 2 uses CPU cores [0]
[2025-02-19 03:02:24,902][27443] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 03:02:24,902][27443] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-19 03:02:24,921][27443] Num visible devices: 1
[2025-02-19 03:02:24,928][27428] Conv encoder output size: 512
[2025-02-19 03:02:24,929][27428] Policy head output size: 512
[2025-02-19 03:02:24,946][27428] Created Actor Critic model with architecture:
[2025-02-19 03:02:24,946][27428] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-02-19 03:02:25,208][27428] Using optimizer
[2025-02-19 03:02:26,251][27428] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001956_8011776.pth...
[2025-02-19 03:02:26,288][27428] Loading model from checkpoint
[2025-02-19 03:02:26,290][27428] Loaded experiment state at self.train_step=1956, self.env_steps=8011776
[2025-02-19 03:02:26,290][27428] Initialized policy 0 weights for model version 1956
[2025-02-19 03:02:26,293][27428] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-19 03:02:26,302][27428] LearnerWorker_p0 finished initialization!
[2025-02-19 03:02:26,485][27443] RunningMeanStd input shape: (3, 72, 128)
[2025-02-19 03:02:26,486][27443] RunningMeanStd input shape: (1,)
[2025-02-19 03:02:26,498][27443] ConvEncoder: input_channels=3
[2025-02-19 03:02:26,598][27443] Conv encoder output size: 512
[2025-02-19 03:02:26,598][27443] Policy head output size: 512
[2025-02-19 03:02:26,631][24317] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8011776. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-19 03:02:26,635][24317] Inference worker 0-0 is ready!
[2025-02-19 03:02:26,636][24317] All inference workers are ready! Signal rollout workers to start!
[2025-02-19 03:02:26,844][27448] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:02:26,841][27449] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:02:26,847][27446] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:02:26,842][27442] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:02:26,847][27441] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:02:26,850][27447] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:02:26,844][27445] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:02:26,839][27444] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:02:27,455][27442] Decorrelating experience for 0 frames...
[2025-02-19 03:02:27,816][27448] Decorrelating experience for 0 frames...
[2025-02-19 03:02:27,818][27446] Decorrelating experience for 0 frames...
[2025-02-19 03:02:27,821][27442] Decorrelating experience for 32 frames...
[2025-02-19 03:02:28,327][27444] Decorrelating experience for 0 frames...
[2025-02-19 03:02:28,705][27444] Decorrelating experience for 32 frames...
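(Editor's note: the ActorCriticSharedWeights printout above maps onto a small PyTorch model: a three-conv ELU encoder feeding a 512-unit linear layer, a GRU(512, 512) core, and two heads, critic 512 to 1 and action logits 512 to 5 for this Doom action space. A stand-alone sketch with the same shape follows; the conv channel/kernel/stride choices are assumptions, since the log prints only the layer types:)

```python
# Plain-PyTorch sketch of the model printed above: conv encoder -> GRU core
# -> value and action-logit heads. Conv sizes are assumptions; the heads and
# the (3, 72, 128) observation shape match the log.
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened conv output size from a dummy obs
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)                      # GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)               # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs)).unsqueeze(0)   # (T=1, B, 512)
        core_out, rnn_state = self.core(x, rnn_state)
        core_out = core_out.squeeze(0)
        return self.distribution_linear(core_out), self.critic_linear(core_out), rnn_state

logits, value, _ = ActorCriticSketch()(torch.zeros(4, 3, 72, 128))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```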
[2025-02-19 03:02:29,184][24317] Heartbeat connected on Batcher_0
[2025-02-19 03:02:29,188][24317] Heartbeat connected on LearnerWorker_p0
[2025-02-19 03:02:29,192][27445] Decorrelating experience for 0 frames...
[2025-02-19 03:02:29,240][24317] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-19 03:02:29,305][27446] Decorrelating experience for 32 frames...
[2025-02-19 03:02:29,306][27448] Decorrelating experience for 32 frames...
[2025-02-19 03:02:29,328][27441] Decorrelating experience for 0 frames...
[2025-02-19 03:02:29,348][27447] Decorrelating experience for 0 frames...
[2025-02-19 03:02:29,928][27447] Decorrelating experience for 32 frames...
[2025-02-19 03:02:30,359][27445] Decorrelating experience for 32 frames...
[2025-02-19 03:02:30,404][27449] Decorrelating experience for 0 frames...
[2025-02-19 03:02:30,605][27442] Decorrelating experience for 64 frames...
[2025-02-19 03:02:30,620][27444] Decorrelating experience for 64 frames...
[2025-02-19 03:02:31,131][27448] Decorrelating experience for 64 frames...
[2025-02-19 03:02:31,631][24317] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8011776. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-19 03:02:31,713][27447] Decorrelating experience for 64 frames...
[2025-02-19 03:02:31,809][27449] Decorrelating experience for 32 frames...
[2025-02-19 03:02:32,340][27448] Decorrelating experience for 96 frames...
[2025-02-19 03:02:32,564][24317] Heartbeat connected on RolloutWorker_w6
[2025-02-19 03:02:32,650][27442] Decorrelating experience for 96 frames...
[2025-02-19 03:02:33,178][24317] Heartbeat connected on RolloutWorker_w1
[2025-02-19 03:02:33,385][27446] Decorrelating experience for 64 frames...
[2025-02-19 03:02:34,217][27445] Decorrelating experience for 64 frames...
[2025-02-19 03:02:34,946][27441] Decorrelating experience for 32 frames...
[2025-02-19 03:02:34,948][27449] Decorrelating experience for 64 frames...
[2025-02-19 03:02:35,730][27446] Decorrelating experience for 96 frames...
[2025-02-19 03:02:36,065][24317] Heartbeat connected on RolloutWorker_w4
[2025-02-19 03:02:36,594][27444] Decorrelating experience for 96 frames...
[2025-02-19 03:02:36,632][24317] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8011776. Throughput: 0: 88.0. Samples: 880. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-19 03:02:36,635][24317] Avg episode reward: [(0, '9.070')]
[2025-02-19 03:02:36,815][27445] Decorrelating experience for 96 frames...
[2025-02-19 03:02:36,937][24317] Heartbeat connected on RolloutWorker_w3
[2025-02-19 03:02:37,247][24317] Heartbeat connected on RolloutWorker_w5
[2025-02-19 03:02:37,698][27449] Decorrelating experience for 96 frames...
[2025-02-19 03:02:37,987][24317] Heartbeat connected on RolloutWorker_w7
[2025-02-19 03:02:38,784][27428] Signal inference workers to stop experience collection...
[2025-02-19 03:02:38,804][27443] InferenceWorker_p0-w0: stopping experience collection
[2025-02-19 03:02:38,823][27441] Decorrelating experience for 64 frames...
[2025-02-19 03:02:39,030][27447] Decorrelating experience for 96 frames...
[2025-02-19 03:02:39,153][24317] Heartbeat connected on RolloutWorker_w2
[2025-02-19 03:02:39,227][27428] Signal inference workers to resume experience collection...
[2025-02-19 03:02:39,232][27428] Stopping Batcher_0...
[2025-02-19 03:02:39,233][24317] Component Batcher_0 stopped!
[2025-02-19 03:02:39,233][27428] Loop batcher_evt_loop terminating...
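The "Decorrelating experience for N frames" lines above show each rollout worker warming up its environments by a staggered number of frames (0, 32, 64, 96) before regular collection, so the parallel environments do not march through episodes in lockstep. A hedged illustration of the idea only, not Sample Factory's actual implementation:

```python
# Sketch: stagger each env by a different number of random-action warm-up
# steps so parallel workers don't hit episode boundaries simultaneously.
import gymnasium as gym

def decorrelate(envs, frames_per_slot=32):
    for i, env in enumerate(envs):
        env.reset()
        for _ in range(i * frames_per_slot):  # 0, 32, 64, 96, ... as in the log
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()

envs = [gym.make("CartPole-v1") for _ in range(4)]  # stand-in for the Doom envs
decorrelate(envs)
```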
[2025-02-19 03:02:39,272][27443] Weights refcount: 2 0
[2025-02-19 03:02:39,278][24317] Component InferenceWorker_p0-w0 stopped!
[2025-02-19 03:02:39,282][27443] Stopping InferenceWorker_p0-w0...
[2025-02-19 03:02:39,283][27443] Loop inference_proc0-0_evt_loop terminating...
[2025-02-19 03:02:39,572][24317] Component RolloutWorker_w4 stopped!
[2025-02-19 03:02:39,577][27446] Stopping RolloutWorker_w4...
[2025-02-19 03:02:39,585][24317] Component RolloutWorker_w2 stopped!
[2025-02-19 03:02:39,588][27447] Stopping RolloutWorker_w2...
[2025-02-19 03:02:39,597][24317] Component RolloutWorker_w6 stopped!
[2025-02-19 03:02:39,600][27448] Stopping RolloutWorker_w6...
[2025-02-19 03:02:39,581][27446] Loop rollout_proc4_evt_loop terminating...
[2025-02-19 03:02:39,589][27447] Loop rollout_proc2_evt_loop terminating...
[2025-02-19 03:02:39,601][27448] Loop rollout_proc6_evt_loop terminating...
[2025-02-19 03:02:39,649][27445] Stopping RolloutWorker_w5...
[2025-02-19 03:02:39,652][27445] Loop rollout_proc5_evt_loop terminating...
[2025-02-19 03:02:39,649][24317] Component RolloutWorker_w5 stopped!
[2025-02-19 03:02:39,676][24317] Component RolloutWorker_w7 stopped!
[2025-02-19 03:02:39,677][24317] Component RolloutWorker_w3 stopped!
[2025-02-19 03:02:39,676][27444] Stopping RolloutWorker_w3...
[2025-02-19 03:02:39,677][27449] Stopping RolloutWorker_w7...
[2025-02-19 03:02:39,679][27444] Loop rollout_proc3_evt_loop terminating...
[2025-02-19 03:02:39,681][27449] Loop rollout_proc7_evt_loop terminating...
[2025-02-19 03:02:39,688][27428] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001958_8019968.pth...
[2025-02-19 03:02:39,706][24317] Component RolloutWorker_w1 stopped!
[2025-02-19 03:02:39,707][27442] Stopping RolloutWorker_w1...
[2025-02-19 03:02:39,708][27442] Loop rollout_proc1_evt_loop terminating...
[2025-02-19 03:02:39,801][27428] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth
[2025-02-19 03:02:39,816][27428] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001958_8019968.pth...
[2025-02-19 03:02:39,988][24317] Component LearnerWorker_p0 stopped!
[2025-02-19 03:02:39,989][27428] Stopping LearnerWorker_p0...
[2025-02-19 03:02:39,989][27428] Loop learner_proc0_evt_loop terminating...
[2025-02-19 03:02:41,187][27441] Decorrelating experience for 96 frames...
[2025-02-19 03:02:41,628][24317] Component RolloutWorker_w0 stopped!
[2025-02-19 03:02:41,630][24317] Waiting for process learner_proc0 to stop...
[2025-02-19 03:02:41,631][24317] Waiting for process inference_proc0-0 to join...
[2025-02-19 03:02:41,632][24317] Waiting for process rollout_proc0 to join...
[2025-02-19 03:02:41,628][27441] Stopping RolloutWorker_w0...
[2025-02-19 03:02:41,640][27441] Loop rollout_proc0_evt_loop terminating...
[2025-02-19 03:02:43,460][24317] Waiting for process rollout_proc1 to join...
[2025-02-19 03:02:43,532][24317] Waiting for process rollout_proc2 to join...
[2025-02-19 03:02:43,533][24317] Waiting for process rollout_proc3 to join...
[2025-02-19 03:02:43,534][24317] Waiting for process rollout_proc4 to join...
[2025-02-19 03:02:43,535][24317] Waiting for process rollout_proc5 to join...
[2025-02-19 03:02:43,537][24317] Waiting for process rollout_proc6 to join...
[2025-02-19 03:02:43,538][24317] Waiting for process rollout_proc7 to join...
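The Saving/Removing pair above shows the learner's checkpoint rotation: files are named checkpoint_<train_step>_<env_steps>.pth (here train_step 1958 at 8,019,968 env steps) and an older one is pruned. A small helper, assuming only that naming convention, to locate the newest checkpoint on disk:

```python
# Find the newest checkpoint under the naming scheme seen in the log,
# e.g. checkpoint_000001958_8019968.pth -> (train_step=1958, env_steps=8019968).
import re
from pathlib import Path

def latest_checkpoint(ckpt_dir: str) -> Path:
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    ckpts = [
        (int(m.group(1)), int(m.group(2)), p)
        for p in Path(ckpt_dir).glob("checkpoint_*.pth")
        if (m := pattern.search(p.name))
    ]
    if not ckpts:
        raise FileNotFoundError(f"no checkpoints in {ckpt_dir}")
    return max(ckpts)[2]  # sort by train_step, then env_steps

print(latest_checkpoint("/content/train_dir/default_experiment/checkpoint_p0"))
```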
[2025-02-19 03:02:43,539][24317] Batcher 0 profile tree view:
batching: 0.1821, releasing_batches: 0.0043
[2025-02-19 03:02:43,540][24317] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0193
wait_policy: 0.0000
  wait_policy_total: 8.8782
one_step: 0.0078
  handle_policy_step: 3.0920
    deserialize: 0.0570, stack: 0.0083, obs_to_device_normalize: 0.5945, forward: 1.9746, send_messages: 0.0546
    prepare_outputs: 0.3439
      to_cpu: 0.2602
[2025-02-19 03:02:43,541][24317] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 1.4851
train: 2.2175
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0231, after_optimizer: 0.0798
  calculate_losses: 0.6490
    losses_init: 0.0000, forward_head: 0.3752, bptt_initial: 0.1802, tail: 0.0430, advantages_returns: 0.0012, losses: 0.0457
    bptt: 0.0033
      bptt_forward_core: 0.0032
  update: 1.4639
    clip: 0.0510
[2025-02-19 03:02:43,542][24317] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0007
[2025-02-19 03:02:43,543][24317] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0692, env_step: 0.6902, overhead: 0.0039, complete_rollouts: 0.0000
save_policy_outputs: 0.0228
  split_output_tensors: 0.0020
[2025-02-19 03:02:43,544][24317] Loop Runner_EvtLoop terminating...
[2025-02-19 03:02:43,545][24317] Runner profile tree view:
main_loop: 34.3200
[2025-02-19 03:02:43,546][24317] Collected {0: 8019968}, FPS: 238.7
[2025-02-19 03:03:19,736][24317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-19 03:03:19,737][24317] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-19 03:03:19,738][24317] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-19 03:03:19,739][24317] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-19 03:03:19,740][24317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-19 03:03:19,740][24317] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-19 03:03:19,741][24317] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-19 03:03:19,743][24317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-19 03:03:19,744][24317] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-19 03:03:19,745][24317] Adding new argument 'hf_repository'='kate0711/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-19 03:03:19,746][24317] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-19 03:03:19,746][24317] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-19 03:03:19,747][24317] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-19 03:03:19,751][24317] Adding new argument 'enjoy_script'=None that is not in the saved config file!
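The training summary above is internally consistent: the run resumed at 8,011,776 env steps and stopped at 8,019,968, i.e. 8,192 new frames over the 34.32 s main loop, which matches the reported 238.7 FPS. The config overrides that follow belong to the evaluation/upload step. A hedged sketch of how such a run is typically launched: the flags mirror the overrides logged above, while the sf_examples.vizdoom.enjoy_vizdoom module path and the env name are assumptions based on Sample Factory 2.x's examples layout.

```python
# Sketch of re-running the evaluation + Hub upload step; flag values are taken
# from the "Overriding arg"/"Adding new argument" lines in the log above.
import subprocess

subprocess.run(
    [
        "python", "-m", "sf_examples.vizdoom.enjoy_vizdoom",  # assumed entry point
        "--env=doom_health_gathering_supreme",                # assumed env name
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1",      # overridden from the saved config
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--push_to_hub",
        "--hf_repository=kate0711/rl_course_vizdoom_health_gathering_supreme",
    ],
    check=True,
)
```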
[2025-02-19 03:03:19,752][24317] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-19 03:03:19,793][24317] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-19 03:03:19,798][24317] RunningMeanStd input shape: (3, 72, 128)
[2025-02-19 03:03:19,800][24317] RunningMeanStd input shape: (1,)
[2025-02-19 03:03:19,820][24317] ConvEncoder: input_channels=3
[2025-02-19 03:03:19,977][24317] Conv encoder output size: 512
[2025-02-19 03:03:19,978][24317] Policy head output size: 512
[2025-02-19 03:03:20,298][24317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001958_8019968.pth...
[2025-02-19 03:03:21,083][24317] Num frames 100...
[2025-02-19 03:03:21,209][24317] Num frames 200...
[2025-02-19 03:03:21,333][24317] Num frames 300...
[2025-02-19 03:03:21,464][24317] Num frames 400...
[2025-02-19 03:03:21,593][24317] Num frames 500...
[2025-02-19 03:03:21,751][24317] Avg episode rewards: #0: 9.760, true rewards: #0: 5.760
[2025-02-19 03:03:21,753][24317] Avg episode reward: 9.760, avg true_objective: 5.760
[2025-02-19 03:03:21,784][24317] Num frames 600...
[2025-02-19 03:03:21,912][24317] Num frames 700...
[2025-02-19 03:03:22,049][24317] Num frames 800...
[2025-02-19 03:03:22,178][24317] Num frames 900...
[2025-02-19 03:03:22,306][24317] Num frames 1000...
[2025-02-19 03:03:22,435][24317] Num frames 1100...
[2025-02-19 03:03:22,566][24317] Num frames 1200...
[2025-02-19 03:03:22,704][24317] Num frames 1300...
[2025-02-19 03:03:22,836][24317] Num frames 1400...
[2025-02-19 03:03:22,964][24317] Num frames 1500...
[2025-02-19 03:03:23,098][24317] Num frames 1600...
[2025-02-19 03:03:23,229][24317] Num frames 1700...
[2025-02-19 03:03:23,405][24317] Avg episode rewards: #0: 18.460, true rewards: #0: 8.960
[2025-02-19 03:03:23,406][24317] Avg episode reward: 18.460, avg true_objective: 8.960
[2025-02-19 03:03:23,421][24317] Num frames 1800...
[2025-02-19 03:03:23,549][24317] Num frames 1900...
[2025-02-19 03:03:23,677][24317] Num frames 2000...
[2025-02-19 03:03:23,811][24317] Num frames 2100...
[2025-02-19 03:03:23,938][24317] Num frames 2200...
[2025-02-19 03:03:24,070][24317] Num frames 2300...
[2025-02-19 03:03:24,199][24317] Num frames 2400...
[2025-02-19 03:03:24,329][24317] Num frames 2500...
[2025-02-19 03:03:24,461][24317] Num frames 2600...
[2025-02-19 03:03:24,592][24317] Num frames 2700...
[2025-02-19 03:03:24,723][24317] Num frames 2800...
[2025-02-19 03:03:24,864][24317] Num frames 2900...
[2025-02-19 03:03:24,992][24317] Num frames 3000...
[2025-02-19 03:03:25,127][24317] Num frames 3100...
[2025-02-19 03:03:25,258][24317] Num frames 3200...
[2025-02-19 03:03:25,389][24317] Num frames 3300...
[2025-02-19 03:03:25,485][24317] Avg episode rewards: #0: 25.763, true rewards: #0: 11.097
[2025-02-19 03:03:25,486][24317] Avg episode reward: 25.763, avg true_objective: 11.097
[2025-02-19 03:03:25,577][24317] Num frames 3400...
[2025-02-19 03:03:25,709][24317] Num frames 3500...
[2025-02-19 03:03:25,845][24317] Num frames 3600...
[2025-02-19 03:03:25,972][24317] Num frames 3700...
[2025-02-19 03:03:26,113][24317] Num frames 3800...
[2025-02-19 03:03:26,241][24317] Num frames 3900...
[2025-02-19 03:03:26,376][24317] Num frames 4000...
[2025-02-19 03:03:26,512][24317] Num frames 4100...
[2025-02-19 03:03:26,664][24317] Num frames 4200...
[2025-02-19 03:03:26,805][24317] Num frames 4300...
[2025-02-19 03:03:26,934][24317] Num frames 4400...
[2025-02-19 03:03:27,070][24317] Num frames 4500...
[2025-02-19 03:03:27,199][24317] Num frames 4600...
[2025-02-19 03:03:27,347][24317] Avg episode rewards: #0: 26.183, true rewards: #0: 11.682
[2025-02-19 03:03:27,348][24317] Avg episode reward: 26.183, avg true_objective: 11.682
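The "Avg episode rewards" lines report a running mean over the episodes completed so far, so individual episode scores can be recovered from consecutive averages. Values below are copied from the log above:

```python
# Recover per-episode rewards from the running averages logged after each episode.
running_avg = [9.760, 18.460, 25.763, 26.183]  # after episodes 1..4
per_episode = [
    (k + 1) * avg - k * prev
    for k, (prev, avg) in enumerate(zip([0.0] + running_avg, running_avg))
]
print(per_episode)  # -> [9.76, 27.16, 40.369, 27.443] (up to float rounding)
```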
[2025-02-19 03:03:27,383][24317] Num frames 4700...
[2025-02-19 03:03:27,514][24317] Num frames 4800...
[2025-02-19 03:03:27,644][24317] Num frames 4900...
[2025-02-19 03:03:27,774][24317] Num frames 5000...
[2025-02-19 03:03:27,910][24317] Num frames 5100...
[2025-02-19 03:03:28,043][24317] Num frames 5200...
[2025-02-19 03:03:28,205][24317] Avg episode rewards: #0: 23.362, true rewards: #0: 10.562
[2025-02-19 03:03:28,206][24317] Avg episode reward: 23.362, avg true_objective: 10.562
[2025-02-19 03:03:28,233][24317] Num frames 5300...
[2025-02-19 03:03:28,360][24317] Num frames 5400...
[2025-02-19 03:03:28,489][24317] Num frames 5500...
[2025-02-19 03:03:28,622][24317] Num frames 5600...
[2025-02-19 03:03:28,759][24317] Num frames 5700...
[2025-02-19 03:03:28,895][24317] Num frames 5800...
[2025-02-19 03:03:29,030][24317] Num frames 5900...
[2025-02-19 03:03:29,159][24317] Num frames 6000...
[2025-02-19 03:03:29,286][24317] Num frames 6100...
[2025-02-19 03:03:29,417][24317] Num frames 6200...
[2025-02-19 03:03:29,548][24317] Num frames 6300...
[2025-02-19 03:03:29,677][24317] Num frames 6400...
[2025-02-19 03:03:29,805][24317] Num frames 6500...
[2025-02-19 03:03:29,941][24317] Num frames 6600...
[2025-02-19 03:03:30,075][24317] Num frames 6700...
[2025-02-19 03:03:30,203][24317] Num frames 6800...
[2025-02-19 03:03:30,322][24317] Avg episode rewards: #0: 24.915, true rewards: #0: 11.415
[2025-02-19 03:03:30,323][24317] Avg episode reward: 24.915, avg true_objective: 11.415
[2025-02-19 03:03:30,393][24317] Num frames 6900...
[2025-02-19 03:03:30,554][24317] Num frames 7000...
[2025-02-19 03:03:30,739][24317] Num frames 7100...
[2025-02-19 03:03:30,920][24317] Num frames 7200...
[2025-02-19 03:03:31,096][24317] Num frames 7300...
[2025-02-19 03:03:31,262][24317] Num frames 7400...
[2025-02-19 03:03:31,430][24317] Num frames 7500...
[2025-02-19 03:03:31,598][24317] Num frames 7600...
[2025-02-19 03:03:31,766][24317] Num frames 7700...
[2025-02-19 03:03:31,833][24317] Avg episode rewards: #0: 23.864, true rewards: #0: 11.007
[2025-02-19 03:03:31,834][24317] Avg episode reward: 23.864, avg true_objective: 11.007
[2025-02-19 03:03:32,002][24317] Num frames 7800...
[2025-02-19 03:03:32,184][24317] Num frames 7900...
[2025-02-19 03:03:32,364][24317] Num frames 8000...
[2025-02-19 03:03:32,544][24317] Num frames 8100...
[2025-02-19 03:03:32,674][24317] Num frames 8200...
[2025-02-19 03:03:32,804][24317] Num frames 8300...
[2025-02-19 03:03:32,920][24317] Avg episode rewards: #0: 22.556, true rewards: #0: 10.431
[2025-02-19 03:03:32,921][24317] Avg episode reward: 22.556, avg true_objective: 10.431
[2025-02-19 03:03:33,000][24317] Num frames 8400...
[2025-02-19 03:03:33,132][24317] Num frames 8500...
[2025-02-19 03:03:33,255][24317] Num frames 8600...
[2025-02-19 03:03:33,383][24317] Num frames 8700...
[2025-02-19 03:03:33,512][24317] Num frames 8800...
[2025-02-19 03:03:33,641][24317] Avg episode rewards: #0: 21.175, true rewards: #0: 9.841
[2025-02-19 03:03:33,642][24317] Avg episode reward: 21.175, avg true_objective: 9.841
[2025-02-19 03:03:33,697][24317] Num frames 8900...
[2025-02-19 03:03:33,827][24317] Num frames 9000...
[2025-02-19 03:03:33,955][24317] Num frames 9100...
[2025-02-19 03:03:34,097][24317] Num frames 9200...
[2025-02-19 03:03:34,222][24317] Num frames 9300...
[2025-02-19 03:03:34,349][24317] Num frames 9400...
[2025-02-19 03:03:34,488][24317] Avg episode rewards: #0: 19.965, true rewards: #0: 9.465
[2025-02-19 03:03:34,489][24317] Avg episode reward: 19.965, avg true_objective: 9.465
[2025-02-19 03:04:26,561][24317] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
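The log only records that replay.mp4 was written. One common way to preview it inline in a Colab/Jupyter notebook (this cell is an assumption, not part of the logged run):

```python
# Embed the saved replay video in a notebook output cell via a base64 data URL.
from base64 import b64encode
from IPython.display import HTML

mp4 = open("/content/train_dir/default_experiment/replay.mp4", "rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width="640" controls><source src="{data_url}" type="video/mp4"></video>')
```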