[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file merges.txt
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file tokenizer.json
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file added_tokens.json
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file special_tokens_map.json
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file tokenizer_config.json
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file chat_template.jinja
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2304 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-02-10 23:10:15] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-10 23:10:15] configuration_utils.py:768 >> Model config Qwen2Config {
  "_name_or_path": "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file vocab.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file merges.txt
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file tokenizer.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file added_tokens.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file special_tokens_map.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file tokenizer_config.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file chat_template.jinja
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2304 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-02-10 23:10:15] logging.py:157 >> Add <|im_end|> to stop words.
[INFO|2025-02-10 23:10:15] logging.py:157 >> Loading dataset graph_planning/graph_planning_train.json...
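The loads recorded above are the stock transformers entry points. A minimal sketch of reproducing them, assuming only the model directory taken from the log (everything else here is illustrative, not part of the run):

import torch
from transformers import AutoConfig, AutoTokenizer

MODEL_DIR = "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct"  # path from the log

# Reads vocab.json, merges.txt, tokenizer.json, chat_template.jinja, ... as logged above.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
# Reads config.json and yields the Qwen2Config printed above.
config = AutoConfig.from_pretrained(MODEL_DIR)

# The "Special tokens have been added" notice concerns tokens such as <|im_end|>
# (id 151645, the eos_token_id above), which the log also registers as a stop word.
print(tokenizer.eos_token, tokenizer.eos_token_id)  # <|im_end|> 151645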
[INFO|2025-02-10 23:10:23] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-10 23:10:23] configuration_utils.py:768 >> Model config Qwen2Config {
  "_name_or_path": "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|2025-02-10 23:10:24] modeling_utils.py:3901 >> loading weights file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/model.safetensors.index.json
[INFO|2025-02-10 23:10:24] modeling_utils.py:1582 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|2025-02-10 23:10:24] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
[INFO|2025-02-10 23:10:27] modeling_utils.py:4888 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|2025-02-10 23:10:27] modeling_utils.py:4896 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|2025-02-10 23:10:27] configuration_utils.py:1093 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/generation_config.json
[INFO|2025-02-10 23:10:27] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
[INFO|2025-02-10 23:10:27] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2025-02-10 23:10:27] logging.py:157 >> Using torch SDPA for faster training and inference.
[INFO|2025-02-10 23:10:27] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2025-02-10 23:10:27] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2025-02-10 23:10:27] logging.py:157 >> Found linear modules: o_proj,q_proj,gate_proj,v_proj,up_proj,k_proj,down_proj
[INFO|2025-02-10 23:10:28] logging.py:157 >> trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
[INFO|2025-02-10 23:10:28] trainer.py:741 >> Using auto half precision backend
[INFO|2025-02-10 23:10:28] trainer.py:2369 >> ***** Running training *****
[INFO|2025-02-10 23:10:28] trainer.py:2370 >> Num examples = 14,500
[INFO|2025-02-10 23:10:28] trainer.py:2371 >> Num Epochs = 3
[INFO|2025-02-10 23:10:28] trainer.py:2372 >> Instantaneous batch size per device = 2
[INFO|2025-02-10 23:10:28] trainer.py:2375 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|2025-02-10 23:10:28] trainer.py:2376 >> Gradient Accumulation steps = 16
[INFO|2025-02-10 23:10:28] trainer.py:2377 >> Total optimization steps = 339
[INFO|2025-02-10 23:10:28] trainer.py:2378 >> Number of trainable parameters = 20,185,088
[INFO|2025-02-10 23:10:56] logging.py:157 >> {'loss': 0.2055, 'learning_rate': 9.9998e-05, 'epoch': 0.01, 'throughput': 7338.97}
[INFO|2025-02-10 23:11:22] logging.py:157 >> {'loss': 0.1902, 'learning_rate': 9.9991e-05, 'epoch': 0.02, 'throughput': 7493.38}
[INFO|2025-02-10 23:11:47] logging.py:157 >> {'loss': 0.1421, 'learning_rate': 9.9981e-05, 'epoch': 0.03, 'throughput': 7812.96}
[INFO|2025-02-10 23:12:15] logging.py:157 >> {'loss': 0.1102, 'learning_rate': 9.9966e-05, 'epoch': 0.04, 'throughput': 7668.61}
[INFO|2025-02-10 23:12:40] logging.py:157 >> {'loss': 0.0801, 'learning_rate': 9.9946e-05, 'epoch': 0.04, 'throughput': 7740.76}
[INFO|2025-02-10 23:13:07] logging.py:157 >> {'loss': 0.0574, 'learning_rate': 9.9923e-05, 'epoch': 0.05, 'throughput': 7743.47}
[INFO|2025-02-10 23:13:31] logging.py:157 >> {'loss': 0.0401, 'learning_rate': 9.9895e-05, 'epoch': 0.06, 'throughput': 7830.06}
[INFO|2025-02-10 23:14:00] logging.py:157 >> {'loss': 0.0295, 'learning_rate': 9.9863e-05, 'epoch': 0.07, 'throughput': 7781.82}
[INFO|2025-02-10 23:14:23] logging.py:157 >> {'loss': 0.0262, 'learning_rate': 9.9826e-05, 'epoch': 0.08, 'throughput': 7819.12}
[INFO|2025-02-10 23:14:51] logging.py:157 >> {'loss': 0.0263, 'learning_rate': 9.9785e-05, 'epoch': 0.09, 'throughput': 7807.27}
[INFO|2025-02-10 23:15:15] logging.py:157 >> {'loss': 0.0220, 'learning_rate': 9.9740e-05, 'epoch': 0.10, 'throughput': 7850.91}
[INFO|2025-02-10 23:15:39] logging.py:157 >> {'loss': 0.0202, 'learning_rate': 9.9691e-05, 'epoch': 0.11, 'throughput': 7850.63}
[INFO|2025-02-10 23:16:08] logging.py:157 >> {'loss': 0.0202, 'learning_rate': 9.9638e-05, 'epoch': 0.11, 'throughput': 7823.47}
[INFO|2025-02-10 23:16:38] logging.py:157 >> {'loss': 0.0178, 'learning_rate': 9.9580e-05, 'epoch': 0.12, 'throughput': 7772.01}
[INFO|2025-02-10 23:17:02] logging.py:157 >> {'loss': 0.0158, 'learning_rate': 9.9518e-05, 'epoch': 0.13, 'throughput': 7802.71}
[INFO|2025-02-10 23:17:29] logging.py:157 >> {'loss': 0.0159, 'learning_rate': 9.9451e-05, 'epoch': 0.14, 'throughput': 7786.47}
[INFO|2025-02-10 23:17:56] logging.py:157 >> {'loss': 0.0147, 'learning_rate': 9.9381e-05, 'epoch': 0.15, 'throughput': 7807.80}
[INFO|2025-02-10 23:18:22] logging.py:157 >> {'loss': 0.0138, 'learning_rate': 9.9306e-05, 'epoch': 0.16, 'throughput': 7814.87}
[INFO|2025-02-10 23:18:46] logging.py:157 >> {'loss': 0.0128, 'learning_rate': 9.9227e-05, 'epoch': 0.17, 'throughput': 7835.96}
[INFO|2025-02-10 23:19:13] logging.py:157 >> {'loss': 0.0124, 'learning_rate': 9.9144e-05, 'epoch': 0.18, 'throughput': 7833.94}
[INFO|2025-02-10 23:19:39] logging.py:157 >> {'loss': 0.0116, 'learning_rate': 9.9056e-05, 'epoch': 0.19, 'throughput': 7845.67}
[INFO|2025-02-10 23:20:05] logging.py:157 >> {'loss': 0.0109, 'learning_rate': 9.8964e-05, 'epoch': 0.19, 'throughput': 7844.79}
[INFO|2025-02-10 23:20:32] logging.py:157 >> {'loss': 0.0106, 'learning_rate': 9.8869e-05, 'epoch': 0.20, 'throughput': 7843.06}
[INFO|2025-02-10 23:20:59] logging.py:157 >> {'loss': 0.0099, 'learning_rate': 9.8768e-05, 'epoch': 0.21, 'throughput': 7837.07}
[INFO|2025-02-10 23:21:24] logging.py:157 >> {'loss': 0.0109, 'learning_rate': 9.8664e-05, 'epoch': 0.22, 'throughput': 7848.70}
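The header numbers above are mutually consistent, although the device count is never logged. A quick check in Python, with the world size of 4 GPUs and the LoRA rank of 8 both inferred from the totals rather than taken from the log:

# Effective batch size: per-device batch x gradient accumulation x number of GPUs.
per_device_bs, grad_accum, total_bs = 2, 16, 128
world_size = total_bs // (per_device_bs * grad_accum)   # -> 4 (inferred, not logged)

# Optimization steps: 14,500 examples / 128 effective batch, floored, times 3 epochs.
steps_per_epoch = 14_500 // total_bs                    # 113
print(steps_per_epoch * 3)                              # 339, matching "Total optimization steps"

# LoRA adds A (r x fan_in) and B (fan_out x r) to each target projection in all 28 layers.
# Qwen2.5-7B: hidden 3584, GQA k/v width 4 heads x 128 = 512, MLP width 18944.
r = 8                                                   # inferred from the total below
per_layer = r * (
    (3584 + 3584) * 2      # q_proj and o_proj: 3584 -> 3584
    + (3584 + 512) * 2     # k_proj and v_proj: 3584 -> 512
    + (3584 + 18944) * 3   # gate_proj, up_proj (3584 -> 18944) and down_proj (18944 -> 3584)
)
print(per_layer * 28)                                   # 20,185,088 = "trainable params"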
[INFO|2025-02-10 23:21:48] logging.py:157 >> {'loss': 0.0084, 'learning_rate': 9.8556e-05, 'epoch': 0.23, 'throughput': 7866.16}
[INFO|2025-02-10 23:22:15] logging.py:157 >> {'loss': 0.0081, 'learning_rate': 9.8443e-05, 'epoch': 0.24, 'throughput': 7874.35}
[INFO|2025-02-10 23:22:41] logging.py:157 >> {'loss': 0.0078, 'learning_rate': 9.8326e-05, 'epoch': 0.25, 'throughput': 7877.49}
[INFO|2025-02-10 23:23:09] logging.py:157 >> {'loss': 0.0081, 'learning_rate': 9.8205e-05, 'epoch': 0.26, 'throughput': 7855.15}
[INFO|2025-02-10 23:23:36] logging.py:157 >> {'loss': 0.0087, 'learning_rate': 9.8080e-05, 'epoch': 0.26, 'throughput': 7862.08}
[INFO|2025-02-10 23:24:02] logging.py:157 >> {'loss': 0.0079, 'learning_rate': 9.7951e-05, 'epoch': 0.27, 'throughput': 7859.19}
[INFO|2025-02-10 23:24:28] logging.py:157 >> {'loss': 0.0086, 'learning_rate': 9.7817e-05, 'epoch': 0.28, 'throughput': 7858.66}
[INFO|2025-02-10 23:24:55] logging.py:157 >> {'loss': 0.0079, 'learning_rate': 9.7680e-05, 'epoch': 0.29, 'throughput': 7854.06}
[INFO|2025-02-10 23:25:21] logging.py:157 >> {'loss': 0.0069, 'learning_rate': 9.7538e-05, 'epoch': 0.30, 'throughput': 7844.67}
[INFO|2025-02-10 23:25:48] logging.py:157 >> {'loss': 0.0064, 'learning_rate': 9.7393e-05, 'epoch': 0.31, 'throughput': 7832.92}
[INFO|2025-02-10 23:26:13] logging.py:157 >> {'loss': 0.0060, 'learning_rate': 9.7243e-05, 'epoch': 0.32, 'throughput': 7834.75}
[INFO|2025-02-10 23:26:41] logging.py:157 >> {'loss': 0.0067, 'learning_rate': 9.7089e-05, 'epoch': 0.33, 'throughput': 7845.98}
[INFO|2025-02-10 23:27:07] logging.py:157 >> {'loss': 0.0063, 'learning_rate': 9.6932e-05, 'epoch': 0.34, 'throughput': 7852.67}
[INFO|2025-02-10 23:27:36] logging.py:157 >> {'loss': 0.0058, 'learning_rate': 9.6770e-05, 'epoch': 0.34, 'throughput': 7846.29}
[INFO|2025-02-10 23:28:05] logging.py:157 >> {'loss': 0.0061, 'learning_rate': 9.6604e-05, 'epoch': 0.35, 'throughput': 7836.02}
[INFO|2025-02-10 23:28:29] logging.py:157 >> {'loss': 0.0055, 'learning_rate': 9.6434e-05, 'epoch': 0.36, 'throughput': 7840.93}
[INFO|2025-02-10 23:28:56] logging.py:157 >> {'loss': 0.0050, 'learning_rate': 9.6260e-05, 'epoch': 0.37, 'throughput': 7842.20}
[INFO|2025-02-10 23:29:22] logging.py:157 >> {'loss': 0.0060, 'learning_rate': 9.6082e-05, 'epoch': 0.38, 'throughput': 7833.15}
[INFO|2025-02-10 23:29:48] logging.py:157 >> {'loss': 0.0048, 'learning_rate': 9.5901e-05, 'epoch': 0.39, 'throughput': 7835.19}
[INFO|2025-02-10 23:30:14] logging.py:157 >> {'loss': 0.0047, 'learning_rate': 9.5715e-05, 'epoch': 0.40, 'throughput': 7841.66}
[INFO|2025-02-10 23:30:41] logging.py:157 >> {'loss': 0.0053, 'learning_rate': 9.5525e-05, 'epoch': 0.41, 'throughput': 7848.48}
[INFO|2025-02-10 23:31:05] logging.py:157 >> {'loss': 0.0044, 'learning_rate': 9.5332e-05, 'epoch': 0.41, 'throughput': 7851.65}
[INFO|2025-02-10 23:31:31] logging.py:157 >> {'loss': 0.0043, 'learning_rate': 9.5134e-05, 'epoch': 0.42, 'throughput': 7850.33}
[INFO|2025-02-10 23:31:57] logging.py:157 >> {'loss': 0.0041, 'learning_rate': 9.4933e-05, 'epoch': 0.43, 'throughput': 7853.40}
[INFO|2025-02-10 23:32:26] logging.py:157 >> {'loss': 0.0044, 'learning_rate': 9.4728e-05, 'epoch': 0.44, 'throughput': 7850.51}
[INFO|2025-02-10 23:32:52] logging.py:157 >> {'loss': 0.0040, 'learning_rate': 9.4519e-05, 'epoch': 0.45, 'throughput': 7847.60}
[INFO|2025-02-10 23:33:18] logging.py:157 >> {'loss': 0.0045, 'learning_rate': 9.4306e-05, 'epoch': 0.46, 'throughput': 7853.35}
[INFO|2025-02-10 23:33:43] logging.py:157 >> {'loss': 0.0028, 'learning_rate': 9.4089e-05, 'epoch': 0.47, 'throughput': 7845.79}
[INFO|2025-02-10 23:34:08] logging.py:157 >> {'loss': 0.0034, 'learning_rate': 9.3869e-05, 'epoch': 0.48, 'throughput': 7853.45}
[INFO|2025-02-10 23:34:36] logging.py:157 >> {'loss': 0.0042, 'learning_rate': 9.3645e-05, 'epoch': 0.49, 'throughput': 7852.19}
[INFO|2025-02-10 23:34:59] logging.py:157 >> {'loss': 0.0028, 'learning_rate': 9.3417e-05, 'epoch': 0.49, 'throughput': 7868.08}
[INFO|2025-02-10 23:35:24] logging.py:157 >> {'loss': 0.0035, 'learning_rate': 9.3185e-05, 'epoch': 0.50, 'throughput': 7875.29}
[INFO|2025-02-10 23:35:48] logging.py:157 >> {'loss': 0.0029, 'learning_rate': 9.2950e-05, 'epoch': 0.51, 'throughput': 7884.45}
[INFO|2025-02-10 23:36:13] logging.py:157 >> {'loss': 0.0028, 'learning_rate': 9.2710e-05, 'epoch': 0.52, 'throughput': 7890.10}
[INFO|2025-02-10 23:36:37] logging.py:157 >> {'loss': 0.0027, 'learning_rate': 9.2468e-05, 'epoch': 0.53, 'throughput': 7895.72}
[INFO|2025-02-10 23:37:02] logging.py:157 >> {'loss': 0.0022, 'learning_rate': 9.2221e-05, 'epoch': 0.54, 'throughput': 7896.89}
[INFO|2025-02-10 23:37:34] logging.py:157 >> {'loss': 0.0027, 'learning_rate': 9.1971e-05, 'epoch': 0.55, 'throughput': 7889.78}
[INFO|2025-02-10 23:38:02] logging.py:157 >> {'loss': 0.0025, 'learning_rate': 9.1718e-05, 'epoch': 0.56, 'throughput': 7887.26}
[INFO|2025-02-10 23:38:27] logging.py:157 >> {'loss': 0.0025, 'learning_rate': 9.1461e-05, 'epoch': 0.56, 'throughput': 7894.29}
[INFO|2025-02-10 23:38:57] logging.py:157 >> {'loss': 0.0022, 'learning_rate': 9.1200e-05, 'epoch': 0.57, 'throughput': 7878.69}
[INFO|2025-02-10 23:39:23] logging.py:157 >> {'loss': 0.0023, 'learning_rate': 9.0935e-05, 'epoch': 0.58, 'throughput': 7879.18}
[INFO|2025-02-10 23:39:50] logging.py:157 >> {'loss': 0.0024, 'learning_rate': 9.0668e-05, 'epoch': 0.59, 'throughput': 7877.08}
[INFO|2025-02-10 23:40:18] logging.py:157 >> {'loss': 0.0015, 'learning_rate': 9.0396e-05, 'epoch': 0.60, 'throughput': 7867.14}
[INFO|2025-02-10 23:40:48] logging.py:157 >> {'loss': 0.0033, 'learning_rate': 9.0122e-05, 'epoch': 0.61, 'throughput': 7856.50}
[INFO|2025-02-10 23:41:16] logging.py:157 >> {'loss': 0.0021, 'learning_rate': 8.9843e-05, 'epoch': 0.62, 'throughput': 7860.92}
[INFO|2025-02-10 23:41:41] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 8.9562e-05, 'epoch': 0.63, 'throughput': 7863.29}
[INFO|2025-02-10 23:42:04] logging.py:157 >> {'loss': 0.0016, 'learning_rate': 8.9277e-05, 'epoch': 0.64, 'throughput': 7866.76}
[INFO|2025-02-10 23:42:29] logging.py:157 >> {'loss': 0.0019, 'learning_rate': 8.8988e-05, 'epoch': 0.64, 'throughput': 7870.01}
[INFO|2025-02-10 23:42:56] logging.py:157 >> {'loss': 0.0017, 'learning_rate': 8.8696e-05, 'epoch': 0.65, 'throughput': 7870.22}
[INFO|2025-02-10 23:43:23] logging.py:157 >> {'loss': 0.0015, 'learning_rate': 8.8401e-05, 'epoch': 0.66, 'throughput': 7872.22}
[INFO|2025-02-10 23:43:51] logging.py:157 >> {'loss': 0.0017, 'learning_rate': 8.8103e-05, 'epoch': 0.67, 'throughput': 7866.00}
[INFO|2025-02-10 23:44:17] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.7801e-05, 'epoch': 0.68, 'throughput': 7866.60}
[INFO|2025-02-10 23:44:42] logging.py:157 >> {'loss': 0.0018, 'learning_rate': 8.7496e-05, 'epoch': 0.69, 'throughput': 7869.35}
[INFO|2025-02-10 23:45:05] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.7188e-05, 'epoch': 0.70, 'throughput': 7871.98}
[INFO|2025-02-10 23:45:31] logging.py:157 >> {'loss': 0.0014, 'learning_rate': 8.6877e-05, 'epoch': 0.71, 'throughput': 7877.09}
[INFO|2025-02-10 23:45:56] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.6562e-05, 'epoch': 0.71, 'throughput': 7874.92}
[INFO|2025-02-10 23:46:27] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 8.6245e-05, 'epoch': 0.72, 'throughput': 7863.86}
[INFO|2025-02-10 23:46:50] logging.py:157 >> {'loss': 0.0012, 'learning_rate': 8.5924e-05, 'epoch': 0.73, 'throughput': 7864.79}
[INFO|2025-02-10 23:47:14] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 8.5600e-05, 'epoch': 0.74, 'throughput': 7871.03}
[INFO|2025-02-10 23:47:40] logging.py:157 >> {'loss': 0.0011, 'learning_rate': 8.5273e-05, 'epoch': 0.75, 'throughput': 7869.86}
[INFO|2025-02-10 23:48:09] logging.py:157 >> {'loss': 0.0037, 'learning_rate': 8.4943e-05, 'epoch': 0.76, 'throughput': 7865.40}
[INFO|2025-02-10 23:48:36] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 8.4611e-05, 'epoch': 0.77, 'throughput': 7860.70}
[INFO|2025-02-10 23:49:00] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 8.4275e-05, 'epoch': 0.78, 'throughput': 7865.68}
[INFO|2025-02-10 23:49:28] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 8.3936e-05, 'epoch': 0.79, 'throughput': 7857.75}
[INFO|2025-02-10 23:49:53] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.3594e-05, 'epoch': 0.79, 'throughput': 7861.64}
[INFO|2025-02-10 23:50:18] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.3249e-05, 'epoch': 0.80, 'throughput': 7864.33}
[INFO|2025-02-10 23:50:43] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 8.2902e-05, 'epoch': 0.81, 'throughput': 7862.08}
[INFO|2025-02-10 23:51:08] logging.py:157 >> {'loss': 0.0014, 'learning_rate': 8.2552e-05, 'epoch': 0.82, 'throughput': 7866.62}
[INFO|2025-02-10 23:51:33] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.2199e-05, 'epoch': 0.83, 'throughput': 7869.38}
[INFO|2025-02-10 23:51:59] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 8.1843e-05, 'epoch': 0.84, 'throughput': 7872.25}
[INFO|2025-02-10 23:52:23] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 8.1484e-05, 'epoch': 0.85, 'throughput': 7875.39}
[INFO|2025-02-10 23:52:51] logging.py:157 >> {'loss': 0.0011, 'learning_rate': 8.1123e-05, 'epoch': 0.86, 'throughput': 7868.64}
[INFO|2025-02-10 23:53:21] logging.py:157 >> {'loss': 0.0009, 'learning_rate': 8.0759e-05, 'epoch': 0.86, 'throughput': 7865.85}
[INFO|2025-02-10 23:53:44] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 8.0392e-05, 'epoch': 0.87, 'throughput': 7870.61}
[INFO|2025-02-10 23:54:08] logging.py:157 >> {'loss': 0.0009, 'learning_rate': 8.0023e-05, 'epoch': 0.88, 'throughput': 7876.56}
[INFO|2025-02-10 23:54:08] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-100
[INFO|2025-02-10 23:54:08] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-10 23:54:08] configuration_utils.py:768 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|2025-02-10 23:54:08] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-100/tokenizer_config.json
[INFO|2025-02-10 23:54:08] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-100/special_tokens_map.json
[INFO|2025-02-10 23:54:33] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 7.9651e-05, 'epoch': 0.89, 'throughput': 7876.49}
[INFO|2025-02-10 23:55:01] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 7.9277e-05, 'epoch': 0.90, 'throughput': 7875.34}
[INFO|2025-02-10 23:55:25] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 7.8900e-05, 'epoch': 0.91, 'throughput': 7878.05}
[INFO|2025-02-10 23:55:48] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 7.8520e-05, 'epoch': 0.92, 'throughput': 7881.27}
[INFO|2025-02-10 23:56:16] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 7.8139e-05, 'epoch': 0.93, 'throughput': 7877.02}
[INFO|2025-02-10 23:56:44] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 7.7754e-05, 'epoch': 0.94, 'throughput': 7877.23}
[INFO|2025-02-10 23:57:08] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 7.7368e-05, 'epoch': 0.94, 'throughput': 7878.63}
[INFO|2025-02-10 23:57:34] logging.py:157 >> {'loss': 0.0012, 'learning_rate': 7.6979e-05, 'epoch': 0.95, 'throughput': 7883.06}
[INFO|2025-02-10 23:57:59] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 7.6588e-05, 'epoch': 0.96, 'throughput': 7881.65}
[INFO|2025-02-10 23:58:23] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 7.6194e-05, 'epoch': 0.97, 'throughput': 7883.60}
[INFO|2025-02-10 23:58:51] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 7.5798e-05, 'epoch': 0.98, 'throughput': 7884.35}
[INFO|2025-02-10 23:59:14] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 7.5400e-05, 'epoch': 0.99, 'throughput': 7889.50}
[INFO|2025-02-10 23:59:42] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 7.5000e-05, 'epoch': 1.00, 'throughput': 7889.54}
[INFO|2025-02-11 00:00:15] logging.py:157 >> {'loss': 0.0014, 'learning_rate': 7.4598e-05, 'epoch': 1.01, 'throughput': 7889.22}
[INFO|2025-02-11 00:00:42] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 7.4193e-05, 'epoch': 1.02, 'throughput': 7886.62}
[INFO|2025-02-11 00:01:08] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 7.3787e-05, 'epoch': 1.03, 'throughput': 7886.10}
[INFO|2025-02-11 00:01:33] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.3378e-05, 'epoch': 1.04, 'throughput': 7886.46}
[INFO|2025-02-11 00:01:56] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 7.2967e-05, 'epoch': 1.04, 'throughput': 7889.90}
[INFO|2025-02-11 00:02:21] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.2555e-05, 'epoch': 1.05, 'throughput': 7889.45}
[INFO|2025-02-11 00:02:50] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 7.2140e-05, 'epoch': 1.06, 'throughput': 7886.78}
[INFO|2025-02-11 00:03:14] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.1724e-05, 'epoch': 1.07, 'throughput': 7893.28}
[INFO|2025-02-11 00:03:40] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 7.1306e-05, 'epoch': 1.08, 'throughput': 7896.08}
[INFO|2025-02-11 00:04:05] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 7.0886e-05, 'epoch': 1.09, 'throughput': 7901.06}
[INFO|2025-02-11 00:04:28] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.0464e-05, 'epoch': 1.10, 'throughput': 7904.61}
[INFO|2025-02-11 00:04:54] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.0040e-05, 'epoch': 1.11, 'throughput': 7904.82}
[INFO|2025-02-11 00:05:22] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.9615e-05, 'epoch': 1.11, 'throughput': 7905.63}
[INFO|2025-02-11 00:05:49] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.9188e-05, 'epoch': 1.12, 'throughput': 7904.23}
[INFO|2025-02-11 00:06:16] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.8759e-05, 'epoch': 1.13, 'throughput': 7903.09}
[INFO|2025-02-11 00:06:46] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.8329e-05, 'epoch': 1.14, 'throughput': 7898.37}
[INFO|2025-02-11 00:07:13] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.7897e-05, 'epoch': 1.15, 'throughput': 7897.68}
[INFO|2025-02-11 00:07:41] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.7463e-05, 'epoch': 1.16, 'throughput': 7897.31}
[INFO|2025-02-11 00:08:08] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.7028e-05, 'epoch': 1.17, 'throughput': 7895.24}
[INFO|2025-02-11 00:08:34] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.6592e-05, 'epoch': 1.18, 'throughput': 7895.50}
[INFO|2025-02-11 00:09:01] logging.py:157 >> {'loss': 0.0009, 'learning_rate': 6.6154e-05, 'epoch': 1.19, 'throughput': 7896.09}
[INFO|2025-02-11 00:09:29] logging.py:157 >> {'loss': 0.0014, 'learning_rate': 6.5715e-05, 'epoch': 1.19, 'throughput': 7892.01}
[INFO|2025-02-11 00:09:54] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.5274e-05, 'epoch': 1.20, 'throughput': 7891.89}
[INFO|2025-02-11 00:10:22] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.4833e-05, 'epoch': 1.21, 'throughput': 7888.92}
[INFO|2025-02-11 00:10:49] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.4389e-05, 'epoch': 1.22, 'throughput': 7888.52}
[INFO|2025-02-11 00:11:14] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.3945e-05, 'epoch': 1.23, 'throughput': 7890.66}
[INFO|2025-02-11 00:11:36] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.3500e-05, 'epoch': 1.24, 'throughput': 7892.64}
[INFO|2025-02-11 00:12:00] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.3053e-05, 'epoch': 1.25, 'throughput': 7894.92}
[INFO|2025-02-11 00:12:29] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.2605e-05, 'epoch': 1.26, 'throughput': 7892.49}
[INFO|2025-02-11 00:12:54] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.2156e-05, 'epoch': 1.26, 'throughput': 7894.05}
[INFO|2025-02-11 00:13:20] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 6.1706e-05, 'epoch': 1.27, 'throughput': 7893.70}
[INFO|2025-02-11 00:13:42] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.1255e-05, 'epoch': 1.28, 'throughput': 7896.62}
[INFO|2025-02-11 00:14:09] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.0803e-05, 'epoch': 1.29, 'throughput': 7895.09}
[INFO|2025-02-11 00:14:34] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.0350e-05, 'epoch': 1.30, 'throughput': 7897.58}
[INFO|2025-02-11 00:15:00] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.9896e-05, 'epoch': 1.31, 'throughput': 7896.39}
[INFO|2025-02-11 00:15:25] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.9442e-05, 'epoch': 1.32, 'throughput': 7897.88}
[INFO|2025-02-11 00:15:48] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.8986e-05, 'epoch': 1.33, 'throughput': 7902.00}
[INFO|2025-02-11 00:16:14] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.8530e-05, 'epoch': 1.34, 'throughput': 7900.57}
[INFO|2025-02-11 00:16:41] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.8073e-05, 'epoch': 1.34, 'throughput': 7901.36}
[INFO|2025-02-11 00:17:06] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.7616e-05, 'epoch': 1.35, 'throughput': 7898.31}
[INFO|2025-02-11 00:17:30] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.7157e-05, 'epoch': 1.36, 'throughput': 7901.98}
[INFO|2025-02-11 00:17:58] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.6699e-05, 'epoch': 1.37, 'throughput': 7898.91}
[INFO|2025-02-11 00:18:22] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.6239e-05, 'epoch': 1.38, 'throughput': 7901.45}
[INFO|2025-02-11 00:18:47] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.5779e-05, 'epoch': 1.39, 'throughput': 7900.71}
[INFO|2025-02-11 00:19:14] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.5319e-05, 'epoch': 1.40, 'throughput': 7898.98}
[INFO|2025-02-11 00:19:41] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.4858e-05, 'epoch': 1.41, 'throughput': 7897.63}
[INFO|2025-02-11 00:20:08] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.4396e-05, 'epoch': 1.41, 'throughput': 7897.71}
[INFO|2025-02-11 00:20:33] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.3935e-05, 'epoch': 1.42, 'throughput': 7897.58}
[INFO|2025-02-11 00:20:57] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.3472e-05, 'epoch': 1.43, 'throughput': 7900.98}
[INFO|2025-02-11 00:21:26] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.3010e-05, 'epoch': 1.44, 'throughput': 7899.49}
[INFO|2025-02-11 00:21:50] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.2547e-05, 'epoch': 1.45, 'throughput': 7899.53}
[INFO|2025-02-11 00:22:13] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.2085e-05, 'epoch': 1.46, 'throughput': 7901.89}
[INFO|2025-02-11 00:22:39] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.1621e-05, 'epoch': 1.47, 'throughput': 7902.03}
[INFO|2025-02-11 00:23:05] logging.py:157 >> {'loss': 0.0016, 'learning_rate': 5.1158e-05, 'epoch': 1.48, 'throughput': 7901.05}
[INFO|2025-02-11 00:23:32] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.0695e-05, 'epoch': 1.49, 'throughput': 7901.53}
[INFO|2025-02-11 00:23:57] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.0232e-05, 'epoch': 1.49, 'throughput': 7901.79}
[INFO|2025-02-11 00:24:22] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.9768e-05, 'epoch': 1.50, 'throughput': 7901.03}
[INFO|2025-02-11 00:24:45] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.9305e-05, 'epoch': 1.51, 'throughput': 7903.96}
[INFO|2025-02-11 00:25:11] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.8842e-05, 'epoch': 1.52, 'throughput': 7905.12}
[INFO|2025-02-11 00:25:33] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.8379e-05, 'epoch': 1.53, 'throughput': 7909.17}
[INFO|2025-02-11 00:25:58] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.7915e-05, 'epoch': 1.54, 'throughput': 7909.02}
[INFO|2025-02-11 00:26:29] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.7453e-05, 'epoch': 1.55, 'throughput': 7905.59}
[INFO|2025-02-11 00:26:56] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.6990e-05, 'epoch': 1.56, 'throughput': 7905.47}
[INFO|2025-02-11 00:27:22] logging.py:157 >> {'loss': 0.0027, 'learning_rate': 4.6528e-05, 'epoch': 1.56, 'throughput': 7904.70}
[INFO|2025-02-11 00:27:47] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.6065e-05, 'epoch': 1.57, 'throughput': 7905.80}
[INFO|2025-02-11 00:28:12] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.5604e-05, 'epoch': 1.58, 'throughput': 7907.65}
[INFO|2025-02-11 00:28:39] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.5142e-05, 'epoch': 1.59, 'throughput': 7907.43}
[INFO|2025-02-11 00:29:09] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.4681e-05, 'epoch': 1.60, 'throughput': 7902.86}
[INFO|2025-02-11 00:29:35] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.4221e-05, 'epoch': 1.61, 'throughput': 7903.10}
[INFO|2025-02-11 00:30:02] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.3761e-05, 'epoch': 1.62, 'throughput': 7904.08}
[INFO|2025-02-11 00:30:28] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.3301e-05, 'epoch': 1.63, 'throughput': 7905.25}
[INFO|2025-02-11 00:30:50] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.2843e-05, 'epoch': 1.64, 'throughput': 7908.78}
[INFO|2025-02-11 00:31:13] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.2384e-05, 'epoch': 1.64, 'throughput': 7912.45}
[INFO|2025-02-11 00:31:40] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.1927e-05, 'epoch': 1.65, 'throughput': 7912.95}
[INFO|2025-02-11 00:32:06] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.1470e-05, 'epoch': 1.66, 'throughput': 7913.14}
[INFO|2025-02-11 00:32:33] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.1014e-05, 'epoch': 1.67, 'throughput': 7914.08}
[INFO|2025-02-11 00:33:02] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.0558e-05, 'epoch': 1.68, 'throughput': 7911.16}
[INFO|2025-02-11 00:33:29] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.0104e-05, 'epoch': 1.69, 'throughput': 7911.74}
[INFO|2025-02-11 00:33:56] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 3.9650e-05, 'epoch': 1.70, 'throughput': 7911.05}
[INFO|2025-02-11 00:34:21] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.9197e-05, 'epoch': 1.71, 'throughput': 7910.45}
[INFO|2025-02-11 00:34:46] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.8745e-05, 'epoch': 1.71, 'throughput': 7911.47}
[INFO|2025-02-11 00:35:10] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.8294e-05, 'epoch': 1.72, 'throughput': 7913.07}
[INFO|2025-02-11 00:35:36] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.7844e-05, 'epoch': 1.73, 'throughput': 7912.61}
[INFO|2025-02-11 00:36:01] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.7395e-05, 'epoch': 1.74, 'throughput': 7914.15}
[INFO|2025-02-11 00:36:33] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.6947e-05, 'epoch': 1.75, 'throughput': 7908.90}
[INFO|2025-02-11 00:36:56] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.6500e-05, 'epoch': 1.76, 'throughput': 7909.88}
[INFO|2025-02-11 00:37:27] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.6055e-05, 'epoch': 1.77, 'throughput': 7904.23}
[INFO|2025-02-11 00:37:27] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-200
[INFO|2025-02-11 00:37:28] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-11 00:37:28] configuration_utils.py:768 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|2025-02-11 00:37:28] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-200/tokenizer_config.json
[INFO|2025-02-11 00:37:28] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-200/special_tokens_map.json
[INFO|2025-02-11 00:37:58] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.5611e-05, 'epoch': 1.78, 'throughput': 7898.91}
[INFO|2025-02-11 00:38:25] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.5167e-05, 'epoch': 1.79, 'throughput': 7897.44}
[INFO|2025-02-11 00:38:53] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 3.4726e-05, 'epoch': 1.79, 'throughput': 7894.63}
[INFO|2025-02-11 00:39:20] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.4285e-05, 'epoch': 1.80, 'throughput': 7893.32}
[INFO|2025-02-11 00:39:45] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.3846e-05, 'epoch': 1.81, 'throughput': 7893.48}
[INFO|2025-02-11 00:40:10] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.3408e-05, 'epoch': 1.82, 'throughput': 7897.32}
[INFO|2025-02-11 00:40:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.2972e-05, 'epoch': 1.83, 'throughput': 7898.07}
[INFO|2025-02-11 00:41:01] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.2537e-05, 'epoch': 1.84, 'throughput': 7898.41}
[INFO|2025-02-11 00:41:27] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.2103e-05, 'epoch': 1.85, 'throughput': 7898.41}
[INFO|2025-02-11 00:41:56] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.1671e-05, 'epoch': 1.86, 'throughput': 7897.79}
[INFO|2025-02-11 00:42:22] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.1241e-05, 'epoch': 1.86, 'throughput': 7896.37}
[INFO|2025-02-11 00:42:45] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.0812e-05, 'epoch': 1.87, 'throughput': 7897.48}
[INFO|2025-02-11 00:43:13] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.0385e-05, 'epoch': 1.88, 'throughput': 7898.39}
[INFO|2025-02-11 00:43:39] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.9960e-05, 'epoch': 1.89, 'throughput': 7897.60}
[INFO|2025-02-11 00:44:03] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.9536e-05, 'epoch': 1.90, 'throughput': 7898.54}
[INFO|2025-02-11 00:44:28] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.9114e-05, 'epoch': 1.91, 'throughput': 7900.99}
[INFO|2025-02-11 00:44:55] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.8694e-05, 'epoch': 1.92, 'throughput': 7899.67}
[INFO|2025-02-11 00:45:23] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.8276e-05, 'epoch': 1.93, 'throughput': 7898.82}
[INFO|2025-02-11 00:45:50] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.7860e-05, 'epoch': 1.94, 'throughput': 7898.40}
[INFO|2025-02-11 00:46:19] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.7445e-05, 'epoch': 1.94, 'throughput': 7895.44}
[INFO|2025-02-11 00:46:46] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 2.7033e-05, 'epoch': 1.95, 'throughput': 7892.18}
[INFO|2025-02-11 00:47:13] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.6622e-05, 'epoch': 1.96, 'throughput': 7890.65}
[INFO|2025-02-11 00:47:36] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.6213e-05, 'epoch': 1.97, 'throughput': 7892.67}
[INFO|2025-02-11 00:48:03] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.5807e-05, 'epoch': 1.98, 'throughput': 7892.10}
[INFO|2025-02-11 00:48:31] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.5402e-05, 'epoch': 1.99, 'throughput': 7891.47}
[INFO|2025-02-11 00:48:57] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 2.5000e-05, 'epoch': 2.00, 'throughput': 7891.65}
[INFO|2025-02-11 00:49:32] logging.py:157 >> {'loss': 0.0011, 'learning_rate': 2.4600e-05, 'epoch': 2.01, 'throughput': 7890.98}
[INFO|2025-02-11 00:49:59] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.4202e-05, 'epoch': 2.02, 'throughput': 7888.59}
[INFO|2025-02-11 00:50:25] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.3806e-05, 'epoch': 2.03, 'throughput': 7889.79}
[INFO|2025-02-11 00:50:54] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 2.3412e-05, 'epoch': 2.04, 'throughput': 7887.38}
[INFO|2025-02-11 00:51:17] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.3021e-05, 'epoch': 2.04, 'throughput': 7889.45}
[INFO|2025-02-11 00:51:46] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.2632e-05, 'epoch': 2.05, 'throughput': 7887.07}
[INFO|2025-02-11 00:52:11] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.2246e-05, 'epoch': 2.06, 'throughput': 7889.59}
[INFO|2025-02-11 00:52:43] logging.py:157 >> {'loss': 0.0025, 'learning_rate': 2.1861e-05, 'epoch': 2.07, 'throughput': 7885.55}
[INFO|2025-02-11 00:53:13] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 2.1480e-05, 'epoch': 2.08, 'throughput': 7881.47}
[INFO|2025-02-11 00:53:37] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.1100e-05, 'epoch': 2.09, 'throughput': 7884.07}
[INFO|2025-02-11 00:54:04] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.0723e-05, 'epoch': 2.10, 'throughput': 7884.69}
[INFO|2025-02-11 00:54:29] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.0349e-05, 'epoch': 2.11, 'throughput': 7886.69}
[INFO|2025-02-11 00:54:55] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.9977e-05, 'epoch': 2.11, 'throughput': 7886.90}
[INFO|2025-02-11 00:55:19] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.9608e-05, 'epoch': 2.12, 'throughput': 7888.14}
[INFO|2025-02-11 00:55:44] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.9241e-05, 'epoch': 2.13, 'throughput': 7889.13}
[INFO|2025-02-11 00:56:09] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.8877e-05, 'epoch': 2.14, 'throughput': 7888.30}
[INFO|2025-02-11 00:56:35] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.8516e-05, 'epoch': 2.15, 'throughput': 7887.77}
[INFO|2025-02-11 00:56:59] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.8157e-05, 'epoch': 2.16, 'throughput': 7889.80}
[INFO|2025-02-11 00:57:28] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 1.7801e-05, 'epoch': 2.17, 'throughput': 7887.66}
[INFO|2025-02-11 00:57:55] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.7448e-05, 'epoch': 2.18, 'throughput': 7886.91}
[INFO|2025-02-11 00:58:18] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.7098e-05, 'epoch': 2.19, 'throughput': 7889.08}
[INFO|2025-02-11 00:58:42] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.6751e-05, 'epoch': 2.19, 'throughput': 7890.61}
[INFO|2025-02-11 00:59:08] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.6406e-05, 'epoch': 2.20, 'throughput': 7889.22}
[INFO|2025-02-11 00:59:33] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.6064e-05, 'epoch': 2.21, 'throughput': 7890.88}
[INFO|2025-02-11 00:59:58] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.5725e-05, 'epoch': 2.22, 'throughput': 7890.72}
[INFO|2025-02-11 01:00:24] logging.py:157 >> {'loss': 0.0012, 'learning_rate': 1.5389e-05, 'epoch': 2.23, 'throughput': 7890.97}
[INFO|2025-02-11 01:00:47] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.5057e-05, 'epoch': 2.24, 'throughput': 7892.37}
[INFO|2025-02-11 01:01:11] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 1.4727e-05, 'epoch': 2.25, 'throughput': 7894.16}
[INFO|2025-02-11 01:01:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.4400e-05, 'epoch': 2.26, 'throughput': 7894.39}
[INFO|2025-02-11 01:02:01] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.4076e-05, 'epoch': 2.26, 'throughput': 7895.04}
[INFO|2025-02-11 01:02:29] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.3755e-05, 'epoch': 2.27, 'throughput': 7894.62}
[INFO|2025-02-11 01:02:57] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.3438e-05, 'epoch': 2.28, 'throughput': 7893.02}
[INFO|2025-02-11 01:03:28] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.3123e-05, 'epoch': 2.29, 'throughput': 7891.68}
[INFO|2025-02-11 01:03:52] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.2812e-05, 'epoch': 2.30, 'throughput': 7892.09}
[INFO|2025-02-11 01:04:18] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.2504e-05, 'epoch': 2.31, 'throughput': 7894.29}
[INFO|2025-02-11 01:04:46] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.2199e-05, 'epoch': 2.32, 'throughput': 7893.79}
[INFO|2025-02-11 01:05:13] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.1897e-05, 'epoch': 2.33, 'throughput': 7894.54}
[INFO|2025-02-11 01:05:37] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.1599e-05, 'epoch': 2.34, 'throughput': 7894.72}
[INFO|2025-02-11 01:06:03] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.1304e-05, 'epoch': 2.34, 'throughput': 7893.77}
[INFO|2025-02-11 01:06:29] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.1012e-05, 'epoch': 2.35, 'throughput': 7895.00}
[INFO|2025-02-11 01:06:55] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.0723e-05, 'epoch': 2.36, 'throughput': 7895.05}
[INFO|2025-02-11 01:07:22] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.0438e-05, 'epoch': 2.37, 'throughput': 7895.15}
[INFO|2025-02-11 01:07:48] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.0157e-05, 'epoch': 2.38, 'throughput': 7895.90}
[INFO|2025-02-11 01:08:14] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 9.8785e-06, 'epoch': 2.39, 'throughput': 7897.00}
[INFO|2025-02-11 01:08:43] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 9.6037e-06, 'epoch': 2.40, 'throughput': 7893.16}
[INFO|2025-02-11 01:09:10] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 9.3324e-06, 'epoch': 2.41, 'throughput': 7891.24}
[INFO|2025-02-11 01:09:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 9.0646e-06, 'epoch': 2.41, 'throughput': 7893.30}
[INFO|2025-02-11 01:10:00] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 8.8003e-06, 'epoch': 2.42, 'throughput': 7893.02}
[INFO|2025-02-11 01:10:25] logging.py:157 >> {'loss': 0.0000, 'learning_rate': 8.5395e-06, 'epoch': 2.43, 'throughput': 7892.52}
[INFO|2025-02-11 01:10:51] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 8.2823e-06, 'epoch': 2.44, 'throughput': 7892.63}
[INFO|2025-02-11 01:11:17] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 8.0287e-06, 'epoch': 2.45, 'throughput': 7892.41}
[INFO|2025-02-11 01:11:45] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 7.7786e-06, 'epoch': 2.46, 'throughput': 7889.96}
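The learning_rate column traces a plain cosine decay from the 1e-4 peak with no warmup; the round values at the epoch boundaries (7.5000e-05 at epoch 1.00 and 2.5000e-05 at epoch 2.00 above, 0 at step 339) pin the schedule down. A check against the logged values:

import math

PEAK_LR, TOTAL_STEPS = 1e-4, 339

def cosine_lr(step: int) -> float:
    # Standard cosine schedule without warmup: peak/2 * (1 + cos(pi * progress)).
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * step / TOTAL_STEPS))

for step in (113, 226, 339):  # the three epoch boundaries (113 steps per epoch)
    print(step, f"{cosine_lr(step):.4e}")  # 7.5000e-05, 2.5000e-05, 0.0000e+00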
[INFO|2025-02-11 01:12:10] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 7.5322e-06, 'epoch': 2.47, 'throughput': 7892.18}
[INFO|2025-02-11 01:12:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 7.2895e-06, 'epoch': 2.48, 'throughput': 7893.12}
[INFO|2025-02-11 01:13:03] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 7.0504e-06, 'epoch': 2.49, 'throughput': 7891.37}
[INFO|2025-02-11 01:13:30] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 6.8150e-06, 'epoch': 2.49, 'throughput': 7891.23}
[INFO|2025-02-11 01:13:58] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 6.5834e-06, 'epoch': 2.50, 'throughput': 7890.21}
[INFO|2025-02-11 01:14:25] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 6.3554e-06, 'epoch': 2.51, 'throughput': 7889.96}
[INFO|2025-02-11 01:14:53] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.1312e-06, 'epoch': 2.52, 'throughput': 7889.86}
[INFO|2025-02-11 01:15:20] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 5.9108e-06, 'epoch': 2.53, 'throughput': 7889.91}
[INFO|2025-02-11 01:15:44] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.6941e-06, 'epoch': 2.54, 'throughput': 7891.05}
[INFO|2025-02-11 01:16:10] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.4813e-06, 'epoch': 2.55, 'throughput': 7890.97}
[INFO|2025-02-11 01:16:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 5.2723e-06, 'epoch': 2.56, 'throughput': 7891.95}
[INFO|2025-02-11 01:16:58] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 5.0671e-06, 'epoch': 2.56, 'throughput': 7893.32}
[INFO|2025-02-11 01:17:24] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.8657e-06, 'epoch': 2.57, 'throughput': 7893.98}
[INFO|2025-02-11 01:17:51] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.6683e-06, 'epoch': 2.58, 'throughput': 7892.90}
[INFO|2025-02-11 01:18:18] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.4748e-06, 'epoch': 2.59, 'throughput': 7892.36}
[INFO|2025-02-11 01:18:46] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 4.2851e-06, 'epoch': 2.60, 'throughput': 7891.40}
[INFO|2025-02-11 01:19:11] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.0994e-06, 'epoch': 2.61, 'throughput': 7891.16}
[INFO|2025-02-11 01:19:35] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.9176e-06, 'epoch': 2.62, 'throughput': 7892.02}
[INFO|2025-02-11 01:19:59] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.7398e-06, 'epoch': 2.63, 'throughput': 7892.85}
[INFO|2025-02-11 01:20:26] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.5660e-06, 'epoch': 2.64, 'throughput': 7892.72}
[INFO|2025-02-11 01:20:52] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.3961e-06, 'epoch': 2.64, 'throughput': 7893.04}
[INFO|2025-02-11 01:21:17] logging.py:157 >> {'loss': 0.0000, 'learning_rate': 3.2303e-06, 'epoch': 2.65, 'throughput': 7892.55}
[INFO|2025-02-11 01:21:17] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-300
[INFO|2025-02-11 01:21:17] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-11 01:21:17] configuration_utils.py:768 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|2025-02-11 01:21:17] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-300/tokenizer_config.json
[INFO|2025-02-11 01:21:17] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-300/special_tokens_map.json
[INFO|2025-02-11 01:21:42] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.0684e-06, 'epoch': 2.66, 'throughput': 7893.70}
[INFO|2025-02-11 01:22:07] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.9106e-06, 'epoch': 2.67, 'throughput': 7893.69}
[INFO|2025-02-11 01:22:31] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.7569e-06, 'epoch': 2.68, 'throughput': 7893.79}
[INFO|2025-02-11 01:22:58] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 2.6071e-06, 'epoch': 2.69, 'throughput': 7893.85}
[INFO|2025-02-11 01:23:27] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.4615e-06, 'epoch': 2.70, 'throughput': 7891.77}
[INFO|2025-02-11 01:23:51] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.3200e-06, 'epoch': 2.71, 'throughput': 7892.22}
[INFO|2025-02-11 01:24:18] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.1825e-06, 'epoch': 2.71, 'throughput': 7891.29}
[INFO|2025-02-11 01:24:43] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.0492e-06, 'epoch': 2.72, 'throughput': 7892.66}
[INFO|2025-02-11 01:25:08] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.9199e-06, 'epoch': 2.73, 'throughput': 7892.67}
[INFO|2025-02-11 01:25:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.7948e-06, 'epoch': 2.74, 'throughput': 7892.91}
[INFO|2025-02-11 01:26:01] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.6739e-06, 'epoch': 2.75, 'throughput': 7892.87}
[INFO|2025-02-11 01:26:28] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.5570e-06, 'epoch': 2.76, 'throughput': 7892.66}
[INFO|2025-02-11 01:26:55] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.4444e-06, 'epoch': 2.77, 'throughput': 7894.09}
[INFO|2025-02-11 01:27:20] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.3359e-06, 'epoch': 2.78, 'throughput': 7894.56}
[INFO|2025-02-11 01:27:46] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 1.2316e-06, 'epoch': 2.79, 'throughput': 7893.92}
[INFO|2025-02-11 01:28:12] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.1315e-06, 'epoch': 2.79, 'throughput': 7894.65}
[INFO|2025-02-11 01:28:38] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.0356e-06, 'epoch': 2.80, 'throughput': 7893.46}
[INFO|2025-02-11 01:29:02] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 9.4386e-07, 'epoch': 2.81, 'throughput': 7894.99}
[INFO|2025-02-11 01:29:31] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 8.5636e-07, 'epoch': 2.82, 'throughput': 7894.00}
[INFO|2025-02-11 01:29:57] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 7.7308e-07, 'epoch': 2.83, 'throughput': 7895.06}
[INFO|2025-02-11 01:30:24] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 6.9403e-07, 'epoch': 2.84, 'throughput': 7894.58}
[INFO|2025-02-11 01:30:53] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.1921e-07, 'epoch': 2.85, 'throughput': 7893.93}
[INFO|2025-02-11 01:31:19] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.4864e-07, 'epoch': 2.86, 'throughput': 7893.63}
[INFO|2025-02-11 01:31:45] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.8231e-07, 'epoch': 2.86, 'throughput': 7893.28}
[INFO|2025-02-11 01:32:10] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.2023e-07, 'epoch': 2.87, 'throughput': 7894.28}
[INFO|2025-02-11 01:32:33] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.6241e-07, 'epoch': 2.88, 'throughput': 7895.04}
[INFO|2025-02-11 01:32:57] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.0886e-07, 'epoch': 2.89, 'throughput': 7896.03}
[INFO|2025-02-11 01:33:21] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.5957e-07, 'epoch': 2.90, 'throughput': 7896.57}
[INFO|2025-02-11 01:33:45] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.1455e-07, 'epoch': 2.91, 'throughput': 7897.70}
[INFO|2025-02-11 01:34:10] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.7381e-07, 'epoch': 2.92, 'throughput': 7898.05}
[INFO|2025-02-11 01:34:36] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.3735e-07, 'epoch': 2.93, 'throughput': 7898.34}
[INFO|2025-02-11 01:35:01] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.0517e-07, 'epoch': 2.94, 'throughput': 7898.84}
[INFO|2025-02-11 01:35:26] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 7.7274e-08, 'epoch': 2.94, 'throughput': 7900.67}
[INFO|2025-02-11 01:35:51] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 5.3666e-08, 'epoch': 2.95, 'throughput': 7901.22}
[INFO|2025-02-11 01:36:19] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.4349e-08, 'epoch': 2.96, 'throughput': 7899.73}
[INFO|2025-02-11 01:36:44] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 1.9322e-08, 'epoch': 2.97, 'throughput': 7900.03}
[INFO|2025-02-11 01:37:08] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 8.5879e-09, 'epoch': 2.98, 'throughput': 7902.03}
[INFO|2025-02-11 01:37:35] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.1470e-09, 'epoch': 2.99, 'throughput': 7903.27}
[INFO|2025-02-11 01:38:00] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 0.0000e+00, 'epoch': 3.00, 'throughput': 7902.86}
[INFO|2025-02-11 01:38:00] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-339
[INFO|2025-02-11 01:38:00] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-11 01:38:00] configuration_utils.py:768 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|2025-02-11 01:38:00] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-339/tokenizer_config.json
[INFO|2025-02-11 01:38:00] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-339/special_tokens_map.json
[INFO|2025-02-11 01:38:01] trainer.py:2643 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2025-02-11 01:38:01] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128
[INFO|2025-02-11 01:38:01] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-11 01:38:01] configuration_utils.py:768 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|2025-02-11 01:38:01] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/tokenizer_config.json
[INFO|2025-02-11 01:38:01] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/special_tokens_map.json
[WARNING|2025-02-11 01:38:02] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2025-02-11 01:38:02] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2025-02-11 01:38:02] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
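With training complete, the final adapter sits in the run directory itself; the two warnings and the dropped model-card result only reflect that no eval metrics were collected. A hedged end-to-end inference sketch, reusing the sampling values from the generation_config.json logged above; the prompt is a placeholder, and the rest is an assumed peft/transformers workflow rather than part of the run:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct"
RUN = "saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128"

tokenizer = AutoTokenizer.from_pretrained(RUN)
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto"),
    RUN,
)

messages = [{"role": "user", "content": "Plan a path through the following graph: ..."}]  # placeholder prompt
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05,  # sampling values from the generation config in the log
)
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))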