[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file merges.txt
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file tokenizer.json
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file added_tokens.json
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file special_tokens_map.json
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file tokenizer_config.json
[INFO|2025-02-10 23:10:14] tokenization_utils_base.py:2032 >> loading file chat_template.jinja
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2304 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
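The file list above is what a standard tokenizer load resolves for this path. A minimal sketch, assuming a stock Transformers setup (the path is taken from the log; everything else is an assumption):

```python
from transformers import AutoTokenizer

# Load the Qwen2.5-7B-Instruct tokenizer from the local path shown in the log.
tokenizer = AutoTokenizer.from_pretrained(
    "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct"
)

# <|im_end|> (id 151645) is the chat-template end-of-turn token; the log below
# also registers it as a stop word for generation.
print(tokenizer.eos_token, tokenizer.eos_token_id)
```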
[INFO|2025-02-10 23:10:15] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-10 23:10:15] configuration_utils.py:768 >> Model config Qwen2Config {
"_name_or_path": "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
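The printed Qwen2Config implies a 128-dimensional attention head and 7-way grouped-query attention (28 query heads sharing 4 KV heads). A small sketch that re-derives these numbers, assuming only the config shown above:

```python
from transformers import AutoConfig

# Re-derive a few quantities implied by the config printed in the log.
cfg = AutoConfig.from_pretrained("/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct")

head_dim = cfg.hidden_size // cfg.num_attention_heads             # 3584 // 28 = 128
gqa_groups = cfg.num_attention_heads // cfg.num_key_value_heads   # 28 // 4 = 7
print(head_dim, gqa_groups, cfg.vocab_size)                       # 128 7 152064
```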
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file vocab.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file merges.txt
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file tokenizer.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file added_tokens.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file special_tokens_map.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file tokenizer_config.json
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2032 >> loading file chat_template.jinja
[INFO|2025-02-10 23:10:15] tokenization_utils_base.py:2304 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-02-10 23:10:15] logging.py:157 >> Add <|im_end|> to stop words.
[INFO|2025-02-10 23:10:15] logging.py:157 >> Loading dataset graph_planning/graph_planning_train.json...
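The schema of graph_planning_train.json is not shown in the log. LLaMA-Factory usually consumes alpaca- or sharegpt-style JSON, so a purely hypothetical record, for orientation only, might look like this (field names and content are assumptions, not taken from the actual file):

```python
# Hypothetical alpaca-style record; the real graph_planning_train.json
# schema and contents are not visible in this log.
example_record = {
    "instruction": "Given the graph below, plan a path from the start node to the goal node.",
    "input": "Nodes: A, B, C, D. Edges: A->B, B->C, C->D. Start: A. Goal: D.",
    "output": "A -> B -> C -> D",
}
```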
[INFO|2025-02-10 23:10:23] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-10 23:10:23] configuration_utils.py:768 >> Model config Qwen2Config {
"_name_or_path": "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
[INFO|2025-02-10 23:10:24] modeling_utils.py:3901 >> loading weights file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/model.safetensors.index.json
[INFO|2025-02-10 23:10:24] modeling_utils.py:1582 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|2025-02-10 23:10:24] configuration_utils.py:1140 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}
[INFO|2025-02-10 23:10:27] modeling_utils.py:4888 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|2025-02-10 23:10:27] modeling_utils.py:4896 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|2025-02-10 23:10:27] configuration_utils.py:1093 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/generation_config.json
[INFO|2025-02-10 23:10:27] configuration_utils.py:1140 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.05,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8
}
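The generation_config.json loaded above fixes the default sampling behaviour (do_sample with temperature 0.7, top_p 0.8, top_k 20, repetition_penalty 1.05). A minimal inference sketch using those values; the prompt, dtype, and device placement are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

# Build a chat prompt and sample with the parameters from generation_config.json.
messages = [{"role": "user", "content": "Hello"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05,
)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```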
[INFO|2025-02-10 23:10:27] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2025-02-10 23:10:27] logging.py:157 >> Using torch SDPA for faster training and inference.
[INFO|2025-02-10 23:10:27] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2025-02-10 23:10:27] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2025-02-10 23:10:27] logging.py:157 >> Found linear modules: o_proj,q_proj,gate_proj,v_proj,up_proj,k_proj,down_proj
[INFO|2025-02-10 23:10:28] logging.py:157 >> trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
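The trainable-parameter count is consistent with LoRA rank 8 on all seven projection matrices: each target module adds r·(d_in + d_out) parameters, the seven projections of one Qwen2.5-7B layer sum to 90,112·r, and 28 layers × 90,112 × 8 = 20,185,088. A PEFT sketch of this setup; the rank is inferred from the count above, while lora_alpha and dropout are assumptions, not values read from the log:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=8,                       # inferred from the 20,185,088 trainable params
    lora_alpha=16,             # assumption
    lora_dropout=0.0,          # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# -> trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
```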
[INFO|2025-02-10 23:10:28] trainer.py:741 >> Using auto half precision backend
[INFO|2025-02-10 23:10:28] trainer.py:2369 >> ***** Running training *****
[INFO|2025-02-10 23:10:28] trainer.py:2370 >> Num examples = 14,500
[INFO|2025-02-10 23:10:28] trainer.py:2371 >> Num Epochs = 3
[INFO|2025-02-10 23:10:28] trainer.py:2372 >> Instantaneous batch size per device = 2
[INFO|2025-02-10 23:10:28] trainer.py:2375 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|2025-02-10 23:10:28] trainer.py:2376 >> Gradient Accumulation steps = 16
[INFO|2025-02-10 23:10:28] trainer.py:2377 >> Total optimization steps = 339
[INFO|2025-02-10 23:10:28] trainer.py:2378 >> Number of trainable parameters = 20,185,088
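The run shape printed by the Trainer can be re-derived from its own numbers: a total batch of 128 with per-device batch 2 and 16 accumulation steps implies 4 devices, and 14,500 examples over 3 epochs at that batch size yields the 339 optimization steps reported. A simplified worked check:

```python
# Re-derive the run shape from the numbers the Trainer logs above
# (simplified; the Trainer rounds via its dataloader length).
num_examples  = 14_500
per_device_bs = 2
grad_accum    = 16
total_bs      = 128
num_epochs    = 3

world_size = total_bs // (per_device_bs * grad_accum)  # 128 // 32 = 4 devices (implied)
steps_per_epoch = num_examples // total_bs              # 14,500 // 128 = 113
total_steps = steps_per_epoch * num_epochs              # 113 * 3 = 339
print(world_size, steps_per_epoch, total_steps)
```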
[INFO|2025-02-10 23:10:56] logging.py:157 >> {'loss': 0.2055, 'learning_rate': 9.9998e-05, 'epoch': 0.01, 'throughput': 7338.97}
[INFO|2025-02-10 23:11:22] logging.py:157 >> {'loss': 0.1902, 'learning_rate': 9.9991e-05, 'epoch': 0.02, 'throughput': 7493.38}
[INFO|2025-02-10 23:11:47] logging.py:157 >> {'loss': 0.1421, 'learning_rate': 9.9981e-05, 'epoch': 0.03, 'throughput': 7812.96}
[INFO|2025-02-10 23:12:15] logging.py:157 >> {'loss': 0.1102, 'learning_rate': 9.9966e-05, 'epoch': 0.04, 'throughput': 7668.61}
[INFO|2025-02-10 23:12:40] logging.py:157 >> {'loss': 0.0801, 'learning_rate': 9.9946e-05, 'epoch': 0.04, 'throughput': 7740.76}
[INFO|2025-02-10 23:13:07] logging.py:157 >> {'loss': 0.0574, 'learning_rate': 9.9923e-05, 'epoch': 0.05, 'throughput': 7743.47}
[INFO|2025-02-10 23:13:31] logging.py:157 >> {'loss': 0.0401, 'learning_rate': 9.9895e-05, 'epoch': 0.06, 'throughput': 7830.06}
[INFO|2025-02-10 23:14:00] logging.py:157 >> {'loss': 0.0295, 'learning_rate': 9.9863e-05, 'epoch': 0.07, 'throughput': 7781.82}
[INFO|2025-02-10 23:14:23] logging.py:157 >> {'loss': 0.0262, 'learning_rate': 9.9826e-05, 'epoch': 0.08, 'throughput': 7819.12}
[INFO|2025-02-10 23:14:51] logging.py:157 >> {'loss': 0.0263, 'learning_rate': 9.9785e-05, 'epoch': 0.09, 'throughput': 7807.27}
[INFO|2025-02-10 23:15:15] logging.py:157 >> {'loss': 0.0220, 'learning_rate': 9.9740e-05, 'epoch': 0.10, 'throughput': 7850.91}
[INFO|2025-02-10 23:15:39] logging.py:157 >> {'loss': 0.0202, 'learning_rate': 9.9691e-05, 'epoch': 0.11, 'throughput': 7850.63}
[INFO|2025-02-10 23:16:08] logging.py:157 >> {'loss': 0.0202, 'learning_rate': 9.9638e-05, 'epoch': 0.11, 'throughput': 7823.47}
[INFO|2025-02-10 23:16:38] logging.py:157 >> {'loss': 0.0178, 'learning_rate': 9.9580e-05, 'epoch': 0.12, 'throughput': 7772.01}
[INFO|2025-02-10 23:17:02] logging.py:157 >> {'loss': 0.0158, 'learning_rate': 9.9518e-05, 'epoch': 0.13, 'throughput': 7802.71}
[INFO|2025-02-10 23:17:29] logging.py:157 >> {'loss': 0.0159, 'learning_rate': 9.9451e-05, 'epoch': 0.14, 'throughput': 7786.47}
[INFO|2025-02-10 23:17:56] logging.py:157 >> {'loss': 0.0147, 'learning_rate': 9.9381e-05, 'epoch': 0.15, 'throughput': 7807.80}
[INFO|2025-02-10 23:18:22] logging.py:157 >> {'loss': 0.0138, 'learning_rate': 9.9306e-05, 'epoch': 0.16, 'throughput': 7814.87}
[INFO|2025-02-10 23:18:46] logging.py:157 >> {'loss': 0.0128, 'learning_rate': 9.9227e-05, 'epoch': 0.17, 'throughput': 7835.96}
[INFO|2025-02-10 23:19:13] logging.py:157 >> {'loss': 0.0124, 'learning_rate': 9.9144e-05, 'epoch': 0.18, 'throughput': 7833.94}
[INFO|2025-02-10 23:19:39] logging.py:157 >> {'loss': 0.0116, 'learning_rate': 9.9056e-05, 'epoch': 0.19, 'throughput': 7845.67}
[INFO|2025-02-10 23:20:05] logging.py:157 >> {'loss': 0.0109, 'learning_rate': 9.8964e-05, 'epoch': 0.19, 'throughput': 7844.79}
[INFO|2025-02-10 23:20:32] logging.py:157 >> {'loss': 0.0106, 'learning_rate': 9.8869e-05, 'epoch': 0.20, 'throughput': 7843.06}
[INFO|2025-02-10 23:20:59] logging.py:157 >> {'loss': 0.0099, 'learning_rate': 9.8768e-05, 'epoch': 0.21, 'throughput': 7837.07}
[INFO|2025-02-10 23:21:24] logging.py:157 >> {'loss': 0.0109, 'learning_rate': 9.8664e-05, 'epoch': 0.22, 'throughput': 7848.70}
[INFO|2025-02-10 23:21:48] logging.py:157 >> {'loss': 0.0084, 'learning_rate': 9.8556e-05, 'epoch': 0.23, 'throughput': 7866.16}
[INFO|2025-02-10 23:22:15] logging.py:157 >> {'loss': 0.0081, 'learning_rate': 9.8443e-05, 'epoch': 0.24, 'throughput': 7874.35}
[INFO|2025-02-10 23:22:41] logging.py:157 >> {'loss': 0.0078, 'learning_rate': 9.8326e-05, 'epoch': 0.25, 'throughput': 7877.49}
[INFO|2025-02-10 23:23:09] logging.py:157 >> {'loss': 0.0081, 'learning_rate': 9.8205e-05, 'epoch': 0.26, 'throughput': 7855.15}
[INFO|2025-02-10 23:23:36] logging.py:157 >> {'loss': 0.0087, 'learning_rate': 9.8080e-05, 'epoch': 0.26, 'throughput': 7862.08}
[INFO|2025-02-10 23:24:02] logging.py:157 >> {'loss': 0.0079, 'learning_rate': 9.7951e-05, 'epoch': 0.27, 'throughput': 7859.19}
[INFO|2025-02-10 23:24:28] logging.py:157 >> {'loss': 0.0086, 'learning_rate': 9.7817e-05, 'epoch': 0.28, 'throughput': 7858.66}
[INFO|2025-02-10 23:24:55] logging.py:157 >> {'loss': 0.0079, 'learning_rate': 9.7680e-05, 'epoch': 0.29, 'throughput': 7854.06}
[INFO|2025-02-10 23:25:21] logging.py:157 >> {'loss': 0.0069, 'learning_rate': 9.7538e-05, 'epoch': 0.30, 'throughput': 7844.67}
[INFO|2025-02-10 23:25:48] logging.py:157 >> {'loss': 0.0064, 'learning_rate': 9.7393e-05, 'epoch': 0.31, 'throughput': 7832.92}
[INFO|2025-02-10 23:26:13] logging.py:157 >> {'loss': 0.0060, 'learning_rate': 9.7243e-05, 'epoch': 0.32, 'throughput': 7834.75}
[INFO|2025-02-10 23:26:41] logging.py:157 >> {'loss': 0.0067, 'learning_rate': 9.7089e-05, 'epoch': 0.33, 'throughput': 7845.98}
[INFO|2025-02-10 23:27:07] logging.py:157 >> {'loss': 0.0063, 'learning_rate': 9.6932e-05, 'epoch': 0.34, 'throughput': 7852.67}
[INFO|2025-02-10 23:27:36] logging.py:157 >> {'loss': 0.0058, 'learning_rate': 9.6770e-05, 'epoch': 0.34, 'throughput': 7846.29}
[INFO|2025-02-10 23:28:05] logging.py:157 >> {'loss': 0.0061, 'learning_rate': 9.6604e-05, 'epoch': 0.35, 'throughput': 7836.02}
[INFO|2025-02-10 23:28:29] logging.py:157 >> {'loss': 0.0055, 'learning_rate': 9.6434e-05, 'epoch': 0.36, 'throughput': 7840.93}
[INFO|2025-02-10 23:28:56] logging.py:157 >> {'loss': 0.0050, 'learning_rate': 9.6260e-05, 'epoch': 0.37, 'throughput': 7842.20}
[INFO|2025-02-10 23:29:22] logging.py:157 >> {'loss': 0.0060, 'learning_rate': 9.6082e-05, 'epoch': 0.38, 'throughput': 7833.15}
[INFO|2025-02-10 23:29:48] logging.py:157 >> {'loss': 0.0048, 'learning_rate': 9.5901e-05, 'epoch': 0.39, 'throughput': 7835.19}
[INFO|2025-02-10 23:30:14] logging.py:157 >> {'loss': 0.0047, 'learning_rate': 9.5715e-05, 'epoch': 0.40, 'throughput': 7841.66}
[INFO|2025-02-10 23:30:41] logging.py:157 >> {'loss': 0.0053, 'learning_rate': 9.5525e-05, 'epoch': 0.41, 'throughput': 7848.48}
[INFO|2025-02-10 23:31:05] logging.py:157 >> {'loss': 0.0044, 'learning_rate': 9.5332e-05, 'epoch': 0.41, 'throughput': 7851.65}
[INFO|2025-02-10 23:31:31] logging.py:157 >> {'loss': 0.0043, 'learning_rate': 9.5134e-05, 'epoch': 0.42, 'throughput': 7850.33}
[INFO|2025-02-10 23:31:57] logging.py:157 >> {'loss': 0.0041, 'learning_rate': 9.4933e-05, 'epoch': 0.43, 'throughput': 7853.40}
[INFO|2025-02-10 23:32:26] logging.py:157 >> {'loss': 0.0044, 'learning_rate': 9.4728e-05, 'epoch': 0.44, 'throughput': 7850.51}
[INFO|2025-02-10 23:32:52] logging.py:157 >> {'loss': 0.0040, 'learning_rate': 9.4519e-05, 'epoch': 0.45, 'throughput': 7847.60}
[INFO|2025-02-10 23:33:18] logging.py:157 >> {'loss': 0.0045, 'learning_rate': 9.4306e-05, 'epoch': 0.46, 'throughput': 7853.35}
[INFO|2025-02-10 23:33:43] logging.py:157 >> {'loss': 0.0028, 'learning_rate': 9.4089e-05, 'epoch': 0.47, 'throughput': 7845.79}
[INFO|2025-02-10 23:34:08] logging.py:157 >> {'loss': 0.0034, 'learning_rate': 9.3869e-05, 'epoch': 0.48, 'throughput': 7853.45}
[INFO|2025-02-10 23:34:36] logging.py:157 >> {'loss': 0.0042, 'learning_rate': 9.3645e-05, 'epoch': 0.49, 'throughput': 7852.19}
[INFO|2025-02-10 23:34:59] logging.py:157 >> {'loss': 0.0028, 'learning_rate': 9.3417e-05, 'epoch': 0.49, 'throughput': 7868.08}
[INFO|2025-02-10 23:35:24] logging.py:157 >> {'loss': 0.0035, 'learning_rate': 9.3185e-05, 'epoch': 0.50, 'throughput': 7875.29}
[INFO|2025-02-10 23:35:48] logging.py:157 >> {'loss': 0.0029, 'learning_rate': 9.2950e-05, 'epoch': 0.51, 'throughput': 7884.45}
[INFO|2025-02-10 23:36:13] logging.py:157 >> {'loss': 0.0028, 'learning_rate': 9.2710e-05, 'epoch': 0.52, 'throughput': 7890.10}
[INFO|2025-02-10 23:36:37] logging.py:157 >> {'loss': 0.0027, 'learning_rate': 9.2468e-05, 'epoch': 0.53, 'throughput': 7895.72}
[INFO|2025-02-10 23:37:02] logging.py:157 >> {'loss': 0.0022, 'learning_rate': 9.2221e-05, 'epoch': 0.54, 'throughput': 7896.89}
[INFO|2025-02-10 23:37:34] logging.py:157 >> {'loss': 0.0027, 'learning_rate': 9.1971e-05, 'epoch': 0.55, 'throughput': 7889.78}
[INFO|2025-02-10 23:38:02] logging.py:157 >> {'loss': 0.0025, 'learning_rate': 9.1718e-05, 'epoch': 0.56, 'throughput': 7887.26}
[INFO|2025-02-10 23:38:27] logging.py:157 >> {'loss': 0.0025, 'learning_rate': 9.1461e-05, 'epoch': 0.56, 'throughput': 7894.29}
[INFO|2025-02-10 23:38:57] logging.py:157 >> {'loss': 0.0022, 'learning_rate': 9.1200e-05, 'epoch': 0.57, 'throughput': 7878.69}
[INFO|2025-02-10 23:39:23] logging.py:157 >> {'loss': 0.0023, 'learning_rate': 9.0935e-05, 'epoch': 0.58, 'throughput': 7879.18}
[INFO|2025-02-10 23:39:50] logging.py:157 >> {'loss': 0.0024, 'learning_rate': 9.0668e-05, 'epoch': 0.59, 'throughput': 7877.08}
[INFO|2025-02-10 23:40:18] logging.py:157 >> {'loss': 0.0015, 'learning_rate': 9.0396e-05, 'epoch': 0.60, 'throughput': 7867.14}
[INFO|2025-02-10 23:40:48] logging.py:157 >> {'loss': 0.0033, 'learning_rate': 9.0122e-05, 'epoch': 0.61, 'throughput': 7856.50}
[INFO|2025-02-10 23:41:16] logging.py:157 >> {'loss': 0.0021, 'learning_rate': 8.9843e-05, 'epoch': 0.62, 'throughput': 7860.92}
[INFO|2025-02-10 23:41:41] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 8.9562e-05, 'epoch': 0.63, 'throughput': 7863.29}
[INFO|2025-02-10 23:42:04] logging.py:157 >> {'loss': 0.0016, 'learning_rate': 8.9277e-05, 'epoch': 0.64, 'throughput': 7866.76}
[INFO|2025-02-10 23:42:29] logging.py:157 >> {'loss': 0.0019, 'learning_rate': 8.8988e-05, 'epoch': 0.64, 'throughput': 7870.01}
[INFO|2025-02-10 23:42:56] logging.py:157 >> {'loss': 0.0017, 'learning_rate': 8.8696e-05, 'epoch': 0.65, 'throughput': 7870.22}
[INFO|2025-02-10 23:43:23] logging.py:157 >> {'loss': 0.0015, 'learning_rate': 8.8401e-05, 'epoch': 0.66, 'throughput': 7872.22}
[INFO|2025-02-10 23:43:51] logging.py:157 >> {'loss': 0.0017, 'learning_rate': 8.8103e-05, 'epoch': 0.67, 'throughput': 7866.00}
[INFO|2025-02-10 23:44:17] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.7801e-05, 'epoch': 0.68, 'throughput': 7866.60}
[INFO|2025-02-10 23:44:42] logging.py:157 >> {'loss': 0.0018, 'learning_rate': 8.7496e-05, 'epoch': 0.69, 'throughput': 7869.35}
[INFO|2025-02-10 23:45:05] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.7188e-05, 'epoch': 0.70, 'throughput': 7871.98}
[INFO|2025-02-10 23:45:31] logging.py:157 >> {'loss': 0.0014, 'learning_rate': 8.6877e-05, 'epoch': 0.71, 'throughput': 7877.09}
[INFO|2025-02-10 23:45:56] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.6562e-05, 'epoch': 0.71, 'throughput': 7874.92}
[INFO|2025-02-10 23:46:27] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 8.6245e-05, 'epoch': 0.72, 'throughput': 7863.86}
[INFO|2025-02-10 23:46:50] logging.py:157 >> {'loss': 0.0012, 'learning_rate': 8.5924e-05, 'epoch': 0.73, 'throughput': 7864.79}
[INFO|2025-02-10 23:47:14] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 8.5600e-05, 'epoch': 0.74, 'throughput': 7871.03}
[INFO|2025-02-10 23:47:40] logging.py:157 >> {'loss': 0.0011, 'learning_rate': 8.5273e-05, 'epoch': 0.75, 'throughput': 7869.86}
[INFO|2025-02-10 23:48:09] logging.py:157 >> {'loss': 0.0037, 'learning_rate': 8.4943e-05, 'epoch': 0.76, 'throughput': 7865.40}
[INFO|2025-02-10 23:48:36] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 8.4611e-05, 'epoch': 0.77, 'throughput': 7860.70}
[INFO|2025-02-10 23:49:00] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 8.4275e-05, 'epoch': 0.78, 'throughput': 7865.68}
[INFO|2025-02-10 23:49:28] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 8.3936e-05, 'epoch': 0.79, 'throughput': 7857.75}
[INFO|2025-02-10 23:49:53] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.3594e-05, 'epoch': 0.79, 'throughput': 7861.64}
[INFO|2025-02-10 23:50:18] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.3249e-05, 'epoch': 0.80, 'throughput': 7864.33}
[INFO|2025-02-10 23:50:43] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 8.2902e-05, 'epoch': 0.81, 'throughput': 7862.08}
[INFO|2025-02-10 23:51:08] logging.py:157 >> {'loss': 0.0014, 'learning_rate': 8.2552e-05, 'epoch': 0.82, 'throughput': 7866.62}
[INFO|2025-02-10 23:51:33] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 8.2199e-05, 'epoch': 0.83, 'throughput': 7869.38}
[INFO|2025-02-10 23:51:59] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 8.1843e-05, 'epoch': 0.84, 'throughput': 7872.25}
[INFO|2025-02-10 23:52:23] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 8.1484e-05, 'epoch': 0.85, 'throughput': 7875.39}
[INFO|2025-02-10 23:52:51] logging.py:157 >> {'loss': 0.0011, 'learning_rate': 8.1123e-05, 'epoch': 0.86, 'throughput': 7868.64}
[INFO|2025-02-10 23:53:21] logging.py:157 >> {'loss': 0.0009, 'learning_rate': 8.0759e-05, 'epoch': 0.86, 'throughput': 7865.85}
[INFO|2025-02-10 23:53:44] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 8.0392e-05, 'epoch': 0.87, 'throughput': 7870.61}
[INFO|2025-02-10 23:54:08] logging.py:157 >> {'loss': 0.0009, 'learning_rate': 8.0023e-05, 'epoch': 0.88, 'throughput': 7876.56}
[INFO|2025-02-10 23:54:08] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-100
[INFO|2025-02-10 23:54:08] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-10 23:54:08] configuration_utils.py:768 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
[INFO|2025-02-10 23:54:08] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-100/tokenizer_config.json
[INFO|2025-02-10 23:54:08] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-100/special_tokens_map.json
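Each saved checkpoint above contains only the LoRA adapter plus tokenizer files, so evaluating it means stacking it on the base weights. A minimal sketch for a quick qualitative check of checkpoint-100 (paths are taken from the log; dtype and device placement are assumptions):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct"
ckpt_path = "saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-100"

# Load the base model, then attach the adapter saved at step 100.
tokenizer = AutoTokenizer.from_pretrained(ckpt_path)
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, ckpt_path)
model.eval()
```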
[INFO|2025-02-10 23:54:33] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 7.9651e-05, 'epoch': 0.89, 'throughput': 7876.49}
[INFO|2025-02-10 23:55:01] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 7.9277e-05, 'epoch': 0.90, 'throughput': 7875.34}
[INFO|2025-02-10 23:55:25] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 7.8900e-05, 'epoch': 0.91, 'throughput': 7878.05}
[INFO|2025-02-10 23:55:48] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 7.8520e-05, 'epoch': 0.92, 'throughput': 7881.27}
[INFO|2025-02-10 23:56:16] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 7.8139e-05, 'epoch': 0.93, 'throughput': 7877.02}
[INFO|2025-02-10 23:56:44] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 7.7754e-05, 'epoch': 0.94, 'throughput': 7877.23}
[INFO|2025-02-10 23:57:08] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 7.7368e-05, 'epoch': 0.94, 'throughput': 7878.63}
[INFO|2025-02-10 23:57:34] logging.py:157 >> {'loss': 0.0012, 'learning_rate': 7.6979e-05, 'epoch': 0.95, 'throughput': 7883.06}
[INFO|2025-02-10 23:57:59] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 7.6588e-05, 'epoch': 0.96, 'throughput': 7881.65}
[INFO|2025-02-10 23:58:23] logging.py:157 >> {'loss': 0.0008, 'learning_rate': 7.6194e-05, 'epoch': 0.97, 'throughput': 7883.60}
[INFO|2025-02-10 23:58:51] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 7.5798e-05, 'epoch': 0.98, 'throughput': 7884.35}
[INFO|2025-02-10 23:59:14] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 7.5400e-05, 'epoch': 0.99, 'throughput': 7889.50}
[INFO|2025-02-10 23:59:42] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 7.5000e-05, 'epoch': 1.00, 'throughput': 7889.54}
[INFO|2025-02-11 00:00:15] logging.py:157 >> {'loss': 0.0014, 'learning_rate': 7.4598e-05, 'epoch': 1.01, 'throughput': 7889.22}
[INFO|2025-02-11 00:00:42] logging.py:157 >> {'loss': 0.0007, 'learning_rate': 7.4193e-05, 'epoch': 1.02, 'throughput': 7886.62}
[INFO|2025-02-11 00:01:08] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 7.3787e-05, 'epoch': 1.03, 'throughput': 7886.10}
[INFO|2025-02-11 00:01:33] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.3378e-05, 'epoch': 1.04, 'throughput': 7886.46}
[INFO|2025-02-11 00:01:56] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 7.2967e-05, 'epoch': 1.04, 'throughput': 7889.90}
[INFO|2025-02-11 00:02:21] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.2555e-05, 'epoch': 1.05, 'throughput': 7889.45}
[INFO|2025-02-11 00:02:50] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 7.2140e-05, 'epoch': 1.06, 'throughput': 7886.78}
[INFO|2025-02-11 00:03:14] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.1724e-05, 'epoch': 1.07, 'throughput': 7893.28}
[INFO|2025-02-11 00:03:40] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 7.1306e-05, 'epoch': 1.08, 'throughput': 7896.08}
[INFO|2025-02-11 00:04:05] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 7.0886e-05, 'epoch': 1.09, 'throughput': 7901.06}
[INFO|2025-02-11 00:04:28] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.0464e-05, 'epoch': 1.10, 'throughput': 7904.61}
[INFO|2025-02-11 00:04:54] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 7.0040e-05, 'epoch': 1.11, 'throughput': 7904.82}
[INFO|2025-02-11 00:05:22] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.9615e-05, 'epoch': 1.11, 'throughput': 7905.63}
[INFO|2025-02-11 00:05:49] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.9188e-05, 'epoch': 1.12, 'throughput': 7904.23}
[INFO|2025-02-11 00:06:16] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.8759e-05, 'epoch': 1.13, 'throughput': 7903.09}
[INFO|2025-02-11 00:06:46] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.8329e-05, 'epoch': 1.14, 'throughput': 7898.37}
[INFO|2025-02-11 00:07:13] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.7897e-05, 'epoch': 1.15, 'throughput': 7897.68}
[INFO|2025-02-11 00:07:41] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.7463e-05, 'epoch': 1.16, 'throughput': 7897.31}
[INFO|2025-02-11 00:08:08] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.7028e-05, 'epoch': 1.17, 'throughput': 7895.24}
[INFO|2025-02-11 00:08:34] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.6592e-05, 'epoch': 1.18, 'throughput': 7895.50}
[INFO|2025-02-11 00:09:01] logging.py:157 >> {'loss': 0.0009, 'learning_rate': 6.6154e-05, 'epoch': 1.19, 'throughput': 7896.09}
[INFO|2025-02-11 00:09:29] logging.py:157 >> {'loss': 0.0014, 'learning_rate': 6.5715e-05, 'epoch': 1.19, 'throughput': 7892.01}
[INFO|2025-02-11 00:09:54] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.5274e-05, 'epoch': 1.20, 'throughput': 7891.89}
[INFO|2025-02-11 00:10:22] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.4833e-05, 'epoch': 1.21, 'throughput': 7888.92}
[INFO|2025-02-11 00:10:49] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.4389e-05, 'epoch': 1.22, 'throughput': 7888.52}
[INFO|2025-02-11 00:11:14] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.3945e-05, 'epoch': 1.23, 'throughput': 7890.66}
[INFO|2025-02-11 00:11:36] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.3500e-05, 'epoch': 1.24, 'throughput': 7892.64}
[INFO|2025-02-11 00:12:00] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.3053e-05, 'epoch': 1.25, 'throughput': 7894.92}
[INFO|2025-02-11 00:12:29] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.2605e-05, 'epoch': 1.26, 'throughput': 7892.49}
[INFO|2025-02-11 00:12:54] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.2156e-05, 'epoch': 1.26, 'throughput': 7894.05}
[INFO|2025-02-11 00:13:20] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 6.1706e-05, 'epoch': 1.27, 'throughput': 7893.70}
[INFO|2025-02-11 00:13:42] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 6.1255e-05, 'epoch': 1.28, 'throughput': 7896.62}
[INFO|2025-02-11 00:14:09] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 6.0803e-05, 'epoch': 1.29, 'throughput': 7895.09}
[INFO|2025-02-11 00:14:34] logging.py:157 >> {'loss': 0.0005, 'learning_rate': 6.0350e-05, 'epoch': 1.30, 'throughput': 7897.58}
[INFO|2025-02-11 00:15:00] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.9896e-05, 'epoch': 1.31, 'throughput': 7896.39}
[INFO|2025-02-11 00:15:25] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.9442e-05, 'epoch': 1.32, 'throughput': 7897.88}
[INFO|2025-02-11 00:15:48] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.8986e-05, 'epoch': 1.33, 'throughput': 7902.00}
[INFO|2025-02-11 00:16:14] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.8530e-05, 'epoch': 1.34, 'throughput': 7900.57}
[INFO|2025-02-11 00:16:41] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.8073e-05, 'epoch': 1.34, 'throughput': 7901.36}
[INFO|2025-02-11 00:17:06] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.7616e-05, 'epoch': 1.35, 'throughput': 7898.31}
[INFO|2025-02-11 00:17:30] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.7157e-05, 'epoch': 1.36, 'throughput': 7901.98}
[INFO|2025-02-11 00:17:58] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.6699e-05, 'epoch': 1.37, 'throughput': 7898.91}
[INFO|2025-02-11 00:18:22] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.6239e-05, 'epoch': 1.38, 'throughput': 7901.45}
[INFO|2025-02-11 00:18:47] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.5779e-05, 'epoch': 1.39, 'throughput': 7900.71}
[INFO|2025-02-11 00:19:14] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.5319e-05, 'epoch': 1.40, 'throughput': 7898.98}
[INFO|2025-02-11 00:19:41] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.4858e-05, 'epoch': 1.41, 'throughput': 7897.63}
[INFO|2025-02-11 00:20:08] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.4396e-05, 'epoch': 1.41, 'throughput': 7897.71}
[INFO|2025-02-11 00:20:33] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.3935e-05, 'epoch': 1.42, 'throughput': 7897.58}
[INFO|2025-02-11 00:20:57] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.3472e-05, 'epoch': 1.43, 'throughput': 7900.98}
[INFO|2025-02-11 00:21:26] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.3010e-05, 'epoch': 1.44, 'throughput': 7899.49}
[INFO|2025-02-11 00:21:50] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.2547e-05, 'epoch': 1.45, 'throughput': 7899.53}
[INFO|2025-02-11 00:22:13] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.2085e-05, 'epoch': 1.46, 'throughput': 7901.89}
[INFO|2025-02-11 00:22:39] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 5.1621e-05, 'epoch': 1.47, 'throughput': 7902.03}
[INFO|2025-02-11 00:23:05] logging.py:157 >> {'loss': 0.0016, 'learning_rate': 5.1158e-05, 'epoch': 1.48, 'throughput': 7901.05}
[INFO|2025-02-11 00:23:32] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.0695e-05, 'epoch': 1.49, 'throughput': 7901.53}
[INFO|2025-02-11 00:23:57] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.0232e-05, 'epoch': 1.49, 'throughput': 7901.79}
[INFO|2025-02-11 00:24:22] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.9768e-05, 'epoch': 1.50, 'throughput': 7901.03}
[INFO|2025-02-11 00:24:45] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.9305e-05, 'epoch': 1.51, 'throughput': 7903.96}
[INFO|2025-02-11 00:25:11] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.8842e-05, 'epoch': 1.52, 'throughput': 7905.12}
[INFO|2025-02-11 00:25:33] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.8379e-05, 'epoch': 1.53, 'throughput': 7909.17}
[INFO|2025-02-11 00:25:58] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.7915e-05, 'epoch': 1.54, 'throughput': 7909.02}
[INFO|2025-02-11 00:26:29] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.7453e-05, 'epoch': 1.55, 'throughput': 7905.59}
[INFO|2025-02-11 00:26:56] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.6990e-05, 'epoch': 1.56, 'throughput': 7905.47}
[INFO|2025-02-11 00:27:22] logging.py:157 >> {'loss': 0.0027, 'learning_rate': 4.6528e-05, 'epoch': 1.56, 'throughput': 7904.70}
[INFO|2025-02-11 00:27:47] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.6065e-05, 'epoch': 1.57, 'throughput': 7905.80}
[INFO|2025-02-11 00:28:12] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.5604e-05, 'epoch': 1.58, 'throughput': 7907.65}
[INFO|2025-02-11 00:28:39] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.5142e-05, 'epoch': 1.59, 'throughput': 7907.43}
[INFO|2025-02-11 00:29:09] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.4681e-05, 'epoch': 1.60, 'throughput': 7902.86}
[INFO|2025-02-11 00:29:35] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.4221e-05, 'epoch': 1.61, 'throughput': 7903.10}
[INFO|2025-02-11 00:30:02] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.3761e-05, 'epoch': 1.62, 'throughput': 7904.08}
[INFO|2025-02-11 00:30:28] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.3301e-05, 'epoch': 1.63, 'throughput': 7905.25}
[INFO|2025-02-11 00:30:50] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 4.2843e-05, 'epoch': 1.64, 'throughput': 7908.78}
[INFO|2025-02-11 00:31:13] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.2384e-05, 'epoch': 1.64, 'throughput': 7912.45}
[INFO|2025-02-11 00:31:40] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.1927e-05, 'epoch': 1.65, 'throughput': 7912.95}
[INFO|2025-02-11 00:32:06] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.1470e-05, 'epoch': 1.66, 'throughput': 7913.14}
[INFO|2025-02-11 00:32:33] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.1014e-05, 'epoch': 1.67, 'throughput': 7914.08}
[INFO|2025-02-11 00:33:02] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.0558e-05, 'epoch': 1.68, 'throughput': 7911.16}
[INFO|2025-02-11 00:33:29] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 4.0104e-05, 'epoch': 1.69, 'throughput': 7911.74}
[INFO|2025-02-11 00:33:56] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 3.9650e-05, 'epoch': 1.70, 'throughput': 7911.05}
[INFO|2025-02-11 00:34:21] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.9197e-05, 'epoch': 1.71, 'throughput': 7910.45}
[INFO|2025-02-11 00:34:46] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.8745e-05, 'epoch': 1.71, 'throughput': 7911.47}
[INFO|2025-02-11 00:35:10] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.8294e-05, 'epoch': 1.72, 'throughput': 7913.07}
[INFO|2025-02-11 00:35:36] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.7844e-05, 'epoch': 1.73, 'throughput': 7912.61}
[INFO|2025-02-11 00:36:01] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.7395e-05, 'epoch': 1.74, 'throughput': 7914.15}
[INFO|2025-02-11 00:36:33] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.6947e-05, 'epoch': 1.75, 'throughput': 7908.90}
[INFO|2025-02-11 00:36:56] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.6500e-05, 'epoch': 1.76, 'throughput': 7909.88}
[INFO|2025-02-11 00:37:27] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.6055e-05, 'epoch': 1.77, 'throughput': 7904.23}
[INFO|2025-02-11 00:37:27] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-200
[INFO|2025-02-11 00:37:28] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-11 00:37:28] configuration_utils.py:768 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
[INFO|2025-02-11 00:37:28] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-200/tokenizer_config.json
[INFO|2025-02-11 00:37:28] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-200/special_tokens_map.json
[INFO|2025-02-11 00:37:58] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.5611e-05, 'epoch': 1.78, 'throughput': 7898.91}
[INFO|2025-02-11 00:38:25] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.5167e-05, 'epoch': 1.79, 'throughput': 7897.44}
[INFO|2025-02-11 00:38:53] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 3.4726e-05, 'epoch': 1.79, 'throughput': 7894.63}
[INFO|2025-02-11 00:39:20] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.4285e-05, 'epoch': 1.80, 'throughput': 7893.32}
[INFO|2025-02-11 00:39:45] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.3846e-05, 'epoch': 1.81, 'throughput': 7893.48}
[INFO|2025-02-11 00:40:10] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.3408e-05, 'epoch': 1.82, 'throughput': 7897.32}
[INFO|2025-02-11 00:40:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.2972e-05, 'epoch': 1.83, 'throughput': 7898.07}
[INFO|2025-02-11 00:41:01] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.2537e-05, 'epoch': 1.84, 'throughput': 7898.41}
[INFO|2025-02-11 00:41:27] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.2103e-05, 'epoch': 1.85, 'throughput': 7898.41}
[INFO|2025-02-11 00:41:56] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.1671e-05, 'epoch': 1.86, 'throughput': 7897.79}
[INFO|2025-02-11 00:42:22] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.1241e-05, 'epoch': 1.86, 'throughput': 7896.37}
[INFO|2025-02-11 00:42:45] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 3.0812e-05, 'epoch': 1.87, 'throughput': 7897.48}
[INFO|2025-02-11 00:43:13] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.0385e-05, 'epoch': 1.88, 'throughput': 7898.39}
[INFO|2025-02-11 00:43:39] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.9960e-05, 'epoch': 1.89, 'throughput': 7897.60}
[INFO|2025-02-11 00:44:03] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.9536e-05, 'epoch': 1.90, 'throughput': 7898.54}
[INFO|2025-02-11 00:44:28] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.9114e-05, 'epoch': 1.91, 'throughput': 7900.99}
[INFO|2025-02-11 00:44:55] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.8694e-05, 'epoch': 1.92, 'throughput': 7899.67}
[INFO|2025-02-11 00:45:23] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.8276e-05, 'epoch': 1.93, 'throughput': 7898.82}
[INFO|2025-02-11 00:45:50] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.7860e-05, 'epoch': 1.94, 'throughput': 7898.40}
[INFO|2025-02-11 00:46:19] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.7445e-05, 'epoch': 1.94, 'throughput': 7895.44}
[INFO|2025-02-11 00:46:46] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 2.7033e-05, 'epoch': 1.95, 'throughput': 7892.18}
[INFO|2025-02-11 00:47:13] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.6622e-05, 'epoch': 1.96, 'throughput': 7890.65}
[INFO|2025-02-11 00:47:36] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.6213e-05, 'epoch': 1.97, 'throughput': 7892.67}
[INFO|2025-02-11 00:48:03] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.5807e-05, 'epoch': 1.98, 'throughput': 7892.10}
[INFO|2025-02-11 00:48:31] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.5402e-05, 'epoch': 1.99, 'throughput': 7891.47}
[INFO|2025-02-11 00:48:57] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 2.5000e-05, 'epoch': 2.00, 'throughput': 7891.65}
[INFO|2025-02-11 00:49:32] logging.py:157 >> {'loss': 0.0011, 'learning_rate': 2.4600e-05, 'epoch': 2.01, 'throughput': 7890.98}
[INFO|2025-02-11 00:49:59] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.4202e-05, 'epoch': 2.02, 'throughput': 7888.59}
[INFO|2025-02-11 00:50:25] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.3806e-05, 'epoch': 2.03, 'throughput': 7889.79}
[INFO|2025-02-11 00:50:54] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 2.3412e-05, 'epoch': 2.04, 'throughput': 7887.38}
[INFO|2025-02-11 00:51:17] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.3021e-05, 'epoch': 2.04, 'throughput': 7889.45}
[INFO|2025-02-11 00:51:46] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.2632e-05, 'epoch': 2.05, 'throughput': 7887.07}
[INFO|2025-02-11 00:52:11] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.2246e-05, 'epoch': 2.06, 'throughput': 7889.59}
[INFO|2025-02-11 00:52:43] logging.py:157 >> {'loss': 0.0025, 'learning_rate': 2.1861e-05, 'epoch': 2.07, 'throughput': 7885.55}
[INFO|2025-02-11 00:53:13] logging.py:157 >> {'loss': 0.0013, 'learning_rate': 2.1480e-05, 'epoch': 2.08, 'throughput': 7881.47}
[INFO|2025-02-11 00:53:37] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.1100e-05, 'epoch': 2.09, 'throughput': 7884.07}
[INFO|2025-02-11 00:54:04] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.0723e-05, 'epoch': 2.10, 'throughput': 7884.69}
[INFO|2025-02-11 00:54:29] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.0349e-05, 'epoch': 2.11, 'throughput': 7886.69}
[INFO|2025-02-11 00:54:55] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.9977e-05, 'epoch': 2.11, 'throughput': 7886.90}
[INFO|2025-02-11 00:55:19] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.9608e-05, 'epoch': 2.12, 'throughput': 7888.14}
[INFO|2025-02-11 00:55:44] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.9241e-05, 'epoch': 2.13, 'throughput': 7889.13}
[INFO|2025-02-11 00:56:09] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.8877e-05, 'epoch': 2.14, 'throughput': 7888.30}
[INFO|2025-02-11 00:56:35] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.8516e-05, 'epoch': 2.15, 'throughput': 7887.77}
[INFO|2025-02-11 00:56:59] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.8157e-05, 'epoch': 2.16, 'throughput': 7889.80}
[INFO|2025-02-11 00:57:28] logging.py:157 >> {'loss': 0.0006, 'learning_rate': 1.7801e-05, 'epoch': 2.17, 'throughput': 7887.66}
[INFO|2025-02-11 00:57:55] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.7448e-05, 'epoch': 2.18, 'throughput': 7886.91}
[INFO|2025-02-11 00:58:18] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.7098e-05, 'epoch': 2.19, 'throughput': 7889.08}
[INFO|2025-02-11 00:58:42] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.6751e-05, 'epoch': 2.19, 'throughput': 7890.61}
[INFO|2025-02-11 00:59:08] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.6406e-05, 'epoch': 2.20, 'throughput': 7889.22}
[INFO|2025-02-11 00:59:33] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.6064e-05, 'epoch': 2.21, 'throughput': 7890.88}
[INFO|2025-02-11 00:59:58] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.5725e-05, 'epoch': 2.22, 'throughput': 7890.72}
[INFO|2025-02-11 01:00:24] logging.py:157 >> {'loss': 0.0012, 'learning_rate': 1.5389e-05, 'epoch': 2.23, 'throughput': 7890.97}
[INFO|2025-02-11 01:00:47] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.5057e-05, 'epoch': 2.24, 'throughput': 7892.37}
[INFO|2025-02-11 01:01:11] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 1.4727e-05, 'epoch': 2.25, 'throughput': 7894.16}
[INFO|2025-02-11 01:01:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.4400e-05, 'epoch': 2.26, 'throughput': 7894.39}
[INFO|2025-02-11 01:02:01] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.4076e-05, 'epoch': 2.26, 'throughput': 7895.04}
[INFO|2025-02-11 01:02:29] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.3755e-05, 'epoch': 2.27, 'throughput': 7894.62}
[INFO|2025-02-11 01:02:57] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.3438e-05, 'epoch': 2.28, 'throughput': 7893.02}
[INFO|2025-02-11 01:03:28] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.3123e-05, 'epoch': 2.29, 'throughput': 7891.68}
[INFO|2025-02-11 01:03:52] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.2812e-05, 'epoch': 2.30, 'throughput': 7892.09}
[INFO|2025-02-11 01:04:18] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.2504e-05, 'epoch': 2.31, 'throughput': 7894.29}
[INFO|2025-02-11 01:04:46] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.2199e-05, 'epoch': 2.32, 'throughput': 7893.79}
[INFO|2025-02-11 01:05:13] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.1897e-05, 'epoch': 2.33, 'throughput': 7894.54}
[INFO|2025-02-11 01:05:37] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.1599e-05, 'epoch': 2.34, 'throughput': 7894.72}
[INFO|2025-02-11 01:06:03] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.1304e-05, 'epoch': 2.34, 'throughput': 7893.77}
[INFO|2025-02-11 01:06:29] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.1012e-05, 'epoch': 2.35, 'throughput': 7895.00}
[INFO|2025-02-11 01:06:55] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.0723e-05, 'epoch': 2.36, 'throughput': 7895.05}
[INFO|2025-02-11 01:07:22] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.0438e-05, 'epoch': 2.37, 'throughput': 7895.15}
[INFO|2025-02-11 01:07:48] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.0157e-05, 'epoch': 2.38, 'throughput': 7895.90}
[INFO|2025-02-11 01:08:14] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 9.8785e-06, 'epoch': 2.39, 'throughput': 7897.00}
[INFO|2025-02-11 01:08:43] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 9.6037e-06, 'epoch': 2.40, 'throughput': 7893.16}
[INFO|2025-02-11 01:09:10] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 9.3324e-06, 'epoch': 2.41, 'throughput': 7891.24}
[INFO|2025-02-11 01:09:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 9.0646e-06, 'epoch': 2.41, 'throughput': 7893.30}
[INFO|2025-02-11 01:10:00] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 8.8003e-06, 'epoch': 2.42, 'throughput': 7893.02}
[INFO|2025-02-11 01:10:25] logging.py:157 >> {'loss': 0.0000, 'learning_rate': 8.5395e-06, 'epoch': 2.43, 'throughput': 7892.52}
[INFO|2025-02-11 01:10:51] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 8.2823e-06, 'epoch': 2.44, 'throughput': 7892.63}
[INFO|2025-02-11 01:11:17] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 8.0287e-06, 'epoch': 2.45, 'throughput': 7892.41}
[INFO|2025-02-11 01:11:45] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 7.7786e-06, 'epoch': 2.46, 'throughput': 7889.96}
[INFO|2025-02-11 01:12:10] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 7.5322e-06, 'epoch': 2.47, 'throughput': 7892.18}
[INFO|2025-02-11 01:12:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 7.2895e-06, 'epoch': 2.48, 'throughput': 7893.12}
[INFO|2025-02-11 01:13:03] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 7.0504e-06, 'epoch': 2.49, 'throughput': 7891.37}
[INFO|2025-02-11 01:13:30] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 6.8150e-06, 'epoch': 2.49, 'throughput': 7891.23}
[INFO|2025-02-11 01:13:58] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 6.5834e-06, 'epoch': 2.50, 'throughput': 7890.21}
[INFO|2025-02-11 01:14:25] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 6.3554e-06, 'epoch': 2.51, 'throughput': 7889.96}
[INFO|2025-02-11 01:14:53] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.1312e-06, 'epoch': 2.52, 'throughput': 7889.86}
[INFO|2025-02-11 01:15:20] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 5.9108e-06, 'epoch': 2.53, 'throughput': 7889.91}
[INFO|2025-02-11 01:15:44] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.6941e-06, 'epoch': 2.54, 'throughput': 7891.05}
[INFO|2025-02-11 01:16:10] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 5.4813e-06, 'epoch': 2.55, 'throughput': 7890.97}
[INFO|2025-02-11 01:16:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 5.2723e-06, 'epoch': 2.56, 'throughput': 7891.95}
[INFO|2025-02-11 01:16:58] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 5.0671e-06, 'epoch': 2.56, 'throughput': 7893.32}
[INFO|2025-02-11 01:17:24] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.8657e-06, 'epoch': 2.57, 'throughput': 7893.98}
[INFO|2025-02-11 01:17:51] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 4.6683e-06, 'epoch': 2.58, 'throughput': 7892.90}
[INFO|2025-02-11 01:18:18] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.4748e-06, 'epoch': 2.59, 'throughput': 7892.36}
[INFO|2025-02-11 01:18:46] logging.py:157 >> {'loss': 0.0010, 'learning_rate': 4.2851e-06, 'epoch': 2.60, 'throughput': 7891.40}
[INFO|2025-02-11 01:19:11] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.0994e-06, 'epoch': 2.61, 'throughput': 7891.16}
[INFO|2025-02-11 01:19:35] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.9176e-06, 'epoch': 2.62, 'throughput': 7892.02}
[INFO|2025-02-11 01:19:59] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.7398e-06, 'epoch': 2.63, 'throughput': 7892.85}
[INFO|2025-02-11 01:20:26] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.5660e-06, 'epoch': 2.64, 'throughput': 7892.72}
[INFO|2025-02-11 01:20:52] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 3.3961e-06, 'epoch': 2.64, 'throughput': 7893.04}
[INFO|2025-02-11 01:21:17] logging.py:157 >> {'loss': 0.0000, 'learning_rate': 3.2303e-06, 'epoch': 2.65, 'throughput': 7892.55}
[INFO|2025-02-11 01:21:17] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-300
[INFO|2025-02-11 01:21:17] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-11 01:21:17] configuration_utils.py:768 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
[INFO|2025-02-11 01:21:17] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-300/tokenizer_config.json
[INFO|2025-02-11 01:21:17] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-300/special_tokens_map.json
[INFO|2025-02-11 01:21:42] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.0684e-06, 'epoch': 2.66, 'throughput': 7893.70}
[INFO|2025-02-11 01:22:07] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.9106e-06, 'epoch': 2.67, 'throughput': 7893.69}
[INFO|2025-02-11 01:22:31] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.7569e-06, 'epoch': 2.68, 'throughput': 7893.79}
[INFO|2025-02-11 01:22:58] logging.py:157 >> {'loss': 0.0004, 'learning_rate': 2.6071e-06, 'epoch': 2.69, 'throughput': 7893.85}
[INFO|2025-02-11 01:23:27] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.4615e-06, 'epoch': 2.70, 'throughput': 7891.77}
[INFO|2025-02-11 01:23:51] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.3200e-06, 'epoch': 2.71, 'throughput': 7892.22}
[INFO|2025-02-11 01:24:18] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.1825e-06, 'epoch': 2.71, 'throughput': 7891.29}
[INFO|2025-02-11 01:24:43] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.0492e-06, 'epoch': 2.72, 'throughput': 7892.66}
[INFO|2025-02-11 01:25:08] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.9199e-06, 'epoch': 2.73, 'throughput': 7892.67}
[INFO|2025-02-11 01:25:34] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.7948e-06, 'epoch': 2.74, 'throughput': 7892.91}
[INFO|2025-02-11 01:26:01] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.6739e-06, 'epoch': 2.75, 'throughput': 7892.87}
[INFO|2025-02-11 01:26:28] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.5570e-06, 'epoch': 2.76, 'throughput': 7892.66}
[INFO|2025-02-11 01:26:55] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.4444e-06, 'epoch': 2.77, 'throughput': 7894.09}
[INFO|2025-02-11 01:27:20] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.3359e-06, 'epoch': 2.78, 'throughput': 7894.56}
[INFO|2025-02-11 01:27:46] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 1.2316e-06, 'epoch': 2.79, 'throughput': 7893.92}
[INFO|2025-02-11 01:28:12] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.1315e-06, 'epoch': 2.79, 'throughput': 7894.65}
[INFO|2025-02-11 01:28:38] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.0356e-06, 'epoch': 2.80, 'throughput': 7893.46}
[INFO|2025-02-11 01:29:02] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 9.4386e-07, 'epoch': 2.81, 'throughput': 7894.99}
[INFO|2025-02-11 01:29:31] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 8.5636e-07, 'epoch': 2.82, 'throughput': 7894.00}
[INFO|2025-02-11 01:29:57] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 7.7308e-07, 'epoch': 2.83, 'throughput': 7895.06}
[INFO|2025-02-11 01:30:24] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 6.9403e-07, 'epoch': 2.84, 'throughput': 7894.58}
[INFO|2025-02-11 01:30:53] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 6.1921e-07, 'epoch': 2.85, 'throughput': 7893.93}
[INFO|2025-02-11 01:31:19] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 5.4864e-07, 'epoch': 2.86, 'throughput': 7893.63}
[INFO|2025-02-11 01:31:45] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.8231e-07, 'epoch': 2.86, 'throughput': 7893.28}
[INFO|2025-02-11 01:32:10] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 4.2023e-07, 'epoch': 2.87, 'throughput': 7894.28}
[INFO|2025-02-11 01:32:33] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.6241e-07, 'epoch': 2.88, 'throughput': 7895.04}
[INFO|2025-02-11 01:32:57] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.0886e-07, 'epoch': 2.89, 'throughput': 7896.03}
[INFO|2025-02-11 01:33:21] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.5957e-07, 'epoch': 2.90, 'throughput': 7896.57}
[INFO|2025-02-11 01:33:45] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 2.1455e-07, 'epoch': 2.91, 'throughput': 7897.70}
[INFO|2025-02-11 01:34:10] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.7381e-07, 'epoch': 2.92, 'throughput': 7898.05}
[INFO|2025-02-11 01:34:36] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 1.3735e-07, 'epoch': 2.93, 'throughput': 7898.34}
[INFO|2025-02-11 01:35:01] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 1.0517e-07, 'epoch': 2.94, 'throughput': 7898.84}
[INFO|2025-02-11 01:35:26] logging.py:157 >> {'loss': 0.0002, 'learning_rate': 7.7274e-08, 'epoch': 2.94, 'throughput': 7900.67}
[INFO|2025-02-11 01:35:51] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 5.3666e-08, 'epoch': 2.95, 'throughput': 7901.22}
[INFO|2025-02-11 01:36:19] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 3.4349e-08, 'epoch': 2.96, 'throughput': 7899.73}
[INFO|2025-02-11 01:36:44] logging.py:157 >> {'loss': 0.0003, 'learning_rate': 1.9322e-08, 'epoch': 2.97, 'throughput': 7900.03}
[INFO|2025-02-11 01:37:08] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 8.5879e-09, 'epoch': 2.98, 'throughput': 7902.03}
[INFO|2025-02-11 01:37:35] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 2.1470e-09, 'epoch': 2.99, 'throughput': 7903.27}
[INFO|2025-02-11 01:38:00] logging.py:157 >> {'loss': 0.0001, 'learning_rate': 0.0000e+00, 'epoch': 3.00, 'throughput': 7902.86}
[INFO|2025-02-11 01:38:00] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-339
[INFO|2025-02-11 01:38:00] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-11 01:38:00] configuration_utils.py:768 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
[INFO|2025-02-11 01:38:00] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-339/tokenizer_config.json
[INFO|2025-02-11 01:38:00] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/checkpoint-339/special_tokens_map.json
[INFO|2025-02-11 01:38:01] trainer.py:2643 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2025-02-11 01:38:01] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128
[INFO|2025-02-11 01:38:01] configuration_utils.py:694 >> loading configuration file /nas/shared/ma4agi/model/Qwen2.5-7B-Instruct/config.json
[INFO|2025-02-11 01:38:01] configuration_utils.py:768 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.48.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
[INFO|2025-02-11 01:38:01] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/tokenizer_config.json
[INFO|2025-02-11 01:38:01] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128/special_tokens_map.json
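The final save above writes the adapter and tokenizer to the run directory. For standalone deployment the adapter can be merged back into the base weights; a sketch under the assumption that PEFT is used for the merge, with an assumed output directory:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path    = "/nas/shared/ma4agi/model/Qwen2.5-7B-Instruct"
adapter_path = "saves/Qwen2.5-7B-Instruct/lora/sft-qwen2.5-7b-instruct-graph-planning-bs128"
out_dir      = "qwen2.5-7b-graph-planning-merged"   # assumed output directory

# Fold the LoRA deltas into the base weights and save a plain checkpoint.
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()
merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(adapter_path).save_pretrained(out_dir)
```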
[WARNING|2025-02-11 01:38:02] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2025-02-11 01:38:02] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2025-02-11 01:38:02] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}