Upload 5 files

Files changed (5)

CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/README.md +202 -0
CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/adapter_config.json +31 -0
CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/adapter_model.safetensors +3 -0
CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/trainer_state.json +3084 -0
CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/training_args.bin +3 -0

CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: /home/xuexiangyuan/workspace/score-model/model
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/xuexiangyuan/workspace/score-model/model",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32.0,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj",
+    "score.2",
+    "score.0"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
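
Reviewer note: a minimal sketch of how the `r`, `lora_alpha`, and `use_rslora` fields above combine into the LoRA scaling factor. It assumes the usual LoRA convention, where the low-rank update is multiplied by `lora_alpha / r`, or by `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled. The helper name `lora_scaling` is hypothetical and is not part of this commit.

```python
import json
import math

# Subset of the fields from the adapter_config.json in this commit.
ADAPTER_CONFIG_JSON = """
{
  "lora_alpha": 32.0,
  "lora_dropout": 0.1,
  "peft_type": "LORA",
  "r": 8,
  "target_modules": ["q_proj", "v_proj", "score.2", "score.0"],
  "use_rslora": false
}
"""


def lora_scaling(config: dict) -> float:
    """Return the factor applied to the low-rank update (hypothetical helper)."""
    alpha = config["lora_alpha"]
    rank = config["r"]
    # Rank-stabilized LoRA divides by sqrt(r) instead of r.
    if config.get("use_rslora"):
        return alpha / math.sqrt(rank)
    return alpha / rank


config = json.loads(ADAPTER_CONFIG_JSON)
scaling = lora_scaling(config)
print(f"scaling = {scaling}")  # 32.0 / 8 = 4.0
```

With the values in this file the update is scaled by 4.0. Note that `target_modules` covers `score.0` and `score.2` in addition to the `q_proj` and `v_proj` attention projections.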

CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e16e930b3fc6cc8c77aa418990335b7783354ce64ab4045c02b17578d007dbef
+size 10451840
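
Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. The sketch below shows one way to parse such a pointer and read off the hash algorithm, digest, and byte size. The function name `parse_lfs_pointer` is hypothetical, and stripping the leading `+` is only there because the text is copied from a diff.

```python
POINTER_TEXT = """\
+version https://git-lfs.github.com/spec/v1
+oid sha256:e16e930b3fc6cc8c77aa418990335b7783354ce64ab4045c02b17578d007dbef
+size 10451840
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.lstrip("+").strip()  # drop the diff marker
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size_bytes"] / (1024 * 1024)
print(pointer["hash_algo"], pointer["size_bytes"], f"{size_mib:.2f} MiB")
```

The adapter file is roughly 10 MB, which is in line with a rank-8 LoRA adapter rather than a full set of model weights.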

CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/trainer_state.json ADDED

	@@ -0,0 +1,3084 @@
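
Reviewer note: the `learning_rate` values logged in this file's `log_history` appear to follow a linear decay with no warmup. The sketch below checks that reading against a few logged entries. The peak rate of `1e-4` and the horizon of 390 steps are inferred from the logged numbers and `global_step`, not stated anywhere in the commit, and `linear_decay_lr` is a hypothetical helper.

```python
import math

PEAK_LR = 1e-4     # assumption: inferred from the logged values
TOTAL_STEPS = 390  # assumption: taken from "global_step" in this file


def linear_decay_lr(step: int, peak_lr: float = PEAK_LR,
                    total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps under linear decay to zero."""
    return peak_lr * (total_steps - step) / total_steps


# (step, learning_rate) pairs copied from log_history in trainer_state.json.
LOGGED = {
    1: 9.974358974358975e-05,
    10: 9.743589743589744e-05,
    39: 9e-05,
    78: 8e-05,
    130: 6.666666666666667e-05,
}

mismatches = [
    step for step, logged in LOGGED.items()
    if not math.isclose(linear_decay_lr(step), logged, rel_tol=1e-9)
]
print("steps that deviate from linear decay:", mismatches)
```

If the list of mismatches is empty, the sampled entries are consistent with a plain linear schedule; this does not rule out other schedules that coincide on these points.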

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 9.768,
+  "eval_steps": 10,
+  "global_step": 390,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0256,
+      "grad_norm": 0.6148048043251038,
+      "learning_rate": 9.974358974358975e-05,
+      "loss": 0.645,
+      "step": 1
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 1.038170337677002,
+      "learning_rate": 9.948717948717949e-05,
+      "loss": 0.6743,
+      "step": 2
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 0.6856626868247986,
+      "learning_rate": 9.923076923076923e-05,
+      "loss": 0.6746,
+      "step": 3
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 1.1796700954437256,
+      "learning_rate": 9.897435897435898e-05,
+      "loss": 0.6025,
+      "step": 4
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 0.8707571625709534,
+      "learning_rate": 9.871794871794872e-05,
+      "loss": 0.6289,
+      "step": 5
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 2.7378222942352295,
+      "learning_rate": 9.846153846153848e-05,
+      "loss": 0.6433,
+      "step": 6
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 4.110616683959961,
+      "learning_rate": 9.820512820512821e-05,
+      "loss": 0.6826,
+      "step": 7
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 2.4638514518737793,
+      "learning_rate": 9.794871794871795e-05,
+      "loss": 0.6047,
+      "step": 8
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 2.8323960304260254,
+      "learning_rate": 9.76923076923077e-05,
+      "loss": 0.5871,
+      "step": 9
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 2.379040479660034,
+      "learning_rate": 9.743589743589744e-05,
+      "loss": 0.5875,
+      "step": 10
+    },
+    {
+      "epoch": 0.256,
+      "eval_loss": 0.5159781575202942,
+      "eval_runtime": 46.1024,
+      "eval_samples_per_second": 9.783,
+      "eval_steps_per_second": 0.174,
+      "step": 10
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 1.054911732673645,
+      "learning_rate": 9.717948717948718e-05,
+      "loss": 0.531,
+      "step": 11
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 1.300252079963684,
+      "learning_rate": 9.692307692307692e-05,
+      "loss": 0.5179,
+      "step": 12
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 2.310645341873169,
+      "learning_rate": 9.666666666666667e-05,
+      "loss": 0.5082,
+      "step": 13
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 1.4785258769989014,
+      "learning_rate": 9.641025641025641e-05,
+      "loss": 0.465,
+      "step": 14
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 1.6794428825378418,
+      "learning_rate": 9.615384615384617e-05,
+      "loss": 0.4115,
+      "step": 15
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 1.2971839904785156,
+      "learning_rate": 9.589743589743591e-05,
+      "loss": 0.4262,
+      "step": 16
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 1.181352972984314,
+      "learning_rate": 9.564102564102565e-05,
+      "loss": 0.43,
+      "step": 17
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 1.1496334075927734,
+      "learning_rate": 9.53846153846154e-05,
+      "loss": 0.3859,
+      "step": 18
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 2.361189842224121,
+      "learning_rate": 9.512820512820513e-05,
+      "loss": 0.4587,
+      "step": 19
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 2.7645130157470703,
+      "learning_rate": 9.487179487179487e-05,
+      "loss": 0.3971,
+      "step": 20
+    },
+    {
+      "epoch": 0.512,
+      "eval_loss": 0.30324116349220276,
+      "eval_runtime": 46.0919,
+      "eval_samples_per_second": 9.785,
+      "eval_steps_per_second": 0.174,
+      "step": 20
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 1.1214560270309448,
+      "learning_rate": 9.461538461538461e-05,
+      "loss": 0.3439,
+      "step": 21
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 1.79205322265625,
+      "learning_rate": 9.435897435897436e-05,
+      "loss": 0.3618,
+      "step": 22
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 1.568602442741394,
+      "learning_rate": 9.41025641025641e-05,
+      "loss": 0.4483,
+      "step": 23
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 1.2847480773925781,
+      "learning_rate": 9.384615384615386e-05,
+      "loss": 0.3828,
+      "step": 24
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 1.417415976524353,
+      "learning_rate": 9.35897435897436e-05,
+      "loss": 0.4042,
+      "step": 25
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 2.1015641689300537,
+      "learning_rate": 9.333333333333334e-05,
+      "loss": 0.3737,
+      "step": 26
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 2.055237293243408,
+      "learning_rate": 9.307692307692309e-05,
+      "loss": 0.3245,
+      "step": 27
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 1.224246859550476,
+      "learning_rate": 9.282051282051283e-05,
+      "loss": 0.3457,
+      "step": 28
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 1.686336636543274,
+      "learning_rate": 9.256410256410257e-05,
+      "loss": 0.2515,
+      "step": 29
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 0.8864126205444336,
+      "learning_rate": 9.230769230769232e-05,
+      "loss": 0.318,
+      "step": 30
+    },
+    {
+      "epoch": 0.768,
+      "eval_loss": 0.2457895278930664,
+      "eval_runtime": 46.2363,
+      "eval_samples_per_second": 9.754,
+      "eval_steps_per_second": 0.173,
+      "step": 30
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.4351671934127808,
+      "learning_rate": 9.205128205128205e-05,
+      "loss": 0.414,
+      "step": 31
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.2227267026901245,
+      "learning_rate": 9.179487179487179e-05,
+      "loss": 0.2987,
+      "step": 32
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 1.0370402336120605,
+      "learning_rate": 9.153846153846155e-05,
+      "loss": 0.3175,
+      "step": 33
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 1.1455687284469604,
+      "learning_rate": 9.128205128205129e-05,
+      "loss": 0.3664,
+      "step": 34
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.978793740272522,
+      "learning_rate": 9.102564102564103e-05,
+      "loss": 0.3321,
+      "step": 35
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 1.6957523822784424,
+      "learning_rate": 9.076923076923078e-05,
+      "loss": 0.317,
+      "step": 36
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 1.6317747831344604,
+      "learning_rate": 9.051282051282052e-05,
+      "loss": 0.2248,
+      "step": 37
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 1.592618703842163,
+      "learning_rate": 9.025641025641026e-05,
+      "loss": 0.3547,
+      "step": 38
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 1.3694124221801758,
+      "learning_rate": 9e-05,
+      "loss": 0.3967,
+      "step": 39
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.802060067653656,
+      "learning_rate": 8.974358974358975e-05,
+      "loss": 0.0485,
+      "step": 40
+    },
+    {
+      "epoch": 1.0,
+      "eval_loss": 0.2322917878627777,
+      "eval_runtime": 46.2118,
+      "eval_samples_per_second": 9.759,
+      "eval_steps_per_second": 0.173,
+      "step": 40
+    },
+    {
+      "epoch": 1.0256,
+      "grad_norm": 1.097848892211914,
+      "learning_rate": 8.948717948717949e-05,
+      "loss": 0.2527,
+      "step": 41
+    },
+    {
+      "epoch": 1.0512,
+      "grad_norm": 2.0955655574798584,
+      "learning_rate": 8.923076923076924e-05,
+      "loss": 0.385,
+      "step": 42
+    },
+    {
+      "epoch": 1.0768,
+      "grad_norm": 1.8478257656097412,
+      "learning_rate": 8.897435897435898e-05,
+      "loss": 0.3122,
+      "step": 43
+    },
+    {
+      "epoch": 1.1024,
+      "grad_norm": 0.8632456660270691,
+      "learning_rate": 8.871794871794872e-05,
+      "loss": 0.2293,
+      "step": 44
+    },
+    {
+      "epoch": 1.1280000000000001,
+      "grad_norm": 1.1462959051132202,
+      "learning_rate": 8.846153846153847e-05,
+      "loss": 0.2703,
+      "step": 45
+    },
+    {
+      "epoch": 1.1536,
+      "grad_norm": 1.3070207834243774,
+      "learning_rate": 8.820512820512821e-05,
+      "loss": 0.3301,
+      "step": 46
+    },
+    {
+      "epoch": 1.1792,
+      "grad_norm": 1.0568207502365112,
+      "learning_rate": 8.794871794871795e-05,
+      "loss": 0.2951,
+      "step": 47
+    },
+    {
+      "epoch": 1.2048,
+      "grad_norm": 0.943413257598877,
+      "learning_rate": 8.76923076923077e-05,
+      "loss": 0.2531,
+      "step": 48
+    },
+    {
+      "epoch": 1.2304,
+      "grad_norm": 0.8083965182304382,
+      "learning_rate": 8.743589743589744e-05,
+      "loss": 0.2516,
+      "step": 49
+    },
+    {
+      "epoch": 1.256,
+      "grad_norm": 0.9122912883758545,
+      "learning_rate": 8.717948717948718e-05,
+      "loss": 0.2332,
+      "step": 50
+    },
+    {
+      "epoch": 1.256,
+      "eval_loss": 0.22658510506153107,
+      "eval_runtime": 46.128,
+      "eval_samples_per_second": 9.777,
+      "eval_steps_per_second": 0.173,
+      "step": 50
+    },
+    {
+      "epoch": 1.2816,
+      "grad_norm": 0.8947486877441406,
+      "learning_rate": 8.692307692307692e-05,
+      "loss": 0.2393,
+      "step": 51
+    },
+    {
+      "epoch": 1.3072,
+      "grad_norm": 1.822934627532959,
+      "learning_rate": 8.666666666666667e-05,
+      "loss": 0.3606,
+      "step": 52
+    },
+    {
+      "epoch": 1.3328,
+      "grad_norm": 1.1772205829620361,
+      "learning_rate": 8.641025641025642e-05,
+      "loss": 0.3321,
+      "step": 53
+    },
+    {
+      "epoch": 1.3584,
+      "grad_norm": 1.822982907295227,
+      "learning_rate": 8.615384615384617e-05,
+      "loss": 0.2437,
+      "step": 54
+    },
+    {
+      "epoch": 1.384,
+      "grad_norm": 2.0824837684631348,
+      "learning_rate": 8.58974358974359e-05,
+      "loss": 0.1932,
+      "step": 55
+    },
+    {
+      "epoch": 1.4096,
+      "grad_norm": 1.6939680576324463,
+      "learning_rate": 8.564102564102564e-05,
+      "loss": 0.3335,
+      "step": 56
+    },
+    {
+      "epoch": 1.4352,
+      "grad_norm": 1.8843656778335571,
+      "learning_rate": 8.538461538461538e-05,
+      "loss": 0.2614,
+      "step": 57
+    },
+    {
+      "epoch": 1.4607999999999999,
+      "grad_norm": 1.1397833824157715,
+      "learning_rate": 8.512820512820513e-05,
+      "loss": 0.2694,
+      "step": 58
+    },
+    {
+      "epoch": 1.4864,
+      "grad_norm": 1.1693910360336304,
+      "learning_rate": 8.487179487179487e-05,
+      "loss": 0.2141,
+      "step": 59
+    },
+    {
+      "epoch": 1.512,
+      "grad_norm": 1.0279395580291748,
+      "learning_rate": 8.461538461538461e-05,
+      "loss": 0.2065,
+      "step": 60
+    },
+    {
+      "epoch": 1.512,
+      "eval_loss": 0.20384371280670166,
+      "eval_runtime": 46.065,
+      "eval_samples_per_second": 9.791,
+      "eval_steps_per_second": 0.174,
+      "step": 60
+    },
+    {
+      "epoch": 1.5375999999999999,
+      "grad_norm": 0.9246665835380554,
+      "learning_rate": 8.435897435897436e-05,
+      "loss": 0.2704,
+      "step": 61
+    },
+    {
+      "epoch": 1.5632000000000001,
+      "grad_norm": 1.0461363792419434,
+      "learning_rate": 8.410256410256411e-05,
+      "loss": 0.1973,
+      "step": 62
+    },
+    {
+      "epoch": 1.5888,
+      "grad_norm": 2.1885361671447754,
+      "learning_rate": 8.384615384615386e-05,
+      "loss": 0.2158,
+      "step": 63
+    },
+    {
+      "epoch": 1.6143999999999998,
+      "grad_norm": 1.0617963075637817,
+      "learning_rate": 8.35897435897436e-05,
+      "loss": 0.2576,
+      "step": 64
+    },
+    {
+      "epoch": 1.6400000000000001,
+      "grad_norm": 1.603236198425293,
+      "learning_rate": 8.333333333333334e-05,
+      "loss": 0.2539,
+      "step": 65
+    },
+    {
+      "epoch": 1.6656,
+      "grad_norm": 1.3767043352127075,
+      "learning_rate": 8.307692307692309e-05,
+      "loss": 0.297,
+      "step": 66
+    },
+    {
+      "epoch": 1.6912,
+      "grad_norm": 1.6388436555862427,
+      "learning_rate": 8.282051282051283e-05,
+      "loss": 0.2893,
+      "step": 67
+    },
+    {
+      "epoch": 1.7168,
+      "grad_norm": 1.2009103298187256,
+      "learning_rate": 8.256410256410256e-05,
+      "loss": 0.2061,
+      "step": 68
+    },
+    {
+      "epoch": 1.7424,
+      "grad_norm": 1.3011361360549927,
+      "learning_rate": 8.23076923076923e-05,
+      "loss": 0.1979,
+      "step": 69
+    },
+    {
+      "epoch": 1.768,
+      "grad_norm": 1.444062352180481,
+      "learning_rate": 8.205128205128205e-05,
+      "loss": 0.2701,
+      "step": 70
+    },
+    {
+      "epoch": 1.768,
+      "eval_loss": 0.19174103438854218,
+      "eval_runtime": 46.1067,
+      "eval_samples_per_second": 9.782,
+      "eval_steps_per_second": 0.174,
+      "step": 70
+    },
+    {
+      "epoch": 1.7936,
+      "grad_norm": 1.275272250175476,
+      "learning_rate": 8.179487179487179e-05,
+      "loss": 0.2422,
+      "step": 71
+    },
+    {
+      "epoch": 1.8192,
+      "grad_norm": 1.6378053426742554,
+      "learning_rate": 8.153846153846155e-05,
+      "loss": 0.2733,
+      "step": 72
+    },
+    {
+      "epoch": 1.8448,
+      "grad_norm": 1.6403453350067139,
+      "learning_rate": 8.128205128205129e-05,
+      "loss": 0.3505,
+      "step": 73
+    },
+    {
+      "epoch": 1.8704,
+      "grad_norm": 1.606053113937378,
+      "learning_rate": 8.102564102564103e-05,
+      "loss": 0.2992,
+      "step": 74
+    },
+    {
+      "epoch": 1.896,
+      "grad_norm": 1.493260383605957,
+      "learning_rate": 8.076923076923078e-05,
+      "loss": 0.3058,
+      "step": 75
+    },
+    {
+      "epoch": 1.9216,
+      "grad_norm": 1.1610751152038574,
+      "learning_rate": 8.051282051282052e-05,
+      "loss": 0.2328,
+      "step": 76
+    },
+    {
+      "epoch": 1.9472,
+      "grad_norm": 1.2894799709320068,
+      "learning_rate": 8.025641025641026e-05,
+      "loss": 0.3086,
+      "step": 77
+    },
+    {
+      "epoch": 1.9727999999999999,
+      "grad_norm": 1.655768871307373,
+      "learning_rate": 8e-05,
+      "loss": 0.2247,
+      "step": 78
+    },
+    {
+      "epoch": 1.9984,
+      "grad_norm": 1.2705315351486206,
+      "learning_rate": 7.974358974358975e-05,
+      "loss": 0.2229,
+      "step": 79
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.3245500326156616,
+      "learning_rate": 7.948717948717948e-05,
+      "loss": 0.0082,
+      "step": 80
+    },
+    {
+      "epoch": 2.0,
+      "eval_loss": 0.18327006697654724,
+      "eval_runtime": 46.2169,
+      "eval_samples_per_second": 9.758,
+      "eval_steps_per_second": 0.173,
+      "step": 80
+    },
+    {
+      "epoch": 2.0256,
+      "grad_norm": 1.2532856464385986,
+      "learning_rate": 7.923076923076924e-05,
+      "loss": 0.1692,
+      "step": 81
+    },
+    {
+      "epoch": 2.0512,
+      "grad_norm": 1.3299051523208618,
+      "learning_rate": 7.897435897435898e-05,
+      "loss": 0.2139,
+      "step": 82
+    },
+    {
+      "epoch": 2.0768,
+      "grad_norm": 1.3725135326385498,
+      "learning_rate": 7.871794871794872e-05,
+      "loss": 0.2156,
+      "step": 83
+    },
+    {
+      "epoch": 2.1024,
+      "grad_norm": 1.0960525274276733,
+      "learning_rate": 7.846153846153847e-05,
+      "loss": 0.2208,
+      "step": 84
+    },
+    {
+      "epoch": 2.128,
+      "grad_norm": 1.8501545190811157,
+      "learning_rate": 7.820512820512821e-05,
+      "loss": 0.1551,
+      "step": 85
+    },
+    {
+      "epoch": 2.1536,
+      "grad_norm": 1.0871448516845703,
+      "learning_rate": 7.794871794871795e-05,
+      "loss": 0.1282,
+      "step": 86
+    },
+    {
+      "epoch": 2.1792,
+      "grad_norm": 1.4489119052886963,
+      "learning_rate": 7.76923076923077e-05,
+      "loss": 0.2375,
+      "step": 87
+    },
+    {
+      "epoch": 2.2048,
+      "grad_norm": 1.5719850063323975,
+      "learning_rate": 7.743589743589744e-05,
+      "loss": 0.2503,
+      "step": 88
+    },
+    {
+      "epoch": 2.2304,
+      "grad_norm": 2.2236907482147217,
+      "learning_rate": 7.717948717948718e-05,
+      "loss": 0.2384,
+      "step": 89
+    },
+    {
+      "epoch": 2.2560000000000002,
+      "grad_norm": 1.2721396684646606,
+      "learning_rate": 7.692307692307693e-05,
+      "loss": 0.1752,
+      "step": 90
+    },
+    {
+      "epoch": 2.2560000000000002,
+      "eval_loss": 0.19305120408535004,
+      "eval_runtime": 46.1652,
+      "eval_samples_per_second": 9.769,
+      "eval_steps_per_second": 0.173,
+      "step": 90
+    },
+    {
+      "epoch": 2.2816,
+      "grad_norm": 0.9860581159591675,
+      "learning_rate": 7.666666666666667e-05,
+      "loss": 0.138,
+      "step": 91
+    },
+    {
+      "epoch": 2.3072,
+      "grad_norm": 1.7323637008666992,
+      "learning_rate": 7.641025641025641e-05,
+      "loss": 0.144,
+      "step": 92
+    },
+    {
+      "epoch": 2.3327999999999998,
+      "grad_norm": 1.7598427534103394,
+      "learning_rate": 7.615384615384616e-05,
+      "loss": 0.1451,
+      "step": 93
+    },
+    {
+      "epoch": 2.3584,
+      "grad_norm": 1.6885058879852295,
+      "learning_rate": 7.58974358974359e-05,
+      "loss": 0.1625,
+      "step": 94
+    },
+    {
+      "epoch": 2.384,
+      "grad_norm": 1.3391759395599365,
+      "learning_rate": 7.564102564102564e-05,
+      "loss": 0.2039,
+      "step": 95
+    },
+    {
+      "epoch": 2.4096,
+      "grad_norm": 3.5374937057495117,
+      "learning_rate": 7.538461538461539e-05,
+      "loss": 0.2374,
+      "step": 96
+    },
+    {
+      "epoch": 2.4352,
+      "grad_norm": 1.5867712497711182,
+      "learning_rate": 7.512820512820513e-05,
+      "loss": 0.1915,
+      "step": 97
+    },
+    {
+      "epoch": 2.4608,
+      "grad_norm": 1.6900838613510132,
+      "learning_rate": 7.487179487179487e-05,
+      "loss": 0.1946,
+      "step": 98
+    },
+    {
+      "epoch": 2.4864,
+      "grad_norm": 1.116487741470337,
+      "learning_rate": 7.461538461538462e-05,
+      "loss": 0.1056,
+      "step": 99
+    },
+    {
+      "epoch": 2.512,
+      "grad_norm": 1.9830695390701294,
+      "learning_rate": 7.435897435897436e-05,
+      "loss": 0.2071,
+      "step": 100
+    },
+    {
+      "epoch": 2.512,
+      "eval_loss": 0.18597476184368134,
+      "eval_runtime": 46.0769,
+      "eval_samples_per_second": 9.788,
+      "eval_steps_per_second": 0.174,
+      "step": 100
+    },
+    {
+      "epoch": 2.5376,
+      "grad_norm": 1.5822725296020508,
+      "learning_rate": 7.410256410256412e-05,
+      "loss": 0.1704,
+      "step": 101
+    },
+    {
+      "epoch": 2.5632,
+      "grad_norm": 2.2399463653564453,
+      "learning_rate": 7.384615384615386e-05,
+      "loss": 0.118,
+      "step": 102
+    },
+    {
+      "epoch": 2.5888,
+      "grad_norm": 1.2649646997451782,
+      "learning_rate": 7.35897435897436e-05,
+      "loss": 0.1192,
+      "step": 103
+    },
+    {
+      "epoch": 2.6144,
+      "grad_norm": 3.1921226978302,
+      "learning_rate": 7.333333333333333e-05,
+      "loss": 0.186,
+      "step": 104
+    },
+    {
+      "epoch": 2.64,
+      "grad_norm": 2.00970196723938,
+      "learning_rate": 7.307692307692307e-05,
+      "loss": 0.1235,
+      "step": 105
+    },
+    {
+      "epoch": 2.6656,
+      "grad_norm": 3.3171231746673584,
+      "learning_rate": 7.282051282051282e-05,
+      "loss": 0.164,
+      "step": 106
+    },
+    {
+      "epoch": 2.6912000000000003,
+      "grad_norm": 3.0452377796173096,
+      "learning_rate": 7.256410256410256e-05,
+      "loss": 0.2077,
+      "step": 107
+    },
+    {
+      "epoch": 2.7168,
+      "grad_norm": 1.9057981967926025,
+      "learning_rate": 7.23076923076923e-05,
+      "loss": 0.2009,
+      "step": 108
+    },
+    {
+      "epoch": 2.7424,
+      "grad_norm": 1.4594613313674927,
+      "learning_rate": 7.205128205128205e-05,
+      "loss": 0.1402,
+      "step": 109
+    },
+    {
+      "epoch": 2.768,
+      "grad_norm": 4.038388252258301,
+      "learning_rate": 7.17948717948718e-05,
+      "loss": 0.2379,
+      "step": 110
+    },
+    {
+      "epoch": 2.768,
+      "eval_loss": 0.1910053789615631,
+      "eval_runtime": 46.105,
+      "eval_samples_per_second": 9.782,
+      "eval_steps_per_second": 0.174,
+      "step": 110
+    },
+    {
+      "epoch": 2.7936,
+      "grad_norm": 1.4920284748077393,
+      "learning_rate": 7.153846153846155e-05,
+      "loss": 0.1924,
+      "step": 111
+    },
+    {
+      "epoch": 2.8192,
+      "grad_norm": 2.1072158813476562,
+      "learning_rate": 7.128205128205129e-05,
+      "loss": 0.2131,
+      "step": 112
+    },
+    {
+      "epoch": 2.8448,
+      "grad_norm": 2.7636947631835938,
+      "learning_rate": 7.102564102564103e-05,
+      "loss": 0.1955,
+      "step": 113
+    },
+    {
+      "epoch": 2.8704,
+      "grad_norm": 2.5713424682617188,
+      "learning_rate": 7.076923076923078e-05,
+      "loss": 0.2524,
+      "step": 114
+    },
+    {
+      "epoch": 2.896,
+      "grad_norm": 1.9095423221588135,
+      "learning_rate": 7.051282051282052e-05,
+      "loss": 0.1939,
+      "step": 115
+    },
+    {
+      "epoch": 2.9215999999999998,
+      "grad_norm": 2.3303332328796387,
+      "learning_rate": 7.025641025641025e-05,
+      "loss": 0.1829,
+      "step": 116
+    },
+    {
+      "epoch": 2.9472,
+      "grad_norm": 1.6265172958374023,
+      "learning_rate": 7e-05,
+      "loss": 0.1818,
+      "step": 117
+    },
+    {
+      "epoch": 2.9728,
+      "grad_norm": 1.802085518836975,
+      "learning_rate": 6.974358974358974e-05,
+      "loss": 0.1759,
+      "step": 118
+    },
+    {
+      "epoch": 2.9984,
+      "grad_norm": 2.0247466564178467,
+      "learning_rate": 6.94871794871795e-05,
+      "loss": 0.1263,
+      "step": 119
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 0.0567646324634552,
+      "learning_rate": 6.923076923076924e-05,
+      "loss": 0.0012,
+      "step": 120
+    },
+    {
+      "epoch": 3.0,
+      "eval_loss": 0.1671789288520813,
+      "eval_runtime": 46.0792,
+      "eval_samples_per_second": 9.788,
+      "eval_steps_per_second": 0.174,
+      "step": 120
+    },
+    {
+      "epoch": 3.0256,
+      "grad_norm": 1.1815694570541382,
+      "learning_rate": 6.897435897435898e-05,
+      "loss": 0.0875,
+      "step": 121
+    },
+    {
+      "epoch": 3.0512,
+      "grad_norm": 2.159958839416504,
+      "learning_rate": 6.871794871794872e-05,
+      "loss": 0.2338,
+      "step": 122
+    },
+    {
+      "epoch": 3.0768,
+      "grad_norm": 0.8782948851585388,
+      "learning_rate": 6.846153846153847e-05,
+      "loss": 0.0931,
+      "step": 123
+    },
+    {
+      "epoch": 3.1024,
+      "grad_norm": 1.0242409706115723,
+      "learning_rate": 6.820512820512821e-05,
+      "loss": 0.089,
+      "step": 124
+    },
+    {
+      "epoch": 3.128,
+      "grad_norm": 1.9822003841400146,
+      "learning_rate": 6.794871794871795e-05,
+      "loss": 0.2043,
+      "step": 125
+    },
+    {
+      "epoch": 3.1536,
+      "grad_norm": 1.6593323945999146,
+      "learning_rate": 6.76923076923077e-05,
+      "loss": 0.0855,
+      "step": 126
+    },
+    {
+      "epoch": 3.1792,
+      "grad_norm": 1.2175663709640503,
+      "learning_rate": 6.743589743589744e-05,
+      "loss": 0.0522,
+      "step": 127
+    },
+    {
+      "epoch": 3.2048,
+      "grad_norm": 1.3775852918624878,
+      "learning_rate": 6.717948717948718e-05,
+      "loss": 0.1653,
+      "step": 128
+    },
+    {
+      "epoch": 3.2304,
+      "grad_norm": 1.052992820739746,
+      "learning_rate": 6.692307692307693e-05,
+      "loss": 0.0829,
+      "step": 129
+    },
+    {
+      "epoch": 3.2560000000000002,
+      "grad_norm": 1.744211196899414,
+      "learning_rate": 6.666666666666667e-05,
+      "loss": 0.1709,
+      "step": 130
+    },
+    {
+      "epoch": 3.2560000000000002,
+      "eval_loss": 0.1594507247209549,
+      "eval_runtime": 46.0782,
+      "eval_samples_per_second": 9.788,
+      "eval_steps_per_second": 0.174,
+      "step": 130
+    },
+    {
+      "epoch": 3.2816,
+      "grad_norm": 3.3828606605529785,
+      "learning_rate": 6.641025641025641e-05,
+      "loss": 0.1371,
+      "step": 131
+    },
+    {
+      "epoch": 3.3072,
+      "grad_norm": 1.9074350595474243,
+      "learning_rate": 6.615384615384616e-05,
+      "loss": 0.1507,
+      "step": 132
+    },
+    {
+      "epoch": 3.3327999999999998,
+      "grad_norm": 2.2225406169891357,
+      "learning_rate": 6.58974358974359e-05,
+      "loss": 0.1531,
+      "step": 133
+    },
+    {
+      "epoch": 3.3584,
+      "grad_norm": 2.2324278354644775,
+      "learning_rate": 6.564102564102564e-05,
+      "loss": 0.1032,
+      "step": 134
+    },
+    {
+      "epoch": 3.384,
+      "grad_norm": 1.0784149169921875,
+      "learning_rate": 6.538461538461539e-05,
+      "loss": 0.1118,
+      "step": 135
+    },
+    {
+      "epoch": 3.4096,
+      "grad_norm": 1.7640796899795532,
+      "learning_rate": 6.512820512820513e-05,
+      "loss": 0.128,
+      "step": 136
+    },
+    {
+      "epoch": 3.4352,
+      "grad_norm": 1.2802343368530273,
+      "learning_rate": 6.487179487179487e-05,
+      "loss": 0.1023,
+      "step": 137
+    },
+    {
+      "epoch": 3.4608,
+      "grad_norm": 2.069298028945923,
+      "learning_rate": 6.461538461538462e-05,
+      "loss": 0.0983,
+      "step": 138
+    },
+    {
+      "epoch": 3.4864,
+      "grad_norm": 1.988155722618103,
+      "learning_rate": 6.435897435897437e-05,
+      "loss": 0.1016,
+      "step": 139
+    },
+    {
+      "epoch": 3.512,
+      "grad_norm": 2.035534620285034,
+      "learning_rate": 6.410256410256412e-05,
+      "loss": 0.1486,
+      "step": 140
+    },
+    {
+      "epoch": 3.512,
+      "eval_loss": 0.15938659012317657,
+      "eval_runtime": 46.1666,
+      "eval_samples_per_second": 9.769,
+      "eval_steps_per_second": 0.173,
+      "step": 140
+    },
+    {
+      "epoch": 3.5376,
+      "grad_norm": 1.5117779970169067,
+      "learning_rate": 6.384615384615385e-05,
+      "loss": 0.1346,
+      "step": 141
+    },
+    {
+      "epoch": 3.5632,
+      "grad_norm": 3.2372398376464844,
+      "learning_rate": 6.358974358974359e-05,
+      "loss": 0.0628,
+      "step": 142
+    },
+    {
+      "epoch": 3.5888,
+      "grad_norm": 1.734154462814331,
+      "learning_rate": 6.333333333333333e-05,
+      "loss": 0.1006,
+      "step": 143
+    },
+    {
+      "epoch": 3.6144,
+      "grad_norm": 1.8823364973068237,
+      "learning_rate": 6.307692307692308e-05,
+      "loss": 0.1361,
+      "step": 144
+    },
+    {
+      "epoch": 3.64,
+      "grad_norm": 1.515404462814331,
+      "learning_rate": 6.282051282051282e-05,
+      "loss": 0.0794,
+      "step": 145
+    },
+    {
+      "epoch": 3.6656,
+      "grad_norm": 1.3684951066970825,
+      "learning_rate": 6.256410256410256e-05,
+      "loss": 0.0917,
+      "step": 146
+    },
+    {
+      "epoch": 3.6912000000000003,
+      "grad_norm": 1.0272871255874634,
+      "learning_rate": 6.23076923076923e-05,
+      "loss": 0.0695,
+      "step": 147
+    },
+    {
+      "epoch": 3.7168,
+      "grad_norm": 2.0209293365478516,
+      "learning_rate": 6.205128205128206e-05,
+      "loss": 0.1134,
+      "step": 148
+    },
+    {
+      "epoch": 3.7424,
+      "grad_norm": 1.2262115478515625,
+      "learning_rate": 6.17948717948718e-05,
+      "loss": 0.0909,
+      "step": 149
+    },
+    {
+      "epoch": 3.768,
+      "grad_norm": 1.7901175022125244,
+      "learning_rate": 6.153846153846155e-05,
+      "loss": 0.0651,
+      "step": 150
+    },
+    {
+      "epoch": 3.768,
+      "eval_loss": 0.16204459965229034,
+      "eval_runtime": 46.1014,
+      "eval_samples_per_second": 9.783,
+      "eval_steps_per_second": 0.174,
+      "step": 150
+    },
+    {
+      "epoch": 3.7936,
+      "grad_norm": 1.6963258981704712,
+      "learning_rate": 6.128205128205129e-05,
+      "loss": 0.1277,
+      "step": 151
+    },
+    {
+      "epoch": 3.8192,
+      "grad_norm": 3.0077695846557617,
+      "learning_rate": 6.1025641025641035e-05,
+      "loss": 0.1935,
+      "step": 152
+    },
+    {
+      "epoch": 3.8448,
+      "grad_norm": 2.1329452991485596,
+      "learning_rate": 6.0769230769230765e-05,
+      "loss": 0.1412,
+      "step": 153
+    },
+    {
+      "epoch": 3.8704,
+      "grad_norm": 3.3125712871551514,
+      "learning_rate": 6.0512820512820515e-05,
+      "loss": 0.18,
+      "step": 154
+    },
+    {
+      "epoch": 3.896,
+      "grad_norm": 1.4981236457824707,
+      "learning_rate": 6.025641025641026e-05,
+      "loss": 0.0833,
+      "step": 155
+    },
+    {
+      "epoch": 3.9215999999999998,
+      "grad_norm": 1.3236380815505981,
+      "learning_rate": 6e-05,
+      "loss": 0.1045,
+      "step": 156
+    },
+    {
+      "epoch": 3.9472,
+      "grad_norm": 2.1060631275177,
+      "learning_rate": 5.9743589743589745e-05,
+      "loss": 0.0907,
+      "step": 157
+    },
+    {
+      "epoch": 3.9728,
+      "grad_norm": 0.9833385348320007,
+      "learning_rate": 5.948717948717949e-05,
+      "loss": 0.0502,
+      "step": 158
+    },
+    {
+      "epoch": 3.9984,
+      "grad_norm": 1.5143266916275024,
+      "learning_rate": 5.923076923076923e-05,
+      "loss": 0.0822,
+      "step": 159
+    },
+    {
+      "epoch": 4.0,
+      "grad_norm": 0.31218358874320984,
+      "learning_rate": 5.897435897435898e-05,
+      "loss": 0.0043,
+      "step": 160
+    },
+    {
+      "epoch": 4.0,
+      "eval_loss": 0.1569308638572693,
+      "eval_runtime": 46.0425,
+      "eval_samples_per_second": 9.795,
+      "eval_steps_per_second": 0.174,
+      "step": 160
+    },
+    {
+      "epoch": 4.0256,
+      "grad_norm": 1.2191623449325562,
+      "learning_rate": 5.8717948717948725e-05,
+      "loss": 0.0555,
+      "step": 161
+    },
+    {
+      "epoch": 4.0512,
+      "grad_norm": 1.833415150642395,
+      "learning_rate": 5.846153846153847e-05,
+      "loss": 0.069,
+      "step": 162
+    },
+    {
+      "epoch": 4.0768,
+      "grad_norm": 2.106884717941284,
+      "learning_rate": 5.820512820512821e-05,
+      "loss": 0.1464,
+      "step": 163
+    },
+    {
+      "epoch": 4.1024,
+      "grad_norm": 1.7515654563903809,
+      "learning_rate": 5.7948717948717954e-05,
+      "loss": 0.0675,
+      "step": 164
+    },
+    {
+      "epoch": 4.128,
+      "grad_norm": 1.447368860244751,
+      "learning_rate": 5.769230769230769e-05,
+      "loss": 0.0677,
+      "step": 165
+    },
+    {
+      "epoch": 4.1536,
+      "grad_norm": 2.6953341960906982,
+      "learning_rate": 5.7435897435897434e-05,
+      "loss": 0.099,
+      "step": 166
+    },
+    {
+      "epoch": 4.1792,
+      "grad_norm": 1.5634942054748535,
+      "learning_rate": 5.717948717948718e-05,
+      "loss": 0.0596,
+      "step": 167
+    },
+    {
+      "epoch": 4.2048,
+      "grad_norm": 1.276308298110962,
+      "learning_rate": 5.692307692307692e-05,
+      "loss": 0.0661,
+      "step": 168
+    },
+    {
+      "epoch": 4.2304,
+      "grad_norm": 1.228543996810913,
+      "learning_rate": 5.666666666666667e-05,
+      "loss": 0.0808,
+      "step": 169
+    },
+    {
+      "epoch": 4.256,
+      "grad_norm": 1.6373244524002075,
+      "learning_rate": 5.6410256410256414e-05,
+      "loss": 0.0465,
+      "step": 170
+    },
+    {
+      "epoch": 4.256,
+      "eval_loss": 0.15826600790023804,
+      "eval_runtime": 46.1301,
+      "eval_samples_per_second": 9.777,
+      "eval_steps_per_second": 0.173,
+      "step": 170
+    },
+    {
+      "epoch": 4.2816,
+      "grad_norm": 1.3432047367095947,
+      "learning_rate": 5.615384615384616e-05,
+      "loss": 0.0755,
+      "step": 171
+    },
+    {
+      "epoch": 4.3072,
+      "grad_norm": 1.2605931758880615,
+      "learning_rate": 5.58974358974359e-05,
+      "loss": 0.0611,
+      "step": 172
+    },
+    {
+      "epoch": 4.3328,
+      "grad_norm": 2.181140422821045,
+      "learning_rate": 5.5641025641025644e-05,
+      "loss": 0.0615,
+      "step": 173
+    },
+    {
+      "epoch": 4.3584,
+      "grad_norm": 1.6840100288391113,
+      "learning_rate": 5.538461538461539e-05,
+      "loss": 0.0814,
+      "step": 174
+    },
+    {
+      "epoch": 4.384,
+      "grad_norm": 1.543228268623352,
+      "learning_rate": 5.512820512820514e-05,
+      "loss": 0.1456,
+      "step": 175
+    },
+    {
+      "epoch": 4.4096,
+      "grad_norm": 1.5495244264602661,
+      "learning_rate": 5.487179487179488e-05,
+      "loss": 0.07,
+      "step": 176
+    },
+    {
+      "epoch": 4.4352,
+      "grad_norm": 2.249027729034424,
+      "learning_rate": 5.461538461538461e-05,
+      "loss": 0.0488,
+      "step": 177
+    },
+    {
+      "epoch": 4.4608,
+      "grad_norm": 3.975053310394287,
+      "learning_rate": 5.435897435897436e-05,
+      "loss": 0.1022,
+      "step": 178
+    },
+    {
+      "epoch": 4.4864,
+      "grad_norm": 1.0509039163589478,
+      "learning_rate": 5.41025641025641e-05,
+      "loss": 0.0395,
+      "step": 179
+    },
+    {
+      "epoch": 4.5120000000000005,
+      "grad_norm": 0.7361845374107361,
+      "learning_rate": 5.384615384615385e-05,
+      "loss": 0.0288,
+      "step": 180
+    },
+    {
+      "epoch": 4.5120000000000005,
+      "eval_loss": 0.15827181935310364,
+      "eval_runtime": 46.0746,
+      "eval_samples_per_second": 9.788,
+      "eval_steps_per_second": 0.174,
+      "step": 180
+    },
+    {
+      "epoch": 4.5376,
+      "grad_norm": 1.3167545795440674,
+      "learning_rate": 5.358974358974359e-05,
+      "loss": 0.0413,
+      "step": 181
+    },
+    {
+      "epoch": 4.5632,
+      "grad_norm": 1.1776018142700195,
+      "learning_rate": 5.333333333333333e-05,
+      "loss": 0.0289,
+      "step": 182
+    },
+    {
+      "epoch": 4.5888,
+      "grad_norm": 1.1779769659042358,
+      "learning_rate": 5.3076923076923076e-05,
+      "loss": 0.0451,
+      "step": 183
+    },
+    {
+      "epoch": 4.6144,
+      "grad_norm": 1.191728949546814,
+      "learning_rate": 5.2820512820512826e-05,
+      "loss": 0.0402,
+      "step": 184
+    },
+    {
+      "epoch": 4.64,
+      "grad_norm": 1.8995720148086548,
+      "learning_rate": 5.256410256410257e-05,
+      "loss": 0.0407,
+      "step": 185
+    },
+    {
+      "epoch": 4.6655999999999995,
+      "grad_norm": 2.597722053527832,
+      "learning_rate": 5.230769230769231e-05,
+      "loss": 0.059,
+      "step": 186
+    },
+    {
+      "epoch": 4.6912,
+      "grad_norm": 2.1279001235961914,
+      "learning_rate": 5.2051282051282056e-05,
+      "loss": 0.07,
+      "step": 187
+    },
+    {
+      "epoch": 4.7168,
+      "grad_norm": 2.0430350303649902,
+      "learning_rate": 5.17948717948718e-05,
+      "loss": 0.0555,
+      "step": 188
+    },
+    {
+      "epoch": 4.7424,
+      "grad_norm": 2.5439860820770264,
+      "learning_rate": 5.1538461538461536e-05,
+      "loss": 0.0994,
+      "step": 189
+    },
+    {
+      "epoch": 4.768,
+      "grad_norm": 2.0388362407684326,
+      "learning_rate": 5.128205128205128e-05,
+      "loss": 0.0665,
+      "step": 190
+    },
+    {
+      "epoch": 4.768,
+      "eval_loss": 0.15733125805854797,
+      "eval_runtime": 46.0902,
+      "eval_samples_per_second": 9.785,
+      "eval_steps_per_second": 0.174,
+      "step": 190
+    },
+    {
+      "epoch": 4.7936,
+      "grad_norm": 2.7209362983703613,
+      "learning_rate": 5.102564102564102e-05,
+      "loss": 0.0489,
+      "step": 191
+    },
+    {
+      "epoch": 4.8192,
+      "grad_norm": 1.3868178129196167,
+      "learning_rate": 5.0769230769230766e-05,
+      "loss": 0.0666,
+      "step": 192
+    },
+    {
+      "epoch": 4.8448,
+      "grad_norm": 0.9440222978591919,
+      "learning_rate": 5.0512820512820516e-05,
+      "loss": 0.0259,
+      "step": 193
+    },
+    {
+      "epoch": 4.8704,
+      "grad_norm": 1.782159447669983,
+      "learning_rate": 5.025641025641026e-05,
+      "loss": 0.0532,
+      "step": 194
+    },
+    {
+      "epoch": 4.896,
+      "grad_norm": 2.1891701221466064,
+      "learning_rate": 5e-05,
+      "loss": 0.0369,
+      "step": 195
+    },
+    {
+      "epoch": 4.9216,
+      "grad_norm": 0.9858984351158142,
+      "learning_rate": 4.9743589743589746e-05,
+      "loss": 0.0248,
+      "step": 196
+    },
+    {
+      "epoch": 4.9472000000000005,
+      "grad_norm": 1.814439296722412,
+      "learning_rate": 4.948717948717949e-05,
+      "loss": 0.0494,
+      "step": 197
+    },
+    {
+      "epoch": 4.9728,
+      "grad_norm": 2.066479206085205,
+      "learning_rate": 4.923076923076924e-05,
+      "loss": 0.0626,
+      "step": 198
+    },
+    {
+      "epoch": 4.9984,
+      "grad_norm": 0.9697967767715454,
+      "learning_rate": 4.8974358974358975e-05,
+      "loss": 0.0416,
+      "step": 199
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 0.023776618763804436,
+      "learning_rate": 4.871794871794872e-05,
+      "loss": 0.0002,
+      "step": 200
+    },
+    {
+      "epoch": 5.0,
+      "eval_loss": 0.1582992523908615,
+      "eval_runtime": 46.0662,
+      "eval_samples_per_second": 9.79,
+      "eval_steps_per_second": 0.174,
+      "step": 200
+    },
+    {
+      "epoch": 5.0256,
+      "grad_norm": 1.4514086246490479,
+      "learning_rate": 4.846153846153846e-05,
+      "loss": 0.0326,
+      "step": 201
+    },
+    {
+      "epoch": 5.0512,
+      "grad_norm": 1.0607144832611084,
+      "learning_rate": 4.8205128205128205e-05,
+      "loss": 0.0347,
+      "step": 202
+    },
+    {
+      "epoch": 5.0768,
+      "grad_norm": 1.9190572500228882,
+      "learning_rate": 4.7948717948717955e-05,
+      "loss": 0.028,
+      "step": 203
+    },
+    {
+      "epoch": 5.1024,
+      "grad_norm": 1.476803183555603,
+      "learning_rate": 4.76923076923077e-05,
+      "loss": 0.0287,
+      "step": 204
+    },
+    {
+      "epoch": 5.128,
+      "grad_norm": 1.7281403541564941,
+      "learning_rate": 4.7435897435897435e-05,
+      "loss": 0.0422,
+      "step": 205
+    },
+    {
+      "epoch": 5.1536,
+      "grad_norm": 1.5540275573730469,
+      "learning_rate": 4.717948717948718e-05,
+      "loss": 0.0219,
+      "step": 206
+    },
+    {
+      "epoch": 5.1792,
+      "grad_norm": 0.6488132476806641,
+      "learning_rate": 4.692307692307693e-05,
+      "loss": 0.0177,
+      "step": 207
+    },
+    {
+      "epoch": 5.2048,
+      "grad_norm": 0.6170593500137329,
+      "learning_rate": 4.666666666666667e-05,
+      "loss": 0.0156,
+      "step": 208
+    },
+    {
+      "epoch": 5.2304,
+      "grad_norm": 1.2647796869277954,
+      "learning_rate": 4.6410256410256415e-05,
+      "loss": 0.0204,
+      "step": 209
+    },
+    {
+      "epoch": 5.256,
+      "grad_norm": 2.475916624069214,
+      "learning_rate": 4.615384615384616e-05,
+      "loss": 0.0306,
+      "step": 210
+    },
+    {
+      "epoch": 5.256,
+      "eval_loss": 0.16460593044757843,
+      "eval_runtime": 46.1754,
+      "eval_samples_per_second": 9.767,
+      "eval_steps_per_second": 0.173,
+      "step": 210
+    },
+    {
+      "epoch": 5.2816,
+      "grad_norm": 2.594714879989624,
+      "learning_rate": 4.5897435897435895e-05,
+      "loss": 0.0271,
+      "step": 211
+    },
+    {
+      "epoch": 5.3072,
+      "grad_norm": 1.516627550125122,
+      "learning_rate": 4.5641025641025645e-05,
+      "loss": 0.0443,
+      "step": 212
+    },
+    {
+      "epoch": 5.3328,
+      "grad_norm": 0.8540337681770325,
+      "learning_rate": 4.538461538461539e-05,
+      "loss": 0.018,
+      "step": 213
+    },
+    {
+      "epoch": 5.3584,
+      "grad_norm": 0.8031948208808899,
+      "learning_rate": 4.512820512820513e-05,
+      "loss": 0.0144,
+      "step": 214
+    },
+    {
+      "epoch": 5.384,
+      "grad_norm": 1.550010085105896,
+      "learning_rate": 4.4871794871794874e-05,
+      "loss": 0.0256,
+      "step": 215
+    },
+    {
+      "epoch": 5.4096,
+      "grad_norm": 0.9779123067855835,
+      "learning_rate": 4.461538461538462e-05,
+      "loss": 0.0217,
+      "step": 216
+    },
+    {
+      "epoch": 5.4352,
+      "grad_norm": 1.3814575672149658,
+      "learning_rate": 4.435897435897436e-05,
+      "loss": 0.0509,
+      "step": 217
+    },
+    {
+      "epoch": 5.4608,
+      "grad_norm": 0.6444527506828308,
+      "learning_rate": 4.4102564102564104e-05,
+      "loss": 0.0128,
+      "step": 218
+    },
+    {
+      "epoch": 5.4864,
+      "grad_norm": 2.447659730911255,
+      "learning_rate": 4.384615384615385e-05,
+      "loss": 0.0408,
+      "step": 219
+    },
+    {
+      "epoch": 5.5120000000000005,
+      "grad_norm": 1.4684360027313232,
+      "learning_rate": 4.358974358974359e-05,
+      "loss": 0.0282,
+      "step": 220
+    },
+    {
+      "epoch": 5.5120000000000005,
+      "eval_loss": 0.17198146879673004,
+      "eval_runtime": 46.1144,
+      "eval_samples_per_second": 9.78,
+      "eval_steps_per_second": 0.173,
+      "step": 220
+    },
+    {
+      "epoch": 5.5376,
+      "grad_norm": 2.4431331157684326,
+      "learning_rate": 4.3333333333333334e-05,
+      "loss": 0.0577,
+      "step": 221
+    },
+    {
+      "epoch": 5.5632,
+      "grad_norm": 2.405123472213745,
+      "learning_rate": 4.3076923076923084e-05,
+      "loss": 0.0601,
+      "step": 222
+    },
+    {
+      "epoch": 5.5888,
+      "grad_norm": 1.342482089996338,
+      "learning_rate": 4.282051282051282e-05,
+      "loss": 0.0211,
+      "step": 223
+    },
+    {
+      "epoch": 5.6144,
+      "grad_norm": 0.6261419057846069,
+      "learning_rate": 4.2564102564102564e-05,
+      "loss": 0.0104,
+      "step": 224
+    },
+    {
+      "epoch": 5.64,
+      "grad_norm": 1.443263053894043,
+      "learning_rate": 4.230769230769231e-05,
+      "loss": 0.0306,
+      "step": 225
+    },
+    {
+      "epoch": 5.6655999999999995,
+      "grad_norm": 3.3630077838897705,
+      "learning_rate": 4.205128205128206e-05,
+      "loss": 0.0346,
+      "step": 226
+    },
+    {
+      "epoch": 5.6912,
+      "grad_norm": 1.3521815538406372,
+      "learning_rate": 4.17948717948718e-05,
+      "loss": 0.0186,
+      "step": 227
+    },
+    {
+      "epoch": 5.7168,
+      "grad_norm": 0.9824311137199402,
+      "learning_rate": 4.1538461538461544e-05,
+      "loss": 0.0094,
+      "step": 228
+    },
+    {
+      "epoch": 5.7424,
+      "grad_norm": 0.9648368954658508,
+      "learning_rate": 4.128205128205128e-05,
+      "loss": 0.025,
+      "step": 229
+    },
+    {
+      "epoch": 5.768,
+      "grad_norm": 1.2381614446640015,
+      "learning_rate": 4.1025641025641023e-05,
+      "loss": 0.0225,
+      "step": 230
+    },
+    {
+      "epoch": 5.768,
+      "eval_loss": 0.16620177030563354,
+      "eval_runtime": 46.0894,
+      "eval_samples_per_second": 9.785,
+      "eval_steps_per_second": 0.174,
+      "step": 230
+    },
+    {
+      "epoch": 5.7936,
+      "grad_norm": 1.3803844451904297,
+      "learning_rate": 4.0769230769230773e-05,
+      "loss": 0.0277,
+      "step": 231
+    },
+    {
+      "epoch": 5.8192,
+      "grad_norm": 2.083890914916992,
+      "learning_rate": 4.051282051282052e-05,
+      "loss": 0.0315,
+      "step": 232
+    },
+    {
+      "epoch": 5.8448,
+      "grad_norm": 2.827819347381592,
+      "learning_rate": 4.025641025641026e-05,
+      "loss": 0.0809,
+      "step": 233
+    },
+    {
+      "epoch": 5.8704,
+      "grad_norm": 1.4013893604278564,
+      "learning_rate": 4e-05,
+      "loss": 0.0731,
+      "step": 234
+    },
+    {
+      "epoch": 5.896,
+      "grad_norm": 1.5475901365280151,
+      "learning_rate": 3.974358974358974e-05,
+      "loss": 0.0309,
+      "step": 235
+    },
+    {
+      "epoch": 5.9216,
+      "grad_norm": 2.6997406482696533,
+      "learning_rate": 3.948717948717949e-05,
+      "loss": 0.061,
+      "step": 236
+    },
+    {
+      "epoch": 5.9472000000000005,
+      "grad_norm": 0.8604904413223267,
+      "learning_rate": 3.923076923076923e-05,
+      "loss": 0.0142,
+      "step": 237
+    },
+    {
+      "epoch": 5.9728,
+      "grad_norm": 1.4761265516281128,
+      "learning_rate": 3.8974358974358976e-05,
+      "loss": 0.0182,
+      "step": 238
+    },
+    {
+      "epoch": 5.9984,
+      "grad_norm": 1.3573832511901855,
+      "learning_rate": 3.871794871794872e-05,
+      "loss": 0.0186,
+      "step": 239
+    },
+    {
+      "epoch": 6.0,
+      "grad_norm": 0.05791207030415535,
+      "learning_rate": 3.846153846153846e-05,
+      "loss": 0.0004,
+      "step": 240
+    },
+    {
+      "epoch": 6.0,
+      "eval_loss": 0.1659858524799347,
+      "eval_runtime": 46.0508,
+      "eval_samples_per_second": 9.794,
+      "eval_steps_per_second": 0.174,
+      "step": 240
+    },
+    {
+      "epoch": 6.0256,
+      "grad_norm": 0.8057020902633667,
+      "learning_rate": 3.8205128205128206e-05,
+      "loss": 0.0148,
+      "step": 241
+    },
+    {
+      "epoch": 6.0512,
+      "grad_norm": 1.3963549137115479,
+      "learning_rate": 3.794871794871795e-05,
+      "loss": 0.0311,
+      "step": 242
+    },
+    {
+      "epoch": 6.0768,
+      "grad_norm": 0.4920665919780731,
+      "learning_rate": 3.769230769230769e-05,
+      "loss": 0.007,
+      "step": 243
+    },
+    {
+      "epoch": 6.1024,
+      "grad_norm": 0.5270194411277771,
+      "learning_rate": 3.7435897435897436e-05,
+      "loss": 0.0112,
+      "step": 244
+    },
+    {
+      "epoch": 6.128,
+      "grad_norm": 1.5079636573791504,
+      "learning_rate": 3.717948717948718e-05,
+      "loss": 0.033,
+      "step": 245
+    },
+    {
+      "epoch": 6.1536,
+      "grad_norm": 0.7430498600006104,
+      "learning_rate": 3.692307692307693e-05,
+      "loss": 0.0088,
+      "step": 246
+    },
+    {
+      "epoch": 6.1792,
+      "grad_norm": 1.5670939683914185,
+      "learning_rate": 3.6666666666666666e-05,
+      "loss": 0.0111,
+      "step": 247
+    },
+    {
+      "epoch": 6.2048,
+      "grad_norm": 1.0994242429733276,
+      "learning_rate": 3.641025641025641e-05,
+      "loss": 0.0284,
+      "step": 248
+    },
+    {
+      "epoch": 6.2304,
+      "grad_norm": 1.008639931678772,
+      "learning_rate": 3.615384615384615e-05,
+      "loss": 0.0112,
+      "step": 249
+    },
+    {
+      "epoch": 6.256,
+      "grad_norm": 0.6727498769760132,
+      "learning_rate": 3.58974358974359e-05,
+      "loss": 0.01,
+      "step": 250
+    },
+    {
+      "epoch": 6.256,
+      "eval_loss": 0.16884523630142212,
+      "eval_runtime": 46.0532,
+      "eval_samples_per_second": 9.793,
+      "eval_steps_per_second": 0.174,
+      "step": 250
+    },
+    {
+      "epoch": 6.2816,
+      "grad_norm": 0.7288702726364136,
+      "learning_rate": 3.5641025641025646e-05,
+      "loss": 0.0095,
+      "step": 251
+    },
+    {
+      "epoch": 6.3072,
+      "grad_norm": 0.5252525806427002,
+      "learning_rate": 3.538461538461539e-05,
+      "loss": 0.0064,
+      "step": 252
+    },
+    {
+      "epoch": 6.3328,
+      "grad_norm": 0.30171850323677063,
+      "learning_rate": 3.5128205128205125e-05,
+      "loss": 0.0046,
+      "step": 253
+    },
+    {
+      "epoch": 6.3584,
+      "grad_norm": 0.8617231845855713,
+      "learning_rate": 3.487179487179487e-05,
+      "loss": 0.0103,
+      "step": 254
+    },
+    {
+      "epoch": 6.384,
+      "grad_norm": 0.37915322184562683,
+      "learning_rate": 3.461538461538462e-05,
+      "loss": 0.006,
+      "step": 255
+    },
+    {
+      "epoch": 6.4096,
+      "grad_norm": 0.2540806829929352,
+      "learning_rate": 3.435897435897436e-05,
+      "loss": 0.0044,
+      "step": 256
+    },
+    {
+      "epoch": 6.4352,
+      "grad_norm": 1.2370727062225342,
+      "learning_rate": 3.4102564102564105e-05,
+      "loss": 0.0643,
+      "step": 257
+    },
+    {
+      "epoch": 6.4608,
+      "grad_norm": 1.992367148399353,
+      "learning_rate": 3.384615384615385e-05,
+      "loss": 0.0203,
+      "step": 258
+    },
+    {
+      "epoch": 6.4864,
+      "grad_norm": 0.20070694386959076,
+      "learning_rate": 3.358974358974359e-05,
+      "loss": 0.003,
+      "step": 259
+    },
+    {
+      "epoch": 6.5120000000000005,
+      "grad_norm": 1.786927342414856,
+      "learning_rate": 3.3333333333333335e-05,
+      "loss": 0.0301,
+      "step": 260
+    },
+    {
+      "epoch": 6.5120000000000005,
+      "eval_loss": 0.18007855117321014,
+      "eval_runtime": 46.2331,
+      "eval_samples_per_second": 9.755,
+      "eval_steps_per_second": 0.173,
+      "step": 260
+    },
+    {
+      "epoch": 6.5376,
+      "grad_norm": 0.2549300789833069,
+      "learning_rate": 3.307692307692308e-05,
+      "loss": 0.0042,
+      "step": 261
+    },
+    {
+      "epoch": 6.5632,
+      "grad_norm": 0.47428637742996216,
+      "learning_rate": 3.282051282051282e-05,
+      "loss": 0.0033,
+      "step": 262
+    },
+    {
+      "epoch": 6.5888,
+      "grad_norm": 0.4179193079471588,
+      "learning_rate": 3.2564102564102565e-05,
+      "loss": 0.0056,
+      "step": 263
+    },
+    {
+      "epoch": 6.6144,
+      "grad_norm": 0.7487953901290894,
+      "learning_rate": 3.230769230769231e-05,
+      "loss": 0.0053,
+      "step": 264
+    },
+    {
+      "epoch": 6.64,
+      "grad_norm": 1.241440773010254,
+      "learning_rate": 3.205128205128206e-05,
+      "loss": 0.0092,
+      "step": 265
+    },
+    {
+      "epoch": 6.6655999999999995,
+      "grad_norm": 1.0357407331466675,
+      "learning_rate": 3.1794871794871795e-05,
+      "loss": 0.0127,
+      "step": 266
+    },
+    {
+      "epoch": 6.6912,
+      "grad_norm": 1.5997498035430908,
+      "learning_rate": 3.153846153846154e-05,
+      "loss": 0.0146,
+      "step": 267
+    },
+    {
+      "epoch": 6.7168,
+      "grad_norm": 0.2820545732975006,
+      "learning_rate": 3.128205128205128e-05,
+      "loss": 0.0032,
+      "step": 268
+    },
+    {
+      "epoch": 6.7424,
+      "grad_norm": 1.378670334815979,
+      "learning_rate": 3.102564102564103e-05,
+      "loss": 0.0073,
+      "step": 269
+    },
+    {
+      "epoch": 6.768,
+      "grad_norm": 0.5203121304512024,
+      "learning_rate": 3.0769230769230774e-05,
+      "loss": 0.0089,
+      "step": 270
+    },
+    {
+      "epoch": 6.768,
+      "eval_loss": 0.18411996960639954,
+      "eval_runtime": 46.0604,
+      "eval_samples_per_second": 9.791,
+      "eval_steps_per_second": 0.174,
+      "step": 270
+    },
+    {
+      "epoch": 6.7936,
+      "grad_norm": 0.44411182403564453,
+      "learning_rate": 3.0512820512820518e-05,
+      "loss": 0.003,
+      "step": 271
+    },
+    {
+      "epoch": 6.8192,
+      "grad_norm": 1.3348493576049805,
+      "learning_rate": 3.0256410256410257e-05,
+      "loss": 0.0166,
+      "step": 272
+    },
+    {
+      "epoch": 6.8448,
+      "grad_norm": 0.26802995800971985,
+      "learning_rate": 3e-05,
+      "loss": 0.0043,
+      "step": 273
+    },
+    {
+      "epoch": 6.8704,
+      "grad_norm": 0.3201054036617279,
+      "learning_rate": 2.9743589743589744e-05,
+      "loss": 0.003,
+      "step": 274
+    },
+    {
+      "epoch": 6.896,
+      "grad_norm": 1.0242953300476074,
+      "learning_rate": 2.948717948717949e-05,
+      "loss": 0.0116,
+      "step": 275
+    },
+    {
+      "epoch": 6.9216,
+      "grad_norm": 0.33335596323013306,
+      "learning_rate": 2.9230769230769234e-05,
+      "loss": 0.0043,
+      "step": 276
+    },
+    {
+      "epoch": 6.9472000000000005,
+      "grad_norm": 1.5013763904571533,
+      "learning_rate": 2.8974358974358977e-05,
+      "loss": 0.0179,
+      "step": 277
+    },
+    {
+      "epoch": 6.9728,
+      "grad_norm": 1.779761791229248,
+      "learning_rate": 2.8717948717948717e-05,
+      "loss": 0.0303,
+      "step": 278
+    },
+    {
+      "epoch": 6.9984,
+      "grad_norm": 0.758307933807373,
+      "learning_rate": 2.846153846153846e-05,
+      "loss": 0.0136,
+      "step": 279
+    },
+    {
+      "epoch": 7.0,
+      "grad_norm": 1.0144366025924683,
+      "learning_rate": 2.8205128205128207e-05,
+      "loss": 0.0079,
+      "step": 280
+    },
+    {
+      "epoch": 7.0,
+      "eval_loss": 0.19086241722106934,
+      "eval_runtime": 46.0659,
+      "eval_samples_per_second": 9.79,
+      "eval_steps_per_second": 0.174,
+      "step": 280
+    },
+    {
+      "epoch": 7.0256,
+      "grad_norm": 0.37199220061302185,
+      "learning_rate": 2.794871794871795e-05,
+      "loss": 0.0041,
+      "step": 281
+    },
+    {
+      "epoch": 7.0512,
+      "grad_norm": 0.7925446629524231,
+      "learning_rate": 2.7692307692307694e-05,
+      "loss": 0.0055,
+      "step": 282
+    },
+    {
+      "epoch": 7.0768,
+      "grad_norm": 0.2660503387451172,
+      "learning_rate": 2.743589743589744e-05,
+      "loss": 0.0028,
+      "step": 283
+    },
+    {
+      "epoch": 7.1024,
+      "grad_norm": 0.9306926727294922,
+      "learning_rate": 2.717948717948718e-05,
+      "loss": 0.0465,
+      "step": 284
+    },
+    {
+      "epoch": 7.128,
+      "grad_norm": 0.6875828504562378,
+      "learning_rate": 2.6923076923076923e-05,
+      "loss": 0.0091,
+      "step": 285
+    },
+    {
+      "epoch": 7.1536,
+      "grad_norm": 0.16312462091445923,
+      "learning_rate": 2.6666666666666667e-05,
+      "loss": 0.0023,
+      "step": 286
+    },
+    {
+      "epoch": 7.1792,
+      "grad_norm": 0.6696649789810181,
+      "learning_rate": 2.6410256410256413e-05,
+      "loss": 0.0089,
+      "step": 287
+    },
+    {
+      "epoch": 7.2048,
+      "grad_norm": 0.3207850754261017,
+      "learning_rate": 2.6153846153846157e-05,
+      "loss": 0.0039,
+      "step": 288
+    },
+    {
+      "epoch": 7.2304,
+      "grad_norm": 0.34680280089378357,
+      "learning_rate": 2.58974358974359e-05,
+      "loss": 0.0033,
+      "step": 289
+    },
+    {
+      "epoch": 7.256,
+      "grad_norm": 0.19657476246356964,
+      "learning_rate": 2.564102564102564e-05,
+      "loss": 0.003,
+      "step": 290
+    },
+    {
+      "epoch": 7.256,
+      "eval_loss": 0.19000744819641113,
+      "eval_runtime": 46.0565,
+      "eval_samples_per_second": 9.792,
+      "eval_steps_per_second": 0.174,
+      "step": 290
+    },
+    {
+      "epoch": 7.2816,
+      "grad_norm": 0.2564392685890198,
+      "learning_rate": 2.5384615384615383e-05,
+      "loss": 0.004,
+      "step": 291
+    },
+    {
+      "epoch": 7.3072,
+      "grad_norm": 0.782609224319458,
+      "learning_rate": 2.512820512820513e-05,
+      "loss": 0.0083,
+      "step": 292
+    },
+    {
+      "epoch": 7.3328,
+      "grad_norm": 0.1740560382604599,
+      "learning_rate": 2.4871794871794873e-05,
+      "loss": 0.0022,
+      "step": 293
+    },
+    {
+      "epoch": 7.3584,
+      "grad_norm": 0.08479921519756317,
+      "learning_rate": 2.461538461538462e-05,
+      "loss": 0.0016,
+      "step": 294
+    },
+    {
+      "epoch": 7.384,
+      "grad_norm": 0.38202646374702454,
+      "learning_rate": 2.435897435897436e-05,
+      "loss": 0.0042,
+      "step": 295
+    },
+    {
+      "epoch": 7.4096,
+      "grad_norm": 0.20064568519592285,
+      "learning_rate": 2.4102564102564103e-05,
+      "loss": 0.002,
+      "step": 296
+    },
+    {
+      "epoch": 7.4352,
+      "grad_norm": 0.2550853490829468,
+      "learning_rate": 2.384615384615385e-05,
+      "loss": 0.0026,
+      "step": 297
+    },
+    {
+      "epoch": 7.4608,
+      "grad_norm": 0.48428618907928467,
+      "learning_rate": 2.358974358974359e-05,
+      "loss": 0.0054,
+      "step": 298
+    },
+    {
+      "epoch": 7.4864,
+      "grad_norm": 1.1436755657196045,
+      "learning_rate": 2.3333333333333336e-05,
+      "loss": 0.0152,
+      "step": 299
+    },
+    {
+      "epoch": 7.5120000000000005,
+      "grad_norm": 0.2809373140335083,
+      "learning_rate": 2.307692307692308e-05,
+      "loss": 0.0018,
+      "step": 300
+    },
+    {
+      "epoch": 7.5120000000000005,
+      "eval_loss": 0.19469289481639862,
+      "eval_runtime": 46.0372,
+      "eval_samples_per_second": 9.796,
+      "eval_steps_per_second": 0.174,
+      "step": 300
+    },
+    {
+      "epoch": 7.5376,
+      "grad_norm": 1.0407063961029053,
+      "learning_rate": 2.2820512820512822e-05,
+      "loss": 0.0124,
+      "step": 301
+    },
+    {
+      "epoch": 7.5632,
+      "grad_norm": 0.3781401515007019,
+      "learning_rate": 2.2564102564102566e-05,
+      "loss": 0.0024,
+      "step": 302
+    },
+    {
+      "epoch": 7.5888,
+      "grad_norm": 0.3046801686286926,
+      "learning_rate": 2.230769230769231e-05,
+      "loss": 0.0028,
+      "step": 303
+    },
+    {
+      "epoch": 7.6144,
+      "grad_norm": 0.17133867740631104,
+      "learning_rate": 2.2051282051282052e-05,
+      "loss": 0.002,
+      "step": 304
+    },
+    {
+      "epoch": 7.64,
+      "grad_norm": 0.29253873229026794,
+      "learning_rate": 2.1794871794871795e-05,
+      "loss": 0.0049,
+      "step": 305
+    },
+    {
+      "epoch": 7.6655999999999995,
+      "grad_norm": 0.6607497930526733,
+      "learning_rate": 2.1538461538461542e-05,
+      "loss": 0.0152,
+      "step": 306
+    },
+    {
+      "epoch": 7.6912,
+      "grad_norm": 0.4728367328643799,
+      "learning_rate": 2.1282051282051282e-05,
+      "loss": 0.0049,
+      "step": 307
+    },
+    {
+      "epoch": 7.7168,
+      "grad_norm": 0.5030187368392944,
+      "learning_rate": 2.102564102564103e-05,
+      "loss": 0.0037,
+      "step": 308
+    },
+    {
+      "epoch": 7.7424,
+      "grad_norm": 0.11288933455944061,
+      "learning_rate": 2.0769230769230772e-05,
+      "loss": 0.0012,
+      "step": 309
+    },
+    {
+      "epoch": 7.768,
+      "grad_norm": 0.4386049211025238,
+      "learning_rate": 2.0512820512820512e-05,
+      "loss": 0.0059,
+      "step": 310
+    },
+    {
+      "epoch": 7.768,
+      "eval_loss": 0.19445903599262238,
+      "eval_runtime": 46.0284,
+      "eval_samples_per_second": 9.798,
+      "eval_steps_per_second": 0.174,
+      "step": 310
+    },
+    {
+      "epoch": 7.7936,
+      "grad_norm": 0.15119102597236633,
+      "learning_rate": 2.025641025641026e-05,
+      "loss": 0.0016,
+      "step": 311
+    },
+    {
+      "epoch": 7.8192,
+      "grad_norm": 0.36368149518966675,
+      "learning_rate": 2e-05,
+      "loss": 0.0037,
+      "step": 312
+    },
+    {
+      "epoch": 7.8448,
+      "grad_norm": 0.22931738197803497,
+      "learning_rate": 1.9743589743589745e-05,
+      "loss": 0.0023,
+      "step": 313
+    },
+    {
+      "epoch": 7.8704,
+      "grad_norm": 0.30464252829551697,
+      "learning_rate": 1.9487179487179488e-05,
+      "loss": 0.0027,
+      "step": 314
+    },
+    {
+      "epoch": 7.896,
+      "grad_norm": 0.13730935752391815,
+      "learning_rate": 1.923076923076923e-05,
+      "loss": 0.0017,
+      "step": 315
+    },
+    {
+      "epoch": 7.9216,
+      "grad_norm": 0.21119461953639984,
+      "learning_rate": 1.8974358974358975e-05,
+      "loss": 0.0031,
+      "step": 316
+    },
+    {
+      "epoch": 7.9472000000000005,
+      "grad_norm": 1.3350460529327393,
+      "learning_rate": 1.8717948717948718e-05,
+      "loss": 0.0154,
+      "step": 317
+    },
+    {
+      "epoch": 7.9728,
+      "grad_norm": 0.29415613412857056,
+      "learning_rate": 1.8461538461538465e-05,
+      "loss": 0.0025,
+      "step": 318
+    },
+    {
+      "epoch": 7.9984,
+      "grad_norm": 0.20422235131263733,
+      "learning_rate": 1.8205128205128204e-05,
+      "loss": 0.0023,
+      "step": 319
+    },
+    {
+      "epoch": 8.0,
+      "grad_norm": 0.01667635701596737,
+      "learning_rate": 1.794871794871795e-05,
+      "loss": 0.0001,
+      "step": 320
+    },
+    {
+      "epoch": 8.0,
+      "eval_loss": 0.19959121942520142,
+      "eval_runtime": 46.0703,
+      "eval_samples_per_second": 9.789,
+      "eval_steps_per_second": 0.174,
+      "step": 320
+    },
+    {
+      "epoch": 8.0256,
+      "grad_norm": 0.3130502700805664,
+      "learning_rate": 1.7692307692307694e-05,
+      "loss": 0.003,
+      "step": 321
+    },
+    {
+      "epoch": 8.0512,
+      "grad_norm": 0.5019899010658264,
+      "learning_rate": 1.7435897435897434e-05,
+      "loss": 0.0096,
+      "step": 322
+    },
+    {
+      "epoch": 8.0768,
+      "grad_norm": 0.07362326979637146,
+      "learning_rate": 1.717948717948718e-05,
+      "loss": 0.0011,
+      "step": 323
+    },
+    {
+      "epoch": 8.1024,
+      "grad_norm": 0.15674403309822083,
+      "learning_rate": 1.6923076923076924e-05,
+      "loss": 0.0018,
+      "step": 324
+    },
+    {
+      "epoch": 8.128,
+      "grad_norm": 0.16153913736343384,
+      "learning_rate": 1.6666666666666667e-05,
+      "loss": 0.002,
+      "step": 325
+    },
+    {
+      "epoch": 8.1536,
+      "grad_norm": 0.10829649865627289,
+      "learning_rate": 1.641025641025641e-05,
+      "loss": 0.001,
+      "step": 326
+    },
+    {
+      "epoch": 8.1792,
+      "grad_norm": 0.13399221003055573,
+      "learning_rate": 1.6153846153846154e-05,
+      "loss": 0.0016,
+      "step": 327
+    },
+    {
+      "epoch": 8.2048,
+      "grad_norm": 0.6621348857879639,
+      "learning_rate": 1.5897435897435897e-05,
+      "loss": 0.0061,
+      "step": 328
+    },
+    {
+      "epoch": 8.2304,
+      "grad_norm": 0.12478233873844147,
+      "learning_rate": 1.564102564102564e-05,
+      "loss": 0.0013,
+      "step": 329
+    },
+    {
+      "epoch": 8.256,
+      "grad_norm": 0.2321283519268036,
+      "learning_rate": 1.5384615384615387e-05,
+      "loss": 0.0024,
+      "step": 330
+    },
+    {
+      "epoch": 8.256,
+      "eval_loss": 0.20678183436393738,
+      "eval_runtime": 46.102,
+      "eval_samples_per_second": 9.783,
+      "eval_steps_per_second": 0.174,
+      "step": 330
+    },
+    {
+      "epoch": 8.2816,
+      "grad_norm": 0.3146536648273468,
+      "learning_rate": 1.5128205128205129e-05,
+      "loss": 0.0028,
+      "step": 331
+    },
+    {
+      "epoch": 8.3072,
+      "grad_norm": 0.34905433654785156,
+      "learning_rate": 1.4871794871794872e-05,
+      "loss": 0.0023,
+      "step": 332
+    },
+    {
+      "epoch": 8.3328,
+      "grad_norm": 0.677470326423645,
+      "learning_rate": 1.4615384615384617e-05,
+      "loss": 0.0049,
+      "step": 333
+    },
+    {
+      "epoch": 8.3584,
+      "grad_norm": 0.14551085233688354,
+      "learning_rate": 1.4358974358974359e-05,
+      "loss": 0.0014,
+      "step": 334
+    },
+    {
+      "epoch": 8.384,
+      "grad_norm": 0.11376652866601944,
+      "learning_rate": 1.4102564102564104e-05,
+      "loss": 0.0015,
+      "step": 335
+    },
+    {
+      "epoch": 8.4096,
+      "grad_norm": 0.6229234337806702,
+      "learning_rate": 1.3846153846153847e-05,
+      "loss": 0.0042,
+      "step": 336
+    },
+    {
+      "epoch": 8.4352,
+      "grad_norm": 0.1884259432554245,
+      "learning_rate": 1.358974358974359e-05,
+      "loss": 0.0019,
+      "step": 337
+    },
+    {
+      "epoch": 8.4608,
+      "grad_norm": 0.27811136841773987,
+      "learning_rate": 1.3333333333333333e-05,
+      "loss": 0.0023,
+      "step": 338
+    },
+    {
+      "epoch": 8.4864,
+      "grad_norm": 0.07908125221729279,
+      "learning_rate": 1.3076923076923078e-05,
+      "loss": 0.0011,
+      "step": 339
+    },
+    {
+      "epoch": 8.512,
+      "grad_norm": 0.15802721679210663,
+      "learning_rate": 1.282051282051282e-05,
+      "loss": 0.001,
+      "step": 340
+    },
+    {
+      "epoch": 8.512,
+      "eval_loss": 0.20775224268436432,
+      "eval_runtime": 46.0873,
+      "eval_samples_per_second": 9.786,
+      "eval_steps_per_second": 0.174,
+      "step": 340
+    },
+    {
+      "epoch": 8.5376,
+      "grad_norm": 0.4371594488620758,
+      "learning_rate": 1.2564102564102565e-05,
+      "loss": 0.0039,
+      "step": 341
+    },
+    {
+      "epoch": 8.5632,
+      "grad_norm": 0.7350974082946777,
+      "learning_rate": 1.230769230769231e-05,
+      "loss": 0.0058,
+      "step": 342
+    },
+    {
+      "epoch": 8.588799999999999,
+      "grad_norm": 0.2187982201576233,
+      "learning_rate": 1.2051282051282051e-05,
+      "loss": 0.0019,
+      "step": 343
+    },
+    {
+      "epoch": 8.6144,
+      "grad_norm": 0.36454564332962036,
+      "learning_rate": 1.1794871794871795e-05,
+      "loss": 0.0037,
+      "step": 344
+    },
+    {
+      "epoch": 8.64,
+      "grad_norm": 0.2089586853981018,
+      "learning_rate": 1.153846153846154e-05,
+      "loss": 0.002,
+      "step": 345
+    },
+    {
+      "epoch": 8.6656,
+      "grad_norm": 0.16004692018032074,
+      "learning_rate": 1.1282051282051283e-05,
+      "loss": 0.0016,
+      "step": 346
+    },
+    {
+      "epoch": 8.6912,
+      "grad_norm": 0.13019374012947083,
+      "learning_rate": 1.1025641025641026e-05,
+      "loss": 0.0017,
+      "step": 347
+    },
+    {
+      "epoch": 8.7168,
+      "grad_norm": 0.20604482293128967,
+      "learning_rate": 1.0769230769230771e-05,
+      "loss": 0.0028,
+      "step": 348
+    },
+    {
+      "epoch": 8.7424,
+      "grad_norm": 0.2585042417049408,
+      "learning_rate": 1.0512820512820514e-05,
+      "loss": 0.0019,
+      "step": 349
+    },
+    {
+      "epoch": 8.768,
+      "grad_norm": 0.13119027018547058,
+      "learning_rate": 1.0256410256410256e-05,
+      "loss": 0.0015,
+      "step": 350
+    },
+    {
+      "epoch": 8.768,
+      "eval_loss": 0.2104375809431076,
+      "eval_runtime": 46.0673,
+      "eval_samples_per_second": 9.79,
+      "eval_steps_per_second": 0.174,
+      "step": 350
+    },
+    {
+      "epoch": 8.7936,
+      "grad_norm": 0.23227037489414215,
+      "learning_rate": 1e-05,
+      "loss": 0.0023,
+      "step": 351
+    },
+    {
+      "epoch": 8.8192,
+      "grad_norm": 0.25416502356529236,
+      "learning_rate": 9.743589743589744e-06,
+      "loss": 0.0021,
+      "step": 352
+    },
+    {
+      "epoch": 8.8448,
+      "grad_norm": 0.08733928948640823,
+      "learning_rate": 9.487179487179487e-06,
+      "loss": 0.0012,
+      "step": 353
+    },
+    {
+      "epoch": 8.8704,
+      "grad_norm": 0.33069783449172974,
+      "learning_rate": 9.230769230769232e-06,
+      "loss": 0.0041,
+      "step": 354
+    },
+    {
+      "epoch": 8.896,
+      "grad_norm": 1.0839825868606567,
+      "learning_rate": 8.974358974358976e-06,
+      "loss": 0.0438,
+      "step": 355
+    },
+    {
+      "epoch": 8.9216,
+      "grad_norm": 0.23376193642616272,
+      "learning_rate": 8.717948717948717e-06,
+      "loss": 0.0028,
+      "step": 356
+    },
+    {
+      "epoch": 8.9472,
+      "grad_norm": 0.2836299240589142,
+      "learning_rate": 8.461538461538462e-06,
+      "loss": 0.002,
+      "step": 357
+    },
+    {
+      "epoch": 8.9728,
+      "grad_norm": 0.04910902678966522,
+      "learning_rate": 8.205128205128205e-06,
+      "loss": 0.0007,
+      "step": 358
+    },
+    {
+      "epoch": 8.9984,
+      "grad_norm": 0.08654181659221649,
+      "learning_rate": 7.948717948717949e-06,
+      "loss": 0.0009,
+      "step": 359
+    },
+    {
+      "epoch": 9.0,
+      "grad_norm": 0.016530824825167656,
+      "learning_rate": 7.692307692307694e-06,
+      "loss": 0.0001,
+      "step": 360
+    },
+    {
+      "epoch": 9.0,
+      "eval_loss": 0.2104598730802536,
+      "eval_runtime": 46.2138,
+      "eval_samples_per_second": 9.759,
+      "eval_steps_per_second": 0.173,
+      "step": 360
+    },
+    {
+      "epoch": 9.0256,
+      "grad_norm": 0.19786006212234497,
+      "learning_rate": 7.435897435897436e-06,
+      "loss": 0.0029,
+      "step": 361
+    },
+    {
+      "epoch": 9.0512,
+      "grad_norm": 0.15577854216098785,
+      "learning_rate": 7.179487179487179e-06,
+      "loss": 0.0012,
+      "step": 362
+    },
+    {
+      "epoch": 9.0768,
+      "grad_norm": 0.09487804770469666,
+      "learning_rate": 6.923076923076923e-06,
+      "loss": 0.0014,
+      "step": 363
+    },
+    {
+      "epoch": 9.1024,
+      "grad_norm": 0.1183762401342392,
+      "learning_rate": 6.666666666666667e-06,
+      "loss": 0.0013,
+      "step": 364
+    },
+    {
+      "epoch": 9.128,
+      "grad_norm": 0.2357182800769806,
+      "learning_rate": 6.41025641025641e-06,
+      "loss": 0.0022,
+      "step": 365
+    },
+    {
+      "epoch": 9.1536,
+      "grad_norm": 0.16520433127880096,
+      "learning_rate": 6.153846153846155e-06,
+      "loss": 0.0016,
+      "step": 366
+    },
+    {
+      "epoch": 9.1792,
+      "grad_norm": 0.35106149315834045,
+      "learning_rate": 5.897435897435897e-06,
+      "loss": 0.0041,
+      "step": 367
+    },
+    {
+      "epoch": 9.2048,
+      "grad_norm": 0.28743863105773926,
+      "learning_rate": 5.641025641025641e-06,
+      "loss": 0.0015,
+      "step": 368
+    },
+    {
+      "epoch": 9.2304,
+      "grad_norm": 0.07648279517889023,
+      "learning_rate": 5.3846153846153855e-06,
+      "loss": 0.0009,
+      "step": 369
+    },
+    {
+      "epoch": 9.256,
+      "grad_norm": 0.25945615768432617,
+      "learning_rate": 5.128205128205128e-06,
+      "loss": 0.0021,
+      "step": 370
+    },
+    {
+      "epoch": 9.256,
+      "eval_loss": 0.2159082591533661,
+      "eval_runtime": 46.0968,
+      "eval_samples_per_second": 9.784,
+      "eval_steps_per_second": 0.174,
+      "step": 370
+    },
+    {
+      "epoch": 9.2816,
+      "grad_norm": 0.12486568838357925,
+      "learning_rate": 4.871794871794872e-06,
+      "loss": 0.0013,
+      "step": 371
+    },
+    {
+      "epoch": 9.3072,
+      "grad_norm": 0.22302450239658356,
+      "learning_rate": 4.615384615384616e-06,
+      "loss": 0.0027,
+      "step": 372
+    },
+    {
+      "epoch": 9.3328,
+      "grad_norm": 0.9940178990364075,
+      "learning_rate": 4.3589743589743586e-06,
+      "loss": 0.0376,
+      "step": 373
+    },
+    {
+      "epoch": 9.3584,
+      "grad_norm": 0.0722355991601944,
+      "learning_rate": 4.102564102564103e-06,
+      "loss": 0.0009,
+      "step": 374
+    },
+    {
+      "epoch": 9.384,
+      "grad_norm": 0.16168925166130066,
+      "learning_rate": 3.846153846153847e-06,
+      "loss": 0.0016,
+      "step": 375
+    },
+    {
+      "epoch": 9.4096,
+      "grad_norm": 0.14245012402534485,
+      "learning_rate": 3.5897435897435896e-06,
+      "loss": 0.0012,
+      "step": 376
+    },
+    {
+      "epoch": 9.4352,
+      "grad_norm": 0.6443990468978882,
+      "learning_rate": 3.3333333333333333e-06,
+      "loss": 0.0051,
+      "step": 377
+    },
+    {
+      "epoch": 9.4608,
+      "grad_norm": 0.06897876411676407,
+      "learning_rate": 3.0769230769230774e-06,
+      "loss": 0.0008,
+      "step": 378
+    },
+    {
+      "epoch": 9.4864,
+      "grad_norm": 0.06450845301151276,
+      "learning_rate": 2.8205128205128207e-06,
+      "loss": 0.0007,
+      "step": 379
+    },
+    {
+      "epoch": 9.512,
+      "grad_norm": 0.16875888407230377,
+      "learning_rate": 2.564102564102564e-06,
+      "loss": 0.0017,
+      "step": 380
+    },
+    {
+      "epoch": 9.512,
+      "eval_loss": 0.21007762849330902,
+      "eval_runtime": 46.1576,
+      "eval_samples_per_second": 9.771,
+      "eval_steps_per_second": 0.173,
+      "step": 380
+    },
+    {
+      "epoch": 9.5376,
+      "grad_norm": 0.08018496632575989,
+      "learning_rate": 2.307692307692308e-06,
+      "loss": 0.0008,
+      "step": 381
+    },
+    {
+      "epoch": 9.5632,
+      "grad_norm": 0.22642257809638977,
+      "learning_rate": 2.0512820512820513e-06,
+      "loss": 0.0015,
+      "step": 382
+    },
+    {
+      "epoch": 9.588799999999999,
+      "grad_norm": 0.5728319883346558,
+      "learning_rate": 1.7948717948717948e-06,
+      "loss": 0.0079,
+      "step": 383
+    },
+    {
+      "epoch": 9.6144,
+      "grad_norm": 0.16731196641921997,
+      "learning_rate": 1.5384615384615387e-06,
+      "loss": 0.0012,
+      "step": 384
+    },
+    {
+      "epoch": 9.64,
+      "grad_norm": 0.08967146277427673,
+      "learning_rate": 1.282051282051282e-06,
+      "loss": 0.0009,
+      "step": 385
+    },
+    {
+      "epoch": 9.6656,
+      "grad_norm": 0.19866441190242767,
+      "learning_rate": 1.0256410256410257e-06,
+      "loss": 0.0023,
+      "step": 386
+    },
+    {
+      "epoch": 9.6912,
+      "grad_norm": 0.07427145540714264,
+      "learning_rate": 7.692307692307694e-07,
+      "loss": 0.0009,
+      "step": 387
+    },
+    {
+      "epoch": 9.7168,
+      "grad_norm": 0.2308674156665802,
+      "learning_rate": 5.128205128205128e-07,
+      "loss": 0.0022,
+      "step": 388
+    },
+    {
+      "epoch": 9.7424,
+      "grad_norm": 0.4221293330192566,
+      "learning_rate": 2.564102564102564e-07,
+      "loss": 0.0047,
+      "step": 389
+    },
+    {
+      "epoch": 9.768,
+      "grad_norm": 0.14237630367279053,
+      "learning_rate": 0.0,
+      "loss": 0.0014,
+      "step": 390
+    },
+    {
+      "epoch": 9.768,
+      "eval_loss": 0.21078599989414215,
+      "eval_runtime": 46.065,
+      "eval_samples_per_second": 9.791,
+      "eval_steps_per_second": 0.174,
+      "step": 390
+    },
+    {
+      "epoch": 9.768,
+      "step": 390,
+      "total_flos": 7.851213014748365e+18,
+      "train_loss": 0.11308465492553436,
+      "train_runtime": 13137.6351,
+      "train_samples_per_second": 3.806,
+      "train_steps_per_second": 0.03
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 390,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 10,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7.851213014748365e+18,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

CRM/sft-qwen2.5-math-prm-7b-score-model-simple-bs128/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e58bf2d34fe45c8cf3d1e4929a5e0a46db02f846085781f1d5c6c3d29d72f02a
+size 5496