2023-10-06 10:16:42,500 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,501 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 10:16:42,501 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,501 MultiCorpus: 1214 train + 266 dev + 251 test sentences
 - NER_HIPE_2022 Corpus: 1214 train + 266 dev + 251 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/en/with_doc_seperator
2023-10-06 10:16:42,501 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,501 Train:  1214 sentences
2023-10-06 10:16:42,501         (train_with_dev=False, train_with_test=False)
2023-10-06 10:16:42,501 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,501 Training Params:
2023-10-06 10:16:42,502  - learning_rate: "0.00015"
2023-10-06 10:16:42,502  - mini_batch_size: "8"
2023-10-06 10:16:42,502  - max_epochs: "10"
2023-10-06 10:16:42,502  - shuffle: "True"
2023-10-06 10:16:42,502 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,502 Plugins:
2023-10-06 10:16:42,502  - TensorboardLogger
2023-10-06 10:16:42,502  - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 10:16:42,502 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,502 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 10:16:42,502  - metric: "('micro avg', 'f1-score')"
2023-10-06 10:16:42,502 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,502 Computation:
2023-10-06 10:16:42,502  - compute on device: cuda:0
2023-10-06 10:16:42,502  - embedding storage: none
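The LinearScheduler plugin with warmup_fraction '0.1' warms the learning rate up linearly over roughly the first 10% of all batch steps (here 10 epochs x 152 iterations = 1520 steps, peaking near the configured 0.00015) and then decays it linearly toward zero, which is consistent with the lr column in the iteration logs. A minimal sketch of that schedule in plain Python (the function name and exact step convention are illustrative assumptions, not Flair's actual implementation):

```python
def linear_warmup_lr(step: int, total_steps: int, peak_lr: float,
                     warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (illustrative)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # warmup phase: ramp from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    # decay phase: ramp from peak_lr down to 0 at the final step
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# 10 epochs x 152 iterations, as in this run
total = 10 * 152
print(round(linear_warmup_lr(152, total, 0.00015), 6))  # prints 0.00015 (peak)
```

Off-by-one details aside, this reproduces the observed pattern: lr rises through epoch 1 and falls steadily afterwards, ending near 0.000001.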
2023-10-06 10:16:42,502 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,502 Model training base path: "hmbench-ajmc/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-06 10:16:42,502 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,502 ----------------------------------------------------------------------------------------------------
2023-10-06 10:16:42,503 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 10:16:53,330 epoch 1 - iter 15/152 - loss 3.23091021 - time (sec): 10.83 - samples/sec: 317.28 - lr: 0.000014 - momentum: 0.000000
2023-10-06 10:17:03,982 epoch 1 - iter 30/152 - loss 3.22511292 - time (sec): 21.48 - samples/sec: 313.90 - lr: 0.000029 - momentum: 0.000000
2023-10-06 10:17:14,260 epoch 1 - iter 45/152 - loss 3.21551443 - time (sec): 31.76 - samples/sec: 306.65 - lr: 0.000043 - momentum: 0.000000
2023-10-06 10:17:24,442 epoch 1 - iter 60/152 - loss 3.19708544 - time (sec): 41.94 - samples/sec: 304.52 - lr: 0.000058 - momentum: 0.000000
2023-10-06 10:17:34,184 epoch 1 - iter 75/152 - loss 3.15990751 - time (sec): 51.68 - samples/sec: 300.68 - lr: 0.000073 - momentum: 0.000000
2023-10-06 10:17:43,968 epoch 1 - iter 90/152 - loss 3.09937290 - time (sec): 61.46 - samples/sec: 296.45 - lr: 0.000088 - momentum: 0.000000
2023-10-06 10:17:53,908 epoch 1 - iter 105/152 - loss 3.01983427 - time (sec): 71.40 - samples/sec: 295.72 - lr: 0.000103 - momentum: 0.000000
2023-10-06 10:18:04,288 epoch 1 - iter 120/152 - loss 2.92755753 - time (sec): 81.78 - samples/sec: 294.74 - lr: 0.000117 - momentum: 0.000000
2023-10-06 10:18:14,750 epoch 1 - iter 135/152 - loss 2.82685567 - time (sec): 92.25 - samples/sec: 294.82 - lr: 0.000132 - momentum: 0.000000
2023-10-06 10:18:26,053 epoch 1 - iter 150/152 - loss 2.71271200 - time (sec): 103.55 - samples/sec: 296.44 - lr: 0.000147 - momentum: 0.000000
2023-10-06 10:18:27,181 ----------------------------------------------------------------------------------------------------
2023-10-06 10:18:27,181 EPOCH 1 done: loss 2.7035 - lr: 0.000147
2023-10-06 10:18:34,201 DEV : loss 1.6245863437652588 - f1-score (micro avg) 0.0
2023-10-06 10:18:34,209 ----------------------------------------------------------------------------------------------------
2023-10-06 10:18:44,588 epoch 2 - iter 15/152 - loss 1.56143171 - time (sec): 10.38 - samples/sec: 296.22 - lr: 0.000148 - momentum: 0.000000
2023-10-06 10:18:54,822 epoch 2 - iter 30/152 - loss 1.43488801 - time (sec): 20.61 - samples/sec: 293.05 - lr: 0.000147 - momentum: 0.000000
2023-10-06 10:19:05,238 epoch 2 - iter 45/152 - loss 1.32486715 - time (sec): 31.03 - samples/sec: 293.07 - lr: 0.000145 - momentum: 0.000000
2023-10-06 10:19:16,373 epoch 2 - iter 60/152 - loss 1.23906990 - time (sec): 42.16 - samples/sec: 297.42 - lr: 0.000144 - momentum: 0.000000
2023-10-06 10:19:26,260 epoch 2 - iter 75/152 - loss 1.16652823 - time (sec): 52.05 - samples/sec: 294.55 - lr: 0.000142 - momentum: 0.000000
2023-10-06 10:19:36,701 epoch 2 - iter 90/152 - loss 1.08230752 - time (sec): 62.49 - samples/sec: 294.88 - lr: 0.000140 - momentum: 0.000000
2023-10-06 10:19:46,814 epoch 2 - iter 105/152 - loss 1.01356554 - time (sec): 72.60 - samples/sec: 294.16 - lr: 0.000139 - momentum: 0.000000
2023-10-06 10:19:57,488 epoch 2 - iter 120/152 - loss 0.96622789 - time (sec): 83.28 - samples/sec: 294.83 - lr: 0.000137 - momentum: 0.000000
2023-10-06 10:20:07,841 epoch 2 - iter 135/152 - loss 0.94236746 - time (sec): 93.63 - samples/sec: 295.81 - lr: 0.000135 - momentum: 0.000000
2023-10-06 10:20:17,820 epoch 2 - iter 150/152 - loss 0.90992544 - time (sec): 103.61 - samples/sec: 295.68 - lr: 0.000134 - momentum: 0.000000
2023-10-06 10:20:19,028 ----------------------------------------------------------------------------------------------------
2023-10-06 10:20:19,029 EPOCH 2 done: loss 0.9041 - lr: 0.000134
2023-10-06 10:20:25,943 DEV : loss 0.5565100908279419 - f1-score (micro avg) 0.0
2023-10-06 10:20:25,950 ----------------------------------------------------------------------------------------------------
2023-10-06 10:20:36,607 epoch 3 - iter 15/152 - loss 0.48009388 - time (sec): 10.66 - samples/sec: 300.33 - lr: 0.000132 - momentum: 0.000000
2023-10-06 10:20:47,334 epoch 3 - iter 30/152 - loss 0.44650096 - time (sec): 21.38 - samples/sec: 299.36 - lr: 0.000130 - momentum: 0.000000
2023-10-06 10:20:57,116 epoch 3 - iter 45/152 - loss 0.43237998 - time (sec): 31.16 - samples/sec: 294.38 - lr: 0.000129 - momentum: 0.000000
2023-10-06 10:21:07,989 epoch 3 - iter 60/152 - loss 0.44025313 - time (sec): 42.04 - samples/sec: 296.12 - lr: 0.000127 - momentum: 0.000000
2023-10-06 10:21:18,025 epoch 3 - iter 75/152 - loss 0.42832921 - time (sec): 52.07 - samples/sec: 293.86 - lr: 0.000125 - momentum: 0.000000
2023-10-06 10:21:28,161 epoch 3 - iter 90/152 - loss 0.42221204 - time (sec): 62.21 - samples/sec: 294.36 - lr: 0.000124 - momentum: 0.000000
2023-10-06 10:21:38,720 epoch 3 - iter 105/152 - loss 0.41320316 - time (sec): 72.77 - samples/sec: 294.95 - lr: 0.000122 - momentum: 0.000000
2023-10-06 10:21:49,640 epoch 3 - iter 120/152 - loss 0.40092408 - time (sec): 83.69 - samples/sec: 295.93 - lr: 0.000120 - momentum: 0.000000
2023-10-06 10:21:59,911 epoch 3 - iter 135/152 - loss 0.39055575 - time (sec): 93.96 - samples/sec: 294.54 - lr: 0.000119 - momentum: 0.000000
2023-10-06 10:22:09,905 epoch 3 - iter 150/152 - loss 0.38134225 - time (sec): 103.95 - samples/sec: 294.00 - lr: 0.000117 - momentum: 0.000000
2023-10-06 10:22:11,203 ----------------------------------------------------------------------------------------------------
2023-10-06 10:22:11,204 EPOCH 3 done: loss 0.3786 - lr: 0.000117
2023-10-06 10:22:18,300 DEV : loss 0.33053359389305115 - f1-score (micro avg) 0.4623
2023-10-06 10:22:18,308 saving best model
2023-10-06 10:22:19,161 ----------------------------------------------------------------------------------------------------
2023-10-06 10:22:29,490 epoch 4 - iter 15/152 - loss 0.24948890 - time (sec): 10.33 - samples/sec: 294.67 - lr: 0.000115 - momentum: 0.000000
2023-10-06 10:22:40,049 epoch 4 - iter 30/152 - loss 0.27326062 - time (sec): 20.89 - samples/sec: 290.82 - lr: 0.000114 - momentum: 0.000000
2023-10-06 10:22:50,336 epoch 4 - iter 45/152 - loss 0.26318091 - time (sec): 31.17 - samples/sec: 288.65 - lr: 0.000112 - momentum: 0.000000
2023-10-06 10:23:00,686 epoch 4 - iter 60/152 - loss 0.24647017 - time (sec): 41.52 - samples/sec: 290.01 - lr: 0.000110 - momentum: 0.000000
2023-10-06 10:23:12,273 epoch 4 - iter 75/152 - loss 0.24638108 - time (sec): 53.11 - samples/sec: 292.85 - lr: 0.000109 - momentum: 0.000000
2023-10-06 10:23:23,009 epoch 4 - iter 90/152 - loss 0.24135100 - time (sec): 63.85 - samples/sec: 290.54 - lr: 0.000107 - momentum: 0.000000
2023-10-06 10:23:33,332 epoch 4 - iter 105/152 - loss 0.23534937 - time (sec): 74.17 - samples/sec: 289.37 - lr: 0.000105 - momentum: 0.000000
2023-10-06 10:23:44,267 epoch 4 - iter 120/152 - loss 0.22915137 - time (sec): 85.10 - samples/sec: 288.99 - lr: 0.000104 - momentum: 0.000000
2023-10-06 10:23:54,797 epoch 4 - iter 135/152 - loss 0.22562166 - time (sec): 95.63 - samples/sec: 288.54 - lr: 0.000102 - momentum: 0.000000
2023-10-06 10:24:05,519 epoch 4 - iter 150/152 - loss 0.22539775 - time (sec): 106.36 - samples/sec: 288.57 - lr: 0.000101 - momentum: 0.000000
2023-10-06 10:24:06,629 ----------------------------------------------------------------------------------------------------
2023-10-06 10:24:06,629 EPOCH 4 done: loss 0.2249 - lr: 0.000101
2023-10-06 10:24:14,348 DEV : loss 0.22695742547512054 - f1-score (micro avg) 0.6729
2023-10-06 10:24:14,356 saving best model
2023-10-06 10:24:15,248 ----------------------------------------------------------------------------------------------------
2023-10-06 10:24:26,459 epoch 5 - iter 15/152 - loss 0.16360169 - time (sec): 11.21 - samples/sec: 285.82 - lr: 0.000099 - momentum: 0.000000
2023-10-06 10:24:37,579 epoch 5 - iter 30/152 - loss 0.15269730 - time (sec): 22.33 - samples/sec: 280.88 - lr: 0.000097 - momentum: 0.000000
2023-10-06 10:24:48,462 epoch 5 - iter 45/152 - loss 0.17164260 - time (sec): 33.21 - samples/sec: 279.89 - lr: 0.000095 - momentum: 0.000000
2023-10-06 10:24:59,191 epoch 5 - iter 60/152 - loss 0.16426728 - time (sec): 43.94 - samples/sec: 279.26 - lr: 0.000094 - momentum: 0.000000
2023-10-06 10:25:09,552 epoch 5 - iter 75/152 - loss 0.16241537 - time (sec): 54.30 - samples/sec: 275.57 - lr: 0.000092 - momentum: 0.000000
2023-10-06 10:25:21,763 epoch 5 - iter 90/152 - loss 0.16460190 - time (sec): 66.51 - samples/sec: 277.09 - lr: 0.000091 - momentum: 0.000000
2023-10-06 10:25:32,755 epoch 5 - iter 105/152 - loss 0.16346546 - time (sec): 77.51 - samples/sec: 275.76 - lr: 0.000089 - momentum: 0.000000
2023-10-06 10:25:44,545 epoch 5 - iter 120/152 - loss 0.15738522 - time (sec): 89.30 - samples/sec: 275.69 - lr: 0.000087 - momentum: 0.000000
2023-10-06 10:25:55,620 epoch 5 - iter 135/152 - loss 0.15447561 - time (sec): 100.37 - samples/sec: 276.31 - lr: 0.000086 - momentum: 0.000000
2023-10-06 10:26:06,687 epoch 5 - iter 150/152 - loss 0.15098173 - time (sec): 111.44 - samples/sec: 275.96 - lr: 0.000084 - momentum: 0.000000
2023-10-06 10:26:07,767 ----------------------------------------------------------------------------------------------------
2023-10-06 10:26:07,767 EPOCH 5 done: loss 0.1508 - lr: 0.000084
2023-10-06 10:26:15,719 DEV : loss 0.17634844779968262 - f1-score (micro avg) 0.7082
2023-10-06 10:26:15,726 saving best model
2023-10-06 10:26:16,775 ----------------------------------------------------------------------------------------------------
2023-10-06 10:26:27,732 epoch 6 - iter 15/152 - loss 0.11574705 - time (sec): 10.96 - samples/sec: 284.24 - lr: 0.000082 - momentum: 0.000000
2023-10-06 10:26:38,194 epoch 6 - iter 30/152 - loss 0.12493814 - time (sec): 21.42 - samples/sec: 277.02 - lr: 0.000080 - momentum: 0.000000
2023-10-06 10:26:49,325 epoch 6 - iter 45/152 - loss 0.11610682 - time (sec): 32.55 - samples/sec: 279.43 - lr: 0.000079 - momentum: 0.000000
2023-10-06 10:27:00,619 epoch 6 - iter 60/152 - loss 0.11038204 - time (sec): 43.84 - samples/sec: 277.82 - lr: 0.000077 - momentum: 0.000000
2023-10-06 10:27:12,194 epoch 6 - iter 75/152 - loss 0.11913649 - time (sec): 55.42 - samples/sec: 278.29 - lr: 0.000076 - momentum: 0.000000
2023-10-06 10:27:22,833 epoch 6 - iter 90/152 - loss 0.11446896 - time (sec): 66.06 - samples/sec: 279.34 - lr: 0.000074 - momentum: 0.000000
2023-10-06 10:27:33,745 epoch 6 - iter 105/152 - loss 0.11149503 - time (sec): 76.97 - samples/sec: 278.18 - lr: 0.000072 - momentum: 0.000000
2023-10-06 10:27:44,826 epoch 6 - iter 120/152 - loss 0.11300383 - time (sec): 88.05 - samples/sec: 278.17 - lr: 0.000071 - momentum: 0.000000
2023-10-06 10:27:56,388 epoch 6 - iter 135/152 - loss 0.11066339 - time (sec): 99.61 - samples/sec: 278.14 - lr: 0.000069 - momentum: 0.000000
2023-10-06 10:28:06,990 epoch 6 - iter 150/152 - loss 0.10867282 - time (sec): 110.21 - samples/sec: 277.33 - lr: 0.000067 - momentum: 0.000000
2023-10-06 10:28:08,457 ----------------------------------------------------------------------------------------------------
2023-10-06 10:28:08,457 EPOCH 6 done: loss 0.1081 - lr: 0.000067
2023-10-06 10:28:16,219 DEV : loss 0.15336917340755463 - f1-score (micro avg) 0.7894
2023-10-06 10:28:16,226 saving best model
2023-10-06 10:28:17,112 ----------------------------------------------------------------------------------------------------
2023-10-06 10:28:27,726 epoch 7 - iter 15/152 - loss 0.09326580 - time (sec): 10.61 - samples/sec: 265.44 - lr: 0.000066 - momentum: 0.000000
2023-10-06 10:28:38,788 epoch 7 - iter 30/152 - loss 0.09593838 - time (sec): 21.67 - samples/sec: 273.69 - lr: 0.000064 - momentum: 0.000000
2023-10-06 10:28:49,985 epoch 7 - iter 45/152 - loss 0.10338269 - time (sec): 32.87 - samples/sec: 275.98 - lr: 0.000062 - momentum: 0.000000
2023-10-06 10:29:01,569 epoch 7 - iter 60/152 - loss 0.09242330 - time (sec): 44.46 - samples/sec: 278.68 - lr: 0.000061 - momentum: 0.000000
2023-10-06 10:29:13,019 epoch 7 - iter 75/152 - loss 0.09030299 - time (sec): 55.91 - samples/sec: 277.56 - lr: 0.000059 - momentum: 0.000000
2023-10-06 10:29:23,900 epoch 7 - iter 90/152 - loss 0.08695488 - time (sec): 66.79 - samples/sec: 276.55 - lr: 0.000057 - momentum: 0.000000
2023-10-06 10:29:34,399 epoch 7 - iter 105/152 - loss 0.08651679 - time (sec): 77.29 - samples/sec: 275.82 - lr: 0.000056 - momentum: 0.000000
2023-10-06 10:29:45,825 epoch 7 - iter 120/152 - loss 0.08483135 - time (sec): 88.71 - samples/sec: 276.10 - lr: 0.000054 - momentum: 0.000000
2023-10-06 10:29:56,578 epoch 7 - iter 135/152 - loss 0.08450830 - time (sec): 99.46 - samples/sec: 276.33 - lr: 0.000052 - momentum: 0.000000
2023-10-06 10:30:07,876 epoch 7 - iter 150/152 - loss 0.08079298 - time (sec): 110.76 - samples/sec: 277.05 - lr: 0.000051 - momentum: 0.000000
2023-10-06 10:30:09,145 ----------------------------------------------------------------------------------------------------
2023-10-06 10:30:09,146 EPOCH 7 done: loss 0.0814 - lr: 0.000051
2023-10-06 10:30:17,034 DEV : loss 0.15049363672733307 - f1-score (micro avg) 0.8014
2023-10-06 10:30:17,043 saving best model
2023-10-06 10:30:17,959 ----------------------------------------------------------------------------------------------------
2023-10-06 10:30:29,597 epoch 8 - iter 15/152 - loss 0.04241940 - time (sec): 11.64 - samples/sec: 286.52 - lr: 0.000049 - momentum: 0.000000
2023-10-06 10:30:40,809 epoch 8 - iter 30/152 - loss 0.06172672 - time (sec): 22.85 - samples/sec: 282.99 - lr: 0.000047 - momentum: 0.000000
2023-10-06 10:30:51,511 epoch 8 - iter 45/152 - loss 0.05567269 - time (sec): 33.55 - samples/sec: 281.51 - lr: 0.000046 - momentum: 0.000000
2023-10-06 10:31:01,744 epoch 8 - iter 60/152 - loss 0.06084948 - time (sec): 43.78 - samples/sec: 282.32 - lr: 0.000044 - momentum: 0.000000
2023-10-06 10:31:12,008 epoch 8 - iter 75/152 - loss 0.05856775 - time (sec): 54.05 - samples/sec: 283.99 - lr: 0.000042 - momentum: 0.000000
2023-10-06 10:31:22,740 epoch 8 - iter 90/152 - loss 0.06433682 - time (sec): 64.78 - samples/sec: 287.48 - lr: 0.000041 - momentum: 0.000000
2023-10-06 10:31:32,083 epoch 8 - iter 105/152 - loss 0.06447632 - time (sec): 74.12 - samples/sec: 285.58 - lr: 0.000039 - momentum: 0.000000
2023-10-06 10:31:42,555 epoch 8 - iter 120/152 - loss 0.06436414 - time (sec): 84.59 - samples/sec: 287.38 - lr: 0.000037 - momentum: 0.000000
2023-10-06 10:31:53,116 epoch 8 - iter 135/152 - loss 0.06348652 - time (sec): 95.16 - samples/sec: 288.79 - lr: 0.000036 - momentum: 0.000000
2023-10-06 10:32:03,629 epoch 8 - iter 150/152 - loss 0.06534277 - time (sec): 105.67 - samples/sec: 289.94 - lr: 0.000034 - momentum: 0.000000
2023-10-06 10:32:04,913 ----------------------------------------------------------------------------------------------------
2023-10-06 10:32:04,914 EPOCH 8 done: loss 0.0649 - lr: 0.000034
2023-10-06 10:32:12,135 DEV : loss 0.15033212304115295 - f1-score (micro avg) 0.8163
2023-10-06 10:32:12,143 saving best model
2023-10-06 10:32:13,089 ----------------------------------------------------------------------------------------------------
2023-10-06 10:32:23,574 epoch 9 - iter 15/152 - loss 0.05908528 - time (sec): 10.48 - samples/sec: 291.79 - lr: 0.000032 - momentum: 0.000000
2023-10-06 10:32:34,137 epoch 9 - iter 30/152 - loss 0.05493238 - time (sec): 21.05 - samples/sec: 294.63 - lr: 0.000031 - momentum: 0.000000
2023-10-06 10:32:44,467 epoch 9 - iter 45/152 - loss 0.05753799 - time (sec): 31.38 - samples/sec: 294.93 - lr: 0.000029 - momentum: 0.000000
2023-10-06 10:32:54,957 epoch 9 - iter 60/152 - loss 0.05865308 - time (sec): 41.87 - samples/sec: 296.56 - lr: 0.000027 - momentum: 0.000000
2023-10-06 10:33:04,747 epoch 9 - iter 75/152 - loss 0.05488586 - time (sec): 51.66 - samples/sec: 293.26 - lr: 0.000026 - momentum: 0.000000
2023-10-06 10:33:15,265 epoch 9 - iter 90/152 - loss 0.05741659 - time (sec): 62.18 - samples/sec: 294.64 - lr: 0.000024 - momentum: 0.000000
2023-10-06 10:33:25,437 epoch 9 - iter 105/152 - loss 0.05511597 - time (sec): 72.35 - samples/sec: 294.64 - lr: 0.000022 - momentum: 0.000000
2023-10-06 10:33:36,015 epoch 9 - iter 120/152 - loss 0.05551882 - time (sec): 82.92 - samples/sec: 294.74 - lr: 0.000021 - momentum: 0.000000
2023-10-06 10:33:46,490 epoch 9 - iter 135/152 - loss 0.05863271 - time (sec): 93.40 - samples/sec: 294.03 - lr: 0.000019 - momentum: 0.000000
2023-10-06 10:33:57,321 epoch 9 - iter 150/152 - loss 0.05625687 - time (sec): 104.23 - samples/sec: 293.37 - lr: 0.000018 - momentum: 0.000000
2023-10-06 10:33:58,684 ----------------------------------------------------------------------------------------------------
2023-10-06 10:33:58,684 EPOCH 9 done: loss 0.0559 - lr: 0.000018
2023-10-06 10:34:05,973 DEV : loss 0.1504821926355362 - f1-score (micro avg) 0.8126
2023-10-06 10:34:05,982 ----------------------------------------------------------------------------------------------------
2023-10-06 10:34:16,053 epoch 10 - iter 15/152 - loss 0.08761670 - time (sec): 10.07 - samples/sec: 289.29 - lr: 0.000016 - momentum: 0.000000
2023-10-06 10:34:25,866 epoch 10 - iter 30/152 - loss 0.06113544 - time (sec): 19.88 - samples/sec: 282.11 - lr: 0.000014 - momentum: 0.000000
2023-10-06 10:34:37,193 epoch 10 - iter 45/152 - loss 0.05328596 - time (sec): 31.21 - samples/sec: 286.00 - lr: 0.000012 - momentum: 0.000000
2023-10-06 10:34:47,971 epoch 10 - iter 60/152 - loss 0.05387219 - time (sec): 41.99 - samples/sec: 289.64 - lr: 0.000011 - momentum: 0.000000
2023-10-06 10:34:58,676 epoch 10 - iter 75/152 - loss 0.05043342 - time (sec): 52.69 - samples/sec: 289.51 - lr: 0.000009 - momentum: 0.000000
2023-10-06 10:35:09,526 epoch 10 - iter 90/152 - loss 0.05149537 - time (sec): 63.54 - samples/sec: 289.55 - lr: 0.000008 - momentum: 0.000000
2023-10-06 10:35:21,028 epoch 10 - iter 105/152 - loss 0.05263310 - time (sec): 75.04 - samples/sec: 288.52 - lr: 0.000006 - momentum: 0.000000
2023-10-06 10:35:31,797 epoch 10 - iter 120/152 - loss 0.05146062 - time (sec): 85.81 - samples/sec: 286.93 - lr: 0.000004 - momentum: 0.000000
2023-10-06 10:35:42,510 epoch 10 - iter 135/152 - loss 0.05312749 - time (sec): 96.53 - samples/sec: 286.80 - lr: 0.000003 - momentum: 0.000000
2023-10-06 10:35:52,841 epoch 10 - iter 150/152 - loss 0.05177611 - time (sec): 106.86 - samples/sec: 286.76 - lr: 0.000001 - momentum: 0.000000
2023-10-06 10:35:54,036 ----------------------------------------------------------------------------------------------------
2023-10-06 10:35:54,037 EPOCH 10 done: loss 0.0515 - lr: 0.000001
2023-10-06 10:36:01,323 DEV : loss 0.15003764629364014 - f1-score (micro avg) 0.8172
2023-10-06 10:36:01,332 saving best model
2023-10-06 10:36:03,119 ----------------------------------------------------------------------------------------------------
2023-10-06 10:36:03,137 Loading model from best epoch ...
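The tagger labels tokens with a BIOES scheme (Single, Begin, Inside, End, plus O) over six entity types, and the F-scores in the final evaluation are computed over the entity spans recovered from those tags. A small illustrative decoder for turning a BIOES tag sequence back into labeled spans (a sketch of the idea only; Flair's SequenceTagger has its own span-extraction logic):

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (label, start, end_exclusive) spans.

    Illustrative sketch -- not Flair's actual implementation.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":                          # single-token entity
            spans.append((label, i, i + 1))
            start = None
        elif prefix == "B":                        # entity begins here
            start = i
        elif prefix == "E" and start is not None:  # entity ends here
            spans.append((label, start, i + 1))
            start = None
        # "I" continues an open span; malformed sequences are ignored
    return spans

print(bioes_to_spans(["O", "S-pers", "B-work", "I-work", "E-work"]))
# [('pers', 1, 2), ('work', 2, 5)]
```

A predicted span counts as correct only if both its boundaries and its label match the gold span, which is why span-level F1 starts at 0.0 in early epochs even though token-level loss is already falling.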
2023-10-06 10:36:06,144 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-date, B-date, E-date, I-date, S-object, B-object, E-object, I-object
2023-10-06 10:36:12,975 Results:
- F-score (micro) 0.7984
- F-score (macro) 0.4848
- Accuracy 0.672

By class:
              precision    recall  f1-score   support

       scope     0.7658    0.8013    0.7832       151
        pers     0.7521    0.9479    0.8387        96
        work     0.7411    0.8737    0.8019        95
         loc     0.0000    0.0000    0.0000         3
        date     0.0000    0.0000    0.0000         3

   micro avg     0.7545    0.8477    0.7984       348
   macro avg     0.4518    0.5246    0.4848       348
weighted avg     0.7421    0.8477    0.7901       348

2023-10-06 10:36:12,975 ----------------------------------------------------------------------------------------------------
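The summary rows of the final evaluation can be reproduced from the per-class scores: micro F1 is the harmonic mean of the micro precision and recall, macro F1 is the unweighted mean of the per-class F1s, and the weighted average weights each class F1 by its support. A quick sanity check in plain Python, with the numbers copied from the table above:

```python
# Per-class (precision, recall, f1, support) from the final test evaluation
by_class = {
    "scope": (0.7658, 0.8013, 0.7832, 151),
    "pers":  (0.7521, 0.9479, 0.8387, 96),
    "work":  (0.7411, 0.8737, 0.8019, 95),
    "loc":   (0.0000, 0.0000, 0.0000, 3),
    "date":  (0.0000, 0.0000, 0.0000, 3),
}

total_support = sum(s for *_, s in by_class.values())  # 348 gold spans

# micro F1: harmonic mean of the micro-averaged precision and recall
micro_p, micro_r = 0.7545, 0.8477
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

# macro F1: unweighted mean of the per-class F1 scores
macro_f1 = sum(f1 for _, _, f1, _ in by_class.values()) / len(by_class)

# weighted F1: per-class F1 weighted by class support
weighted_f1 = sum(f1 * s for _, _, f1, s in by_class.values()) / total_support

print(round(micro_f1, 4), round(macro_f1, 4), round(weighted_f1, 4))
# 0.7984 0.4848 0.7901
```

This also makes the gap between micro (0.7984) and macro (0.4848) F1 easy to read: the two tiny classes (loc and date, 3 spans each) score 0.0 and drag the unweighted macro average down while barely affecting the micro average.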