Saving at step 2840k, loss 1.108, acc 0.747
INFO:__main__: Optimizer = adafactor
INFO:__main__: Learning rate (peak) = 0.005
INFO:__main__: Num examples = 31519126
INFO:__main__: Num tokenized group examples 36347268
INFO:__main__: Num Epochs = 10
INFO:__main__: Instantaneous batch size per device = 16
INFO:__main__: Total train batch size (w. parallel & grad accum) = 128
INFO:__main__: Steps per epoch = 283963
INFO:__main__: Total optimization steps = 2839630
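
The step counts above are consistent with one another. The following is a minimal sketch of that arithmetic, not the training script's own code. It assumes that steps per epoch come from floor-dividing the tokenized group examples by the total train batch size, and that the remainder is dropped. The variable names are hypothetical. The factor of 8 is only inferred from 128 / 16 and could be the device count, the gradient accumulation steps, or a product of both.

```python
# Values copied from the log lines above.
num_grouped_examples = 36_347_268   # "Num tokenized group examples"
per_device_batch_size = 16          # "Instantaneous batch size per device"
total_batch_size = 128              # "Total train batch size (w. parallel & grad accum)"
num_epochs = 10                     # "Num Epochs"

# Assumption: the final partial batch of each epoch is dropped (floor division).
steps_per_epoch = num_grouped_examples // total_batch_size
leftover_examples = num_grouped_examples % total_batch_size

# Total optimization steps across all epochs.
total_steps = steps_per_epoch * num_epochs

# Inferred, not logged: devices x gradient accumulation steps.
parallel_factor = total_batch_size // per_device_batch_size

print(steps_per_epoch)    # 283963
print(leftover_examples)  # 4
print(total_steps)        # 2839630
print(parallel_factor)    # 8
```

The computed total of 2839630 steps matches the logged value and rounds to the 2840k quoted in the checkpoint message.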