INFO:__main__: Optimizer = adafactor
INFO:__main__: Learning rate (peak) = 0.005
INFO:__main__: Num examples = 31519126
INFO:__main__: Num tokenized group examples 36347268
INFO:__main__: Num Epochs = 10
INFO:__main__: Instantaneous batch size per device = 16
INFO:__main__: Total train batch size (w. parallel & grad accum) = 128
INFO:__main__: Steps per epoch = 283963
INFO:__main__: Total optimization steps = 2839630 |
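The step counts in this log follow from plain integer arithmetic. Steps per epoch come from the number of grouped (tokenized and concatenated) examples, not the raw example count: 31519126 // 128 would give only 246243. The sketch below reproduces the logged numbers. It assumes 8 devices and no gradient accumulation, since 128 / 16 = 8 is consistent with the log but the device count is never stated. It also assumes that an incomplete final batch is dropped. The variable names are illustrative and are not taken from the training script.

```python
# Sketch of how the logged step counts relate to each other.
# Values come from the log above, except num_devices, which is an
# ASSUMPTION (128 / 16 = 8, with no gradient accumulation).
per_device_batch_size = 16
num_devices = 8  # assumption, not stated in the log
num_epochs = 10
num_grouped_examples = 36_347_268  # "Num tokenized group examples"

total_batch_size = per_device_batch_size * num_devices
# Integer division drops the final partial batch (4 leftover examples here).
steps_per_epoch = num_grouped_examples // total_batch_size
total_steps = steps_per_epoch * num_epochs

print(total_batch_size, steps_per_epoch, total_steps)
```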