Hello! Could the authors provide some more information about the training process for this model? I know the number of H100 GPUs used is already listed, but metrics such as total GPU hours, total cost, and other compute measures would be helpful. Thank you!