Unusual Training metrics

#22
by shadowlilac - opened

When finetuning the model with the TRL SFTTrainer, I get a loss in the range of 60-65 and an extremely low starting mean token accuracy of below 20%. Is this expected, or am I doing something wrong? I am using a training script that otherwise works for finetuning any other LLM.

I noticed the router auxiliary loss coefficient is 0.9. Is this value expected? It seems a bit high.
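For context, in MoE models the reported training loss is typically the language-modeling loss plus the router auxiliary (load-balancing) loss scaled by this coefficient, so a large coefficient can inflate the logged loss. A minimal sketch with hypothetical numbers (the `lm_loss` and `aux_loss` values below are illustrative assumptions, not measurements):

```python
# Hypothetical decomposition of the reported loss for an MoE model:
# total_loss = lm_loss + coef * aux_loss
lm_loss = 2.5    # assumed: a plausible cross-entropy loss early in finetuning
aux_loss = 70.0  # assumed: an unnormalized router load-balancing loss
coef = 0.9       # the coefficient observed in this model's config

total_loss = lm_loss + coef * aux_loss
print(total_loss)  # 65.5, in the 60-65 range described above
```

If the logged value includes the scaled auxiliary term, a loss of 60-65 would not necessarily mean the language-modeling objective itself is broken.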
