Unusual Training metrics
#22
by shadowlilac · opened
When finetuning the model with the TRL SFTTrainer, I get a loss in the range of 60-65 and an extremely low starting mean token accuracy of around 20%. Is this expected, or am I doing something wrong? I am using a training script that otherwise works for finetuning any other LLM.
I also noticed the router auxiliary loss coefficient is 0.9. Is this value expected? It seems a bit high.
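For context on why that coefficient matters: in MoE models the reported training loss is typically the language-modeling cross-entropy plus the router load-balancing loss scaled by this coefficient, so a large coefficient can dominate the number the trainer logs. A minimal sketch of that decomposition, with purely illustrative numbers (none of these are measured values from my run):

```python
# Illustrative decomposition of a MoE training loss.
# All numbers here are hypothetical, chosen only to show the scaling effect.
lm_loss = 2.5     # plausible cross-entropy early in a finetune
aux_loss = 70.0   # hypothetical unscaled router load-balancing loss
coef = 0.9        # the coefficient I see in the config

total = lm_loss + coef * aux_loss
print(total)  # 65.5 -- a large coef puts the logged loss in this range
```

If that is what is happening, the cross-entropy itself might be fine and the headline loss would just be dominated by the scaled auxiliary term.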