Unusual Training metrics

#22
by shadowlilac - opened

When finetuning the model with the TRL SFTTrainer, I get a loss in the range of 60-65 and an extremely low starting mean token accuracy of below 20%. Is this expected, or am I doing something wrong? I am using a training script that otherwise works for finetuning any other LLM.

I noticed the router auxiliary loss coefficient is 0.9. Is this value expected? It seems a bit high.
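For context, in MoE models the reported training loss is typically the language-modeling loss plus the router auxiliary (load-balancing) loss scaled by this coefficient, so a large coefficient can inflate the logged loss. A minimal sketch with hypothetical numbers (the `lm_loss` and `aux_loss` values below are illustrative assumptions, not measurements):

```python
# Hypothetical decomposition of the reported loss for an MoE model:
# total_loss = lm_loss + coef * aux_loss
lm_loss = 2.5    # assumed: a plausible cross-entropy loss early in finetuning
aux_loss = 70.0  # assumed: an unnormalized router load-balancing loss
coef = 0.9       # the coefficient observed in this model's config

total_loss = lm_loss + coef * aux_loss
print(total_loss)  # 65.5, in the 60-65 range described above
```

If the logged value includes the scaled auxiliary term, a loss of 60-65 would not necessarily mean the language-modeling objective itself is broken.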
