Fine-tuning Phi-3.5 or Phi-2?

#33
by lpalbou - opened

Hi everyone, Phi-3.5 is an instruct model, so it has already gone through a number of post-training steps, including fine-tuning.

For a moderate fine-tuning run on 5,000-50,000 { prompt, response } pairs, would Phi-3.5 be suitable, or is it better to use Phi-2?

Also, could you confirm that the best layers to fine-tune for Phi-3.5 are:
"self_attn.qkv_proj", "self_attn.o_proj", "mlp.gate_up_proj", "mlp.down_proj"

and for Phi-2:
"q_proj", "k_proj", "v_proj", "dense"?
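For what it's worth, here is a minimal sketch of how these target lists could be organized for a LoRA setup. The module names are taken from the lists above; the mapping and the `lora_targets` helper are my own illustration (the naming difference reflects that Phi-3.5 fuses q/k/v into a single `qkv_proj` while Phi-2 keeps them separate), not a confirmed recommendation:

```python
# Hypothetical helper: map a model family to the LoRA target modules
# listed in the question above. Phi-3.5 fuses q/k/v into one qkv_proj;
# Phi-2 uses separate q/k/v projections and "dense" for attention output.
LORA_TARGETS = {
    "phi-3.5": ["self_attn.qkv_proj", "self_attn.o_proj",
                "mlp.gate_up_proj", "mlp.down_proj"],
    "phi-2": ["q_proj", "k_proj", "v_proj", "dense"],
}

def lora_targets(model_family: str) -> list[str]:
    """Return the projection layers to adapt for the given model family."""
    try:
        return LORA_TARGETS[model_family]
    except KeyError:
        raise ValueError(f"unknown model family: {model_family!r}")
```

In a PEFT-based setup, a list like this would typically be passed as the `target_modules` argument of `LoraConfig`.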
