Fine-tuning phi-3.5 or phi-2?
#33
by
lpalbou
- opened
Hi everyone, phi-3.5 is an instruct model, so it has already gone through a number of post-training steps, including fine-tuning.
For a moderate fine-tuning run on 5,000-50,000 { prompt, response } pairs, would phi-3.5 be suitable, or is it better to use phi-2?
Also, could you confirm that the best layers to fine-tune are, for phi-3.5:
"self_attn.qkv_proj", "self_attn.o_proj", "mlp.gate_up_proj", "mlp.down_proj"
and for phi-2:
"q_proj", "k_proj", "v_proj", "dense" ?