---
license: apache-2.0
---
Catastrophic forgetting test results:
Initial evaluation loss on a 1k-example subset of the HuggingFaceTB/cosmopedia-100k dataset was 1.102; 100 steps of LISA training reduced it to 1.049.

Comparison to control: cosmo-1b started out at 1.003 loss on (a different subset of) the dataset, increasing to 1.024 at 100 steps.
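For reference, here is a minimal sketch of how an evaluation loss like the ones above could be measured with `transformers` and `datasets`. The base-model name, the `text` column, the sequence length, and the per-example (rather than per-token) averaging are illustrative assumptions, not details taken from this card.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "HuggingFaceTB/cosmo-1b"  # assumed: the base model under test

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# 1k-example evaluation subset of cosmopedia-100k (exact slice assumed)
dataset = load_dataset("HuggingFaceTB/cosmopedia-100k", split="train[:1000]")

total, count = 0.0, 0
with torch.no_grad():
    for example in dataset:
        enc = tokenizer(
            example["text"],  # column name assumed
            return_tensors="pt",
            truncation=True,
            max_length=1024,
        ).to(model.device)
        # labels = input_ids gives standard next-token cross-entropy
        loss = model(**enc, labels=enc["input_ids"]).loss
        total += loss.item()
        count += 1

print(f"Mean eval loss over {count} examples: {total / count:.3f}")
```

Note that this averages per-example mean losses rather than weighting by token count; the two can differ slightly when example lengths vary.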
Axolotl config: the same as the QDoRA version, but without DoRA; a sketch of what that might look like is given below.
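The card does not include the config itself, so the following is only a hedged sketch of a LISA run without DoRA in Axolotl. The LISA option names (`lisa_n_layers`, `lisa_step_interval`, `lisa_layers_attribute`) reflect Axolotl's LISA support as I understand it, and every value shown (layer counts, interval, dataset, hyperparameters) is an illustrative assumption rather than the actual configuration used.

```yaml
# Hypothetical excerpt only -- not the actual config used for this model.
base_model: HuggingFaceTB/cosmo-1b   # assumed base model

# No adapter/DoRA settings here: plain LISA, unlike the QDoRA version.

# LISA: keep most layers frozen and periodically re-sample a small
# set of trainable layers during training.
lisa_n_layers: 4                # layers unfrozen at a time (assumed value)
lisa_step_interval: 20          # re-sample active layers every N steps (assumed)
lisa_layers_attribute: model.layers

datasets:
  - path: HuggingFaceTB/cosmopedia-100k   # illustrative; training data not stated
    type: completion

micro_batch_size: 1             # assumed
max_steps: 100                  # matches the 100-step runs described above
learning_rate: 1e-5             # assumed
```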