---
base_model:
- HuggingFaceTB/cosmo-1b
- Lambent/cosmo-1b-galore-pythontest
- Lambent/cosmo-1b-qlora-pythontest
- Lambent/cosmo-1b-lisa-pythontest
library_name: transformers
tags:
- mergekit
- merge
---
# pythontestmerge
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

Testing training data validation:

* Model Stock 3/4 Loss: 0.451

My hypothesis that including the pretrained base was dragging down the stock merge's performance on the training data seems inaccurate.

Cosmopedia data validation:

* Model Stock 3/4 Loss: 1.021

On the other hand, the fine-tuning may indeed have pulled the model towards forgetting its pretraining data. Even so, this is a better loss with respect to catastrophic forgetting than the prior Model Stock merge or any of the individual training methods achieved.

My estimate is that using the base model as an anchor point is a strong remedy for catastrophic forgetting when merging multiple different training methods applied to the same dataset.

I'm less sure I can say anything about how it affects adaptation to the new dataset. It's possible that, when using this method, you'd want louder/stronger adaptation to start with than you otherwise would.
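
For context on how numbers like these are typically obtained, here is a minimal sketch of a held-out loss evaluation with transformers. The repo id, text list, and sequence length are illustrative assumptions, not the exact evaluation script behind the figures above.

```python
# Minimal sketch of a held-out loss evaluation (assumed repo id and data).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lambent/cosmo-1b-pythontestmerge"  # assumption: wherever this merge is published
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def mean_loss(texts, max_length=1024):
    """Average causal-LM loss over a list of held-out text samples."""
    losses = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    return sum(losses) / len(losses)

# e.g. mean_loss(python_validation_texts) or mean_loss(cosmopedia_samples)
```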
## Merge Details
### Merge Method
This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [HuggingFaceTB/cosmo-1b](https://huggingface.co/HuggingFaceTB/cosmo-1b) as a base.
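
As a rough per-tensor intuition for what Model Stock does (a simplified sketch of the paper's interpolation formula as I read it, not mergekit's actual implementation): the fine-tuned weights are averaged, then pulled back toward the base weights by a ratio derived from how well the task vectors agree with each other.

```python
# Simplified per-tensor sketch of the Model Stock idea (arXiv:2403.19522).
# Illustration only; mergekit's implementation may differ in details.
import torch

def model_stock_tensor(base: torch.Tensor, finetuned: list[torch.Tensor]) -> torch.Tensor:
    k = len(finetuned)
    deltas = [ft - base for ft in finetuned]  # task vectors relative to the base model
    # average pairwise cosine similarity between task vectors
    cos_vals = []
    for i in range(k):
        for j in range(i + 1, k):
            cos_vals.append(torch.nn.functional.cosine_similarity(
                deltas[i].flatten(), deltas[j].flatten(), dim=0))
    cos_theta = torch.stack(cos_vals).mean().clamp(min=0.0)  # clamp is a guard I added
    t = (k * cos_theta) / ((k - 1) * cos_theta + 1)  # interpolation ratio from the paper
    w_avg = torch.stack(finetuned).mean(dim=0)
    return t * w_avg + (1 - t) * base  # anchored at the base model
```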
### Models Merged
The following models were included in the merge:
* [Lambent/cosmo-1b-galore-pythontest](https://huggingface.co/Lambent/cosmo-1b-galore-pythontest)
* [Lambent/cosmo-1b-qlora-pythontest](https://huggingface.co/Lambent/cosmo-1b-qlora-pythontest)
* [Lambent/cosmo-1b-lisa-pythontest](https://huggingface.co/Lambent/cosmo-1b-lisa-pythontest)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: Lambent/cosmo-1b-lisa-pythontest
  - model: Lambent/cosmo-1b-qlora-pythontest
  - model: Lambent/cosmo-1b-galore-pythontest
base_model: HuggingFaceTB/cosmo-1b
merge_method: model_stock
parameters:
  filter_wise: false
dtype: float16
```
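
A config like this is normally applied with mergekit's `mergekit-yaml` command (for example `mergekit-yaml merge-config.yml ./pythontestmerge`), and the result loads like any other transformers checkpoint. The repo id below is an assumption standing in for wherever the merged model is published.

```python
# Minimal sketch of loading the merged model and generating from it.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Lambent/cosmo-1b-pythontestmerge"  # assumed repo id for this merge
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```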