Optimized Mistral-Hermes Merge (3B Parameters)
This is an optimized merge of pre-trained language models created using mergekit, successfully reducing the original 7B models to approximately 3B parameters while maintaining core capabilities.
Model Size Optimization
The reduction from 7B to 3B parameters was achieved through:
- Layer reduction from 32 to 12 layers
- Conversion to the 16-bit bfloat16 format
- Selective layer-range slicing of the source models
- SLERP merge method optimization (see the interpolation sketch below)
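For intuition, SLERP (spherical linear interpolation) blends two weight tensors along the arc between their directions rather than along a straight line, which tends to preserve the scale of the merged weights. The snippet below is a minimal, self-contained sketch of that interpolation, written for illustration only; it is not mergekit's actual implementation:

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t = 0.0 returns v0, t = 1.0 returns v1; intermediate values follow
    the arc between the two (normalized) directions.
    """
    v0_flat, v1_flat = v0.flatten(), v1.flatten()
    v0_unit = v0_flat / (v0_flat.norm() + eps)
    v1_unit = v1_flat / (v1_flat.norm() + eps)
    dot = torch.clamp(torch.dot(v0_unit, v1_unit), -1.0, 1.0)
    theta = torch.acos(dot)
    if theta.abs() < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1 - t) * v0 + t * v1
    sin_theta = torch.sin(theta)
    merged = (torch.sin((1 - t) * theta) / sin_theta) * v0_flat \
           + (torch.sin(t * theta) / sin_theta) * v1_flat
    return merged.reshape(v0.shape)
```

In the actual merge, this kind of interpolation is applied tensor by tensor across the first 12 layers of both source models.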
About Me
I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and resource usage.
Connect with me on LinkedIn
Merge Details
Merge Method & Optimization
This model was merged using the SLERP merge method with specific optimizations:
- Reduced to 12 layers for lower memory use
- Stored in bfloat16 format (see the loading sketch after this list)
- Tuned interpolation weights for the attention and MLP tensors
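As a usage sketch, the merged checkpoint can be loaded directly in bfloat16 with the Transformers library. The repository id below is a placeholder, not the published name of this merge; substitute the actual model path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/mistral-hermes-3b-merge"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep weights in the merged bfloat16 format
)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "Explain model merging in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```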
Models Merged
The following models were included in the merge:
- OpenPipe/mistral-ft-optimized-1218 (base model)
- mlabonne/NeuralHermes-2.5-Mistral-7B
Configuration
The following YAML configuration was used to produce this model:
```yaml
base_model: OpenPipe/mistral-ft-optimized-1218
dtype: bfloat16
merge_method: slerp
parameters:
  t:
    - filter: self_attn
      value: [0.0, 0.5]
    - filter: mlp
      value: [1.0, 0.5]
    - value: 0.5
slices:
  - sources:
      - layer_range: [0, 12]
        model: OpenPipe/mistral-ft-optimized-1218
      - layer_range: [0, 12]
        model: mlabonne/NeuralHermes-2.5-Mistral-7B
```
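In this configuration, `t` controls how much each tensor leans toward the base model (`t = 0`) or toward NeuralHermes (`t = 1`), and a two-element list is treated by mergekit as a gradient spread across the layer range. The sketch below illustrates that idea under a simple linear-interpolation assumption; it is not mergekit's own code:

```python
import numpy as np

def expand_gradient(values, num_layers):
    """Spread a short gradient list (e.g. [0.0, 0.5]) over num_layers,
    giving one interpolation weight t per layer."""
    anchors = np.linspace(0, num_layers - 1, num=len(values))
    return np.interp(np.arange(num_layers), anchors, values)

# Per-layer t values for the 12-layer slice defined above.
print(expand_gradient([0.0, 0.5], 12))  # self_attn: starts base-heavy, ends balanced
print(expand_gradient([1.0, 0.5], 12))  # mlp: starts Hermes-heavy, ends balanced
print(expand_gradient([0.5], 12))       # all other tensors: constant 0.5
```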