Optimized Mistral-Hermes Merge (3B Parameters)

This is an optimized merge of pre-trained language models created using mergekit, reducing the original 7B source models to roughly 2.88B parameters while aiming to retain their core capabilities.

Model Size Optimization

The reduction from 7B to roughly 2.88B parameters was achieved through:

  • Layer reduction from 32 to 12 transformer layers
  • Conversion to bfloat16 format (16-bit precision)
  • Selective layer range implementation (the first 12 layers, indices 0–11, of each source model)
  • SLERP merge method optimization

A rough parameter-count estimate follows this list.
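
The calculation below is a back-of-the-envelope sketch, assuming the standard Mistral-7B dimensions (hidden size 4096, MLP size 14336, 32 query heads with 8 KV heads, 32k vocabulary); the exact total depends on the checkpoint, but keeping 12 of 32 layers lands near the ~2.88B figure.

# Rough parameter count for a 12-layer Mistral-style model (estimate only).
hidden, intermediate, vocab = 4096, 14336, 32000
head_dim, kv_heads = 128, 8

attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)  # q/o + k/v projections
mlp = 3 * hidden * intermediate                                  # gate/up/down projections
per_layer = attn + mlp                                           # ~218M, ignoring norms

embeddings = 2 * vocab * hidden                                  # input embeddings + lm_head
total_12_layers = 12 * per_layer + embeddings

print(f"~{total_12_layers / 1e9:.2f}B parameters")               # ~2.88B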

About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and resource usage.

🔗 Connect with me on LinkedIn

Merge Details

Merge Method & Optimization

This model was merged using the SLERP (spherical linear interpolation) merge method with specific optimizations:

  • Reduced to 12 layers for better memory efficiency
  • Weights stored in bfloat16 format
  • Graded interpolation factors: self-attention tensors blend from t = 0.0 to 0.5 across the layer range, MLP tensors from t = 1.0 to 0.5, and all remaining tensors use t = 0.5

A short illustrative sketch of the SLERP interpolation itself follows.
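
The snippet below is an illustrative sketch of SLERP between two weight tensors, not mergekit's actual implementation; it shows how a blend factor t interpolates along the arc between the two flattened weight vectors, falling back to linear interpolation when they are nearly parallel.

import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (illustration only)."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_dir = a_flat / (a_flat.norm() + eps)
    b_dir = b_flat / (b_flat.norm() + eps)

    # Angle between the two direction vectors.
    dot = torch.clamp(torch.dot(a_dir, b_dir), -1.0, 1.0)
    omega = torch.acos(dot)

    if omega.abs() < 1e-4:
        # Nearly parallel vectors: plain linear interpolation is numerically safer.
        mixed = (1 - t) * a_flat + t * b_flat
    else:
        sin_omega = torch.sin(omega)
        mixed = (torch.sin((1 - t) * omega) / sin_omega) * a_flat \
              + (torch.sin(t * omega) / sin_omega) * b_flat

    return mixed.reshape(a.shape).to(a.dtype)

# t=0.0 keeps the first model's tensor, t=1.0 keeps the second, t=0.5 blends evenly.
merged = slerp(0.5, torch.randn(4096, 4096), torch.randn(4096, 4096))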

Models Merged

The following models were included in the merge:

  • OpenPipe/mistral-ft-optimized-1218 (base model)
  • mlabonne/NeuralHermes-2.5-Mistral-7B

Configuration

The following YAML configuration was used to produce this model:

base_model: OpenPipe/mistral-ft-optimized-1218
dtype: bfloat16
merge_method: slerp
parameters:
  t:
  - filter: self_attn
    value: [0.0, 0.5]
  - filter: mlp
    value: [1.0, 0.5]
  - value: 0.5
slices:
- sources:
  - layer_range: [0, 12]
    model: OpenPipe/mistral-ft-optimized-1218
  - layer_range: [0, 12]
    model: mlabonne/NeuralHermes-2.5-Mistral-7B
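
Once the merged weights are published on the Hub (the repository id Davidsv/SUONG-1 is assumed here), they can be loaded with the standard transformers pattern. This is a minimal usage sketch, not part of the original configuration:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Davidsv/SUONG-1"  # assumed Hub repository id for this merge

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge is stored in bfloat16
    device_map="auto",
)

inputs = tokenizer("Explain SLERP model merging in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))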