Optimized Mistral-Hermes Merge (3B Parameters)

This is an optimized merge of pre-trained language models created using mergekit, reducing the original 7B source models to roughly 2.88B parameters while aiming to retain their core capabilities.

Model Size Optimization

The reduction from 7B to roughly 2.88B parameters was achieved through:

  • Layer reduction from 32 to 12 transformer layers
  • Conversion to bfloat16 format (16-bit precision)
  • Selective layer range implementation (the first 12 layers, indices 0–11, of each source model)
  • SLERP merge method optimization

A rough parameter-count estimate follows this list.
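
The calculation below is a back-of-the-envelope sketch, assuming the standard Mistral-7B dimensions (hidden size 4096, MLP size 14336, 32 query heads with 8 KV heads, 32k vocabulary); the exact total depends on the checkpoint, but keeping 12 of 32 layers lands near the ~2.88B figure.

# Rough parameter count for a 12-layer Mistral-style model (estimate only).
hidden, intermediate, vocab = 4096, 14336, 32000
head_dim, kv_heads = 128, 8

attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)  # q/o + k/v projections
mlp = 3 * hidden * intermediate                                  # gate/up/down projections
per_layer = attn + mlp                                           # ~218M, ignoring norms

embeddings = 2 * vocab * hidden                                  # input embeddings + lm_head
total_12_layers = 12 * per_layer + embeddings

print(f"~{total_12_layers / 1e9:.2f}B parameters")               # ~2.88B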

About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and resource usage.

🔗 Connect with me on LinkedIn

Merge Details

Merge Method & Optimization

This model was merged using the SLERP (spherical linear interpolation) merge method with specific optimizations:

  • Reduced to 12 layers for better memory efficiency
  • Weights stored in bfloat16 format
  • Graded interpolation factors: self-attention tensors blend from t = 0.0 to 0.5 across the layer range, MLP tensors from t = 1.0 to 0.5, and all remaining tensors use t = 0.5

A short illustrative sketch of the SLERP interpolation itself follows.
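
The snippet below is an illustrative sketch of SLERP between two weight tensors, not mergekit's actual implementation; it shows how a blend factor t interpolates along the arc between the two flattened weight vectors, falling back to linear interpolation when they are nearly parallel.

import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (illustration only)."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_dir = a_flat / (a_flat.norm() + eps)
    b_dir = b_flat / (b_flat.norm() + eps)

    # Angle between the two direction vectors.
    dot = torch.clamp(torch.dot(a_dir, b_dir), -1.0, 1.0)
    omega = torch.acos(dot)

    if omega.abs() < 1e-4:
        # Nearly parallel vectors: plain linear interpolation is numerically safer.
        mixed = (1 - t) * a_flat + t * b_flat
    else:
        sin_omega = torch.sin(omega)
        mixed = (torch.sin((1 - t) * omega) / sin_omega) * a_flat \
              + (torch.sin(t * omega) / sin_omega) * b_flat

    return mixed.reshape(a.shape).to(a.dtype)

# t=0.0 keeps the first model's tensor, t=1.0 keeps the second, t=0.5 blends evenly.
merged = slerp(0.5, torch.randn(4096, 4096), torch.randn(4096, 4096))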

Models Merged

The following models were included in the merge:

  • OpenPipe/mistral-ft-optimized-1218 (base model)
  • mlabonne/NeuralHermes-2.5-Mistral-7B

Configuration

The following YAML configuration was used to produce this model:

base_model: OpenPipe/mistral-ft-optimized-1218
dtype: bfloat16
merge_method: slerp
parameters:
  t:
  - filter: self_attn
    value: [0.0, 0.5]
  - filter: mlp
    value: [1.0, 0.5]
  - value: 0.5
slices:
- sources:
  - layer_range: [0, 12]
    model: OpenPipe/mistral-ft-optimized-1218
  - layer_range: [0, 12]
    model: mlabonne/NeuralHermes-2.5-Mistral-7B
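
Once the merged weights are published on the Hub (the repository id Davidsv/SUONG-1 is assumed here), they can be loaded with the standard transformers pattern. This is a minimal usage sketch, not part of the original configuration:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Davidsv/SUONG-1"  # assumed Hub repository id for this merge

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge is stored in bfloat16
    device_map="auto",
)

inputs = tokenizer("Explain SLERP model merging in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))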