TinyLlama-Hybrid-Merge
This is a merge of TinyLlama models created with MergeKit, combining the foundational capabilities of the base TinyLlama checkpoint with its chat-tuned variant through SLERP with layer-wise variable interpolation values.
About Me
I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specialized in Generative AI. Passionate about artificial intelligence and language models optimization, I focus on creating efficient model merges that balance performance and capabilities.
Merge Details
Merge Method
This model uses SLERP (Spherical Linear Interpolation) with per-layer interpolation factors chosen to balance the two parent models:
- Attention Layers: interpolation anchors [0, 0.5, 0.3, 0.7, 1] spread across the layer stack (0 = base model, 1 = chat model), weighting the later attention layers toward the chat model's instruction-following capabilities (see the sketch after this list)
- MLP Layers: interpolation anchors [1, 0.5, 0.7, 0.3, 0], weighting the later MLP layers toward the base model's reasoning capabilities
- Other Parameters: a constant 0.5 interpolation value for an equal blend of both models
- Format: bfloat16 precision for efficient memory usage
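The sketch below illustrates, under simplifying assumptions, how SLERP blends two flattened weight tensors and how an anchor list such as [0, 0.5, 0.3, 0.7, 1] can be stretched into one interpolation factor per layer. The `slerp` helper, the `anchors` variable, and the even spacing of anchors across the 22 layers are illustrative choices, not mergekit's exact implementation.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    if np.abs(dot) > 0.9995:
        # Nearly colinear weights: fall back to plain linear interpolation
        return (1 - t) * v0 + t * v1
    theta = np.arccos(dot)
    return (np.sin((1 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)

# Stretch the anchor list over the 22 transformer layers so each layer
# gets its own blend factor t (0 = base model weights, 1 = chat model weights).
anchors = [0, 0.5, 0.3, 0.7, 1]  # self_attn values from the configuration below
num_layers = 22
layer_t = np.interp(
    np.linspace(0, 1, num_layers),      # relative depth of each layer
    np.linspace(0, 1, len(anchors)),    # relative position of each anchor
    anchors,
)
print(np.round(layer_t, 2))
```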
Models Merged
- TinyLlama/TinyLlama-1.1B-step-50K-105b - The base TinyLlama model offering foundational language capabilities
- TinyLlama/TinyLlama-1.1B-Chat-v1.0 - A fine-tuned version optimized for chat and instruction following
Configuration
```yaml
slices:
  - sources:
      - model: TinyLlama/TinyLlama-1.1B-step-50K-105b
        layer_range: [0, 22]
      - model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
        layer_range: [0, 22]
merge_method: slerp
base_model: TinyLlama/TinyLlama-1.1B-step-50K-105b
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
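A merge like this can be reproduced either with the `mergekit-yaml` command-line tool or through mergekit's Python entry points. The snippet below is a minimal sketch assuming the `MergeConfiguration`, `MergeOptions`, and `run_merge` API shown in the mergekit README; the file name `hybrid-merge.yml` and the output path are hypothetical, and exact signatures may differ between mergekit versions.

```python
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Hypothetical filename: the YAML configuration above saved to disk.
with open("hybrid-merge.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./TinyLlama-Hybrid-Merge",  # hypothetical output directory
    options=MergeOptions(
        cuda=torch.cuda.is_available(),   # use a GPU if one is available
        copy_tokenizer=True,              # copy the base model's tokenizer into the output
    ),
)
```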
Model Capabilities
This merge combines:
- TinyLlama base model's foundational knowledge and reasoning
- TinyLlama Chat's improved instruction following and conversational abilities
- Optimized parameter distribution for balanced performance
- Compact 1.1B parameter size suitable for resource-constrained environments
The resulting model is intended to perform better on tasks that combine reasoning and conversational ability, such as the following (a usage sketch follows this list):
- Basic question answering with improved coherence
- Simple instruction following with better response quality
- Lightweight deployment scenarios requiring balanced capabilities
- Educational and demonstration purposes for model merging techniques
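For completeness, here is a minimal inference sketch using the Hugging Face transformers library. The repository id below is a placeholder for wherever the merged weights are hosted, and the prompt and generation settings are only examples.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/TinyLlama-Hybrid-Merge"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain in one sentence what model merging is."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```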
Limitations
- Inherits the fundamental limitations of small 1.1B parameter models
- Limited context window and knowledge compared to larger models
- May struggle with complex reasoning, specialized domains, or nuanced tasks
- No additional training beyond the parameter merging process
- Performance ceiling constrained by the small model size
License
This model is released under the Apache 2.0 license, consistent with the underlying models' licenses.