TinyLlama-Hybrid-Merge
This is a merge of TinyLlama models created with MergeKit, combining the foundational capabilities of the base TinyLlama checkpoint with its chat-tuned variant through SLERP with layer-wise variable interpolation values.
About Me
I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specialized in Generative AI. Passionate about artificial intelligence and language models optimization, I focus on creating efficient model merges that balance performance and capabilities.
Merge Details
Merge Method
This model uses SLERP (Spherical Linear Interpolation) with per-layer interpolation factors chosen to balance the two parent models:
- Attention Layers: interpolation anchors [0, 0.5, 0.3, 0.7, 1] spread across the layer stack (0 = base model, 1 = chat model), weighting the later attention layers toward the chat model's instruction-following capabilities (see the sketch after this list)
- MLP Layers: interpolation anchors [1, 0.5, 0.7, 0.3, 0], weighting the later MLP layers toward the base model's reasoning capabilities
- Other Parameters: a constant 0.5 interpolation value for an equal blend of both models
- Format: bfloat16 precision for efficient memory usage
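The sketch below illustrates, under simplifying assumptions, how SLERP blends two flattened weight tensors and how an anchor list such as [0, 0.5, 0.3, 0.7, 1] can be stretched into one interpolation factor per layer. The `slerp` helper, the `anchors` variable, and the even spacing of anchors across the 22 layers are illustrative choices, not mergekit's exact implementation.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    if np.abs(dot) > 0.9995:
        # Nearly colinear weights: fall back to plain linear interpolation
        return (1 - t) * v0 + t * v1
    theta = np.arccos(dot)
    return (np.sin((1 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)

# Stretch the anchor list over the 22 transformer layers so each layer
# gets its own blend factor t (0 = base model weights, 1 = chat model weights).
anchors = [0, 0.5, 0.3, 0.7, 1]  # self_attn values from the configuration below
num_layers = 22
layer_t = np.interp(
    np.linspace(0, 1, num_layers),      # relative depth of each layer
    np.linspace(0, 1, len(anchors)),    # relative position of each anchor
    anchors,
)
print(np.round(layer_t, 2))
```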
Models Merged
- TinyLlama/TinyLlama-1.1B-step-50K-105b - The base TinyLlama model offering foundational language capabilities
- TinyLlama/TinyLlama-1.1B-Chat-v1.0 - A fine-tuned version optimized for chat and instruction following
Configuration
```yaml
slices:
  - sources:
      - model: TinyLlama/TinyLlama-1.1B-step-50K-105b
        layer_range: [0, 22]
      - model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
        layer_range: [0, 22]
merge_method: slerp
base_model: TinyLlama/TinyLlama-1.1B-step-50K-105b
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
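A merge like this can be reproduced either with the `mergekit-yaml` command-line tool or through mergekit's Python entry points. The snippet below is a minimal sketch assuming the `MergeConfiguration`, `MergeOptions`, and `run_merge` API shown in the mergekit README; the file name `hybrid-merge.yml` and the output path are hypothetical, and exact signatures may differ between mergekit versions.

```python
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Hypothetical filename: the YAML configuration above saved to disk.
with open("hybrid-merge.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./TinyLlama-Hybrid-Merge",  # hypothetical output directory
    options=MergeOptions(
        cuda=torch.cuda.is_available(),   # use a GPU if one is available
        copy_tokenizer=True,              # copy the base model's tokenizer into the output
    ),
)
```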
Model Capabilities
This merge combines:
- TinyLlama base model's foundational knowledge and reasoning
- TinyLlama Chat's improved instruction following and conversational abilities
- Optimized parameter distribution for balanced performance
- Compact 1.1B parameter size suitable for resource-constrained environments
The resulting model is intended to perform better on tasks that combine reasoning and conversational ability, such as the following (a usage sketch follows this list):
- Basic question answering with improved coherence
- Simple instruction following with better response quality
- Lightweight deployment scenarios requiring balanced capabilities
- Educational and demonstration purposes for model merging techniques
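For completeness, here is a minimal inference sketch using the Hugging Face transformers library. The repository id below is a placeholder for wherever the merged weights are hosted, and the prompt and generation settings are only examples.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/TinyLlama-Hybrid-Merge"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain in one sentence what model merging is."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```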
Limitations
- Inherits the fundamental limitations of small 1.1B parameter models
- Limited context window and knowledge compared to larger models
- May struggle with complex reasoning, specialized domains, or nuanced tasks
- No additional training beyond the parameter merging process
- Performance ceiling constrained by the small model size
License
This model is released under the Apache 2.0 license, consistent with the underlying models' licenses.