Davidsv/TinyLlama-Chat-Merge

@@ -1,23 +1,36 @@
 ---
 base_model:
 - TinyLlama/TinyLlama-1.1B-step-50K-105b
 - TinyLlama/TinyLlama-1.1B-Chat-v1.0
 tags:
 - merge
 - mergekit
-- lazymergekit
-- TinyLlama/TinyLlama-1.1B-step-50K-105b
-- TinyLlama/TinyLlama-1.1B-Chat-v1.0
 ---
-# TinyLlama-Chat-Merge
-TinyLlama-Chat-Merge is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
-* [TinyLlama/TinyLlama-1.1B-step-50K-105b](https://huggingface.co/TinyLlama/TinyLlama-1.1B-step-50K-105b)
-* [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
-## 🧩 Configuration
 ```yaml
 slices:
   - sources:
@@ -37,27 +50,25 @@ parameters:
 dtype: bfloat16
 ```
-## 💻 Usage
-```python
-!pip install -qU transformers accelerate
-from transformers import AutoTokenizer
-import transformers
-import torch
-model = "Davidsv/TinyLlama-Chat-Merge"
-messages = [{"role": "user", "content": "What is a large language model?"}]
-tokenizer = AutoTokenizer.from_pretrained(model)
-prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model,
-    torch_dtype=torch.float16,
-    device_map="auto",
-)
-outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
-print(outputs[0]["generated_text"])
-```

 ---
+license: apache-2.0
 base_model:
 - TinyLlama/TinyLlama-1.1B-step-50K-105b
 - TinyLlama/TinyLlama-1.1B-Chat-v1.0
 tags:
 - merge
 - mergekit
+- tinyllama
+- slerp
 ---
+# TinyLlama-Hybrid-Merge
+This is a merge of two TinyLlama models created with MergeKit. It combines the base TinyLlama checkpoint with its chat-tuned version using SLERP, with different interpolation values for the attention layers, the MLP layers, and the remaining parameters.
+## About Me
+I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.
+🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)
+## Merge Details
+### Merge Method
+This model uses SLERP (Spherical Linear Interpolation) with a separate set of interpolation values for each parameter group:
+- **Attention Layers**: Variable interpolation values [0, 0.5, 0.3, 0.7, 1], chosen to leverage the chat model's instruction-following capabilities
+- **MLP Layers**: Variable interpolation values [1, 0.5, 0.7, 0.3, 0], chosen to maintain the base model's reasoning capabilities
+- **Other Parameters**: A fixed interpolation value of 0.5, giving an equal blend of the two models
+- **Format**: bfloat16 precision for efficient memory usage
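As an illustration of the interpolation named above, here is a minimal NumPy sketch of SLERP between two weight tensors. It is not MergeKit's implementation: the function name `slerp`, the `eps` threshold, and the fallback to plain linear interpolation for near-parallel tensors are assumptions made for this example.

```python
import numpy as np

def slerp(t, w0, w1, eps=1e-6):
    """Spherically interpolate between tensors w0 (t=0) and w1 (t=1).

    Illustrative sketch only; names and thresholds are assumptions.
    """
    w0 = np.asarray(w0, dtype=np.float64)
    w1 = np.asarray(w1, dtype=np.float64)

    # Measure the angle between the two tensors, treated as flat vectors.
    a, b = w0.ravel(), w1.ravel()
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a < eps or norm_b < eps:
        # Angle is undefined for a zero vector: fall back to linear blend.
        return (1.0 - t) * w0 + t * w1

    cos_theta = np.clip(np.dot(a / norm_a, b / norm_b), -1.0, 1.0)
    if abs(cos_theta) > 1.0 - eps:
        # Nearly parallel tensors: SLERP degenerates, use linear blend.
        return (1.0 - t) * w0 + t * w1

    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)
    coeff0 = np.sin((1.0 - t) * theta) / sin_theta
    coeff1 = np.sin(t * theta) / sin_theta
    return coeff0 * w0 + coeff1 * w1


# t = 0.5 is the equal blend used for the "other parameters" group.
blended = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

With `t = 0` the function returns the first tensor unchanged and with `t = 1` the second, so the endpoint values in the lists above select one source model outright for that parameter group.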
+### Models Merged
+* [TinyLlama/TinyLlama-1.1B-step-50K-105b](https://huggingface.co/TinyLlama/TinyLlama-1.1B-step-50K-105b) - The base TinyLlama model, an early pretraining checkpoint offering foundational language capabilities
+* [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) - A fine-tuned version optimized for chat and instruction following
+### Configuration
 ```yaml
 slices:
   - sources:
 dtype: bfloat16
 ```
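The five-element lists in the merge method can be read as anchor points spread over the depth of the model. The sketch below shows one way such a list could be expanded into a per-layer value, assuming list-valued parameters are linearly interpolated across layers, as MergeKit's documentation describes for gradient parameters. The helper `layer_t_values` and the layer count of 22 are assumptions for illustration; the layer range in the configuration above is elided and is not reproduced here.

```python
import numpy as np

def layer_t_values(anchors, num_layers):
    """Expand a short list of anchor values into one value per layer.

    Hypothetical helper: anchors are placed evenly over the layer stack
    and intermediate layers are filled in by linear interpolation.
    """
    anchors = np.asarray(anchors, dtype=np.float64)
    if num_layers == 1:
        return [float(anchors[0])]
    anchor_pos = np.linspace(0.0, 1.0, len(anchors))
    layer_pos = np.linspace(0.0, 1.0, num_layers)
    return np.interp(layer_pos, anchor_pos, anchors).tolist()


NUM_LAYERS = 22  # assumption: number of transformer layers being merged

attn_t = layer_t_values([0, 0.5, 0.3, 0.7, 1], NUM_LAYERS)
mlp_t = layer_t_values([1, 0.5, 0.7, 0.3, 0], NUM_LAYERS)

for idx, (a, m) in enumerate(zip(attn_t, mlp_t)):
    print(f"layer {idx:2d}: attention t={a:.3f}  mlp t={m:.3f}")
```

Under this reading the two lists are mirror images: at every layer the attention and MLP values sum to 1, so a layer that takes its attention weights mostly from one model takes its MLP weights mostly from the other.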
+## Model Capabilities
+This merge combines:
+- TinyLlama base model's foundational knowledge and reasoning
+- TinyLlama Chat's improved instruction following and conversational abilities
+- Optimized parameter distribution for balanced performance
+- Compact 1.1B parameter size suitable for resource-constrained environments
+The resulting model is intended for tasks that require both reasoning and conversational abilities, such as:
+- Basic question answering with improved coherence
+- Simple instruction following with better response quality
+- Lightweight deployment scenarios requiring balanced capabilities
+- Educational and demonstration purposes for model merging techniques
+## Limitations
+- Inherits the fundamental limitations of small 1.1B parameter models
+- Limited context window and knowledge compared to larger models
+- May struggle with complex reasoning, specialized domains, or nuanced tasks
+- No additional training beyond the parameter merging process
+- Performance ceiling constrained by the small model size
+## License
+This model is released under the Apache 2.0 license, consistent with the underlying models' licenses.