---
license: apache-2.0
language:
- en
- zh
base_model:
- agentica-org/DeepCoder-14B-Preview
- EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
- Azure99/Blossom-V6-14B
- deepcogito/cogito-v1-preview-qwen-14B
- qihoo360/Light-R1-14B-DS
- Qwen/Qwen2.5-14B
- Qwen/Qwen2.5-14B-Instruct
- Qwen/Qwen2.5-14B-Instruct-1M
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-14B-Instruct
- arcee-ai/SuperNova-Medius
- arcee-ai/Virtuoso-Small-v2
- PKU-DS-LAB/FairyR1-14B-Preview
- FractalAIResearch/Fathom-R1-14B
pipeline_tag: text-generation
tags:
- merge
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/fWdB3_gWSCHsD3qnQ4ERa.jpeg)

# ZYH-LLM-Qwen2.5-14B-V5

*The fifth generation of ZYH-LLM-Qwen2.5-14B is officially released!*

It merges **high-performance instruction, code, and reasoning models** built on **Qwen2.5-14B**.

Recently, many high-performance reasoning models have emerged, such as:

* [deepcogito/cogito-v1-preview-qwen-14B](https://huggingface.co/deepcogito/cogito-v1-preview-qwen-14B)
* [FractalAIResearch/Fathom-R1-14B](https://huggingface.co/FractalAIResearch/Fathom-R1-14B)
* [agentica-org/DeepCoder-14B-Preview](https://huggingface.co/agentica-org/DeepCoder-14B-Preview)
* [PKU-DS-LAB/FairyR1-14B-Preview](https://huggingface.co/PKU-DS-LAB/FairyR1-14B-Preview)
* [qihoo360/Light-R1-14B-DS](https://huggingface.co/qihoo360/Light-R1-14B-DS)

These provide a solid foundation for further improving model performance.
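Most steps in the recipe below use mergekit's `della` merge method, which prunes small parameter deltas (differences from the base model) and rescales the survivors before adding them back. As a rough illustration only, here is a toy pure-Python sketch of that idea on a flat parameter list; it keeps the top-`density` fraction of deltas deterministically, whereas the real method samples drops by magnitude and works tensor by tensor, so this is a simplification, not mergekit's code:

```python
# Toy sketch of a della-style merge on flat parameter lists.
# Illustrative only: real merges operate per-tensor and sample drops
# probabilistically by delta magnitude.

def della_merge(base, tuned, density=1.0, lam=0.9):
    """Merge one fine-tuned parameter vector back into a base vector."""
    deltas = [t - b for t, b in zip(tuned, base)]
    # Keep the `density` fraction of positions with the largest deltas.
    k = max(1, round(density * len(deltas)))
    keep = set(sorted(range(len(deltas)), key=lambda i: -abs(deltas[i]))[:k])
    merged = []
    for i, (b, d) in enumerate(zip(base, deltas)):
        if i in keep:
            # Surviving deltas are rescaled by 1/density and scaled by lambda.
            merged.append(b + lam * d / density)
        else:
            merged.append(b)
    return merged

print(della_merge([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 4.1],
                  density=0.5, lam=1.0))  # -> [2.0, 2.0, 1.0, 4.0]
```

With `density: 1` and `lambda: 0.9`, as in the configs below, every delta survives and is simply scaled by 0.9.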
## First stage:

### Step 1:
*Create a code model*

```yaml
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
```

### Step 2:
*Create five different instruction models*

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Base
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V2
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/SuperNova-Medius
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Nova
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V6
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-EVA
```

### Step 3:
*Use the **arcee_fusion** merging method to incorporate **cogito-v1-preview-qwen-14B** into the five instruction models.*

```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Base
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-Base-cogito
```

```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-V2
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-V2-cogito
```

```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-V6
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-V6-cogito
```

```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Nova
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-Nova-cogito
```

```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-EVA
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-EVA-cogito
```

## Second stage:

### Step 1:
*Create three instruction models with a bias towards reasoning.*

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-Coder-14B-della
  - model: agentica-org/DeepCoder-14B-Preview
  - model: PKU-DS-LAB/FairyR1-14B-Preview
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-Coder
```

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-V6-cogito
  - model: FractalAIResearch/Fathom-R1-14B
  - model: qihoo360/Light-R1-14B-DS
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-V6
```

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-Nova-cogito
  - model: FractalAIResearch/Fathom-R1-14B
  - model: qihoo360/Light-R1-14B-DS
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-Nova
```

### Step 2:
*Create a pure instruction model to restore the generality of the final model.*

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-V6-cogito
  - model: Qwen2.5-14B-Nova-cogito
  - model: Qwen2.5-14B-EVA-cogito
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-it
```

## Third stage:

### Step 1:
*Create a base model with a context of 1 million tokens.*

```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models
  - model: Qwen/Qwen2.5-14B
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-1M
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen2.5-14B-1M
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Base-1M
```

### Step 2:
*Use the **arcee_fusion** merging method to incorporate **cogito-v1-preview-qwen-14B** into the base model with a context of 1 million tokens.*

```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Base-1M
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-cogito-Base-1M
```

## Final stage:

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-cogito-Base-1M
models:
  - model: Qwen2.5-14B-cogito-mst-Coder
  - model: Qwen2.5-14B-cogito-mst-V6
  - model: Qwen2.5-14B-cogito-mst-Nova
  - model: Qwen2.5-14B-cogito-mst-it
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: ZYH-LLM-Qwen2.5-14B-V5
```
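The `model_stock` stages above average several fine-tuned checkpoints and then interpolate the average back toward a shared base. As a rough illustration, here is a simplified pure-Python sketch on a single flat parameter vector, using an interpolation ratio derived from how well the task vectors agree (their average pairwise cosine similarity); the real method works layer by layer inside mergekit, so treat this as an assumption-laden sketch, not the actual implementation:

```python
import math

# Simplified model_stock-style merge on flat parameter vectors.
# Illustrative only: assumes at least two fine-tuned models whose
# task vectors are not exactly opposed.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def model_stock(base, tuned_models):
    n = len(tuned_models)
    # Task vectors: each model's delta from the shared base.
    deltas = [[t - b for t, b in zip(tuned, base)] for tuned in tuned_models]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    avg_cos = sum(cosine(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)
    # Interpolation ratio: closer to 1 when the task vectors agree.
    t = n * avg_cos / (1 + (n - 1) * avg_cos)
    avg_delta = [sum(d[k] for d in deltas) / n for k in range(len(base))]
    return [b + t * d for b, d in zip(base, avg_delta)]
```

When the fine-tuned checkpoints agree closely, the ratio approaches 1 and the merge keeps their average; when they disagree, the result is pulled back toward the base, which matches the "restore generality" role the base model plays in the recipe.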