---
license: apache-2.0
language:
- en
- zh
base_model:
- agentica-org/DeepCoder-14B-Preview
- EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
- Azure99/Blossom-V6-14B
- deepcogito/cogito-v1-preview-qwen-14B
- qihoo360/Light-R1-14B-DS
- Qwen/Qwen2.5-14B
- Qwen/Qwen2.5-14B-Instruct
- Qwen/Qwen2.5-14B-Instruct-1M
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-14B-Instruct
- arcee-ai/SuperNova-Medius
- arcee-ai/Virtuoso-Small-v2
- PKU-DS-LAB/FairyR1-14B-Preview
- FractalAIResearch/Fathom-R1-14B
pipeline_tag: text-generation
tags:
- merge
---
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/fWdB3_gWSCHsD3qnQ4ERa.jpeg)
# ZYH-LLM-Qwen2.5-14B-V5
*The fifth generation of the ZYH-LLM-Qwen2.5-14B model is officially released!*
It merges **high-performance instruction, code, and reasoning models**, all built on **Qwen2.5-14B**.
Recently, many high-performance reasoning models have emerged, such as:
* [deepcogito/cogito-v1-preview-qwen-14B](https://huggingface.co/deepcogito/cogito-v1-preview-qwen-14B)
* [FractalAIResearch/Fathom-R1-14B](https://huggingface.co/FractalAIResearch/Fathom-R1-14B)
* [agentica-org/DeepCoder-14B-Preview](https://huggingface.co/agentica-org/DeepCoder-14B-Preview)
* [PKU-DS-LAB/FairyR1-14B-Preview](https://huggingface.co/PKU-DS-LAB/FairyR1-14B-Preview)
* [qihoo360/Light-R1-14B-DS](https://huggingface.co/qihoo360/Light-R1-14B-DS)
These models lay a solid foundation for further improving performance.
## First stage:
### Step 1:
*Create a code model*
```yaml
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
```
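If you reproduce these merges, note that indentation matters in mergekit configs: per-model `parameters` nest under each list entry, while the global `parameters` block sits at the top level. As a quick structural sanity check (a hypothetical snippet, not part of the original recipe), the della config can be parsed with PyYAML before handing it to mergekit:

```python
import yaml  # pip install pyyaml

DELLA_CONFIG = """\
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
"""

cfg = yaml.safe_load(DELLA_CONFIG)
# Per-model parameters nest under each list entry...
assert cfg["models"][0]["parameters"]["lambda"] == 0.9
# ...while normalize/int8_mask live in the top-level parameters block.
assert cfg["parameters"]["normalize"] is True
assert cfg["merge_method"] == "della"
```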
### Step 2:
*Create five different instruction models*
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Base
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V2
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/SuperNova-Medius
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Nova
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V6
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-EVA
```
### Step 3:
*Use the **arcee_fusion** merge method to incorporate **cogito-v1-preview-qwen-14B** into each of the five instruction models.*
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Base
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-Base-cogito
```
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-V2
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-V2-cogito
```
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-V6
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-V6-cogito
```
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Nova
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-Nova-cogito
```
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-EVA
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-EVA-cogito
```
## Second stage:
### Step 1:
*Create three instruction models biased towards reasoning.*
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-Coder-14B-della
  - model: agentica-org/DeepCoder-14B-Preview
  - model: PKU-DS-LAB/FairyR1-14B-Preview
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-Coder
```
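**model_stock** averages the fine-tuned checkpoints and interpolates back toward the base model, with a ratio derived from the angle between the fine-tuning deltas (mergekit applies this per layer on real checkpoints). As a rough toy illustration of the idea, using the two-model formula from the Model Stock paper (`t = 2·cosθ / (1 + cosθ)`), on flat NumPy vectors rather than actual weights:

```python
import numpy as np

def model_stock_pair(w_base, w_a, w_b):
    """Merge two fine-tuned weight vectors with their shared base.

    theta is the angle between the two fine-tuning deltas; the closer the
    deltas agree (cos -> 1), the more weight the fine-tuned average gets.
    """
    d_a, d_b = w_a - w_base, w_b - w_base
    cos = float(d_a @ d_b / (np.linalg.norm(d_a) * np.linalg.norm(d_b)))
    t = 2.0 * cos / (1.0 + cos)          # interpolation ratio
    w_avg = (w_a + w_b) / 2.0            # average of the fine-tuned models
    return t * w_avg + (1.0 - t) * w_base
```

With identical deltas (`cos = 1`, `t = 1`) the result is the fine-tuned average; with orthogonal deltas (`cos = 0`, `t = 0`) it falls back to the base model, which is why model_stock tends to be a conservative, generality-preserving merge.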
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-V6-cogito
  - model: FractalAIResearch/Fathom-R1-14B
  - model: qihoo360/Light-R1-14B-DS
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-V6
```
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-Nova-cogito
  - model: FractalAIResearch/Fathom-R1-14B
  - model: qihoo360/Light-R1-14B-DS
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-Nova
```
### Step 2:
*Create a pure instruction model to restore the generality of the final model.*
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-V6-cogito
  - model: Qwen2.5-14B-Nova-cogito
  - model: Qwen2.5-14B-EVA-cogito
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-it
```
## Third stage:
### Step 1:
*Create a base model with a 1-million-token context window.*
```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models
  - model: Qwen/Qwen2.5-14B
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-1M
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen2.5-14B-1M
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Base-1M
```
### Step 2:
*Use the **arcee_fusion** merge method to incorporate **cogito-v1-preview-qwen-14B** into the 1-million-token-context base model.*
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Base-1M
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-cogito-Base-1M
```
## Final stage:
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-cogito-Base-1M
models:
  - model: Qwen2.5-14B-cogito-mst-Coder
  - model: Qwen2.5-14B-cogito-mst-V6
  - model: Qwen2.5-14B-cogito-mst-Nova
  - model: Qwen2.5-14B-cogito-mst-it
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: ZYH-LLM-Qwen2.5-14B-V5
```
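The final model can be used like any other Qwen2.5-based chat model. A minimal usage sketch with 🤗 Transformers (the repo id below is an assumption; adjust it to wherever the weights are hosted, and note this downloads a 14B checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/ZYH-LLM-Qwen2.5-14B-V5"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain model merging."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```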