|
---
license: apache-2.0
language:
- en
- zh
base_model:
- agentica-org/DeepCoder-14B-Preview
- EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
- Azure99/Blossom-V6-14B
- deepcogito/cogito-v1-preview-qwen-14B
- qihoo360/Light-R1-14B-DS
- Qwen/Qwen2.5-14B
- Qwen/Qwen2.5-14B-Instruct
- Qwen/Qwen2.5-14B-Instruct-1M
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-14B-Instruct
- arcee-ai/SuperNova-Medius
- arcee-ai/Virtuoso-Small-v2
- PKU-DS-LAB/FairyR1-14B-Preview
- FractalAIResearch/Fathom-R1-14B
pipeline_tag: text-generation
tags:
- merge
---
|
|
|
 |
|
|
|
# ZYH-LLM-Qwen2.5-14B-V5 |
|
|
|
*The fifth generation of the ZYH-LLM-Qwen2.5-14B series is officially released!*
|
|
|
It merges **high-performance instruction, code, and reasoning models**, all built on **Qwen2.5-14B**.
|
|
|
Recently, many high-performance reasoning models have emerged, such as: |
|
|
|
* [deepcogito/cogito-v1-preview-qwen-14B](https://huggingface.co/deepcogito/cogito-v1-preview-qwen-14B) |
|
* [FractalAIResearch/Fathom-R1-14B](https://huggingface.co/FractalAIResearch/Fathom-R1-14B) |
|
* [agentica-org/DeepCoder-14B-Preview](https://huggingface.co/agentica-org/DeepCoder-14B-Preview) |
|
* [PKU-DS-LAB/FairyR1-14B-Preview](https://huggingface.co/PKU-DS-LAB/FairyR1-14B-Preview) |
|
* [qihoo360/Light-R1-14B-DS](https://huggingface.co/qihoo360/Light-R1-14B-DS) |
|
|
|
These models lay a solid foundation for further improving performance.
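
All of the recipes below are plain [mergekit](https://github.com/arcee-ai/mergekit) configurations. As a rough sketch of how any single step can be reproduced (assuming a recent mergekit release; the config filename and output path here are placeholders), save the step's YAML to a file and run it through mergekit's Python API, or equivalently the `mergekit-yaml` CLI:

```python
# Minimal sketch: execute one merge step with mergekit's Python API.
# Equivalent CLI: mergekit-yaml step1.yml ./Qwen2.5-Coder-14B-della --cuda
# "step1.yml" and the output directory are placeholders, not files in this repo.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("step1.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Qwen2.5-Coder-14B-della",  # intermediate merge, consumed by later steps
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU when one is available
        copy_tokenizer=True,             # write the tokenizer into the output directory
    ),
)
```

Intermediate outputs such as `Qwen2.5-Coder-14B-della` are then referenced by name in the later configs.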
|
|
|
## First stage: |
|
|
|
### Step 1: |
|
*Create a code model.*
|
```yaml
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
```
|
### Step 2: |
|
*Create five different instruction models.*
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Base
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V2
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/SuperNova-Medius
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Nova
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V6
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-EVA
```
|
### Step 3: |
|
*Use the **arcee_fusion** merge method to incorporate **cogito-v1-preview-qwen-14B** into each of the five instruction models.*
|
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Base
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-Base-cogito
```
|
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-V2
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-V2-cogito
```
|
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-V6
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-V6-cogito
```
|
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Nova
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-Nova-cogito
```
|
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-EVA
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-EVA-cogito
```
|
## Second stage: |
|
|
|
### Step 1: |
|
*Create three instruction models with a bias towards reasoning.* |
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-Coder-14B-della
  - model: agentica-org/DeepCoder-14B-Preview
  - model: PKU-DS-LAB/FairyR1-14B-Preview
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-Coder
```
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-V6-cogito
  - model: FractalAIResearch/Fathom-R1-14B
  - model: qihoo360/Light-R1-14B-DS
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-V6
```
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-Nova-cogito
  - model: FractalAIResearch/Fathom-R1-14B
  - model: qihoo360/Light-R1-14B-DS
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-Nova
```
|
### Step 2: |
|
*Create a pure instruction model to restore the generality of the final model.* |
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-V6-cogito
  - model: Qwen2.5-14B-Nova-cogito
  - model: Qwen2.5-14B-EVA-cogito
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-it
```
|
## Third stage: |
|
|
|
### Step 1: |
|
*Create a base model with a 1-million-token context.*
|
```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models
  - model: Qwen/Qwen2.5-14B
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-1M
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen2.5-14B-1M
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Base-1M
```
|
### Step 2: |
|
*Use the **arcee_fusion** merge method to incorporate **cogito-v1-preview-qwen-14B** into the 1-million-token-context base model.*
|
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Base-1M
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-cogito-Base-1M
```
|
## Final stage: |
|
|
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-cogito-Base-1M
models:
  - model: Qwen2.5-14B-cogito-mst-Coder
  - model: Qwen2.5-14B-cogito-mst-V6
  - model: Qwen2.5-14B-cogito-mst-Nova
  - model: Qwen2.5-14B-cogito-mst-it
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: ZYH-LLM-Qwen2.5-14B-V5
```
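
Every merge above sets `tokenizer_source: base`, so the final model keeps the Qwen2.5 tokenizer and chat template and should load like any Qwen2.5 instruct model. A minimal inference sketch with 🤗 Transformers (the model path and generation settings are placeholders, not tuned recommendations):

```python
# Minimal inference sketch; the model path and sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZYH-LLM-Qwen2.5-14B-V5"  # replace with the full Hub id or a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the recipes above all produce bfloat16 weights
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a number is prime."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```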