---
license: apache-2.0
base_model:
  - Qwen/Qwen3-32B
  - Qwen/Qwen2.5-72B-Instruct
tags:
  - merge
  - frankenmerge
  - qwen
---

# Qwen3-72B-Synthesis

This still doesn't work; I'm trying to fix it.

A Qwen3-architecture 72B model forged from `Qwen3-32B` and `Qwen2.5-72B-Instruct`.

## Model Description

**Qwen3-72B-Synthesis** is an experimental, 80-layer, 72-billion-parameter large language model. It represents a novel approach to model creation: produce a model with the pure, modern **Qwen3 architecture** while inheriting the vast, high-quality knowledge of the 72B-scale **Qwen2.5-Instruct** model.

This was not a simple merge. It was a multi-phase surgical procedure involving dimensional up-scaling, architectural alignment, and a strategic "knowledge transplant" using `MergeKit`. The result is a unique checkpoint intended as a starting point for further fine-tuning.

The core philosophy was to use `Qwen/Qwen3-32B` as the architectural "foundation" and `Qwen/Qwen2.5-72B-Instruct` as the "knowledge donor."

## Model Details

* **Architecture:** Qwen3 (RMSNorm, SwiGLU, no biases, includes `q_norm` and `k_norm`)
* **Parameters:** ~72 billion
* **Layers:** 80
* **Foundation:** `Qwen/Qwen3-32B`
* **Donor:** `Qwen/Qwen2.5-72B-Instruct`
* **Tokenizer:** `Qwen/Qwen3-32B` tokenizer (`vocab_size: 151936`)

## Model Creation Process

The creation of this model was a deliberate, three-phase process designed to overcome significant architectural incompatibilities.

### Phase 1: Foundation Upscaling

First, the `Qwen/Qwen3-32B` model (64 layers, 5120 hidden dim) was up-scaled to match the target 72B dimensions (8192 hidden dim). This was done with a **self-interpolation** script: new dimensions were created by averaging different slices of the existing weights rather than by simple tiling (see the first sketch after Phase 2's list below). This produced `Qwen3-32B-Upscaled`, a 64-layer model with the correct 72B tensor shapes and the Qwen3 architecture.

### Phase 2: Donor Alignment

The `Qwen/Qwen2.5-72B-Instruct` model was architecturally incompatible with the Qwen3 target. To solve this, a new donor model, `Qwen2.5-72B-Instruct-Aligned`, was created. This process involved the following steps (see the second sketch below):

1. Creating an empty 80-layer model shell with the pure Qwen3 architecture.
2. Surgically removing all `.bias` tensors from the Qwen2.5 weights.
3. Truncating the Qwen2.5 embedding and language-model-head layers from a vocabulary of 152064 to match Qwen3's 151936.
4. Loading the modified Qwen2.5 weights into the pure Qwen3 shell, yielding a fully compatible donor model.
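The actual up-scaling script has not been published, so the following is only a minimal sketch of the Phase 1 self-interpolation idea: each new row of a weight matrix is a weighted average of its two nearest original rows rather than a tiled copy. The function name `upscale_dim` and the square 2-D example are illustrative assumptions; a real run would also have to handle attention head counts, GQA key/value dims, and per-tensor shape rules.

```python
import torch

def upscale_dim(weight: torch.Tensor, new_size: int, dim: int = 0) -> torch.Tensor:
    """Grow one dimension by self-interpolation: each new slice is a
    weighted average of the two nearest original slices, not a tiled copy."""
    old_size = weight.size(dim)
    # Fractional positions of the new indices inside the old index space.
    pos = torch.linspace(0, old_size - 1, new_size)
    lo = pos.floor().long()
    hi = pos.ceil().clamp(max=old_size - 1).long()
    # Broadcast the mixing fraction along every axis except `dim`.
    shape = [new_size if d == dim else 1 for d in range(weight.dim())]
    frac = (pos - lo.float()).view(shape)
    return (1 - frac) * weight.index_select(dim, lo) + frac * weight.index_select(dim, hi)

# Example: grow a square 5120x5120 projection to 8192x8192.
w = torch.randn(5120, 5120)
w_up = upscale_dim(upscale_dim(w, 8192, dim=0), 8192, dim=1)
print(w_up.shape)  # torch.Size([8192, 8192])
```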
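Likewise, here is a minimal sketch of the Phase 2 alignment surgery, assuming a plain `transformers`/`torch` state-dict workflow. The shell-config edits are assumptions based on Qwen2.5-72B's published dimensions, and in practice the surgery would be done shard-by-shard over safetensors files rather than with two full 72B models in memory:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Load the donor's weights (illustrative; work shard-by-shard at this scale).
donor = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B-Instruct", torch_dtype=torch.bfloat16
)
state = donor.state_dict()

# Step 2: Qwen3 uses no biases, so drop every `.bias` tensor.
state = {k: v for k, v in state.items() if not k.endswith(".bias")}

# Step 3: truncate vocab-sized tensors from 152064 rows to Qwen3's 151936.
QWEN3_VOCAB = 151936
for key in ("model.embed_tokens.weight", "lm_head.weight"):
    state[key] = state[key][:QWEN3_VOCAB]

# Steps 1 and 4: build an empty 80-layer Qwen3-architecture shell and load
# the modified weights. `strict=False` because the shell's new q_norm /
# k_norm tensors have no counterpart in the donor.
shell_config = AutoConfig.from_pretrained("Qwen/Qwen3-32B")
shell_config.num_hidden_layers = 80       # Qwen2.5-72B depth
shell_config.hidden_size = 8192           # Qwen2.5-72B width
shell_config.intermediate_size = 29568    # plus matching head counts, etc.
shell = AutoModelForCausalLM.from_config(shell_config, torch_dtype=torch.bfloat16)
missing, unexpected = shell.load_state_dict(state, strict=False)
```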
### Phase 3: Knowledge Transplant via MergeKit

With two architecturally compatible models, the final merge was performed using `MergeKit`. A "Knowledge Bridge" strategy was employed to transplant a stable reasoning core from the donor while blending the rest. The following `MergeKit` configuration was used:

```yaml
merge_method: linear
base_model: ./Qwen3-32B-Upscaled
dtype: bfloat16
slices:
  # Slice 1: Blend the bottom 32 layers
  - merge_method: linear
    sources:
      - model: ./Qwen3-32B-Upscaled
        layer_range: [0, 32]
        parameters:
          weight: 0.5
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [0, 32]
        parameters:
          weight: 0.5
  # Slice 2: The "Knowledge Bridge" - transplant a pure block from the donor
  - merge_method: passthrough
    sources:
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [32, 48]
  # Slice 3: Blend the top layers
  - merge_method: linear
    sources:
      - model: ./Qwen3-32B-Upscaled
        layer_range: [32, 64]
        parameters:
          weight: 0.5
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [48, 80]
        parameters:
          weight: 0.5
tokenizer_source: ./Qwen3-32B-Upscaled
```

## How to Use

This model uses the standard Qwen ChatML prompt format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/Qwen3-72B-Synthesis"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the importance of the LLaMA paper in one paragraph."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Intended Use and Limitations

**This is an experimental model and should be considered a high-quality checkpoint, not a finished product.**

* **Fine-tuning is highly recommended.** While it inherits knowledge from a powerful instruction model, the merging process can create slight incoherence between layers. A round of fine-tuning on a high-quality instruction dataset is necessary to harmonize the weights and unlock the model's full potential (a hedged sketch appears at the end of this card).
* The model may exhibit unexpected behaviors, including repetitiveness or nonsensical outputs, prior to fine-tuning.
* This model has not been aligned for safety and may produce problematic, biased, or otherwise undesirable content. The user assumes all responsibility for the output generated.

## Acknowledgements

This model would not have been possible without the foundational work of Alibaba Cloud on the Qwen models, and the powerful, flexible `MergeKit` toolkit created by Charles Goddard and Arcee.ai.
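## Appendix: Harmonization Fine-Tuning Sketch

For those attempting the recommended harmonization pass, the following is a minimal, hedged sketch using `peft` LoRA adapters with `trl`'s `SFTTrainer`. Nothing here is prescriptive: the dataset (`HuggingFaceH4/ultrachat_200k`), the LoRA hyperparameters, and the single-process `device_map="auto"` loading are illustrative assumptions only; a 72B model realistically needs a multi-GPU setup (FSDP or DeepSpeed), and exact `trl` argument names vary across versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model_id = "cognitivecomputations/Qwen3-72B-Synthesis"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Low-rank adapters on the attention projections: enough capacity to smooth
# the seams between blended and transplanted layers without full retraining.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Any high-quality instruction dataset works; this one is only an example.
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# SFTTrainer pulls the tokenizer and chat template from the model repo when
# no processing class is passed explicitly (recent trl versions).
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="qwen3-72b-synthesis-sft"),
)
trainer.train()
```

Targeting only the attention projections keeps the adapter small while still letting gradients reach every layer boundary, which is where merge-induced incoherence is most likely to appear.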