---
license: apache-2.0
base_model:
  - Qwen/Qwen3-32B
  - Qwen/Qwen2.5-72B-Instruct
tags:
  - merge
  - frankenmerge
  - qwen
---

# Qwen3-72B-Synthesis

This still doesn't work; I'm trying to fix it.

A Qwen3-architecture 72B model forged from `Qwen3-32B` and `Qwen2.5-72B-Instruct`.

## Model Description

**Qwen3-72B-Synthesis** is an experimental, 80-layer, 72-billion-parameter large language model. It represents a novel approach to model creation: produce a model with the pure, modern **Qwen3 architecture** while inheriting the vast, high-quality knowledge of the 72B-scale **Qwen2.5-Instruct** model.

This was not a simple merge. It was a multi-phase surgical procedure involving dimensional up-scaling, architectural alignment, and a strategic "knowledge transplant" using `MergeKit`. The result is a unique checkpoint intended as a starting point for further fine-tuning.

The core philosophy was to use `Qwen/Qwen3-32B` as the architectural "foundation" and `Qwen/Qwen2.5-72B-Instruct` as the "knowledge donor."

## Model Details

* **Architecture:** Qwen3 (RMSNorm, SwiGLU, no biases, includes `q_norm` and `k_norm`)
* **Parameters:** ~72 billion
* **Layers:** 80
* **Foundation:** `Qwen/Qwen3-32B`
* **Donor:** `Qwen/Qwen2.5-72B-Instruct`
* **Tokenizer:** `Qwen/Qwen3-32B` tokenizer (`vocab_size: 151936`)

## Model Creation Process

The creation of this model was a deliberate, three-phase process designed to overcome significant architectural incompatibilities.

### Phase 1: Foundation Upscaling

First, the `Qwen/Qwen3-32B` model (64 layers, 5120 hidden dim) was up-scaled to match the target 72B dimensions (8192 hidden dim). This was done with a **self-interpolation** script: new dimensions were created by averaging different slices of the existing weights rather than by simple tiling (see the first sketch after Phase 2's list below). This produced `Qwen3-32B-Upscaled`, a 64-layer model with the correct 72B tensor shapes and the Qwen3 architecture.

### Phase 2: Donor Alignment

The `Qwen/Qwen2.5-72B-Instruct` model was architecturally incompatible with the Qwen3 target. To solve this, a new donor model, `Qwen2.5-72B-Instruct-Aligned`, was created. This process involved the following steps (see the second sketch below):

1. Creating an empty 80-layer model shell with the pure Qwen3 architecture.
2. Surgically removing all `.bias` tensors from the Qwen2.5 weights.
3. Truncating the Qwen2.5 embedding and language-model-head layers from a vocabulary of 152064 to match Qwen3's 151936.
4. Loading the modified Qwen2.5 weights into the pure Qwen3 shell, yielding a fully compatible donor model.
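The actual up-scaling script has not been published, so the following is only a minimal sketch of the Phase 1 self-interpolation idea: each new row of a weight matrix is a weighted average of its two nearest original rows rather than a tiled copy. The function name `upscale_dim` and the square 2-D example are illustrative assumptions; a real run would also have to handle attention head counts, GQA key/value dims, and per-tensor shape rules.

```python
import torch

def upscale_dim(weight: torch.Tensor, new_size: int, dim: int = 0) -> torch.Tensor:
    """Grow one dimension by self-interpolation: each new slice is a
    weighted average of the two nearest original slices, not a tiled copy."""
    old_size = weight.size(dim)
    # Fractional positions of the new indices inside the old index space.
    pos = torch.linspace(0, old_size - 1, new_size)
    lo = pos.floor().long()
    hi = pos.ceil().clamp(max=old_size - 1).long()
    # Broadcast the mixing fraction along every axis except `dim`.
    shape = [new_size if d == dim else 1 for d in range(weight.dim())]
    frac = (pos - lo.float()).view(shape)
    return (1 - frac) * weight.index_select(dim, lo) + frac * weight.index_select(dim, hi)

# Example: grow a square 5120x5120 projection to 8192x8192.
w = torch.randn(5120, 5120)
w_up = upscale_dim(upscale_dim(w, 8192, dim=0), 8192, dim=1)
print(w_up.shape)  # torch.Size([8192, 8192])
```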
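Likewise, here is a minimal sketch of the Phase 2 alignment surgery, assuming a plain `transformers`/`torch` state-dict workflow. The shell-config edits are assumptions based on Qwen2.5-72B's published dimensions, and in practice the surgery would be done shard-by-shard over safetensors files rather than with two full 72B models in memory:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Load the donor's weights (illustrative; work shard-by-shard at this scale).
donor = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B-Instruct", torch_dtype=torch.bfloat16
)
state = donor.state_dict()

# Step 2: Qwen3 uses no biases, so drop every `.bias` tensor.
state = {k: v for k, v in state.items() if not k.endswith(".bias")}

# Step 3: truncate vocab-sized tensors from 152064 rows to Qwen3's 151936.
QWEN3_VOCAB = 151936
for key in ("model.embed_tokens.weight", "lm_head.weight"):
    state[key] = state[key][:QWEN3_VOCAB]

# Steps 1 and 4: build an empty 80-layer Qwen3-architecture shell and load
# the modified weights. `strict=False` because the shell's new q_norm /
# k_norm tensors have no counterpart in the donor.
shell_config = AutoConfig.from_pretrained("Qwen/Qwen3-32B")
shell_config.num_hidden_layers = 80       # Qwen2.5-72B depth
shell_config.hidden_size = 8192           # Qwen2.5-72B width
shell_config.intermediate_size = 29568    # plus matching head counts, etc.
shell = AutoModelForCausalLM.from_config(shell_config, torch_dtype=torch.bfloat16)
missing, unexpected = shell.load_state_dict(state, strict=False)
```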
### Phase 3: Knowledge Transplant via MergeKit

With two architecturally compatible models, the final merge was performed using `MergeKit`. A "Knowledge Bridge" strategy was employed to transplant a stable reasoning core from the donor while blending the rest. The following `MergeKit` configuration was used:

```yaml
merge_method: linear
base_model: ./Qwen3-32B-Upscaled
dtype: bfloat16
slices:
  # Slice 1: Blend the bottom 32 layers
  - merge_method: linear
    sources:
      - model: ./Qwen3-32B-Upscaled
        layer_range: [0, 32]
        parameters:
          weight: 0.5
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [0, 32]
        parameters:
          weight: 0.5
  # Slice 2: The "Knowledge Bridge" - transplant a pure block from the donor
  - merge_method: passthrough
    sources:
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [32, 48]
  # Slice 3: Blend the top layers
  - merge_method: linear
    sources:
      - model: ./Qwen3-32B-Upscaled
        layer_range: [32, 64]
        parameters:
          weight: 0.5
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [48, 80]
        parameters:
          weight: 0.5
tokenizer_source: ./Qwen3-32B-Upscaled
```

## How to Use

This model uses the standard Qwen ChatML prompt format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/Qwen3-72B-Synthesis"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the importance of the LLaMA paper in one paragraph."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Intended Use and Limitations

**This is an experimental model and should be considered a high-quality checkpoint, not a finished product.**

* **Fine-tuning is highly recommended.** While it inherits knowledge from a powerful instruction model, the merging process can create slight incoherence between layers. A round of fine-tuning on a high-quality instruction dataset is necessary to harmonize the weights and unlock the model's full potential (a hedged sketch appears at the end of this card).
* The model may exhibit unexpected behaviors, including repetitiveness or nonsensical outputs, prior to fine-tuning.
* This model has not been aligned for safety and may produce problematic, biased, or otherwise undesirable content. The user assumes all responsibility for the output generated.

## Acknowledgements

This model would not have been possible without the foundational work of Alibaba Cloud on the Qwen models, and the powerful, flexible `MergeKit` toolkit created by Charles Goddard and Arcee.ai.
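## Appendix: Harmonization Fine-Tuning Sketch

For those attempting the recommended harmonization pass, the following is a minimal, hedged sketch using `peft` LoRA adapters with `trl`'s `SFTTrainer`. Nothing here is prescriptive: the dataset (`HuggingFaceH4/ultrachat_200k`), the LoRA hyperparameters, and the single-process `device_map="auto"` loading are illustrative assumptions only; a 72B model realistically needs a multi-GPU setup (FSDP or DeepSpeed), and exact `trl` argument names vary across versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model_id = "cognitivecomputations/Qwen3-72B-Synthesis"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Low-rank adapters on the attention projections: enough capacity to smooth
# the seams between blended and transplanted layers without full retraining.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Any high-quality instruction dataset works; this one is only an example.
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# SFTTrainer pulls the tokenizer and chat template from the model repo when
# no processing class is passed explicitly (recent trl versions).
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="qwen3-72b-synthesis-sft"),
)
trainer.train()
```

Targeting only the attention projections keeps the adapter small while still letting gradients reach every layer boundary, which is where merge-induced incoherence is most likely to appear.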