---
license: apache-2.0
tags:
- moe
- llm
- efficient-inference
pipeline_tag: text-generation
---

# TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice

## Model Description

TC-MoE is a novel Mixture-of-Experts (MoE) architecture that enhances traditional MoE models through expert space expansion: each original expert is expanded with the ternary set {-1, 0, +1} into parameter-sharing variants for the router to choose among. TC-MoE achieves:

- **9% reduction** in activated experts compared to Top-K routing
- **1.1% average performance gain** on language understanding benchmarks
- Flexible efficiency-effectiveness trade-off via a reward mechanism

Key innovations:

- 🎯 **Ternary Expert Expansion**: Creates parameter-sharing expert variants (-1, 0, +1) without significant computational overhead (see the sketch after this list)
- ⚖️ **Adaptive Load Balancing**: A new load-balance loss that keeps expert workloads evenly distributed
- 🎮 **Reward-Driven Routing**: Dynamic control of expert activation ratios
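
For intuition, the snippet below is a minimal, self-contained sketch of ternary expert choice routing. It is an illustration under assumptions, not the released implementation: the module name `TernaryChoiceRouter`, the linear gate, and the top-K selection details are hypothetical. What it demonstrates is the idea above: each of the `num_experts` original experts is mirrored by three parameter-sharing candidates scaled by -1, 0, and +1, the router scores all `3 * num_experts` candidates, and tokens routed to a 0-scaled candidate skip that expert's FFN entirely, which is what lowers the average number of activated experts.

```python
# Illustrative PyTorch sketch of ternary expert choice (hypothetical names,
# not the TC-MoE release).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TernaryChoiceRouter(nn.Module):
    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        # One routing logit per (expert, scale) pair -> 3 * num_experts candidates.
        self.gate = nn.Linear(hidden_size, 3 * num_experts, bias=False)
        # Ternary scales shared by every expert; the 0 variant is a free "skip".
        self.register_buffer("scales", torch.tensor([-1.0, 0.0, 1.0]))

    def forward(self, x: torch.Tensor, experts: nn.ModuleList) -> torch.Tensor:
        # x: (num_tokens, hidden_size); experts: the original expert FFNs.
        probs = F.softmax(self.gate(x), dim=-1)          # (tokens, 3 * E)
        weights, idx = probs.topk(self.top_k, dim=-1)    # keep the top-K candidates
        expert_idx = idx // 3                            # which original expert
        scale = self.scales[idx % 3]                     # -1, 0, or +1 per choice
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(self.num_experts):
                # Tokens whose k-th choice is a non-zero variant of expert e;
                # 0-scaled choices contribute nothing and cost no expert compute.
                mask = (expert_idx[:, k] == e) & (scale[:, k] != 0)
                if mask.any():
                    coef = (weights[mask, k] * scale[mask, k]).unsqueeze(-1)
                    out[mask] += coef * experts[e](x[mask])
        return out
```

Because the -1, 0, and +1 candidates of one expert reuse that expert's weights, expanding the choice space adds only extra gate logits rather than extra expert parameters.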

## Model Overview

- **Architecture**: Decoder-only transformer based on LLaMA
- **Pretraining Data**: RedPajama (100B tokens)
- **Model Sizes**: 681M / 2.3B parameters

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the TC-MoE checkpoint and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("stiger1000/TC-MoE")
tokenizer = AutoTokenizer.from_pretrained("stiger1000/TC-MoE")

# Encode a prompt and generate a continuation of up to 50 tokens (prompt included).
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```

## Training Details

- **Optimizer**: AdamW (β₁=0.9, β₂=0.95)
- **Learning Rate**: 1e-4 with cosine decay
- **Batch Size**: 4M tokens
- **Loss Components** (combined as sketched below):
  - Language Modeling Loss
  - Load Balance Loss (α₁=0.01)
  - Reward Loss (α₂=0.0)
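
For concreteness, the sketch below shows how these pieces could be assembled. The optimizer and scheduler calls are standard PyTorch/`transformers` APIs matching the hyperparameters listed above, while the loss-term variables, the warmup length, and the reuse of the `model` object from the Usage section are assumptions rather than the actual training code.

```python
# Hypothetical assembly of the training setup described above.
import torch
from transformers import get_cosine_schedule_with_warmup

ALPHA_1 = 0.01  # load balance loss weight
ALPHA_2 = 0.0   # reward loss weight

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.95))
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1000,      # placeholder assumption
    num_training_steps=25_000,  # 100B tokens / 4M tokens per batch
)


def total_loss(lm_loss, load_balance_loss, reward_loss):
    # Language modeling loss plus the weighted auxiliary MoE losses.
    return lm_loss + ALPHA_1 * load_balance_loss + ALPHA_2 * reward_loss
```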

## Citation

```bibtex
@inproceedings{yan2025tcmoe,
  title={TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice},
  author={Yan, Shen and Bin, Xingyan and Zhang, Sijun and Wang, Yisen and Lin, Zhouchen},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```

📚 **Repository**: [GitHub](https://github.com/stiger1000/TC-MoE)