Update README.md
README.md CHANGED
@@ -92,7 +92,7 @@ We compared Moonlight with SOTA public models at similar scale:
 
 ### Inference with Hugging Face Transformers
 
-We introduce how to use our model at inference stage using transformers library. It is recommended to use python=3.10, torch>=2.1.0, and
+We introduce how to use our model at inference stage using transformers library. It is recommended to use python=3.10, torch>=2.1.0, and transformers=4.48.2 as the development environment.
 
 For our pretrained model (Moonlight-16B-A3B):
 ```python
@@ -111,6 +111,7 @@ prompt = "1+1=2, 1+2="
 inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
 generated_ids = model.generate(**inputs, max_new_tokens=100)
 response = tokenizer.batch_decode(generated_ids)[0]
+print(response)
 ```
 
 For our instruct model (Moonlight-16B-A3B-Instruct):
@@ -127,7 +128,6 @@ model = AutoModelForCausalLM.from_pretrained(
 )
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
 
-prompt = "Give me a short introduction to large language model."
 messages = [
     {"role": "system", "content": "You are a helpful assistant provided by Moonshot-AI."},
     {"role": "user", "content": "Is 123 a prime?"}
@@ -135,6 +135,7 @@ messages = [
 input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
 generated_ids = model.generate(inputs=input_ids, max_new_tokens=500)
 response = tokenizer.batch_decode(generated_ids)[0]
+print(response)
 ```
 
 Moonlight has the same architecture as DeepSeek-V3, which is supported by many popular inference engines, such as VLLM and SGLang. As a result, our model can also be easily deployed using these tools.
@@ -142,8 +143,8 @@ Moonlight has the same architecture as DeepSeek-V3, which is supported by many p
 ## Citation
 If you find Moonlight is useful or want to use in your projects, please kindly cite our paper:
 ```
-@article{
-author = {
+@article{MoonshotAIMuon,
+author = {Jingyuan Liu and Jianlin Su and Xingcheng Yao and Zhejun Jiang and Guokun Lai and Yulun Du and Yidao Qin and Weixin Xu and Enzhe Lu and Junjie Yan and Yanru Chen and Huabin Zheng and Yibo Liu and Shaowei Liu and Bohong Yin and Weiran He and Han Zhu and Yuzhi Wang and Jianzhou Wang and Mengnan Dong and Zheng Zhang and Yongsheng Kang and Hao Zhang and Xinran Xu and Yutao Zhang and Yuxin Wu and Xinyu Zhou and Zhilin Yang},
 title = {Muon is Scalable For LLM Training},
 year = {2025},
 }
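The diff's note that Moonlight shares DeepSeek-V3's architecture and can be deployed with engines such as vLLM and SGLang is not accompanied by a snippet. Below is a minimal, hypothetical sketch of offline generation with vLLM's Python API; the Hugging Face repo id `moonshotai/Moonlight-16B-A3B-Instruct` and the sampling settings are assumptions (the README's actual `model_name` is defined in lines not shown in this diff), and an SGLang deployment would follow that project's own API.

```python
# Hypothetical sketch: offline generation with vLLM, not part of the README diff.
# Assumptions: the repo id below and the sampling settings.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "moonshotai/Moonlight-16B-A3B-Instruct"  # assumed repo id

# Build a text prompt with the same chat template used in the Transformers example above.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a helpful assistant provided by Moonshot-AI."},
    {"role": "user", "content": "Is 123 a prime?"}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# vLLM loads the model once and batches requests; trust_remote_code mirrors the README setup.
llm = LLM(model=model_name, trust_remote_code=True)
outputs = llm.generate([prompt], SamplingParams(max_tokens=500))
print(outputs[0].outputs[0].text)
```

Building the prompt with the tokenizer's chat template keeps this sketch consistent with the Transformers example in the diff; a served deployment (e.g. an OpenAI-compatible vLLM or SGLang endpoint) would instead accept the `messages` list directly and apply the template on the server side.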