MiniCPM4: Ultra-Efficient LLMs on End Devices
GitHub Repo | Technical Report | Join Us
👋 Contact us in Discord and WeChat
The MiniCPM4 and MiniCPM4.1 series are highly efficient large language models (LLMs) designed explicitly for end-side devices. They achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems.
🏗️ Efficient Model Architecture
🧠 Efficient Learning Algorithms
📚 High-Quality Training Data
⚡ Efficient Inference System
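For a quick local test, the model can presumably be loaded through Hugging Face Transformers like other MiniCPM checkpoints. The sketch below is an illustration, not the official usage: it assumes the checkpoint is published on the Hub as openbmb/MiniCPM4.1-8B-GPTQ (not confirmed here) and that a GPTQ-capable backend is installed alongside transformers.

# A minimal sketch, assuming the Hub id "openbmb/MiniCPM4.1-8B-GPTQ"
# and a GPTQ-capable backend (e.g. gptqmodel via optimum) installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM4.1-8B-GPTQ"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "What are some fun places to visit in Beijing?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(
    input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.9
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

For higher-throughput batched inference, the model can be run with vLLM. Install it first: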
pip install vllm
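The following script performs offline batch generation with vLLM's Python API; the model path and sampling settings are taken from the original example.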
import os
import multiprocessing

# Use the vLLM V0 engine, which this example relies on.
os.environ['VLLM_USE_V1'] = '0'
multiprocessing.set_start_method('spawn', force=True)

from vllm import LLM, SamplingParams

prompt = "北京有什么好玩的地方"  # "What are some fun places to visit in Beijing?"
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1500)

llm = LLM(model="MiniCPM4.1-8B-GPTQ", trust_remote_code=True)
tokenizer = llm.get_tokenizer()

messages = [{"role": "user", "content": prompt}]
# To enable thinking mode (the default), use the following line:
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# To disable thinking mode, use this line instead:
# formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
outputs = llm.generate([formatted_prompt], sampling_params)
print("-"*50)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}\nGenerated text: {generated_text!r}")
    print("-"*50)