---
license: apache-2.0
inference: false
base_model: microsoft/Phi-3.5-mini-instruct
base_model_relation: quantized
tags: [green, llmware-chat, p3, onnx, qnn, emerald]
---

# phi-3.5-onnx-qnn


phi-3.5-onnx-qnn is an int4-quantized ONNX QNN version of [Microsoft Phi-3.5-mini-instruct](https://www.huggingface.co/microsoft/Phi-3.5-mini-instruct). It provides a small, fast inference implementation optimized for NPU deployment on Windows ARM64 AI PCs with Snapdragon X Elite processors.
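
To make "int4-quantized" concrete, here is a minimal sketch of generic symmetric, block-wise int4 quantization in plain Python. This card does not document the actual recipe used for this ONNX QNN export (block size, symmetric vs. asymmetric, which tensors are quantized), so the function names `quantize_int4`/`dequantize_int4` and the block size are illustrative assumptions rather than this model's pipeline.

```python
"""Generic symmetric block-wise int4 quantization (illustrative, not this model's recipe)."""
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit integer range


def quantize_int4(weights: List[float], block_size: int = 32) -> Tuple[List[int], List[float]]:
    """Map float weights to int4 codes, with one float scale shared per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # The largest magnitude in the block maps to code 7; guard all-zero blocks.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        for w in block:
            q = round(w / scale)
            codes.append(max(INT4_MIN, min(INT4_MAX, q)))  # clamp to the int4 range
    return codes, scales


def dequantize_int4(codes: List[int], scales: List[float], block_size: int = 32) -> List[float]:
    """Reconstruct approximate float weights from int4 codes and per-block scales."""
    return [q * scales[i // block_size] for i, q in enumerate(codes)]


# Hypothetical toy weights, quantized in blocks of 4.
weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, -0.02, 0.18]
codes, scales = quantize_int4(weights, block_size=4)
restored = dequantize_int4(codes, scales, block_size=4)
print(codes)
print([round(r, 3) for r in restored])
```

Each weight is stored as a 4-bit code plus a shared per-block scale, so the reconstruction error for any weight in this scheme is at most half of its block's scale. Smaller blocks tighten that bound at the cost of storing more scales.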


## Model Description

- Developed by: Microsoft
- Model type: phi3
- Parameters: 3.8 billion
- Model Parent: microsoft/Phi-3.5-mini-instruct
- Language(s) (NLP): English
- License: Apache 2.0
- Uses: Chat, general-purpose LLM
- Quantization: int4
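
The parameter count and quantization level listed above allow a rough estimate of how much space the weights need. The sketch below assumes every parameter is stored at exactly 4 bits. It ignores per-block scales, any layers kept at higher precision, the KV cache, and activations, so real on-disk and in-memory sizes will differ. The helper names are hypothetical.

```python
"""Back-of-the-envelope weight-storage estimate from parameter count and bit width."""


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes for the raw weights only (no scales, metadata, or runtime buffers)."""
    return num_params * bits_per_weight / 8


def to_gib(num_bytes: float) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


PARAMS = 3.8e9  # "Parameters: 3.8 billion" from the model description

for label, bits in [("fp16", 16), ("int4", 4)]:
    size = weight_bytes(PARAMS, bits)
    print(f"{label}: {size / 1e9:.2f} GB ({to_gib(size):.2f} GiB)")
```

Under these assumptions, int4 weights take a quarter of the space of fp16 weights, roughly 1.9 GB versus 7.6 GB.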


## Model Card Contact

[llmware on Hugging Face](https://www.huggingface.co/llmware)

[llmware website](https://www.llmware.ai)