|
--- |
|
library_name: mlx |
|
license: apache-2.0 |
|
license_link: https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/LICENSE |
|
pipeline_tag: text-generation |
|
base_model: Qwen/Qwen3-0.6B |
|
tags: |
|
- mlx |
|
--- |
|
|
|
# NexaAI/Qwen3-0.6B-8bit-MLX |
|
|
|
## Quickstart |
|
|
|
Run this model directly with [nexa-sdk](https://github.com/NexaAI/nexa-sdk) installed.

In the nexa-sdk CLI:
|
|
|
```bash |
|
NexaAI/Qwen3-0.6B-8bit-MLX |
|
``` |
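
Since this is an MLX-format model, it should also load through the [mlx-lm](https://github.com/ml-explore/mlx-lm) Python API. Below is a minimal sketch following the standard mlx-lm pattern; the prompt text is illustrative, and it assumes this repo uses the usual MLX weight layout:

```python
from mlx_lm import load, generate

# Download and load the 8-bit MLX weights and tokenizer from the Hub.
model, tokenizer = load("NexaAI/Qwen3-0.6B-8bit-MLX")

# Build a chat prompt using the model's chat template.
messages = [{"role": "user", "content": "Give me a one-line summary of MLX."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate a response (verbose=True streams tokens to stdout).
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```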
|
|
|
## Overview |
|
|
|
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
|
|
|
- **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) **and non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios (see the sketch after this list).
|
- **Significant enhancement of its reasoning capabilities**, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
|
- **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. |
|
- **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
|
- **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**. |
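
The thinking/non-thinking switch is exposed through the chat template's `enable_thinking` flag, as documented on the original Qwen3 model card. A minimal sketch, assuming the mlx-lm tokenizer wrapper passes the flag through to the underlying Hugging Face chat template:

```python
from mlx_lm import load, generate

model, tokenizer = load("NexaAI/Qwen3-0.6B-8bit-MLX")
messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]

# Thinking mode (the default): the model emits a <think>...</think>
# reasoning block before its final answer.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=512))

# Non-thinking mode: skips the reasoning block for faster, direct replies.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```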
|
|
|
#### Model Overview |
|
|
|
**Qwen3-0.6B** has the following features: |
|
- Type: Causal Language Models |
|
- Training Stage: Pretraining & Post-training |
|
- Number of Parameters: 0.6B |
|
- Number of Parameters (Non-Embedding): 0.44B
|
- Number of Layers: 28 |
|
- Number of Attention Heads (GQA): 16 for Q and 8 for KV |
|
- Context Length: 32,768 |
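
These architecture numbers can be checked programmatically against the base model's config; a small sketch, assuming a `transformers` version with Qwen3 support:

```python
from transformers import AutoConfig

# Inspect the hyperparameters listed above from the base model's config.
cfg = AutoConfig.from_pretrained("Qwen/Qwen3-0.6B")
print("layers:", cfg.num_hidden_layers)              # 28
print("attention heads (Q):", cfg.num_attention_heads)  # 16
print("KV heads (GQA):", cfg.num_key_value_heads)       # 8
```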
|
|
|
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/). |
|
|
|
## Reference |
|
**Original model card**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) |
|
|