---
license: mit
tags:
- quantized
- embedding
- W4A16
- llmcompressor
- awq
- 4-bit
- activation-aware
base_model: nomic-ai/nomic-embed-code
---
# nomic-embed-code-W4A16-AWQ
This is a **W4A16 quantized** version of [nomic-ai/nomic-embed-code](https://huggingface.co/nomic-ai/nomic-embed-code).
**Quantized with AWQ (Activation-aware Weight Quantization) using [llm-compressor](https://github.com/vllm-project/llm-compressor).**
## Quantization Details
- **Method**: llmcompressor (AWQ one-shot PTQ; see the recipe sketch below)
- **Algorithm**: AWQ (Activation-aware Weight Quantization)
- **Scheme**: W4A16
- **Weight bits**: 4-bit
- **Activation bits**: 16-bit
- **Group size**: 128
- **Format**: compressed-tensors
- **Size reduction**: ~75% compared to FP16
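For reference, here is a minimal sketch of how a W4A16 AWQ checkpoint like this one can be produced with llm-compressor. The calibration dataset, sample count, and sequence length below are illustrative assumptions, not necessarily the settings used for this upload:

```python
# Minimal llm-compressor AWQ recipe sketch (illustrative settings, see note above)
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

recipe = [
    # W4A16: 4-bit grouped weights (group size 128), 16-bit activations
    AWQModifier(targets=["Linear"], scheme="W4A16", ignore=["lm_head"]),
]

oneshot(
    model="nomic-ai/nomic-embed-code",
    dataset="open_platypus",          # assumed calibration set
    recipe=recipe,
    max_seq_length=2048,              # assumed
    num_calibration_samples=256,      # assumed
    output_dir="nomic-embed-code-W4A16-AWQ",
)
```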
## Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the quantized model (requires a transformers version with
# compressed-tensors support)
model = AutoModel.from_pretrained(
    "nomic-embed-code-W4A16-AWQ",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "nomic-embed-code-W4A16-AWQ",
    trust_remote_code=True,
)

# Generate embeddings with attention-mask-aware mean pooling,
# so padding tokens do not dilute the average
texts = ["Hello world", "Example text"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)
```
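Embeddings are typically L2-normalized before comparison. Continuing from the snippet above, cosine similarity between the two texts is then just a dot product:

```python
import torch.nn.functional as F

# After L2 normalization, the dot product equals cosine similarity
normalized = F.normalize(embeddings, p=2, dim=1)
print(normalized @ normalized.T)  # 2x2 cosine-similarity matrix
```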
## Performance
- **Memory usage**: ~75% reduction vs FP16 (see the back-of-envelope estimate below)
- **Inference speed**: Similar or faster on compatible hardware
- **Quality**: Minimal degradation (<1% on most embedding tasks)
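As a rough check on the memory figure, assuming the base model has roughly 7B parameters (an assumption for illustration, not a measured number):

```python
# Rough weight-storage estimate for ~7B parameters (assumed, not measured)
params = 7e9
fp16_gib = params * 2 / 2**30          # 2 bytes per FP16 weight
w4_gib = params * 0.5 / 2**30          # 4 bits per quantized weight
w4_gib += (params / 128) * 2 / 2**30   # one FP16 scale per group of 128 weights
print(f"FP16: ~{fp16_gib:.1f} GiB, W4A16: ~{w4_gib:.1f} GiB")
# ~13.0 GiB vs ~3.4 GiB, i.e. roughly the quoted ~75% reduction
```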
## Why AWQ?
AWQ (Activation-aware Weight Quantization) is a one-shot weight quantization method that (see the toy sketch after this list):
- Protects salient weights based on activation magnitudes rather than weight magnitudes
- Uses calibration data to identify the input channels that matter most
- Typically outperforms naive round-to-nearest (RTN) and is competitive with GPTQ
- Works efficiently with group-wise quantization (group size 128 here)
- Maintains model quality while achieving ~75% size reduction
- Is well suited to embedding models, which depend on preserving semantic relationships
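For intuition, the toy sketch below (not the llm-compressor implementation) shows the core trick: scale salient input channels up before 4-bit rounding, then fold the scale back out, which concentrates quantization precision where activations are large. All tensor shapes and the alpha exponent are illustrative assumptions.

```python
# Toy illustration of AWQ's core idea -- not the llm-compressor implementation
import torch

torch.manual_seed(0)
W = torch.randn(64, 128)   # weight: [out_features, in_features]
X = torch.randn(256, 128)  # calibration activations
X[:, :8] *= 10.0           # make a few input channels salient

def quantize_rtn(w, bits=4, group_size=128):
    """Symmetric round-to-nearest, group-wise along the input dimension."""
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True) / (2 ** (bits - 1) - 1)
    q = (wg / scale).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return (q * scale).reshape(out_f, in_f)

# Per-channel salience from calibration activations (alpha = 0.5 is illustrative;
# AWQ proper searches for the best exponent)
s = X.abs().mean(dim=0).clamp(min=1e-5) ** 0.5

rtn = quantize_rtn(W)
awq = quantize_rtn(W * s) / s  # scale up, quantize, fold the scale back out

err_rtn = (X @ W.T - X @ rtn.T).pow(2).mean()
err_awq = (X @ W.T - X @ awq.T).pow(2).mean()
print(f"output MSE -- RTN: {err_rtn:.4f}, AWQ-style: {err_awq:.4f}")
```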
## Original Model
This quantized model is based on [nomic-ai/nomic-embed-code](https://huggingface.co/nomic-ai/nomic-embed-code).
## Citation
If you use this model, please cite the original model and llmcompressor:
```bibtex
@software{llmcompressor,
  title  = {LLM Compressor},
  author = {Neural Magic},
  url    = {https://github.com/vllm-project/llm-compressor},
  year   = {2024}
}
```