# nomic-embed-code-W4A16-AWQ
This is a W4A16 quantized version of [nomic-ai/nomic-embed-code](https://huggingface.co/nomic-ai/nomic-embed-code), produced with AWQ (Activation-aware Weight Quantization) via [llm-compressor](https://github.com/vllm-project/llm-compressor).
## Quantization Details
- Method: llmcompressor (AWQ one-shot PTQ)
- Algorithm: AWQ (Activation-aware Weight Quantization)
- Scheme: W4A16
- Weight bits: 4-bit
- Activation bits: 16-bit
- Group size: 128
- Format: compressed-tensors
- Size reduction: ~75% compared to FP16
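
For reference, a quantization along these lines can be reproduced with llm-compressor's one-shot API. The sketch below assumes a recent llm-compressor release that ships `AWQModifier`; the import paths, calibration dataset, and sample counts are illustrative and may differ from the exact settings used for this checkpoint:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# AWQ recipe: 4-bit weights, 16-bit activations (W4A16 uses group size 128 by default)
recipe = [AWQModifier(targets=["Linear"], scheme="W4A16", ignore=["lm_head"])]

# One-shot PTQ: a small calibration set drives the activation-aware scaling
# (dataset and sample count here are placeholders, not this model's exact settings)
oneshot(
    model="nomic-ai/nomic-embed-code",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
    output_dir="nomic-embed-code-W4A16-AWQ",
)
```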
## Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the quantized model and tokenizer
model = AutoModel.from_pretrained(
    "pyrymikko/nomic-embed-code-W4A16-AWQ",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "pyrymikko/nomic-embed-code-W4A16-AWQ",
    trust_remote_code=True,
)

# Generate embeddings with attention-mask-aware mean pooling
# (padding tokens are excluded from the average; consult the original
# model card for its recommended pooling strategy and task prefixes)
texts = ["Hello world", "Example text"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```
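
Embeddings are usually L2-normalized before comparison so that dot products equal cosine similarity. A quick check using the `embeddings` tensor from the snippet above (plain PyTorch, nothing model-specific):

```python
import torch.nn.functional as F

# Normalize each embedding to unit length, then compare pairwise
normalized = F.normalize(embeddings, p=2, dim=1)
similarity = normalized @ normalized.T
print(similarity)  # 2x2 matrix with 1.0 on the diagonal
```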
## Performance
- Memory usage: ~75% reduction vs FP16
- Inference speed: Similar or faster on runtimes with optimized 4-bit kernels (e.g. vLLM)
- Quality: Minimal degradation (typically <1% on common embedding tasks)
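
The memory figure is easy to sanity-check locally; `get_memory_footprint()` is a standard `transformers` helper, though the exact number will vary with library version and hardware:

```python
# Reports the in-memory size of the loaded parameters and buffers, in bytes
footprint = model.get_memory_footprint()
print(f"Model memory footprint: {footprint / 1e9:.2f} GB")
```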
## Why AWQ?
AWQ (Activation-aware Weight Quantization) is a one-shot weight quantization method that:

- Protects salient weights based on activation magnitudes rather than weight magnitudes
- Uses calibration data to identify important weight channels
- Often achieves better accuracy than GPTQ and naive round-to-nearest (RTN)
- Works efficiently with group-wise quantization (group size 128; see the sketch below)
- Maintains model quality while achieving roughly 75% size reduction
- Is well suited to embedding models, which depend on preserving semantic relationships
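
To make the group-wise scheme concrete, here is a toy illustration of symmetric 4-bit quantization with one scale per group of 128 weights. This is plain PyTorch for intuition only, not llm-compressor's actual kernels, and it omits the activation-aware scaling step that AWQ applies before rounding:

```python
import torch

def quantize_groupwise_w4(weights: torch.Tensor, group_size: int = 128):
    """Toy symmetric 4-bit group-wise quantization (illustration only)."""
    w = weights.reshape(-1, group_size)
    # One scale per group: map the group's max magnitude onto the int4 range (-8..7)
    scales = w.abs().amax(dim=1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(w / scales), -8, 7)  # 4-bit integer codes
    w_hat = (q * scales).reshape(weights.shape)      # dequantized values seen at 16-bit
    return q, scales, w_hat

w = torch.randn(4, 256)  # any weight matrix whose row length is divisible by 128
q, scales, w_hat = quantize_groupwise_w4(w)
print("max abs error:", (w - w_hat).abs().max().item())
```

Storing one 16-bit scale per 128 weights keeps the metadata overhead small while letting each group adapt to its own dynamic range, which is why group size 128 is a common default.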
## Original Model
This quantized model is based on [nomic-ai/nomic-embed-code](https://huggingface.co/nomic-ai/nomic-embed-code), an embedding model fine-tuned from Qwen/Qwen2.5-Coder-7B-Instruct (which itself descends from Qwen/Qwen2.5-Coder-7B and the Qwen/Qwen2.5-7B base model).
## Citation
If you use this model, please cite the original model and llm-compressor:

```bibtex
@software{llmcompressor,
  title  = {LLM Compressor},
  author = {Neural Magic},
  url    = {https://github.com/vllm-project/llm-compressor},
  year   = {2024}
}
```