Add exported onnx model 'model_qint8_arm64.onnx'

#24

by Narsil - opened Jun 4

base: refs/heads/main ← from: refs/pr/24


Narsil

Jun 4

Hello!

This pull request was automatically generated by the export_dynamic_quantized_onnx_model function of the Sentence Transformers library.

Config

QuantizationConfig(
    is_static=False,
    format=<QuantFormat.QOperator: 0>,
    mode=<QuantizationMode.IntegerOps: 0>,
    activations_dtype=<QuantType.QUInt8: 1>,
    activations_symmetric=False,
    weights_dtype=<QuantType.QInt8: 0>,
    weights_symmetric=True,
    per_channel=True,
    reduce_range=False,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    operators_to_quantize=['Conv', 'MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization'],
    qdq_add_pair_to_weight=False,
    qdq_dedicated_pair=False,
    qdq_op_type_per_channel_support_to_axis={'MatMul': 1}
)
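To make the config above more concrete, here is a minimal pure-Python sketch of what weights_symmetric=True with QInt8, activations_symmetric=False with QUInt8, and per_channel=True roughly mean numerically. The helper names are hypothetical, and the arithmetic is a simplified illustration under those assumptions, not ONNX Runtime's exact implementation.

```python
def quantize_symmetric_int8(values):
    """Symmetric signed 8-bit: zero point fixed at 0, values mapped into [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale, 0


def quantize_asymmetric_uint8(values):
    """Asymmetric unsigned 8-bit: values mapped into [0, 255] with a zero point."""
    lo = min(min(values), 0.0)  # the range is widened to include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    quantized = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


# per_channel=True: each weight row (output channel) gets its own scale,
# so a row with small values is not crushed by a row with large ones.
weights = [[0.3, -1.0, 0.25], [0.03, 0.01, -0.04]]
per_channel = [quantize_symmetric_int8(row) for row in weights]

# Dynamic quantization: activation ranges are computed at inference time.
activations = [-1.0, 0.0, 0.5, 3.0]
act_q, act_scale, act_zp = quantize_asymmetric_uint8(activations)

for row_q, row_scale, _ in per_channel:
    print(row_q, row_scale)
print(act_q, act_scale, act_zp)
```

Because is_static=False, no calibration dataset is involved: only the weights are quantized ahead of time, which is why the export can be generated automatically.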

Tip:

Consider testing this pull request before merging by loading the model from this PR with the revision argument:

from sentence_transformers import SentenceTransformer

# PR number of this pull request (#24)
pr_number = 24
model = SentenceTransformer(
    "BAAI/llm-embedder",
    revision=f"refs/pr/{pr_number}",
    backend="onnx",
    model_kwargs={"file_name": "model_qint8_arm64.onnx"},
)

# Verify that everything works as expected
embeddings = model.encode(["The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium."])
print(embeddings.shape)

similarities = model.similarity(embeddings, embeddings)
print(similarities)
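Beyond checking shapes, it may be worth comparing the quantized embeddings against those of the unquantized model. The following is a minimal sketch with hypothetical helper names; it works on plain lists of floats, so the toy vectors below only stand in for real model.encode(...) outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def min_agreement(reference_embeddings, quantized_embeddings):
    """Lowest cosine similarity between matching rows of two embedding sets."""
    return min(
        cosine_similarity(ref, quant)
        for ref, quant in zip(reference_embeddings, quantized_embeddings)
    )


# Toy stand-ins: in practice, pass the embeddings produced by the default
# model and by the model loaded from this PR for the same sentences.
reference = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1]]
quantized = [[0.98, 0.01, 0.52], [0.21, 0.88, 0.12]]
print(min_agreement(reference, quantized))
```

A value close to 1.0 suggests the quantized model preserves the embedding directions; what counts as close enough depends on the downstream task.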

Add exported onnx model 'model_qint8_arm64.onnx' (55dd552a)


Ready to merge

This branch is ready to get merged automatically.
